IIT Database Group

Provenance using Temporal Databases

In this project we explore how provenance computation can benefit from temporal database techniques. The project is funded by and executed in collaboration with Oracle Corporation. Starting from porting rewrite-based techniques such as the ones used in Perm to the Oracle SQL dialect, we study how to 1) compute the provenance of past queries and 2) compute the provenance of updates and transactions. This requires non-trivial extensions to current provenance techniques because of, e.g., the interaction of transactions under lower isolation levels. Our solution can retroactively trace transaction provenance as long as an audit log and time travel functionality are available (both are supported by most DBMS). One of the major outcomes so far is the concept of reenactment queries: queries that reenact the effects of a transaction. Reenactment queries are the main enabler of retroactive provenance computation for transactions.

Within this project we have made the following major contributions to provenance management:

  • Development of MV-relations, a provenance model for queries, updates, and transactional histories that extends the seminal semiring annotation model (defined for queries) with support for updates and transactions.
  • Development of reenactment, a declarative replay technique with provenance capture that enables tracking the provenance of a past update or transaction retroactively by executing a query.
  • Implementation of provenance tracking for transactions over a standard relational database as part of the GProM system.

MV-relations - A Provenance Model for Transactional Updates

As part of this project we have developed a provenance model that allows tracking the provenance of tuples through queries and transactional updates. In our model, the complete derivation history of a tuple (which update operations derived the tuple, and on which inputs of these operations it depends) can be encoded in the annotation of the tuple.
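
To make the idea of an annotation that records a derivation history more concrete, here is a deliberately simplified sketch in Python. It is not the formal MV-relation definition: the class names `Var` and `Op`, the operation label `'U'`, and the helpers `apply_update`, `history`, and `source` are all hypothetical names chosen for illustration, and the sketch only covers chains of updates, not queries that combine several input tuples.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    """Identifier of an input tuple (hypothetical naming, e.g. 'x1')."""
    name: str


@dataclass(frozen=True)
class Op:
    """Records that `child` was processed by one operation of a transaction."""
    kind: str            # illustrative label, here only 'U' for update
    txn: str             # transaction identifier (assumed format)
    time: int            # logical time at which the operation ran
    child: "Annotation"  # annotation of the tuple before this operation


Annotation = Union[Var, Op]


def apply_update(rows, txn, time, cond, change):
    """Update the rows matching `cond` and wrap their annotation in an Op."""
    return [
        (change(t), Op("U", txn, time, a)) if cond(t) else (t, a)
        for t, a in rows
    ]


def history(a):
    """Operations a tuple went through, oldest first."""
    steps = []
    while isinstance(a, Op):
        steps.append(f"{a.kind}@{a.txn},{a.time}")
        a = a.child
    return steps[::-1]


def source(a):
    """Name of the input tuple this annotation ultimately depends on."""
    while isinstance(a, Op):
        a = a.child
    return a.name


rows = [({"id": 1, "balance": 50}, Var("x1")),
        ({"id": 2, "balance": 200}, Var("x2"))]
rows = apply_update(rows, "T1", 5, lambda t: t["balance"] >= 100,
                    lambda t: {**t, "balance": t["balance"] + 50})
rows = apply_update(rows, "T1", 6, lambda t: t["id"] == 2,
                    lambda t: {**t, "balance": t["balance"] - 5})

for t, a in rows:
    print(t, history(a), source(a))
```

The tuple with id 1 keeps its bare input annotation because no update touched it, while the tuple with id 2 carries both updates of the hypothetical transaction T1 in the order they were applied.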

Reenactment - Declarative Replay with Provenance Capture

Reenactment is a declarative replay technique that enables a transactional history (or part thereof) to be repeated by executing a so-called reenactment query. We have proven that reenactment queries produce the same result and have the same provenance as the operation(s) they are replaying. Thus, a reenactment query can be used to retroactively compute the provenance of an operation executed some time in the past as long as the database state seen by this operation can be accessed.
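
As a small illustration of the idea, the following sketch replays a single UPDATE as a SELECT over the database state the update saw, then checks that both produce the same rows. It uses Python's built-in `sqlite3` module rather than Oracle, the table `account` and its contents are made up, and the copy `account_before` merely stands in for access to the past state. The CASE-based rewrite is one plausible way to express an update as a query and should be read as a sketch, not as the exact translation used in the project; it also does not capture provenance, only the replay part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    INSERT INTO account VALUES (1, 50), (2, 200), (3, 500);
    -- Copy of the state the update will see (stand-in for time travel).
    CREATE TABLE account_before AS SELECT * FROM account;
""")

# The original operation: a bonus for accounts holding at least 100.
conn.execute("UPDATE account SET balance = balance + 10 WHERE balance >= 100")

# Replay of the update as a query over the old state: rows matching the
# update's condition get the new value, all other rows pass through unchanged.
REENACT_UPDATE = """
    SELECT id,
           CASE WHEN balance >= 100 THEN balance + 10 ELSE balance END AS balance
    FROM account_before
    ORDER BY id
"""

reenacted = conn.execute(REENACT_UPDATE).fetchall()
actual = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(reenacted == actual)
```

Because the replay is an ordinary query, it can be run at any later time without modifying the database, provided the old state is still accessible.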

Implementation in GProM

To retrieve the provenance of a past update (transaction, or history) we construct its reenactment query based on a log of SQL operations executed in the past (e.g., Oracle's audit log facility). Such a reenactment query needs to be executed over the database state seen by the operation(s) to be replayed. We use time travel to access such past database states. The techniques developed in this project have been integrated into the GProM system, a database-independent middleware application for computing provenance.
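
The pipeline described above (log of past operations, time travel, reenactment query) can be sketched end to end as follows. This is an illustration under several assumptions and does not use GProM or Oracle: the `LoggedUpdate` record is a hypothetical, already-parsed log entry, time travel is emulated with a hand-written history table `account_hist` carrying `valid_from` and `valid_to` versions, and all statements of the transaction are assumed to read from one snapshot with no concurrent transactions interfering.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class LoggedUpdate:
    """Hypothetical parsed log entry: UPDATE ... SET column = expr WHERE cond."""
    column: str
    expr: str
    cond: str


def snapshot_sql(ts):
    """Emulated time travel: rows of the history table valid at version ts."""
    return ("SELECT id, balance FROM account_hist "
            f"WHERE valid_from <= {int(ts)} AND {int(ts)} < valid_to")


def reenact(log, columns, source_sql):
    """Nest one SELECT per logged update; each reads the previous one's output."""
    query = source_sql
    for u in log:
        cols = ", ".join(
            f"CASE WHEN {u.cond} THEN {u.expr} ELSE {c} END AS {c}"
            if c == u.column else c
            for c in columns
        )
        query = f"SELECT {cols} FROM ({query})"
    return query


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_hist (id INTEGER, balance INTEGER,
                               valid_from INTEGER, valid_to INTEGER);
    INSERT INTO account_hist VALUES
        (1,  50, 0, 9999),   -- never modified
        (2, 200, 0, 5),      -- version seen by the transaction
        (2, 250, 5, 9999),   -- version it committed at time 5
        (3,  90, 0, 5),
        (3,  85, 5, 9999);
""")

# Statements of the past transaction, in execution order.
log = [LoggedUpdate("balance", "balance + 50", "balance >= 100"),
       LoggedUpdate("balance", "balance - 5", "id = 3")]

# Replay over the state as of version 4, just before the commit at version 5.
query = reenact(log, ["id", "balance"], snapshot_sql(4)) + " ORDER BY id"
reenacted = conn.execute(query).fetchall()
committed = conn.execute(snapshot_sql(5) + " ORDER BY id").fetchall()
print(reenacted == committed)
```

Nesting the per-statement rewrites mirrors how later statements of a transaction see the changes of earlier ones. Handling statements that read different snapshots, as can happen under lower isolation levels, would require choosing the snapshot per statement, which this sketch leaves out.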

Collaborators

  • Dieter Gawlick, Architect in Special Projects, Oracle
  • Vasudha Krishnaswamy, Oracle
  • Zhen Hua Liu, Oracle
  • Venkatesh Radhakrishnan, Facebook

Funding

  • Title: Provenance using temporal databases (extension)
  • Amount: $95,829
  • Funding Agency: Oracle Corporation
  • Project Page: webpage
  • Recipient: Dr. Glavic
  • Principal Contact at Oracle: Dieter Gawlick, Architect in Special Projects

  • Title: Provenance using temporal databases
  • Amount: $85,000
  • Funding Agency: Oracle Corporation
  • Project Page: webpage
  • Recipient: Dr. Glavic
  • Principal Contact at Oracle: Dieter Gawlick, Architect in Special Projects

Publications

2017 (to appear)
[8] Using Reenactment to Retroactively Capture Provenance for Transactions (Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan, Boris Glavic), In IEEE Transactions on Knowledge and Data Engineering (TKDE), 2017 (to appear). [bibtex] [pdf]
2017
[7] Debugging Transactions and Tracking their Provenance with Reenactment (Xing Niu, Boris Glavic, Seokki Lee, Bahareh Arab, Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy, Feng Su, Xun Zou), In Proceedings of the VLDB Endowment (Demonstration Track) (PVLDB), volume 10, 2017. [bibtex] [pdf]
[6] Answering Historical What-if Queries with Provenance, Reenactment, and Symbolic Execution (Bahareh Arab, Boris Glavic), In Proceedings of the 8th USENIX Workshop on the Theory and Practice of Provenance (TaPP), 2017. [bibtex] [pdf]
2016
[5] Reenactment for Read-Committed Snapshot Isolation (long version) (Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan, Boris Glavic), Technical report, Illinois Institute of Technology, 2016. [bibtex] [pdf]
[4] Reenactment for Read-Committed Snapshot Isolation (Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan, Boris Glavic), In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM), 2016. [bibtex] [pdf]
[3] Formal Foundations of Reenactment and Transaction Provenance (Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan, Boris Glavic), Technical report, Illinois Institute of Technology, IIT/CS-DB-2016-01, 2016. [bibtex] [pdf]
2014
[2] Reenacting Transactions to Compute their Provenance (Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan, Boris Glavic), Technical report, Illinois Institute of Technology, IIT/CS-DB-2014-02, 2014. [bibtex] [pdf]
[1] A Generic Provenance Middleware for Database Queries, Updates, and Transactions (Bahareh Arab, Dieter Gawlick, Venkatesh Radhakrishnan, Hao Guo, Boris Glavic), In Proceedings of the 6th USENIX Workshop on the Theory and Practice of Provenance (TaPP), 2014. [bibtex] [pdf] [slides]