IIT Database Group

Header bar
Fork me on GitHub

Provenance Unification through Graphs with Summarization (PUGS)

Provenance for relational queries records how results of a query depend on the query’s inputs. This type of information can be used to explain why (and how) a result is derived by a query over a given database. Recently, approaches have been developed that use provenance-like techniques to explain why a tuple (or a set of tuples described declaratively by a pattern) is missing from the query result. However, the two problems of computing provenance and explaining missing answers have been treated mostly in isolation.

Unifying Why and Why-not Provenance

In collaboration with UIUC, we have developed a unified approach for efficiently computing provenance (why) and missing answers (why-not). This approach is based on the observation that in provenance model for queries with unsafe negation, why-not questions can be translated into why questions and vice versa.

Typically only a part of the provenance, which we call explanation, is actually relevant for answering the user’s provenance question about the existence or absence of a result. We have developed an approach that tries to restrict provenance capture to what is relevant to explain the outcome of interest specified by the user.

Approximate Summarization of Provenance

While explanations are useful for reducing the size of provenance, the result may still overwhelm users with too much information and waste computational resources. In particular for why-not, where the provenance explains all failed ways of how a result could have been derived using the rules of a query, provenance graphs may be too large to compute even for small datasets. We address the computational and usability challenges of large provenance graphs by creating summaries based on structural commonalities that exist in the provenance. Importantly, our approach computes summaries based on a sample of the provenance only and, thus, avoids the computationally infeasible step of generating the full why-not provenance for a user question. We have implemented these techniques in our PUGS system.

The PUGS System

We have build a system which supports computing explanations for a user's why and why-not provenance question as a, optionally summarized, provenance graph. Our approach offloads the computation that captures relevant provenance and summarizes it to a datalog engine or relational database backend. The figure below gives an overview of the architecture of the proposed approach.

Collaborators

Publications

  1. Approximate Summaries for Why and Why-not Provenance
    Seokki Lee, Bertram Ludäscher and Boris Glavic
    Proceedings of the VLDB Endowment. 13, 6 (2020) , 912–924.
    details
  2. Why and Why-Not Provenance for Queries with Negation
    Seokki Lee
    Illinois Institute of Technology.
    details
  3. PUG: a framework and practical implementation for why and why-not provenance
    Seokki Lee, Bertram Ludäscher and Boris Glavic
    The VLDB Journal. 28, 1 (Aug. 2019) , 47—71.
    details
  4. Provenance Summaries for Answers and Non-Answers
    Seokki Lee, Bertram Ludäscher and Boris Glavic
    Proceedings of the VLDB Endowment (Demonstration Track). 11, 12 (2018) , 1954–1957.
    details
  5. A SQL-Middleware Unifying Why and Why-Not Provenance for First-Order Queries
    Seokki Lee, Sven Köhler, Bertram Ludäscher and Boris Glavic
    Proceedings of the 33rd IEEE International Conference on Data Engineering (2017), pp. 485–496.
    details
  6. Integrating Approximate Summarization with Provenance Capture
    Seokki Lee, Xing Niu, Bertram Ludäscher and Boris Glavic
    Proceedings of the 8th USENIX Workshop on the Theory and Practice of Provenance (2017).
    details
  7. Implementing Unified Why- and Why-Not Provenance Through Games
    Seokki Lee, Sven Köhler, Bertram Ludäscher and Boris Glavic
    Proceedings of the 8th USENIX Workshop on the Theory and Practice of Provenance (Poster) (2016).
    details
  8. An Efficient Implementation of Game Provenance in DBMS
    Seokki Lee, Yuchen Tang, Sven Köhler, Bertram Ludäscher and Boris Glavic
    Technical Report #IIT/CS-DB-2015-02
    Illinois Institute of Technology.
    details
  9. Towards Constraint-based Explanations for Answers and Non-Answers
    Boris Glavic, Sven Köhler, Sean Riddle and Bertram Ludäscher
    Proceedings of the 7th USENIX Workshop on the Theory and Practice of Provenance (2015).
    details