Provenance Games and Generalization
Explaining why a certain answer is in the result of a query or why it is missing from the result is important for many applications including auditing, debugging data, and answering hypothetical questions about data. Both types of questions, i.e., why provenance and why-not (missing answer) provenance have been studied extensively independently. Based on a game theoretic notion of provenance developed by our collaborators at UIUC, we investigate how to developed a unified approach towards provenance and missing answers. Our approach is based on the observation that for a provenance model for queries with full negation, why-not questions can be translated into why questions and vice versa. In addition to investigating how provenance games can be computed efficiently using a goal oriented approach that outsources most of the computation to a datalog engine or relational database, we also are interested in the theoretical properties of provenance games and their relationship to existing provenance and missing answer models. Furthermore, provenance for missing answers (and queries with negation in general) can be very large or even infinite. Thus, a pratical approach requires the size of provenance to be reduced through finite encoding of infinite structures, compression, generalization, and summarization.
We envision an system which supports computes explanations of a user's why and why-not questions as (summarized) provenance game fragments. The computation of the relevant fragment of a provenance game will be offloaded to a datalog engine or relational database backend. Here we will benefit from the optimized relational algebra to SQL compilation of our GProM system. The figure below gives an overview of the architecture of the proposed approach.
|||Integrating Approximate Summarization with Provenance Capture , In Proceedings of the 8th USENIX Workshop on the Theory and Practice of Provenance (TaPP), 2017. [bibtex] [pdf]|
|||A SQL-Middleware Unifying Why and Why-Not Provenance for First-Order Queries , In Proceedings of the 33rd IEEE International Conference on Data Engineering (ICDE), 2017. [bibtex] [pdf]|
|||Implementing Unified Why- and Why-Not Provenance Through Games , In Proceedings of the 8th USENIX Workshop on the Theory and Practice of Provenance (Poster) (TaPP), 2016. [bibtex] [pdf]|
|||An Efficient Implementation of Game Provenance in DBMS , Technical report, Illinois Institute of Technology, IIT/CS-DB-2015-02, 2015. [bibtex] [pdf]|
|||Towards Constraint-based Explanations for Answers and Non-Answers , In Proceedings of the 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP), 2015. [bibtex] [pdf] [slides]|