IIT Database Group


Explanations beyond Provenance

In many data analysis applications there is a need to explain why a query produced a surprising or interesting result. Past work in this area has mostly focused on data provenance, i.e., the input data that was used to derive the result of interest. However, the provenance of a query result often contains only a small fraction of the information that is relevant for explaining an answer. In this work we explore novel types of explanations that are not (just) based on data provenance.


We propose a new approach for explaining interesting query results by augmenting provenance with information from other related tables in the database. Specifically, given a schema graph that encodes the semantic relationships of tables in a database schema, we devise algorithms for enriching the provenance of a query result by joining it with data from tables that are connected to tables from the provenance in the schema graph. Furthermore, we summarize the results of such joins to generate rich, high-level patterns as explanations.
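The enrichment step above can be sketched in a few lines. The following is a minimal illustration (not the system's actual implementation): the tables, column names, and data are invented for the example. The provenance of an answer is joined with a related table along a schema-graph edge, and the join result is summarized into patterns by grouping and counting.

```python
import pandas as pd

# Toy provenance of a query answer: the publication tuples that
# contributed to the result of interest.
provenance = pd.DataFrame({
    "pub_id": [1, 2, 3],
    "author": ["A", "A", "B"],
    "venue":  ["SIGMOD", "VLDB", "SIGMOD"],
})

# A related table that is connected to the provenance's table
# in the schema graph (here via the author attribute).
authors = pd.DataFrame({
    "author":      ["A", "B"],
    "affiliation": ["IIT", "Duke"],
})

# Enrich the provenance by joining along the schema-graph edge.
enriched = provenance.merge(authors, on="author")

# Summarize the enriched provenance into high-level patterns:
# here, counts per (venue, affiliation) combination.
patterns = (enriched.groupby(["venue", "affiliation"])
                    .size()
                    .reset_index(name="count"))
print(patterns)
```

In the real setting the schema graph determines which joins are candidates, and the summaries are ranked so that only the most informative patterns are shown as explanations.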


Provenance- and intervention-based techniques have been used to explain surprising outcomes of aggregate queries based on the outcome's provenance. However, such techniques may miss interesting explanations emerging from data that is not in the provenance. For instance, an unusually low number of publications of a prolific researcher in a certain venue in a year can be explained by an increase in their publications in another venue in the same year. In this project we investigate how to mine patterns that describe inherent trends in the data and how to use these patterns to identify potential causes for an outcome of interest.

As an initial contribution we have developed a novel system called Cape (Counterbalancing with Aggregation Patterns for Explanations) for explaining outliers in aggregation queries through counterbalancing. That is, explanations are outliers in the opposite direction of the outlier of interest. Outliers are defined w.r.t. patterns that hold over the data in aggregate. We have developed efficient methods for mining such aggregate regression patterns (ARPs) and have demonstrated how to use ARPs to generate and rank explanations.
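The core idea can be illustrated with a small sketch, using the running publications example. This is not Cape's implementation; the data and the choice of a simple linear fit per venue are illustrative assumptions. A regression pattern is fit over each venue's yearly counts, deviations from the pattern are the outliers, and an outlier of the opposite sign in a related partition counterbalances the outlier of interest.

```python
import numpy as np

# Toy aggregate: one researcher's publication counts per (venue, year).
# Numbers are invented for illustration.
data = {
    ("SIGMOD", 2015): 4, ("SIGMOD", 2016): 5, ("SIGMOD", 2017): 6,
    ("SIGMOD", 2018): 1,  # surprisingly low -> outlier of interest
    ("VLDB", 2015): 2, ("VLDB", 2016): 2, ("VLDB", 2017): 2,
    ("VLDB", 2018): 7,    # surprisingly high -> candidate counterbalance
}

def residuals(venue):
    """Fit a linear trend (a simple aggregate regression pattern) over
    the venue's yearly counts and return each year's deviation from it."""
    years = np.array([y for (v, y) in data if v == venue], dtype=float)
    counts = np.array([c for (v, y), c in data.items() if v == venue],
                      dtype=float)
    slope, intercept = np.polyfit(years, counts, 1)
    return dict(zip(years.astype(int), counts - (slope * years + intercept)))

r_sigmod = residuals("SIGMOD")
r_vldb = residuals("VLDB")

# The outlier of interest sits below its pattern; a counterbalance is
# an outlier in the opposite direction in a related partition.
if r_sigmod[2018] < 0 and r_vldb[2018] > 0:
    print("VLDB 2018 counterbalances the low SIGMOD 2018 count")
```

Cape additionally mines which patterns hold with sufficient support and ranks the candidate counterbalances, rather than checking a single pair as above.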



  1. Interpretable Data-Based Explanations for Fairness Debugging
    Babak Salimi, Romila Pradhan, Jiongli Zhu and Boris Glavic
    Proceedings of the 47th International Conference on Management of Data (2022).
  2. Putting Things into Context: Rich Explanations for Query Answers using Join Graphs
    Chenjie Li, Zhengjie Miao, Qitian Zeng, Boris Glavic and Sudeepa Roy
    Proceedings of the 46th International Conference on Management of Data (2021), pp. 1051–1063.
  3. Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances
    Zhengjie Miao, Qitian Zeng, Boris Glavic and Sudeepa Roy
    Proceedings of the 44th International Conference on Management of Data (2019), pp. 485–502.
  4. CAPE: Explaining Outliers by Counterbalancing
    Zhengjie Miao, Qitian Zeng, Chenjie Li, Boris Glavic, Oliver Kennedy and Sudeepa Roy
    Proceedings of the VLDB Endowment (Demonstration Track). 12, 12 (2019), pp. 1806–1809.