IIT Database Group


Uncertainty-Annotated Databases

Uncertainty arises naturally in many application domains due to data entry errors, sensor errors and noise, uncertainty in information extraction and parsing, ambiguity from data integration, and heuristic data wrangling. Analyzing uncertain data without accounting for its uncertainty can create hard-to-trace errors with severe real-world consequences.

Incomplete and probabilistic database techniques are principled methods for dealing with uncertain data. However, exact query answering over such databases is intractable for many queries (for example, computing certain answers is coNP-hard in general), so the class of queries that can be answered efficiently is severely limited. In this project, we study a lightweight, practical alternative called uncertainty-annotated databases (UA-DBs). UA-DBs accommodate a wide range of uncertain data sources, are compatible with many existing incomplete and probabilistic data models, and, most importantly, support efficient query answering for a very large class of queries.
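To make the idea concrete, here is a minimal Python sketch under simplifying assumptions: set semantics, a single best-guess world whose tuples each carry a Boolean label (certain or uncertain), and labels propagated through selection, projection, and natural join using Boolean OR and AND. All names here (`row`, `select`, `project`, `join`, `UARel`) are hypothetical and only illustrate the concept; they are not the API of mimir-caveats or GProM.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical sketch of UA-DB-style labels; not the mimir-caveats or GProM API.
# A row is a frozenset of (attribute, value) pairs so it can serve as a dict key.
Row = FrozenSet[Tuple[str, str]]
# A UA-relation maps each best-guess tuple to its label: True = certain, False = uncertain.
UARel = Dict[Row, bool]


def row(**attrs: str) -> Row:
    """Build a row from keyword arguments, e.g. row(name="Alice")."""
    return frozenset(attrs.items())


def select(r: UARel, pred: Callable[[dict], bool]) -> UARel:
    """Selection keeps qualifying tuples with their labels unchanged."""
    return {t: c for t, c in r.items() if pred(dict(t))}


def project(r: UARel, attrs: set) -> UARel:
    """Projection merges duplicates; the result is certain if ANY
    contributing tuple is certain (Boolean '+' is OR)."""
    out: Dict[Row, bool] = defaultdict(bool)
    for t, c in r.items():
        key = frozenset((a, v) for a, v in t if a in attrs)
        out[key] = out[key] or c
    return dict(out)


def join(left: UARel, right: UARel) -> UARel:
    """Natural join; a joined tuple is certain only if BOTH inputs are
    certain (Boolean '*' is AND)."""
    out: Dict[Row, bool] = defaultdict(bool)
    for lt, lc in left.items():
        ld = dict(lt)
        for rt, rc in right.items():
            rd = dict(rt)
            # Join condition: agree on every shared attribute.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                key = frozenset({**ld, **rd}.items())
                out[key] = out[key] or (lc and rc)
    return dict(out)


if __name__ == "__main__":
    # Hypothetical data: Bob's city came from an uncertain extraction step.
    people = {row(name="Alice", city="Chicago"): True,
              row(name="Bob", city="Buffalo"): False}
    cities = {row(city="Chicago", state="IL"): True,
              row(city="Buffalo", state="NY"): True}
    result = project(join(people, cities), {"name", "state"})
    for t, certain in result.items():
        print(dict(t), "certain" if certain else "uncertain")
```

Because a joined tuple is certain only when both inputs are, and a projected tuple is certain when at least one contributing tuple is, every result marked certain in this sketch is derived from certain inputs only. The labels can therefore under-report certainty but never over-report it relative to the input labels, while query evaluation costs no more than evaluating the query once over the best-guess world.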

Implementations

We currently maintain two implementations of the UA-DB model. mimir-caveats, written in Scala, extends Apache Spark with uncertainty management capabilities and drives uncertainty handling in our Vizier notebook system. Additionally, we have developed an implementation in GProM.

Collaborators

Funding

Publications

  1. Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds
    Su Feng, Aaron Huber, Boris Glavic and Oliver Kennedy
    Proceedings of the 45th International Conference on Management of Data (2021).
  2. Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers
    Su Feng, Aaron Huber, Boris Glavic and Oliver Kennedy
    Proceedings of the 44th International Conference on Management of Data (2019), pp. 1313–1330.
  3. Analyzing Uncertain Tabular Data
    Oliver Kennedy and Boris Glavic
    In: Information Quality in Information Fusion and Decision Making
    Éloi Bossé and G. Rogova, eds. Springer, pp. 291–320.