IIT Database Group


We conduct research that spans several areas of database systems, including information integration, data provenance, scheduling, stream processing, and data mining. Our main contributions have impacted both theoretical and practical research in data provenance. We strive to develop solutions to emerging challenges in database systems, such as tightly integrating provenance support into database engines, provenance for distributed data processing paradigms, and provenance for sequences of database operations. A list of our publications is available on our publications page.


  • Ariadne - Computing fine-grained provenance for data streams using operator instrumentation.
  • BART - An error-generation tool for data cleaning applications. It introduces errors into clean databases in order to benchmark data-repairing algorithms.
  • Big Provenance - Developing algorithms and systems for scaling provenance to Big Data dimensions.
  • GProM - A database-independent middleware for computing the provenance of queries, updates, and transactions.
  • HRDBMS - A distributed database built from scratch that combines the best of traditional relational platforms with ideas from Big Data platforms.
  • iBench - A new, generic benchmark for data integration tasks.
  • LDV - A lightweight database virtualization system that marries OS and DB provenance.
  • Native Database Provenance - In this project we study how to integrate provenance techniques into a database core to improve various aspects of provenance management, including performance and storage requirements.
  • PUGS - A unified framework for capturing why and why-not provenance of Datalog queries with negation and for automatically generating concise provenance summaries.
  • Provenance using Temporal Databases - Using temporal database techniques to provide new and improved provenance functionality such as provenance of updates and transactions.
  • Relevance-based Data Management - We use provenance to determine what data is relevant for which task and then exploit this information to improve a wide range of data management tasks.
  • Vagabond - Automatic generation of explanations for data exchange errors.
  • Vizier - A framework for user-friendly and effective data curation.

Past Projects

Research for Students

There are several ways for students to get involved in our research, such as working on a project or writing a master's thesis in our group. We are always looking for bright and motivated students with an interest in database research. More information about potential topics and how to get involved is available on our student research page.