IIT Database Group


We teach several courses, most of which are related to databases.

CS595 - Modern Big Data Analytics

Big data technologies, in particular scalable distributed platforms for storage and analytics, enable processing of massive datasets for analytics, machine learning, and other use cases. This course provides a comprehensive overview of algorithms, systems, and techniques for Big Data processing. In a semester-long project, students will extend existing big data platforms. Additionally, in the seminar component of this course we will discuss cutting-edge research and industrial developments in the field.

CS116 - Introduction to Object-Oriented Programming II

Continuation of CS 115. Introduces more advanced elements of object-oriented programming, including dynamic data structures, recursion, searching and sorting, and advanced object-oriented programming techniques. For students in CS and CS-related degree programs.

CS425 - Database Organization

Database management systems are a crucial part of most large-scale industry and open-source systems. This course familiarizes students with important concepts of database systems and design. We will learn how to design a database using the Entity-Relationship model, how to query and modify a database using the declarative SQL language, and study APIs for writing application programs that use a database system to persist data. Furthermore, the course gives an overview of important database systems concepts such as indexing, query optimization and execution, concurrency control, and recovery.

Students will develop a database application in a group project. This project will cover all phases of development: assessing the application requirements, designing the database schema, and implementing the application.

CS520 - Data Integration, Warehousing, and Provenance

This course introduces the basic concepts of data integration, data warehousing, and provenance. We will learn how to resolve structural heterogeneity through schema matching and mapping. The course introduces techniques for querying several heterogeneous data sources at once (data integration) and translating data between databases with different data representations (data exchange). Furthermore, we will cover the data-warehouse paradigm including the Extract-Transform-Load (ETL) process, the data cube model and its relational representations (such as snowflake and star schema), and efficient processing of analytical queries. This will be contrasted with Big Data analytics approaches that (among other differences) significantly reduce the upfront cost of analytics. When feeding data through complex processing pipelines such as data exchange transformations or ETL workflows, it is easy to lose track of the origin of data. In the last part of the course we therefore cover techniques for representing and keeping track of the origin and creation process of data, also known as its provenance.

The course emphasizes practical skills through a series of homework assignments that help students develop a strong background in data integration systems and techniques. At the same time, it also addresses the underlying formalisms. For example, we will discuss the logic-based languages used for schema mapping and the dimensional data model as well as their practical application (e.g., developing an ETL workflow with RapidMiner and creating a mapping between two example schemata). The literature reviews will familiarize students with data integration and provenance research.

CS525 - Advanced Database Organization

Database management systems are a crucial part of most large-scale industry and open-source systems. This course provides comprehensive coverage of issues associated with database system development and an in-depth examination of structures and techniques used in contemporary database management systems (DBMSs). Students will learn about the inner workings of these exciting systems: Which algorithms are used? What are typical architectures used to build a system as complex as a DBMS? What are implementation strategies? These questions and more will be answered during the course.

The course is highly applied, emphasizing practical skills and habits through a series of programming assignments during which students will develop their own tiny DBMS-like engine. We will cover the most important aspects/components of a DBMS: storage and buffer management, indexing, query optimization, query execution, and concurrency control and recovery.

CS595-06 - Hot Topics in Database Systems: Data Provenance

With the ever-increasing amount of digital information comes an increasing need to understand "where" a piece of data (data item) is coming from, "why" it is in the result of a data transformation, and "how" it was produced by the transformation. For example, biologists use complex digital workflows and simulations to gain new insights from measurement and derived data. The result data of a complex workflow is meaningless without information about how, and from which input data, it was produced. This type of information, i.e., information about the creation process and origin of data, is called data provenance. Systems that automatically track provenance information for data produced by, e.g., workflows or SQL queries are becoming more and more important. Data provenance is an emerging technology which is used to, e.g., trace errors in transformed data back to their origin or gain additional insights about the data.

We will study several models of provenance developed for domains such as databases and workflow systems. This course covers approaches for automatically tracking provenance, as well as query languages and storage mechanisms for provenance information. Furthermore, we will discuss real systems that generate provenance data. This course gives the students the opportunity to learn about a hot topic in database research and work with novel research prototype provenance systems.
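
To make the idea of tracking "where" a result comes from more concrete, here is a minimal sketch of lineage-style provenance for a simple join. It is only an illustration under stated assumptions: the relations `measurements` and `samples`, their contents, and the function `join_with_lineage` are all hypothetical, and the approach shown (annotating each output with the set of contributing input tuple identifiers) is just one simple provenance model, not necessarily the one used in the course or in any particular system.

```python
from collections import defaultdict

# Hypothetical input relations. Each tuple has an identifier (the dict key)
# that serves as its provenance annotation.
measurements = {
    "m1": ("sample_a", 4.2),
    "m2": ("sample_a", 3.9),
    "m3": ("sample_b", 7.1),
}
samples = {
    "s1": ("sample_a", "lab_1"),
    "s2": ("sample_b", "lab_2"),
}


def join_with_lineage(measurements, samples):
    """Join both relations on the sample name and project onto the lab.

    Returns a dict mapping each output value (a lab) to the set of input
    tuple identifiers that contributed to it.
    """
    lineage = defaultdict(set)
    for mid, (m_sample, _value) in measurements.items():
        for sid, (s_sample, lab) in samples.items():
            if m_sample == s_sample:
                # Record both joined input tuples as contributors.
                lineage[lab].update({mid, sid})
    return dict(lineage)


result = join_with_lineage(measurements, samples)
for lab, inputs in sorted(result.items()):
    print(lab, sorted(inputs))
```

With such annotations, an erroneous output for `lab_1` could be traced back to the input tuples recorded in its lineage set, which is the kind of error tracing described above.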