IIT Database Group



HRDBMS is a novel distributed relational database that combines the best of traditional (distributed) relational databases with ideas from modern distributed dataflow engines such as Hadoop or Spark. This allows HRDBMS to draw on years of research on query optimization while also benefiting from the scalability of Big Data platforms. The system was built from the ground up to avoid many of the bottlenecks of SQL on Hadoop and Spark as well as the scalability issues of most traditional relational DBMSs. The ultimate goal is to build a system that combines the per-node performance of relational databases with the scalability of Big Data platforms. Some of the unique and not so unique features of HRDBMS are:

  • A cost-based query optimizer
  • A fully parallel and distributed execution engine
  • Support for index structures
  • Automatic caching through a rather traditional buffer manager
  • Support for efficient disk-based query execution using proven traditional query execution algorithms
  • Support for transactions
  • A non-blocking shuffle implementation
  • Support for horizontal partitioning and locality-aware query processing
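To make the last feature more concrete, here is a minimal Python sketch of horizontal (hash) partitioning and a locality-aware join. It is an illustration only and not HRDBMS code: the function names (`hash_partition`, `local_join`, `co_partitioned_join`), the sample tables, and the use of in-memory lists to stand in for nodes are all assumptions made for this example. The idea is that when two tables are partitioned on the same key, matching rows land on the same node, so each node can join its own partitions without shuffling data.

```python
from collections import defaultdict


def hash_partition(rows, key, num_nodes):
    """Assign each row to a (hypothetical) node by hashing its partitioning key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_nodes].append(row)
    return partitions


def local_join(left_part, right_part, key):
    """Hash join of two partitions that reside on the same node."""
    index = defaultdict(list)
    for left_row in left_part:
        index[left_row[key]].append(left_row)
    return [
        {**left_row, **right_row}
        for right_row in right_part
        for left_row in index[right_row[key]]
    ]


def co_partitioned_join(left, right, key, num_nodes):
    """Join two tables partitioned on the same key, one node at a time.

    Rows with equal keys hash to the same node, so no cross-node
    data movement is needed for the join itself.
    """
    left_parts = hash_partition(left, key, num_nodes)
    right_parts = hash_partition(right, key, num_nodes)
    result = []
    for node in range(num_nodes):
        result.extend(local_join(left_parts[node], right_parts[node], key))
    return result


# Illustrative data (assumed, not from HRDBMS).
customers = [
    {"cust_id": 1, "name": "Ada"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": "Cy"},
]
orders = [
    {"cust_id": 1, "total": 10},
    {"cust_id": 3, "total": 5},
    {"cust_id": 1, "total": 7},
]
joined = co_partitioned_join(customers, orders, "cust_id", num_nodes=2)
```

A real system would also have to handle tables partitioned on different keys, which is where a shuffle step comes in; the sketch deliberately leaves that out.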


HRDBMS is a collaboration between Boris and Ioan Raicu, who heads the IIT Data-Intensive Distributed Systems Laboratory. Together, Ioan and Boris supervise Ph.D. student Jason, who built HRDBMS more or less single-handedly.


[2] Improving Data-Shuffle Performance In Data-Parallel Distributed Systems. Shweelan Samson. Master's thesis, IIT, 2018.
[1] HRDBMS: A NewSQL Database for Analytics. Jason Arnold, Boris Glavic, Ioan Raicu. In Proceedings of the IEEE International Conference on Cluster Computing (Poster) (Cluster), 2015.