IIT Database Group

LDV

Light-weight database virtualization (LDV) is a novel approach for sharing and repeating applications that use a relational database. LDV monitors a DB application to create a repeatability package that encapsulates the application and its dependencies (input files, binaries, and libraries) as well as the data from the database that is required for successful repetition. LDV relies on data provenance to determine which database tuples and input files are relevant. While monitoring an application to create a package, we incrementally construct an execution trace (a provenance graph) that records dependencies across OS and DB boundaries. Such a package can be shared and reexecuted on any compatible machine without installing dependencies (e.g., a database system or libraries) and without manually creating and setting up a database.

LDV is available as open-source software at http://github.com/IITDBGroup/ldv.git

Monitoring Applications

To use LDV, the user monitors an execution by prefixing the command that starts the application with ldv-audit:

    ldv-audit run-dbapp

The figure below shows how LDV monitors the execution of the user's application and its interactions with the OS and DB system. By intercepting system calls (such as file operations and process creation) as well as SQL statements sent to the DB, we incrementally build what we call an execution trace: a provenance graph that records both OS and DB operations and data dependencies. In addition to creating the execution trace and including it in the repeatability package for the application, LDV also copies the accessed files and database tuples into the package.

We support two options for shipping the database. The server-included packaging option includes a DB server and the relevant DB slice in the package. The server-excluded packaging option captures the results of the queries issued by the application and stores these query results in the package.
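The idea of an execution trace that spans OS and DB boundaries can be illustrated with a small sketch. This is not LDV's actual trace model or capture mechanism; the class `ExecutionTrace`, its method names, and the tuple identifiers such as `orders:7` are hypothetical and chosen only for illustration. Nodes are processes, files, queries, and tuples, and each edge points from a node to something it depends on. Following edges backwards from an output yields the files and tuples that would have to go into a package.

```python
from collections import defaultdict


class ExecutionTrace:
    """Toy provenance graph (hypothetical, not LDV's real data structure).

    Nodes are (kind, name) pairs. depends_on[node] holds the nodes
    that `node` was derived from.
    """

    def __init__(self):
        self.depends_on = defaultdict(set)

    def add_dependency(self, node, source):
        self.depends_on[node].add(source)

    def record_spawn(self, parent, child):
        # A child process depends on the process that created it.
        self.add_dependency(("process", child), ("process", parent))

    def record_file_read(self, process, path):
        # The reading process depends on the file's content.
        self.add_dependency(("process", process), ("file", path))

    def record_file_write(self, process, path):
        # The written file depends on the writing process.
        self.add_dependency(("file", path), ("process", process))

    def record_query(self, process, sql, tuples_read):
        # The process depends on the query; the query depends on
        # the database tuples that contributed to its result.
        query = ("query", sql)
        self.add_dependency(("process", process), query)
        for tuple_id in tuples_read:
            self.add_dependency(query, ("tuple", tuple_id))

    def relevant_inputs(self, node):
        """Return every file and tuple that `node` transitively depends on."""
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            for source in self.depends_on.get(current, ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return {n for n in seen if n[0] in ("file", "tuple")}


# Events as they might be observed while monitoring run-dbapp.
trace = ExecutionTrace()
trace.record_file_read("run-dbapp", "config.ini")
trace.record_query("run-dbapp", "SELECT * FROM orders WHERE id = 7", ["orders:7"])
trace.record_file_write("run-dbapp", "report.txt")

inputs = trace.relevant_inputs(("file", "report.txt"))
print(sorted(inputs))
```

In this sketch the graph is built one event at a time, mirroring the incremental construction described above, and the final traversal selects only the inputs that the output actually depends on.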

Sharing and Reexecution

To replay the execution of an application stored in a shared package, without any installation or configuration, the user runs:

    ldv-exec run-dbapp

Before starting the actual application, LDV starts the database server included in the package, creates the schema of the application, and loads the DB slice. During application execution we reroute SQL queries to the package database and file operations into the package. If the server-excluded packaging option was chosen, we instead replay the query results stored in the package's files and do not execute any SQL operations.
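The server-excluded option amounts to a record-and-replay scheme for query results. The following is a minimal sketch of that idea under stated assumptions: `RecordingConnection`, `ReplayConnection`, and `total_for_customer` are hypothetical names, SQLite stands in for the real DB server, and a JSON string stands in for the files stored in the package. LDV's actual interception and storage format are not shown here.

```python
import json
import sqlite3


class RecordingConnection:
    """Audit phase: run each query on a real database and log its result."""

    def __init__(self, conn):
        self.conn = conn
        self.log = []

    def execute(self, sql, params=()):
        rows = [list(row) for row in self.conn.execute(sql, params).fetchall()]
        self.log.append({"sql": sql, "params": list(params), "rows": rows})
        return rows

    def dumps(self):
        # Serialized log; stands in for the query-result files in a package.
        return json.dumps(self.log)


class ReplayConnection:
    """Replay phase: answer queries from the log without any database."""

    def __init__(self, serialized_log):
        self.log = json.loads(serialized_log)
        self.position = 0

    def execute(self, sql, params=()):
        if self.position >= len(self.log):
            raise RuntimeError("no recorded result left for this query")
        entry = self.log[self.position]
        if entry["sql"] != sql or entry["params"] != list(params):
            raise RuntimeError("replayed execution diverged from the recording")
        self.position += 1
        return entry["rows"]


def total_for_customer(db, customer):
    """Stand-in for the application; it only needs an execute() method."""
    rows = db.execute("SELECT amount FROM orders WHERE customer = ?", (customer,))
    return sum(row[0] for row in rows)


# Audit: the application runs against a real (here: in-memory) database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)", [("ann", 5), ("bob", 7), ("ann", 11)]
)
recorder = RecordingConnection(source)
recorded_total = total_for_customer(recorder, "ann")
package_file = recorder.dumps()
source.close()

# Replay: the same application code runs with no database available.
replayed_total = total_for_customer(ReplayConnection(package_file), "ann")
print(recorded_total, replayed_total)
```

The sketch replays results strictly in recorded order and fails loudly if the application issues a different query, which is one simple way to detect that a reexecution no longer matches the monitored run.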

Collaborators

  • Tanu Malik - Research Associate at the University of Chicago, Computation Institute and Argonne National Labs
  • Quan Pham - Ph.D. Student at the University of Chicago (graduated)
  • Richard Whaling - Emerging Technologies Developer at the University of Chicago
  • Ian Foster - University of Chicago, Director Computation Institute

Publications

2015
[4] Sharing and Reproducing Database Applications (Quan Pham, Richard Whaling, Boris Glavic, Tanu Malik), In Proceedings of the VLDB Endowment (PVLDB) (Demonstration Track), volume 8, 2015. [bibtex] [pdf]
[3] LDV: Light-weight Database Virtualization (Quan Pham, Tanu Malik, Boris Glavic, Ian Foster), In Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE), 2015. [bibtex] [pdf] [slides]
[2] Making Database Applications Shareable (Boris Glavic, Tanu Malik, Quan Pham), In Proceedings of the 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP) (Poster), 2015. [bibtex] [pdf]
2014
[1] LDV: Light-weight Database Virtualization (Quan Pham, Tanu Malik, Boris Glavic, Ian Foster), Technical report, Illinois Institute of Technology, IIT/CS-DB-2014-03, 2014. [bibtex] [pdf]