Ioan Raicu

Illinois Institute of Technology

Argonne National Laboratory

CS554: Data-Intensive Computing

Project Ideas

Please read all of the following documents to get ideas for viable projects. I encourage all of you to choose one of the following projects. I will allow other project ideas, but you must discuss them with me before writing them up (to make sure both relevance and difficulty are appropriate). I encourage groups of 2 for these projects, although some projects might work with 1 person, and some might require 3 people because of their ambitious nature.

Project Acronym | Project Title | Project Mentor(s) | Group Size
CloudKon:CKHPC | Extending CloudKon to support HPC Jobs Scheduling | Iman & Ke | 2~3
CloudKon:DQS | Implementing a Scalable Distributed Queue Service | Iman | 2~3
CloudKon:DTS | Distributed Task Scheduling with Amazon SQS | Iman | 1~2
CloudKon:CKMR | Accelerating MapReduce with CloudKon | Iman | 2~3
FusionFS:Cache | Cost-Effective Caching for Distributed File Systems | Dongfang | 2
FusionFS:CKPT | Efficient Checkpointing with Distributed File Systems | Dongfang | 2
FusionFS:DAS | Data-Aware Scheduling for Distributed File Systems | Dongfang | 2~3
FusionFS:FFSZ | Exploring Data Compression in Distributed File Systems | Dongfang | 1~2
FusionFS:FHFS | Improving Hadoop through FusionFS | Dongfang | 2~3
FusionFS:IDA | Towards Storage-Efficient Data Reliability in Distributed File Systems | Dongfang | 2~3
FusionFS:Lib | Improving FusionFS Performance through User-level Library Interfaces | Dongfang | 1~2
FusionFS:Prov | Scalable Distributed Data Provenance | Dongfang | 2
GeMTC:GApp | Porting applications to the GeMTC Framework | Scott | 2
GeMTC:Gmon | Extend GeMTC with Monitoring and Visualization Capabilities | Scott | 1~2
GeMTC:MIC | Supporting MTC Applications on Intel Xeon Phi Many-Core Accelerators | Scott | 2
GeMTC:MTCSim | Analyze Many-Task Computing Workflows on GPU Simulators | Scott | 2
GeMTC:OpenMTC | Support MTC Applications on Accelerators | Scott | 2~3
GeMTC:Test | Automated Testing & Benchmarking Suite for the GeMTC project | Scott | 1
MATRIX:BenchJMS | Benchmarking State-of-the-art Job Management Systems for High Performance Computing | Ke & Iman | 1
MATRIX:BenchTEF | Benchmarking of the state-of-the-art Task Execution Frameworks for Many-Task Computing | Ke & Iman | 1
MATRIX:DJLSim | Exploring Resource Allocation Techniques for Distributed Job Launch under High System Utilization through Simulation | Ke | 2
MATRIX:DJLSys | Exploring Resource Allocation Techniques for Distributed Job Launch under High System Utilization | Ke | 2~3
MATRIX:HJLSim | Exploring HPC Hierarchical Job Launch and MTC Distributed Task Scheduling at Extreme Scales through Simulation | Ke | 2
MATRIX:MonSim | Evaluating Communication Overheads for Distributed Monitoring with Aggregation Trees through Simulation | Ke | 2
MATRIX:Swift/M | Integrating Swift with MATRIX to Support Large-Scale Scientific Many-Task Computing Applications | Ke | 2~3
NET:MPNet | Improving Network Throughput through Multipath Network Routing | Ioan | 2~3
OS:DistOS | Support for Legacy Applications through Distributed Operating Systems | Ioan | 2~3
ZHT:Bench | Benchmarking mainstream NoSQL databases | Tony | 1
ZHT:Const | Eventual consistency support for ZHT | Tony & Ke | 2
ZHT:DMHDFS | Distributed Metadata Management for the Hadoop File System | Tony | 2~3
ZHT:Graph | Design and implement a graph database on ZHT | Tony | 2~3
ZHT:MonSys | Evaluating Communication Overheads for Distributed Monitoring with Aggregation Trees | Tony & Ke | 2~3
ZHT:OHT | Hierarchical Distributed Hash Tables | Tony | 2~3
ZHT:ZST | Enhance ZHT through Range Queries and Iterators | Tony | 2

To better guide the brainstorming discussion on Wednesday, please complete the Google form at https://docs.google.com/forms/d/19rrIUVx4T4hfQjzKURNyQoOMnXJB7nVtoWw9CIs3cF4/viewform with your top choices for project ideas. A summary of the results can be found in project_people_statistics.xlsx.

Next Semester Spring 2014

TBA