CS595: Hot Topics in Distributed Systems: Data-Intensive Computing

Quarter: Fall 2010
Lecture Time: Monday/Wednesday, 1:50PM - 3:15PM
Lecture Location: Stuart Building 106
Office Hours Time: Wednesday, 3:15PM - 4:15PM
Office Hours Location: Stuart Building 237D
Professor: Dr. Ioan Raicu (iraicu@cs.iit.edu)

Support for data-intensive computing is critical to advancing modern science: over the last decade, the gap between the capacity of storage systems and their bandwidth has widened more than 10-fold. There is an emerging need for advanced techniques to manipulate, visualize, and interpret large datasets. Building large-scale distributed systems that support data-intensive computing involves challenges at multiple levels, from the network (e.g., transport, routing) to the algorithmic (e.g., data distribution, resource management) and even the social (e.g., incentives).

This course is a tour through research topics in distributed systems, covering cluster computing, grid computing, supercomputing, and cloud computing. We will explore solutions and learn design principles for building large network-based computational systems that support data-intensive computing. Our readings and discussions will help us identify research problems and understand the methods and general approaches used to design, implement, and evaluate such systems. Topics include resource management (e.g., discovery, allocation, compute models, data models, data locality, virtualization, monitoring, provenance), programming models, application models, and system characterization. Our discussions will often be grounded in deployed distributed systems, such as the TeraGrid, Amazon EC2 and S3, various top supercomputers (e.g., IBM BlueGene/P, Sun Constellation, Cray XT5), and various software and programming platforms (e.g., Google's MapReduce, Hadoop, Dryad, Sphere/Sector, Swift/Falkon, and Parrot/Chirp). The course involves lectures, invited outside speakers, discussions of research papers, and a major project (including both a written report and an oral presentation).

Lecture topics:

Date Lecture Topic Reading (To be completed by posted date) Assignments
08-23-2010 Syllabus (Slides, PDF)   Reading Write-up Instructions
08-25-2010 Data Intensive Computing Overview (Slides)    
08-30-2010 Data Intensive Computing Overview Continued (Slides) Reading #1
Foreword (PDF)
Jim Gray on eScience (PDF)
Reading Writeup #1 Due at 12PM (just "Summary of Paper", 300 words)
Homework 1 (PDF)
09-01-2010 High-Performance Computing (Slides) Reading #2
What's next in high-performance computing? (PDF)
Reading Writeup #2 Due at 12PM (just "Summary of Paper", 300 words for each)
09-06-2010 Labor Day - NO CLASS    
09-08-2010 Many-Task Computing (Slides) Reading #2
Many-Task Computing for Grids and Supercomputers (PDF)

Optional:
Falkon: a Fast and Light-weight tasK executiON framework (PDF)
Reading Writeup #3 Due at 1:45PM (just "Summary of Paper", 300 words for each)
Homework 1 due at 1:45PM
09-13-2010 Projects Brainstorming (Slides)   Project Proposal (PDF)
09-15-2010 Cloud Computing and Grid Computing (Slides) Reading #3
Cloud Computing and Grid Computing 360-Degree Compared (PDF)

Optional:
Above the clouds: A Berkeley view of cloud computing (PDF)
 
09-20-2010 Parallel File Systems (Slides)
Guest Lecture by Sam Lang
Reading #4
GPFS: A Shared-Disk File System for Large Computing Clusters (PDF)
I/O Performance Challenges at Leadership Scale (PDF)

Optional:
PVFS: A Parallel File System for Linux Clusters (PDF)
Reading Writeup #4 Due at 1:45PM (Reading Write-up Instructions)
09-22-2010 Cloud Computing and Grid Computing (Slides)   Project proposal due at 1:45PM
09-27-2010 Cloud Computing and Grid Computing (Slides)    
09-29-2010 Parallel File Systems (Slides) Reading #5
Lustre: Building a File System for 1,000-node Clusters (PDF)
Reading Writeup #5 Due at 1:45PM
10-04-2010 Distributed File Systems (Slides)
Discussion Leader: Raman Verma
Reading #6
The Google File System (PDF)
Sector and Sphere: The Design and Implementation of a High Performance Data Cloud (PDF)
Reading Writeup #6 Due at 1:45PM
10-06-2010 Distributed File Systems (Slides)    
10-11-2010 Fall Break - NO CLASS    
10-13-2010 Parallel Programming Systems and Models (Slides)    
10-18-2010 MapReduce (Slides)
Discussion Leader: Xi Yang
Reading #7
MapReduce: Simplified Data Processing on Large Clusters (PDF)
Reading Writeup #7 Due at 1:45PM
10-20-2010 MapReduce (Slides)
Discussion Leaders: Harit Shah & Krishnaprasad Shetty
Reading #8
A comparison of approaches to large-scale data analysis (PDF)
MapReduce: A Flexible Data Processing Tool (PDF)
MapReduce and Parallel DBMSs: Friends or Foes? (PDF)
Reading Writeup #8 Due at 1:45PM
10-25-2010 MapReduce (Slides)
Discussion Leader: Juan Carlos Hernández Munuera
Reading #9
MapReduce Online (PDF)
MAD Skills: New Analysis Practices for Big Data (PDF)

Optional:
Large-scale Incremental Processing Using Distributed Transactions and Notifications (PDF)
Reading Writeup #9 Due at 1:45PM
10-27-2010 MapReduce (Slides)
Discussion Leader: Tonglin Li
Reading #10
Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience (PDF)
Reading Writeup #10 Due at 1:45PM
11-01-2010 Project mid-quarter status presentations (15 min each)
Xi Yang: Map/Reduce Scheduling under the voluntary computing environment (Slides)
Jin-Chuan Chen: The programming framework of distributed file system: Hadoop/MapReduce, Sector/Sphere, and Windows HPC Server/Dryad (Slides)
Xi Duan: The Impact of Stripe Size on Parallel Distributed File Systems (Slides)
Raman Verma: Implementing Data Replication and High Availability in Parallel Virtual File System – 2 (Slides)
  Project mid-quarter status presentations due at 1:45PM
Presenters: Xi Yang, Jin-Chuan Chen, Xi Duan, Raman Verma
11-03-2010 Project mid-quarter status presentations (15 min each)
Harit Shah & Krishnaprasad Shetty: Exascale File System: Snapshot and Record Append Operation (Slides)
Juan Carlos Hernández Munuera: Adapting Distributed Hash Tables to be implemented into a Distributed File System (Slides)
Zhou Zhou: D3: Distributed Data Structure (Slides)
Tonglin (Tony) Li: An Initial and Experimental Approach for Fusion Distributed File System (Slides)
  Project mid-quarter status presentations due at 1:45PM
Presenters: Harit Shah, Krishnaprasad Shetty, Juan Carlos Hernández Munuera, Zhou Zhou, Tonglin (Tony) Li
11-08-2010 Workflow Systems (Slides)
Discussion Leader: Jin-Chuan Chen
Reading #11
Swift: Fast, Reliable, Loosely Coupled Parallel Computation (PDF)
Parallel Scripting for Applications at the Petascale and Beyond (PDF)
Reading Writeup #11 Due at 1:45PM
11-10-2010 Workflow Systems (Slides)    
11-15-2010 Distributed Data Mining (Slides 1, Slides 2, Slides 3)
Guest Lecture by David Grossman
Reading #12
PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce (PDF)
Map-Reduce for Machine Learning on Multicore (PDF)
Reading Writeup #12 Due at 1:45PM
11-17-2010 Query Prediction in Large Scale Data Intensive Event Stream Analysis Systems (Slides)
Guest Lecture by Huaiming Song
No Office Hours
Optional Reading
Query Prediction in Large Scale Data Intensive Event Stream Analysis Systems (PDF)
 
11-22-2010 Workflow Systems (Slides)    
11-24-2010 Thanksgiving Break - NO CLASS    
11-29-2010 Distributed Hash Tables
Discussion Leader: Zhou Zhou
Reading #13
Dynamo: Amazon’s Highly Available Key-value Store (PDF)
Kademlia: A Peer-to-peer Information System Based on the XOR Metric (PDF)
Reading Writeup #13 Due at 1:45PM
12-01-2010 Many-core Computing (Slides)
Discussion Leader: Xi Duan
Reading #14
Amdahl's law in the multicore era (PDF)
Reevaluating Amdahl's Law in the Multicore Era (PDF)
Reading Writeup #14 Due at 1:45PM
Project final report write-up instructions (PDF)
12-08-2010
1:15PM - 4:15PM
Final Presentations (20 min each)
Xi Yang: Map/Reduce Scheduling under the voluntary computing environment (Slides)
Jin-Chuan Chen: The programming framework of distributed file system: Hadoop/MapReduce, Sector/Sphere, and Windows HPC Server/Dryad (Slides)
Xi Duan: The Impact of Stripe Size on Parallel Distributed File Systems (Slides)
Raman Verma: Implementing Data Replication and High Availability in Parallel Virtual File System – 2 (Slides)
Harit Shah & Krishnaprasad Shetty: Exascale File System: Snapshot and Record Append Operation (Slides)
Juan Carlos Hernández Munuera: Adapting Distributed Hash Tables to be implemented into a Distributed File System (Slides)
Zhou Zhou: D3: Distributed Data Structure (Slides)
Tonglin (Tony) Li: An Initial and Experimental Approach for Fusion Distributed File System (Slides)
  Project final report due at 1:15PM
Project final presentations due in class

 

Last modified: July 07, 2011