EECS 395 / EECS 495

Hot Topics in Distributed Systems:

Data-Intensive Computing

Quarter: Winter 2010
Lecture Time: Tuesday/Thursday, 12:30PM - 1:50PM
Lecture Location: TECH L158
Office Hours Time: Thursday, 2:00PM - 3:00PM
Office Hours Location: TECH M384
Instructor: Dr. Ioan Raicu
(iraicu@eecs.northwestern.edu, 1-847-491-8163)

Support for data-intensive computing is critical to advancing modern science, as the gap between storage capacity and storage bandwidth has grown by more than 10-fold over the last decade. There is an emerging need for advanced techniques to manipulate, visualize, and interpret large datasets. Building large-scale distributed systems that support data-intensive computing involves challenges at multiple levels, from the network (e.g., transport, routing) to the algorithmic (e.g., data distribution, resource management) and even the social (e.g., incentives). This course is a tour through various research topics in distributed systems, covering cluster computing, grid computing, supercomputing, and cloud computing. We will explore solutions and learn design principles for building large network-based computational systems that support data-intensive computing. Our readings and discussions will help us identify research problems and understand the methods and general approaches used to design, implement, and evaluate such systems. Topics include resource management (e.g., discovery, allocation, compute models, data models, data locality, virtualization, monitoring, provenance), programming models, application models, and system characterization. Our discussions will often be grounded in the context of deployed distributed systems, such as the TeraGrid, Amazon EC2 and S3, various top supercomputers (e.g., IBM BlueGene/P, Sun Constellation, Cray XT5), and various software and programming platforms (e.g., Google's MapReduce, Hadoop, Dryad, Sphere/Sector, Swift/Falkon, and Parrot/Chirp). The course involves lectures, invited outside speakers, discussions of research papers, and a major project (including both a written report and an oral presentation).

Lecture topics:

Date Lecture Topic Reading Assignments
01-05-2010 Syllabus (Slides, PDF) Reading #1
Foreword (PDF)
Jim Gray on eScience (PDF)
Reading Write-up Instructions
01-06-2010     Reading Writeup #1 Due at 11:59PM
(just "Summary of Paper", at least 300 words collectively)
01-07-2010 Data Intensive Computing Overview (Slides)    
01-12-2010 Data Intensive Computing Overview Continued (Slides) Reading #2
Cloud Computing and Grid Computing 360-Degree Compared (PDF)
Homework 1 (PDF)
01-13-2010     Homework 1 due at 11:59PM
Reading Writeup #2 Due at 11:59PM
(just "Summary of Paper", at least 300 words collectively)
01-14-2010 Distributed Systems: Clusters, Supercomputers, Grids and Clouds (Slides)   Homework 2
01-18-2010     Homework 2 due at 11:59PM
01-19-2010 Projects Brainstorming (Slides)   Project
01-21-2010 Local Resource Management Systems (Slides)    
01-26-2010 Storage Systems: Data Diffusion (Slides) Reading #3
The Google File System (PDF)
The Hadoop Distributed File System: Architecture and Design (PDF)
Sector and Sphere: The Design and Implementation of a High Performance Data Cloud (PDF)
Project proposal due at 12PM
01-27-2010     Reading Writeup #3 Due at 11:59PM
Reading Write-up Instructions
01-28-2010 Distributed File Systems (Slides) Reading #4
MapReduce: Simplified Data Processing on Large Clusters (PDF)
02-01-2010     Reading Writeup #4 Due at 11:59PM
Reading Write-up Instructions
02-02-2010 MapReduce (Slides) Reading #5
GPFS: A Shared-Disk File System for Large Computing Clusters (PDF)
Lustre: Building a File System for 1,000-node Clusters (PDF)
PVFS: A Parallel File System for Linux Clusters (PDF)
02-08-2010     Reading Writeup #5 Due at 11:59PM
Reading Write-up Instructions
02-04-2010 Shared and Parallel File Systems (Slides)    
02-09-2010 Parallel Programming Systems and Models (Slides) Reading #6
Reevaluating Amdahl's Law in the Multicore Era (PDF)
02-10-2010     Reading Writeup #6 Due at 11:59PM
(just "Summary of Paper", at least 300 words collectively)
02-11-2010 Parallel Programming Systems and Models (Slides)    
02-16-2010 Project mid-quarter status presentations
Vaibhav Rastogi & Yinzhi Cao, DataShed: Monitoring and Diagnosis of Large Scale P2P Video Streaming Networks (Slides)
Arefin Huq, Tunebot in the Cloud (Slides)
  Project mid-quarter status presentations due at 12:30PM
02-18-2010 Project mid-quarter status presentations
Hongyu Gao, Automatic Parallelism Discovery (Slides)
Chen Jin, Distributed File System (Slides)
Reading #7
Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches (PDF)
02-22-2010     Reading Writeup #7 Due at 11:59PM
Reading Write-up Instructions
02-23-2010 Guest Lecture: Dr. Nikos Hardavellas
Many-core Computing Era and New Challenges (Slides)
Reading #8
Parallel Scripting for Applications at the Petascale and Beyond (PDF)
Workflows and e-Science: An overview of workflow system features and capabilities (PDF)
02-24-2010     Reading Writeup #8 Due at 11:59PM
Reading Write-up Instructions
02-25-2010 Workflow Systems (Slides)    
03-02-2010 Workflow Systems (Slides)    
03-04-2010 Workflow Systems (Slides) Reading #9
A high-performance, portable implementation of the MPI message passing interface standard (PDF)
03-08-2010     Reading Writeup #9 Due at 11:59PM
Reading Write-up Instructions
03-09-2010 MPI (Slides)    
03-11-2010 Future Research Directions: Exascale Many-Task Computing with Billions of Processors (Slides)   Project final report write-up instructions
03-17-2010     Project final report due at 11:59PM
03-18-2010
12:30PM - 3:30PM
Final Presentations (Schedule)
  Project final presentations due in class


Last modified: July 07, 2011