Ioan Raicu

Illinois Institute of Technology

Argonne National Laboratory

CS554: Data-Intensive Computing

Semester: Fall 2013

Lecture Time: Monday/Wednesday, 11:25AM - 12:40PM

Lecture Location: Stuart Building 238

Overflow Lecture Location: Stuart Building 204

Professor: Dr. Ioan Raicu (iraicu@cs.iit.edu)

Office Hours Time: Wednesday 12:45PM-1:45PM

Office Hours Location: Stuart Building 237D

Teaching Assistant: Ke Wang (kwang22@hawk.iit.edu)

Office Hours Time: Monday 1:40PM-2:30PM, Tuesday 12:45PM-1:45PM

Office Hours Location: Stuart Building 002

Teaching Assistant: Tonglin Li (tli13@iit.edu)

Office Hours Time: Thursday 10AM-11AM, Friday 12:45PM-1:45PM

Office Hours Location: Stuart Building 002

 

This course is a tour through research topics in distributed data-intensive computing, spanning cluster computing, grid computing, supercomputing, and cloud computing. We will explore solutions and learn design principles for building large network-based computational systems to support data-intensive computing. This course is geared toward junior/senior-level undergraduates and graduate students in computer science. Prerequisites: CS450; in addition, one or more of the following courses is recommended: CS495 (Intro to Distributed Systems), CS546, CS550, CS553, or CS570.

We will be using Piazza to facilitate course discussions, at http://piazza.com/iit/fall2013/cs554/home. The mailing list previously announced will be removed.  

In order to highlight some of the best projects from the class this year (11 of the 30 projects), I have posted some of the final reports and presentation slides below (for a complete list of project titles and students, click here):

  1. Dongfang Zhao, Ioan Raicu. "Exploring Data Compression in Distributed File Systems", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013
    • Slides
    • Improved paper to be submitted to USENIX ATC 2014        
  2. Xiaobing Zhou, Hao Chen, Ke Wang, Michael Lang, Ioan Raicu. "Exploring Distributed Resource Allocation Techniques in the SLURM Job Management System", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013
    • Slides
    • Improved paper to be submitted to ACM HPDC 2014
  3. Ajay Anthony, Sandeep Palur, Iman Sadooghi, Ioan Raicu. "CloudKon Reloaded with efficient Monitoring, Bundled Response and Dynamic Provisioning", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013
    • Slides
    • Some results from this report were submitted as part of a CloudKon paper to IEEE CCGrid 2014
  4. Kiran Ramamurthy, Ke Wang, Ioan Raicu. "Exploring Distributed HPC Scheduling in MATRIX", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013
    • Slides
    • Some results from this report to be submitted as part of a MATRIX paper to ACM HPDC 2014 
  5. Isha Kapur, Karthik Belgodu, Pankaj Purandare, Iman Sadooghi, Ioan Raicu. "Extending CloudKon to Support HPC Job Scheduling", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013
    • Slides
    • Some results from this report were submitted as part of a CloudKon paper to IEEE CCGrid 2014
  6. Digvijay Singh Gahlot, Scott Krieder, Ioan Raicu. "Accelerating Simulation Codes through the GeMTC Framework", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013
    • Slides
    • Some results from this report to be submitted as part of a GeMTC paper to ACM HPDC 2014
  7. Kun Feng, Tianyang Che, Tonglin Li, Ioan Raicu. "OHT: Hierarchical Distributed Hash Tables", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013
  8. Shukun Xie, Ran Xin, Tonglin Li, Ioan Raicu. "Exploring Eventual Consistency Support in ZHT", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013
  9. Dharmit Patel, Faraj Khasib, Shiva Srivastava, Iman Sadooghi, Ioan Raicu. "HDMQ: Towards In-Order and Exactly-Once Delivery using Hierarchical Distributed Message Queues", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013
  10. Sean Wallace, Scott Krieder, Ioan Raicu. "Power Profiling of GeMTC Many Task Computing", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013

Schedule

Date Lecture Topic Reading (To be completed by posted date) Assignments
08-19-2013 Syllabus (Slides, PDF)    
08-21-2013 Introduction to Distributed Systems (Slides)
08-26-2013 Introduction to Distributed Systems    
08-28-2013 Project Brainstorming (Slides) Project Ideas

Project Proposal Handout

08-30-2013
12:45PM-1:45PM in SB238
ZHT: a zero-hop distributed hashtable (Slides) -- Tonglin Li    
09-02-2013 Labor Day -- NO CLASS    
09-03-2013
12:45PM-1:45PM in SB238
CANCELED ZHT Tutorial (slides) -- Tonglin Li    
09-03-2013
1:50PM-3:05PM in LS111
Linux & Jarvis Cluster -- Scott Krieder (Slides - part of CS550, optional, it will not be recorded)    
09-04-2013 FusionFS: Fusion Distributed File System (Slides) -- Dongfang Zhao

Reading #1

  1. Supporting Large Scale Data-Intensive Computing with the FusionFS Distributed File System, Technical Report, IIT 2013
 
09-04-2013
2PM-3PM in SB106
ZHT: a zero-hop distributed hashtable (Slides) -- Tonglin Li

Reading #2

  1. ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table, IPDPS 2013
 
09-04-2013
3:15PM-4:15PM in SB106
ZHT Tutorial (slides) -- Tonglin Li    
09-05-2013
11:25AM-12:40PM in SB238
FusionFS Tutorial -- Dongfang Zhao FusionFS Manual (no writeup)  
09-06-2013
12:45PM-1:45PM in SB238
MATRIX: MAny-Task computing execution fabRIc at eXascales (Slides) -- Ke Wang

Reading #3

  1. MATRIX: Many-Task Computing Execution Fabric for Extreme Scales, Technical Report, IIT 2013
 
09-06-2013
2:00PM-3:00PM in SB238
MATRIX Tutorial (Slides) -- Ke Wang    
09-09-2013 GeMTC: GPU-Enabled Many-Task Computing (Slides) -- Scott Krieder

Reading #4

  1. GEMTC: GPU Enabled Many-Task Computing, Technical Report, IIT, 2013
 
09-09-2013
12:45PM-1:45PM in SB238
GeMTC Tutorial (Slides) -- Scott Krieder CUDA Slides  
09-11-2013 CloudKon: a Cloud-enabled Distributed Task Execution Framework (Slides) -- Iman Sadooghi

Reading #5

  1. CloudKon: a Cloud-enabled Distributed Task Execution Framework, Technical Report, IIT, 2013
 
09-12-2013
12:45PM-1:45PM in SB238
CloudKon & AWS Tutorial (Slides) -- Iman Sadooghi    
09-13-2013     Reading Write-up Instructions
Reading Writeup #1, #2, #3, #4, and #5 due (09-13-2013 at 11:59PM)
09-16-2013 Data Intensive Computing Overview (Slides)

Reading #6 (one review for both)

  1. Foreword (PDF)
  2. Jim Gray on eScience (PDF)
Project Proposal Due on 09-16-2013 at 11:59PM
09-16-2013
12:45PM-1:45PM in SB238
Linux Tutorial and Sirius Cluster (Slides) -- Tonglin Li    
09-18-2013 Data Intensive Computing Overview

Reading #7

  1. Cloud Computing and Grid Computing 360-Degree Compared (PDF)

Optional (no review needed):

  1. Above the clouds: A Berkeley view of cloud computing (PDF)
  2. The Anatomy of the Grid (PDF)
 
09-20-2013     Reading Writeup #6 and #7 due
09-23-2013 Grid Computing and Cloud Computing (Slides)    
09-25-2013 Grid Computing and Cloud Computing    
09-30-2013 Workflow Systems [Swift] (Slides)

Reading #8

  1. Swift: A language for distributed parallel scripting

Optional (no review needed):

  1. Swift/T: Large-scale Application Composition via Distributed-memory Dataflow Processing
  2. Parallel Scripting for Applications at the Petascale and Beyond
  3. Swift: Fast, Reliable, Loosely Coupled Parallel Computation
 
10-02-2013 NO CLASS -- Please attend the CS Reunion in MTCC at 11:15AM-12:30PM    
10-07-2013 NO CLASS (Fall Break Day)
10-09-2013 Workflow Systems [Swift]   Reading Writeup #8 due
10-14-2013 Workflow Systems [Swift]   Project Midterm Progress Report Writeup
10-16-2013 Workflow Systems [Falkon] (Slides)

Reading #9

  1. Falkon: a Fast and Light-weight tasK executiON framework

Optional (no review needed)

  1. Toward Loosely Coupled Programming on Petascale Systems
- Extra Credit Writeup on Data Science Panel
- Reading Writeup #9 due 
10-21-2013 Workflow Systems (Data-Diffusion)

Reading #10

  1. The Quest for Scalable Support of Data Intensive Workloads in Distributed Systems
- Project Midterm Progress Report Due
- Reading Writeup #10 due
10-23-2013 MapReduce (Slides)

Reading #11

  1. MapReduce: Simplified Data Processing on Large Clusters
Reading Writeup #11 due
10-25-2013
12:45PM-1:45PM in SB201
ZHT Tutorial (Slides)
Xiaobing Zhou
   
10-28-2013 MapReduce    
10-30-2013 A Berkeley View of Resource Management (Spark, Mesos, RDD, Shark, Sparrow) (Slides #1, Slides #2)

Reading #12

  1. Sparrow: distributed, low latency scheduling

Optional (no review needed)

  1. Spark: cluster computing with working sets
  2. Mesos: A platform for fine-grained resource sharing in the data center
  3. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing
  4. Shark: fast data analysis using coarse-grained distributed memory
Reading Writeup #12 due
11-04-2013 HPC Storage
Guest Lecture by Rob Ross (Argonne National Laboratory) (Slides)

Reading #13

  1. I/O Performance Challenges at Leadership Scale
Reading Writeup #13 due
11-06-2013 Parallel File Systems (Slides #1, Slides #2)

Reading #13 Optional (no review needed)

  1. GPFS: A Shared-Disk File System for Large Computing Clusters (PDF)
  2. PVFS: A Parallel File System for Linux Clusters (PDF)
  3. Lustre: Building a File System for 1,000-node Clusters (PDF)
  4. Scalable Performance of the Panasas Parallel File System (PDF)
 
11-11-2013 Distributed File Systems (Slides)

Reading #14

  1. The Google File System
Reading Writeup #14 due
11-13-2013 Distributed File Systems (Slides #1, Slides #2)

Reading #15

  1. Ceph: A Scalable, High-Performance Distributed File System

Optional (no review needed)

  1. Ceph as a scalable alternative to the Hadoop Distributed File System
- Reading Writeup #15 due
- Project Final Report Writeup
11-18-2013 Distributed Databases (Guest Lecture Boris Glavic)

Reading #16

  1. Hive - a petabyte scale data warehouse using Hadoop

Optional (no review needed)

  1. Pig Latin: a not-so-foreign language for data processing
  2. Dremel: interactive analysis of web-scale datasets
  3. Spanner: Google's Globally-Distributed Database
Reading Writeup #16 due
11-20-2013 Distributed Hash Tables (Guest Lecture Tonglin Li)

Reading #17

  1. Dynamo: Amazon's highly available key-value store

Optional (no review needed)

  1. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
Reading Writeup #17 due
11-25-2013 TBA
11-27-2013 NO CLASS (Thanksgiving)    
12-02-2013
9AM - 9PM
SB 212
Project Final Presentations    
12-04-2013 NO CLASS   Project Final Reports Due at 11:59PM