CS595-02 Fault Tolerance Computing

Prerequisites

CS450 Operating Systems
CS470 Computer Architecture

Contents

All information provided herein is tentative and subject to minor change.



General Information


Instructor

Zhiling Lan, email: lan@iit.edu , SB #226D, 312.567.5710 
Class Hours: 1:50pm - 3:05pm (Tu. & Th.)
Office Hours: 3:05pm - 4:00pm  (Tu. & Th.), or by appointment


Course Description

This is a research-oriented course that covers challenges and opportunities in fault tolerance computing. There are no required textbooks; instead, research publications will be used as reference materials. Each lecture will have 1-2 assigned papers. Students should read the papers before coming to class and be prepared to discuss them, participate in class discussions, present at least one research topic during the course, and do a term project individually or in a two-member team. Major topics include fault measurement and modeling, fault detection and diagnosis, fault avoidance/prevention techniques, and FT applications.

Upon completion, the student should be able to: (1) understand research problems and challenges in fault tolerance computing; (2) identify the state-of-the-art techniques and tools to address research problems and challenges; and (3) develop strong technical reviewing, writing, and presentation skills.


Course Materials

There is no textbook. Lecture notes and assigned papers will be available from the class web page.
 

Lectures

Grading:



Academic Integrity:

Academic dishonesty (e.g. cheating or plagiarism) will not be tolerated under any circumstances. If you are having difficulty with any part of the course material, please see me as soon as possible. I will do everything I can to help you with any course-related problems you may have. If you are found to be guilty of academic dishonesty, however, I will then do everything I can to see that you are punished as forcefully as possible. This may include asking to have you suspended or expelled from the course, the program, and/or the university.


Tentative Class Schedule:

 week

Topic

Assigned Readings

Assignments

1 (1/22 & 1/24)

Introduction

[slides]

 

Paper presentation signup. Please send an email to the instructor to bid on three of the topics listed in weeks #3-#11. Please list your choices in decreasing order (from 3 to 1). You will be allocated 1-2 topics based on the first-come, first-served (FCFS) policy and availability.

2

(1/29 & 1/31)

redundancy & error detection
[slides]

 
Investigate your term project idea and prepare for it. Talk to the instructor about your project idea, and talk to other students about forming a group if you would like to work in a two-member team.

3
(2/5 & 2/7)
RAID

Sunday midnight: reviews due for Lu's paper. Please use the review form [DOC]

fault injection

4

(2/12 & 2/14)

checkpointing (1)

 

Sunday midnight: reviews due for Plank's paper. Please use the review form [DOC]

checkpointing (2)

5

(2/19 & 2/21)

process migration (1)
  • Milojicic, D., Douglis, F., Paindaveine, Y., Wheeler, R., Zhou, S., Process Migration Survey, ACM Computing Surveys, September 2000.

Sunday midnight: reviews due for Clark's paper. Please use the review form [DOC]

 

process migration (2)

6

(2/26 & 2/28)

other FT techniques

 

Sunday midnight: reviews due for Lan's paper. Please use the review form [DOC]

 

Student Project Proposal Presentation

7

(3/4 & 3/6)

reliability modeling

 

Sunday midnight: reviews due for Cohen and Chase's paper. Please use the review form [DOC]

 

troubleshooting (1)

8

(3/11 & 3/13)

troubleshooting (2)

Sunday midnight: reviews due for Sahoo's paper. Please use the review form [DOC]

 

troubleshooting (3)

Spring break

  • No class

 

 

9

(3/25 & 3/27)

debugging (1)

Sunday midnight: reviews due for Chen's paper. Please use the review form [DOC]

 

debugging (2) 


10

(4/1 & 4/3)

empirical analysis


 

Sunday midnight: reviews due for DeCandia's paper. Please use the review form [DOC]

 

system design

11

(4/8 & 4/10)

misc



No paper reading assigned. You should spend time on your term projects.

 

 

misc

12
(4/15 & 4/17)

Out of Town (IPDPS08)

 

13
(4/22 & 4/24)

Student Presentation

 

14
(4/29 & 5/1)

Student Presentation

 

15
(5/6 & 5/8)
Student Presentation