Lectures

This is an overview of the content covered in the lectures:

Date Topic Description Reading
2021-08-24 Logistics + Foundations of Scalable and Distributed Storage and Computation We will discuss the logistics of the course, projects, and paper assignments. Furthermore, we will discuss important foundations of distributed analytics, including fault tolerance, consistency and consensus, load balancing, scalable algorithm design, and data placement.
Slides:
2021-08-26 2021-08-31 2021-09-02 2021-09-07 2021-09-09 2021-09-14 Distributed Storage & NoSQL Databases We will learn about distributed approaches for storing unstructured and structured data.
Slides:

Required Reading:

  • LSM-Based Storage Techniques: a Survey
  • Bigtable: A Distributed Storage System for Structured Data
  • The Hadoop Distributed File System

2021-09-16 2021-09-21 2021-09-23 2021-09-28 2021-09-30 2021-10-05 Distributed Batch Processing We will learn about DISC (data-intensive scalable computing) systems, which automatically distribute the processing of complex tasks across a cluster of nodes.
Slides:

Required Reading:

  • MapReduce: simplified data processing on large clusters
  • Spark: cluster computing with working sets
  • A comparison of join algorithms for log processing in MapReduce

2021-10-07 2021-10-12 2021-10-14 2021-10-19 2021-10-21 High-level Dataflow & Query Languages We will learn about the high-level dataflow languages used in big data processing and how these languages are implemented by DISC systems.
Slides:

Required Reading:

  • Spark SQL: Relational data processing in Spark

2021-10-26 2021-10-28 2021-11-02 2021-11-04 2021-11-09 2021-11-11 Distributed Transaction Processing, Consensus, and Consistency We will learn about scalable distributed approaches for transaction processing and consistency, including so-called NewSQL systems.
Slides:

Required Reading:

  • A New Presumed Commit Optimization for Two Phase Commit
  • In Search of an Understandable Consensus Algorithm
  • H-Store: a high-performance, distributed main memory transaction processing system

2021-11-16 2021-11-18 2021-11-23 2021-11-30 Distributed Stream Processing & Publish-Subscribe We will learn about distributed systems for processing streams of data.
Slides:

Required Reading:

  • Kafka: A distributed messaging system for log processing