CS 673 — Big Data Systems
(German title: Big-Data-Systems)

Level, degree of commitment: Specialization module, depends on importing study program
Forms of teaching and learning, workload: Lecture (4 SWS), recitation class (2 SWS), 270 hours (90 h attendance, 180 h private study)
Credit points, formal requirements: 9 CP
Course requirement(s): At least 50 percent of the points from the weekly exercises and at least 2 presentations of the tasks.
Examination type: Oral examination (individual examination) or written examination
Language, grading: English. Grading uses a scale of 0 to 15 points according to the examination regulations for the degree program M.Sc. Computer Science.
Origin: M.Sc. Computer Science
Duration, frequency: One semester, each winter semester
Person in charge of the module's outline: Prof. Dr. Thorsten Papenbrock

Contents

  • Actor-, service-, batch-, and stream-based distributed programming
  • Big Data systems
  • Data serialization and message passing
  • Data structures for distributed data management
  • OSI model and communication protocols
  • Data partitioning and replication
  • Consistency and reconciliation protocols
  • Time synchronization and change propagation
  • Distributed request scheduling

Qualification Goals

Students

  • can name challenges in building distributed systems (Distributed Systems),
  • can explain reactive, distributed programming (Actor Programming),
  • can explain techniques for the digital representation and serialization of data (Encoding),
  • can describe the mechanisms by which networks operate (Communication),
  • can state standards for structuring and querying data (Data Models and Query Languages),
  • can explain algorithms and data structures for working with data in a distributed manner (Storage and Retrieval),
  • can describe techniques for ensuring fault tolerance and availability (Replication and Partitioning),
  • can describe techniques for ensuring consistency and agreement (Consistency and Consensus),
  • can understand algorithms for distributed transaction management (Transactions),
  • can explain frameworks for distributed batch processing of data-intensive tasks (Batch Processing) and for distributed stream processing (Stream Processing),
  • can explain how distributed database management systems (Distributed DBMSs) work,
  • can explain the fundamentals of distributed query processing (Distributed Query Optimization),
  • are able to apply this knowledge in practice when programming data-intensive, distributed algorithms,
  • are able to apply scientific working methods when independently identifying, formulating, and solving problems,
  • are able to speak freely about scientific content, both in front of an audience and in a discussion.

Prerequisites

None. The competences taught in the following modules are recommended: either Algorithms and Data Structures or Practical Informatics II: Data Structures and Algorithms for Pre-Service-Teachers, as well as Database Systems.


Applicability

The module can be attended at FB12 in study program(s)

  • B.Sc. Data Science
  • B.Sc. Computer Science
  • M.Sc. Data Science
  • M.Sc. Computer Science
  • M.Sc. Mathematics
  • M.Sc. Business Informatics
  • M.Sc. Business Mathematics

When studying M.Sc. Computer Science, this module can be attended in the study area Compulsory Elective Modules in Computer Science.

The module can also be used in other study programs (export module).

The module is assigned to Practical Computer Science. Further information on eligibility can be found in the description of the study area.


Recommended Reading

  • Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, Martin Kleppmann, 2017, 978-1449373320
  • Distributed Systems, Maarten van Steen and Andrew S. Tanenbaum, 2017, 978-1543057386
  • Principles of Distributed Database Systems, M. Tamer Özsu and Patrick Valduriez, 2011, 978-1441988331
  • Web-Scale Data Management for the Cloud, Wolfgang Lehner and Kai-Uwe Sattler, 2013, 1489997717
  • Introduction to Parallel Computing, Zbigniew J. Czech, 2017, 978-1107174399
  • Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services, Brendan Burns, 2017, 978-1491983645
  • Spark: Big Data Cluster Computing in Production, Ilya Ganelin and Ema Orhian and Kai Sasaki and Brennon York, 2016, 978-1119254010
  • Reactive Messaging Patterns with the Actor Model, Vaughn Vernon, 2015, 978-0133846836
  • Mining Massive Datasets, Jure Leskovec and Anand Rajaraman and Jeffrey David Ullman, 2014, 978-1107077232
  • Algorithmische Geometrie, Rolf Klein, 2005, 978-3540209560



Please note:

This page describes a module according to the latest valid module guide in Winter semester 2025/26. Most rules valid for a module are not covered by the examination regulations and can therefore be updated each semester. The following versions are available in the online module guide:

  • Winter 2016/17 (no corresponding element)
  • Summer 2018 (no corresponding element)
  • Winter 2018/19 (no corresponding element)
  • Winter 2019/20 (no corresponding element)
  • Winter 2020/21 (no corresponding element)
  • Summer 2021 (no corresponding element)
  • Winter 2021/22 (no corresponding element)
  • Winter 2022/23 (no corresponding element)
  • Winter 2023/24 (no corresponding element)
  • Winter 2025/26

The module guide lists all modules, regardless of whether they are currently offered. Please check the current course catalogue in Marvin.

The information in this online module guide was generated automatically. Only the information in the examination regulations (Prüfungsordnung) is legally binding. If you notice any discrepancies or errors, please let us know.