CS 673 — Big Data Systems
(dt. Big-Data-Systems)
| Level, degree of commitment | Specialization module, compulsory elective module |
| Forms of teaching and learning, workload | Lecture (4 SWS), recitation class (2 SWS), 270 hours (90 h attendance, 180 h private study) |
| Credit points, formal requirements | 9 CP. Course requirement(s): at least 50 percent of the points from the weekly exercises and at least 2 presentations of exercise tasks. Examination type: oral examination (individual examination) or written examination |
| Language, Grading | English. Grading uses 0 to 15 points according to the examination regulations for the degree program M.Sc. Computer Science. |
| Duration, frequency | One semester, each winter semester |
| Person in charge of the module's outline | Prof. Dr. Thorsten Papenbrock |
Contents
- Actor-, service-, batch-, and stream-based distributed programming
- Big Data systems
- Data serialization and message passing
- Data structures for distributed data management
- OSI model and communication protocols
- Data partitioning and replication
- Consistency and reconciliation protocols
- Time synchronization and change propagation
- Distributed request scheduling
Qualification Goals
Students
- can name challenges in building distributed systems (Distributed Systems),
- can explain reactive, distributed programming (Actor Programming),
- can explain techniques for the digital representation and serialization of data (Encoding),
- can describe the methods by which networks operate (Communication),
- can state standards for structuring and querying data (Data Models and Query Languages),
- can explain algorithms and data structures for working with data in a distributed manner (Storage and Retrieval),
- can describe techniques for ensuring fault tolerance and availability (Replication and Partitioning),
- can describe techniques for ensuring consistency and agreement (Consistency and Consensus),
- can understand algorithms for distributed transaction management (Transactions),
- can explain frameworks for the distributed batch processing of data-intensive tasks (Batch Processing) and for distributed data stream processing (Stream Processing),
- can explain how distributed database management systems work (Distributed DBMSs),
- can explain the fundamentals of distributed query processing (Distributed Query Optimization),
- are able to apply this knowledge in practice when programming data-intensive, distributed algorithms,
- are able to apply scientific working methods when independently identifying, formulating, and solving problems,
- are able to speak freely about scientific content, both in front of an audience and in a discussion.
Prerequisites
None. The competences taught in the following modules are recommended: Database Systems, and either Algorithms and Data Structures or Practical Informatics II: Data Structures and Algorithms for Pre-Service-Teachers.
Applicability
Module imported from M.Sc. Computer Science.
It can be attended at FB12 in the following study programs:
- B.Sc. Data Science
- B.Sc. Computer Science
- M.Sc. Data Science
- M.Sc. Computer Science
- M.Sc. Mathematics
- M.Sc. Business Informatics
- M.Sc. Business Mathematics
When studying M.Sc. Data Science, this module can be attended in the study area Free Compulsory Elective Modules.
Recommended Reading
- Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, Martin Kleppmann, 2017, 978-1449373320
- Distributed Systems, Maarten van Steen and Andrew S. Tanenbaum, 2017, 978-1543057386
- Principles of Distributed Database Systems, M. Tamer Özsu and Patrick Valduriez, 2011, 978-1441988331
- Web-Scale Data Management for the Cloud, Wolfgang Lehner and Kai-Uwe Sattler, 2013, 1489997717
- Introduction to Parallel Computing, Zbigniew J. Czech, 2017, 978-1107174399
- Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services, Brendan Burns, 2017, 978-1491983645
- Spark: Big Data Cluster Computing in Production, Ilya Ganelin and Ema Orhian and Kai Sasaki and Brennon York, 2016, 978-1119254010
- Reactive Messaging Patterns with the Actor Model, Vaughn Vernon, 2015, 978-0133846836
- Mining Massive Datasets, Jure Leskovec and Anand Rajaraman and Jeffrey David Ullman, 2014, 978-1107077232
- Algorithmische Geometrie, Rolf Klein, 2005, 978-3540209560
Please note:
This page describes a module according to the latest valid module guide in the winter semester 2025/26. Most rules valid for a module are not covered by the examination regulations and can therefore be updated every semester. The following versions are available in the online module guide:
- Winter 2016/17 (no corresponding element)
- Summer 2018 (no corresponding element)
- Winter 2018/19 (no corresponding element)
- Winter 2019/20 (no corresponding element)
- Winter 2020/21 (no corresponding element)
- Summer 2021 (no corresponding element)
- Winter 2021/22 (no corresponding element)
- Winter 2022/23 (no corresponding element)
- Winter 2023/24 (no corresponding element)
- Winter 2025/26
The module guide contains all modules, regardless of whether they are currently offered. Please check the current course catalogue in Marvin.
The information in this online module guide was generated automatically. Only the information in the examination regulations (Prüfungsordnung) is legally binding. If you notice any discrepancies or errors, we would be grateful if you let us know.