
CS 673 — Big Data Systems
(German title: Big-Data-Systems)

Level, degree of commitment: Specialization module, compulsory elective module
Forms of teaching and learning, workload: Lecture (4 SWS), recitation class (2 SWS); 270 hours (90 h attendance, 180 h private study)
Credit points, formal requirements: 9 CP
  Course requirement(s): At least 50 percent of the points from the weekly exercises, as well as at least 2 presentations of the tasks.
  Examination type: Oral examination (individual examination) or written examination
Language, grading: English; grading is done with 0 to 15 points according to the examination regulations for the degree program M.Sc. Computer Science.
Duration, frequency: One semester, each winter semester
Person in charge of the module's outline: Prof. Dr. Thorsten Papenbrock

Contents

  • Actor-, service-, batch-, and stream-based distributed programming
  • Big Data systems
  • Data serialization and message passing
  • Data structures for distributed data management
  • OSI model and communication protocols
  • Data partitioning and replication
  • Consistency and reconciliation protocols
  • Time synchronization and change propagation
  • Distributed request scheduling

Qualification Goals

The students

  • can name challenges in the construction of distributed systems,
  • can explain reactive, distributed programming (actor programming),
  • can explain techniques for digital representation and serialization of data (encoding),
  • can describe how networks operate (communication),
  • can specify standards for structuring and querying data (Data Models and Query Languages),
  • can explain algorithms and data structures for working with distributed data (Storage and Retrieval),
  • can describe techniques for ensuring reliability and availability (Replication and Partitioning),
  • can describe techniques for ensuring consistency and consensus,
  • can understand algorithms for distributed transaction management (Transactions),
  • can explain frameworks for distributed batch processing of data-intensive tasks (Batch Processing) and for distributed data stream processing (Stream Processing),
  • can explain the functionality of distributed database management systems (Distributed DBMSs),
  • can explain the basics of distributed query processing (Distributed Query Optimization),
  • are able to apply this knowledge practically in the programming of data-intensive, distributed algorithms,
  • are able to apply scientific working methods when independently recognizing, formulating and solving problems,
  • are able to speak freely about scientific content, both in front of an audience and in a discussion.

Prerequisites

None. The competences taught in the following modules are recommended: Database Systems, and either Algorithms and Data Structures or Practical Informatics II: Data Structures and Algorithms for Pre-Service-Teachers.


Applicability

Module imported from M.Sc. Computer Science.

It can be attended at FB12 in the following study programs:

  • B.Sc. Data Science
  • B.Sc. Computer Science
  • M.Sc. Data Science
  • M.Sc. Computer Science
  • M.Sc. Mathematics
  • M.Sc. Business Informatics
  • M.Sc. Business Mathematics

When studying M.Sc. Business Mathematics, this module can be attended in the study area Free Compulsory Elective Modules.


Recommended Reading

  • Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, Martin Kleppmann, 2017, 978-1449373320
  • Distributed Systems, Maarten van Steen and Andrew S. Tanenbaum, 2017, 978-1543057386
  • Principles of Distributed Database Systems, M. Tamer Özsu and Patrick Valduriez, 2011, 978-1441988331
  • Web-Scale Data Management for the Cloud, Wolfgang Lehner and Kai-Uwe Sattler, 2013, 1489997717
  • Introduction to Parallel Computing, Zbigniew J. Czech, 2017, 978-1107174399
  • Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services, Brendan Burns, 2017, 978-1491983645
  • Spark: Big Data Cluster Computing in Production, Ilya Ganelin and Ema Orhian and Kai Sasaki and Brennon York, 2016, 978-1119254010
  • Reactive Messaging Patterns with the Actor Model, Vaughn Vernon, 2015, 978-0133846836
  • Mining Massive Datasets, Jure Leskovec and Anand Rajaraman and Jeffrey David Ullman, 2014, 978-1107077232
  • Algorithmische Geometrie, Rolf Klein, 2005, 978-3540209560



Please note:

This page describes a module according to the latest valid module guide in Winter semester 2025/26. Most rules valid for a module are not covered by the examination regulations and can therefore be updated on a semesterly basis. The following versions are available in the online module guide:

  • Winter 2016/17 (no corresponding element)
  • Summer 2018 (no corresponding element)
  • Winter 2018/19 (no corresponding element)
  • Winter 2019/20 (no corresponding element)
  • Winter 2020/21 (no corresponding element)
  • Summer 2021 (no corresponding element)
  • Winter 2021/22 (no corresponding element)
  • Winter 2022/23 (no corresponding element)
  • Winter 2023/24 (no corresponding element)
  • Winter 2025/26

The module guide contains all modules, regardless of which courses are currently offered. Please consult the current course catalogue in Marvin.

The information in this online module guide was created automatically. Only the information in the examination regulations (Prüfungsordnung) is legally binding. If you notice any discrepancies or errors, we would be grateful if you let us know.