
CS 671 — Data Integration
(dt. Datenintegration)

Level, degree of commitment: Specialization module, compulsory elective module
Forms of teaching and learning, workload: Lecture (2 SWS), recitation class (2 SWS); 180 hours (60 h attendance, 120 h private study)
Credit points: 6 CP
Course requirement(s): At least 50 percent of the points from the weekly exercises, and at least 2 presentations of exercise tasks.
Examination type: Written or oral examination (individual examination)
Language: English
Grading: 0 to 15 points, according to the examination regulations for the degree program M.Sc. Data Science.
Duration, frequency: One semester, each summer semester
Person in charge of the module's outline: Prof. Dr. Thorsten Papenbrock, Prof. Dr. Bernhard Seeger

Contents

  • Data models and query languages
  • Data extraction and preparation
  • Similarity measures for simple and complex data types
  • Metadata and dependency search
  • Schema transformation and mapping
  • Data transformation and cleaning
  • Entity search and resolution
  • Architectures of integrated information systems
  • Practical exercise of data integration

Qualification Goals

The students

  • can describe basic similarity measures for simple and complex data types (data matching),
  • can explain methods for metadata extraction and for determining data dependencies (data profiling),
  • can explain techniques for mapping, integrating, and transforming schemas and their data (schema alignment),
  • can explain and apply algorithms for detecting and resolving duplicates and other data errors (entity resolution),
  • can explain the architectures and functioning of modern integrated information systems,
  • can handle heterogeneous, dirty data and its integration,
  • are able to apply scientific working methods when independently identifying, formulating, and solving problems,
  • are able to speak freely about scientific content, both in front of an audience and in a discussion.

Prerequisites

None. The competences taught in the following modules are recommended: Database Systems, and either Algorithms and Data Structures or Practical Informatics II: Data Structures and Algorithms for Pre-Service-Teachers.


Applicability

Module imported from M.Sc. Data Science.

It can be attended at FB12 in the following study programs:

  • B.Sc. Data Science
  • B.Sc. Computer Science
  • M.Sc. Data Science
  • M.Sc. Computer Science
  • M.Sc. Mathematics
  • M.Sc. Business Informatics
  • M.Sc. Business Mathematics
  • LAaG Computer Science

When studying LAaG Computer Science, this module can be attended in the study area Specialization Modules.


Recommended Reading

  • Ulf Leser, Felix Naumann: Informationsintegration (dpunkt, 2006)
  • AnHai Doan, Alon Halevy, Zachary Ives: Principles of Data Integration (Morgan Kaufmann, 2012)
  • Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock: Data Profiling. Synthesis Lectures on Data Management (Morgan & Claypool, 2018)
  • George Papadakis, Ekaterini Ioannou, Emmanouil Thanos, Themis Palpanas: The Four Generations of Entity Resolution (Morgan & Claypool, 2021)



Please note:

This page describes the module according to the latest module guide valid in winter semester 2025/26. Most rules that apply to a module are not covered by the examination regulations and can therefore be updated every semester.

The module guide contains all modules, regardless of whether they are currently offered. Please check the current course catalogue in Marvin.

The information in this online module guide was generated automatically. Only the information in the examination regulations (Prüfungsordnung) is legally binding. If you notice any discrepancies or errors, we would be grateful if you let us know.