
CS 671 — Data Integration
(German: Datenintegration)

Level, degree of commitment: Specialization module, compulsory elective module
Forms of teaching and learning, workload: Lecture (2 SWS), recitation class (2 SWS); 180 hours (60 h attendance, 120 h private study)
Credit points, formal requirements: 6 CP
Course requirement(s): At least 50 percent of the points from the weekly exercises and at least 2 presentations of the tasks.
Examination type: Written or oral examination (individual examination)
Language, grading: English. Grades are given as 0 to 15 points according to the examination regulations for the degree program M.Sc. Data Science.
Duration, frequency: One semester, each summer semester
Person in charge of the module's outline: Prof. Dr. Thorsten Papenbrock, Prof. Dr. Bernhard Seeger

Contents

  • Data models and query languages
  • Data extraction and preparation
  • Similarity measures for simple and complex data types
  • Metadata and dependency search
  • Schema transformation and mapping
  • Data transformation and cleaning
  • Entity search and resolution
  • Architectures of integrated information systems
  • Practical exercises in data integration

Qualification Goals

Students will

  • know basic similarity measures for simple and complex data types (data matching),
  • know procedures for metadata extraction and for determining data dependencies (data profiling),
  • know techniques for mapping, integrating and transforming schemas and their data (schema alignment),
  • know algorithms for detecting and resolving duplicates and other data errors (entity resolution),
  • know architectures and functionalities of modern, integrated information systems (integrated information systems),
  • have practical skills in handling and integrating heterogeneous, dirty data,
  • be able to apply scientific working methods when independently identifying, formulating and solving problems,
  • be able to speak freely about scientific content, both in front of an audience and in a discussion.

Prerequisites

None. The competences taught in the following modules are recommended: Database Systems, and either Algorithms and Data Structures or Practical Informatics II: Data Structures and Algorithms for Pre-Service-Teachers.


Applicability

Module imported from M.Sc. Data Science.

It can be attended at FB12 in the following study programs:

  • B.Sc. Data Science
  • B.Sc. Computer Science
  • M.Sc. Data Science
  • M.Sc. Computer Science
  • M.Sc. Mathematics
  • M.Sc. Business Informatics
  • M.Sc. Business Mathematics
  • LAaG Computer Science

In M.Sc. Business Informatics, this module can be attended in the study area Compulsory Elective Modules in Computer Science and Mathematics.


Recommended Reading

  • Ulf Leser, Felix Naumann: Informationsintegration (dpunkt, 2006)
  • AnHai Doan, Alon Halevy, Zachary Ives: Principles of Data Integration (Morgan Kaufmann, 2012)
  • Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock: Data Profiling. Synthesis Lectures on Data Management (Morgan & Claypool, 2018)
  • George Papadakis, Ekaterini Ioannou, Emanouil Thanos, Themis Palpanas: The Four Generations of Entity Resolution (Morgan & Claypool, 2021)



Please note:

This page describes the module according to the latest valid module guide as of winter semester 2023/24. Most rules that apply to a module are not covered by the examination regulations and can therefore be updated every semester.

The module guide contains all modules, regardless of which courses are currently offered. Please check the current course catalogue in Marvin.

The information in this online module guide was created automatically. Only the information in the examination regulations (Prüfungsordnung) is legally binding. If you notice any discrepancies or errors, we would be grateful if you let us know.