CS 671 — Data Integration
(German: Datenintegration)
Level, degree of commitment | Specialization module, compulsory elective module
Forms of teaching and learning, workload | Lecture (2 SWS), recitation class (2 SWS); 180 hours (60 h attendance, 120 h private study)
Credit points, formal requirements | 6 CP. Course requirement(s): Successful completion of at least 50 percent of the points from the weekly exercises as well as at least 2 presentations of the tasks. Examination type: Written or oral examination (individual examination)
Language, grading | English. Grading is on a scale of 0 to 15 points according to the examination regulations for the degree program M.Sc. Data Science.
Duration, frequency | One semester, each summer semester
Person in charge of the module's outline | Prof. Dr. Thorsten Papenbrock, Prof. Dr. Bernhard Seeger
Contents
- Data models and query languages
- Data extraction and preparation
- Similarity measures for simple and complex data types
- Metadata and dependency search
- Schema transformation and mapping
- Data transformation and cleaning
- Entity search and resolution
- Architectures of integrated information systems
- Practical exercise of data integration
Qualification Goals
Students will
- know basic similarity measures for simple and complex data types (data matching),
- know procedures for metadata extraction and for determining data dependencies (data profiling),
- know techniques for mapping, integrating and transforming schemas and their data (Schema Alignment),
- know algorithms for detecting and resolving duplicates and other data errors (Entity Resolution),
- know architectures and functionalities of modern, integrated information systems (Integrated Information Systems),
- have practical skills in dealing with heterogeneous, dirty data and their integration,
- be able to apply scientific working methods when independently identifying, formulating and solving problems,
- be able to speak freely about scientific content, both in front of an audience and in a discussion.
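As an illustration of the data matching and entity resolution goals above, the following is a minimal Python sketch and not part of the official module material. It compares records by the Jaccard similarity of their word sets and reports pairs above a threshold as duplicate candidates. The function names, the threshold of 0.5 and the sample records are assumptions chosen for illustration only.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty records are treated as identical
    return len(ta & tb) / len(ta | tb)


def find_duplicates(records: list[str], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return index pairs whose similarity reaches the (assumed) threshold."""
    return [
        (i, j)
        for i in range(len(records))
        for j in range(i + 1, len(records))
        if jaccard(records[i], records[j]) >= threshold
    ]


# Made-up sample records, for illustration only.
records = [
    "Anna Schmidt Marburg",
    "anna schmidt marburg hessen",
    "Peter Meyer Kassel",
]
pairs = find_duplicates(records)
print(pairs)
```

Comparing all pairs takes quadratic time, so this naive approach only suits very small inputs.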
Prerequisites
None. The competences taught in the following modules are recommended: either Algorithms and Data Structures or Practical Informatics II: Data Structures and Algorithms for Pre-Service-Teachers, and Database Systems.
Applicability
Module imported from M.Sc. Data Science.
It can be attended at FB12 in the following study programs:
- B.Sc. Data Science
- B.Sc. Computer Science
- M.Sc. Data Science
- M.Sc. Computer Science
- M.Sc. Mathematics
- M.Sc. Business Informatics
- M.Sc. Business Mathematics
- LAaG Computer Science
When studying M.Sc. Business Informatics, this module can be attended in the study area Compulsory Elective Modules in Computer Science And Mathematics.
Recommended Reading
- Ulf Leser, Felix Naumann: Informationsintegration (dpunkt, 2006)
- AnHai Doan, Alon Halevy, Zachary Ives: Principles of Data Integration (Morgan Kaufmann, 2012)
- Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock: Data Profiling. Synthesis Lectures on Data Management (Morgan & Claypool, 2018)
- George Papadakis, Ekaterini Ioannou, Emanouil Thanos, Themis Palpanas: The Four Generations of Entity Resolution (Morgan & Claypool, 2021)
Please note:
This page describes the module according to the latest valid module guide, from winter semester 2023/24. Most rules that apply to a module are not covered by the examination regulations and can therefore be updated every semester. The following versions are available in the online module guide:
- Winter 2016/17
- Summer 2018
- Winter 2018/19
- Winter 2019/20
- Winter 2020/21
- Summer 2021
- Winter 2021/22
- Winter 2022/23
- Winter 2023/24
The module guide contains all modules, regardless of whether they are currently offered. Please check the current course catalogue in Marvin.
The information in this online module guide was generated automatically. Only the information in the examination regulations (Prüfungsordnung) is legally binding. If you notice any discrepancies or errors, we would be grateful if you let us know.