+**********************************************************************
*
*
* Invitation
*
*
*
* Computer Science Advanced Seminar (Informatik-Oberseminar)
*
*
*
+**********************************************************************
Time: Monday, July 13, 2020, 16:00
Zoom: https://rwth.zoom.us/j/95676455814?pwd=NUEvVnFVNEVLSjFsTWY2OEw2VWhrdz09
Meeting ID: 956 7645 5814
Password: 302988
Speaker: M.Eng. Rihan Hai
Lehrstuhl Informatik 5
Topic: Data integration and Metadata Management in Data Lakes
Abstract:
Although big data has been discussed for some years, it still poses many research challenges, such as the variety of data. Non-integrated data management systems with heterogeneous schemas, query languages, and data models result in information silos. Because traditional 'schema-on-write' approaches such as data warehouses cannot efficiently integrate, access, and query these information silos, data lake systems have been proposed as a solution. Data lakes are repositories that store raw data in its original format and provide a common access interface.
In this thesis, we present a comprehensive and flexible data lake architecture and the prototype system Constance. First, we propose a native mapping representation that captures the hierarchical structures of nested mappings, together with efficient mapping generation algorithms. Second, to provide a unified querying interface, we design a novel query rewriting engine that combines logical methods for data integration based on declarative mappings with the big data processing system Apache Spark. Third, we study the formalism of the generated schema mappings as dependencies. Our algorithmic approach transforms schema mappings expressed in second-order logic to their logically equivalent first-order forms. Finally, we introduce clustering-based algorithms to discover relaxed functional dependencies, which enrich the metadata and improve data quality in the data lake.
You are invited by: the lecturers of Computer Science
_______________________________
Leany Maaßen
RWTH Aachen University
Lehrstuhl Informatik 5, LuFG Informatik 5
Prof. Dr. Stefan Decker, Prof. Dr. Matthias Jarke,
Prof. Gerhard Lakemeyer Ph.D.
Ahornstrasse 55
D-52074 Aachen
Tel: 0241-80-21509
Fax: 0241-80-22321
Email: maassen@dbis.rwth-aachen.de