+**********************************************************************
*
*
*                          Invitation
*
*
*
*               Computer Science Advanced Seminar
*
*
*
+**********************************************************************
 
Time:        Monday, July 13, 2020, 16:00

Zoom:     https://rwth.zoom.us/j/95676455814?pwd=NUEvVnFVNEVLSjFsTWY2OEw2VWhrdz09

 

Meeting ID:    956 7645 5814

Password:        302988

 
 
Speaker:     M.Eng. Rihan Hai
             Lehrstuhl Informatik 5
           
Topic: Data Integration and Metadata Management in Data Lakes
 
 
Abstract:

Although big data has been discussed for some years, it still poses many research challenges, such as handling the variety of data. Non-integrated data management systems with heterogeneous schemas, query languages, and data models result in information silos. Because traditional 'schema-on-write' approaches such as data warehouses cannot efficiently integrate, access, and query these silos, data lake systems have been proposed as a solution. Data lakes are repositories that store raw data in its original format and provide a common access interface.

In this thesis, we present a comprehensive and flexible data lake architecture and the prototype system Constance. First, we propose a native mapping representation that captures the hierarchical structure of nested mappings, together with efficient mapping generation algorithms. Second, to provide a unified querying interface, we design a novel query rewriting engine that combines logical methods for data integration based on declarative mappings with the big data processing system Apache Spark. Third, we study the formalism of the generated schema mappings as dependencies: our algorithmic approach transforms schema mappings expressed in second-order logic into logically equivalent first-order forms. Finally, we introduce clustering-based algorithms to discover relaxed functional dependencies, which enrich the metadata and improve data quality in the data lake.

 

 
 
 
You are invited by: the lecturers in Computer Science

 

 

 

 

_______________________________

Leany Maaßen

RWTH Aachen University

Lehrstuhl Informatik 5, LuFG Informatik 5

Prof. Dr. Stefan Decker, Prof. Dr. Matthias Jarke,

Prof. Gerhard Lakemeyer Ph.D.

Ahornstrasse 55

D-52074 Aachen

 

Tel: 0241-80-21509

Fax: 0241-80-22321

E-Mail: maassen@dbis.rwth-aachen.de