+**********************************************************************
*
*
* Invitation
*
*
*
* Computer Science Advanced Seminar (Informatik-Oberseminar)
*
*
*
+****************************************************************
Time: Monday, April 8, 2024, 1:30 p.m.
Location: Seminar Room 3, Kopernikusstraße 6
Speaker: Benedikt Heinrichs M.Sc.
IT Center, RWTH Aachen University
Topic: Asynchronous Tracking and Description of Research Data Changes in Distributed Systems with Interoperable Metadata
Abstract:
With the digital revolution, the way research is conducted has changed fundamentally.
Suddenly, research processes created digital research data that needed to be stored.
Initially, no standards for this existed, so practices diverged wildly.
Consequently, data was produced that was not findable without a management system.
For this reason, movements entered the picture intending to standardize these processes and define how research data should be managed.
One set of recommendations is the FAIR Guiding Principles, which state that research data should be findable, accessible, interoperable, and reusable.
While these principles set goals, they provide no implementation guideline, since the different research areas are too diverse.
Therefore, research data management (RDM) teams around the globe have created numerous implementations.
Some of them are platforms like Coscine, which can manage research data and try to adhere to parts of the FAIR principles.
However, such platforms face the issue that researchers want to store their research data with an enterprise-ready and openly accessible storage provider.
Therefore, research data often does not move through these platforms but directly through the storage providers.
This circumstance contradicts the aim of following the FAIR principles because the platforms cannot account for the research data movement and miss critical provenance information.
The presented thesis aims to close that gap by providing a method to calculate the missing provenance information after changes occur.
This so-called asynchronous data provenance is produced by comparing representations of research data.
If the representations have changed, a new version or variant of the research data has likely been created.
Representations can range from a generated hash to interoperable metadata about the research data.
This interoperable metadata is created by running a pipeline that receives research data and extracts valuable information about its content.
This information is annotated as interoperable metadata by following existing application profiles and ontologies.
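The simplest representation named above, a generated hash, can be compared as in the following minimal sketch. It is an illustration only, not the implementation from the thesis: the function names `content_hash` and `has_changed`, the choice of SHA-256, and the sample data are all assumptions.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex digest that serves as a compact representation of the data."""
    # SHA-256 is an assumed choice; the abstract only speaks of "a generated hash".
    return hashlib.sha256(data).hexdigest()


def has_changed(previous_hash: str, current_data: bytes) -> bool:
    """Report whether the current data differs from the stored representation."""
    return content_hash(current_data) != previous_hash


# Hypothetical example data: a representation stored at an earlier point in time.
stored = content_hash(b"temperature,1.0\n")

# Later, the data is read again directly from the storage provider.
unchanged = has_changed(stored, b"temperature,1.0\n")
changed = has_changed(stored, b"temperature,1.5\n")
```

A hash can only signal that something changed. It says nothing about how much changed, which is where a metadata-based representation becomes useful.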
Interoperable metadata can be used to compute the similarity of research data with a method called FSS Jaccard.
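As a rough illustration of set-based similarity over metadata, the sketch below computes the plain Jaccard index on two sets of subject-predicate-object triples. This is not the FSS Jaccard method of the thesis, whose details are not given in this abstract. The function name `jaccard` and the metadata values are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Plain Jaccard index: size of the intersection divided by size of the union."""
    if not a and not b:
        # Convention assumed here: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical metadata of two versions of one file, written as triples.
old = {
    ("file", "dcterms:title", "Run 1"),
    ("file", "dcterms:creator", "A. Researcher"),
    ("file", "dcterms:format", "text/csv"),
}
new = {
    ("file", "dcterms:title", "Run 1"),
    ("file", "dcterms:creator", "A. Researcher"),
    ("file", "dcterms:format", "application/json"),
}

# Two shared triples out of four distinct triples in total.
similarity = jaccard(old, new)
```

Under these assumptions, a value close to 1 would suggest a new version of the same research data, while a lower value would point to a more distant variant.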
The created methods are integrated into a standards-based RDM system (RDMS), defined in this thesis, to show their applicability.
For this standards-based RDMS, Coscine is used as a use case.
Thereby, this thesis presents a method that can provide additional information about research data and close the presented gap for any standards-based RDMS.
By using this method, RDM teams can come closer to supporting the implementation of the FAIR principles and improving the processes for researchers.
Hosts: the lecturers of the Department of Computer Science