+**********************************************************************
*
*
* Invitation
*
*
*
* Informatik-Oberseminar
*
*
*
+**********************************************************************
Time: Friday, 12 July 2019, 10:00
Location: Informatikzentrum, E3, Room 9222
Speaker: Dipl.-Inform. Malte Nuhn
Topic: Unsupervised Training with Applications in Natural Language
Processing
Abstract:
The state-of-the-art algorithms for various natural language processing
tasks require large amounts of labeled training data. At the same time,
obtaining labeled data of high quality is often the most costly step in
setting up natural language processing systems. In contrast,
unlabeled data is much cheaper to obtain and available in larger
amounts. Currently, only a few training algorithms make use of unlabeled
data. In practice, training with only unlabeled data is not performed at
all. In this thesis, we study how unlabeled data can be used to train a
variety of models used in natural language processing. In particular, we
study models applicable to solving substitution ciphers, spelling
correction, and machine translation. This thesis lays the groundwork for
unsupervised training by presenting and analyzing the corresponding
models and unsupervised training problems in a consistent manner. We show
that the unsupervised training problem that occurs when breaking
one-to-one substitution ciphers is equivalent to the quadratic
assignment problem (QAP) if a bigram language model is incorporated, and
is therefore NP-hard. Based on this analysis, we present an effective
algorithm for unsupervised training for deterministic substitutions. In
the case of English one-to-one substitution ciphers, we show that our
novel algorithm achieves results close to human performance, as
presented in [Shannon 49].
Also, with this algorithm, we present, to the best of our knowledge, the
first automatic decipherment of the second part of the Beale
ciphers. Further, for the task of spelling correction, we work out the
details of the EM algorithm [Dempster & Laird + 77] and experimentally
show that the error rates achieved using purely unsupervised training
reach those of supervised training. For handling large vocabularies, we
introduce a novel model initialization as well as multiple training
procedures that substantially speed up training without significantly
hurting the performance of the resulting models. By incorporating an
alignment model, we further extend this model such that it can be
applied to the task of machine translation. We show that the true
lexical and alignment model parameters can be learned without any
labeled data: We experimentally show that the corresponding likelihood
function attains its maximum for the true model parameters if a
sufficient amount of unlabeled data is available. Further, for the
problem of spelling correction with symbol substitutions and local
swaps, we also show experimentally that the performance achieved with
purely unsupervised EM training reaches that of supervised training.
Finally, using the methods developed in this thesis, we present results
on an unsupervised training task for machine translation with a ten
times larger vocabulary than that of tasks investigated in previous work.
Invited by: the lecturers of Computer Science
_______________________________________________
--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Ms. Jansen: +49 241 80-216 06
Tel. Ms. Andersen: +49 241 80-216 01
Fax: +49 241 80-22219
sek(a)i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de
+**********************************************************************
*
*
* Invitation
*
*
*
* Informatik-Oberseminar
*
*
*
+**********************************************************************
Time: Monday, 13 July 2020, 16:00
Zoom:
<https://rwth.zoom.us/j/95676455814?pwd=NUEvVnFVNEVLSjFsTWY2OEw2VWhrdz09>
Meeting-ID: 956 7645 5814
Password: 302988
Speaker: M.Eng. Rihan Hai
Lehrstuhl Informatik 5
Topic: Data Integration and Metadata Management in Data Lakes
Abstract:
Although big data has been discussed for some years, it still poses many
research challenges, such as the variety of data. Non-integrated data
management systems with heterogeneous schemas, query languages, and data
models result in information silos. As traditional 'schema-on-write'
approaches such as data warehouses cannot efficiently integrate, access,
and query these information silos, data lake systems have been proposed
as a solution to this problem. Data lakes are
repositories storing raw data in its original format and providing a common
access interface.
In this thesis, we present a comprehensive and flexible data lake
architecture and the prototype system Constance. First, we propose a native
mapping representation that captures the hierarchical structures of nested
mappings, along with efficient mapping generation algorithms. Second, to
provide a
unified querying interface, we design a novel query rewriting engine that
combines logical methods for data integration based on declarative mappings
with the big data processing system Apache Spark. Third, we also study the
formalism of the generated schema mappings as dependencies. Our algorithmic
approach transforms schema mappings expressed in second-order logic to their
logically equivalent first-order forms. Finally, we introduce
clustering-based algorithms to discover relaxed functional dependencies,
which enrich the metadata and improve data quality in the data lake.
Invited by: the lecturers of Computer Science
_______________________________
Leany Maaßen
RWTH Aachen University
Lehrstuhl Informatik 5, LuFG Informatik 5
Prof. Dr. Stefan Decker, Prof. Dr. Matthias Jarke,
Prof. Gerhard Lakemeyer Ph.D.
Ahornstrasse 55
D-52074 Aachen
Tel: 0241-80-21509
Fax: 0241-80-22321
E-Mail: maassen(a)dbis.rwth-aachen.de
+**********************************************************************
*
*
* Invitation
*
*
*
* Informatik-Oberseminar
*
*
*
+**********************************************************************
Time: Tuesday, 30 June 2020, 14:00-15:00
Zoom:
<https://rwth.zoom.us/j/95676455814?pwd=NUEvVnFVNEVLSjFsTWY2OEw2VWhrdz09>
Meeting-ID: 956 7645 5814
Password: 302988
Speaker: Dipl.-Kfm. Markus Beutel
Topic: End-to-End Integration of Complementary Mobility Services
through Cross-Company Provider Cooperation
Abstract:
Travelers today have a wide variety of transport modes at their disposal.
Owing to their individual characteristics, many means of transport can
complement rather than substitute one another. When transport services are
provided in an at least partially segmented and provider-specific manner,
economic and ecological inefficiencies, for example, can arise. A complete
digitalization and integration of heterogeneous mobility services should
therefore focus on the holistic service process across the entire service
chain.
The overarching goal of this thesis is to investigate the integrated
provision of complementary mobility services across company boundaries. The
starting point of this thesis is the examination of a specific mobility
platform with regard to various integration factors. Based on a
comprehensive analysis of mobility platforms, and in conjunction with the
description of an organizational role model, possible provider cooperation
scenarios are then described. Finally, to consider cross-company
integration at the process level, an approach for extending a software tool
for merging business process models is evaluated.
Invited by: the lecturers of Computer Science
_______________________________
Leany Maaßen
RWTH Aachen University
Lehrstuhl Informatik 5, LuFG Informatik 5
Prof. Dr. Stefan Decker, Prof. Dr. Matthias Jarke,
Prof. Gerhard Lakemeyer Ph.D.
Ahornstrasse 55
D-52074 Aachen
Tel: 0241-80-21509
Fax: 0241-80-22321
E-Mail: maassen(a)dbis.rwth-aachen.de