+**********************************************************************
*
*
* Invitation
*
*
*
* Computer Science Advanced Seminar (Informatik-Oberseminar)
*
*
*
+**********************************************************************
Time: Friday, July 12, 2019, 10:00 a.m.
Location: Informatikzentrum, E3, Room 9222
Speaker: Dipl.-Inform. Malte Nuhn
Topic: Unsupervised Training with Applications in Natural Language
Processing
Abstract:
The state-of-the-art algorithms for various natural language processing
tasks require large amounts of labeled training data. At the same time,
obtaining labeled data of high quality is often the most costly step in
setting up natural language processing systems. In contrast,
unlabeled data is much cheaper to obtain and available in larger
amounts. Currently, only a few training algorithms make use of unlabeled
data. In practice, training with only unlabeled data is not performed at
all. In this thesis, we study how unlabeled data can be used to train a
variety of models used in natural language processing. In particular, we
study models applicable to solving substitution ciphers, spelling
correction, and machine translation. This thesis lays the groundwork for
unsupervised training by presenting and analyzing the corresponding
models and unsupervised training problems in a consistent manner. We show
that the unsupervised training problem that occurs when breaking
one-to-one substitution ciphers is equivalent to the quadratic
assignment problem (QAP) if a bigram language model is incorporated, and
is therefore NP-hard. Based on this analysis, we present an effective
algorithm for unsupervised training for deterministic substitutions. In
the case of English one-to-one substitution ciphers, we show that our
novel algorithm achieves results close to human performance, as
presented in [Shannon 49].
Also, with this algorithm, we present, to the best of our knowledge, the
first automatic decipherment of the second part of the Beale
ciphers. Further, for the task of spelling correction, we work out the
details of the EM algorithm [Dempster & Laird + 77] and experimentally
show that the error rates achieved using purely unsupervised training
reach those of supervised training. For handling large vocabularies, we
introduce a novel model initialization as well as multiple training
procedures that significantly speed up training without substantially
hurting the performance of the resulting models. By incorporating an
alignment model, we further extend this model such that it can be
applied to the task of machine translation. We show that the true
lexical and alignment model parameters can be learned without any
labeled data: We experimentally show that the corresponding likelihood
function attains its maximum for the true model parameters if a
sufficient amount of unlabeled data is available. Further, for the
problem of spelling correction with symbol substitutions and local
swaps, we also show experimentally that the performance achieved with
purely unsupervised EM training reaches that of supervised training.
Finally, using the methods developed in this thesis, we present results
on an unsupervised training task for machine translation with a ten
times larger vocabulary than that of tasks investigated in previous work.
Invitation extended by: the Computer Science faculty
_______________________________________________
--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Ms. Jansen: +49 241 80-216 06
Tel. Ms. Andersen: +49 241 80-216 01
Fax: +49 241 80-22219
sek(a)i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de
+**********************************************************************
*
*
* Invitation
*
*
*
* Computer Science Advanced Seminar (Informatik-Oberseminar)
*
*
*
+**********************************************************************
Time: Monday, July 15, 2019, 4:00 p.m.
Location: Seminar Room 001, IT Center, Kopernikusstr. 6
Speaker: Dipl.-Inform. Sebastian Pick
Topic: Interactive Data Annotation for Virtual Reality Applications
Abstract:
Virtual Reality (VR) offers the unique possibility to explore data sets in a highly immersive
and interactive fashion. A key aspect of this exploration process is the acquisition
of insights and findings concerning the involved data. In this context, it becomes the
task of so-called annotation systems to enable users to preserve said findings and review
them at a later time. Even though the design of VR-based annotation systems has been
a research topic for many years, crucial challenges remain unresolved. Many interaction
solutions focus on isolated issues and are rather limited in scope. Additionally, they
usually forgo holistic workflow designs that are vital for covering all relevant annotation
operations in a consistent fashion. Similarly, available annotation presentation techniques
are usually inappropriate for use within immersive virtual environments (IVEs)
or they perform sub-optimally. Lastly, the design of annotation systems is usually further
complicated if they are to be reused in the context of other usage scenarios that impose
a different and varying set of requirements on them.
In this thesis, I set out to address the aforementioned issues. To this end, I present an
annotation framework that was specifically designed to be configurable and extensible in
accordance with the changing requirements of different application scenarios. It is grounded
in a specialized data model that facilitates the integration not only of required data types,
which are used to describe annotations, but also techniques to capture and present them.
I employ this model to develop a flexible solution to the design of holistic interaction
workflows. My design approach covers all essential operations, including the creation,
modification, and review of arbitrary types of annotations from within IVEs. To provide
a set of ready-to-use interaction techniques to acquire annotation contents, I identify the
most relevant data types and devise appropriate interaction concepts. Similarly, I present
a generalized two-tier presentation strategy that facilitates the access to and review of
arbitrary annotation contents. While the tier one representation provides basic access
to a selection of contents in a customizable form, the tier two representation allows for
access to all contents in a generalized manner. I furthermore introduce a novel automated
annotation layout approach that prevents occlusions of and by annotations to maintain
visual access to them and the original data. In order to determine the viability of my
approaches, I cover a series of technical analyses and user studies. They are rounded
off by a presentation of concrete application examples that utilize varying subsets of
techniques to demonstrate the usefulness and flexibility of my framework.
Invitation extended by: the Computer Science faculty
+**********************************************************************
*
*
* Invitation
*
*
*
* Computer Science Colloquium (Informatik-Kolloquium)
*
*
*
+**********************************************************************
Time: Friday, July 5, 2019, 11:30 a.m.
Location: Informatikzentrum, E3, Room 9007
Speaker: Prof. Richard Fujimoto
Georgia Institute of Technology, Atlanta
Topic: Power Consumption in Parallel and Distributed Simulations
Abstract:
Energy and power consumption are important concerns for many computing
systems ranging from battery-powered embedded and mobile devices to
supercomputers and data centers. Although this issue has been
extensively studied at the hardware and operating system levels, thus
far only a limited amount of work has considered power consumption in
parallel and distributed simulations. This presentation addresses this
topic and discusses a variety of options to reduce power consumption.
Further, parallel and distributed discrete event simulations require a
synchronization algorithm to ensure the concurrent execution of the
program produces the same results as a sequential execution. The energy
consumed by synchronization algorithms for distributed simulation
programs is considered, and experimental data are presented highlighting
that a significant portion of the energy consumed by distributed
simulations can be attributed to synchronization. The concept of zero
energy synchronization is introduced and techniques to reduce the energy
consumed by synchronization algorithms are presented and evaluated. The
presentation highlights that many open areas of research concerning
power and energy consumption of distributed simulations remain to be
explored.
Speaker Biography:
Richard Fujimoto is a Regents’ Professor in the School of Computational
Science and Engineering at the Georgia Institute of Technology. He
received the Ph.D. degree from the University of California at Berkeley
in 1983. He has been an active researcher in the parallel and
distributed simulation field since that time. He led the definition of
the time management services in the High Level Architecture for Modeling
and Simulation, IEEE standard 1516. His publications include seven
award-winning papers, and he has received the ACM Distinguished Contributions
in Modeling and Simulation Award. He has played various leadership roles
in a variety of modeling and simulation conferences, journals and other
activities. He was the founding chair of the School of Computational
Science and Engineering (CSE) at Georgia Tech and established it as an
academic unit devoted to the study of computer-based models of natural
and engineered systems. In that role he led the creation of the
interdisciplinary PhD and MS degree programs in CSE as well as two
undergraduate minors.
Invitation extended by: the Computer Science faculty