+**********************************************************************
*
*
* Invitation
*
*
*
* Informatik-Oberseminar
*
*
*
+**********************************************************************
Time: Friday, July 12, 2019, 10:00 a.m.
Location: Informatikzentrum, E3, Room 9222
Speaker: Dipl.-Inform. Malte Nuhn
Topic: Unsupervised Training with Applications in Natural Language
Processing
Abstract:
The state-of-the-art algorithms for various natural language processing
tasks require large amounts of labeled training data. At the same time,
obtaining labeled data of high quality is often the most costly step in
setting up natural language processing systems. In contrast,
unlabeled data is much cheaper to obtain and available in larger
amounts. Currently, only a few training algorithms make use of unlabeled
data. In practice, training with only unlabeled data is not performed at
all. In this thesis, we study how unlabeled data can be used to train a
variety of models used in natural language processing. In particular, we
study models applicable to solving substitution ciphers, spelling
correction, and machine translation. This thesis lays the groundwork for
unsupervised training by presenting and analyzing the corresponding
models and unsupervised training problems in a consistent manner. We show
that the unsupervised training problem that occurs when breaking
one-to-one substitution ciphers is equivalent to the quadratic
assignment problem (QAP) if a bigram language model is incorporated and
therefore NP-hard. Based on this analysis, we present an effective
algorithm for unsupervised training for deterministic substitutions. In
the case of English one-to-one substitution ciphers, we show that our
novel algorithm achieves results close to human performance, as
presented in [Shannon 49].
Also, with this algorithm, we present, to the best of our knowledge, the
first automatic decipherment of the second part of the Beale
ciphers. Further, for the task of spelling correction, we work out the
details of the EM algorithm [Dempster & Laird + 77] and experimentally
show that the error rates achieved using purely unsupervised training
reach those of supervised training. For handling large vocabularies, we
introduce a novel model initialization as well as multiple training
procedures that substantially speed up training without significantly
hurting the performance of the resulting models. By incorporating an
alignment model, we further extend this model such that it can be
applied to the task of machine translation. We show that the true
lexical and alignment model parameters can be learned without any
labeled data: We experimentally show that the corresponding likelihood
function attains its maximum for the true model parameters if a
sufficient amount of unlabeled data is available. Further, for the
problem of spelling correction with symbol substitutions and local
swaps, we also show experimentally that the performance achieved with
purely unsupervised EM training reaches that of supervised training.
Finally, using the methods developed in this thesis, we present results
on an unsupervised training task for machine translation with a ten
times larger vocabulary than that of tasks investigated in previous work.
Hosted by: the lecturers of the Department of Computer Science
_______________________________________________
--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Ms. Jansen: +49 241 80-216 06
Tel. Ms. Andersen: +49 241 80-216 01
Fax: +49 241 80-22219
sek(a)i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de
**********************************************************************
*
*
* Invitation
*
*
*
* Informatik-Oberseminar
*
*
*
***********************************************************************
Time: Tuesday, November 3, 2020, 12:30 p.m.
Zoom:
https://rwth.zoom.us/j/92659511051?pwd=TmVibnlCNUdYZzlZSW52eCs0V2RjQT09
Speaker: Dr. Andreas Wortmann
Topic: Model-Driven Architecture and Behavior of Cyber-Physical Systems
Presentation of the habilitation project.
Abstract:
Systems engineering has produced striking results in many domains.
Researchers and practitioners have devised concepts, methods, and tools
that move vehicles autonomously, enable doctors to conduct remote
surgeries across continents, and send astronauts into space. All of
these cyber-physical systems are driven by software whose complexity
increases tremendously. Overcompensating for this growth in software and
systems complexity demands novel methods that increase the
abstraction in systems engineering, advance automation, and
facilitate the integration of domain expert solutions. Model-based
systems engineering aims to address this complexity by advancing
systems engineering from its contemporary document-based processes to
sophisticated model-based processes. In the latter, abstract models
serve as a means for systems design, communication, and documentation,
and as a basis for implementation. But to overcompensate for the growth in
complexity, using models as secondary artifacts is insufficient.
Comprehensive research in software engineering has led to recognizing
that model-driven processes, in which models are the primary
engineering artifacts, can significantly improve abstraction,
automation, and domain-specific modeling to address the increasing
complexity in systems engineering. Yet, model-based systems
engineering focuses on informal models that are overly generic and
hardly amenable to meaningful automation.
This thesis summarizes 14 selected publications of a research program
towards a model-driven systems engineering that operates on
domain-specific modeling languages, supports sophisticated modeling
methods, and enables the systematic operation of cyber-physical
systems. The results of this research program cover four substantial
challenges towards the model-driven engineering of cyber-physical
systems: First, it contributes to understanding the use of models and
modeling languages for cyber-physical systems through two
comprehensive literature studies on modeling for cyber-physical
systems in Industry 4.0 and mobile robotics. The studies surveyed
over 3,000 publications each and produced insights into requirements
for the efficient model-driven engineering and operations of
cyber-physical systems in both domains. Second, it contributes novel
foundations for the efficient engineering of domain-specific modeling
languages based on the requirements identified in both studies. These
foundations introduce innovative notions of language components and
their composition upon which families of domain-specific modeling
languages can be created systematically and efficiently. Third, it
leverages these foundations to produce modeling languages to describe
functional architectures and geometric-physical architectures of
cyber-physical systems that support unprecedented automated modeling
methods, including tracing, decomposition, and semantic differencing,
to facilitate modeling, maintaining, and evolving these
architectures. Fourth, it exploits the novel language engineering
foundations and the unprecedented automated modeling methods to
facilitate the systematic operation of cyber-physical systems with
digital twins that represent and optimize the observed systems.
Hence, this research program forges a bridge from observations on
modeling cyber-physical systems, over software language engineering
and modeling methods, to their operation, supporting researchers
and practitioners in advancing from the contemporary document-based
engineering of cyber-physical systems to their systematic
model-driven engineering.
Vita:
Andreas Wortmann is a tenured research associate at the Chair for
Software Engineering at RWTH Aachen University. There, he leads a
team on model-driven systems engineering, coordinates a workstream of
the Internet of Production excellence cluster, and advises the Center
for Systems Engineering. He conducts research in model-driven
software and systems engineering, formal methods in software
engineering, and software language engineering, which is documented
in over 70 publications. Moreover, he has chaired and organized
various international conferences and workshops, serves on the board
of the European Association for Programming Languages and Systems
(EAPLS), and co-chairs the working group on model-based systems
engineering of the German INCOSE chapter GfSE.
Hosted by: the lecturers of the Department of Computer Science
--
----------------------------------------------------------------
Prof. Dr. Bernhard Rumpe | http://www.se-rwth.de
Lehrstuhl Software Engineering | Informatik 3
Ahornstr. 55, 52074 Aachen, Germany | RWTH Aachen University
Phone ++49 241 80-21301 / Fax -22218 |
+**********************************************************************
*
*
* Invitation
*
*
*
* Informatik-Oberseminar
*
*
*
+**********************************************************************
Time: Tuesday, September 22, 2020, 4:00 p.m.
Location: Video conference (Zoom meeting, see details below)
Speaker: Markus Hoehnerbach, M.Sc.
High-Performance and Automatic Computing
Topic: A Framework for the Vectorization of Molecular Dynamics Kernels
Abstract:
We introduce a domain-specific language (DSL) for many-body potentials,
which are used in molecular dynamics (MD) simulations in the area of
materials science. We also introduce a compiler to translate the DSL
into high-performance code suitable for modern supercomputers.
We begin by studying ways to speed up potentials on supercomputers
using two case studies: the Tersoff and the AIREBO potentials. In both
case studies, we identify a number of optimizations, both
domain-specific and general, to achieve speedups of up to 5x; we also
introduce a method to keep the resulting code performance portable.
During the AIREBO case study, we also discover that the existing code
contains a number of errors. This experience motivates us to include the
derivation step, the most error-prone step in manual optimization, in
our automation effort.
After having identified beneficial optimization techniques, we create a
"potential compiler", PotC for short, which generates fully usable
performance-portable potential implementations from specifications
written in our DSL. DSL code is significantly shorter (20x to 30x) than
manually written code, reducing both manual work and opportunities to
introduce bugs.
We present performance results on five different platforms: three CPU
platforms (Broadwell, Knights Landing, and Skylake) and two GPU
platforms (Pascal and Volta). While the performance of the generated code
remains far below that of hand-written code in some cases, it matches or
exceeds manually written implementations in others. For these cases,
we achieve speedups of up to 9x compared to non-vectorized code.
Hosted by: the lecturers of the Department of Computer Science
********************************
Topic: Dissertation M. Höhnerbach
Time: Sep 22, 2020 03:45 PM Paris
Join Zoom Meeting
https://rwth.zoom.us/j/98920238040?pwd=Q3F1OStEcklpcDV1Vk5IWEp0cFFEQT09
Meeting ID: 989 2023 8040
Passcode: 668020