[Informatik-Vortraege] Invitation: Informatik-Oberseminar Jan Thorsten, July 22, 2020

30 Jun 2020

***********************************************************************
*
*                          Invitation
*
*                     Informatik-Oberseminar
*
***********************************************************************

Time: Wednesday, July 22, 2020, 11:00 a.m.

Zoom: 
https://us02web.zoom.us/j/89473373091?pwd=Y2JraVF6aEtUQ3NTVy9leGJhUFFBZz09

Speaker: Diplom-Informatiker Jan Thorsten Peter

Topic: An Exploration of Alignment Concepts to Bridge the Gap between 
Phrase-based and Neural Machine Translation

Abstract:

Machine translation, the task of automatically translating text from one 
natural language into another, has seen massive changes in recent years. 
After phrase-based systems represented the state of the art for over a 
decade, advancements in the structure of neural networks and in 
computational power made it possible to build neural machine translation 
systems, which first improved and later outperformed phrase-based 
systems. These two approaches have their strengths in different areas. 
The well-known phrase-based systems allow fast translations on CPU that 
can be easily explained by examining the translation table. In contrast, 
neural machine translation produces more fluent translations and is more 
robust to small changes in the provided input. This thesis aims to 
improve both systems by combining their advantages.

The first part of this thesis focuses on investigating the integration 
of feed-forward neural models into phrase-based systems. Small changes 
in the input of a phrase-based system can turn an event that was seen in 
the training data into an unseen event. Neural network models are by 
design able to handle such cases due to the continuous space 
representation of the input, whereas phrase-based systems are forced to 
fall back to shorter phrases. This means a loss of knowledge about the 
local context, which results in a degradation of the translation 
quality. We combine the flexibility provided by feed-forward neural 
networks with phrase-based systems while gaining a significant 
improvement over the phrase-based baseline systems. We use feed-forward 
networks since they are conceptually simple and computationally fast. 
Commonly, their structure only utilizes local source and target context. 
Due to this structure, they cannot capture long-distance dependencies. 
We improve the performance of feed-forward neural networks by 
efficiently incorporating long-distance dependencies into their 
structure by using a bag-of-words input.
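
As an illustration of the bag-of-words idea described above, the 
following minimal sketch builds the input of a feed-forward model from a 
local source window plus a bag-of-words vector over the whole source 
sentence. The vocabulary, the window size, and the function names are 
assumptions made for this example. They are not taken from the thesis, 
which may embed or weight the words differently.

```python
def one_hot(word, vocab):
    """Return a one-hot list for `word` over `vocab` (unknown words map to all zeros)."""
    return [1.0 if w == word else 0.0 for w in vocab]


def bag_of_words(sentence, vocab):
    """Normalized word counts over the full sentence: order-free and fixed-size."""
    counts = [float(sentence.count(w)) for w in vocab]
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def build_input(sentence, position, vocab, window=1, pad="<pad>"):
    """Concatenate a local window around `position` with a sentence-level bag of words."""
    features = []
    # Local context: positions outside the sentence are filled with the pad token.
    for i in range(position - window, position + window + 1):
        word = sentence[i] if 0 <= i < len(sentence) else pad
        features.extend(one_hot(word, vocab))
    # Long-distance context: one vector summarizing every word in the sentence.
    features.extend(bag_of_words(sentence, vocab))
    return features


# Hypothetical toy vocabulary and source sentence.
vocab = ["<pad>", "the", "cat", "sat", "down"]
source = ["the", "cat", "sat", "down"]
x = build_input(source, position=1, vocab=vocab, window=1)
print(len(x))  # (2 * window + 1) * len(vocab) + len(vocab) = 20
```

The input size stays fixed regardless of sentence length, which is what 
allows a feed-forward network to see words outside its local window.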

The second part of the thesis focuses on the pure neural machine 
translation approach using the encoder-decoder model with an attention 
mechanism. This mechanism corresponds indirectly to a soft alignment. At 
each translation step, this model relies only on its previous internal 
state and the current decoder position to compute the attention weights. 
There is no direct feedback from the previously used attention. Inspired 
by hidden Markov models, where the prediction of the currently aligned 
position depends on the previously aligned position, we improve the 
attention model by adding direct feedback from previously used attention 
to improve the overall model performance. Additionally, we utilize word 
alignments to guide the neural network during training. By incorporating 
the alignment as an additional cost function, the network performs 
better, as our experiments show. Even though state-of-the-art neural 
models do not require word alignments anymore, 
there are still applications that benefit from good alignments, 
including the visualization of parallel sentences, the creation of 
dictionaries, the automatic segmentation of long parallel sentences and 
the above-mentioned usage during neural network training. We present a 
way to apply neural models to create word alignments that improve over 
word alignments trained with IBM and hidden Markov models.
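
The use of an alignment as an additional cost function can be sketched 
as follows. This is a minimal illustration, assuming a cross-entropy 
between the attention weights and a reference alignment that is added to 
the translation cost with an interpolation weight. The function names, 
the weight `lam`, and the toy numbers are assumptions of this example 
and not the formulation used in the thesis.

```python
import math


def alignment_cost(attention, reference, eps=1e-9):
    """Cross-entropy between reference alignment rows and attention rows.

    Both arguments are lists of rows, one row per target position; each row
    is a distribution over source positions.
    """
    total = 0.0
    for att_row, ref_row in zip(attention, reference):
        total -= sum(r * math.log(a + eps) for a, r in zip(att_row, ref_row))
    return total / len(attention)


def combined_cost(translation_cost, attention, reference, lam=0.5):
    """Translation cost plus the weighted alignment cost (lam is a hypothetical weight)."""
    return translation_cost + lam * alignment_cost(attention, reference)


reference = [[1.0, 0.0], [0.0, 1.0]]  # hard word alignment, e.g. from IBM/HMM training
good = [[0.9, 0.1], [0.2, 0.8]]       # attention close to the reference
bad = [[0.1, 0.9], [0.8, 0.2]]        # attention far from the reference

print(combined_cost(2.0, good, reference) < combined_cost(2.0, bad, reference))  # True
```

Attention weights that agree with the reference alignment yield a lower 
combined cost, so training is nudged toward alignment-like attention.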

We evaluate these techniques on various large-scale translation tasks of 
public evaluation campaigns. Applying new methods with usually complex 
workflows to new translation tasks is a cumbersome and error-prone 
exercise. We present a workflow manager, which was developed as part of 
this thesis, to simplify this task and enable easier knowledge transfer.

Hosted by: the lecturers of Computer Science

--
Stephanie Jansen

Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Stephanie Jansen:  +49 241 80-216 06
Tel. Luisa Wingerath: +49 241 80-216 01
Fax: +49 241 80-22219
sek@i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de


Secretariat I6