**********************************************************************
*
*
* Invitation
*
*
*
* Computer Science Advanced Seminar (Informatik-Oberseminar)
*
*
*
***********************************************************************
Time: Wednesday, June 12, 2024, 9:30 a.m.
Location: Room 9222, E3, Informatikzentrum
Zoom: https://rwth.zoom-x.de/j/68743953886?pwd=HMUlqnO8qakacCpfazBAzKy8b222EK.1
Meeting ID: 687 4395 3886
Passcode: 183143
Speaker: Christian Herold, M.Sc.; Lehrstuhl Informatik 6 (Chair of Computer Science 6)
Topic: Context-Aware Neural Machine Translation
Abstract:
Despite the known limitations, most automatic machine translation
(MT) systems today still operate at the sentence level, ignoring
cross-sentence context information. This is because considering
cross-sentence context (i) leads to exponentially increasing
complexity, (ii) limits the available training data, and
(iii) sometimes even reduces translation quality on general MT
benchmarks. In this talk, we discuss our efforts to combat these
issues and to improve context-aware MT systems.
First, we discuss the different decoding strategies for
document-level MT and explain how constraining the model attention
can result in a more efficient translation system. Second, to
tackle the problem of scarce document-level training data, we
elaborate on our efforts to utilize monolingual document-level
data for MT. Finally, we present our work on data filtering for
MT, which can benefit both sentence- and document-level systems.
--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
Chair of Computer Science 6
ML - Machine Learning and Reasoning
RWTH Aachen University
Theaterstraße 35-39
D-52062 Aachen
Tel: +49 241 80-21601
sek@ml.rwth-aachen.de
www.hltpr.rwth-aachen.de