**********************************************************************
*
*                          Invitation
*
*                    Informatik-Oberseminar
*
**********************************************************************

Time:       Wednesday, June 12, 2024, 09:30
Location:   Room 9222, E3, Informatikzentrum
Zoom:       https://rwth.zoom-x.de/j/68743953886?pwd=HMUlqnO8qakacCpfazBAzKy8b222EK.1
            Meeting ID: 687 4395 3886
            Passcode: 183143

Speaker:    Christian Herold, M.Sc.; Lehrstuhl Informatik 6

Topic:      Context-Aware Neural Machine Translation

Abstract:

Despite their known limitations, most automatic machine translation (MT)
systems today still operate at the sentence level, ignoring cross-sentence
context information. This is because considering cross-sentence context
(i) leads to exponentially increasing complexity, (ii) limits the available
training data, and (iii) sometimes even reduces translation quality on
general MT benchmarks. In this talk, we discuss our efforts to address
these issues and to improve context-aware MT systems. First, we discuss the
different decoding strategies for document-level MT and explain how
constraining the model's attention can result in a more efficient
translation system. Second, to tackle the problem of scarce document-level
training data, we elaborate on our efforts to utilize monolingual
document-level data for MT. Finally, we discuss our work on data filtering
for MT, which can benefit both sentence-level and document-level systems.

Hosted by: the lecturers of the Computer Science department

--
Stephanie Jansen

Faculty of Mathematics, Computer Science and Natural Sciences
Chair of Computer Science 6
ML - Machine Learning and Reasoning
RWTH Aachen University
Theaterstraße 35-39
D-52062 Aachen
Tel: +49 241 80-21601
sek@ml.rwth-aachen.de
www.hltpr.rwth-aachen.de