Talk by Dr. Markus Freitag (today)
Dear ladies and gentlemen,

we would like to draw your attention to the talk (see below) by Dr. Markus Freitag this afternoon at 14:30. Please excuse the short notice.

Kind regards,
Christian Herold

************************************************************************
*                                                                      *
*                     Invitation to a Guest Talk                       *
*                                                                      *
************************************************************************

Time:     Tuesday, 14 January 2020, 14:30
Location: Ahornstraße 55, building E2, Room 5056
Speaker:  Dr. Markus Freitag
Title:    (Some) Research Happening at Google Translate

Abstract:

Machine translation is one of the most appealing research topics in natural language processing and machine learning. In this talk, you will be given an overview of some of the current research efforts happening at Google Translate.

We will start with a project whose end goal is to train a single model to translate between all languages supported by Google Translate. Neural models can be trained to perform several tasks simultaneously, as exemplified by multilingual NMT, where a single model translates between multiple languages. Apart from reducing operational costs, multilingual models improve performance on low- and zero-resource language pairs due to joint training. We study multilingual neural machine translation using a massive open-domain dataset containing over 25 billion parallel sentences in 103 languages.

In the second half of the talk, we focus on translationese, a term for the artifacts in translated text that distinguish it from text originally written in that language. These artifacts include lexical and word-order choices influenced by the source language, as well as the use of more explicit and simpler constructions. Machine translation has an undesirable propensity to produce translationese artifacts, which can lead to higher BLEU scores even though human raters like the output less. First, we train an Automatic Post-Editing (APE) model that converts translationese output into more natural text, and we use this model as a tool to reveal systematic problems with reference translations. Second, we model translationese and original (i.e. natural) text as separate languages in a multilingual model and pose the question: can we perform zero-shot translation between original source text and original target text? To sum up, we will discuss why a loss in BLEU score does not always mean lower translation quality.

--
Christian Herold
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel: +49 241 80 21613
Fax: +49 241 80 22219
herold@i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de