Invitation: Informatik-Oberseminar, Malte Nuhn
**********************************************************************
*
* Invitation
*
* Informatik-Oberseminar
*
**********************************************************************

Time: Friday, 12 July 2019, 10:00
Location: Informatikzentrum, E3, Room 9222
Speaker: Dipl.-Inform. Malte Nuhn
Topic: Unsupervised Training with Applications in Natural Language Processing

Abstract:

The state-of-the-art algorithms for various natural language processing tasks require large amounts of labeled training data. At the same time, obtaining labeled data of high quality is often the most costly step in setting up natural language processing systems. In contrast, unlabeled data is much cheaper to obtain and available in larger amounts. Currently, only a few training algorithms make use of unlabeled data. In practice, training with only unlabeled data is not performed at all. In this thesis, we study how unlabeled data can be used to train a variety of models used in natural language processing. In particular, we study models applicable to solving substitution ciphers, spelling correction, and machine translation.

This thesis lays the groundwork for unsupervised training by presenting and analyzing the corresponding models and unsupervised training problems in a consistent manner. We show that the unsupervised training problem that occurs when breaking one-to-one substitution ciphers is equivalent to the quadratic assignment problem (QAP) if a bigram language model is incorporated, and is therefore NP-hard. Based on this analysis, we present an effective algorithm for unsupervised training for deterministic substitutions. In the case of English one-to-one substitution ciphers, we show that our novel algorithm achieves results close to human performance, as presented in [Shannon 49]. Also, with this algorithm, we present, to the best of our knowledge, the first automatic decipherment of the second part of the Beale ciphers.

Further, for the task of spelling correction, we work out the details of the EM algorithm [Dempster & Laird + 77] and experimentally show that the error rates achieved using purely unsupervised training reach those of supervised training. For handling large vocabularies, we introduce a novel model initialization as well as multiple training procedures that substantially speed up training without significantly hurting the performance of the resulting models.

By incorporating an alignment model, we further extend this model such that it can be applied to the task of machine translation. We show that the true lexical and alignment model parameters can be learned without any labeled data: we experimentally show that the corresponding likelihood function attains its maximum for the true model parameters if a sufficient amount of unlabeled data is available. Further, for the problem of spelling correction with symbol substitutions and local swaps, we also show experimentally that the performance achieved with purely unsupervised EM training reaches that of supervised training. Finally, using the methods developed in this thesis, we present results on an unsupervised training task for machine translation with a ten times larger vocabulary than that of tasks investigated in previous work.

You are invited by the lecturers of Computer Science.

--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Ms. Jansen: +49 241 80-216 06
Tel. Ms. Andersen: +49 241 80-216 01
Fax: +49 241 80-22219
sek@i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de

**********************************************************************
*
* Invitation
*
* Informatik-Oberseminar
*
**********************************************************************

Time: Monday, 6 July 2020, 11:00
Zoom: https://us02web.zoom.us/j/89211319412?pwd=NExERjh0elRvOUNGb01pWURybjhQQT09
Speaker: Tamer Alkhouli, M.Sc.
Topic: Alignment-Based Neural Networks for Machine Translation

Abstract:

After more than a decade of phrase-based systems dominating the scene of machine translation, neural machine translation has emerged as the new machine translation paradigm. Not only does state-of-the-art neural machine translation demonstrate superior performance compared to conventional phrase-based systems, but it also presents an elegant end-to-end model that captures complex dependencies between source and target words. Neural machine translation offers a simpler modeling pipeline, making its adoption appealing for both practical and scientific reasons. Concepts like word alignment, which is a core component of phrase-based systems, are no longer required in neural machine translation.

While this simplicity is viewed as an advantage, disregarding word alignment can come at the cost of less controllable translation. Phrase-based systems generate translations composed of word sequences that also occur in the training data. Neural machine translation, on the other hand, is more flexible and can generate translations that have no exact correspondence in the training data. This aspect enables such models to generate more fluent output, but it also makes translation free of pre-defined constraints. The lack of an explicit word alignment makes it potentially harder to relate generated target words to the source words. With the wider deployment of neural machine translation in commercial products, the demand is increasing for giving users more control over the generated translation, such as enforcing or excluding the translation of certain terms.

This dissertation aims to take a step towards addressing controllability in neural machine translation. We introduce alignment as a latent variable into neural network models, and describe an alignment-based framework for neural machine translation. The models are inspired by conventional IBM and hidden Markov models that are used to generate word alignment for phrase-based systems. However, our models derive from recent neural network architectures that are able to capture more complex dependencies. In this sense, this work can be viewed as an attempt to bridge the gap between conventional statistical machine translation and neural machine translation. We demonstrate that introducing alignment explicitly maintains neural machine translation performance, while making the models more explainable by improving the alignment quality. We show that such improved alignment can be beneficial for real tasks, where the user desires to influence the translation output.

We also introduce recurrent neural networks to phrase-based systems in two different ways. We propose a method to integrate complex recurrent models, which capture long-range context, into the phrase-based framework, which considers short context only. We also use neural networks to rescore phrase-based translation candidates, and evaluate this approach in comparison to the direct integration approach.

You are invited by the lecturers of Computer Science.

--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Ms. Jansen: +49 241 80-216 06
Tel. Ms. Andersen: +49 241 80-216 01
Fax: +49 241 80-22219
sek@i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de

**********************************************************************
*
* Invitation
*
* Informatik-Oberseminar
*
**********************************************************************

Time: Tuesday, 21 July 2020, 16:00
Zoom: https://us02web.zoom.us/j/83598281729?pwd=c2lhdmpIU0JJeFFDcG01M0FnS0cyZz09
Speaker: Diplom-Informatiker Jens Forster
Topic: Automatic Sign Language Recognition: From Video Corpora to Gloss Sentences

Abstract:

In this work, we investigate large vocabulary, automatic sign language recognition (ASLR) from single view video using hidden Markov models (HMMs) with Gaussian mixture models as state emission functions and statistical n-gram language models. We go beyond the state of the art by investigating continuous sign language instead of isolated signs, and we extract features and object locations from video via object tracking, forgoing invasive data acquisition methods. Overall, we present contributions in three areas.

First, we introduce the large vocabulary, single view, continuous sign language corpus RWTH-PHOENIX-Weather, which has been created in the context of this work. RWTH-PHOENIX-Weather is annotated in gloss notation and features several subsets usable for object tracking, single signer as well as multi signer recognition.

Second, we extend an existing model-free dynamic programming tracking framework with spatial pruning and multi-pass tracking techniques. These approaches are quantitatively evaluated on hand and face location annotations of more than 140k video frames created as part of this work.

Third, we investigate the impact of error propagation from object tracking and hidden Markov model (HMM) state alignment quality, among other factors, on ASLR. Methods to improve alignment quality, such as non-gesture modeling, are shown to be effective in improving recognition results for single signer recognition. Addressing the multimodal nature of sign languages, we investigate modality combination techniques applied during decoding, finding that synchronous and asynchronous combination without re-training improve recognition results in the context of single signer and multi signer recognition.

All proposed modelling and recognition techniques are evaluated on publicly available, continuous German Sign Language corpora or the novel RWTH-PHOENIX-Weather corpus. In both cases, we achieve results that are competitive with or clearly outperform results found in the literature at the time of writing.

You are invited by the lecturers of Computer Science.

--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Ms. Jansen: +49 241 80-216 06
Tel. Ms. Andersen: +49 241 80-216 01
Fax: +49 241 80-22219
sek@i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de

**********************************************************************
*
* Invitation
*
* Informatik-Oberseminar
*
**********************************************************************

Time: Wednesday, 22 July 2020, 11:00
Zoom: https://us02web.zoom.us/j/89473373091?pwd=Y2JraVF6aEtUQ3NTVy9leGJhUFFBZz09
Speaker: Diplom-Informatiker Jan Thorsten Peter
Topic: An Exploration of Alignment Concepts to Bridge the Gap between Phrase-based and Neural Machine Translation

Abstract:

Machine translation, the task of automatically translating text from one natural language into another, has seen massive changes in recent years. After phrase-based systems represented the state of the art for over a decade, advancements in the structure of neural networks and in computational power made it possible to build neural machine translation systems, which first improved and later outperformed phrase-based systems. These two approaches have their strengths in different areas. The well-known phrase-based systems allow fast translations on CPU that can be easily explained by examining the translation table. In contrast, neural machine translation produces more fluent translations and is more robust to small changes in the provided input. This thesis aims to improve both systems by combining their advantages.

The first part of this thesis focuses on investigating the integration of feed-forward neural models into phrase-based systems. Small changes in the input of a phrase-based system can turn an event that was seen in the training data into an unseen event. Neural network models are by design able to handle such cases due to the continuous space representation of the input, whereas phrase-based systems are forced to fall back to shorter phrases. This means a loss of knowledge about the local context, which results in a degradation of the translation quality. We combine the flexibility provided by feed-forward neural networks with phrase-based systems and gain a significant improvement over the phrase-based baseline systems. We use feed-forward networks since they are conceptually simple and computationally fast. Commonly, their structure only utilizes local source and target context. Due to this structure, they cannot capture long-distance dependencies. We improve the performance of feed-forward neural networks by efficiently incorporating long-distance dependencies into their structure using a bag-of-words input.

The second part of the thesis focuses on the pure neural machine translation approach using the encoder-decoder model with an attention mechanism. This mechanism corresponds indirectly to a soft alignment. At each translation step, this model relies only on its previous internal state and the current decoder position to compute the attention weights. There is no direct feedback from the previously used attention. Inspired by hidden Markov models, where the prediction of the currently aligned position depends on the previously aligned position, we improve the attention model by adding direct feedback from previously used attention to improve the overall model performance. Additionally, we utilize word alignments to guide the neural network during training. Our experiments show that the network performs better when the alignment is incorporated as an additional cost function.

Even though state-of-the-art neural models no longer require word alignments, there are still applications that benefit from good alignments, including the visualization of parallel sentences, the creation of dictionaries, the automatic segmentation of long parallel sentences, and the above-mentioned usage during neural network training. We present a way to apply neural models to create word alignments that improve over word alignments trained with IBM and hidden Markov models. We evaluate these techniques on various large-scale translation tasks of public evaluation campaigns.

Applying new methods with usually complex workflows to new translation tasks is a cumbersome and error-prone exercise. We present a workflow manager, which was developed as part of this thesis, to simplify this task and enable an easier knowledge transfer.

You are invited by the lecturers of Computer Science.

--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Stephanie Jansen: +49 241 80-216 06
Tel. Luisa Wingerath: +49 241 80-216 01
Fax: +49 241 80-22219
sek@i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de

**********************************************************************
*
* Invitation
*
* Informatik-Oberseminar
*
**********************************************************************

Time: Friday, 14 August 2020, 14:00
Zoom: https://us02web.zoom.us/j/83272559800?pwd=Nk5yU1c3anRZeE9yYU5GMU0yaHQ3Zz09
Speaker: Diplom-Informatiker Pavel Golik
Topic: Data-Driven Deep Modeling and Training for Automatic Speech Recognition

Abstract:

Many of today's state-of-the-art automatic speech recognition (ASR) systems are based on hybrid hidden Markov models (HMM) that rely on neural networks to provide acoustic and language model probabilities. The training of the acoustic model will be the main focus of this thesis.

In the first part of this thesis we will be concerned with the question of to what extent the extraction of acoustic features can be learned by the acoustic model. We will show that a neural network can learn not only to classify the HMM states from the raw time signal, but also to perform the time-frequency decomposition in its input layer. Inspired by this finding, we will replace the fully-connected input layer by a convolutional layer and demonstrate that such models show competitive performance on real data.

In the second part we will investigate the objective function that is optimized during the supervised acoustic training. In principle, both cross entropy and squared error can be used in frame-wise training. We will compare the objective functions and demonstrate that it is possible to train a hybrid acoustic model using the squared error criterion.

In the third part of this study we will investigate how i-vectors can be used for acoustic adaptation. We will show that i-vectors can help to obtain a consistent reduction of word error rate on multiple tasks and perform a careful analysis of different integration strategies.

In the fourth and final part of this thesis we will apply these and other methods to the task of speech recognition and keyword search on low-resource languages. The limited amount of available resources makes the acoustic training extremely challenging. We will present a series of experiments performed in the scope of the IARPA Babel project that make heavy use of multilingual bottleneck features.

You are invited by the lecturers of Computer Science.

--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Stephanie Jansen: +49 241 80-216 06
Tel. Luisa Wingerath: +49 241 80-216 01
Fax: +49 241 80-22219
sek@i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de

**********************************************************************
*
* Invitation
*
* Informatik-Oberseminar
*
**********************************************************************

Time: Wednesday, 23 September 2020, 13:00
Zoom: https://us02web.zoom.us/j/88115756486?pwd=TktPZkFRVCtxSVIydWtLS0Z5WEJNQT09
Speaker: Diplom-Informatiker Vlad Andreas Guta
Topic: Search and Training with Joint Translation and Reordering Models for Statistical Machine Translation

Abstract:

Statistical machine translation describes the task of automatically translating a written text from one natural language into another. This is done by means of statistical models, which implies defining suitable models, searching for the most likely translation of the given text using them, and training their parameters on given bilingual sentence pairs. Phrase-based machine translation emerged two decades ago and became the state of the art throughout the following years. Nevertheless, the breakthrough of neural machine translation in 2014 triggered an abrupt shift towards neural models.

A fundamental drawback of the traditional approach is the phrases themselves. They are extracted from word-aligned bilingual data via hand-crafted heuristics. The phrase translation models are estimated using the extraction counts resulting from the applied phrase extraction heuristics. Moreover, the translation models exclude any phrase-external information, which in turn limits the context used to generate a target word during search. To complement the restricted models, a variety of additional models and heuristics are used. However, the potentially largest downside is that the word alignments required for the phrase extraction are trained with IBM and hidden Markov models. This results in a discrepancy between the models applied in training and those that are actually used in search.

Although the neural approach clearly outperforms the phrasal one, it remains to be answered whether it is the complexity of neural models, which capture dependencies between whole source sentences and their translations, or the coherent application of the same models in both training and decoding that leads to the superior performance of neural machine translation. We aim at answering this research question by developing a coherent modelling pipeline that improves over the phrasal approach by relying on fewer but stronger models, discarding dependencies on phrasal heuristics, and applying the same word-level models in training and search.

First, we investigate two different types of word-based translation models: extended translation models and joint translation and reordering models. Both are enhanced with extended context information and estimate lexical and reordering probabilities. They are integrated directly into the phrase-based search and evaluated against state-of-the-art phrasal baselines to investigate their benefit on top of phrasal models.

In the second part, we develop a novel beam-search decoder that generates the translation word-wise, thus discarding any dependencies on heuristic phrases, and incorporates a joint translation and reordering model. It includes far fewer features than its phrasal counterparts, and its performance is analyzed in comparison to the above-mentioned phrasal baseline systems.

The final goal is to achieve a sound and coherent end-to-end machine translation framework. For this purpose, we apply the same models and search algorithm that are employed in word-based translation also in training. To this end, we develop an algorithm that alternately optimizes word alignments and model parameters, which is performed iteratively with increasing model complexity.

You are invited by the lecturers of Computer Science.

--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Stephanie Jansen: +49 241 80-216 06
Tel. Luisa Wingerath: +49 241 80-216 01
Fax: +49 241 80-22219
sek@i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de

**********************************************************************
*
* Invitation
*
* Informatik-Oberseminar
*
**********************************************************************

Time: Thursday, 8 October 2020, 14:00
Zoom: https://us02web.zoom.us/j/83164563245?pwd=THlJNTdtd3ZBK3Z5TDh4RWpWbXhlQT09
Speaker: Diplom-Informatiker Patrick Doetsch
Topic: Alignment models for recurrent neural networks

Abstract:

Over the last decade, a new standard for modeling automatic speech recognition (ASR) and handwriting recognition (HWR) systems has been established by combining hidden Markov models (HMM) with recurrent neural network (RNN) observation models. While earlier approaches with feed-forward neural networks require a fine-grained time-synchronous alignment between the input data and the output transcription, RNNs are capable of modeling the sequential nature of the speech signal or text line image directly. The aim of this thesis is to investigate how these sequential modeling properties affect the training of ASR and HWR observation models on large-scale corpora.

In the first part of the thesis we investigate the training procedure of several RNN topologies. Here we focus on variants of the long short-term memory (LSTM) and measure their performance on different corpora. For this purpose we introduce a software package for large-scale RNN training, which was developed as part of this thesis. Different methods to improve training performance are discussed, and we demonstrate their effectiveness on several large tasks.

In the second part of this thesis we study the effects of the temporal modeling capabilities of RNNs on the time-synchronous alignment approach, which has been used in combination with HMMs over the last decade. Our focus here is on variants of the connectionist temporal classification (CTC) HMM topology.

Based on the insights gained from this study, we investigate label-synchronous alignment approaches for HWR and ASR. These alignment methods do not rely on time alignments, but generate the output transcription label-by-label while taking specific parts of the input signal into account. First, we describe an encoder-decoder system with an attention mechanism for HWR. We then combine this idea with the classical approach by deriving so-called inverted alignments, which allow us to formalize label-synchronous alignments in the context of HMMs. We evaluate our novel approach in different experimental settings and present results on a large ASR corpus.

You are invited by the lecturers of Computer Science.

--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Stephanie Jansen: +49 241 80-216 06
Tel. Luisa Wingerath: +49 241 80-216 01
Fax: +49 241 80-22219
sek@i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de

**********************************************************************
*
* Invitation
*
* Informatik-Oberseminar
*
**********************************************************************

Time: Friday, 13 November 2020, 10:00
Zoom: https://us02web.zoom.us/j/81795496004?pwd=Y1hJWDlWTWdMVzRPT0d6eTFYdi9qUT09
Speaker: Diplom-Informatiker Markus Nußbaum-Thom
Topic: Investigations on Neural Networks, Discriminative Training Criteria and Error Bounds

Abstract:

The task of an automatic speech recognition system is to convert speech signals into written text by choosing the recognition result according to a statistical decision rule. The discriminative training of the underlying statistical model is essential for improving the word error rate performance of the system. In automatic speech recognition, a mismatch exists between the loss used in the word error rate performance measure, the loss of the decision rule, and the loss of the discriminative training criterion. In the course of this thesis, the analysis of this mismatch leads to the development of novel error bounds and training criteria. The novel training criteria are evaluated in practical speech recognition experiments. In summary, we conclude that the statistical model is able to compensate for this mismatch if the discriminative training criterion involves the loss of the performance measure.

You are invited by the lecturers of Computer Science.

--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel: +49 241 80-216-06
Fax: +49 241 80-22219
sek@i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de

********************************************************************** * * * Invitation * * * * Informatik-Oberseminar * * * ***********************************************************************
Time: Monday, 7 December 2020, 16:00
Zoom: https://us02web.zoom.us/j/87415814965?pwd=b0pSQmxZYnd2Q214MkFjUVE0Tkg3UT09
Speaker: Zoltán Tüske, M.Sc.
Topic: Discriminative Feature Modeling for Statistical Speech Recognition
Abstract:
Conventional speech recognition systems consist of feature extraction, acoustic and language modeling blocks, and a search block. In a recent trend, deep neural networks replaced or extended traditional modeling approaches in these blocks. Using such layered structures, data-driven feature extraction and representation learning happens at multiple levels, besides the traditional cepstral feature extraction. This work revisits and extends these manually and automatically derived features in multiple ways. In the first part, we relax the short-time stationarity assumption of traditional feature extraction, and a novel non-stationary framework is introduced to perform more precise analysis of voiced speech. The noise robustness of the derived features is evaluated in standard, noisy speech recognition tasks. Second, with the advent of deep learning and big data, the necessity of manually designed feature extraction pipelines is challenged, and we investigate whether direct acoustic modeling of the waveform is a viable option. The representation learned by a deep neural network is analyzed, and we study whether it is useful to a priori choose neural network structures based on decades of speech signal processing knowledge. Third, a theoretical connection is presented between the two most widely used and equally powerful types of neural network based acoustic models, the hybrid and tandem approaches. Supported by experimental results, we show that a Gaussian mixture model trained on optimally chosen neural network based features, the tandem approach, cannot perform worse than a similar hybrid model. High quality transcribed speech data is a significant cost factor in developing acoustic models for a new language. In the fourth part, an efficient multilingual neural network framework is presented to reuse resources collected in other languages in order to improve system performance. Further, a rapid, multilingual feature based framework is proposed, which allows us to reach reasonable performance under extremely short time constraints and very limited data conditions. Last, we also investigate multi-domain neural network language model structures. The proposed framework allows efficient limited-data domain adaptation, and a shared embedding space of language model history across domains results in a compact final model. Besides comparing the performance of neural network and traditional count based models, we also examine the effective context length of the best performing neural networks.
Hosted by: the lecturers in Computer Science
-- Stephanie Jansen Faculty of Mathematics, Computer Science and Natural Sciences HLTPR - Human Language Technology and Pattern Recognition RWTH Aachen University Ahornstraße 55 D-52074 Aachen Tel: +49 241 80-216-06 Fax: +49 241 80-22219 sek@i6.informatik.rwth-aachen.de www.hltpr.rwth-aachen.de
********************************************************************** * * * Invitation * * * * Informatik-Oberseminar * * * ***********************************************************************
Time: Friday, 4 December 2020, 15:00
Zoom: https://us02web.zoom.us/j/87007692723?pwd=d1M3WFJvV2J1SjRtL3N0WDZjU21pdz09
Speaker: Harald Hanselmann, M.Sc.
Topic: Alignment and localization in fine-grained image recognition
Abstract:
Image recognition tasks can be classified into different categories with respect to the extent of the inter-class variations. General image recognition tasks typically classify images into a wide variety of broad categories and therefore display large inter-class variation. Fine-grained image classification tasks, however, are defined by low inter-class variation. Examples of such tasks include the classification of animal species or car models, and face recognition. For fine-grained tasks, it is not only important to detect which features are in an image, but also where they are located and what their spatial relations are. In this thesis we look at different methods to align and localize features and discriminative regions for fine-grained image classification. On the one hand, we will look at computing dense pixel-wise alignments using 2D-Warping. In this context, we will introduce methods for speeding up the computation of the dense alignments, as the runtime is the main drawback of 2D-Warping based approaches. Additionally, we will introduce a new 2D-Warping algorithm that obtains better results in terms of optimization score and classification accuracy compared to previous 2D-Warping algorithms. On the other hand, we will explore a new method to obtain local features needed to compute the dense alignments. These features are learned from data using convolutional neural networks (CNNs). Further, we introduce a warped region-of-interest pooling layer based on 2D-Warping that can be inserted into a CNN. We observe that for good classification accuracy, modeling translation and scaling are most important. For this reason we introduce a stand-alone localization module that handles translation and scaling variances, is very lightweight and efficient, and needs only class labels to be trained. We then add an embedding layer and global K-max pooling to obtain a complete and efficient system for fine-grained image classification. Finally, to simplify the training procedure and leverage the benefits of full end-to-end systems, we transform the localization module such that it can be integrated into the classification model and trained jointly. We evaluate our methods on popular and challenging tasks for fine-grained image classification and are able to report very competitive results.
Hosted by: the lecturers in Computer Science
Stephanie Jansen Faculty of Mathematics, Computer Science and Natural Sciences HLTPR - Human Language Technology and Pattern Recognition RWTH Aachen University Ahornstraße 55 D-52074 Aachen Tel: +49 241 80-216-06 Fax: +49 241 80-22219 sek@i6.informatik.rwth-aachen.de www.hltpr.rwth-aachen.de
********************************************************************** * * * Invitation * * * * Informatik-Oberseminar * * * ***********************************************************************
Time: Tuesday, 21 December 2021, 16:30
Zoom: https://rwth.zoom.us/j/99299040714?pwd=amY1QldwUTdPUUNwN0FrNW9wRkNMQT09
Meeting-ID: 992 9904 0714
Passcode: 146526
Speaker: Yunsu Kim, M.Sc.
Topic: Neural Machine Translation for Low-Resource Scenarios
Abstract:
Machine translation has been tackled for decades mainly by statistical learning on bilingual text data. In the most recent paradigm with neural networks, building a machine translation system requires more data than ever to make the best use of the modeling capacity and yield a reasonable performance. Unfortunately, however, there is not a sufficient amount of bilingual corpora for many language pairs and domains. To expand the coverage of neural machine translation, this talk investigates state-of-the-art methods to enhance the modeling, training, or data for such low-resource scenarios. These are categorized into two emerging paradigms in machine learning: First, in semi-supervised learning, we review the language model integration, monolingual pre-training, and back-translation. Second, in transfer learning, we study the end-to-end cascading of translation models and a series of sequential cross-lingual transfer techniques. These methods are empirically verified, compared, and combined in extensive experiments, providing the best practice for both English-centric and non-English language pairs when the data is scarce.
Hosted by: the lecturers in Computer Science
-- Stephanie Jansen Faculty of Mathematics, Computer Science and Natural Sciences HLTPR - Human Language Technology and Pattern Recognition RWTH Aachen University Ahornstraße 55 D-52074 Aachen Tel: +49 241 80-216-06 Fax: +49 241 80-22219 sek@hltpr.rwth-aachen.de www.hltpr.rwth-aachen.de
Dear all,
I am sorry to inform you that the defense talk on Dec. 21 is cancelled. A new date will be announced next year.
Kind regards
Stephanie Jansen
********************************************************************** * * * Invitation * * * * Informatik-Oberseminar * * * ***********************************************************************
Time: Monday, 7 February 2022, 14:00
Zoom: https://rwth.zoom.us/j/99299040714?pwd=amY1QldwUTdPUUNwN0FrNW9wRkNMQT09
Meeting-ID: 992 9904 0714
Passcode: 146526
Speaker: Yunsu Kim, M.Sc.
Topic: Neural Machine Translation for Low-Resource Scenarios
Abstract:
Machine translation has been tackled for decades mainly by statistical learning on bilingual text data. In the most recent paradigm with neural networks, building a machine translation system requires more data than ever to make the best use of the modeling capacity and yield a reasonable performance. Unfortunately, however, there is not a sufficient amount of bilingual corpora for many language pairs and domains. To expand the coverage of neural machine translation, this talk investigates state-of-the-art methods to enhance the modeling, training, or data for such low-resource scenarios. These are categorized into two emerging paradigms in machine learning: First, in semi-supervised learning, we review the language model integration, monolingual pre-training, and back-translation. Second, in transfer learning, we study the end-to-end cascading of translation models and a series of sequential cross-lingual transfer techniques. These methods are empirically verified, compared, and combined in extensive experiments, providing the best practice for both English-centric and non-English language pairs when the data is scarce.
Hosted by: the lecturers in Computer Science
-- Stephanie Jansen Faculty of Mathematics, Computer Science and Natural Sciences HLTPR - Human Language Technology and Pattern Recognition RWTH Aachen University Ahornstraße 55 D-52074 Aachen Tel: +49 241 80-216-06 Fax: +49 241 80-22219 sek@hltpr.rwth-aachen.de www.hltpr.rwth-aachen.de
********************************************************************** * * * Invitation * * * * Informatik-Oberseminar * * * ***********************************************************************
Time: Tuesday, 7 June 2022, 14:00
Zoom: https://rwth.zoom.us/j/98675211647?pwd=NlZjYlFxRExtZ2FuU1NTMDl4by93dz09
Meeting-ID: 986 7521 1647
Passcode: 021523
Speaker: Diplom-Mathematiker, Diplom-Informatiker Albert Zeyer
Topic: Neural Network based Modeling and Architectures for Automatic Speech Recognition and Machine Translation
Abstract:
Our work aims to advance the field and application of neural networks, to advance sequence-to-sequence architectures by extending and developing new approaches, and to improve training methods. We perform a comprehensive study of long short-term memory (LSTM) acoustic models and improve over our feed-forward neural network (FFNN) baseline by 16% relative. Layer-normalized (LN) LSTM variants further enhance this by up to 10% relative with improved training stability and better convergence. Our comparison of Transformer and LSTM models yields state-of-the-art Transformer language models with 6% relative improvements over the best LSTM. We aim to advance the status quo, which is the hybrid neural network (NN)-hidden Markov model (HMM), by investigating alternative sequence-to-sequence architectures. We develop state-of-the-art attention-based models for machine translation and speech recognition. With the motivation to introduce monotonicity and potential streaming, we propose latent local attention segmental models with hard attention as a special case. We discover the equivalence of segmental and transducer models, and propose a novel class of generalized and extended transducer models, which perform and generalize better than our attention models. Our work shows that training strategies such as learning rate scheduling, data augmentation, and regularization play the most important role in good performance. Our novel pretraining schemes, where we grow the depth and width of the neural network, improve convergence and performance. A generalized training procedure for hybrid NN-HMMs is studied, which includes the full sum over all alignments, where we identify connectionist temporal classification (CTC) as a special case. Our novel mathematical analysis explains the peaky behavior of CTC and its convergence properties. We develop large parts of RETURNN as an efficient and flexible software framework including beam search to perform all the experiments. This framework and most of our results and baselines are widely used among the team and beyond. All of our work is published and all code and setups are available online.
Hosted by: the lecturers in Computer Science
-- Stephanie Jansen Faculty of Mathematics, Computer Science and Natural Sciences HLTPR - Human Language Technology and Pattern Recognition RWTH Aachen University Theaterstraße 35-39 D-52062 Aachen Tel: +49 241 80-21601 sek@hltpr.rwth-aachen.de www.hltpr.rwth-aachen.de
********************************************************************** * * * Invitation * * * * Informatik-Oberseminar * * * ***********************************************************************
Time: Tuesday, 29 September 2020, 11:00
Zoom: https://us02web.zoom.us/j/86007461999?pwd=VTgvOUdqcW0yeVQvdU5pYmlUNHROdz09
Speaker: Dipl.-Ing. Oscar Koller
Topic: Towards Large Vocabulary Continuous Sign Language Recognition: From Artificial to Real-Life Tasks
Abstract:
This thesis deals with large vocabulary continuous sign language recognition. Historically, research on sign language recognition has been dispersed and often researchers independently captured their own small-scale data sets for experimentation. Most available data sets do not cover the complexity that sign languages encompass. Moreover, most previous work does not tackle continuous sign language but only isolated single signs. Besides containing only a very limited vocabulary, no work has ever targeted real-life sign language. The employed data sets typically comprised artificial and staged sign language footage, which was planned and recorded with the aim of enabling automatic recognition. The kind of signs to be encountered, the structure of sentences, the signing speed, the choice of expression and dialects were usually controlled and determined beforehand. This work aims at moving sign language recognition to more realistic scenarios. For this purpose we created the first real-life large vocabulary continuous sign language corpora, which are based on recordings of the broadcast channel featuring natural sign language of professional interpreters. This kind of data provides unprecedented complexity for recognition. A statistical sign language recognition system based on Gaussian mixture and hidden Markov models (HMMs) with hand-crafted features is created and evaluated on the challenging task. We then leverage advances in deep learning and propose modern hybrid convolutional neural network (CNN) and long short-term memory (LSTM) HMMs which are shown to halve the recognition error. Finally, we develop a weakly supervised learning scheme based on hybrid multi-stream CNN-LSTM-HMMs that allows the accurate discovery of sign subunits such as articulated handshapes and mouth patterns in sign language footage.
Hosted by: the lecturers in Computer Science
-- Stephanie Jansen Faculty of Mathematics, Computer Science and Natural Sciences HLTPR - Human Language Technology and Pattern Recognition RWTH Aachen University Ahornstraße 55 D-52074 Aachen Tel. Stephanie Jansen: +49 241 80-216 06 Tel. Luisa Wingerath: +49 241 80-216 01 Fax: +49 241 80-22219 sek@i6.informatik.rwth-aachen.de www.hltpr.rwth-aachen.de