**********************************************************************
*                                                                    *
*                             Invitation                             *
*                                                                    *
*         Informatik-Oberseminar (Computer Science Seminar)          *
*                                                                    *
**********************************************************************

Time: Friday, 14 August 2020, 14:00
Zoom: https://us02web.zoom.us/j/83272559800?pwd=Nk5yU1c3anRZeE9yYU5GMU0yaHQ3Zz09

Speaker: Diplom-Informatiker Pavel Golik

Topic: Data-Driven Deep Modeling and Training for Automatic Speech Recognition

Abstract:

Many of today's state-of-the-art automatic speech recognition (ASR) systems are based on hybrid hidden Markov models (HMM), which rely on neural networks to provide acoustic and language model probabilities. The main focus of this thesis is the training of the acoustic model.

The first part of this thesis addresses the question of to what extent the extraction of acoustic features can be learned by the acoustic model. We will show that a neural network can not only learn to classify HMM states from the raw time signal, but can also learn to perform the time-frequency decomposition in its input layer. Inspired by this finding, we will replace the fully-connected input layer with a convolutional layer and demonstrate that such models achieve competitive performance on real data.

In the second part we will investigate the objective function that is optimized during supervised acoustic training. In principle, both cross entropy and squared error can be used in frame-wise training. We will compare the two objective functions and demonstrate that it is possible to train a hybrid acoustic model using the squared error criterion.

In the third part of this study we will investigate how i-vectors can be used for acoustic adaptation. We will show that i-vectors yield a consistent reduction in word error rate on multiple tasks, and we will present a careful analysis of different integration strategies.
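As a rough illustration of what input-level i-vector integration can look like, the following Python sketch assumes the simplest strategy: one fixed i-vector per speaker, appended to every acoustic feature frame before the frames enter the network. The function name `append_ivector` and all dimensions are hypothetical; the thesis analyses several integration strategies, and this sketch is not claimed to be any particular one of them.

```python
from typing import List


def append_ivector(frames: List[List[float]], ivector: List[float]) -> List[List[float]]:
    """Concatenate the same speaker i-vector to every acoustic feature frame.

    Hypothetical illustration of input-level integration only; the choice of
    plain concatenation and the dimensions below are assumptions.
    """
    return [list(frame) + list(ivector) for frame in frames]


# Hypothetical example: 3 frames of 4-dimensional features, a 2-dimensional i-vector.
frames = [[0.1, 0.2, 0.3, 0.4],
          [0.5, 0.6, 0.7, 0.8],
          [0.9, 1.0, 1.1, 1.2]]
ivector = [0.01, -0.02]

augmented = append_ivector(frames, ivector)
print(len(augmented), len(augmented[0]))  # 3 6
```

Because the appended i-vector is constant across all frames of a speaker, its contribution to the first layer's pre-activation is a fixed, speaker-dependent bias, which is one intuitive way to read this strategy as adaptation.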
In the fourth and final part of this thesis we will apply these and other methods to speech recognition and keyword search for low-resource languages. The limited amount of available resources makes acoustic training extremely challenging. We will present a series of experiments, performed within the scope of the IARPA Babel project, that make heavy use of multilingual bottleneck features.

Invited by: the lecturers of the Department of Computer Science

-- 
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Stephanie Jansen: +49 241 80-216 06
Tel. Luisa Wingerath: +49 241 80-216 01
Fax: +49 241 80-22219
sek@i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de