**********************************************************************
*                                                                    *
*                             Invitation                             *
*                                                                    *
*                       Informatik-Oberseminar                       *
*                                                                    *
**********************************************************************

Time:    Tuesday, June 7, 2022, 14:00
Zoom:    https://rwth.zoom.us/j/98675211647?pwd=NlZjYlFxRExtZ2FuU1NTMDl4by93dz09
         Meeting ID: 986 7521 1647
         Passcode: 021523

Speaker: Diplom-Mathematiker, Diplom-Informatiker Albert Zeyer

Topic:   Neural Network based Modeling and Architectures for
         Automatic Speech Recognition and Machine Translation

Abstract:

Our work aims to advance the field and application of neural networks:
we extend sequence-to-sequence architectures, develop new approaches,
and improve training methods. We perform a comprehensive study of long
short-term memory (LSTM) acoustic models and improve over our
feed-forward neural network (FFNN) baseline by 16% relative.
Layer-normalized (LN) LSTM variants improve this further by up to 10%
relative, with better training stability and convergence. Our
comparison of Transformer and LSTM models yields state-of-the-art
Transformer language models with a 6% relative improvement over the
best LSTM.

We aim to move beyond the status quo, the hybrid neural network (NN)
hidden Markov model (HMM), by investigating alternative
sequence-to-sequence architectures. We develop state-of-the-art
attention-based models for machine translation and speech recognition.
Motivated by monotonicity and potential streaming, we propose segmental
models with latent local attention, which include hard attention as a
special case. We discover the equivalence of segmental and transducer
models, and propose a novel class of generalized and extended
transducer models, which perform and generalize better than our
attention models.

Our work shows that training strategies such as learning rate
scheduling, data augmentation, and regularization play the most
important role in achieving good performance. Our novel pretraining
schemes, in which we grow the depth and width of the neural network,
improve convergence and performance. We study a generalized training
procedure for hybrid NN-HMMs that includes the full sum over all
alignments, and we identify connectionist temporal classification (CTC)
as a special case. Our novel mathematical analysis explains the peaky
behavior of CTC and its convergence properties.

To perform all the experiments, we develop large parts of RETURNN, an
efficient and flexible software framework that includes beam search.
This framework and most of our results and baselines are widely used
within the team and beyond. All of our work is published, and all code
and setups are available online.

The lecturers of Computer Science cordially invite you to attend.

--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Theaterstraße 35-39
D-52062 Aachen

Tel: +49 241 80-21601
sek@hltpr.rwth-aachen.de
www.hltpr.rwth-aachen.de
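
For readers less familiar with the layer-normalized LSTM variants
mentioned in the abstract above, the following is a minimal sketch of
one common way to place layer normalization inside an LSTM cell (on the
joint gate pre-activations and on the cell state). It is written in
PyTorch purely for illustration; it is not claimed to be the exact
variant studied in the thesis, and all names are hypothetical.

    import torch
    import torch.nn as nn

    class LayerNormLSTMCell(nn.Module):
        # LSTM cell with layer normalization on the joint gate
        # pre-activations and on the cell state (one common LN-LSTM
        # variant; illustrative sketch, not the thesis's exact model).
        def __init__(self, input_dim: int, hidden_dim: int):
            super().__init__()
            self.linear = nn.Linear(input_dim + hidden_dim,
                                    4 * hidden_dim,
                                    bias=False)  # LN provides bias/scale
            self.ln_gates = nn.LayerNorm(4 * hidden_dim)
            self.ln_cell = nn.LayerNorm(hidden_dim)

        def forward(self, x, state):
            h, c = state
            # Normalize the stacked pre-activations of all four gates.
            gates = self.ln_gates(self.linear(torch.cat([x, h], dim=-1)))
            i, f, g, o = gates.chunk(4, dim=-1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(self.ln_cell(c))
            return h, (h, c)

    # Example usage with random data:
    cell = LayerNormLSTMCell(input_dim=40, hidden_dim=128)
    x = torch.randn(8, 40)            # batch of 8 feature frames
    h0 = c0 = torch.zeros(8, 128)
    y, (h1, c1) = cell(x, (h0, c0))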
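
The full-sum training criterion for hybrid NN-HMMs mentioned in the
abstract can be written, in standard notation (not necessarily the
notation used in the thesis), as

    L(\theta) = -\log \sum_{a \in \mathcal{A}(x,y)} \prod_{t=1}^{T} p_\theta(a_t \mid x)

where \mathcal{A}(x,y) denotes the set of length-T label alignments
consistent with the output sequence y. CTC arises as the special case
in which the alignments are blank-augmented label sequences that
collapse to y and no transition or prior model is used.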
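
As an illustration of the kind of beam search included in RETURNN, here
is a minimal, generic label-synchronous beam search in Python. The
`step` interface is a hypothetical stand-in for a model's per-step
scoring function; it is not RETURNN's actual API.

    from typing import Callable, List, Tuple

    def beam_search(step: Callable[[List[int]], List[Tuple[int, float]]],
                    bos: int, eos: int,
                    beam_size: int, max_len: int) -> List[int]:
        # Generic beam search sketch. `step(prefix)` returns
        # (label, log_prob) pairs for extending `prefix` -- a
        # hypothetical interface used only for illustration.
        beam = [([bos], 0.0)]  # hypotheses: (prefix, accumulated log-prob)
        finished = []
        for _ in range(max_len):
            candidates = []
            for prefix, score in beam:
                for label, log_p in step(prefix):
                    candidates.append((prefix + [label], score + log_p))
            # Keep only the highest-scoring hypotheses.
            candidates.sort(key=lambda h: h[1], reverse=True)
            beam = []
            for prefix, score in candidates[:beam_size]:
                # Hypotheses that produced end-of-sequence are complete.
                (finished if prefix[-1] == eos else beam).append((prefix, score))
            if not beam:
                break
        finished.extend(beam)  # fall back to unfinished hypotheses
        return max(finished, key=lambda h: h[1])[0]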