******************************************************
*
*
* Invitation
*
*
*
* Computer Science Advanced Seminar (Informatik-Oberseminar)
*
*
*
*******************************************************
Time: Tuesday, May 5, 2020, 2:00 p.m.
Zoom:
https://us02web.zoom.us/j/84813327259?pwd=Y2NydlRMRzE1dkpkcmpERkFwMWZYZz09
Speaker: Kazuki Irie, M.Sc.
Topic: Advancing Neural Language Modeling in Automatic Speech
Recognition
Abstract:
Statistical language modeling is one of the fundamental problems
in natural language processing. In recent years, language
modeling has seen great advances through active research and
engineering efforts in applying artificial neural networks,
especially recurrent ones. The application of neural language
models to speech recognition is now well established and
ubiquitous. Despite this impression of some degree of maturity,
we claim that the full potential of neural network based
language modeling is yet to be explored. In this thesis, we
further advance neural language modeling in automatic speech
recognition by investigating a number of new perspectives. From
the architectural viewpoint, we investigate the newly proposed
Transformer neural networks for language modeling.
The original model architecture proposed for machine translation
is studied and modified to accommodate the specific task of
language modeling. Particularly deep models with about one
hundred layers are developed. We present an in-depth comparison
with the state-of-the-art recurrent neural network language
models based on the long short-term memory.
When language modeling is scaled up to larger datasets, the
diversity of the data emerges as both an opportunity and a
challenge. Current state-of-the-art neural language modeling
lacks a mechanism for handling diverse data from different
domains so that a single model performs well across them. In this
context, we introduce domain robust language modeling with
neural networks, and propose two solutions. As a first solution,
we propose a new type of adaptive mixture of experts model which
is fully based on neural networks. In the second approach, we
investigate knowledge distillation from multiple domain expert
models, as a solution to the large model size problem seen in
the first approach. Methods for the practical application of
knowledge distillation to large vocabulary language modeling are
proposed and studied extensively.
Finally, we investigate the potential of neural language models
to leverage long-span cross-sentence contexts for cross-utterance
speech recognition. The appropriate training method for such a
scenario is under-explored in existing work. We carry out
systematic comparisons of the training methods, allowing us to
achieve improvements in cross-utterance speech recognition. In
the same context, we study sequence length robustness for both
recurrent neural networks based on the long short-term memory
and Transformers, because such robustness is one of the
fundamental properties we wish to have in neural networks that
can handle variable length contexts. Throughout the thesis, we
tackle these problems through novel perspectives of neural
language modeling, while keeping the traditional spirit of
language modeling in speech recognition.
Hosts: the lecturers of the Department of Computer Science
--
Stephanie Jansen
Faculty of Mathematics, Computer Science and Natural Sciences
HLTPR - Human Language Technology and Pattern Recognition
RWTH Aachen University
Ahornstraße 55
D-52074 Aachen
Tel. Ms. Jansen: +49 241 80-216 06
Tel. Ms. Andersen: +49 241 80-216 01
Fax: +49 241 80-22219
sek@i6.informatik.rwth-aachen.de
www.hltpr.rwth-aachen.de