Dear subscribers of the colloquium newsletter,
We are pleased to announce the next date of our Communication
Technology Colloquium.
*Monday, April 25, 2022*
*Speaker*: Alexander Sobolew
*Time*: 03:00 p.m.
*Location*:
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Master-Lecture*: Investigation of Specialized Recurrent Units for
Acoustic Echo Cancellation
Hands-free devices are widely used in today's communication, e.g., for
remote communication. Without countermeasures, they suffer from an
acoustic echo that arises from the coupling between loudspeaker and
microphone. To minimize these disturbances, acoustic echo cancellation
is indispensable. Model-based adaptive algorithms exist to solve this
issue. However, they require careful tuning of parameters whose optimum
differs between devices and acoustic situations.
In this thesis, a new data-driven approach for acoustic echo
cancellation is developed and investigated. In contrast to the purely
model-based approach, the algorithm is intended to learn optimal
behavior from data without requiring manual tuning. In new situations,
it should estimate the unknown parameters itself. At its core, the
novel structure is similar to a frequency-domain adaptive filter. However, it
is extended by the gating mechanism known from recurrent neural
networks. The development also includes the determination of optimal
training paradigms. When choosing the model structure, attention is paid
to a reasonable training complexity. Major challenges in this thesis
include the investigation of the gating mechanism, which is represented
by a learn gate and a reset gate. The former is used to estimate a
time-varying step size of the iterative algorithm. Gated Recurrent Units
provide an internal memory to accommodate the sequential information in
speech, while skip connections optimize the gradient flow during
training. Using an independent reset gate to reset the impulse response
estimate when the acoustic situation changes is outperformed by a variant
that exploits weight sharing. With weight sharing, the learn and reset
gates have
direct information about each other's behavior due to a shared partial
network. It is shown that, when using backpropagation through time, the
truncation order can be reduced to a certain extent, which lowers the
training complexity without degrading performance. The developed
model outperforms the tuned Fast Block Normalized Least-Mean-Square
algorithm in reconvergence speed and steady-state performance in far-end
single talk and double talk. Furthermore, our model repeatedly
outperforms the tuned diagonalized Kalman filter in certain scenarios
and offers significantly improved overall performance in single talk.
and
*Monday, April 25, 2022*
*Speaker*: Alexej Sobolew
*Time*: 04:00 p.m.
*Location*:
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Master-Lecture*: Investigation of Generative Neural Networks for Speech
Enhancement
Speech enhancement aims to reduce noise in speech signals and is widely
used in hearing aids and mobile speech communication. Speech synthesis,
on the other hand, aims to generate high-quality human speech and is
used, e.g., in text-to-speech generation. Noise reduction and speech
synthesis can be combined since conventional noise reduction methods
often only improve the magnitude spectrum and keep the noisy phase.
However, the phase has an important influence on speech quality and
intelligibility. In addition, training neural networks with complex
spectrograms is more difficult, so it is reasonable to first denoise the
magnitude spectrum and subsequently synthesize the waveform of the
speech based on it. The applications mentioned often require low execution
times and low computational overhead. This is achievable by exploiting
parallel processors using the non-autoregressive property and by
reducing the number of parameters in the neural network. Hence, this
thesis aims to investigate noise reduction, speech synthesis, and their
joint interaction.
In this thesis, the first use case considered is phase reconstruction
and speech synthesis based on clean data. A non-autoregressive
three-stage speech enhancement system is developed for the second use
case of combined noise reduction and speech synthesis based on magnitude
spectra. For speech synthesis on clean magnitude spectra such as
mel-spectrograms, the neural network called WaveGlow from Nvidia is
taken as a basis. WaveGlow achieves subjective performance similar to
that of the Griffin-Lim algorithm but is better suited for fast
applications. To reduce the number of parameters in WaveGlow, SqueezeWave
is used, which decreases the number of parameters and the inference
time by up to 70%. When noise reduction is added to speech synthesis,
it is shown that WaveGlow alone is not
suitable for performing both tasks simultaneously. Consequently, the
problem is divided into three stages: masking, inpainting, and
synthesis. The models for masking and inpainting are adapted to
mel-spectrograms and studied in detail for noise reduction. As a result,
they are able to reduce the noise significantly. It is worth noting that
in this thesis the mel filter is used as a downsampling step that is
adapted to human perception owing to its non-linear frequency scale,
which reduces the number of computations in the first two models.
Subsequently, the performance of the entire three-stage speech
enhancement system is investigated. It improves the speech quality and
intelligibility of noisy data while exploiting parallel processors and
can compete with existing state-of-the-art methods. The system achieves
better noise reduction than the Convolutional Recurrent Network (CRN)
and additionally does not rely on the noisy phase.
All interested parties are cordially invited; registration is not required.
General information on the colloquium, as well as a current list of
dates of the Communication Technology Colloquium can be found at:
http://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Irina Esser
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
ronkartz(a)iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/