Dear subscribers of the colloquium newsletter,

We are happy to inform you about the next date of our Communication Technology Colloquium.

Monday, April 25, 2022
Speaker: Alexander Sobolew
Time: 03:00 p.m.
Location:
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09

                    Meeting-ID: 979 0415 7921
                    Password: 481650

Master-Lecture: Investigation of Specialized Recurrent Units for Acoustic Echo Cancellation

Hands-free devices are widely used in today's communication, e.g., for remote conferencing. Without further measures, they suffer from an acoustic echo that arises from the coupling between loudspeaker and microphone. To minimize this disturbance, acoustic echo cancellation is indispensable. Model-based adaptive algorithms exist to solve this problem, but they require careful tuning of parameters whose optimal values differ between devices and acoustic situations.
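
The following is a minimal sketch of such a conventional model-based echo canceller, a time-domain NLMS adaptive filter; the step size mu and regularization eps are exactly the kind of hand-tuned parameters referred to above. All names and values are illustrative and do not correspond to the algorithm evaluated in the thesis.

import numpy as np

def nlms_echo_canceller(far_end, mic, filter_len=512, mu=0.5, eps=1e-6):
    """Estimate the echo path and return the echo-reduced microphone signal."""
    w = np.zeros(filter_len)       # current echo-path estimate
    x_buf = np.zeros(filter_len)   # most recent far-end samples
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        echo_hat = w @ x_buf       # predicted echo at the microphone
        e = mic[n] - echo_hat      # error = near-end speech + residual echo
        out[n] = e
        # NLMS update: mu must be tuned per device and acoustic situation
        w += mu * e * x_buf / (x_buf @ x_buf + eps)
    return out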

In this thesis, a new data-driven approach to acoustic echo cancellation is developed and investigated. In contrast to purely model-based approaches, the algorithm is intended to learn optimal behavior from data without the need for manual tuning, estimating the unknown parameters itself in new situations. At its core, the novel structure is similar to a frequency-domain adaptive filter, but it is extended by the gating mechanism known from recurrent neural networks. The development also includes the determination of suitable training paradigms, and the model structure is chosen with a reasonable training complexity in mind.

A major challenge in this thesis is the investigation of the gating mechanism, which consists of a learn gate and a reset gate. The former is used to estimate a time-varying step size of the iterative algorithm. Gated Recurrent Units provide an internal memory to accommodate the sequential information in speech, while skip connections improve the gradient flow during training. Using the reset gate independently to reset the impulse response estimate when the acoustic situation changes is outperformed by a weight-sharing approach: with weight sharing, the learn and reset gates share part of the network and thus have direct information about each other's behavior. It is also shown that the truncation order of backpropagation through time can be reduced to a certain extent, which lowers the training complexity without degrading performance. The developed model outperforms the tuned Fast Block Normalized Least-Mean-Square algorithm in reconvergence speed and steady-state performance in far-end single talk and double talk. Furthermore, it repeatedly outperforms the tuned diagonalized Kalman filter in certain scenarios and offers significantly improved overall performance in single talk.
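
As a rough illustration of the gating idea described above, the sketch below shows how a small shared network (the weight sharing between learn and reset gate) could predict a per-bin step size and a reset value that modulate an NLMS-like update of a frequency-domain filter. All layer sizes, gate features, and the exact update rule are assumptions for illustration, not the thesis architecture.

import torch
import torch.nn as nn

class GatedAdaptiveFilter(nn.Module):
    def __init__(self, n_bins=257, hidden=64):
        super().__init__()
        self.gru = nn.GRUCell(2 * n_bins, hidden)    # shared trunk for both gates
        self.learn_gate = nn.Linear(hidden, n_bins)  # -> time-varying step size
        self.reset_gate = nn.Linear(hidden, n_bins)  # -> resets the filter estimate

    def forward(self, X, D, W, h):
        # X, D: complex STFT frames of far-end and microphone signal, shape (n_bins,)
        # W: current complex echo-path estimate, h: GRU hidden state, shape (hidden,)
        E = D - W * X                                  # frequency-domain error
        feats = torch.cat([X.abs(), E.abs()], dim=-1)  # simple gate features (assumption)
        h = self.gru(feats.unsqueeze(0), h.unsqueeze(0)).squeeze(0)
        mu = torch.sigmoid(self.learn_gate(h))         # learn gate: step size in (0, 1)
        r = torch.sigmoid(self.reset_gate(h))          # reset gate: keep or forget old estimate
        # gated NLMS-style coefficient update
        W = r * W + mu * torch.conj(X) * E / (X.abs() ** 2 + 1e-6)
        return W, E, h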

and

Monday, April 25, 2022
Speaker: Alexej Sobolew
Time: 04:00 p.m.
Location:
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09

                    Meeting-ID: 979 0415 7921
                    Password: 481650

Master-Lecture: Investigation of Generative Neural Networks for Speech Enhancement

Speech enhancement aims to reduce noise in speech signals and is widely used in hearing aids and mobile speech communication. Speech synthesis, on the other hand, aims to generate high-quality human speech and is used, e.g., in text-to-speech generation. Noise reduction and speech synthesis can be combined, since conventional noise reduction methods often improve only the magnitude spectrum and keep the noisy phase. However, the phase has an important influence on speech quality and intelligibility. In addition, training neural networks on complex spectrograms is more difficult, so it is reasonable to first denoise the magnitude spectrum and subsequently synthesize the speech waveform from it. The applications mentioned above often require low execution times and low computational overhead. This can be achieved by exploiting parallel processors with non-autoregressive models and by reducing the number of parameters of the neural network. Hence, this thesis investigates noise reduction, speech synthesis, and their joint interaction.
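
As a small sketch of the two-step idea (denoise only the magnitude spectrum, then synthesize a waveform from the enhanced magnitude instead of reusing the noisy phase), the example below uses the Griffin-Lim algorithm as a stand-in for a neural vocoder and a trivial placeholder mask instead of a trained masking network.

import numpy as np
import librosa

def estimate_mask(mag, floor=0.1):
    # Trivial stand-in for a learned masking model: attenuate bins whose level
    # is close to a crude per-frequency noise estimate (illustrative only).
    noise_est = np.median(mag, axis=1, keepdims=True)
    return np.clip(1.0 - noise_est / (mag + 1e-8), floor, 1.0)

def enhance(noisy, n_fft=512, hop=128):
    stft = librosa.stft(noisy, n_fft=n_fft, hop_length=hop)
    mag = np.abs(stft)                 # the noisy phase is deliberately discarded
    clean_mag = estimate_mask(mag) * mag
    # reconstruct a waveform from magnitude only (stand-in for the neural vocoder)
    return librosa.griffinlim(clean_mag, n_iter=32, hop_length=hop, n_fft=n_fft)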

In this thesis, the first use case considered is phase reconstruction and speech synthesis based on clean data. For the second use case, combined noise reduction and speech synthesis based on magnitude spectra, a non-autoregressive three-stage speech enhancement system is developed. For speech synthesis from clean magnitude spectra such as mel-spectrograms, Nvidia's WaveGlow network is taken as a basis. WaveGlow achieves subjective performance similar to the Griffin-Lim algorithm but is better suited for fast applications. To reduce the number of parameters of WaveGlow, the SqueezeWave variant is used, decreasing both the parameter count and the inference time by up to 70%. When moving to combined noise reduction and speech synthesis, it is shown that WaveGlow alone is not suitable for performing both tasks simultaneously. Consequently, the problem is divided into three stages: masking, inpainting, and synthesis. The models for masking and inpainting are adapted to mel-spectrograms and studied in detail for noise reduction; as a result, they are able to reduce the noise significantly. It is worth noting that the mel filterbank is used as a non-linear downsampling adapted to human perception, which reduces the number of computations in the first two models. Subsequently, the performance of the entire three-stage speech enhancement system is investigated. It improves the speech quality and intelligibility of noisy data while exploiting parallel processors and can compete with existing state-of-the-art methods. The system achieves better noise reduction than the Convolutional Recurrent Network (CRN) and additionally does not rely on the noisy phase.
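
The data flow of such a three-stage system (masking, inpainting, synthesis on mel-spectrograms) might look roughly like the sketch below; the three modules are placeholders for the trained networks, and only the overall structure is illustrated. A SqueezeWave-style flow model would take the role of the vocoder stage.

import torch
import torch.nn as nn

class ThreeStageEnhancer(nn.Module):
    def __init__(self, masker: nn.Module, inpainter: nn.Module, vocoder: nn.Module):
        super().__init__()
        self.masker = masker        # stage 1: mask estimation on the noisy mel-spectrogram
        self.inpainter = inpainter  # stage 2: restore speech structure lost by masking
        self.vocoder = vocoder      # stage 3: non-autoregressive waveform synthesis

    def forward(self, noisy_mel):
        mask = torch.sigmoid(self.masker(noisy_mel))
        masked_mel = mask * noisy_mel           # noise reduction on the mel magnitude
        clean_mel = self.inpainter(masked_mel)  # fill in over-suppressed regions
        return self.vocoder(clean_mel)          # waveform; the noisy phase is never used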

All interested parties are cordially invited; registration is not required.

General information on the colloquium, as well as a current list of dates of the Communication Technology Colloquium, can be found at:
http://www.iks.rwth-aachen.de/aktuelles/kolloquium

-- 
Irina Esser
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
ronkartz@iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/