Dear subscribers of the colloquium newsletter,
We are pleased to announce the next session of our Communication Technology Colloquium.
Wednesday, January 18, 2022
Speaker: Konstantin Wehmeyer
Time: 2:00 p.m.
Location: hybrid (Lecture room 4G and Zoom: https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09)
Meeting ID: 979 0415 7921
Password: 481650
Bachelor's thesis talk: Deep Learning-Based Speech Synthesis as Post-Processing of a Noise Reduction
Audio and speech signals are often corrupted by noise in parts that are limited in time and/or frequency. Several methods, including deep-learning-based approaches, exist to attenuate or remove these distortions. Often, however, only the magnitude spectrum is processed, while the phase spectrum is kept unchanged because it is considered comparatively less relevant. Consequently, the noisy phase is reused when the waveform is synthesized from the processed magnitude spectrum. Distortions in the magnitude spectrum can thus be reduced, but those in the phase spectrum remain, which inevitably degrades speech quality and intelligibility.
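As a rough illustration of the effect described above, the following Python/NumPy sketch (not the code from the thesis; the test signal, noise level, and the function name reconstruct_with_noisy_phase are hypothetical) combines a magnitude spectrum with the unchanged noisy phase. Even when the magnitude is perfect (an oracle taken from the clean signal), the reconstructed waveform still deviates from the clean one, whereas combining the same magnitude with the clean phase reproduces it exactly.

```python
import numpy as np

def reconstruct_with_noisy_phase(est_magnitude, noisy_spectrum):
    # Hypothetical helper: keep the noisy phase, replace only the magnitude
    return est_magnitude * np.exp(1j * np.angle(noisy_spectrum))

rng = np.random.default_rng(0)
n = 512  # single frame, no STFT windowing, for brevity
t = np.arange(n)
clean = np.sin(2 * np.pi * 0.03 * t)          # stand-in for a clean speech frame
noisy = clean + 0.3 * rng.standard_normal(n)  # additive white noise (assumption)

S = np.fft.rfft(clean)
Y = np.fft.rfft(noisy)

# Oracle (perfect) magnitude combined with the noisy phase
oracle_mag = np.abs(S)
s_hat = np.fft.irfft(reconstruct_with_noisy_phase(oracle_mag, Y), n=n)
err_noisy_phase = np.mean((s_hat - clean) ** 2)

# The same magnitude combined with the clean phase, for comparison
s_ref = np.fft.irfft(oracle_mag * np.exp(1j * np.angle(S)), n=n)
err_clean_phase = np.mean((s_ref - clean) ** 2)

print(f"MSE with noisy phase: {err_noisy_phase:.2e}")
print(f"MSE with clean phase: {err_clean_phase:.2e}")
```

The remaining error in the first case is caused solely by the phase, which is the gap a phase reconstruction method is meant to close.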
This thesis presents methods for reconstructing the phase spectrum of speech signals from noise-reduced magnitude spectra. A phase reconstruction algorithm developed at the Institute of Communication Systems at RWTH Aachen University has already been evaluated in a previous study for smoothed magnitude spectra. That study showed that the deep neural network (DNN) employed benefits from targeted training on smoothed magnitude spectra, even without modifying the network structure. However, even slight smearing of the magnitude spectra leads to a significant loss in performance compared with perfect magnitude spectra. In this work, the DNNs are therefore optimized for noise-reduced magnitude spectra. Several deep-learning-based models are introduced and compared with each other and with the previously developed models, and their properties, including aspects such as causality, are discussed. Moreover, a new loss function and a new assessment measure, both designed specifically for estimating and assessing the phase spectrum of speech signals, are developed and tested. To evaluate the results as independently as possible of any specific noise-reduction method, ideal masks are developed, applied, and discussed.
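The announcement does not say which ideal masks the thesis uses. As a hedged sketch only, the following Python/NumPy code shows two commonly used oracle masks, the ideal ratio mask (IRM) and the ideal binary mask (IBM), computed from known clean and noise spectra. Applying such a mask to the noisy magnitude yields a "noise-reduced" magnitude spectrum without relying on any particular noise-reduction algorithm. The function names, the local criterion lc_db, and the test signals are assumptions for illustration.

```python
import numpy as np

def ideal_ratio_mask(clean_spec, noise_spec, eps=1e-12):
    # IRM per bin: sqrt(|S|^2 / (|S|^2 + |N|^2)), values in [0, 1]
    s_pow = np.abs(clean_spec) ** 2
    n_pow = np.abs(noise_spec) ** 2
    return np.sqrt(s_pow / (s_pow + n_pow + eps))

def ideal_binary_mask(clean_spec, noise_spec, lc_db=0.0, eps=1e-12):
    # IBM: 1 where the local SNR exceeds the criterion lc_db, otherwise 0
    snr_db = 20.0 * np.log10((np.abs(clean_spec) + eps) / (np.abs(noise_spec) + eps))
    return (snr_db > lc_db).astype(float)

rng = np.random.default_rng(1)
n = 512
clean = np.sin(2 * np.pi * 0.05 * np.arange(n))  # stand-in for a clean frame
noise = 0.5 * rng.standard_normal(n)             # hypothetical noise
S, N = np.fft.rfft(clean), np.fft.rfft(noise)
Y = S + N  # the DFT is linear, so this equals rfft(clean + noise)

irm = ideal_ratio_mask(S, N)
ibm = ideal_binary_mask(S, N)

# Oracle "noise-reduced" magnitudes that a phase reconstruction DNN could receive
mag_irm = irm * np.abs(Y)
mag_ibm = ibm * np.abs(Y)
```

Because both masks are computed from the true clean and noise signals, they serve as an upper-bound style reference rather than a realistic noise-reduction front end.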
All interested parties are cordially invited; registration is not required.
-- 
Simone Sedgwick
Institute of Communication Systems (IKS)
Prof. Dr.-Ing. Peter Jax
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26956 (phone)
+49 241 80 22254 (fax)
sedgwick@iks.rwth-aachen.de
https://www.iks.rwth-aachen.de/