Dear subscribers of the colloquium newsletter,
we are pleased to inform you about the next date of our communication technology colloquium.
Friday, September 25, 2020
Speaker: Liuhui Deng
Time: 11:00 a.m.
Location: https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
Master's thesis talk: Speech Inpainting Using Image Processing Techniques
Speech inpainting is the task of reconstructing speech from damaged speech signals, where the corruption can result from improper storage, packet loss in communication networks, and similar causes. In recent years, neural networks have become an active research topic in audio inpainting, which includes speech inpainting and music inpainting. To reconstruct the audio, the networks can be fed either raw audio waveforms or other feature representations such as the Short-Time Fourier Transform (STFT) or Mel-Frequency Cepstral Coefficients (MFCCs).
In this thesis, advanced architectures based on Convolutional Neural Networks (CNNs) from image inpainting are adapted to the task of speech inpainting. The motivation is twofold: neural techniques for image inpainting are well investigated and have proven powerful, and speech inpainting can be interpreted as image inpainting when the speech spectrogram is treated as a 2-dimensional image. The networks considered are mainly the context encoder, the context encoder with Generative Adversarial Networks (GANs), EdgeConnect without GANs, and EdgeConnect with GANs.
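The idea of treating a spectrogram as an image can be illustrated with a minimal sketch. It is not taken from the thesis: the frame length, hop size, synthetic test tone, gap position and the helper names `stft_magnitude` and `gap_mask` are all assumptions chosen for illustration. It computes an STFT magnitude, zeroes a block of frames to imitate lost packets, and stacks the corrupted magnitude with a binary mask as a two-channel "image".

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Return the STFT magnitude as a 2-D array of shape (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def gap_mask(shape, start, width):
    """Binary mask: 1 = intact time-frequency bin, 0 = corrupted (lost) frame."""
    mask = np.ones(shape)
    mask[:, start:start + width] = 0.0
    return mask

# Synthetic stand-in for speech (assumption): one second of a 440 Hz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

magnitude = stft_magnitude(signal)                    # the 2-D "image"
mask = gap_mask(magnitude.shape, start=20, width=8)   # hypothetical gap of 8 frames
corrupted = magnitude * mask                          # frames 20..27 are wiped out
network_input = np.stack([corrupted, mask])           # channels: spectrogram, mask
```

An inpainting network would then be trained to predict `magnitude` from `network_input`, in the same way an image inpainting network fills a masked region of a picture.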
In this work, the context encoder is an encoder-decoder architecture that takes STFT magnitudes (and the ground-truth corruption mask) as input, while EdgeConnect is additionally fed an edge map of the spectrogram in order to alleviate the blurriness issue observed in image inpainting. EdgeConnect without GANs is composed of two sub-models, each of which is a context encoder. One sub-model, the edge completion model, reconstructs the edge map from the corrupted edge map; the other, the inpainting model, reconstructs the spectrogram from the corrupted spectrogram and the edge map. The GANs applied in these models are also intended to mitigate the blurriness, by adding an adversarial loss to the loss functions of the context encoder, the edge completion model and the inpainting model. Experiments indicate that the context encoder (with or without GANs) outperforms CNNs that simply stack a few convolutional layers. EdgeConnect (with or without GANs) performs even better than the context encoder (with or without GANs), mainly thanks to the additional information in the edge map of the spectrogram. The best model is EdgeConnect with GANs: its reconstructed speech achieves a PESQ score of 3.03, a 71.2% improvement over the corrupted input speech. In addition, analyses of the edge map quality in EdgeConnect (with or without GANs) reveal that a low-quality edge map heavily degrades inpainting performance. A well-performing edge completion model is therefore of great importance and a promising direction for future work.
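The abstract mentions adding an adversarial loss to the loss function of each model. The following sketch shows one common way such a combined generator loss can be formed. The abstract does not state the actual loss terms, so the L1 reconstruction term, the non-saturating adversarial term, the weight `adv_weight` and all function names here are assumptions for illustration only.

```python
import math

def l1_loss(predicted, target):
    """Mean absolute error between two equally sized 2-D spectrograms."""
    total = sum(abs(p - t)
                for row_p, row_t in zip(predicted, target)
                for p, t in zip(row_p, row_t))
    count = sum(len(row) for row in target)
    return total / count

def adversarial_loss(discriminator_scores):
    """Non-saturating generator loss: mean of -log D(G(x)).

    Each score is the discriminator's probability (0..1) that a
    reconstructed spectrogram is real.
    """
    eps = 1e-12  # avoids log(0)
    return -sum(math.log(s + eps) for s in discriminator_scores) / len(discriminator_scores)

def generator_loss(predicted, target, discriminator_scores, adv_weight=0.01):
    """Reconstruction loss plus a weighted adversarial term (weight is hypothetical)."""
    return l1_loss(predicted, target) + adv_weight * adversarial_loss(discriminator_scores)

# Toy 2x2 "spectrograms" and discriminator outputs, purely illustrative.
target = [[1.0, 2.0], [3.0, 4.0]]
predicted = [[1.0, 2.5], [2.5, 4.0]]
scores = [0.5, 0.5]
loss = generator_loss(predicted, target, scores)
```

The reconstruction term alone tends to favor averaged, blurry outputs; the adversarial term penalizes reconstructions that the discriminator can tell apart from real spectrograms, which is the intuition behind using it against blurriness.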
All interested parties are cordially invited; registration is not required.
General information on the colloquium, as well
as a current list of the dates of the communication technology
colloquium, can be found at:
http://www.iks.rwth-aachen.de/aktuelles/kolloquium/
--
Irina Ronkartz
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
ronkartz@iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/