Kommunikationstechnisches Kolloquium am IKS | Communication Technology Colloquium at IKS

18 Sep 2020

Dear subscribers of the colloquium newsletter,

We are happy to inform you about the next session of our Communication
Technology Colloquium.

*Friday, September 25, 2020*
*Speaker*: Liuhui Deng
*Time*: 11:00 a.m.
*Location*: 
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09

         Meeting ID: 979 0415 7921
         Password: 481650

*Master Lecture*: Speech Inpainting Using Image Processing Techniques

Speech inpainting is the task of reconstructing speech from damaged
speech signals, where the corruption can result from improper storage,
packet loss in communication networks, etc. In recent years, neural
networks have become an active research topic in audio inpainting,
including speech inpainting and music inpainting. The networks can be
fed either raw audio waveforms or other feature representations, such
as the Short-Time Fourier Transform (STFT) or Mel Frequency Cepstral
Coefficients (MFCC), in order to reconstruct the audio.

In this thesis, advanced architectures from image inpainting that are
based on Convolutional Neural Networks (CNNs) are adapted to the task
of speech inpainting. The motivation is twofold: neural techniques for
image inpainting are well investigated and have proven powerful, and
speech inpainting can be interpreted as image inpainting when the
speech spectrogram is treated as a 2-dimensional image. The networks
involved are mainly the context encoder, the context encoder with
Generative Adversarial Networks (GANs), EdgeConnect (w/o GANs) and
EdgeConnect (with GANs).
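
As a rough illustration of treating a spectrogram as a 2-dimensional
image, the following minimal sketch computes STFT magnitudes for a toy
signal and blanks out a few time frames. The function names, the frame
length, the hop size, the Hann window and the rectangular time mask are
all assumptions made for this example; the announcement does not state
the settings used in the thesis.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=8, hop=4):
    """Return a 2-D list: rows are frequency bins, columns are time frames."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = [[signal[s + n] * window[n] for n in range(frame_len)]
              for s in starts]
    spec = []
    for k in range(n_bins):
        row = []
        for frame in frames:
            # Naive DFT of one windowed frame at frequency bin k.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            row.append(abs(coeff))
        spec.append(row)
    return spec


def apply_time_mask(spec, first_frame, last_frame):
    """Zero out frames first_frame..last_frame (inclusive).

    Returns (corrupted, mask); mask is 1 for kept cells and 0 for the gap.
    """
    n_frames = len(spec[0])
    mask = [[0 if first_frame <= t <= last_frame else 1
             for t in range(n_frames)] for _ in spec]
    corrupted = [[v * m for v, m in zip(row, mrow)]
                 for row, mrow in zip(spec, mask)]
    return corrupted, mask


# Toy signal: a sinusoid that falls exactly on DFT bin 2 of an 8-point frame.
signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(64)]
spec = stft_magnitude(signal)                  # 5 bins x 15 frames
corrupted, mask = apply_time_mask(spec, 5, 7)  # simulate a lost segment
```

The pair `corrupted` and `mask` corresponds to the kind of input an
image inpainting network expects: a 2-dimensional array with a hole and
a map of where the hole is.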

In this work, the context encoder is an encoder-decoder architecture
that takes STFT magnitudes (and the ground-truth corruption mask) as
input, while EdgeConnect is additionally fed an edge map of the
spectrogram in order to alleviate the blurriness issue observed in
image inpainting. EdgeConnect (w/o GANs) is composed of two sub-models,
both of which are context encoders. One sub-model, referred to as the
edge completion model, reconstructs the edge map from the corrupted
edge map; the other, the inpainting model, reconstructs the spectrogram
based on the corrupted spectrogram and the edge map. The GANs applied
in the models of interest are also intended to mitigate the blurriness,
by adding an adversarial loss to the loss functions of the context
encoder, the edge completion model and the inpainting model.
Experiments indicate that the context encoder (w/ or w/o GANs)
outperforms CNNs that simply stack a few convolutional layers.
EdgeConnect (w/ or w/o GANs) achieves even better performance than the
context encoder (w/ or w/o GANs), mainly thanks to the additional
informative edge map of the spectrogram. The best model among them is
EdgeConnect (with GANs): its reconstructed speech achieves a PESQ score
of 3.03, a 71.2% improvement over the corrupted input speech. In
addition, analyses of the edge map quality in EdgeConnect (w/ or w/o
GANs) reveal that a low-quality edge map heavily degrades the
inpainting performance. A well-performing edge completion model is
therefore of great importance and a promising direction for future
work.
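
To make the notion of a corrupted edge map more concrete, here is a
minimal sketch on a toy spectrogram. The announcement does not say which
edge detector the thesis uses, so the thresholded forward-difference
gradient below is only a stand-in assumption, and the names `edge_map`
and `mask_edges` are hypothetical.

```python
def edge_map(spec, threshold):
    """Binary edge map: 1 where the local gradient magnitude exceeds threshold."""
    n_rows, n_cols = len(spec), len(spec[0])
    edges = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            # Forward differences; the last row/column use a zero difference.
            d_freq = spec[r + 1][c] - spec[r][c] if r + 1 < n_rows else 0.0
            d_time = spec[r][c + 1] - spec[r][c] if c + 1 < n_cols else 0.0
            if (d_freq ** 2 + d_time ** 2) ** 0.5 > threshold:
                edges[r][c] = 1
    return edges


def mask_edges(edges, mask):
    """Corrupted edge map: edges are unknown (0) wherever the mask marks a gap."""
    return [[e * m for e, m in zip(erow, mrow)]
            for erow, mrow in zip(edges, mask)]


# Toy 4x6 "spectrogram": a horizontal band of energy in rows 1-2.
spec = [
    [0.0] * 6,
    [1.0] * 6,
    [1.0] * 6,
    [0.0] * 6,
]
mask = [[1, 1, 0, 0, 1, 1] for _ in range(4)]  # frames 2-3 are missing

edges = edge_map(spec, threshold=0.5)
corrupted_edges = mask_edges(edges, mask)
```

In this toy case the band edges are interrupted at frames 2-3 in
`corrupted_edges`; closing such gaps is the job the abstract assigns to
the edge completion model, before the inpainting model fills in the
spectrogram itself.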

All interested parties are cordially invited; registration is not required.

General information on the colloquium and a current list of the dates
of the Communication Technology Colloquium can be found at:
http://www.iks.rwth-aachen.de/aktuelles/kolloquium

-- 
Irina Ronkartz
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
ronkartz@iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/
