Dear subscribers of the colloquium newsletter,
we are pleased to announce the next session of our Communication
Technology Colloquium.
*Monday, September 28, 2020*
*Speaker*: Zach Lee
*Time:* 11:00 a.m.
*Location*:
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting ID: 979 0415 7921
Password: 481650
*Bachelor Lecture*: Time-Varying Simulation of Adaptive Crosstalk
Cancellation Systems based on Acoustic Measurements
To reproduce binaural audio via loudspeakers, a crosstalk cancellation
(CTC) system is necessary to attenuate the crosstalk. The task becomes
more challenging when the listener is allowed to move, because a CTC
system is often robust only within a small, limited area. A recently
proposed idea is the adaptive crosstalk cancellation (ACTC) system. In
this system, microphones placed close to the entrance of the ear canal
measure the signals perceived by the listener, the corresponding
head-related impulse response (HRIR) is estimated from them, and the
CTC filter is updated in real time.
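As a rough illustration of the filter computation described above, the
following Python sketch derives CTC filters by regularized inversion of
a 2x2 loudspeaker-to-ear transfer matrix in each frequency bin. This is
a common textbook formulation and not necessarily the method used in the
thesis; the function names, the layout H[k, ear, loudspeaker] and the
regularization constant beta are assumptions made for this example.

```python
import numpy as np

def ctc_filters(H, beta=1e-3):
    """Regularized inverse of a stack of 2x2 transfer matrices.

    H has shape (n_bins, 2, 2); H[k, i, j] is the (assumed) transfer
    function from loudspeaker j to ear i at frequency bin k.
    """
    Hh = np.conj(np.swapaxes(H, -1, -2))   # Hermitian transpose per bin
    # C = (H^H H + beta I)^-1 H^H, solved for all bins at once
    return np.linalg.solve(Hh @ H + beta * np.eye(2), Hh)

def channel_separation_db(H, C):
    """Direct-path energy over crosstalk energy of H @ C, in dB per bin."""
    G = H @ C                              # effective path: binaural input -> ears
    direct = np.abs(G[:, 0, 0]) ** 2 + np.abs(G[:, 1, 1]) ** 2
    cross = np.abs(G[:, 0, 1]) ** 2 + np.abs(G[:, 1, 0]) ** 2
    return 10 * np.log10(direct / cross)
```

In an adaptive setting, H would be re-estimated from the ear microphone
signals and the filters recomputed; applying filters computed for an
outdated H to the current H is what lowers the channel separation when
the listener moves.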
The goal of this thesis is to evaluate the ACTC system. To this end, an
acoustic measurement is performed in an anechoic chamber to obtain an
accurate HRIR of the test listener. With this knowledge, the performance
of the ACTC system can be evaluated in a simulation. Furthermore, the
limits of the ACTC system can be tested in terms of how fast it can
adapt to changes.
The results show that the ACTC system performs fairly well when the
listener's ears are on the ipsilateral side of the loudspeakers, whereas
performance drops drastically on the contralateral side. At lower
frequencies, the ACTC system works quite well up to a head-rotation
speed of 20 deg/s.
All interested parties are cordially invited; registration is not required.
General information on the colloquium, as well as a current list of the
dates of the Communication Technology Colloquium, can be found at:
http://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Irina Ronkartz
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
ronkartz(a)iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/
Dear subscribers of the colloquium newsletter,
we are pleased to announce the next session of our Communication
Technology Colloquium.
*Friday, September 25, 2020*
*Speaker*: Liuhui Deng
*Time*: 11:00 a.m.
*Location*:
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting ID: 979 0415 7921
Password: 481650
*Master Lecture*: Speech Inpainting Using Image Processing Techniques
Speech inpainting is the task of reconstructing speech from damaged
speech signals, where the corruption can result from improper storage,
packet loss in communication networks, etc. In recent years, neural
networks have become an active research topic in audio inpainting,
including speech and music inpainting. The networks can be fed either
audio waveforms or other feature representations, such as the
Short-Time Fourier Transform (STFT) or Mel-Frequency Cepstral
Coefficients (MFCCs), in order to reconstruct audio.
In this thesis, advanced image inpainting architectures based on
Convolutional Neural Networks (CNNs) are adapted to the task of speech
inpainting. The motivation is twofold: neural techniques for image
inpainting are well investigated and have proven powerful, and speech
inpainting can be interpreted as image inpainting when the speech
spectrogram is treated as a two-dimensional image. The networks
considered are mainly the context encoder, the context encoder with
Generative Adversarial Networks (GANs), EdgeConnect without GANs and
EdgeConnect with GANs.
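To make the spectrogram-as-image view concrete, the following Python
sketch computes an STFT magnitude "image" of a signal and blanks out a
block of time frames with a binary mask, which is roughly the kind of
input an inpainting network would have to fill in. It is only an
illustration: the helper names, the 16 kHz sampling rate, the frame
parameters and the gap position are assumptions, not values taken from
the thesis.

```python
import numpy as np

def stft_magnitude(x, n_fft=256, hop=128):
    """Magnitude spectrogram with shape (frequency bins, time frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def mask_gap(spec, start, length):
    """Zero out `length` frames from `start`; mask is 1 = intact, 0 = missing."""
    mask = np.ones_like(spec)
    mask[:, start:start + length] = 0.0
    return spec * mask, mask

fs = 16000                                # assumed sampling rate in Hz
t = np.arange(fs) / fs                    # one second of signal
x = np.sin(2 * np.pi * 440 * t)           # test tone standing in for speech
spec = stft_magnitude(x)                  # 2-D array, treated as an image
corrupted, mask = mask_gap(spec, start=40, length=10)
```

The pair (corrupted, mask) corresponds to the STFT magnitudes and the
corruption mask mentioned in the abstract; everything else here is
placeholder data.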
In this work, the context encoder is an encoder-decoder architecture
that takes STFT magnitudes (and the ground-truth corruption mask) as
input, while EdgeConnect is additionally fed an edge map of the
spectrogram in order to alleviate the blurriness issue observed in
image inpainting. EdgeConnect without GANs is composed of two
sub-models, both of which are context encoders. One sub-model, the edge
completion model, reconstructs the edge map from the corrupted edge
map; the other, the inpainting model, reconstructs the spectrogram from
the corrupted spectrogram and the edge map. The GANs applied in the
models of interest are also intended to mitigate the blurriness, by
adding an adversarial loss to the loss functions of the context
encoder, the edge completion model and the inpainting model.
Experiments indicate that the context encoder (with or without GANs)
outperforms CNNs that simply stack a few convolutional layers.
EdgeConnect (with or without GANs) achieves even better performance
than the context encoder (with or without GANs), mainly thanks to the
additional information in the edge map of the spectrogram. The best
model among them is EdgeConnect with GANs: its reconstructed speech
achieves a PESQ score of 3.03, a 71.2% improvement over the corrupted
input speech. In addition, analyses of the edge map quality in
EdgeConnect (with or without GANs) reveal that a low-quality edge map
heavily degrades the inpainting performance. A well-performing edge
completion model is therefore of great importance and a promising
direction for future work.
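As a quick sanity check on the reported numbers, the short Python
calculation below recovers the PESQ score of the corrupted input that
the 71.2% figure would imply. It assumes the percentage is a relative
improvement over the input score (new = old * (1 + p/100)); the
announcement does not define the figure further, so the result is an
inference, not a reported value.

```python
def baseline_from_improvement(score, improvement_percent):
    """Invert a relative improvement: score = baseline * (1 + p / 100)."""
    return score / (1.0 + improvement_percent / 100.0)

# 3.03 and 71.2 are the figures quoted in the abstract.
baseline = baseline_from_improvement(3.03, 71.2)
print(f"implied PESQ of the corrupted input: {baseline:.2f}")  # prints 1.77
```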
All interested parties are cordially invited; registration is not required.
General information on the colloquium, as well as a current list of the
dates of the Communication Technology Colloquium, can be found at:
http://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Irina Ronkartz
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
ronkartz(a)iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/