Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication
Technology Colloquium.
*Wednesday, January 18, 2022*
*Speaker*: Konstantin Wehmeyer
*Time:* 2:00 p.m.
*Location*: hybrid - Lecture room 4G and
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Bachelor-Lecture*: Deep Learning-Based Speech Synthesis as
Post-Processing of a Noise Reduction
Audio and speech signals are often disturbed by noise signals in
frequency- and/or time-limited parts. To attenuate or remove these
distortions, several methods, including deep learning-based approaches,
are known. Often, however, only the magnitude spectrum is processed and
the phase spectrum is taken over unchanged due to its comparatively
lower relevance. Consequently, the noisy phase is reused when
synthesizing the waveform from the processed magnitude spectrum.
Therefore, distortions in the magnitude spectrum can be reduced, but not
in the phase spectrum, which inevitably leads to a deterioration in
speech quality and intelligibility.
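The phase reuse described above can be illustrated for a single spectral
frame. This is only a minimal sketch: the function name, the three-bin
spectrum, and the gain values are hypothetical and not taken from the
thesis. It shows that the output keeps the processed magnitude while
carrying the noisy phase unchanged.

```python
import cmath


def resynthesize_with_noisy_phase(noisy_spectrum, processed_magnitude):
    """Combine a processed magnitude with the unchanged noisy phase.

    noisy_spectrum: complex STFT bins of one frame of the noisy signal.
    processed_magnitude: non-negative magnitudes after noise reduction.
    """
    if len(noisy_spectrum) != len(processed_magnitude):
        raise ValueError("both inputs must have the same number of bins")
    return [
        magnitude * cmath.exp(1j * cmath.phase(noisy_bin))
        for magnitude, noisy_bin in zip(processed_magnitude, noisy_spectrum)
    ]


if __name__ == "__main__":
    # Hypothetical frame: three noisy bins and per-bin gains of a noise reducer.
    noisy = [1 + 1j, -2j, 3 + 0j]
    gains = [0.5, 0.8, 1.0]
    magnitudes = [gain * abs(value) for gain, value in zip(gains, noisy)]
    for value in resynthesize_with_noisy_phase(noisy, magnitudes):
        print(abs(value), cmath.phase(value))
```

Any error in the noisy phase therefore passes straight through to the
synthesized waveform, which is the motivation for reconstructing the
phase instead.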
This thesis presents methods that allow a reconstruction of the phase
spectrum of speech signals based on noise-reduced magnitude spectra. At
the Institute of Communication Systems at RWTH Aachen University, a
phase reconstruction algorithm was developed, and this algorithm has
already been evaluated in a previous study for the case of smoothed
magnitude spectra. It was shown that the deep neural network (DNN) used
can benefit from targeted training on the smoothed magnitude spectra
even without further modification of the network structures. However,
even slight smearing of the magnitude spectra already leads to a
significant loss in performance compared to the use of perfect magnitude
spectra. In this work, therefore, the DNNs used are optimized for the
case of noise-reduced magnitude spectra.
Several deep learning-based models are introduced and compared with
each other and with the models already developed. Their properties and
aspects such as causality are addressed. Moreover, a new loss function
and assessment measure specifically designed to estimate and assess the
phase spectrum of speech signals are developed and tested. In order to
be able to evaluate the results as independently as possible of a
specific type of noise reduction, ideal masks are developed, used, and
discussed.
All interested parties are cordially invited, registration is not required.
General information on the colloquium, as well as a current list of
dates of the Communication Technology Colloquium can be found at:
https://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Simone Sedgwick
Institute of Communication Systems (IKS)
Prof. Dr.-Ing. Peter Jax
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26956 (phone)
+49 241 80 22254 (fax)
sedgwick(a)iks.rwth-aachen.de
https://www.iks.rwth-aachen.de/
Dear subscribers of the Colloquium Newsletter,
we are happy to inform you about the next date of our Communication
Technology Colloquium.
*Wednesday, November 30, 2022*
*Speaker*: Frederick Pietschmann
*Time*: 2:00 p.m.
*Location:* Lecture room 4G and
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Master-Lecture*: Perceptual Optimization and Evaluation of a Binaural
Signal Modification Algorithm
With individualized binaural signals, it is possible to reproduce
auditory scenes such that the signal is perceived as similar to the real
scene. However, perceptual similarity is no longer achieved when the
binaural signal doesn’t fully adapt to different listeners and different
orientations of the listener’s head. To address these problems, a
perceptually motivated algorithm referred to as the Binaural Cue
Adaptation (BCA) system has been developed at the Institute of
Communication Systems. The BCA system is capable of adding both
interactivity and individualization to existing binaural signals,
thereby achieving a higher degree of perceptual similarity to a
corresponding real auditory scene.
In this thesis, the existing BCA system is perceptually optimized: new
approaches for some components of the algorithm are proposed, all
parametrization options are identified, and the overall best
parametrization is chosen. To identify the best parametrization, an
isolated analysis of individual components is conducted, and a
perceptually motivated optimization procedure for a full system analysis
is proposed and implemented.
Finally, a perceptual evaluation based on the result of the perceptual
optimization is realized. For this, two listening tests with a total
number of 17 participants are conducted – one for a normal and one for a
highly reverberant scenario. The results of these listening tests
suggest that signals produced by the optimized BCA system achieve a high
degree of perceptual plausibility for both reverberation scenarios, with
an averaged 2AFC probability to detect a BCA-generated signal of 0.563
for the normal scenario and 0.604 for the highly reverberant scenario.
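A note on reading the 2AFC figures above: in a two-alternative
forced-choice test, a listener who cannot tell the signals apart still
answers correctly about half the time, so detection probabilities are
usually read against a chance level of 0.5. The sketch below shows one
way to compute a detection rate and a one-sided exact binomial test
against chance. The trial counts are hypothetical, since the
announcement does not state them, and the binomial test is an assumed
way of checking against chance, not necessarily the analysis used in
the thesis. It requires Python 3.8 or newer for math.comb.

```python
import math


def detection_rate(num_correct, num_trials):
    """Fraction of 2AFC trials in which the target signal was identified."""
    if num_trials <= 0 or not 0 <= num_correct <= num_trials:
        raise ValueError("need 0 <= num_correct <= num_trials and num_trials > 0")
    return num_correct / num_trials


def p_value_above_chance(num_correct, num_trials, chance=0.5):
    """One-sided exact binomial p-value: P(X >= num_correct) under pure guessing."""
    if num_trials <= 0 or not 0 <= num_correct <= num_trials:
        raise ValueError("need 0 <= num_correct <= num_trials and num_trials > 0")
    return sum(
        math.comb(num_trials, k) * chance**k * (1.0 - chance) ** (num_trials - k)
        for k in range(num_correct, num_trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the listening tests.
    correct, trials = 113, 200
    print("detection rate:", detection_rate(correct, trials))
    print("p-value vs. chance:", p_value_above_chance(correct, trials))
```

A detection rate close to 0.5 means that listeners could barely
distinguish the generated signal from the reference.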
All interested parties are cordially invited, registration is not required.
General information on the colloquium, as well as a current list of
dates of the Communication Technology Colloquium can be found at:
https://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Irina Esser
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
esser(a)iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/
Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication
Technology Colloquium.
*Wednesday, November 23, 2022*
*Speaker*: Philipp Tigges
*Time*: 11:00 a.m.
*Location*: hybrid - Lecture room 4G and
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Master-Lecture*: Characterization of the Body-Conducted Sound
Transmission of Speech for Hearables
Body-conducted sound is an essential component in the perception of
one's own voice. The transmission behavior of body-conducted sound is
considerably more complex than that of airborne sound, since the sound
passes through a wide variety of tissue types with different properties.
Since the production of speech sounds is a multi-layered process and the
formation of a specific sound also depends on where it is produced, it
is reasonable to assume a phoneme-specific transmission behavior.
The aim of this thesis is to characterize the transmission of speech via
body-conducted sound in order to improve own-voice perception when using
hearables. For a selection of phonemes, the body-conducted sound was
measured with an acceleration sensor and the sound inside the ear canal
with a microphone. The two signals were then examined for linear
relationships. Filters were then designed on the signals to estimate the
signal in the ear canal from the sensor signal and thus reduce the
occlusion effect. It was shown that a linear relationship can be found
between the acceleration sensor signal and the signal in the ear canal.
The transmission behavior differs between phonemes, but the differences
are small enough that filters for occlusion reduction can also be
applied to other sounds. Filters that effectively reduce the occlusion
effect can also be found for spoken text. In addition, some noises that
also propagate via body-conducted sound were analyzed. Here it was
considerably harder to find linear relationships. Subsequently, the
conditions were made more realistic by including the secondary path. In
addition, real applications require causal filters. These performed
similarly to their acausal equivalents, but with strong peaking,
especially toward high frequencies, since they have to operate close to
the causality limit. In the last part of the analysis, first insights
were gained into the question of whether the designed filters depend
strongly on the individual person. This dependence was particularly
evident for the filters designed on individual phonemes.
All interested parties are cordially invited, registration is not required.
General information on the colloquium, as well as a current list of
dates of the Communication Technology Colloquium can be found at:
http://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Irina Esser
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
ronkartz(a)iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/
Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication
Technology Colloquium.
*Wednesday, November 16, 2022*
*Speaker*: Jana Lorenz
*Time*: 09:00 a.m.
*Location*: hybrid - Lecture room 4G and
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Bachelor-Lecture*: Comparison of Adaptive Algorithms for Active Noise
Cancellation in Headphones in Complex Sound Fields
Noise pollution is an everyday problem. There are many approaches to
reducing it. Headphones with active noise cancellation (ANC) are one of
them. They primarily compensate low-frequency noise, which can only be
attenuated insufficiently by passive means. The noise itself has a large
influence on what active compensation can achieve. In real environments,
it can change with respect to its direction of arrival or its
statistical properties and thus make active noise cancellation more
difficult. Adaptive feedforward algorithms are therefore of great
importance, since they can continuously adapt to the changing noise.
This thesis considers two ANC filter structures, the filtered-x least
mean square (FxLMS) algorithm and the adaptive linear combiner (ALC). It
investigates how the ALC's structure of fixed parallel filters with
adaptive weights affects the convergence behavior and the active
attenuation compared to the FxLMS algorithm.
In this thesis, both algorithms are compared using sound fields of
varying complexity. Stationary, spatially moving, and more complex sound
fields are used. Compared to the FxLMS algorithm, the ALC filter
structure achieves faster convergence at the cost of a lower maximum
achievable active attenuation.
All interested parties are cordially invited, registration is not required.
General information on the colloquium, as well as a current list of
dates of the Communication Technology Colloquium can be found at:
http://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Irina Esser
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
esser(a)iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/
Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next dates of our Communication
Technology Colloquium.
*Wednesday, November 9, 2022*
*Speaker*: Nils Lattasch
*Time*: 02:00 p.m.
*Location*: hybrid - Lecture room 4G and
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Bachelor-Lecture*: Coordinate Transformation for Adaptive Active Noise
Cancellation in Headphones
Active noise cancellation attempts to cancel a disturbing signal with a
destructively interfering signal. In this thesis, the feedforward
topology is used for this purpose. For this topology, it can be shown
that the optimal filter implies an IIR filter. Since noise in practice
is multifaceted, it is also reasonable to use adaptive solutions. In
established adaptive methods such as the FxLMS algorithm, one can try to
create an FIR filter with as many filter coefficients as possible in
order to generate a long impulse response. At the same time, however, a
problem arises in the adaptation, since systems with many variables, in
this case many filter coefficients, can only be adapted slowly. One
possible solution to this problem is a coordinate transformation in
which time-invariant IIR filters are part of an adaptive overall system.
This reduces the number of filter coefficients, and in addition the
resulting filter has an infinite impulse response.
The aim of this thesis is to realize a coordinate transformation for
adaptive active noise cancellation. Through a freely selectable pole,
the coordinate transformation provides a new degree of freedom in the
design of an ANC system. In this thesis, the performance and the
resource consumption of the coordinate transformation are investigated
systematically. The so-called Laguerre network is used for the
implementation. This results in new adaptation rules for the adaptive
system, which are derived in this thesis. In addition, it is
investigated which poles are advantageous for the ANC use case.
Furthermore, it is investigated how a convergence bound on the maximum
step size can be estimated. Final experiments are based on measurement
data, using a common in-ear headphone as an example. It turns out that
the performance is sensitive to the choice of pole and step size, but
that with a suitable choice the number of filter coefficients can be
reduced many times over.
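The Laguerre network named in the abstract above can be sketched as a
chain of fixed IIR sections that share one pole: a first-order low-pass
followed by identical all-pass sections. The code below follows the
standard textbook form of the discrete Laguerre filter bank and is not
the implementation from the thesis; the function name and the pole value
in the demo are assumptions. An adaptive filter would form its output as
a weighted sum of the section outputs and adapt only those weights.

```python
import math


def laguerre_outputs(x, num_sections, pole):
    """Return the outputs of all Laguerre sections for every input sample.

    Section 0:  L0(z) = sqrt(1 - pole^2) / (1 - pole * z^-1)
    Section k:  Lk(z) = L(k-1)(z) * (z^-1 - pole) / (1 - pole * z^-1)
    """
    if num_sections < 1:
        raise ValueError("num_sections must be at least 1")
    if not -1.0 < pole < 1.0:
        raise ValueError("pole must lie strictly inside the unit circle")
    gain = math.sqrt(1.0 - pole * pole)
    prev_in = [0.0] * num_sections   # previous input of each all-pass section
    prev_out = [0.0] * num_sections  # previous output of each section
    result = []
    for sample in x:
        outputs = [pole * prev_out[0] + gain * sample]  # low-pass section
        for k in range(1, num_sections):
            section_in = outputs[k - 1]
            # all-pass: y[n] = pole * y[n-1] + u[n-1] - pole * u[n]
            outputs.append(pole * prev_out[k] + prev_in[k] - pole * section_in)
            prev_in[k] = section_in
        prev_out = outputs
        result.append(outputs)
    return result


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 9
    for row in laguerre_outputs(impulse, num_sections=3, pole=0.5):
        print(row)
```

With the pole set to zero, the network reduces to an ordinary tapped
delay line, i.e. the FIR case. A nonzero pole gives each section an
infinite impulse response, which is how a few adaptive weights can
represent a long response.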
and
*Wednesday, November 9, 2022*
*Speaker*: Christian Wolf
*Time*: 03:00 p.m.
*Location*: hybrid - Lecture room 4G and
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Bachelor-Lecture*: Design of Crosstalk Cancellation Systems Using
Robust Control Methods
Playing back binaural signals over loudspeakers is, in contrast to using
headphones, adversely affected by acoustic crosstalk. However, suitable
prefiltering of the binaural signal can effectively suppress the
crosstalk. This is the basic principle of crosstalk cancellation. The
prefilter is referred to as the crosstalk cancellation filter. A common
method for designing a crosstalk cancellation filter is the
least-squares method in the time domain.
This thesis presents a procedure for designing crosstalk cancellation
filters using methods from robust control, such as H2 or H∞ synthesis.
First, it is shown how crosstalk cancellation can be modeled as a robust
control problem, so that solution methods such as H2 or H∞ synthesis can
be applied to it. The general problem is extended by selectable
weighting functions, so that the performance of the system can be
influenced in a targeted and frequency-selective manner. In addition, a
regularization of the magnitude responses of the crosstalk cancellation
filter is provided, with which the gain of the crosstalk cancellation
filter can be reduced to a level that is reasonable in practice.
Furthermore, it is investigated how the latency of the acoustic system
influences the performance and how this should be taken into account in
the design. Subsequently, the developed approach based on H2 and H∞
synthesis is compared with the least-squares method in the time domain.
A theoretical derivation and a simulation show that H2 synthesis and the
least-squares method in the time domain are directly related. In
particular, the results of the least-squares method in the time domain
appear to approach those of H2 synthesis for long crosstalk cancellation
filters. Another practically relevant aspect examined in more detail in
this thesis is the uncertainty in the acoustic system and its influence
on the performance. In general, uncertainty in practical systems has
various causes. This thesis takes a closer look at the uncertainty
caused by variations in head rotation and in the geometry of different
listeners. It is shown that the performance is generally not robust to
the investigated variations, although the effect is less significant at
low frequencies than at high frequencies.
All interested parties are cordially invited, registration is not required.
General information on the colloquium, as well as a current list of
dates of the Communication Technology Colloquium can be found at:
http://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Irina Esser
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
ronkartz(a)iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/
Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication
Technology Colloquium.
*Wednesday, October 12, 2022*
*Speaker*: Christian Prick
*Time:* 10:00 a.m.
*Location*: hybrid - Lecture room 4G and
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Bachelor-Lecture*: Adaptive Active Noise Cancellation for Headphones
using Virtual Sensing
In recent years, active noise canceling headphones have become
widespread because they can attenuate low-frequency noise and therefore
complement the passive damping characteristics of headphones. In order
to further improve their performance, methods to increase the
attenuation at higher frequencies are of interest. As the main limiting
factor for high frequency attenuation is the transfer characteristic
from the error microphone to the eardrum, virtual sensing algorithms
have to be used. These require knowledge of the acoustic paths to the
eardrum, which are influenced by variations of different kinds.
This thesis investigates the influences of the direction of arrival
(DoA) and fit on the virtual sensing paths and proposes methods for
their approximation. Furthermore, a combined method is proposed, which
takes both the fit and DoA into account to approximate the virtual
sensing paths at runtime. The resulting performance improvements are
evaluated for both in- and over-ear headphones using dummy head
measurements and acoustic simulations. The results indicate that the
influence of the DoA and fit depends on the type of the headphones. The
approximations thus achieve different performance improvements, with the
fit generally having a larger impact. For both types of headphones, the
combined method is able to improve attenuation.
All interested parties are cordially invited, registration is not required.
General information on the colloquium, as well as a current list of
dates of the Communication Technology Colloquium can be found at:
https://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Irina Esser
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
ronkartz(a)iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/
Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication
Technology Colloquium.
*Friday, September 2, 2022*
*Speaker*: Vitor Horst Duque
*Time*: 10:00 a.m.
*Location*: hybrid - Lecture room 4G and
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Master-Lecture:* Autoencoder based Waveform Enhancement for Software
Defined Radio Applications
Deep learning is a trend that has made its way into communication
systems and gained a lot of interest already. This presentation
discusses a machine learning-based approach to add novel features to
existing waveform implementations for software defined radios. Such
additional features include the reduction of the peak-to-average power
ratio, spread spectrum abilities and interference mitigation. The
developed machine learning structures use autoencoders to add these
features. The results indicate that realizing additional features by
means of autoencoders is a promising concept for future waveform
development.
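For readers unfamiliar with the peak-to-average power ratio (PAPR)
mentioned above: it is the ratio of the largest instantaneous power of a
waveform to its mean power, usually given in dB. The following sketch
uses the standard definition. The function name and the two test
signals are illustrative assumptions and unrelated to the waveforms in
the thesis.

```python
import cmath
import math


def papr_db(samples):
    """Peak-to-average power ratio of a sampled (complex) waveform in dB."""
    if not samples:
        raise ValueError("samples must not be empty")
    powers = [abs(sample) ** 2 for sample in samples]
    mean_power = sum(powers) / len(powers)
    if mean_power == 0.0:
        raise ValueError("PAPR is undefined for an all-zero signal")
    return 10.0 * math.log10(max(powers) / mean_power)


if __name__ == "__main__":
    # Constant-envelope tone: peak power equals mean power.
    tone = [cmath.exp(1j * 0.3 * n) for n in range(64)]
    # Sum of four tones: peaks add up coherently at n = 0, so the PAPR rises.
    multitone = [
        sum(cmath.exp(1j * 0.3 * (k + 1) * n) for k in range(4))
        for n in range(64)
    ]
    print("tone PAPR in dB:", papr_db(tone))
    print("multitone PAPR in dB:", papr_db(multitone))
```

A lower PAPR lets a power amplifier operate closer to its limit without
clipping, which is why reducing it is listed as a desirable waveform
feature.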
All interested parties are cordially invited, registration is not required.
General information on the colloquium, as well as a current list of
dates of the Communication Technology Colloquium can be found at:
https://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Irina Esser
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
esser(a)iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/
Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication
Technology Colloquium.
*Monday, June 13, 2022*
*Speaker*: Zihang Wei
*Time*: 02:00 p.m.
*Location*: hybrid - Lecture room 4G and
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Master-Lecture*: Investigations on Supervised System Identification
Algorithms
The system identification task aims at inferring the room impulse
response of a specific acoustic enclosure. System identification is
mandatory in applications such as acoustic echo cancellation and cross
talk cancellation. Traditional gradient-based algorithms such as the
normalized least mean square (NLMS) algorithm use FIR filters to
estimate the room impulse responses (RIRs), unfortunately in a
relatively high dimension. A novel dual-stage
algorithm is proposed in this thesis. The algorithm performs a state
update where allowed states are located on a manifold. In a first stage,
an undercomplete autoencoder is trained over the RIR data set. In the
second stage, we perform the system identification tasks. Here the
problem is reformulated such that the latent state is updated instead of
the full impulse response. The trained decoder is then exploited to
transform the latent variables to a proper impulse response. The
reconstructed RIR is evaluated against the true RIR.
In this thesis, a simulation framework first generates an RIR data
set. Then autoencoders with different layer setups are trained on the
generated data set. The qualified autoencoders are employed in the
inference stage to perform the system identification tasks. Two crucial
parameters, i.e., the latent dimension size and the step size of the
manifold update, are investigated under different SNR conditions. It is
demonstrated that under noisy conditions, the proposed method
outperforms the traditional NLMS approach. Evaluation results also show
that a smaller bottleneck size benefits system identification under
adverse noise conditions.
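The NLMS baseline referred to above can be sketched in a few lines. This
is the plain textbook algorithm that adapts all FIR coefficients
directly, not the manifold-based method proposed in the thesis. The
function names, the step size, and the three-tap example system are
illustrative assumptions.

```python
import random


def fir_filter(h, x):
    """Convolve the input x with the impulse response h (output length len(x))."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def nlms_identify(x, d, num_taps, step_size=0.5, eps=1e-8):
    """Estimate an FIR system from its input x and observed output d with NLMS."""
    weights = [0.0] * num_taps
    buffer = [0.0] * num_taps  # most recent input samples, newest first
    for x_n, d_n in zip(x, d):
        buffer = [x_n] + buffer[:-1]
        estimate = sum(w * b for w, b in zip(weights, buffer))
        error = d_n - estimate
        norm = sum(b * b for b in buffer) + eps  # input power normalization
        weights = [
            w + step_size * error * b / norm for w, b in zip(weights, buffer)
        ]
    return weights


if __name__ == "__main__":
    random.seed(0)
    true_response = [0.5, -0.3, 0.1]  # hypothetical short "room" response
    excitation = [random.gauss(0.0, 1.0) for _ in range(2000)]
    observed = fir_filter(true_response, excitation)
    print(nlms_identify(excitation, observed, num_taps=3))
```

Every coefficient is a separate unknown here, so a realistic room
impulse response with thousands of taps makes the search space large.
The abstract's latent-state update aims to avoid exactly this.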
and
*Monday, June 13, 2022*
*Speaker*: Johannes Imort
*Time*: 03:00 p.m.
*Location*: hybrid - Lecture room 4G and
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Master-Lecture*: Online Learning of Loudspeaker Nonlinearities for
Acoustic Echo Cancellation
Hands-free communication is pervasive throughout modern society,
requiring a robust cancellation of the echo from the far end speech
signal that is emitted from the loudspeaker to the microphone. Acoustic
echo cancellation addresses this issue typically by employing linear
adaptive filters. However, in reality, the echo path is nonlinear due to
the non-ideal characteristics of acoustic transducers and power
amplifiers operated at their physical limits.
This thesis introduces and investigates a novel approach to tackle
nonlinear AEC by estimating the nonlinear reference signal using a deep
neural network and a differentiable Kalman filter. The hybrid system is
designed to learn loudspeaker nonlinearities directly from data,
enabling end-to-end training on data composed of pairs of the far end
reference and near end microphone signals. In contrast to previous
neural network-based solutions that have been tailored toward one
particular loudspeaker, the proposed system aims to be generalizable for
different loudspeaker nonlinearities. Therefore, inspired by linear
adaptive filtering, the recurrent architecture explicitly takes
advantage of the information in the residual echo in order to estimate
the nonlinearity adaptively.
The proposed approach was evaluated for both simulated and measured
data. The results indicate that the architecture could enable faster
convergence and better steady state performance than related adaptive
approaches. Furthermore, some examples demonstrated that the performance
of a method that makes use of oracle knowledge could be surpassed,
evidently because the models adapt to the linear acoustic echo path, too.
All interested parties are cordially invited, registration is not required.
General information on the colloquium, as well as a current list of
dates of the Communication Technology Colloquium can be found at:
https://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Irina Esser
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
ronkartz(a)iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/
Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication
Technology Colloquium.
*Wednesday, June 1, 2022*
*Speaker*: Pattaratorn Santivarakom
*Time:* 11:15 a.m.
*Location*: hybrid - Lecture room 4G and
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Bachelor-Lecture*: Sound Source Localization Using Binaural Signals
Binaural signals are similar to the sounds that humans hear with their
left and right ears when the signal is recorded with a suitable device.
As the human auditory system is able to localize sound sources, the
directions of sound sources can also be found in binaural signals.
This thesis addresses the performance of sound source localization
algorithms using binaural signals. The algorithms regarded in this
thesis are based on the concept of beamforming. Several algorithms from
the literature are systematically compared and their basic building
blocks are recombined. From this, a total of 26 variants are identified.
The performance of an algorithm is determined by comparing the accuracy
of the estimated sound source directions with other algorithms.
Furthermore, error metrics are suggested that take into account the
varying signal powers in different time-frequency bins. According to the
results, there are two algorithms that always estimate the same sound
source directions, and they also provide the most accurate sound source
directions in most situations.
All interested parties are cordially invited, registration is not required.
General information on the colloquium, as well as a current list of
dates of the Communication Technology Colloquium can be found at:
https://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Irina Esser
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
esser(a)iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/
Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication
Technology Colloquium.
*Monday, April 25, 2022*
*Speaker*: Alexander Sobolew
*Time*: 03:00 p.m.
*Location*:
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Master-Lecture*: Investigation of Specialized Recurrent Units for
Acoustic Echo Cancellation
In today's communication, hands-free devices, e.g., for remote
communication, are widely used. Without further action, these would
suffer from an acoustic echo that arises from the coupling between
speaker and microphone. To minimize these disturbances, acoustic echo
cancellation
is indispensable. Model-based adaptive algorithms exist to solve this
issue. However, they require careful tuning of parameters whose optimum
differs between devices and acoustic situations.
In this thesis, a new data-driven approach for acoustic echo
cancellation is developed and investigated. In contrast to the purely
model-based approach, the algorithm is supposed to learn the optimal
performance from data without the necessity to be tuned. In new
situations, the unknown parameters should be estimated. At its core, the
novel structure is similar to a frequency adaptive filter. However, it
is extended by the gating mechanism known from recurrent neural
networks. The development also includes the determination of optimal
training paradigms. When choosing the model structure, attention is paid
to a reasonable training complexity. Major challenges in this thesis
include the investigation of the gating mechanism, which is represented
by a learn gate and a reset gate. The former is used to estimate a
time-varying step size of the iterative algorithm. Gated Recurrent Units
provide an internal memory to accommodate the sequential information in
speech, while skip connections optimize the gradient flow during
training. Independent use of a reset gate to reset the impulse response
estimation in case of situation change is outperformed by exploitation
of weight sharing. Using weight sharing, the learn and reset gates have
direct information about each other's behavior due to a shared partial
network. It was shown that when using backpropagation through time, the
truncation order can be reduced to a certain extent, which reduced the
training complexity but did not decrease the performance. The developed
model outperforms the tuned Fast Block Normalized Least-Mean-Square
algorithm in reconvergence speed and steady-state performance in far-end
single talk and double talk. Furthermore, our model repeatedly
outperforms the tuned diagonalized Kalman filter in certain scenarios
and offers significantly improved overall performance in single talk.
and
*Monday, April 25, 2022*
*Speaker*: Alexej Sobolew
*Time*: 04:00 p.m.
*Location*:
https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting-ID: 979 0415 7921
Password: 481650
*Master-Lecture*: Investigation of Generative Neural Networks for Speech
Enhancement
Speech enhancement aims to reduce noise in speech signals and is widely
used in hearing aids and mobile speech communication. Speech synthesis,
on the other hand, aims to generate high-quality human speech and is
used, e.g., in text-to-speech generation. Noise reduction and speech
synthesis can be combined since conventional noise reduction methods
often only improve the magnitude spectrum and keep the noisy phase.
However, the phase has an important influence on speech quality and
intelligibility. In addition, training neural networks with complex
spectrograms is more difficult, so it is reasonable to first denoise the
magnitude spectrum and subsequently synthesize the waveform of the
speech based on it. The applications mentioned often require low
execution times and low computational overhead. This is achievable by
exploiting parallel processors using the non-autoregressive property and
by reducing the number of parameters in the neural network. Hence, this
thesis aims to investigate noise reduction, speech synthesis, and their
joint interaction.
In this thesis, the first use case considered is phase reconstruction
and speech synthesis based on clean data. A non-autoregressive
three-stage speech enhancement system is developed for the second use
case of combined noise reduction and speech synthesis based on magnitude
spectra. For speech synthesis on clean magnitude spectra such as
mel-spectrograms, the neural network called WaveGlow from Nvidia is
taken as a basis. WaveGlow achieves subjective performance similar to
that of the Griffin-Lim algorithm but is better suited for fast
applications. To reduce the number of parameters in WaveGlow,
SqueezeWave is used, resulting in a decrease in the number of
parameters and the inference time by up to 70%. During the switch to
additional noise reduction
besides speech synthesis, it is shown that WaveGlow alone is not
suitable for performing both tasks simultaneously. Consequently, the
problem is divided into three stages: masking, inpainting, and
synthesis. The models for masking and inpainting are adapted to
mel-spectrograms and studied in detail for noise reduction. As a result,
they are able to reduce the noise significantly. It is worth noting that
in this thesis the mel-filter is used as a downsampling step adapted to
human perception due to its non-linear property, resulting in a
reduction of the number of computations in the first two models.
Subsequently, the performance of the entire three-stage speech
enhancement system is investigated. It improves the speech quality and
intelligibility of noisy data while exploiting parallel processors and
can compete with existing state-of-the-art methods. The system achieves
better noise reduction than the Convolutional Recurrent Network (CRN)
and additionally does not rely on the noisy phase.
All interested parties are cordially invited, registration is not required.
General information on the colloquium, as well as a current list of
dates of the Communication Technology Colloquium can be found at:
http://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Irina Esser
Institute of Communication Systems (IKS)
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26958 (phone)
ronkartz(a)iks.rwth-aachen.de
http://www.iks.rwth-aachen.de/