
Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication Technology Colloquium.
*Wednesday, January 18, 2022*
*Speaker*: Konstantin Wehmeyer
*Time:* 2:00 p.m.
*Location*: hybrid - Lecture room 4G and https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting ID: 979 0415 7921
Password: 481650
*Bachelor-Lecture:* *Deep Learning-Based Speech Synthesis as Post-Processing of a Noise Reduction*
/Audio and speech signals are often disturbed by noise in frequency- and/or time-limited regions. Several methods, including deep learning-based approaches, are known for attenuating or removing these distortions. Often, however, only the magnitude spectrum is processed and the phase spectrum is taken over unchanged because of its comparatively lower relevance. Consequently, the noisy phase is reused when synthesizing the waveform from the processed magnitude spectrum. Distortions can therefore be reduced in the magnitude spectrum but not in the phase spectrum, which inevitably leads to a deterioration in speech quality and intelligibility./
/This thesis presents methods that allow the phase spectrum of speech signals to be reconstructed from noise-reduced magnitude spectra. A phase reconstruction algorithm was developed at the Institute of Communication Systems at RWTH Aachen University and has already been evaluated in a previous study for the case of smoothed magnitude spectra. It was shown that the deep neural network (DNN) used can benefit from targeted training on the smoothed magnitude spectra even without further modification of the network structure. However, even slight smearing of the magnitude spectra leads to a significant loss in performance compared to the use of perfect magnitude spectra. In this work, therefore, the DNNs used are optimized for the case of noise-reduced magnitude spectra./
/Several deep learning-based models are introduced and compared with each other and with the models already developed. Their properties and aspects such as causality are addressed. Moreover, a new loss function and a new assessment measure, specifically designed to estimate and assess the phase spectrum of speech signals, are developed and tested. To evaluate the results as independently as possible of a specific type of noise reduction, ideal masks are developed, used, and discussed./
All interested parties are cordially invited; registration is not required.
General information on the colloquium, as well as a current list of dates of the Communication Technology Colloquium, can be found at: https://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Simone Sedgwick
Institute of Communication Systems (IKS)
Prof. Dr.-Ing. Peter Jax
RWTH Aachen University
Muffeter Weg 3a, 52074 Aachen, Germany
+49 241 80 26956 (phone)
+49 241 80 22254 (fax)
sedgwick@iks.rwth-aachen.de
https://www.iks.rwth-aachen.de/

Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication Technology Colloquium.
*Thursday, March 9, 2023*
*Speaker*: Maximilian Tillmann
*Time:* 2:00 p.m.
*Location*: hybrid - Lecture room 4G and https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting ID: 979 0415 7921
Password: 481650
*Master-Lecture:* *Investigations on Autoencoder Models for Online System Identification*
/Speech communication devices are indispensable in our daily work and personal lives. Using them in hands-free mode can create an echo signal which, if no action is taken, would disturb the speaker. However, the echo signal can be predicted when the impulse response between loudspeaker and microphone is known. System identification algorithms exist for this task, such as the Least-Mean-Square (LMS) algorithm, the Normalized-Least-Mean-Square (NLMS) algorithm, and the Kalman filter. They work well in general, but face difficulties when confronted with highly correlated input signals, high noise levels, or impulse responses that change rapidly over time./
/This thesis aims to explore whether prior knowledge about the impulse response can improve system identification. The key approach is to utilize the manifold hypothesis, which has shown promising results in previous works in mapping acoustic room impulse responses to a lower-dimensional subspace. These approaches require training data of impulse responses. This thesis investigates how well affine subspace models can represent impulse responses with a limited number of subspace components, compared to the same number of components in the time domain. One well-known way to find an optimal affine subspace is principal component analysis (PCA). It is shown that the affine subspace model can reach the same achievable system mismatch with significantly fewer subspace components when the loudspeaker and the microphone are constrained in their positions./
/The manifold LMS algorithm, the manifold NLMS algorithm, and the manifold Kalman filter are proposed in this thesis; they can utilise general nonlinear manifolds for the acoustic echo compensation task. For the manifold LMS and NLMS algorithms, in the case of white noise excitation and an affine manifold, the expected convergence speed and the expected steady-state system mismatch are derived theoretically and are shown to accurately describe the algorithms' behaviour in simulations. For scenarios with constrained loudspeaker and microphone positions, it is shown that the manifold NLMS algorithm significantly outperforms the time-domain NLMS algorithm. The manifold Kalman filter is compared to the time-domain Kalman filter and to another subspace approach from the literature./
All interested parties are cordially invited; registration is not required.
General information on the colloquium, as well as a current list of dates of the Communication Technology Colloquium, can be found at: https://www.iks.rwth-aachen.de/aktuelles/kolloquium
--
Simone Sedgwick

Dear subscribers of the colloquium newsletter,
I am sorry, but I have to inform you that the date of the lecture announced on February 27, 2023 (see above) has changed:
*Wednesday, March 8, 2023*
*Speaker*: Maximilian Tillmann
*Time:* 11:00 a.m.
*Location*: hybrid - Lecture room 4G and https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting ID: 979 0415 7921
Password: 481650
*Master-Lecture:* *Investigations on Autoencoder Models for Online System Identification*
For the abstract, see the original announcement above.
Simone Sedgwick
Secretariat

Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication Technology Colloquium.
*Friday, April 28, 2023*
*Speaker*: Helena Janning
*Time:* 11:00 a.m.
*Location*: hybrid - Lecture room 4G and https://rwth.zoom.us/j/97904157921?pwd=SWpsbDl0MWhrWjY1ZkZaeFRoYmErZz09
Meeting ID: 979 0415 7921
Password: 481650
*Bachelor-Lecture:* *Direction-of-Arrival Estimation in Headphones using Subspace Methods*
(German title: *Einfallsrichtungsschätzung bei Kopfhörern mittels Unterraummethoden*)
Different algorithms exist for estimating the direction of arrival of a sound source; they vary in accuracy, robustness, computational complexity, and memory requirements. The thesis presents a modified Steered-Response Power (SRP) algorithm with a Minimum Variance Distortionless Response (MVDR) beamformer whose computational complexity and memory requirements are reduced compared to the original algorithm. For this purpose, the auto- and cross-energy spectral densities of the device-related transfer functions are approximated using a principal component analysis. It is shown that the maximum complexity saving is 97.23% and that the memory requirement can be reduced by 91.72%. Depending on the requirements for the estimation quality, further savings can be achieved.
All interested parties are cordially invited; registration is not required.
General information on the colloquium, as well as a current list of dates of the Communication Technology Colloquium, can be found at: https://www.iks.rwth-aachen.de/aktuelles/kolloquium
Simone Sedgwick
Secretariat

Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication Technology Colloquium.
*Friday, October 20, 2023*
*Speaker*: Elgiz Coskun
*Time:* 11:00 a.m.
*Location*: hybrid - Lecture room 4G and https://rwth.zoom.us/j/61215027648?pwd=MTJvayt5bkdka04raWZVempPZGE0Zz09
Meeting ID: 612 1502 7648
Password: 380386
*Master-Lecture:* *Optimization of an Instrumental Audio Quality Assessment Approach Using Machine Learning Methods*
The assessment of audio system playback quality involves diverse methods, including auditory tests, technical parameter measurements, and instrumental evaluation techniques. The instrumental methods emulate human auditory perception using algorithmic steps, transforming analysis results into perceptual scales. Recent advancements include applying Machine Learning (ML) and Deep Learning (DL) to instrumental assessment, enhancing prediction quality and operational efficiency.
This study proposes a double-ended model for audio quality prediction, aiming to match the prediction quality of an existing method, the Multi-Dimensional Audio Quality Score (MDAQS), while improving efficiency. A Deep Neural Network (DNN) model is designed, utilizing a Convolutional Neural Network (CNN) encoder for feature extraction, self-attention for time-weighting, and specialized attention pooling. Data is gathered from binaural measurements using various audio systems and augmented to enhance model resilience. Preprocessing includes labeling and domain transformation using a sophisticated hearing model. The model is trained with preprocessed labeled data, and its prediction results are compared to the target scores obtained via MDAQS.
Next, the pretrained model is extended to predict quality dimensions directly comparable to auditory results in listening tests. Parameters of the pretrained model are kept fixed during the second training phase because auditory data is limited. Predictions are evaluated using metrics that account for the uncertainty of the auditory results.
All interested parties are cordially invited; registration is not required.
General information on the colloquium, as well as a current list of dates of the Communication Technology Colloquium, can be found at: https://www.iks.rwth-aachen.de/aktuelles/kolloquium
Simone Sedgwick
Secretariat

Dear subscribers of the colloquium newsletter,
please note that the lecture by Elgiz Coskun, announced on October 11, 2023 (see above), will take place on Friday, October 20, 2023, *at 10:45 a.m.*
Kind regards,
Simone Sedgwick

Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication Technology Colloquium.
*Wednesday, April 10, 2024*
*Speaker*: Henning Konermann
*Time:* 2:30 p.m.
*Location*: hybrid - Lecture room 4G and https://rwth.zoom.us/j/61215027648?pwd=MTJvayt5bkdka04raWZVempPZGE0Zz09
Meeting ID: 612 1502 7648
Password: 380386
*Master-Lecture:* *Investigations on Phase-Aware Speech Enhancement Using Deep Neural Networks*
Speech enhancement aims to improve speech quality and intelligibility by removing noise from noisy speech signals. This is crucial in various applications, from telecommunications to hearing aids. Machine learning (ML)-based speech enhancement has become mainstream and is used in hundreds of millions of devices. Historically, the phase component was considered unimportant for this task when using the analysis-modification-synthesis approach. However, with the rise of ML and, in particular, deep neural networks (DNNs), phase processing has become increasingly important.
This thesis presents an in-depth study of phase-aware speech enhancement using DNNs, initially focusing on the theoretical benefits of integrating phase information into the speech enhancement process through oracle experiments. A significant emphasis of this work is on the recently proposed Consistent-Inconsistent Phase (CIP) approach and on its advantages and disadvantages relative to phase estimation. Traditional magnitude estimation, with and without additional phase information, serves as the baseline for comparison.
It is demonstrated that CIP offers theoretical advantages over pure phase estimation and could, in theory, perform as well as magnitude estimation without additional phase information while adopting the noisy phase for synthesis. However, when the theoretical results are validated by replicating the experiments with state-of-the-art DNNs, the practical implementation does not fully realize this theoretical potential. Only in the context of background noise removal does a combination of magnitude and CIP estimation prove clearly superior to the other techniques evaluated in this study. The estimation of the CIP emerges as a viable alternative to direct estimation of the clean phase, especially in noise-dominated signals.
All interested parties are cordially invited; registration is not required.
General information on the colloquium, as well as a current list of dates of the Communication Technology Colloquium, can be found at: https://www.iks.rwth-aachen.de/aktuelles/kolloquium
Simone Sedgwick
Secretariat

Dear subscribers of the colloquium newsletter,
please note that the lecture by Henning Konermann on Wednesday, April 10, 2024, announced on April 3, 2024 (see above), will take place at *1:30 p.m.*
Kind regards,
Simone Sedgwick
Secretariat

Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication Technology Colloquium.
*Thursday, October 26, 2023*
*Speaker*: Abisman Balachanthiran
*Time:* 11:00 a.m.
*Location*: hybrid - Lecture room 4G and https://rwth.zoom.us/j/61215027648?pwd=MTJvayt5bkdka04raWZVempPZGE0Zz09
Meeting ID: 612 1502 7648
Password: 380386
*Bachelor-Lecture:* *Evaluation of Dereverberation Methods in the Context of Spatial Modification of Binaural Signals*
A new binaural signal modification system based on the Binaural Cue Adaptation (BCA) algorithm is proposed and evaluated against existing BCA algorithms. The BCA algorithm dynamically adjusts binaural signals according to the listener’s head movements using a head tracker. This is done by splitting the signal into two components. In contrast to the baseline BCA system, the system proposed and evaluated in this thesis considers the reverberation as temporally correlated with the source signal. With the implementation of the Weighted Prediction Error (WPE) dereverberation algorithm, the BCA system has a component that explicitly removes reverberation. In addition, an online implementation of the BCA system is presented for evaluation purposes.
The three systems presented in this thesis - the baseline BCA system, the proposed offline BCA system and the online BCA system - were evaluated using a MUSHRA listening test. Based on questions about basic audio quality, source direction and reverberation perception, 15 participants evaluated the three systems for 4 different scenarios. No statistically significant difference between the evaluated systems was found in terms of source localisation quality. Statistically significant differences were found in the ratings of basic audio quality and the perception of reverberation, with the system using an online implementation of WPE performing best.
All interested parties are cordially invited; registration is not required.
General information on the colloquium, as well as a current list of dates of the Communication Technology Colloquium, can be found at: https://www.iks.rwth-aachen.de/aktuelles/kolloquium
Simone Sedgwick
Secretariat

Dear subscribers of the colloquium newsletter,
we are happy to inform you about the next date of our Communication Technology Colloquium.
*Monday, May 5, 2025*
*Speaker*: Lars Thieling, M.Sc.
*Time:* 2:00 p.m.
*Location*: hybrid - Lecture room 4G and https://rwth.zoom.us/j/61215027648?pwd=MTJvayt5bkdka04raWZVempPZGE0Zz09
Meeting ID: 612 1502 7648
Password: 380386
Abstract of the Dissertation
Phase-Aware Spectral Speech Enhancement Using Deep Learning Techniques
by Lars Thieling, M.Sc.
Motivation, Goal, and Task of the Dissertation
Speech communication is crucial for human interaction and has become central in various domains like entertainment, education, and healthcare. However, speech signals often suffer from impairments due to, e.g., background noise, reverberation, acoustic echo, limited bandwidth, and packet losses. These impairments lead to degraded speech quality and intelligibility, ultimately resulting in unsatisfactory communication experiences. To mitigate the degradations, speech enhancement techniques are required.
In recent years, speech enhancement approaches leveraging deep learning (DL) have achieved significant advancements and established new benchmarks in the field. Particularly in noise reduction, deep neural networks (DNNs) have facilitated enhancements even in challenging scenarios characterized by very low signal-to-noise ratios (SNRs) and highly non-stationary noise environments. Most speech enhancement methods operate in the spectral domain and traditionally focus on processing the magnitude spectrum, as the phase spectrum is often deemed less relevant under moderate SNR conditions. However, the remarkable success of DNNs in these magnitude-controlled approaches at low SNR levels has increased the need for enhancing the phase, leading to a greater research focus on this aspect. Nowadays, many modern speech enhancement approaches are phase-aware, meaning they estimate both the magnitude and phase spectrum simultaneously.
The task of this thesis is to develop concepts and algorithms in the emerging topic of phase-aware speech enhancement using DNNs, with the aim of providing new insights and directions for the next generation of speech enhancement approaches.
Major Scientific Contributions
The first major contribution of the dissertation is a novel magnitude-controlled two-stage speech enhancement (TSSE) approach, which is designed to be effective under challenging SNR conditions. This method consists of a mask estimation stage and a speech extraction stage, with each stage utilizing a DNN specifically designed for its respective task. Unlike other state-of-the-art solutions that perform masking of the noisy spectrum in the first stage, the estimated mask is utilized as prior information in the second stage. Hence, the mask provides a rough classification into speech- and noise-dominated regions and facilitates precise extraction of the speech, thereby eliminating the need to restore it only from the unmasked areas.
The second major contribution of the dissertation is a novel two-stage phase reconstruction (TSPR) approach. For a given magnitude spectrum, this method first estimates phase derivatives using DNNs and then combines these estimates into a unified phase spectrum that can be utilized for speech synthesis. In the TSPR approach, modifications are proposed for both stages. For the first stage, a preprocessing step and a new loss function are introduced that simplify the DNN training and stabilize it against hyperparameter variations. In the second stage, a new phase combination method is presented that recursively calculates each time-frequency entry of the phase spectrum from the estimated phase derivatives in its local vicinity. Compared to other state-of-the-art combination methods, this approach leverages the magnitude spectrum as prior information, ultimately resulting in improved performance.
The third major contribution of the dissertation is the development of novel phase spectra that can be utilized in phase-controlled speech enhancement approaches. A silence-generating phase is introduced, which achieves perfect cancellation through destructive interference of the time signals from adjacent frames during synthesis. By suitably combining this silence-generating phase with the clean speech phase, a combined consistent-inconsistent phase (CIP) is developed. This CIP enables noise reduction through pure modification of the phase without altering the noisy magnitude spectrum before synthesis. Using this CIP in a phase-controlled approach performed similarly to or even better than a magnitude-controlled approach using the clean magnitude spectrum, highlighting the remarkable potential of phase processing in speech enhancement.
The fourth major contribution of the dissertation is the phase-aware two-stage speech enhancement (PATSSE) approach. This PATSSE approach is a phase-aware extension of the magnitude-controlled TSSE approach, which not only predicts the clean speech magnitude spectrum but also estimates the proposed CIP by building upon and extending concepts from the TSPR approach. Specifically, there is a distinction between a separated and a joint PATSSE. While the separated PATSSE approach generates independent estimates for the magnitude and phase spectra, the joint PATSSE approach introduces an additional joint loss term to optimize these estimates simultaneously. For the joint approach, a perceptually motivated loss is proposed, which considers aspects of human perception and therefore generally has an increased correlation with subjective listening results. Objective and subjective evaluation results demonstrate the effectiveness of both the additional estimation of CIP in the separated approach and the simultaneous optimization of the estimates in the joint approach.
All interested parties are cordially invited; registration is not required.
General information on the colloquium, as well as a current list of dates of the Communication Technology Colloquium, can be found at: https://www.iks.rwth-aachen.de/aktuelles/kolloquium
Simone Sedgwick
Secretariat