A hybrid model for weakly supervised speech dereverberation

Louis Bahrman, Mathieu Fontaine, Gaël Richard

LTCI, Télécom Paris, IP-Paris, France

ICASSP 2025



Block diagram

Abstract

This paper introduces a new training strategy to improve speech dereverberation systems using minimal acoustic information and reverberant (wet) speech. Most existing algorithms rely on paired dry/wet data, which is difficult to obtain, or on target metrics that may not adequately capture reverberation characteristics and can lead to poor results on non-target metrics. Our approach uses limited acoustic information, such as the reverberation time (RT60), to train a dereverberation system. The system’s output is resynthesized using a generated room impulse response and compared with the original reverberant speech, which yields a novel reverberation matching loss that replaces the standard target metrics. During inference, only the trained dereverberation model is used. Experimental results demonstrate that our method achieves more consistent performance than state-of-the-art methods across the objective metrics commonly used in speech dereverberation.
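The training loop described in the abstract can be illustrated with a minimal sketch. It is not the released implementation: the room impulse response generator below is a simple exponentially decaying Gaussian noise model, and the comparison is a plain waveform mean squared error, both of which are assumptions chosen for brevity. The names `synth_rir` and `reverberation_matching_loss`, the 16 kHz sampling rate, and the fixed seed are hypothetical.

```python
import numpy as np


def synth_rir(rt60, sr=16000, seed=0):
    """Hypothetical RIR generator: Gaussian noise under an exponential
    envelope whose energy has dropped by 60 dB after rt60 seconds."""
    rng = np.random.default_rng(seed)
    n = int(rt60 * sr)
    t = np.arange(n) / sr
    # A 60 dB energy decay corresponds to an amplitude factor of 10**-3 at t = rt60.
    envelope = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # crude direct path (assumption)
    return rir / np.linalg.norm(rir)


def reverberation_matching_loss(dry_estimate, wet, rt60, sr=16000):
    """Re-reverberate the dereverberated estimate with an RIR generated
    from RT60 alone, then compare it with the observed wet signal."""
    rir = synth_rir(rt60, sr)
    resynth = np.convolve(dry_estimate, rir)[: len(wet)]
    # Waveform MSE stands in for the distance used in the paper (assumption).
    return float(np.mean((resynth - wet) ** 2))


# Usage: dry_estimate would be the output of the dereverberation network.
dry_estimate = np.random.default_rng(1).standard_normal(4000)
wet = np.convolve(dry_estimate, synth_rir(0.5))[:4000]
loss = reverberation_matching_loss(dry_estimate, wet, rt60=0.5)
```

The point of the sketch is that no dry reference appears in the loss: only the wet recording and its RT60 are needed during training, and the RIR generator is dropped at inference.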

Audio examples

Examples are randomly drawn from several RT60 classes. 'WS' denotes weak supervision, either by RT60 (proposed) or by SRMR (baseline). Listening with headphones is recommended, as our proposed approaches mostly remove the late reverberation.

Columns: Wet input | Ground truth | FSN (proposed) | FSN | BiLSTM (proposed) | BiLSTM | Baseline
Rows: WS | RT60 = 0.14 s | RT60 = 0.55 s | RT60 = 0.85 s

Citing this work

If you use this work in your research or business, please cite it using the following BibTeX entry:

        
@INPROCEEDINGS{10888095,
  author={Bahrman, Louis and Fontaine, Mathieu and Richard, Gaël},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={A Hybrid Model for Weakly-Supervised Speech Dereverberation}, 
  year={2025},
  pages={1-5},
  doi={10.1109/ICASSP49660.2025.10888095}}