This repository contains the pretrained DTLN-aec model for real-time acoustic echo cancellation.

Nils L. Westhausen

Last update: Jan 7, 2023

Overview

DTLN-aec

This repository contains the pretrained DTLN-aec model for real-time acoustic echo cancellation in TF-Lite format. The model was submitted to the acoustic echo cancellation challenge (AEC-Challenge) organized by Microsoft and ranked among the top five models of the challenge. The results of the AEC-Challenge can be found here.

The model was trained on data from the DNS-Challenge and the AEC-Challenge repositories.

The arXiv preprint can be found here.

@article{westhausen2020acoustic,
  title={Acoustic echo cancellation with the dual-signal transformation LSTM network},
  author={Westhausen, Nils L. and Meyer, Bernd T.},
  journal={arXiv preprint arXiv:2010.14337},
  year={2020}
}

Author: Nils L. Westhausen (Communication Acoustics, Carl von Ossietzky University, Oldenburg, Germany)

This code is licensed under the terms of the MIT license.

Usage:

First install the dependencies from requirements.txt (for example with pip install -r requirements.txt).

Afterwards the model can be tested with:

$ python run_aec.py -i /folder/with/input/files -o /target/folder/ -m ./pretrained_models/dtln_aec_512

Files for testing can be found in the AEC-Challenge repository. The file naming convention is *_mic.wav for the near-end microphone signals and *_lpb.wav for the far-end microphone or loopback signals. The folder audio_samples contains one audio sample for each condition. The *_processed.wav files were created with the dtln_aec_512 model.
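The naming convention above implies that every *_mic.wav file needs a matching *_lpb.wav file in the same input folder. The following is a minimal sketch for checking a folder before running the model. The function name pair_aec_files is hypothetical and not part of this repository, and the sketch assumes both files of a pair sit in the same directory.

```python
from pathlib import Path


def pair_aec_files(input_dir):
    """Return (pairs, missing) for a folder that follows the naming convention.

    pairs   -- list of (mic_path, lpb_path) tuples where both files exist
    missing -- list of *_mic.wav paths without a matching *_lpb.wav file

    Hypothetical helper for illustration; not part of the repository.
    """
    pairs = []
    missing = []
    for mic_path in sorted(Path(input_dir).glob("*_mic.wav")):
        # Swap the "_mic.wav" ending for "_lpb.wav" to get the loopback name.
        stem = mic_path.name[: -len("_mic.wav")]
        lpb_path = mic_path.with_name(stem + "_lpb.wav")
        if lpb_path.is_file():
            pairs.append((mic_path, lpb_path))
        else:
            missing.append(mic_path)
    return pairs, missing
```

Calling pair_aec_files("audio_samples") would list the sample pairs shipped with the repository, and any entry in missing points to a microphone file that has no loopback signal next to it.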

This repository is still under construction.

Comments

Can you open source your crowdsourced test data and results?

Hi, do you know the NISQA (Non-Intrusive Speech Quality Assessment) project? The author's current model focuses on distortions that occur in communication networks, not on speech enhancement. Could you fine-tune this model with your data so that it also covers front-end signal processing? Thanks!

opened by zuowanbushiwo 0
Some questions about concatenate operation in DTLN-aec model?

Hi, breizhn~ I have a question about the concatenate operation. Are the features of the microphone and the loopback signal concatenated along the channel dimension or the time dimension? I'm looking forward to your reply! Good luck!

opened by xk2016 0
Does DTLN-aec also contain the noise suppression?

I want to use DTLN-aec in real-time communication. Does DTLN-aec also include noise suppression, or can it be combined with other ANS/AGC modules? For example, with a processing sequence like: DTLN-aec -> ANS (DTLN-like) -> AGC?

Best Regards

opened by cloudvc 0
Just Questions this time :)

Thanks nils @breizhn with the tflite :) apols for being dumb. If you ever have the time to answer then please do.

I have been wondering if https://www.tensorflow.org/lite/examples/on_device_training/overview could be used to increase accuracy. I have been reading about the proposed DTLN-aec model architecture, which has a similar effect to my first tries with tflite-dtln, and just thought I would ask: do you have any code examples for training?

opened by StuartIanNaylor 0
About the training target: nearend speech

Thanks for your great work. I have a problem with the training target. I do not know which of the following I should use as the training target: near-end speech with RIR and noise, near-end speech with RIR, or pure near-end speech. After reading the paper, I ran some tests but got terrible training losses when selecting pure near-end speech as the training target. I got good results when selecting near-end speech with RIR and noise as the training target. I would appreciate any advice. Looking forward to your reply. @breizhn

opened by liziru 0
How to use the model to generate the echo cancelled file.

Hi,

I used your dtln repo to generate a bunch of noise-suppressed sound files by simply following this: $ python run_evaluation.py -i in/folder/with/wav -o target/folder/processed/files -m ./pretrained_model/model.h5

I want to use your aec model to generate echo-suppressed files. It doesn't seem to work with $ python run_aec.py -i /folder/with/input/files -o /target/folder/ -m ./pretrained_models/dtln_aec_512

It looks like the model needs both the mic and the lpb file to generate the processed file. Do I understand that correctly? Would it be possible to generate the enhanced file the same way as with dtln?

Thanks,

opened by victkid 0

Owner

Nils L. Westhausen

PhD candidate at the Communication Acoustics group at the University of Oldenburg. Working on speech enhancement and separation.

