This repository contains the pretrained DTLN-aec model for real-time acoustic echo cancellation.

Overview

DTLN-aec

This repository contains the pretrained DTLN-aec model for real-time acoustic echo cancellation in TF-lite format. The model was submitted to the acoustic echo cancellation challenge (AEC-Challenge) organized by Microsoft and is among the top five models of the challenge. The results of the AEC-Challenge can be found here.

The model was trained on data from the DNS-Challenge and the AEC-Challenge repositories.

The arXiv preprint can be found here.

@article{westhausen2020acoustic,
  title={Acoustic echo cancellation with the dual-signal transformation LSTM network},
  author={Westhausen, Nils L. and Meyer, Bernd T.},
  journal={arXiv preprint arXiv:2010.14337},
  year={2020}
}

Author: Nils L. Westhausen (Communication Acoustics, Carl von Ossietzky University, Oldenburg, Germany)

This code is licensed under the terms of the MIT license.


Contents:

This repository contains three pretrained models of different sizes:

  • dtln_aec_128 (model with 128 LSTM units per layer, 1.8M parameters)
  • dtln_aec_256 (model with 256 LSTM units per layer, 3.9M parameters)
  • dtln_aec_512 (model with 512 LSTM units per layer, 10.4M parameters)

The dtln_aec_512 model was submitted to the challenge.


Usage:

First, install the dependencies from requirements.txt.

Afterwards the model can be tested with:

$ python run_aec.py -i /folder/with/input/files -o /target/folder/ -m ./pretrained_models/dtln_aec_512

Files for testing can be found in the AEC-Challenge repository. The file naming convention is *_mic.wav for the near-end microphone signals and *_lpb.wav for the far-end microphone or loopback signals. The folder audio_samples contains one audio sample for each condition. The *_processed.wav files were created by the dtln_aec_512 model.
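
Because every microphone file needs a matching loopback file, it can help to check an input folder before processing it. The following is a minimal sketch of such a check, based only on the *_mic.wav / *_lpb.wav naming convention described above. The helper names pair_aec_files and missing_loopback are hypothetical and are not part of run_aec.py, whose internal file handling may differ.

```python
from pathlib import Path

MIC_SUFFIX = "_mic.wav"
LPB_SUFFIX = "_lpb.wav"


def _loopback_for(mic_path):
    """Build the expected loopback path for a given *_mic.wav path."""
    stem = mic_path.name[: -len(MIC_SUFFIX)]
    return mic_path.with_name(stem + LPB_SUFFIX)


def pair_aec_files(folder):
    """Return sorted (mic, lpb) path pairs for which both files exist.

    Hypothetical helper: illustrates the naming convention only.
    """
    pairs = []
    for mic in sorted(Path(folder).glob("*" + MIC_SUFFIX)):
        lpb = _loopback_for(mic)
        if lpb.exists():
            pairs.append((mic, lpb))
    return pairs


def missing_loopback(folder):
    """Return sorted *_mic.wav paths that have no matching *_lpb.wav file."""
    return [
        mic
        for mic in sorted(Path(folder).glob("*" + MIC_SUFFIX))
        if not _loopback_for(mic).exists()
    ]


if __name__ == "__main__":
    # "audio_samples" is the sample folder mentioned in this README.
    for mic, lpb in pair_aec_files("audio_samples"):
        print(f"{mic.name} <-> {lpb.name}")
    for mic in missing_loopback("audio_samples"):
        print(f"no loopback file for {mic.name}")
```

Running the check on the input folder first shows which microphone files have no matching loopback file.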


This repository is still under construction.

Comments
  • Can you open source your crowdsourced test data and results?

    Hi. Do you know the NISQA (Non-Intrusive Speech Quality Assessment) project? The author's current model is focused on distortions that occur in communication networks, not on speech enhancement. Can you fine-tune this model with your data so that it can cover front-end signal processing? Thanks!

    opened by zuowanbushiwo 0
  • Some questions about concatenate operation in DTLN-aec model?

    Hi, breizhn~ I have some questions about the concatenate operation. I want to know whether the features of the microphone and the loopback signal are concatenated in the channel dimension or the time dimension. I'm looking forward to your reply! Good luck!

    opened by xk2016 0
  • Does DTLN-aec also contain the noise suppression?

    I want to use DTLN-aec in real-time communication. Does DTLN-aec also contain noise suppression, or can it be combined with other ANS/AGC modules? For example, with an audio processing sequence like: DTLN-aec -> ANS (DTLN-like) -> AGC?

    Best Regards

    opened by cloudvc 0
  • Just Questions this time :)

    Thanks nils @breizhn with the tflite :) apols for being dumb. If you ever have the time to answer then please do.

    I have been wondering if https://www.tensorflow.org/lite/examples/on_device_training/overview could be used to increase accuracy. I have been reading about the proposed DTLN-aec model architecture, which has a similar effect to my first tries with tflite-dtln, and just thought I would ask: do you have any code examples for training?

    opened by StuartIanNaylor 0
  • About the training target: nearend speech

    Thanks for your great work. I have a problem with the training target. I do not know which of the following I should use as the training target: near-end speech with RIR and noise, near-end speech with RIR, or pure near-end speech. After reading the paper, I did some tests but got terrible training losses when selecting pure near-end speech as the training target. I got some good results when selecting near-end speech with RIR and noise as the training target. I would appreciate any advice. Looking forward to your reply. @breizhn

    opened by liziru 0
  • How to use the model to generate the echo cancelled file.

    Hi,

    I used your dtln repo to generate a bunch of noise-suppressed sound files by simply following this: $ python run_evaluation.py -i in/folder/with/wav -o target/folder/processed/files -m ./pretrained_model/model.h5

    I want to use your aec model to generate the echo-suppressed files. It doesn't seem to work with $ python run_aec.py -i /folder/with/input/files -o /target/folder/ -m ./pretrained_models/dtln_aec_512

    It looks like the model needs both the mic and the lpb file to generate the processed file. Do I understand that correctly? Would it be possible to generate the enhanced file the same way as with dtln?

    Thanks,

    opened by victkid 0
Owner
Nils L. Westhausen
PhD candidate at the Communication Acoustics group at the University of Oldenburg. Working on speech enhancement and separation.