[NeurIPS 2020] Official repository for the project "Listening to Sounds of Silence for Speech Denoising"

Overview

Listening to Sounds of Silence for Speech Denoising

Introduction

This is the repository of the "Listening to Sounds of Silence for Speech Denoising" project. (Project URL: here) Our approach is based on a key observation about human speech: there is often a short pause between each sentence or word. In a recorded speech signal, those pauses introduce a series of time periods during which only noise is present. We leverage these incidental silent intervals to learn a model for automatic speech denoising given only mono-channel audio. Detected silent intervals over time expose not just pure noise but its time-varying features, allowing the model to learn noise dynamics and suppress it from the speech signal. An overview of our audio denoising network is shown here:

Silent Interval Detection Model

Our model has three components: (a) one that detects silent intervals over time and outputs a noise profile observed from the detected silent intervals; (b) another that estimates the full noise profile; and (c) yet another that cleans up the input signal.
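The flow of these three components can be summarized with the minimal sketch below. The function and module names are illustrative placeholders, not the repository's actual API; the real modules live in model_1_silent_interval_detection and model_2_audio_denoising.

def denoise(noisy_audio, silent_detector, noise_estimator, denoiser):
    """Illustrative end-to-end flow of the three components described above."""
    # (a) detect which time segments are silent, i.e. contain only noise
    silence_mask = silent_detector(noisy_audio)
    # the noise observed so far: the input restricted to the detected silent intervals
    observed_noise = noisy_audio * silence_mask
    # (b) estimate the full, time-varying noise profile from the partial observation
    noise_profile = noise_estimator(noisy_audio, observed_noise)
    # (c) clean up the input signal using the estimated noise profile
    return denoiser(noisy_audio, noise_profile)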

Dependencies

  • Python 3
  • PyTorch 1.3.0

You can install the requirements either into a virtual environment or system-wide via pip:

pip install -r requirements.txt

Data

Training and Testing

Our model is trained on publicly available audio datasets. We obtain clean speech signals using AVSPEECH, from which we randomly choose 2448 videos (4.5 hours in total) and extract their speech audio channels. Among them, we use 2214 videos for training and 234 videos for testing, so the training and testing speech signals are fully separate.

We use two datasets, DEMAND and Google’s AudioSet, as background noise. Both consist of environmental noise, transportation noise, music, and many other types of noise. DEMAND has been widely used in previous denoising work. AudioSet, however, is much larger and more diverse than DEMAND, and is therefore more challenging when used as noise.

Due to the linearity of acoustic wave propagation, we can superimpose clean speech signals with noise to synthesize noisy input signals. When synthesizing a noisy input signal, we randomly choose a signal-to-noise ratio (SNR) from seven discrete values: -10dB, -7dB, -3dB, 0dB, 3dB, 7dB, and 10dB; by mixing the foreground speech with properly scaled noise, we produce a noisy signal with the chosen SNR. For example, a -10dB SNR means that the power of the noise is ten times that of the speech. The SNR range in our evaluations (i.e., [-10dB, 10dB]) is significantly wider than those tested in previous works.
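As a concrete illustration of this mixing step, the sketch below scales a noise signal so that the mixture has a chosen SNR. It assumes both signals are equal-length mono NumPy float arrays; the helper name mix_at_snr is ours, not part of the repository.

import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then mix."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR(dB) = 10 * log10(P_speech / P_noise)  =>  P_noise = P_speech / 10^(SNR/10)
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / (noise_power + 1e-12))
    return speech + scaled_noise

# Randomly pick one of the seven SNR values used above, e.g.:
# snr = np.random.choice([-10, -7, -3, 0, 3, 7, 10])
# noisy = mix_at_snr(clean_speech, background_noise, snr)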

Dataset Structure (For inference)

Please organize the dataset directory as follows:

dataset/
├── audio1.wav
├── audio2.wav
├── audio3.wav
...

Please also provide a CSV file containing each audio file's file_name (without extension), one per line. For example:

audio1
audio2
audio3
...

An example is provided in the data/sounds_of_silence_audioonly_original directory.
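If you need to generate that CSV yourself, a minimal sketch is shown below, assuming the dataset/ layout above; the directory and output file names are placeholders.

import csv
from pathlib import Path

dataset_dir = Path("dataset")               # folder containing audio1.wav, audio2.wav, ...
csv_path = Path("sounds_of_silence.csv")    # output CSV consumed by the preprocessor

with csv_path.open("w", newline="") as f:
    writer = csv.writer(f)
    for wav in sorted(dataset_dir.glob("*.wav")):
        writer.writerow([wav.stem])         # file name without extension, one per row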

Data Preprocessing

To process the dataset, run the script:

python preprocessing/preprocessor_audioonly.py

Note: Please specify the dataset directory, the CSV file, and the output path inside preprocessor_audioonly.py. After running the script, the dataset directory will look like the data/sounds_of_silence_audioonly directory, with a JSON file (sounds_of_silence.json in this example) linking to the directory.
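For reference, the values to edit typically look like the hedged example below. The variable and function names (DIR, CSV, JSON, build_csv, build_json_better) follow the snippet quoted in the comments further down; the paths are placeholders to replace with your own.

# Inside preprocessing/preprocessor_audioonly.py (paths are placeholders):
DIR = "/path/to/dataset"                           # directory containing the .wav files
CSV = "/path/to/dataset/sounds_of_silence.csv"     # CSV listing the file names
JSON = "/path/to/output/sounds_of_silence.json"    # JSON produced by the script
build_csv(DIR, CSV, ext=".wav")                    # match the extension's case to your files
build_json_better(DIR, CSV, JSON, ext=".wav")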

Inference

Pretrained weights

You can download the pretrained weights from the authors here.

Step 1

  1. Go to model_1_silent_interval_detection directory
  2. Choose the audioonly_model
  3. Run
    CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0,1 python3 predict.py --ckpt 87 --save_results false --unknown_clean_signal true
  4. Run
    python3 create_data_from_pred.py --unknown_clean_signal true
  5. Outputs can be found in the model_output directory.

Step 2

  1. Go to model_2_audio_denoising directory
  2. Choose audio_denoising_model
  3. Run
    CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0 python3 predict.py --ckpt 24 --unknown_clean_signal true
  4. Outputs can be found in the model_output directory. The denoised result is called denoised_output.wav.

Command Parameters Explanation:

  1. --ckpt [number]: Refers to the pretrained model located in each model's output directory (model_output/{model_name}/model/ckpt_epoch{number}.pth).
  2. --save_results [true|false]: If true, intermediate audio results and waveform figures will be saved. We recommend leaving it false to speed up inference.
  3. --unknown_clean_signal [true|false]: If running inference on external data (data without known clean signals), please set it to true.

Contact

E-mail: [email protected]




© 2020 The Trustees of Columbia University in the City of New York. This work may be reproduced and distributed for academic non-commercial purposes only without further authorization, but rightsholder otherwise reserves all rights.

Comments
  • Error in data processing step

    I am trying to run your code. I installed all of the requirements, and then in the data processing phase I defined the directories in the preprocessor_audioonly.py file as follows (I edited only this part):

    SNR = [-10, -7, -3, 0, 3, 7, 10]
    for snr in tqdm(SNR):
        #DIR = '/proj/vondrick/rx2132/test_noise_robust_embedding/data/TIMIT/TEST_noisy_snr' + str(int(snr))
        DIR = '/home/mehri/data/PycharmProjects/denoise/data/sounds_of_silence_audioonly' + str(int(snr))
        CSV = '/home/mehri/data/PycharmProjects/denoise/data/sounds_of_silence_audioonly' + str(int(snr)) + '/sounds_of_silence' + str(int(snr)) + '.csv'
        build_csv(DIR, CSV, ext='.WAV')
        JSON = '/home/mehri/data/PycharmProjects/denoise/data/sounds_of_silence_audioonly' + str(int(snr)) + '.json'
        build_json_better(DIR, CSV, JSON, ext='.WAV')
    

    But I get the error

    Traceback (most recent call last):
      File "<input>", line 2, in <module>
    NameError: name 'tqdm' is not defined
    
    
    opened by ghost 8
  • Labels for silent detector

    Hi! First of all, thank you for sharing your code and your paper (it has been a pleasure to read). I have gone through your code looking for the place where you create the labels for your audio, without success. Could you briefly explain how you generate the labels used for training? I understand you divide the audio into segments of 1/30 seconds, but my question is this: the output of that network is set to 100, which means you need to label 100 segments. Am I correct? If you split a second into 30 you have 30 segments per second, so 2 seconds == 60 segments, i.e. [1,0,0,0,1,1,1,1,1,0.....,1] saying which segment is speech (==1) and which is silent (==0). Hopefully I am understanding correctly, so how do you match the 100 segments?

    opened by Toku11 3
  • metrics

    I have an issue with the PESQ metric. I see that you use the pypesq package to compute PESQ, but this package can only compute the narrow-band PESQ version. The DEMAND dataset has a 16k sampling rate, and the baseline models also report the wide-band PESQ version. The two values will be different, and the other three metrics related to PESQ will differ as well. Did the authors notice this?

    opened by yunzqq 2
  • Pretrained checkpoints availability

    Hi, are the pretrained model checkpoints you used to produce the results in the paper available for download? Running the code snippets you provided in the inference section fails because ckpt 87 and ckpt 24 are not in the model_output directories. Thanks!

    opened by yossing-audatic 2
  • What to do about pre-trained models and the ckpt argument?

    I carefully read the paper and loved it. I got most of the way through to make the code work for my use case, but I can't figure out what the purpose of the ckpt parameter is.

    Following the steps and fixing things along the way, I inevitably get to FileNotFoundError: [Errno 2] No such file or directory: '../model_output/audioonly_model/model/ckpt_epoch87.pth', which should be of no surprise, but I'm clueless as to what the workaround should be.

    Any guidance here please?

    opened by andreev-io 2
  • Error in step 2 of inference

    While running step 2 of inference, I encountered an error where the script tried to load a non-existent file: model_1_silent_interval_detection/model_output/audioonly_model/outputs/sounds_of_silence/recovered/sos_1_0000001_mixed.wav. At this point the script loads that file, which I think is supposed to be generated by step 1, but no such file was generated. Can you please look into this? Thanks.

    opened by krantiparida 1
  • data preprocessing

    Hello,

    Could you please attach a link with the final version of data that you used (i mean with wav files)?

    Thank you in advance, Sincerely yours, Aleksandra

    opened by SashaBurashnikova 1
  • Pretrained models

    Hi! Is it possible to get the models you used to produce your results, since they are not in the model_output directory? I would like to use them in my academic research at the university (I'm a student). Thanks!

    opened by popandopulogeo 1
  • label for silent frame detection

    For labeling, is the threshold 0.08 applied to the average energy of one frame? I cannot find the label-processing code. I used 0.08 but it does not work for me.

    opened by yunzqq 0
  • Dataset installation documentation is unclear

    I find your README unclear when I click the dataset links in the sentence "We use two datasets, DEMAND and Google’s AudioSet, as background noise." I don't see any installation process. Could you please explain it a little more specifically? I don't understand how I should obtain those datasets or how to install them.

    opened by amcdvg 0
  • Inference Question

    Could you provide more information as to how to run the inference models? For example, how do you modify the code to point to the dataset? Any extra details or examples would be greatly appreciated!

    opened by sstrelnikoff 4