HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Overview

HiFiGAN Denoiser

This is an unofficial PyTorch implementation of the paper HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks.

Citations

@misc{su2020hifigan,
      title={HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks}, 
      author={Jiaqi Su and Zeyu Jin and Adam Finkelstein},
      year={2020},
      eprint={2006.05694},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Requirements

Tested on Python 3.6

pip install -r requirements.txt

Train & Tensorboard

  • python train.py -c [config yaml file]

  • tensorboard --logdir log_dir

Inference

  • python inference.py -p [checkpoint path] -i [input wav path]

Checkpoint:

  • WIP

Comments
  • Tensorshape mismatch error when Postnet starts

    Hello, I've been trying to train a model, and when the postnet starts I run into the following error:

    Traceback (most recent call last):
      File "train.py", line 300, in <module>
        main()
      File "train.py", line 296, in main
        train(0, args, hp, hp_str)
      File "train.py", line 169, in train
        sc_loss_, mag_loss_ = stft_loss(y_g_hat[:, :, :y.size(2)].squeeze(1), y.squeeze(1))
      File "/home/guest/Supreeth/hifigan-denoiser/hifigan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/guest/Supreeth/hifigan-denoiser/stft_loss.py", line 130, in forward
        sc_l, mag_l = f(x, y)
      File "/home/guest/Supreeth/hifigan-denoiser/hifigan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/guest/Supreeth/hifigan-denoiser/stft_loss.py", line 91, in forward
        sc_loss = self.spectral_convergenge_loss(x_mag, y_mag)
      File "/home/guest/Supreeth/hifigan-denoiser/hifigan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/guest/Supreeth/hifigan-denoiser/stft_loss.py", line 46, in forward
        return torch.norm(y_mag - x_mag, p="fro") / torch.norm(y_mag, p="fro")
    RuntimeError: The size of tensor a (641) must match the size of tensor b (640) at non-singleton dimension 1
    

    Is there a fix for this? Thank you!

    opened by SupreethRao99 0
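A possible explanation, offered as a sketch rather than a confirmed diagnosis: in `y_mag - x_mag` the first operand (641) comes from the reference `y` and the second (640) from the generated signal, which is consistent with the generated waveform being slightly shorter than the target. The slice `y_g_hat[:, :, :y.size(2)]` can only shorten the generated signal, so it does not help in that case. Assuming the loss calls `torch.stft` with its default `center=True`, the frame count is `1 + num_samples // hop_length`, so a one-sample difference can change the number of frames. The hop length and sample counts below are hypothetical values chosen only to reproduce the 641/640 pair, and `stft_num_frames` and `common_length` are illustrative helpers, not functions from this repository.

```python
def stft_num_frames(num_samples: int, hop_length: int) -> int:
    """Frame count of torch.stft when center=True (its default)."""
    return 1 + num_samples // hop_length


def common_length(generated_len: int, target_len: int) -> int:
    """Length to which both waveforms can be trimmed before the loss."""
    return min(generated_len, target_len)


# Hypothetical numbers: hop of 50 samples, target of 32000 samples,
# generated signal one sample short of the target.
hop = 50
target_len, generated_len = 32000, 31999

target_frames = stft_num_frames(target_len, hop)
generated_frames = stft_num_frames(generated_len, hop)
print(target_frames, generated_frames)  # frame counts differ by one

# Trimming BOTH signals to the shorter length removes the mismatch.
n = common_length(generated_len, target_len)
trimmed_frames = stft_num_frames(n, hop)
print(n, trimmed_frames)
```

In `train.py` this would correspond to slicing both `y_g_hat` and `y` to the same last-dimension length before calling `stft_loss`. Whether that is the right fix depends on why the lengths diverge in the first place.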
  • Loss

    Hello Rishi,

    I am experimenting with speech bandwidth extension (narrowband to super-wideband) using this network without the postnet. I observe that the generator loss rises to a high value and fluctuates, yet when I evaluate on unseen signals I can reconstruct a super-wideband signal from the narrowband input.

    I am confused about model convergence. Could you please give some insight into how to judge convergence for this model?

    opened by saivinaypsv 2
  • Data simulation and augmentation

    Can you describe how you create the noisy audio for training?

    Is it the same as the procedure described in the paper?

    Are you using Kaldi or another tool for this, and can you share your noise dataset?

    Thanks, rishikksh!

    opened by v-nhandt21 8
  • KeyError: '__getstate__'

    Hi, thanks for open-sourcing your code! During training, I ran into an error with the command below.

    COMMAND

    python train.py -c config.yaml

    ERROR

    Initializing Training Process..
    Batch size per GPU : 0
    Traceback (most recent call last):
      File "train.py", line 304, in <module>
        main()
      File "train.py", line 298, in main
        mp.spawn(train, nprocs=hp.train.num_gpus, args=(args, hp, hp_str,))
      File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
        return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 148, in start_processes
        process.start()
      File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/multiprocessing/process.py", line 105, in start
        self._popen = self._Popen(self)
      File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
        return Popen(process_obj)
      File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
        super().__init__(process_obj)
      File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
        self._launch(process_obj)
      File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
        reduction.dump(process_obj, fp)
      File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    KeyError: '__getstate__'

    Could you tell me why this happens and how to solve it? Thank you! Have a nice day.

    opened by KevinBaylor 1
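One known way to get `KeyError: '__getstate__'` while pickling, sketched here under the assumption (not confirmed by this page) that the `hp` object passed to `mp.spawn` is a dict wrapper that maps attribute access directly onto key lookup. On Python versions before 3.11, pickling probes the object for an optional `__getstate__` attribute and expects `AttributeError` when it is absent; a wrapper that raises `KeyError` instead lets that exception escape. The classes `KeyErrorDotDict` and `SafeDotDict` and the helper `probe` are hypothetical names used only for illustration.

```python
class KeyErrorDotDict(dict):
    """Attribute access mapped straight onto dict lookup (fragile)."""
    __getattr__ = dict.__getitem__


class SafeDotDict(dict):
    """Same idea, but missing names raise AttributeError as Python expects."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def probe(obj, name="not_a_key"):
    """Mimic an optional-attribute lookup like the one pickling performs."""
    try:
        # getattr with a default only swallows AttributeError.
        return getattr(obj, name, None)
    except KeyError as err:
        return f"KeyError: {err}"


fragile = KeyErrorDotDict(num_gpus=1)
safe = SafeDotDict(num_gpus=1)

print(probe(fragile))  # the KeyError escapes the optional lookup
print(probe(safe))     # the missing attribute is reported as absent (None)
print(safe.num_gpus)   # normal attribute-style access still works
```

If the configuration class in this repository follows the fragile pattern, raising `AttributeError` for missing names, as in `SafeDotDict`, is one way to make it picklable under the `spawn` start method.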
  • postnet parameters

    I noticed that the postnet filter size is 32, which makes the output shape differ from the input shape. Also, the dropout rate is so high that the postnet does not seem to learn anything meaningful. Is this intended?

    opened by ghost 8
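The shape change follows from the standard output-length formula documented for `torch.nn.Conv1d`. With stride 1 and dilation 1 the output length is `L_in + 2 * padding - kernel_size + 1`, so an even kernel such as 32 cannot preserve the length with symmetric integer padding. The padding values and the input length of 640 below are assumptions for illustration, since the issue does not state what the postnet actually uses, and `conv1d_output_length` is a hypothetical helper.

```python
def conv1d_output_length(length_in: int, kernel_size: int, padding: int = 0,
                         stride: int = 1, dilation: int = 1) -> int:
    """Output length of a 1-D convolution (formula from the Conv1d docs)."""
    return (length_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


length_in = 640  # assumed input length, for illustration only

even_pad16 = conv1d_output_length(length_in, kernel_size=32, padding=16)
even_pad15 = conv1d_output_length(length_in, kernel_size=32, padding=15)
odd_pad16 = conv1d_output_length(length_in, kernel_size=33, padding=16)

print(even_pad16)  # 641: one sample longer than the input
print(even_pad15)  # 639: one sample shorter than the input
print(odd_pad16)   # 640: an odd kernel with padding (k - 1) // 2 keeps the length
```

An odd kernel size with `padding = (kernel_size - 1) // 2` is the usual way to keep input and output lengths equal at stride 1.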
Owner
Rishikesh (ऋषिकेश)
Deep Learning/ AI Researcher | Open Source enthusiast | Text to Speech | Speech Synthesis | Generative Models | Object detection | Language Understanding