HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Rishikesh (ऋषिकेश)

Last update: Dec 27, 2022

Related tags

Deep Learning python waveform speech pytorch gan wavenet speech-processing denoising denoiser hifigan

Overview

HiFiGAN Denoiser

This is an unofficial PyTorch implementation of the paper HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks.

Citations

@misc{su2020hifigan,
      title={HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks}, 
      author={Jiaqi Su and Zeyu Jin and Adam Finkelstein},
      year={2020},
      eprint={2006.05694},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Requirements

Tested on Python 3.6

pip install -r requirements.txt

Train & Tensorboard

python train.py -c [config yaml file]
tensorboard --logdir log_dir

Inference

python inference.py -p [checkpoint path] -i [input wav path]
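To denoise several files, the command above can be invoked once per file. The sketch below only builds the command lines from the two documented flags. The helper name `build_inference_commands`, the flat directory of `.wav` files, and the assumption that `inference.py` handles one file per call are illustrative and not taken from the repository.

```python
import sys
from pathlib import Path


def build_inference_commands(checkpoint, wav_dir):
    """Build one `inference.py` command per .wav file found in wav_dir.

    Only the documented flags are used: -p (checkpoint path) and
    -i (input wav path). Nothing is executed here.
    """
    wav_paths = sorted(Path(wav_dir).glob("*.wav"))
    return [
        [sys.executable, "inference.py", "-p", str(checkpoint), "-i", str(wav)]
        for wav in wav_paths
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.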

Checkpoint :

References

Comments

Tensor shape mismatch error when Postnet starts

Hello, I've been trying to train a model, and when the postnet starts I run into the following issue:

Traceback (most recent call last):
  File "train.py", line 300, in <module>
    main()
  File "train.py", line 296, in main
    train(0, args, hp, hp_str)
  File "train.py", line 169, in train
    sc_loss_, mag_loss_ = stft_loss(y_g_hat[:, :, :y.size(2)].squeeze(1), y.squeeze(1))
  File "/home/guest/Supreeth/hifigan-denoiser/hifigan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guest/Supreeth/hifigan-denoiser/stft_loss.py", line 130, in forward
    sc_l, mag_l = f(x, y)
  File "/home/guest/Supreeth/hifigan-denoiser/hifigan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guest/Supreeth/hifigan-denoiser/stft_loss.py", line 91, in forward
    sc_loss = self.spectral_convergenge_loss(x_mag, y_mag)
  File "/home/guest/Supreeth/hifigan-denoiser/hifigan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guest/Supreeth/hifigan-denoiser/stft_loss.py", line 46, in forward
    return torch.norm(y_mag - x_mag, p="fro") / torch.norm(y_mag, p="fro")
RuntimeError: The size of tensor a (641) must match the size of tensor b (640) at non-singleton dimension 1

Is there a fix for this? Thank you!
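A possible explanation, not confirmed by the repository author: if dimension 1 of the magnitude tensors is the STFT frame axis, the generated waveform yields one frame fewer than the target. For a centered STFT (the default of `torch.stft`), the frame count is `1 + n_samples // hop`, so losing a single sample can drop a whole frame. The segment length and hop size below are assumed values, chosen only because they reproduce 641 versus 640.

```python
def stft_frames(n_samples: int, hop: int) -> int:
    """Frame count of a centered STFT (torch.stft with center=True)."""
    return 1 + n_samples // hop


SEGMENT = 32000  # assumed training segment length in samples
HOP = 50         # assumed hop size of one STFT resolution

target_frames = stft_frames(SEGMENT, HOP)         # full-length target y
generated_frames = stft_frames(SEGMENT - 1, HOP)  # output one sample short

print(target_frames, generated_frames)  # prints "641 640"
```

Note that `y_g_hat[:, :, :y.size(2)]` in the traceback can only shorten the generated signal. If the generated signal is already shorter than `y`, cropping both tensors to the smaller of the two lengths before calling the loss is one workaround to try.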

opened by SupreethRao99 0

Loss

Hello Rishi,

I am experimenting with speech bandwidth extension (narrowband to super-wideband) using this network without the PostNet. I observe that the generator loss climbs to high values and keeps fluctuating, but when I evaluate on unseen signals, I am able to reconstruct a super-wideband signal from the narrowband input.

I am confused about model convergence. Can you please give some insights on model convergence?

opened by saivinaypsv 2

Data simulation and augmentation

Can you detail how you create the noisy audio for training?

Is it the same as described in the paper?

Are you using Kaldi or another tool for this, and can you share your noise dataset?
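For context, one common way to simulate noisy training data is to add a noise recording to clean speech at a chosen signal-to-noise ratio. Whether this repository does so is not stated here, so the sketch below is a generic illustration rather than the author's pipeline. The function name `mix_at_snr` and the use of plain Python lists are assumptions made to keep it dependency-free.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Return clean + gain * noise so that the mixture has the given SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise signal is silent")
    # SNR_dB = 10 * log10(p_clean / (gain^2 * p_noise)), solved for gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


clean = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy = mix_at_snr(clean, noise, snr_db=20)
```

In practice the same scaling would be applied to waveform arrays, with the noise cropped or looped to the length of the speech segment.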

Thanks, rishikksh!

opened by v-nhandt21 8

KeyError: '__getstate__'

Hi, thanks for open-sourcing your code! During training, I hit an error with the command below.

Command:

python train.py -c config.yaml

Error:

Initializing Training Process..
Batch size per GPU : 0
Traceback (most recent call last):
  File "train.py", line 304, in <module>
    main()
  File "train.py", line 298, in main
    mp.spawn(train, nprocs=hp.train.num_gpus, args=(args, hp, hp_str,))
  File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 148, in start_processes
    process.start()
  File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/lian/.conda/envs/hifi-GAN-denoise/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
KeyError: '__getstate__'

Could you tell me why this happens and how to solve it? Thank you! Have a nice day.
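A likely cause, assuming `hp` is a dict subclass that maps attribute access straight to key lookup (for example `__getattr__ = dict.__getitem__`): `mp.spawn` pickles its arguments, and on Python 3.6 pickling probes the object for optional hooks such as `__getstate__`. A missing hook has to surface as `AttributeError`; with direct key lookup a `KeyError` escapes instead and aborts the spawn. The class below is a hypothetical stand-in for the repository's config object, sketching attribute access that stays picklable.

```python
import pickle


class DotDict(dict):
    """dict with attribute-style access that remains picklable (hypothetical)."""

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            # pickle and copy probe for optional hooks like __getstate__;
            # they expect AttributeError, not KeyError, when one is absent.
            raise AttributeError(name) from None
        # Wrap nested dicts so chained access such as hp.train.num_gpus works.
        return DotDict(value) if isinstance(value, dict) else value


# Made-up config values, used only to exercise the round trip.
hp = DotDict({"train": {"num_gpus": 1, "batch_size": 16}})
restored = pickle.loads(pickle.dumps(hp))
print(restored.train.num_gpus)
```

If the config class cannot be changed, passing a plain `dict` to the spawned function and rebuilding the config object inside the worker is another option.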

opened by KevinBaylor 1

postnet parameters

I noticed that the postnet filter size is 32, which makes the output have different shapes than the input. Also, the dropout rate is so high that it's not learning anything meaningful. Is that supposed to be like this?
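The shape change follows from the standard 1-D convolution length formula. With stride 1 and dilation 1, an even kernel such as 32 cannot preserve the length under symmetric padding: padding 15 loses one sample and padding 16 gains one. The input length and padding values below are illustrative and are not read from the repository.

```python
def conv1d_out_len(length, kernel, padding, stride=1, dilation=1):
    """Output length of a 1-D convolution with symmetric zero padding."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


L_IN = 32000  # arbitrary example length in samples

print(conv1d_out_len(L_IN, kernel=32, padding=15))  # 31999, one sample short
print(conv1d_out_len(L_IN, kernel=32, padding=16))  # 32001, one sample long
print(conv1d_out_len(L_IN, kernel=31, padding=15))  # 32000, length preserved
```

Two ways to keep the length are an odd kernel size, or asymmetric padding applied before an unpadded convolution (for instance 15 samples on one side and 16 on the other via `torch.nn.functional.pad`).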

opened by ghost 8

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

DiffGAN-TTS - PyTorch Implementation PyTorch implementation of DiffGAN-TTS: High

157 Jan 1, 2023

Combine Tacotron2 and Hifi GAN to generate speech from text

EndToEndTextToSpeech Combine Tacotron2 and Hifi GAN to generate speech from text Download weights Hifi GAN -> hifi_gan/checkpoint/ : pretrain 2.5M ste

1 Dec 18, 2021

A two-stage U-Net for high-fidelity denoising of historical recordings

A two-stage U-Net for high-fidelity denoising of historical recordings Official repository of the paper (not submitted yet): E. Moliner and V. Välimäk

57 Jan 5, 2023

Implementation for HFGI: High-Fidelity GAN Inversion for Image Attribute Editing

HFGI: High-Fidelity GAN Inversion for Image Attribute Editing High-Fidelity GAN Inversion for Image Attribute Editing Update: We released the inferenc

371 Dec 30, 2022

Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.

HiFi-GAN+ This project is an unofficial implementation of the HiFi-GAN+ model for audio bandwidth extension, from the paper Bandwidth Extension is All

134 Dec 30, 2022

HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement

HiFi++ : a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement This is the unofficial implementation of Vocoder part of

118 Dec 29, 2022

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis

Bilateral Denoising Diffusion Models (BDDMs) This is the official PyTorch implementation of the following paper: BDDM: BILATERAL DENOISING DIFFUSION M

172 Dec 23, 2022

Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN)

Flickr-Faces-HQ Dataset (FFHQ) Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative

2.9k Dec 28, 2022

Generate high quality pictures. GAN. Generative Adversarial Networks

ESRGAN generate high quality pictures. GAN. Generative Adversarial Networks """ Super-resolution of CelebA using Generative Adversarial Networks. The

1 Dec 14, 2021

Deep generative modeling for time-stamped heterogeneous data, enabling high-fidelity models for a large variety of spatio-temporal domains.

Neural Spatio-Temporal Point Processes [arxiv] Ricky T. Q. Chen, Brandon Amos, Maximilian Nickel Abstract. We propose a new class of parameterizations

75 Dec 19, 2022

Parallel and High-Fidelity Text-to-Lip Generation; AAAI 2022 ; Official code

Parallel and High-Fidelity Text-to-Lip Generation This repository is the official PyTorch implementation of our AAAI-2022 paper, in which we propose P

77 Dec 21, 2022

《Towards High Fidelity Face Relighting with Realistic Shadows》(CVPR 2021)

Towards High Fidelity Face-Relighting with Realistic Shadows Andrew Hou, Ze Zhang, Michel Sarkis, Ning Bi, Yiying Tong, Xiaoming Liu. In CVPR, 2021. T

114 Dec 10, 2022

Tensorflow python implementation of "Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos"

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos This repository is the official tensorflow python implementation

287 Jan 6, 2023

UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

UnivNet UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation. Training python train.py --c

55 Dec 26, 2022

Unofficial PyTorch Implementation of UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

UnivNet UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation This is an unofficial PyTorch

170 Jan 4, 2023

This repository contains the code for using the H3DS dataset introduced in H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction

H3DS Dataset This repository contains the code for using the H3DS dataset introduced in H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction Access

72 Dec 10, 2022

Unofficial PyTorch Implementation of UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

UnivNet UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation This is an unofficial PyTorch

54 Aug 30, 2021

SCI-AIDE : High-fidelity Few-shot Histopathology Image Synthesis for Rare Cancer Diagnosis

SCI-AIDE : High-fidelity Few-shot Histopathology Image Synthesis for Rare Cancer Diagnosis Pretrained Models In this work, we created synthetic tissue

1 Feb 7, 2022

Adversarial-Information-Bottleneck - Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck (NeurIPS21)

NeurIPS 2021 Title: Distilling Robust and Non-Robust Features in Adversarial Exa

35 Dec 26, 2022
