Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Rishikesh (ऋषिकेश)

Last update: Dec 17, 2022

Overview

Fre-GAN Vocoder

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Training:

python train.py --config config.json

Citation:

@misc{kim2021fregan,
      title={Fre-GAN: Adversarial Frequency-consistent Audio Synthesis}, 
      author={Ji-Hoon Kim and Sang-Hoon Lee and Ji-Hyun Lee and Seong-Whan Lee},
      year={2021},
      eprint={2106.02297},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

References:

Comments

Inconsistency with paper

https://arxiv.org/pdf/2106.02297.pdf

Section 2.3 of the paper states: "After each level of DWT, all the frequency sub-bands are channel-wise concatenated and passed to convolutional layers."

https://github.com/rishikksh20/Fre-GAN-pytorch/blob/91d0e4678199003f6b32d53c7501e6123fb70fc5/discriminator.py#L242-L246

You are concatenating along the length dimension, which results in an odd-looking tensor where the first half is audio features and the second half is DWT features, so local waveform and DWT information can't mix properly.

Is there any reason for this? I feel very confused looking at this, but you've done it twice so I assume there's some reason for this.
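The difference between the two concatenation axes can be sketched with a toy example. This is only an illustration under stated assumptions: NumPy stands in for PyTorch (`axis=1` / `axis=2` here play the role of `dim=1` / `dim=2` in `torch.cat`), `haar_dwt` is a hypothetical helper using a Haar wavelet as a simple stand-in DWT, and the tensor sizes are made up. It is not the repository's code.

```python
import numpy as np

def haar_dwt(x):
    """One level of a Haar DWT along the last axis (illustrative stand-in)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    low = (even + odd) / np.sqrt(2.0)   # low-frequency sub-band
    high = (even - odd) / np.sqrt(2.0)  # high-frequency sub-band
    return low, high

# Toy batch with layout (batch, channels, length).
x = np.arange(16, dtype=np.float64).reshape(2, 1, 8)
low, high = haar_dwt(x)  # each sub-band has shape (2, 1, 4)

# Channel-wise concatenation, as described in Section 2.3:
# both sub-bands sit at the same time index, in different channels.
channel_cat = np.concatenate([low, high], axis=1)  # shape (2, 2, 4)

# Length-wise concatenation, as described in this issue:
# the sub-bands are laid end to end along the time axis.
length_cat = np.concatenate([low, high], axis=2)   # shape (2, 1, 8)

print(channel_cat.shape, length_cat.shape)
```

With the channel-wise layout, a convolution kernel centred on time step t sees both sub-bands at once. With the length-wise layout, the matching positions are half the tensor length apart, so a small kernel only mixes them near the seam.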

opened by CookiePPP 2

about the remove_weight_norm

https://github.com/rishikksh20/Fre-GAN-pytorch/blob/91d0e4678199003f6b32d53c7501e6123fb70fc5/generator.py#L170

I think this line should be remove_weight_norm(l). Calling l.remove_weight_norm() will result in AttributeError: 'Sequential' object has no attribute 'remove_weight_norm'.

https://github.com/rishikksh20/Fre-GAN-pytorch/blob/91d0e4678199003f6b32d53c7501e6123fb70fc5/generator.py#L172

This line should be remove_weight_norm(l[1]). The current form will result in the same AttributeError: 'Sequential' object has no attribute 'remove_weight_norm'.
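A minimal sketch of the failure and the suggested fix, assuming PyTorch is installed. The `block` below is a hypothetical stand-in (an upsampling layer followed by a weight-normalised convolution at index 1); the layer sizes are arbitrary and the layout is only inferred from the `l[1]` in this issue, not copied from the repository.

```python
import torch.nn as nn
from torch.nn.utils import weight_norm, remove_weight_norm

# Hypothetical sub-block: the weight-normalised conv sits at index 1.
block = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),
    weight_norm(nn.Conv1d(4, 4, kernel_size=3, padding=1)),
)

# nn.Sequential defines no remove_weight_norm method,
# so the method-call form raises AttributeError.
try:
    block.remove_weight_norm()
    method_call_failed = False
except AttributeError:
    method_call_failed = True

# The utility function must be applied to the wrapped layer itself.
remove_weight_norm(block[1])

# After removal the conv exposes a plain `weight` again
# instead of the `weight_g` / `weight_v` pair.
param_names = sorted(name for name, _ in block[1].named_parameters())
print(method_call_failed, param_names)
```

The method-call form only works on modules that define their own `remove_weight_norm` method; a plain `nn.Sequential` does not, which is why the function form is needed here.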

opened by tricky61 0

do nn upsample before mel condition

In the generator code, at line 137:

if i >= self.cond_level:
    mel = self.cond_up[i - self.cond_level](mel)
    x += mel
if i > self.cond_level:
    if output is None:
        output = self.res_output[i - self.cond_level - 1](x)
    else:
        output = self.res_output[i - self.cond_level - 1](output)

In the code, the input to the nn upsample is the mel condition plus the resblock output.

In the paper, however, the input to the nn upsample is only the resblock output or the output of the previous nn upsample.

So, would the following order be more reasonable?

if i > self.cond_level:
    if output is None:
        output = self.res_output[i - self.cond_level - 1](x)
    else:
        output = self.res_output[i - self.cond_level - 1](output)
if i >= self.cond_level:
    mel = self.cond_up[i - self.cond_level](mel)
    x += mel
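The effect of the reordering can be traced with a toy scalar model. This is a sketch under loud assumptions: every layer is replaced by a made-up arithmetic stand-in (doubling for upsampling, identity for the first output branch), and `toy_forward` and `mel_first` are hypothetical names introduced only for this illustration. It shows which value reaches the first output branch under each order and says nothing about audio quality.

```python
def toy_forward(x, mel, num_levels=3, cond_level=0, mel_first=True):
    """Return the value fed to the first output branch.

    mel_first=True  follows the order in the current code.
    mel_first=False follows the order proposed in this issue.
    """
    output = None
    first_branch_input = None
    for i in range(num_levels):
        x = 2.0 * x  # stand-in for upsampling + residual blocks
        if mel_first and i >= cond_level:
            mel = 2.0 * mel  # stand-in for cond_up[i - cond_level](mel)
            x = x + mel
        if i > cond_level:
            if output is None:
                first_branch_input = x
                output = x  # stand-in for res_output[...](x)
            else:
                output = 2.0 * output  # stand-in for res_output[...](output)
        if not mel_first and i >= cond_level:
            mel = 2.0 * mel
            x = x + mel
    return first_branch_input

current = toy_forward(1.0, 1.0, mel_first=True)    # 12.0
proposed = toy_forward(1.0, 1.0, mel_first=False)  # 8.0
print(current, proposed)
```

In this toy run the two orders differ by exactly the current level's upsampled mel (4.0). Even in the proposed order, x still carries the conditioning added at earlier levels, so the branch input is not completely free of mel information.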

opened by liuhuang31 0

comparison with univnet

Hi! How does this work compare with UnivNet, which you have already implemented here: https://github.com/rishikksh20/UnivNet-pytorch ? That paper is a little bit newer, but afaik they're more concerned with the generalizability of the model to unseen speakers, while this work focuses on overall quality (especially at high frequencies). Can you maybe elaborate?

opened by thepowerfuldeez 8

