Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Overview
You might also like...
PyTorch implementation of Lip to Speech Synthesis with Visual Context Attentional GAN (NeurIPS2021)
PyTorch implementation of Lip to Speech Synthesis with Visual Context Attentional GAN (NeurIPS2021)

Lip to Speech Synthesis with Visual Context Attentional GAN This repository contains the PyTorch implementation of the following paper: Lip to Speech

HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

HiFiGAN Denoiser This is a Unofficial Pytorch implementation of the paper HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep F

 House-GAN++: Generative Adversarial Layout Refinement Network towards Intelligent Computational Agent for Professional Architects
House-GAN++: Generative Adversarial Layout Refinement Network towards Intelligent Computational Agent for Professional Architects

House-GAN++ Code and instructions for our paper: House-GAN++: Generative Adversarial Layout Refinement Network towards Intelligent Computational Agent

NR-GAN: Noise Robust Generative Adversarial Networks
NR-GAN: Noise Robust Generative Adversarial Networks

NR-GAN: Noise Robust Generative Adversarial Networks (CVPR 2020) This repository provides PyTorch implementation for noise robust GAN (NR-GAN). NR-GAN

Partial implementation of ODE-GAN technique from the paper Training Generative Adversarial Networks by Solving Ordinary Differential Equations
Partial implementation of ODE-GAN technique from the paper Training Generative Adversarial Networks by Solving Ordinary Differential Equations

ODE GAN (Prototype) in PyTorch Partial implementation of ODE-GAN technique from the paper Training Generative Adversarial Networks by Solving Ordinary

Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN)
Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN)

Flickr-Faces-HQ Dataset (FFHQ) Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative

A method that utilized Generative Adversarial Network (GAN) to interpret the black-box deep image classifier models by PyTorch.

A method that utilized Generative Adversarial Network (GAN) to interpret the black-box deep image classifier models by PyTorch.

Generate high quality pictures. GAN. Generative Adversarial Networks

ESRGAN generate high quality pictures. GAN. Generative Adversarial Networks """ Super-resolution of CelebA using Generative Adversarial Networks. The

This repository contains a PyTorch implementation of
This repository contains a PyTorch implementation of "AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis".

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis | Project Page | Paper | PyTorch implementation for the paper "AD-NeRF: Audio

Comments
  • Inconsistency with paper

    Inconsistency with paper

    https://arxiv.org/pdf/2106.02297.pdf

    In section 2.3 "After each level of DWT, all the frequency sub-bands are channel-wise concatenated and passed to convolutional layers"


    https://github.com/rishikksh20/Fre-GAN-pytorch/blob/91d0e4678199003f6b32d53c7501e6123fb70fc5/discriminator.py#L242-L246

    You are concatenating on the length dim resulting in an odd looking tensor where the first half is audio features and the 2nd half is DWT features, and local waveform/DWT information can't mix properly.

    Is there any reason for this? I feel very confused looking at this, but you've done it twice so I assume there's some reason for this.

    opened by CookiePPP 2
  • about the remove_weight_norm

    about the remove_weight_norm

    https://github.com/rishikksh20/Fre-GAN-pytorch/blob/91d0e4678199003f6b32d53c7501e6123fb70fc5/generator.py#L170 I think this line should be remove_weight_norm(l), l.remove_weight_norm() will result in AttributeError: 'Sequential' object has no attribute 'remove_weight_norm'. https://github.com/rishikksh20/Fre-GAN-pytorch/blob/91d0e4678199003f6b32d53c7501e6123fb70fc5/generator.py#L172 there should be remove_weight_norm(l[1]). the previous form will result in AttributeError: 'Sequential' object has no attribute 'remove_weight_norm'

    opened by tricky61 0
  • do nn upsample before mel condition

    do nn upsample before mel condition

    for generator code line 137:

    if i >= self.cond_level: 
                    mel = self.cond_up[i - self.cond_level](mel)
                    x += mel
    if i > self.cond_level:
        if output is None:
            output = self.res_output[i - self.cond_level - 1](x)
        else:
            output = self.res_output[i - self.cond_level - 1](output)
    

    in the code, for the nn upsample input is: mel condition + resblock output.

    image

    but in the paper, nn upsample input only is resblock output or the last nn upsample output;

    image

    so, Is this more reasonable?

    if i > self.cond_level:
        if output is None:
            output = self.res_output[i - self.cond_level - 1](x)
        else:
            output = self.res_output[i - self.cond_level - 1](output)
    if i >= self.cond_level: 
                    mel = self.cond_up[i - self.cond_level](mel)
                    x += mel
    
    opened by liuhuang31 0
  • comparison with univnet

    comparison with univnet

    Hi! How this work compares with UnivNet for which one you already implemented code: https://github.com/rishikksh20/UnivNet-pytorch This paper is a little bit newer but afaik they're more concerned about generalizability of model for unseen speakers whlie this work focuses on overall quality (especially in high frequences) can you maybe elaborate?

    opened by thepowerfuldeez 8
Owner
Rishikesh (ऋषिकेश)
Deep Learning/ AI Researcher | Open Source enthusiast | Text to Speech | Speech Synthesis | Generative Models | Object detection | Computer Vision
Rishikesh (ऋषिकेश)
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Jungil Kong, Jaehyeon Kim, Jaekyoung Bae In our paper, we p

Rishikesh (ऋषिकेश) 31 Dec 8, 2022
π-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis

π-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis Project Page | Paper | Data Eric Ryan Chan*, Marco Monteiro*, Pe

null 375 Dec 31, 2022
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis Implementation

Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis Implementation This project attempted to implement the paper Putting NeRF on a

null 254 Dec 27, 2022
Cycle Consistent Adversarial Domain Adaptation (CyCADA)

Cycle Consistent Adversarial Domain Adaptation (CyCADA) A pytorch implementation of CyCADA. If you use this code in your research please consider citi

Hyunwoo Ko 2 Jan 10, 2022
Code for the CVPR2022 paper "Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity"

Introduction This is an official release of the paper "Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity" (arxiv link). Abstrac

Leo 21 Nov 23, 2022
FuseDream: Training-Free Text-to-Image Generationwith Improved CLIP+GAN Space OptimizationFuseDream: Training-Free Text-to-Image Generationwith Improved CLIP+GAN Space Optimization

FuseDream This repo contains code for our paper (paper link): FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimizat

XCL 191 Dec 31, 2022
DR-GAN: Automatic Radial Distortion Rectification Using Conditional GAN in Real-Time

DR-GAN: Automatic Radial Distortion Rectification Using Conditional GAN in Real-Time Introduction This is official implementation for DR-GAN (IEEE TCS

Kang Liao 18 Dec 23, 2022
Style-based Neural Drum Synthesis with GAN inversion

Style-based Drum Synthesis with GAN Inversion Demo TensorFlow implementation of a style-based version of the adversarial drum synth (ADS) from the pap

Sound and Music Analysis (SoMA) Group 29 Nov 19, 2022
Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis

Chunked Autoregressive GAN (CARGAN) Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis [paper] [compan

Descript 150 Dec 6, 2022
Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis

Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis, including human motion imitation, appearance transfer, and novel view synthesis. Currently the paper is under review of IEEE TPAMI. It is an extension of our previous ICCV project impersonator, and it has a more powerful ability in generalization and produces higher-resolution results (512 x 512, 1024 x 1024) than the previous ICCV version.

null 2.3k Jan 5, 2023