Implementation of Google Brain's WaveGrad high-fidelity vocoder

Overview


WaveGrad

Implementation (PyTorch) of Google Brain's high-fidelity WaveGrad vocoder (paper). First implementation on GitHub with high-quality generation at 6 iterations.

Status

  • Documented API.
  • High-fidelity generation.
  • Multi-iteration inference support (stable even at low iteration counts).
  • Stable and fast training with mixed-precision support.
  • Distributed training support.
  • Training also successfully runs on a single 12GB GPU with batch size 96.
  • CLI inference support.
  • Flexible architecture configuration for your own data.
  • Estimated RTF on popular GPU and CPU devices (see below).
  • 100-iteration and lower inference runs faster than real time on an RTX 2080 Ti. 6-iteration inference is faster than the one reported in the paper.
  • Parallel grid search for the best noise schedule.
  • Uploaded generated samples for different numbers of iterations (see generated_samples folder).
  • Pretrained checkpoint on the 22kHz LJSpeech dataset with noise schedules.

Real-time factor (RTF)

Number of parameters: 15,810,401

Model              Stable   RTX 2080 Ti   Tesla K80   Intel Xeon 2.3GHz*
1000 iterations    +        9.59          -           -
100 iterations     +        0.94          5.85        -
50 iterations      +        0.45          2.92        -
25 iterations      +        0.22          1.45        -
12 iterations      +        0.10          0.69        4.55
6 iterations       +        0.04          0.33        2.09

*Note: an older version of the Intel Xeon CPU was used.


About

WaveGrad is a conditional model for waveform generation that estimates gradients of the data density, with sampling quality comparable to WaveNet. This vocoder is neither a GAN, nor a normalizing flow, nor a classical autoregressive model. Its main concept is based on Denoising Diffusion Probabilistic Models (DDPM), which utilize Langevin dynamics and score matching frameworks. Furthermore, compared to a classic DDPM, WaveGrad achieves super-fast convergence (6 iterations and possibly fewer) w.r.t. the iterative Langevin-dynamics sampling scheme.
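
To make the sampling scheme concrete, here is a minimal, illustrative sketch of DDPM-style iterative refinement. The actual implementation lives in model/diffusion_process.py; the function and argument names below are assumptions for illustration, not this repo's API:

    import torch

    def sample(model, mel, betas, hop_length=300):
        """Iteratively denoise Gaussian noise into a waveform conditioned on mel.

        `model(mel, y_t, noise_level)` is assumed to predict the injected noise;
        WaveGrad conditions on the continuous noise level sqrt(alpha_bar_t).
        """
        alphas = 1.0 - betas
        alphas_cumprod = torch.cumprod(alphas, dim=0)
        # Start from pure noise with length = n_mel_frames * hop_length.
        y = torch.randn(mel.shape[0], mel.shape[-1] * hop_length)
        for t in reversed(range(len(betas))):
            a_t, abar_t = alphas[t], alphas_cumprod[t]
            eps = model(mel, y, abar_t.sqrt())
            # DDPM posterior mean: remove the predicted noise component.
            y = (y - (1.0 - a_t) / (1.0 - abar_t).sqrt() * eps) / a_t.sqrt()
            if t > 0:
                sigma = ((1.0 - alphas_cumprod[t - 1]) / (1.0 - abar_t) * betas[t]).sqrt()
                y = y + sigma * torch.randn_like(y)  # re-inject stochasticity
        return y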


Installation

  1. Clone this repo:
git clone https://github.com/ivanvovk/WaveGrad.git
cd WaveGrad
  2. Install requirements:
pip install -r requirements.txt

Training

1 Preparing data

  1. Make train and test filelists of your audio data like the ones included in the filelists folder.
  2. Make a configuration file* in the configs folder.

*Note: if you are going to change hop_length for the STFT, make sure that the product of the upsampling factors in your config equals your new hop_length (see the sketch below).
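
A quick sanity check for this constraint (a sketch; the config filename is hypothetical, but the model_config/data_config keys mirror the configs used by this repo):

    import json
    from math import prod

    config = json.load(open('configs/my_config.json'))  # hypothetical filename
    factors = config['model_config']['factors']
    hop_length = config['data_config']['hop_length']
    # e.g. factors [4, 4, 4, 2, 2] multiply to 256, so hop_length must be 256
    assert prod(factors) == hop_length, \
        'product of upsampling factors must equal hop_length'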

2 Single and Distributed GPU training

  1. Open the runs/train.sh script and specify the visible GPU devices and the path to your configuration file. If you specify more than one GPU, training will run in distributed mode.
  2. Run sh runs/train.sh

3 Tensorboard and logging

To track your training process, run TensorBoard with tensorboard --logdir=logs/YOUR_LOGDIR_FOLDER. All logging information and checkpoints will be stored in logs/YOUR_LOGDIR_FOLDER. logdir is specified in the config file.

4 Noise schedule grid search

Once the model is trained, run a grid search for the best schedule* for the needed number of iterations in notebooks/inference.ipynb. The code supports parallelism, so you can specify more than one job to accelerate the search.

*Note: grid search is only necessary for a small number of iterations (like 6 or 7). For a larger number, just try the Fibonacci-sequence initialization benchmark.fibonacci(...): I used it for 25 iterations and it works well. From a good 25-iteration schedule you can, for example, build a higher-order schedule by repeating its elements (see the sketch below).
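
A rough sketch of the repetition trick; the beta values below are placeholders (substitute your own best 25-iteration schedule), and benchmark.fibonacci(...) is the repo's helper for the initialization itself:

    import numpy as np

    # Placeholder values standing in for your best 25-iteration schedule.
    betas_25 = np.geomspace(1e-6, 1e-2, num=25)

    # Build a 50-iteration schedule by repeating each element, mirroring how
    # the pretrained 50-, 100- and 1000-iteration schedules were obtained.
    betas_50 = np.repeat(betas_25, 2)
    assert betas_50.shape == (50,)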

Noise schedules for pretrained model
  • 6-iteration schedule was obtained using grid search. Afterwards, starting from the obtained scheme, I hand-tuned a slightly better approximation.
  • 7-iteration schedule was obtained in the same way.
  • 12-iteration schedule was obtained in the same way.
  • 25-iteration schedule was obtained using Fibonacci sequence benchmark.fibonacci(...).
  • 50-iteration schedule was obtained by repeating elements from 25-iteration scheme.
  • 100-iteration schedule was obtained in the same way.
  • 1000-iteration schedule was obtained in the same way.

Inference

CLI

Put your mel-spectrograms in some folder. Make a filelist. Then run this command with your own arguments:

sh runs/inference.sh -c <your-config> -ch <your-checkpoint> -ns <your-noise-schedule> -m <your-mel-filelist> -v "yes"

Jupyter Notebook

More inference details are provided in notebooks/inference.ipynb. There you can also find how to set a noise schedule for the model and run a grid search for the best scheme.


Other

Generated audios

Examples of generated audio are provided in the generated_samples folder. Quality degradation between 1000-iteration and 6-iteration inference is not noticeable if the best schedule has been found for the latter.

Pretrained checkpoints

You can find a pretrained checkpoint file* trained on LJSpeech (22kHz) via this Google Drive link.

*Note: the uploaded checkpoint is a dict with a single key 'model' (see the loading sketch below).
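
A minimal loading sketch, assuming you have already instantiated the network from this repo (construction details are in notebooks/inference.ipynb):

    import torch

    def load_pretrained(model, path='wavegrad_lj_pretrained.pt'):
        """Load the released LJSpeech checkpoint into an instantiated model."""
        state = torch.load(path, map_location='cpu')
        model.load_state_dict(state['model'])  # the dict's single key is 'model'
        return model.eval()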


Important details, issues and comments

  • During training, WaveGrad uses a default noise schedule with 1000 iterations and linearly scaled betas in the range (1e-6, 0.01). For inference you can set another schedule with fewer iterations. Tune the betas carefully: output quality strongly depends on them (see the sketch after this list).
  • By default the model runs with mixed precision. The batch size is reduced compared to the paper (256 -> 96), since the authors trained their model on TPUs.
  • After ~10k training iterations (1-2 hours) on a single GPU, the model already produces good 50-iteration generation. Total training time is about 1-2 days (for full convergence).
  • At some point training might become unstable (the loss explodes), so I have introduced learning rate (LR) scheduling and gradient clipping. If the loss explodes on your data, try decreasing the LR scheduler gamma a bit; it should help.
  • By default the hop length of the STFT equals 300 (and so does the total upsampling factor). Other cases are not tested, but you can try them. Remember that the total upsampling factor must still equal your new hop length.
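
The sketch below restates the default schedule and the stabilizers in generic PyTorch; the scheduler gamma and clipping norm are illustrative values, not the repo's exact settings:

    import torch

    # Default training schedule: 1000 linear betas in (1e-6, 0.01).
    betas = torch.linspace(1e-6, 0.01, steps=1000)

    model = torch.nn.Linear(80, 80)  # stand-in for the WaveGrad network
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # LR decay and gradient clipping, the two loss-explosion mitigations above.
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)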

History of updates

  • (NEW: 10/24/2020) Huge update: distributed training and mixed-precision support, more correct positional encoding, CLI support for inference, parallel grid search, and a significantly decreased model size.
  • New RTF info for NVIDIA Tesla K80 GPU card (popular in Google Colab service) and CPU Intel Xeon 2.3GHz.
  • Huge update. New well-generated 6-iteration sample example. New noise-schedule-setting API. Added the best-schedule grid-search code.
  • Improved training by introducing smarter learning rate scheduler. Obtained high-fidelity synthesis.
  • Stable training and multi-iteration inference. 6-iteration noise scheduling is supported.
  • Stable training and fixed-iteration inference with significant background static noise left. All positional encoding issues are solved.
  • Stable training of 25-, 50- and 1000-fixed-iteration models. Discovered that the linear scaling (C=5000 from the paper) of positional encoding was missing (bug).
  • Stable training of 25-, 50- and 1000-fixed-iteration models. Fixed positional encoding downscaling. Parallel segment sampling is replaced by full-mel sampling.
  • (RELEASE, first on GitHub.) Parallel segment sampling and broken positional-encoding downscaling. Bad quality with clicks from concatenating parallel-generated segments.


Comments
  • Matplotlib API change & NaNs for short clips & new hop_length


    I'm trying to run training on an NVIDIA Xavier AGX device in an NVIDIA Docker container based on these instructions: https://ngc.nvidia.com/catalog/containers/nvidia:l4t-pytorch

    But I receive the following error:

    Initializing logger...
    Initializing model...
    Number of parameters: 15810401
    Initializing optimizer, scheduler and losses...
    Initializing data loaders...

    Traceback (most recent call last):
      File "train.py", line 185, in <module>
        run(config, args)
      File "train.py", line 72, in run
        logger.log_specs(0, specs)
      File "/media/908f901d-e80b-4a8e-8a16-9e0f1b896732/TTS/thorsten-de/models/model-v02/WaveGrad/logger.py", line 53, in log_specs
        self.add_image(key, plot_tensor_to_numpy(image), iteration, dataformats='HWC')
      File "/media/908f901d-e80b-4a8e-8a16-9e0f1b896732/TTS/thorsten-de/models/model-v02/WaveGrad/utils.py", line 66, in plot_tensor_to_numpy
        im = ax.imshow(tensor, aspect="auto", origin="bottom", interpolation='none', cmap='hot')
      File "/usr/local/lib/python3.6/dist-packages/matplotlib/__init__.py", line 1438, in inner
        return func(ax, *map(sanitize_sequence, args), **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/matplotlib/axes/_axes.py", line 5521, in imshow
        resample=resample, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/matplotlib/image.py", line 905, in __init__
        **kwargs
      File "/usr/local/lib/python3.6/dist-packages/matplotlib/image.py", line 246, in __init__
        cbook._check_in_list(["upper", "lower"], origin=origin)
      File "/usr/local/lib/python3.6/dist-packages/matplotlib/cbook/__init__.py", line 2257, in _check_in_list
        .format(v, k, ', '.join(map(repr, values))))
    ValueError: 'bottom' is not a valid value for origin; supported values are 'upper', 'lower'

    python3 -V: Python 3.6.9
    pip3 -V: 20.2.3

    Running pip3 list shows the following installed packages:

    absl-py (0.10.0)
    appdirs (1.4.4)
    cachetools (4.1.1)
    certifi (2020.6.20)
    chardet (3.0.4)
    cycler (0.10.0)
    Cython (0.29.20)
    decorator (4.4.2)
    future (0.18.2)
    google-auth (1.22.1)
    google-auth-oauthlib (0.4.1)
    grpcio (1.32.0)
    idna (2.10)
    importlib-metadata (2.0.0)
    kiwisolver (1.2.0)
    Mako (1.1.3)
    Markdown (3.3)
    MarkupSafe (1.1.1)
    matplotlib (3.3.1)
    numpy (1.19.0)
    oauthlib (3.1.0)
    Pillow (7.2.0)
    pip (9.0.1)
    protobuf (3.13.0)
    pyasn1 (0.4.8)
    pyasn1-modules (0.2.8)
    pycuda (2019.1.2)
    pyparsing (2.4.7)
    python-dateutil (2.8.1)
    pytools (2020.3.1)
    requests (2.24.0)
    requests-oauthlib (1.3.0)
    rsa (4.6)
    setuptools (50.3.0)
    six (1.15.0)
    tensorboard (2.3.0)
    tensorboard-plugin-wit (1.7.0)
    torch (1.6.0)
    torchaudio (0.6.0a0+d6f81d1)
    torchvision (0.7.0a0+6631b74)
    tqdm (4.50.2)
    urllib3 (1.25.10)
    Werkzeug (1.0.1)
    wheel (0.35.1)
    zipp (3.3.0)
    

    I tried matplotlib 3.3.1 and 3.3.2, both with the same result.

    Any ideas what I'm missing? Thank you.
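
    The traceback points at Matplotlib 3.3, which accepts only 'upper' or 'lower' for origin, so a likely fix is replacing origin="bottom" with origin="lower" in utils.py. A self-contained sketch of the corrected call:

    import numpy as np
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    tensor = np.random.rand(80, 100)  # stand-in for a mel-spectrogram
    # Matplotlib >= 3.3 accepts only 'upper'/'lower'; 'bottom' raises ValueError.
    im = ax.imshow(tensor, aspect="auto", origin="lower",
                   interpolation='none', cmap='hot')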

    question update 
    opened by thorstenMueller 27
  • Exponents calculation in positional encoding


    https://github.com/ivanvovk/WaveGrad/blob/f59d4bd257144183d0db44bc616c4af5fc51dfe6/model/linear_modulation.py#L21-L24

    At line 22, the exponents are calculated as exponents = exponents ** 1e-4 instead of exponents = 1e-4 ** exponents from the original Transformer paper.

    This makes the values very close to each other across dimensions. I plotted an example using exponents = exponents ** 1e-4, with noise level linspace(0, 1, 250) and n_channels=512.

    [Figure 1: positional encoding computed with exponents ** 1e-4]

    After changing to exponents = 1e-4 ** exponents, the positional encoding looks fine:

    [Figure 2: positional encoding computed with 1e-4 ** exponents]

    The strange thing is that even with the current positional encoding, the model seems to train well. I tried training on LibriTTS, and the generated speech sounds okay to me. I'll try switching to the latter and see whether there is an improvement.
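
    A small sketch of the two variants (the shapes follow the issue's description, not the repo's exact code):

    import torch

    n_channels = 512
    noise_level = torch.linspace(0, 1, 250)
    half = n_channels // 2
    exponents = torch.arange(half, dtype=torch.float32) / half

    wrong = exponents ** 1e-4  # x ** 0.0001 ~= 1 for x in (0, 1]: nearly constant
    right = 1e-4 ** exponents  # geometric frequency decay, as in the Transformer

    angles = noise_level[:, None] * right[None, :]
    encoding = torch.cat([angles.sin(), angles.cos()], dim=-1)  # (250, 512)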

    bug update 
    opened by enhuiz 9
  • loss explode question


    Hello, thanks for your work; it is very helpful to me. I have a question about loss explosions. I tried to train on the LJSpeech data you used with the default setting (lr = 1e-3) and had a NaN loss issue. So I reduced the lr to 5e-4; then there was no NaN loss issue, but the loss still explodes (normal loss: < 0.07, exploded loss: > 732M). I know there is code to prevent loss explosion, like the LR schedule and clipping, but it is not working. Can you help me?

    opened by junkeon 7
  • schedules model for other dataset and different sample rate


    I do not fully understand the noise schedules. Is the model in schedules/pretrained suitable for other datasets, at 22k and 16k?

    I tried to train my own dataset, whose sample rate is 16000, using the pretrained schedules (16, 25 and 100 iterations), and the predicted results sound good, especially with 100 iterations.

    But I don't understand why the schedules can also be used for a 16k sample rate. Or, even though the synthesized wavs sound good, is this not the correct way?

    question 
    opened by Liujingxiu23 4
  • reduce noise with lower iteration


    What's your suggestion for reducing the background noise with a lower number of iterations (e.g., 100) during inference? The model is actually okay for inference with 1000 iterations.

    opened by taalua 3
  • Is it possible to load .npy spectrograms directly?


    Hi, from the notebook it isn't clear whether you can load your own spectrogram directly (for TTS inference) or not. Is this possible, and if so, have you tried it?

    question 
    opened by george-roussos 3
  • predict_start_from_noise


    I do not understand this function in the script "diffusion_process.py". Could someone help me?

    Frankly speaking, I do not fully understand the diffusion process and its inverse. For the other functions I can find the corresponding equations in the "WaveGrad" or "Denoising Diffusion Probabilistic Models" papers, but not for this one.

    Also, the inverse process that generates a wave seems much more complex than the process in https://github.com/lmnt-com/wavegrad/blob/master/src/wavegrad/inference.py, lines 56~64. Why is there such a difference?
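
    For context: in DDPM, a helper with this name typically inverts the forward noising equation y_t = sqrt(abar_t) * y_0 + sqrt(1 - abar_t) * eps to recover an estimate of y_0 from the predicted noise. A hedged sketch (tensor arguments assumed; not this repo's exact code):

    import torch

    def predict_start_from_noise(y_t, sqrt_alphas_cumprod_t, eps):
        """Estimate the clean signal y_0 given y_t and the predicted noise eps."""
        sqrt_one_minus = (1.0 - sqrt_alphas_cumprod_t ** 2).sqrt()
        return (y_t - sqrt_one_minus * eps) / sqrt_alphas_cumprod_t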

    question 
    opened by Liujingxiu23 2
  • Were your `generated_samples` generated using a model trained with AMP?


    I'm also adding WaveGrad to my implementation. I have a question for you. Were your generated_samples generated using a model trained with AMP?

    I think this repository is very nice. Good job!

    question 
    opened by Hiroshiba 2
  • ValueError: low >= high


    https://github.com/ivanvovk/WaveGrad/blob/6be2f4c228e366eb850db5983344508c0d0fac87/data.py#L47-L49

    Since np.random.randint has an exclusive upper boundary, np.random.randint(0, 0) raises an error. Consider switching to random.randint or changing the condition to audio.shape[-1] > self.segment_length?
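
    A minimal illustration of the edge case and one possible guard (a sketch, not the repo's fix):

    import numpy as np

    audio_len, segment_length = 7200, 7200  # equal lengths trigger the edge case
    # np.random.randint(low, high) samples from [low, high), so high must exceed
    # low; np.random.randint(0, 0) raises "ValueError: low >= high".
    start = (np.random.randint(0, audio_len - segment_length)
             if audio_len > segment_length else 0)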

    update 
    opened by enhuiz 2
  • Unable to load the pre-trained parameters for inference


    Hey, the quality of the generated results is amazing! 🤩 But, unfortunately, I have a little problem with running inference.

    I downloaded the checkpoint file you provide, named wavegrad_lj_pretrained.pt (which I renamed to checkpoint_wavegrad_lj_pretrained.pt), and moved it to the following location in the WaveGrad directory.

    ├── logs
    │   └── default
    │       └── checkpoint_wavegrad_lj_pretrained.pt
    

    This is because of the checkpoint-name pattern required by the following code

    https://github.com/ivanvovk/WaveGrad/blob/714cb827dee4b310c249b6165033228a1ff2ced7/utils.py#L19

    which is called from the fifth cell of inference.ipynb as follows:

    # model.load_state_dict(torch.load('../logs/default/checkpoint_180630.pt)['model'], strict=False)
    model, _, _ = utils.load_latest_checkpoint(config.training_config.logdir, model)
    

    When I run this cell I get the following error. 🙁

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-6-2e2546d6cd38> in <module>
          1 # model.load_state_dict(torch.load('../logs/default/checkpoint_180630.pt)['model'], strict=False)
    ----> 2 model, _, _ = utils.load_latest_checkpoint(config.training_config.logdir, model)
    
    ~/Desktop/WaveGrad/utils.py in load_latest_checkpoint(logdir, model, optimizer)
         25 
         26 def load_latest_checkpoint(logdir, model, optimizer=None):
    ---> 27     latest_model_path = latest_checkpoint_path(logdir, regex="checkpoint_*.pt")
         28     print(f'Latest checkpoint: {latest_model_path}')
         29     d = torch.load(
    
    ~/Desktop/WaveGrad/utils.py in latest_checkpoint_path(dir_path, regex)
         19 def latest_checkpoint_path(dir_path, regex="checkpoint_*.pt"):
         20     f_list = glob.glob(os.path.join(dir_path, regex))
    ---> 21     f_list.sort(key=lambda f: int("".join(filter(str.isdigit, f))))
         22     x = f_list[-1]
         23     return x
    
    ~/Desktop/WaveGrad/utils.py in <lambda>(f)
         19 def latest_checkpoint_path(dir_path, regex="checkpoint_*.pt"):
         20     f_list = glob.glob(os.path.join(dir_path, regex))
    ---> 21     f_list.sort(key=lambda f: int("".join(filter(str.isdigit, f))))
         22     x = f_list[-1]
         23     return x
    
    ValueError: invalid literal for int() with base 10: ''
    

    Can you please tell me how to fix this issue?
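
    Two possible workarounds, sketched: the sort key int("".join(filter(str.isdigit, f))) needs at least one digit in the filename, and the renamed file has none.

    import torch

    # Option 1: rename the file so it contains a digit (e.g. checkpoint_0.pt),
    # so utils.latest_checkpoint_path can parse it.
    # Option 2: bypass load_latest_checkpoint and load the file directly:
    state = torch.load('logs/default/checkpoint_wavegrad_lj_pretrained.pt',
                       map_location='cpu')
    # then: model.load_state_dict(state['model'])  # model = instantiated network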

    question 
    opened by RahulBhalley 2
  • configuration for hop_size 256


    When I configure the factors as [4,4,4,2,2] to match hop_size=256, I could not find a matching segment_length. Do you have an idea for exactly matching segment_length to the dilations and padding of this configuration?

    I got the error below with segment_length=7200.

    Traceback (most recent call last):
      File "train.py", line 185, in <module>
        run(config, args)
      File "train.py", line 92, in run
        loss = model.compute_loss(mels, batch)
      File "/content/WaveGrad/model/diffusion_process.py", line 176, in compute_loss
        eps_recon = self.nn(mels, y_noisy, continuous_sqrt_alpha_cumprod)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/content/WaveGrad/model/nn.py", line 119, in forward
        ublock_outputs = ublock(x=ublock_outputs, scale=scale, shift=shift)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/content/WaveGrad/model/upsampling.py", line 82, in forward
        outputs = self.first_block_main_branch['modulation'](outputs, scale, shift)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/content/WaveGrad/model/upsampling.py", line 30, in forward
        outputs = self.featurewise_affine(x, scale, shift)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/content/WaveGrad/model/linear_modulation.py", line 98, in forward
        outputs = scale * x + shift
    RuntimeError: The size of tensor a (450) must match the size of tensor b (448) at non-singleton dimension 2
    

    full configuration

    {
        "model_config": {
            "factors": [4,4,4,2,2],
            "upsampling_preconv_out_channels": 768,
            "upsampling_out_channels": [512, 512, 256, 128, 128],
            "upsampling_dilations": [
                [1, 2, 1, 2],
                [1, 2, 1, 2],
                [1, 2, 4, 8],
                [1, 2, 4, 8],
                [1, 2, 4, 8]
            ],
            "downsampling_preconv_out_channels": 32,
            "downsampling_out_channels": [128, 128, 256, 512],
            "downsampling_dilations": [
                [1, 2, 4], [1, 2, 4], [1, 2, 4], [1, 2, 4]
            ]
        },
        "data_config": {
            "sample_rate": 22050,
            "n_fft": 1024,
            "win_length": 1024,
            "hop_length": 256,
            "f_min": 0,
            "f_max": 8000,
            "n_mels": 80
        },
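
    A plausible explanation, sketched: since the total upsampling factor equals hop_length, segment_length must be a multiple of it, and 7200 is not divisible by 256, which would explain the 450-vs-448 tensor mismatch above.

    from math import prod

    factors = [4, 4, 4, 2, 2]
    hop_length = 256
    segment_length = 7200

    assert prod(factors) == hop_length                  # 4*4*4*2*2 == 256, OK
    print(segment_length % hop_length)                  # 32 -> not a multiple!
    print((segment_length // hop_length) * hop_length)  # 7168 would work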
    
    
    opened by yhgon 2
  • Poor Synthesis Quality on 44k Sample Rate


    Hello, I have two nearly identical datasets with similar samples. I trained WaveGrad with 22k sample-rate audio and it is quite good. However, the synthesis quality on the 44k sample-rate data is not as good. I would really appreciate any suggestions, especially in terms of changing model parameters. The only changed parameters are as follows:

    sample_rate: 44k, n_fft: 2048, window_size: 2048, hop_length: 512

    opened by Malik7115 1
  • Static Noise with f_max =  10000


    Hi, when training with f_max = 10000, static noise is introduced into the model's output. We are using our custom dataset. Is there any way to improve upon this? Thanks.

    opened by Malik7115 0
  • Using NVIDIA RTX 3090 GPU?


    I'm fortunate enough to have a machine with an NVIDIA RTX 3090 GPU. However, the GPU-enabled binary versions of PyTorch 1.6.0 available from the PyTorch project won't run on the 3090, and probably won't run on any 3000 series GPUs - the necessary CUDA binaries don't seem to be compiled in.

    PyTorch 1.7.0 does run on my 3090, so I've built a virtual environment with that and torchaudio 0.7.0. I started training on the "LJ" dataset to see if it worked, and it appeared to be functioning; it used about 11.5 GB of GPU RAM and about 45% of the GPU processors. Do you anticipate any other problems with PyTorch 1.7.0, or should I go ahead with training on my own dataset?

    opened by znmeb 0
  • slow training in single GPU


    Huge thanks for the implementation! I have a question regarding the single-GPU training time you mentioned. I ran the same training procedure with batch size 96 on an RTX 2080 Ti GPU, but it took much longer than the time you mentioned (12 hours to reach ~10k training iterations). I have no idea what causes this. Could you describe your training environment precisely?

    Please refer to my working environment below: Docker environment with CUDA 10.1, cuDNN v7, Ubuntu 18.04, Python 3.8.

    opened by guanamusic 1
  • Interpolation and Conv order in Upsample module


    Hi,

    I noticed that in the upsampling block, the code applies Conv1dWithInitialization and then InterpolationBlock.

    code link

    But in the paper's network structure it looks like the reverse order. [figure: upsampling block diagram from the paper]

    Could you please confirm this?

    Thanks

    opened by softrimewu 1