This code is an unofficial implementation of HiFiSinger.

Overview

HiFiSinger

The algorithm is based on the following papers:

Chen, J., Tan, X., Luan, J., Qin, T., & Liu, T. Y. (2020). HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis. arXiv preprint arXiv:2009.01776.
Ren, Y., Ruan, Y., Tan, X., Qin, T., Zhao, S., Zhao, Z., & Liu, T. Y. (2019). FastSpeech: Fast, robust and controllable text to speech. Advances in Neural Information Processing Systems, 32, 3171-3180.
Yamamoto, R., Song, E., & Kim, J. M. (2020, May). Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6199-6203). IEEE.

Requirements

Please see 'requirements.txt'.

Structure

Generator

  • During training, the length regulator uses the target durations.
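
As an illustration of the point above, the following minimal sketch expands token encodings into frames from a list of durations. The function name `regulate_length` and the list-based representation are assumptions made for this example, not the repository's code. In training, the `durations` argument would be the target durations from the data; in FastSpeech-style models, the predicted durations take that role at inference.

```python
from typing import List, Sequence


def regulate_length(
    encodings: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Repeat each token encoding `duration` times along the time axis."""
    if len(encodings) != len(durations):
        raise ValueError("one duration is required per encoding")
    frames: List[List[float]] = []
    for encoding, duration in zip(encodings, durations):
        # A token with duration 3 contributes 3 identical frames.
        frames.extend(list(encoding) for _ in range(duration))
    return frames


# Training case: the durations are the target durations, not predictions.
token_encodings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
target_durations = [2, 1, 3]
expanded = regulate_length(token_encodings, target_durations)
print(len(expanded))  # 6 frames = 2 + 1 + 3
```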

Discriminator

  • HiFiSinger uses Sub-Frequency GAN (SF-GAN).
  • The sampled frequency range is fixed, while the length range is randomized.
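
One way to read the two points above: each discriminator sees a fixed band of mel bins, and the time segment it sees is drawn at random. The sketch below illustrates that reading only. The function name, the fixed window length, and the random-offset strategy are assumptions for illustration and may differ from the repository's actual sampling.

```python
import random
from typing import List, Tuple


def sample_sub_frequency_window(
    mel: List[List[float]],            # layout assumed: [mel_bin][time]
    frequency_range: Tuple[int, int],  # fixed (low, high) mel indices, high exclusive
    window_length: int,
    rng: random.Random,
) -> List[List[float]]:
    """Crop a fixed frequency band at a randomly chosen time offset."""
    low, high = frequency_range
    time_steps = len(mel[0])
    if not 0 <= low < high <= len(mel):
        raise ValueError("frequency range must lie inside the mel dimension")
    if window_length > time_steps:
        raise ValueError("window is longer than the mel-spectrogram")
    # randint is inclusive on both ends, so the window always fits.
    offset = rng.randint(0, time_steps - window_length)
    return [row[offset:offset + window_length] for row in mel[low:high]]


# Toy mel-spectrogram: 8 mel bins, 20 frames.
mel = [[float(b * 100 + t) for t in range(20)] for b in range(8)]
window = sample_sub_frequency_window(mel, (2, 6), 5, random.Random(0))
print(len(window), len(window[0]))  # 4 5
```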

Used dataset

  • The code was verified on a small, private Korean dataset.
  • If you know of an available open source dataset, please let me know.
    • What is needed is a set of MIDI files with synchronized lyrics and high-resolution vocal wave files.

Hyper parameters

Before proceeding, please set the pattern, inference, and checkpoint paths in 'Hyper_Parameters.yaml' according to your environment.

  • Sound

    • Setting basic sound parameters.
  • Tokens

    • The number of lyric tokens.
  • Max_Note

    • The highest note value for embedding.
  • Min/Max duration

    • The mel length range that the model uses.
    • Min duration is used only during pattern generation.
  • Encoder

    • Setting the encoder.
  • Duration_Predictor

    • Setting for the duration predictor.
  • Decoder

    • Setting for decoder.
  • Discriminator

    • Setting for the discriminator.
    • In the frequency range, each frequency is an index into the mel dimension.
      • The index must be less than or equal to Sound.Mel_Dim.
  • Vocoder_Path

    • Setting the traced vocoder path.
    • To generate this, please check Here
  • Train

    • Setting the parameters of training.
  • Use_Mixed_Precision

  • Inference_Batch_Size

    • Setting the batch size used during inference.
  • Inference_Path

    • Setting the inference path
  • Checkpoint_Path

    • Setting the checkpoint path
  • Log_Path

    • Setting the tensorboard log path
  • Device

    • Setting which GPU device is used in a multi-GPU environment.
    • If using only the CPU, set this to '-1'. (This is not recommended for training.)
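
The constraint on the discriminator's frequency range can be checked before training starts. The following is a minimal sketch of such a check; the function name and the example values (80 mel bins, three overlapping bands) are hypothetical and are not taken from 'Hyper_Parameters.yaml'.

```python
from typing import Iterable, Tuple


def check_frequency_ranges(
    mel_dim: int,
    frequency_ranges: Iterable[Tuple[int, int]],
) -> None:
    """Raise ValueError if any (low, high) range exceeds the mel dimension."""
    for low, high in frequency_ranges:
        # Each index must be less than or equal to the mel dimension.
        if not 0 <= low < high <= mel_dim:
            raise ValueError(
                f"frequency range ({low}, {high}) does not fit in {mel_dim} mel bins"
            )


# Illustrative values only: three overlapping bands over 80 mel bins.
check_frequency_ranges(mel_dim=80, frequency_ranges=[(0, 40), (20, 60), (40, 80)])
```

Running a check like this when the configuration is loaded surfaces an out-of-range index immediately instead of as a shape error inside the discriminator.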

Generate pattern

  • There is no available open source dataset.

Inference file path used for verification during training.

  • Inference_for_Training
    • Two example inference scripts are provided.
    • Each script is based on a MIDI file.

Run

Command

python Train.py -hp <path> -s <int>
  • -hp

    • The hyper parameter file path.
    • This is required.
  • -s

    • The resume step parameter.
    • Default is 0.
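
The two options above can be sketched with `argparse` as follows. This is not the actual argument handling in Train.py: the long option names `--hyper_parameters` and `--steps` are assumptions added for readability, and only `-hp` (required) and `-s` (default 0) come from the description above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Sketch of the Train.py options described above"
    )
    # -hp is required: the path of the hyper parameter file.
    parser.add_argument("-hp", "--hyper_parameters", required=True, type=str)
    # -s is optional: the step to resume from, 0 when omitted.
    parser.add_argument("-s", "--steps", default=0, type=int)
    return parser


args = build_parser().parse_args(["-hp", "Hyper_Parameters.yaml", "-s", "1000"])
print(args.hyper_parameters, args.steps)  # Hyper_Parameters.yaml 1000
```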
Comments
  • Training fails when `Use_Mixed_Precision` is set to `true`

    Hi, I'm trying to use HiFiSinger with Tohoku Kiritan, a Japanese singing voice dataset (repo).

    I found that training with mixed precision seems to fail.

    [Training]:   0%|          | 0/400000 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "Train.py", line 696, in <module>
        new_Trainer.Train()
      File "Train.py", line 656, in Train
        self.Train_Epoch()
      File "Train.py", line 330, in Train_Epoch
        self.Train_Step(durations, tokens, notes, token_lengths, mels, silences, pitches, mel_lengths)
      File "Train.py", line 239, in Train_Step
        scaled_loss.backward()
      File "/home/data/futabanzu/miniconda3/envs/py38/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/home/data/futabanzu/miniconda3/envs/py38/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
        Variable._execution_engine.run_backward(
      File "/home/data/futabanzu/miniconda3/envs/py38/lib/python3.8/site-packages/torch/autograd/function.py", line 89, in apply
        return self._forward_cls.backward(self, *args)  # type: ignore
      File "/home/lab/futabanzu/HIFISinger/thirdparty/HiFiSinger/nvlabs/torch_utils/ops/conv2d_gradfix.py", line 131, in backward
        grad_weight = Conv2dGradWeight.apply(grad_output, input)
      File "/home/lab/futabanzu/HIFISinger/thirdparty/HiFiSinger/nvlabs/torch_utils/ops/conv2d_gradfix.py", line 145, in forward
        grad_weight = op(weight_shape, grad_output, input, padding, stride, dilation, groups, *flags)
    RuntimeError: Expected tensor for argument #1 'grad_output' to have the same type as tensor for argument #2 'input'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor (while checking arguments for cudnn_convolution_backward_weight)
    
    opened by 3c1u 3
  • Is `PWGAN_for_HiFiSinger` not public?

    Hi, I tried to use this program, but the link to the vocoder seems to be dead.

    • Vocoder_Path
      • Setting the traced vocoder path.
      • To generate this, please check Here
    opened by 3c1u 2
  • A question about the duration

    Hi, I have been studying your code recently, and I have a question about the note durations. I see that a duration is extracted from the MIDI file. Is this the phoneme duration or the word duration? Also, is the ground-truth duration label predicted by a model rather than obtained from an alignment tool such as MFA?

    Thanks!!

    opened by zfishbone01 1
  • Open singing dataset CSD (Children's Song Dataset) may be used for your Work

    I find your work very interesting for my current research. Many thanks!

    I noticed that other people have also asked about the training corpus, which is necessary to fully reproduce your work. The CSD (Children's Song Dataset) may help make your code much more useful to other researchers. :-)

    CSD (Children's Song Dataset) https://github.com/emotiontts/emotiontts_open_db/tree/master/Dataset/CSD

    opened by YangWangGit 1
Owner
Heejo You
Main focus: Psycholinguistics / Machine learning / Deep learning