This code is an unofficial implementation of HiFiSinger.

Heejo You

Last update: Dec 23, 2022

Overview

HiFiSinger

This code is an unofficial implementation of HiFiSinger. The algorithm is based on the following papers:

Chen, J., Tan, X., Luan, J., Qin, T., & Liu, T. Y. (2020). HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis. arXiv preprint arXiv:2009.01776.
Ren, Y., Ruan, Y., Tan, X., Qin, T., Zhao, S., Zhao, Z., & Liu, T. Y. (2019). FastSpeech: Fast, robust and controllable text to speech. Advances in Neural Information Processing Systems, 32, 3171-3180.
Yamamoto, R., Song, E., & Kim, J. M. (2020, May). Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6199-6203). IEEE.

Requirements

Please see 'requirements.txt'.

Structure

Generator

During training, the length regulator uses the target durations.
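The role of the length regulator can be illustrated with a minimal sketch. It assumes a FastSpeech-style regulator that repeats each token-level encoding for as many frames as its duration. The function names `length_regulate` and `select_durations` are hypothetical and are not taken from this repository, and the use of predicted durations outside training is an assumption based on FastSpeech rather than something this README states.

```python
from typing import List, Sequence


def length_regulate(
    encodings: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Repeat the i-th encoding vector durations[i] times along the time axis."""
    if len(encodings) != len(durations):
        raise ValueError("encodings and durations must have the same length")
    frames: List[List[float]] = []
    for vector, duration in zip(encodings, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # A duration of 0 simply drops the token from the frame sequence.
        frames.extend(list(vector) for _ in range(duration))
    return frames


def select_durations(target: Sequence[int], predicted: Sequence[int], training: bool) -> Sequence[int]:
    """Use target durations in training; predicted ones otherwise (assumption)."""
    return target if training else predicted


# Three token encodings with two features each (made-up values).
encodings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
target_durations = [2, 0, 3]
predicted_durations = [1, 1, 1]

train_frames = length_regulate(
    encodings, select_durations(target_durations, predicted_durations, training=True)
)
infer_frames = length_regulate(
    encodings, select_durations(target_durations, predicted_durations, training=False)
)
print(len(train_frames), len(infer_frames))
```

With the target durations, the expanded sequence has as many frames as the sum of the durations, which is what lets it line up with the target mel-spectrogram during training.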

Discriminator

HiFiSinger uses a Sub-Frequency GAN (SF-GAN).
The sampled frequency range is fixed, while the sampled length range is randomized.
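One possible reading of this sampling scheme is sketched below. It assumes each discriminator sees a fixed band of mel bins, while the time window (both its length and its offset) is drawn at random. The function `sample_patch`, the exclusive end index, and the example ranges are all hypothetical; the repository's actual sampling code may differ.

```python
import random
from typing import List, Tuple

Mel = List[List[float]]  # indexed as mel[frequency_bin][frame]


def sample_patch(
    mel: Mel,
    frequency_range: Tuple[int, int],
    length_range: Tuple[int, int],
    rng: random.Random,
) -> Mel:
    """Crop a fixed frequency band and a randomly placed, randomly sized time window."""
    freq_start, freq_end = frequency_range  # fixed per discriminator, end exclusive
    min_length, max_length = length_range
    total_frames = len(mel[0])
    if min_length > total_frames:
        raise ValueError("mel is shorter than the minimum window length")
    # Randomize the window length, then its position along the time axis.
    length = rng.randint(min_length, min(max_length, total_frames))
    offset = rng.randint(0, total_frames - length)
    return [row[offset:offset + length] for row in mel[freq_start:freq_end]]


# Dummy mel-spectrogram: 80 bins x 200 frames (made-up sizes).
mel = [[float(bin_index)] * 200 for bin_index in range(80)]
rng = random.Random(0)
patch = sample_patch(mel, frequency_range=(0, 40), length_range=(50, 100), rng=rng)
print(len(patch), len(patch[0]))
```

Keeping the band fixed means each discriminator always judges the same part of the spectrum, while the random window exposes it to different segments of every training sample.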

Used dataset

Code verification was conducted with a small, private Korean dataset.
- Thus, the current Pattern_Generator.py and Datasets.py are written for Korean.
Please report any available open source dataset that provides the following.
- A set of MIDI files with synchronized lyrics and high-resolution vocal wave files

Hyper parameters

Before proceeding, please set the pattern, inference, and checkpoint paths in 'Hyper_Parameters.yaml' according to your environment.

Sound
- Setting basic sound parameters.
Tokens
- The number of lyric tokens.
Max_Note
- The highest note value for embedding.
Min/Max duration
- The range of mel lengths that the model uses.
- Min duration is used only during pattern generation.
Encoder
- Setting the encoder.
Duration_Predictor
- Settings for the duration predictor.
Decoder
- Setting for decoder.
Discriminator
- Settings for the discriminator.
- In the frequency range, each frequency is an index into the mel dimension.
  - The index must be less than or equal to Sound.Mel_Dim.
Vocoder_Path
- Setting the traced vocoder path.
- To generate this, please check Here
Train
- Setting the parameters of training.
Use_Mixed_Precision
- Setting whether to use mixed precision.
- Requires NVIDIA Apex.
Inference_Batch_Size
- Setting the batch size used during inference.
Inference_Path
- Setting the inference path
Checkpoint_Path
- Setting the checkpoint path
Log_Path
- Setting the tensorboard log path
Device
- Setting which GPU device is used in a multi-GPU environment.
- Or, if using only the CPU, please set '-1'. (But I don't recommend this for training.)
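The constraint on the discriminator's frequency range listed above can be checked before training starts. The sketch below is only an illustration: `check_frequency_ranges` is a hypothetical helper, the key name `Frequency_Range` and all values are assumptions, and the dictionary stands in for the parsed contents of 'Hyper_Parameters.yaml'.

```python
from typing import List, Sequence


def check_frequency_ranges(mel_dim: int, frequency_ranges: Sequence[Sequence[int]]) -> List[str]:
    """Return a message for every range that does not fit inside the mel dimension."""
    problems: List[str] = []
    for index, (start, end) in enumerate(frequency_ranges):
        if not 0 <= start < end:
            problems.append(f"range {index}: expected 0 <= start < end, got [{start}, {end}]")
        elif end > mel_dim:
            problems.append(f"range {index}: end index {end} exceeds Mel_Dim {mel_dim}")
    return problems


# Stand-in for the loaded YAML file; key names and values are hypothetical.
hyper_parameters = {
    "Sound": {"Mel_Dim": 80},
    "Discriminator": {"Frequency_Range": [[0, 40], [20, 60], [40, 90]]},
}

problems = check_frequency_ranges(
    hyper_parameters["Sound"]["Mel_Dim"],
    hyper_parameters["Discriminator"]["Frequency_Range"],
)
for problem in problems:
    print(problem)
```

Running a check like this right after loading the configuration surfaces an out-of-range index immediately, rather than as a shape error deep inside the discriminator.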

Generate pattern

There is no available open source dataset.

Inference file path while training for verification.

Inference_for_Training
- Two inference examples are provided.
- They are MIDI-file-based scripts.

Run

Command

python Train.py -hp <path> -s <int>

-hp
- The hyperparameter file path.
- This is required.
-s
- The resume step parameter.
- Default is 0.
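The two options above could be parsed as in the following sketch. It uses Python's standard `argparse` module; the long option names `--hyper_parameters` and `--steps` are assumptions added for readability, and the actual argument handling in Train.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two documented options, -hp and -s."""
    parser = argparse.ArgumentParser(description="HiFiSinger training (sketch)")
    parser.add_argument(
        "-hp", "--hyper_parameters",
        required=True,
        type=str,
        help="path to the hyperparameter YAML file (required)",
    )
    parser.add_argument(
        "-s", "--steps",
        default=0,
        type=int,
        help="step to resume from (default: 0)",
    )
    return parser


# Fresh run: only the required hyperparameter path is given.
args = build_parser().parse_args(["-hp", "Hyper_Parameters.yaml"])
print(args.hyper_parameters, args.steps)

# Resumed run: the step to resume from is passed with -s.
resumed = build_parser().parse_args(["-hp", "Hyper_Parameters.yaml", "-s", "1000"])
print(resumed.steps)
```

Omitting `-hp` makes `argparse` exit with a usage error, which matches the README's note that the option is required.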

Comments

Training fails when `Use_Mixed_Precision` is set to `true`

Hi, I'm trying to use HiFiSinger with Tohoku Kiritan, a Japanese singing voice dataset.

I found that the training with mixed precision seems to fail.

[Training]:   0%|          | 0/400000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "Train.py", line 696, in <module>
    new_Trainer.Train()
  File "Train.py", line 656, in Train
    self.Train_Epoch()
  File "Train.py", line 330, in Train_Epoch
    self.Train_Step(durations, tokens, notes, token_lengths, mels, silences, pitches, mel_lengths)
  File "Train.py", line 239, in Train_Step
    scaled_loss.backward()
  File "/home/data/futabanzu/miniconda3/envs/py38/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/data/futabanzu/miniconda3/envs/py38/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
  File "/home/data/futabanzu/miniconda3/envs/py38/lib/python3.8/site-packages/torch/autograd/function.py", line 89, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore
  File "/home/lab/futabanzu/HIFISinger/thirdparty/HiFiSinger/nvlabs/torch_utils/ops/conv2d_gradfix.py", line 131, in backward
    grad_weight = Conv2dGradWeight.apply(grad_output, input)
  File "/home/lab/futabanzu/HIFISinger/thirdparty/HiFiSinger/nvlabs/torch_utils/ops/conv2d_gradfix.py", line 145, in forward
    grad_weight = op(weight_shape, grad_output, input, padding, stride, dilation, groups, *flags)
RuntimeError: Expected tensor for argument #1 'grad_output' to have the same type as tensor for argument #2 'input'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor (while checking arguments for cudnn_convolution_backward_weight)

opened by 3c1u 3

Is `PWGAN_for_HiFiSinger` not public?
Hi, I tried to use this program, but a link to the vocoder seems to be dead.

Vocoder_Path

Setting the traced vocoder path.

To generate this, please check Here
opened by 3c1u 2
A question about the duration

Hi, I have been studying your code recently, and I have a question about the note durations. I see that a duration is extracted from the MIDI file. Is this the phoneme duration or the word duration? Also, is the ground-truth duration label predicted by a model rather than obtained from an alignment tool such as MFA?

Thanks!!

opened by zfishbone01 1
Open singing dataset CSD (Children's Song Dataset) may be used for your Work

I found your work is very interesting to my current research. Many thanks!

I noticed that other people have also asked about the training corpus, which is necessary to reproduce your work completely. The CSD (Children's Song Dataset) may help make your code more useful to other researchers. :-)

CSD (Children's Song Dataset) https://github.com/emotiontts/emotiontts_open_db/tree/master/Dataset/CSD

opened by YangWangGit 1

Owner

Heejo You

Main focus: Psycholinguistics / Machine learning / Deep learning
