StarGAN-ZSVC: Unofficial PyTorch Implementation

Overview

StarGAN-ZSVC: Unofficial PyTorch Implementation

This repository is an unofficial PyTorch implementation of StarGAN-ZSVC by Matthew Baas and Herman Kamper. This repository provides both model architectures and the code to inference or train them.

One of the StarGAN-ZSVC advantages is that it works on zero-shot settings and can be trained on unparallel audio data (different audio content by different speakers). Also, the model inference time is real-time or faster.

Disclaimer: I implement this repository for educational purpose only. All credits go to the original authors. Also, it may contains different details as described in the paper. If there is a room for improvement, please feel free to contact me.

Set up

git clone [email protected]:Top34051/stargan-zsvc.git
cd stargan-zsvc
conda env create -f environment.yml
conda activate stargan-zsvc

Usage

Voice conversion

Given two audio files, source.wav and target.wav, you can generate a new audio file with the same speaking content as in source.wav spoken by the speaker in target.wav as follow.

First, load my pretrained model weights (best.pt) and put it in checkpoints folder.

Next, we need to embed both speaker identity.

python embed.py --path path_to_source.wav --name src
python embed.py --path path_to_target.wav --name trg

This will generate src.npy and trg.npy, the source and target speaker embeddings.

To perform voice conversion,

python convert.py \
  --audio_path path_to_source.wav \
  --src_id src \
  --trg_id trg  

That's it! 🎉 You can check out the result at results/output.wav.

Training

To train the model, you have to download and preprocess the dataset first. Since your data might be different from mine, I recommend you to read and fix the logic I used in preprocess.py (the dataset I used is here).

The fixed size utterances from each speaker will be extracted, resampled to 22,050 Hz, and converted to Mel-spectrogram with window and hop length of size 1024 and 256. This will preprocess the speaker embeddings as well, so that you don't have to embed them one-by-one.

The processed dataset will look like this

data/
    train/
        spk1.npy # contains N samples of (80, 128) mel-spectrogram
        spk2.npy
        ...
    test/
        spk1.npy
        spk2.npy
        ...
        
embeddings/
    spk1.npy # a (256, ) speaker embedding vector
    spk2.npy
    ...

You can customize some of the training hyperparameters or select resuming checkpoint in config.json. Finally, train the models by

python main.py \ 
  --config_file config.json 
  --num_epoch 3000

You will now see new checkpoint pops up in the checkpoints folder.

Please check out my code and modify them for improvement. Have fun training! ✌️

You might also like...
Unofficial PyTorch implementation of Fastformer based on paper
Unofficial PyTorch implementation of Fastformer based on paper "Fastformer: Additive Attention Can Be All You Need"."

Fastformer-PyTorch Unofficial PyTorch implementation of Fastformer based on paper Fastformer: Additive Attention Can Be All You Need. Usage : import t

Unofficial pytorch implementation of paper
Unofficial pytorch implementation of paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing"

One-Shot Free-View Neural Talking Head Synthesis Unofficial pytorch implementation of paper "One-Shot Free-View Neural Talking-Head Synthesis for Vide

Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

Unofficial PyTorch implementation of Google AI's VoiceFilter system
Unofficial PyTorch implementation of Google AI's VoiceFilter system

VoiceFilter Note from Seung-won (2020.10.25) Hi everyone! It's Seung-won from MINDs Lab, Inc. It's been a long time since I've released this open-sour

Unofficial PyTorch implementation of MobileViT based on paper
Unofficial PyTorch implementation of MobileViT based on paper "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".

MobileViT RegNet Unofficial PyTorch implementation of MobileViT based on paper MOBILEVIT: LIGHT-WEIGHT, GENERAL-PURPOSE, AND MOBILE-FRIENDLY VISION TR

Unofficial PyTorch implementation of MobileViT.
Unofficial PyTorch implementation of MobileViT.

MobileViT Overview This is a PyTorch implementation of MobileViT specified in "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Tr

Unofficial PyTorch implementation of SimCLR by Google Brain

Unofficial PyTorch implementation of SimCLR by Google Brain

Unofficial pytorch implementation of 'Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization'
Unofficial pytorch implementation of 'Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization'

pytorch-AdaIN This is an unofficial pytorch implementation of a paper, Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization [Hua

Unofficial PyTorch Implementation of Multi-Singer

Multi-Singer Unofficial PyTorch Implementation of Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus. Requirements See re

Owner
Jirayu Burapacheep
Deep learning enthusiast; Undergrad in Computer and Data Science at UW-Madison
Jirayu Burapacheep
This is an unofficial PyTorch implementation of Meta Pseudo Labels

This is an unofficial PyTorch implementation of Meta Pseudo Labels. The official Tensorflow implementation is here.

Jungdae Kim 320 Jan 8, 2023
An unofficial PyTorch implementation of a federated learning algorithm, FedAvg.

Federated Averaging (FedAvg) in PyTorch An unofficial implementation of FederatedAveraging (or FedAvg) algorithm proposed in the paper Communication-E

Seok-Ju Hahn 123 Jan 6, 2023
Unofficial PyTorch implementation of Attention Free Transformer (AFT) layers by Apple Inc.

aft-pytorch Unofficial PyTorch implementation of Attention Free Transformer's layers by Zhai, et al. [abs, pdf] from Apple Inc. Installation You can i

Rishabh Anand 184 Dec 12, 2022
Unofficial PyTorch implementation of Neural Additive Models (NAM) by Agarwal, et al.

nam-pytorch Unofficial PyTorch implementation of Neural Additive Models (NAM) by Agarwal, et al. [abs, pdf] Installation You can access nam-pytorch vi

Rishabh Anand 11 Mar 14, 2022
Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) in PyTorch

alias-free-gan-pytorch Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) This implementation

Kim Seonghyeon 502 Jan 3, 2023
Unofficial Pytorch Implementation of WaveGrad2

WaveGrad 2 — Unofficial PyTorch Implementation WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis Unofficial PyTorch+Lightning Implementati

MINDs Lab 104 Nov 29, 2022
The author's officially unofficial PyTorch BigGAN implementation.

BigGAN-PyTorch The author's officially unofficial PyTorch BigGAN implementation. This repo contains code for 4-8 GPU training of BigGANs from Large Sc

Andy Brock 2.6k Jan 2, 2023
Unofficial PyTorch Implementation of UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

UnivNet UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation This is an unofficial PyTorch

MINDs Lab 170 Jan 4, 2023
Unofficial pytorch implementation for Self-critical Sequence Training for Image Captioning. and others.

An Image Captioning codebase This is a codebase for image captioning research. It supports: Self critical training from Self-critical Sequence Trainin

Ruotian(RT) Luo 906 Jan 3, 2023
Unofficial PyTorch Implementation of UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

UnivNet UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation This is an unofficial PyTorch

MINDs Lab 54 Aug 30, 2021