Chunked Autoregressive GAN (CARGAN)
Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis [paper] [companion website]
Table of contents
Installation
pip install cargan
Configuration
All configuration is performed in cargan/constants.py
. The default configuration is CARGAN. Additional configuration files for experiments described in our paper can be found in config/
.
Inference
CLI
Infer from an audio files on disk. audio_files
and output_files
can be lists of files to perform batch inference.
python -m cargan \
--audio_files
\
--output_files
\
--checkpoint
\
--gpu
Infer from files of features on disk. feature_files
and output_files
can be lists of files to perform batch inference.
python -m cargan \
--feature_files
\
--output_files
\
--checkpoint
\
--gpu
API
cargan.from_audio
"""Perform vocoding from audio
Arguments
audio : torch.Tensor(shape=(1, samples))
The audio to vocode
sample_rate : int
The audio sample rate
gpu : int or None
The index of the gpu to use
Returns
vocoded : torch.Tensor(shape=(1, samples))
The vocoded audio
"""
cargan.from_audio_file_to_file
"""Perform vocoding from audio file and save to file
Arguments
audio_file : Path
The audio file to vocode
output_file : Path
The location to save the vocoded audio
checkpoint : Path
The generator checkpoint
gpu : int or None
The index of the gpu to use
"""
cargan.from_audio_files_to_files
"""Perform vocoding from audio files and save to files
Arguments
audio_files : list(Path)
The audio files to vocode
output_files : list(Path)
The locations to save the vocoded audio
checkpoint : Path
The generator checkpoint
gpu : int or None
The index of the gpu to use
"""
cargan.from_features
"""Perform vocoding from features
Arguments
features : torch.Tensor(shape=(1, cargan.NUM_FEATURES, frames)
The features to vocode
gpu : int or None
The index of the gpu to use
Returns
vocoded : torch.Tensor(shape=(1, cargan.HOPSIZE * frames))
The vocoded audio
"""
cargan.from_feature_file_to_file
"""Perform vocoding from feature file and save to disk
Arguments
feature_file : Path
The feature file to vocode
output_file : Path
The location to save the vocoded audio
checkpoint : Path
The generator checkpoint
gpu : int or None
The index of the gpu to use
"""
cargan.from_feature_files_to_files
"""Perform vocoding from feature files and save to disk
Arguments
feature_files : list(Path)
The feature files to vocode
output_files : list(Path)
The locations to save the vocoded audio
checkpoint : Path
The generator checkpoint
gpu : int or None
The index of the gpu to use
"""
Reproducing results
For the following subsections, the arguments are as follows
checkpoint
- Path to an existing checkpoint on diskdatasets
- A list of datasets to use. Supported datasets arevctk
,daps
,cumsum
, andmusdb
.gpu
- The index of the gpu to usegpus
- A list of indices of gpus to use for distributed data parallelism (DDP)name
- The name to give to an experiment or evaluationnum
- The number of samples to evaluate
Download
Downloads, unzips, and formats datasets. Stores datasets in data/datasets/
. Stores formatted datasets in data/cache/
.
python -m cargan.data.download --datasets
vctk
must be downloaded before cumsum
.
Preprocess
Prepares features for training. Features are stored in data/cache/
.
python -m cargan.preprocess --datasets
--gpu
Running this step is not required for the cumsum
experiment.
Partition
Partitions a dataset into training, validation, and testing partitions. You should not need to run this, as the partitions used in our work are provided for each dataset in cargan/assets/partitions/
.
python -m cargan.partition --datasets
The optional --overwrite
flag forces the existing partition to be overwritten.
Train
Trains a model. Checkpoints and logs are stored in runs/
.
python -m cargan.train \
--name
\
--datasets
\
--gpus
You can optionally specify a --checkpoint
option pointing to the directory of a previous run. The most recent checkpoint will automatically be loaded and training will resume from that checkpoint. You can overwrite a previous training by passing the --overwrite
flag.
You can monitor training via tensorboard
as follows.
tensorboard --logdir runs/ --port
Evaluate
Objective
Reports the pitch RMSE (in cents), periodicity RMSE, and voiced/unvoiced F1 score. Results are both printed and stored in eval/objective/
.
python -m cargan.evaluate.objective \
--name
\
--datasets
\
--checkpoint
\
--num
\
--gpu
Subjective
Generates samples for subjective evaluation. Also performs benchmarking of inference speed. Results are stored in eval/subjective/
.
python -m cargan.evaluate.subjective \
--name
\
--datasets
\
--checkpoint
\
--num
\
--gpu
Receptive field
Get the size of the (non-causal) receptive field of the generator. cargan.AUTOREGRESSIVE
must be False
to use this.
python -m cargan.evaluate.receptive_field
Running tests
pip install pytest
pytest
Citation
IEEE
M. Morrison, R. Kumar, K. Kumar, P. Seetharaman, A. Courville, and Y. Bengio, "Chunked Autoregressive GAN for Conditional Waveform Synthesis," Submitted to ICLR 2022, April 2022.
BibTex
@inproceedings{morrison2022chunked,
title={Chunked Autoregressive GAN for Conditional Waveform Synthesis},
author={Morrison, Max and Kumar, Rithesh and Kumar, Kundan and Seetharaman, Prem and Courville, Aaron and Bengio, Yoshua},
booktitle={Submitted to ICLR 2022},
month={April},
year={2022}
}