Fast and Simple vocoder, Multiband RNN_MS.
Demo
ToDo: link to an impressive high-quality audio demo.
Quick Training
Jump to ☞, then Run. That's all!
How to Use
1. Install
# pip install "torch==1.10.0" -q # Based on your environment (validated with v1.10)
# pip install "torchaudio==0.10.0" -q # Based on your environment
pip install git+https://github.com/tarepan/MultibandRNNMS
2. Data & Preprocessing
"Batteries Included".
RNNMS transparently downloads the corpus and preprocesses it for you.
3. Train
python -m mbrnnms.main_train
For argument details, check ./mbrnnms/config.py.
Advanced: Other datasets
You can switch datasets via arguments. All of speechcorpusy's preset corpora are supported.
# LJSpeech corpus
python -m mbrnnms.main_train data.data_name=LJ
Advanced: Custom dataset
Copy mbrnnms.main_train and replace the DataModule.
# datamodule = LJSpeechDataModule(batch_size, ...)
datamodule = YourSuperCoolDataModule(batch_size, ...)
# That's all!
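The swap above assumes a Lightning-style DataModule interface. Below is a minimal, dependency-free sketch of the shape such a class takes; the class name, arguments, and dataset contents are hypothetical, and a real implementation would subclass pytorch_lightning.LightningDataModule and return torch.utils.data.DataLoader objects.

```python
# Hypothetical sketch of a custom DataModule (duck-typed stand-in;
# a real one would subclass pytorch_lightning.LightningDataModule).
class YourSuperCoolDataModule:
    def __init__(self, batch_size, num_workers=2):
        self.batch_size = batch_size
        self.num_workers = num_workers

    def prepare_data(self):
        # Download / preprocess your corpus here (runs once, single process).
        pass

    def setup(self, stage=None):
        # Build train/val datasets here; a plain list stands in for a Dataset.
        self.train_set = list(range(10))

    def train_dataloader(self):
        # A real implementation returns torch.utils.data.DataLoader(...);
        # here we just chunk the list into batches.
        return [self.train_set[i:i + self.batch_size]
                for i in range(0, len(self.train_set), self.batch_size)]

dm = YourSuperCoolDataModule(batch_size=4)
dm.prepare_data()
dm.setup()
batches = dm.train_dataloader()
```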
System Details
Model
- PreNet: GRU
- Upsampler: time-directional nearest interpolation
- Decoder: Embedding-auto-regressive generative RNN with 10-bit μ-law encoding
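The decoder's 10-bit μ-law targets and the upsampler's time-directional nearest interpolation can be sketched in plain Python. This is a simplified illustration under our own naming, not the project's actual implementation:

```python
import math

def mu_law_encode(x, bits=10):
    """Compress x in [-1, 1] with mu-law, then quantize to 2**bits classes."""
    mu = 2 ** bits - 1  # 1023 for 10-bit
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)  # in [-1, 1]
    return int((y + 1) / 2 * mu + 0.5)  # integer class in [0, mu]

def mu_law_decode(q, bits=10):
    """Invert the quantized mu-law class back to a sample in [-1, 1]."""
    mu = 2 ** bits - 1
    y = 2 * q / mu - 1
    return math.copysign((math.expm1(abs(y) * math.log1p(mu))) / mu, y)

def upsample_nearest(frames, factor):
    """Time-directional nearest interpolation: repeat each frame `factor` times."""
    return [f for f in frames for _ in range(factor)]
```

For example, a zero sample maps to the middle class (512 of 1024), and a 2-frame feature sequence upsampled by 3 yields 6 conditioning steps.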
Results
Output Sample
Performance
X [iter/sec] @ NVIDIA T4 on Google Colaboratory (AMP+, num_workers=8)
Full training takes about Y days.
References
Acknowledgements
- : The basic vocoder concept comes from this paper.
- bshall/UniversalVocoding: Model and hyperparameters are derived from this repository. All code has been rewritten.