Unofficial implementation of the FFTNet vocoder paper.

- implement the model.
- implement tests.
- overfit on a single batch (sanity check).
- linearize weights for eval time.
- measure the run-time on GPU and CPU (1 sec of audio takes ~47 secs). If anyone knows additional tricks from the paper, let me know. I have asked the authors, but so far nobody has replied.
- train on LJSpeech spectrograms.
- distill model as in Parallel WaveNet paper.
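
The run-time item above can be expressed as a real-time factor: wall-clock seconds spent per second of generated audio, so ~47 secs for 1 sec of audio is a factor of about 47. Below is a minimal sketch of such a measurement. It is not code from this repo: `real_time_factor` and `dummy_synthesize` are hypothetical names, the dummy function only stands in for the model's sample-by-sample generation loop, and the 22050 Hz sample rate is an assumption chosen to match LJSpeech.

```python
import time


def real_time_factor(synthesize, n_samples, sample_rate):
    """Return wall-clock seconds spent per second of generated audio."""
    start = time.perf_counter()
    synthesize(n_samples)
    elapsed = time.perf_counter() - start
    audio_seconds = n_samples / sample_rate
    return elapsed / audio_seconds


def dummy_synthesize(n_samples):
    # Hypothetical stand-in for the autoregressive generation loop;
    # replace it with a call into the real model to get a meaningful number.
    return [0.0 for _ in range(n_samples)]


# Assumed sample rate (LJSpeech is recorded at 22050 Hz); 1 second of audio.
rtf = real_time_factor(dummy_synthesize, n_samples=22050, sample_rate=22050)
print(f"real-time factor: {rtf:.4f}")
```

A factor below 1 means faster than real time. When timing on a GPU, kernel launches are typically asynchronous, so the device would need to be synchronized before the clock is read, otherwise the measurement can come out too low.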