iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform

Overview

This repo tries to implement iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform, specifically the C8C8I model. Disclaimer: this repo is built for testing purposes, and the code is not optimized for performance.

Training:

python train.py --config config_v1.json

Note:

  • We get good-quality audio with 30% less training than the original HiFi-GAN.
  • This model is approximately 60% faster than its HiFi-GAN counterpart.
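iSTFTNet replaces HiFi-GAN's output-side upsampling layers with an inverse STFT, so the generator's upsampling factors and the iSTFT hop must together reproduce the mel hop size. The sketch below checks that constraint for a config. It assumes C8C8I means two 8x upsampling stages (upsample_rates = [8, 8]) and a mel hop_size of 256, and uses the gen_istft_n_fft = 16 and gen_istft_hop_size = 4 values added to config_v1.json; check_istft_config is a hypothetical helper, not part of the repo. It also derives the channel count the final conv layer needs for magnitude plus phase, which agrees with the Conv1d(128, 18, ...) conv_post printed in the training log in the issues below.

```python
import math


def check_istft_config(hop_size, upsample_rates, gen_istft_n_fft, gen_istft_hop_size):
    """Hypothetical sanity check for an iSTFTNet-style generator config.

    Each mel frame must expand to exactly hop_size waveform samples: the
    transposed-conv stack contributes prod(upsample_rates) and the final
    iSTFT contributes gen_istft_hop_size.
    Returns the channel count the last conv layer needs (magnitude + phase).
    """
    total_upsampling = math.prod(upsample_rates) * gen_istft_hop_size
    if total_upsampling != hop_size:
        raise ValueError(
            f"prod(upsample_rates) * gen_istft_hop_size = {total_upsampling}, "
            f"but hop_size = {hop_size}"
        )
    n_bins = gen_istft_n_fft // 2 + 1  # one-sided spectrum bins
    return 2 * n_bins  # one magnitude and one phase channel per bin


# Assumed C8C8I-style values (hop_size=256 is an assumption).
channels = check_istft_config(hop_size=256, upsample_rates=[8, 8],
                              gen_istft_n_fft=16, gen_istft_hop_size=4)
print(channels)  # 18
```

If hop_size changes, for example when retargeting another sampling rate, one of the other factors has to change with it; otherwise the generated waveform length no longer matches the target audio.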

Citation:

@inproceedings{kaneko2022istftnet,
title={{iSTFTNet}: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform},
author={Takuhiro Kaneko and Kou Tanaka and Hirokazu Kameoka and Shogo Seki},
booktitle={ICASSP},
year={2022},
}

Comments
  • window_sum in stft is just a constant?

    I printed window_sum in stft.py (line 155) and found that its value is constant except at the leading and trailing padding positions, so the window function only acts as a linear scaling. Is this the expected windowing behavior?

    opened by xiaoyangnihao 3
  • RuntimeError: istft input and window must be on the same device but got self on cuda:0 and window on cpu

    My command to run:

    python3 train.py --config config_v1.json --input_wavs_dir /home/yehor/iSTFTNet-pytorch/lada_wavs --input_training_file /home/yehor/iSTFTNet-pytorch/training_list.txt --input_validation_file /home/yehor/iSTFTNet-pytorch/validation_list.txt
    

    Error:

    ...        (2): Conv1d(128, 128, kernel_size=(11,), stride=(1,), padding=(5,))
          )
        )
      )
      (conv_post): Conv1d(128, 18, kernel_size=(7,), stride=(1,), padding=(3,))
      (reflection_pad): ReflectionPad1d((1, 0))
    )
    checkpoints directory :  cp_hifigan
    Epoch: 1
    /home/yehor/.local/lib/python3.8/site-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
      return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
    /home/yehor/.local/lib/python3.8/site-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
      return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
    /home/yehor/.local/lib/python3.8/site-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
      return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
    /home/yehor/.local/lib/python3.8/site-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
      return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
    Traceback (most recent call last):
      File "train.py", line 280, in <module>
        main()
      File "train.py", line 276, in main
        train(0, a, h)
      File "train.py", line 126, in train
        y_g_hat = stft.inverse(spec, phase)
      File "/home/yehor/iSTFTNet-pytorch/stft.py", line 198, in inverse
        inverse_transform = torch.istft(
    RuntimeError: istft input and window must be on the same device but got self on cuda:0 and window on cpu
    
    opened by egorsmkv 2
  • Different sample rate

    Hi @rishikksh20, thanks for your work.

    I have a question: if I want to use a 16 kHz sampling rate, how should I modify the configuration file? Changing sampling_rate in the JSON alone should not be enough.

    opened by wizardk 2
  • Added generator's iSTFT size to config file & iSTFT speed-ups

    1. Added 2 extra hyperparameters to config_v1.json:

      • "gen_istft_n_fft": 16
      • "gen_istft_hop_size": 4
    2. Replaced Seetharaman's STFT implementation with the built-in torch implementation for a speed boost (the window_sumsquare function invoked by STFT.inverse runs on the CPU).

    opened by aqtq314 2
  • A multi-gpu training bug

    In stft.py, lines 164-165, window_sum = window_sum.cuda() if magnitude.is_cuda else window_sum followed by inverse_transform[:, :, approx_nonzero_indices] /= window_sum[approx_nonzero_indices] raises an error, because inverse_transform may be on cuda:1 while window_sum is on cuda:0. Changing line 164 to window_sum = window_sum.to(inverse_transform.device()) if magnitude.is_cuda else window_sum fixes the problem.

    opened by mayfool 1
  • Single frequency line problem

    Thanks for the implementation of iSTFT. It has better inference speed than HiFi-GAN v1. However, I found a single-frequency line in the output that causes a little noise. I trained on a 16 kHz dataset, and the line always sits exactly at 4 kHz, the middle of the 0-8 kHz frequency range. I'm trying to fix this problem; do you have the same problem?

    opened by mayfool 7
  • Fix TypeError: 'torch.device' object is not callable

    Following issue https://github.com/rishikksh20/iSTFTNet-pytorch/issues/1, line 164 in stft.py was changed to https://github.com/rishikksh20/iSTFTNet-pytorch/blob/e928a6b604033a3857757562af36241f9225adfc/stft.py#L164

    However, inverse_transform.device() raises the exception in the title, because device is an attribute of a tensor, not a method. Changing it to inverse_transform.device fixes the problem.

    opened by leminhnguyen 0
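On the "window_sum in stft is just a constant?" issue: a constant window_sum away from the edges is what overlap-add normalization should produce. For a periodic Hann window with hop = n_fft / 4, the squared windows of the overlapping frames sum to a constant, so dividing by window_sum is a uniform rescale in the interior and only changes the signal near the padded ends. The sketch below is a simplified NumPy re-implementation of a sum-of-squared-windows envelope, not the repo's window_sumsquare; it assumes a Hann window with n_fft = 16 and hop = 4 (the gen_istft_n_fft / gen_istft_hop_size values from the config PR), and the function names are hypothetical.

```python
import numpy as np


def periodic_hann(n):
    # Periodic Hann window (the variant usually used for STFT analysis).
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)


def sum_of_squared_windows(window, n_frames, hop):
    # Overlap-add window**2 at every frame offset, like an iSTFT normalizer.
    n_fft = len(window)
    envelope = np.zeros(n_fft + hop * (n_frames - 1))
    for i in range(n_frames):
        envelope[i * hop:i * hop + n_fft] += window ** 2
    return envelope


n_fft, hop, n_frames = 16, 4, 20
envelope = sum_of_squared_windows(periodic_hann(n_fft), n_frames, hop)
interior = envelope[n_fft:-n_fft]      # samples covered by all 4 overlapping frames
print(interior.min(), interior.max())  # both approximately 1.5
print(envelope[:4])                    # much smaller values at the start (padding region)
```

With a symmetric window, or a hop that does not split the window evenly, the interior envelope is generally no longer exactly flat, and the division by window_sum then does more than a constant rescale.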
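The "istft input and window must be on the same device" RuntimeError comes from passing a window tensor that lives on the CPU while the spectrogram is on cuda:0. One way around it, in the spirit of the PR that switches to PyTorch's built-in iSTFT, is to create the window on the input's device at call time. This sketch assumes the generator outputs a magnitude tensor and a phase tensor, each of shape (batch, n_fft // 2 + 1, frames); istft_from_mag_phase is a hypothetical name, and its defaults are the n_fft = 16 / hop = 4 values from config_v1.json.

```python
import torch


def istft_from_mag_phase(mag, phase, n_fft=16, hop_length=4, win_length=16):
    """Hypothetical helper: waveform from magnitude and phase via torch.istft.

    mag, phase: real tensors of shape (batch, n_fft // 2 + 1, frames).
    """
    # Create the window on the same device and dtype as the input, so the
    # call works unchanged on cpu, cuda:0 or cuda:1.
    window = torch.hann_window(win_length, device=mag.device, dtype=mag.dtype)
    spec = torch.polar(mag, phase)  # complex tensor: mag * exp(i * phase)
    return torch.istft(spec, n_fft=n_fft, hop_length=hop_length,
                       win_length=win_length, window=window)


# Round trip on CPU: analyze a random signal, then resynthesize it.
x = torch.randn(2, 400)
analysis_window = torch.hann_window(16)
S = torch.stft(x, n_fft=16, hop_length=4, win_length=16,
               window=analysis_window, return_complex=True)
y = istft_from_mag_phase(S.abs(), S.angle())
print(y.shape, torch.allclose(x, y, atol=1e-4))
```

Building the window inside the function costs almost nothing for n_fft = 16 and avoids keeping a module-level tensor in sync with the model's device; registering the window as a module buffer would achieve the same result, since buffers move with the model.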
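For code that keeps the custom STFT class, the multi-GPU fix and the .device correction from the issues above combine into a single rule: move window_sum onto whatever device inverse_transform is on before dividing. A minimal sketch, assuming inverse_transform has shape (batch, channels, samples) and window_sum is a 1-D tensor of length samples; normalize_by_window_sum and its tiny threshold are hypothetical stand-ins for the repo's code, not its actual implementation.

```python
import torch


def normalize_by_window_sum(inverse_transform, window_sum, tiny=1e-8):
    """Hypothetical helper: divide overlap-added output by the window envelope.

    inverse_transform: (batch, channels, samples); window_sum: (samples,).
    """
    # `.device` is an attribute (no parentheses); moving to it also covers the
    # multi-GPU case where inverse_transform is on cuda:1 rather than cuda:0.
    window_sum = window_sum.to(inverse_transform.device)
    nonzero = window_sum > tiny  # skip near-zero entries at the edges
    out = inverse_transform.clone()
    out[:, :, nonzero] /= window_sum[nonzero]
    return out


frames = torch.full((1, 1, 5), 3.0)
envelope = torch.tensor([0.0, 1.5, 1.5, 1.5, 0.0])
print(normalize_by_window_sum(frames, envelope))
```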
Owner
Rishikesh (ऋषिकेश)
Deep Learning/ AI Researcher | Open Source enthusiast | Text to Speech | Speech Synthesis | Generative Models | Object detection | Computer Vision
A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.

Multilingual Latent Dirichlet Allocation (LDA) Pipeline This project is for text clustering using the Latent Dirichlet Allocation (LDA) algorithm. It

Artifici Online Services inc. 74 Oct 7, 2022
This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text.

Text Summarizer This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text. Team Members This mini-project was

null 1 Nov 16, 2021
A program for reverse encryption and decryption of a message in Python 3.

Chiffrement Inverse En Python3: a program for reverse encryption and decryption of a message in Python 3. Explanation of reverse encryption with c

Malik Makkes 2 Mar 26, 2022
Pytorch-NLU, a Chinese text classification and sequence annotation toolkit, supports multi-class and multi-label classification tasks on long and short Chinese text, and sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.

Pytorch-NLU: a Chinese text classification and sequence annotation toolkit that supports multi-class and multi-label classification of long and short Chinese text, as well as sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.

null 186 Dec 24, 2022
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

LightSpeech: Unofficial PyTorch implementation of LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search.

Rishikesh (ऋषिकेश) 54 Dec 3, 2022
A fast and lightweight python-based CTC beam search decoder for speech recognition.

pyctcdecode A fast and feature-rich CTC beam search decoder for speech recognition written in Python, providing n-gram (kenlm) language model support

Kensho 315 Dec 21, 2022
lightweight, fast and robust columnar dataframe for data analytics with online update

streamdf Streamdf is a lightweight data frame library built on top of the dictionary of numpy array, developed for Kaggle's time-series code competiti

null 23 May 19, 2022
A python framework to transform natural language questions to queries in a database query language.

quepy

Machinalis 1.2k Dec 18, 2022
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

Word2Wave is a simple method for text-controlled GAN audio generation. You can either follow the setup instructions below and use the source code and CLI provided in this repo or you can have a play around in the Colab notebook provided. Note that, in both cases, you will need to train a WaveGAN model first

Ilaria Manco 91 Dec 23, 2022
Biterm Topic Model (BTM): modeling topics in short texts

Biterm Topic Model Bitermplus implements Biterm topic model for short texts introduced by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. Actua

Maksim Terpilowski 49 Dec 30, 2022
Various Algorithms for Short Text Mining

Short Text Mining in Python Introduction This package shorttext is a Python package that facilitates supervised and unsupervised learning for short te

Kwan-Yuet 466 Dec 6, 2022
An attempt to preserve the Marathi language. A lightweight and ad-free English to Marathi thesaurus.

मराठी शब्द: I started this open-source project to preserve the Marathi language. In my opinion, our language is slowly and, without anyone's notice

मुक्त स्त्रोत 20 Oct 11, 2022
PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop

PocketSphinx 5prealpha This is PocketSphinx, one of Carnegie Mellon University's open source large vocabulary, speaker-independent continuous speech r

null 3.2k Dec 28, 2022
Easy, fast, effective, and automatic g-code compression!

Getting to the meat of g-code. Easy, fast, effective, and automatic g-code compression! MeatPack nearly doubles the effective data rate of a standard

Scott Mudge 97 Nov 21, 2022
Library for fast text representation and classification.

fastText fastText is a library for efficient learning of word representations and sentence classification. Table of contents Resources Models Suppleme

Facebook Research 24.1k Jan 5, 2023
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok

Hugging Face 6.2k Dec 31, 2022