project-NN-Pytorch-scripts

By Xin Wang, National Institute of Informatics, since 2021

I am a new Pytorch user. If you have any suggestions or questions, please email wangxin at nii dot ac dot jp

1. Note

For tutorials on neural vocoders

Tutorials are available in ./tutorials. Please follow ./tutorials/README and work in that folder first

cd ./tutorials
head -n 2 README.md
# Hands-on materials for neural vocoders

For other projects

Just follow the rest of the README.

The repository is relatively large. You may use the --depth 1 option to make a shallow clone and skip the full revision history.

git clone --depth 1 https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts.git

Updates

2022-01-08: upload hn-sinc-nsf + hifi-gan

2022-01-08: upload RawNet2 for anti-spoofing

2. Overview

This repository hosts Pytorch code for the following projects:

2.1 Neural source-filter waveform model

./project/01-nsf

  1. Cyclic-noise neural source-filter waveform model (NSF)

  2. Harmonic-plus-noise NSF with trainable sinc filter (Hn-sinc-NSF)

  3. Harmonic-plus-noise NSF with fixed FIR filter (Hn-NSF)

  4. Hn-sinc-NSF + HiFiGAN discriminator

All the projects include a pre-trained model on the CMU-arctic database (4 speakers) and a demo script to run training and inference. Please check ./project/01-nsf/README.

Generated samples from pre-trained models are in ./project/01-nsf/*/__pre_trained/output. If they are missing, please run the demo script to produce waveforms using the pre-trained models.

A tutorial on NSF models is also available in ./tutorials

Note that these are re-implementations of the projects originally built on CURRENNT. All the papers published so far used the CURRENNT implementation.

Many samples can be found on the NSF homepage.

2.2 Other neural waveform models

./project/05-nn-vocoders

  1. WaveNet vocoder

  2. WaveGlow

  3. Blow

  4. iLPCNet

All the projects include a pre-trained model and a one-click demo script. Please check ./project/05-nn-vocoders/README.

Generated samples from pre-trained models are in ./project/05-nn-vocoders/*/__pre_trained/output.

Tutorial is also available in ./tutorials

2.3 ASVspoof project with toy example

./project/04-asvspoof2021-toy

Downloading the ASVspoof 2019 database takes time. Therefore, this project demonstrates how to train and evaluate an anti-spoofing model using a toy dataset.

Please try this project before checking other ASVspoof projects below.

A similar project was adopted for the ASVspoof 2021 LFCC-LCNN baseline, although the LFCC front end is slightly different.

Please check ./project/04-asvspoof2021-toy/README.

2.4 Speech anti-spoofing for ASVspoof 2019 LA

./project/03-asvspoof-mega

This is the code for the anti-spoofing project A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection (paper on arXiv).

There were 36 systems investigated, each of which was trained and evaluated for 6 rounds with different random seeds.

(Figure: EER and min t-DCF results.)

This project was later extended into a book chapter, A Practical Guide to Logical Access Voice Presentation Attack Detection, which adds a single RawNet2-based system and score fusion.

(Figure: EER and min t-DCF results, including the added systems.)

Pre-trained models, scores, and training recipes are all available. Please check ./project/03-asvspoof-mega/README.

2.5 (Preliminary) speech anti-spoofing

./project/02-asvspoof

  1. Baseline LFCC + LCNN-binary-classifier (lfcc-lcnn-sigmoid)

  2. LFCC + LCNN + angular softmax (lfcc-lcnn-a-softmax)

  3. LFCC + LCNN + one-class softmax (lfcc-lcnn-ocsoftmax)

  4. LFCC + ResNet18 + one-class softmax (lfcc-restnet-ocsoftmax)

This is a pilot test on the ASVspoof 2019 LA task. I trained each system 6 times on various GPU devices (a single V100 or P100 card), each time with a different initial random seed. (Figure: DET curves for these systems.)

The results vary a lot when simply changing the initial random seed, even though the Pytorch environment was kept the same and deterministic algorithms were selected. This preliminary test motivated the study in ./project/03-asvspoof-mega.
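For reference, "deterministic algorithms selected" refers to the usual Pytorch seeding recipe sketched below. This is a generic sketch, not necessarily identical to what core_scripts/startup_config.py does:

import random
import numpy as np
import torch

def set_random_seed(seed):
    # fix seeds for python, numpy, and pytorch (CPU and GPU)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # prefer deterministic cuDNN kernels over auto-tuned ones
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_random_seed(1000)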

For LCNN, please check this paper; for LFCC, please check this paper; for one-class softmax in ASVspoof, please check this paper.

3. Python environment

You may use ./env.yml to create the environment:

# create environment
conda env create -f env.yml

# load environment (whose name is pytorch-1.6)
conda activate pytorch-1.6

4. How to use

Take project/01-nsf/cyc-noise-nsf-4 as an example:

# cd into one project
cd project/01-nsf/cyc-noise-nsf-4

# add PYTHONPATH and activate conda environment
source ../../../env.sh 

# run the script
bash 00_demo.sh

The printed info will show what is happening. The script may need 1 day or more to finish.

You may also run the job in the background rather than waiting for it at the terminal:

# run the script in background
bash 00_demo.sh > log_batch 2>&1 &

The above steps will download the CMU-arctic data, run waveform generation using a pre-trained model, and train a new model.

5. Project design and convention

Data format

  • Waveform: 16/32-bit PCM or 32-bit float WAV that can be read by scipy.io.wavfile.read (see the short example after the snippet below)

  • Other data: binary, 32-bit float, little-endian (numpy dtype <f4). The data can be read in Python by:

# for data of shape [N, M]
import numpy as np
f = open(filepath, 'rb')
datatype = np.dtype(('<f4', (M,)))
data = np.fromfile(f, dtype=datatype)
f.close()
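For the waveform convention in the first bullet above, here is a minimal reading example ('input.wav' is a placeholder file name):

# read a WAV file; returns the sampling rate and a PCM (int16/int32) or float32 array
import scipy.io.wavfile

sr, wav = scipy.io.wavfile.read('input.wav')
print(sr, wav.dtype, wav.shape)  # e.g. 16000 int16 (N,)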

I assume data is stored in C-contiguous (row-major) format. There are helper functions in ./core_scripts/data_io/io_tools.py to read and write binary data:

# create a float32 data array
import numpy as np
data = np.asarray(np.random.randn(5, 3), dtype=np.float32)

# write to './temp.bin' and read it as data2
import core_scripts.data_io.io_tools as readwrite
readwrite.f_write_raw_mat(data, './temp.bin')
data2 = readwrite.f_read_raw_mat('./temp.bin', 3)

# the difference should be all zeros
data - data2

More instructions can be found in the Jupyter notebook ./tutorials/c01_data_format.ipynb.

Files in this repository

Name                      Function
./core_scripts            scripts to manage the training process, data I/O, and so on
./core_modules            finished Pytorch modules
./sandbox                 new functions and modules to be tested
./project                 project directories; each folder corresponds to one model on one dataset
./project/*/*/main.py     script to load data and run training and inference
./project/*/*/model.py    model definition based on Pytorch APIs
./project/*/*/config.py   configuration of the training/validation/test set data

The motivation is to separate the training and inference process, the model definition, and the data configuration. For example:

  • To define a new model, change model.py

  • To run on a new database, change config.py (a hypothetical sketch follows)
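As an illustration of this separation, a hypothetical config.py fragment might look like the sketch below. Field names such as test_list and input_reso appear elsewhere on this page, but the exact fields and values here are assumptions, not a copy of the repository's config.py:

# hypothetical data configuration (illustrative values only)
test_list = ['slt_arctic_b0474', 'slt_arctic_b0476']  # utterance names

input_dirs = ['./data/mel', './data/f0']  # one directory per input feature
input_exts = ['.mel', '.f0']              # file extension of each feature
input_dims = [80, 1]                      # dimension of each feature
input_reso = [80, 80]                     # temporal resolution (waveform points per frame)

output_dirs = ['./data/wav']              # target waveform directory
output_exts = ['.wav']
output_dims = [1]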

How the script works

The script starts with main.py and calls different functions for model training and inference.

During training:

     <main.py>        Entry point and controller of training process
        |           
   Argument parse     core_scripts/config_parse/arg_parse.py
   Initialization     core_scripts/startup_config.py
   Choose device     
        | 
Initialize & load     core_scripts/data_io/customize_dataset.py
training data set
        |----------|
        .     Load data set   <config.py> 
        .     configuration 
        .          |
        .     Loop over       core_scripts/data_io/customize_dataset.py
        .     data subset
        .          |       
        .          |---------|
        .          .    Load one subset   core_scripts/data_io/default_data_io.py
        .          .         |
        .          |---------|
        .          |
        .     Combine subsets 
        .     into one set
        .          |
        |----------|
        |
Initialize & load 
development data set  
        |
Initialize Model     <model.py>
Model(), Loss()
        | 
Initialize Optimizer core_scripts/op_manager/op_manager.py
        |
Load checkpoint      --trained-model option to main.py
        |
Start training       core_scripts/nn_manager/nn_manager.py f_train_wrapper()
        |             
        |----------|
        .          |
        .     Loop over training data
        .     for one epoch
        .          |
        .          |-------|    core_scripts/nn_manager/nn_manager.py f_run_one_epoch()
        .          |       |    
        .          |  Loop over 
        .          |  training data
        .          |       |
        .          |       |-------|
        .          |       .    get data_in, data_tar, data_info
        .          |       .    Call data_gen <- Model.forward(...)   <model.py>
        .          |       .    Call Loss.compute()                   <model.py>
        .          |       .    loss.backward()
        .          |       .    optimizer.step()
        .          |       .       |
        .          |       |-------|
        .          |       |
        .          |  Save checkpoint 
        .          |       |
        .          |  Early stop?
        .          |       | No  \
        .          |       |      \ Yes
        .          |<------|       |
        .                          |
        |--------------------------|
       Done

A detailed flowchart is in ./misc/APPENDIX_1.md. This may be useful if you want to hack on the code.
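To make the inner loop concrete, here is a simplified, runnable sketch of what f_run_one_epoch() does. The Model/Loss interface follows the flowchart above; the layer sizes and dummy data are illustrative only:

import torch

class Model(torch.nn.Module):
    # stand-in for the Model defined in <model.py>
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 1)

    def forward(self, data_in):
        return self.layer(data_in)

class Loss:
    # stand-in for the Loss defined in <model.py>
    def compute(self, data_gen, data_tar):
        return torch.nn.functional.mse_loss(data_gen, data_tar)

model, loss_wrapper = Model(), Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

# loop over training data for one epoch (cf. f_run_one_epoch())
for _ in range(5):
    data_in, data_tar = torch.randn(8, 10), torch.randn(8, 1)
    data_gen = model(data_in)                        # Model.forward(...)
    loss = loss_wrapper.compute(data_gen, data_tar)  # Loss.compute()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# checkpoint saving and early stopping are omitted here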

6. On NSF projects (./project/01-nsf)

Differences from CURRENNT implementation

There may be more, but here are the important ones:

  • "Batch-normalization": in CURRENNT, "batch-normalization" is conducted along the length sequence, i.e., assuming each frame as one sample;

  • No bias in CNN and FF: due to the 1st point, NSF in this repository uses bias=False for the CNN and feedforward layers in the neural filter blocks, which helps keep the hidden signals centered around 0;

  • Smaller learning rate: due to the 1st point, the learning rate in this repository is decreased from 0.0003 to a smaller value. Accordingly, more training epochs are required;

  • STFT framing/padding: in CURRENNT, the first frame starts at the 1st step of a signal; in this Pytorch repository (as in librosa), the first frame is centered around the 1st step of the signal, and the frame is zero-padded (see the sketch after this list);

  • STFT backward: in CURRENNT, the STFT backward pass follows the steps in this paper; in this Pytorch repository, the backward pass over the STFT is handled by the Pytorch library.

  • ...
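The framing convention above can be reproduced with torch.stft: with center=True and zero padding (pad_mode="constant"), the first frame is centered on the first sample. The parameter values below are illustrative, not the repository's configuration:

import torch

x = torch.randn(16000)  # 1 second of 16 kHz audio (dummy signal)
spec = torch.stft(x, n_fft=512, hop_length=80, win_length=320,
                  window=torch.hann_window(320),
                  center=True, pad_mode="constant",  # zero-padded, centered frames
                  return_complex=True)
print(spec.shape)  # (frequency bins, frames)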

The learning curves look similar to those of the CURRENNT version. (Figure: learning curves.)

24 kHz

Most of my experiments were done on 16 kHz waveforms. For 24 kHz waveforms, the FIR or sinc digital filters in the model may need to be changed for better performance:

  1. hn-nsf: lp_v, lp_u, hp_v, and hp_u are calculated for the 16 kHz configuration. For a different sampling rate, you may use the online tool http://t-filter.engineerjs.com to get the filter coefficients (a scipy-based sketch also follows this list). In this case, the stop-band for lp_v and lp_u is extended to 12k, while the pass-band for hp_v and hp_u is extended to 12k. The reason is that, no matter what the sampling rate is, the actual formants (in Hz) and spectral content of sounds don't change with the sampling rate;

  2. hn-sinc-nsf and cyc-noise-nsf: for a similar reason, the cut-off-frequency value (in (0, 1)) should be adjusted. I will try (hidden_feat * 0.2 + uv * 0.4 + 0.3) * 16 / 24 in model.CondModuleHnSincNSF.get_cut_f();
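If you prefer scripting over the online tool mentioned in item 1, scipy can design comparable low-/high-pass FIR coefficients. The tap count and split frequency below are illustrative assumptions, not the repository's actual filter configuration:

import scipy.signal

fs = 24000        # target sampling rate in Hz
numtaps = 11      # odd tap count (firwin requires an odd count for a high-pass)
split_hz = 5000   # hypothetical boundary between harmonic and noise bands

lp_coef = scipy.signal.firwin(numtaps, split_hz, fs=fs)                   # low-pass
hp_coef = scipy.signal.firwin(numtaps, split_hz, pass_zero=False, fs=fs)  # high-pass
print(lp_coef.round(3), hp_coef.round(3))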

The end

Comments
  • About fairseq wav2vec2 extractor

    Hello, I have a problem with the wav2vec2 feature extractor. I use your class SSLModel(torch_nn.Module) to extract wav2vec2 features and get this error:

    Traceback (most recent call last):
      File "/home/xieyuankun/asvspoof2019_wav2vec2_fairseq/preprocess.py", line 274, in <module>
        w2vmodel = SSLModel('/home/xieyuankun/data/wav2vec_big_960h.pt', 1024)
      File "/home/xieyuankun/asvspoof2019_wav2vec2_fairseq/preprocess.py", line 228, in __init__
        md, _, _ = fq.checkpoint_utils.load_model_ensemble_and_task([mpath])
      File "/home/xieyuankun/miniconda3/lib/python3.9/site-packages/fairseq/checkpoint_utils.py", line 473, in load_model_ensemble_and_task
        model = task.build_model(cfg.model, from_checkpoint=True)
      File "/home/xieyuankun/miniconda3/lib/python3.9/site-packages/fairseq/tasks/audio_pretraining.py", line 197, in build_model
        model = super().build_model(model_cfg, from_checkpoint)
      File "/home/xieyuankun/miniconda3/lib/python3.9/site-packages/fairseq/tasks/fairseq_task.py", line 338, in build_model
        model = models.build_model(cfg, self, from_checkpoint)
      File "/home/xieyuankun/miniconda3/lib/python3.9/site-packages/fairseq/models/__init__.py", line 106, in build_model
        return model.build_model(cfg, task)
      File "/home/xieyuankun/miniconda3/lib/python3.9/site-packages/fairseq/models/wav2vec/wav2vec2_asr.py", line 208, in build_model
        w2v_encoder = Wav2VecEncoder(cfg, len(task.target_dictionary))
    TypeError: object of type 'NoneType' has no len()

    It seems to be a problem with fairseq's fq.checkpoint_utils.load_model_ensemble_and_task([mpath]), which did not recognize the task: Wav2VecEncoder(cfg, len(task.target_dictionary)).

    opened by xieyuankun 6
  • p2sgrad, not converge

    I tried to implement P2SGrad using the face recognition dataset MS1M v1c. With learning rate 0.1 and no pretraining, it does not converge; with learning rate 1e-3 and pretraining, it does not converge either.

    opened by LuckyZxy182 5
  • Why normalize target data ?

    In the model.py file "/project/03-asvspoof-mega/lfcc-lcnn-lstmsum-p2s/01/model.py", I don't understand why def normalize_target(self, y): is used at line #272; I thought the target data was just the label 0 or 1. I also don't understand the difference between "# case 2, loss is defined independent of pt_model" and "# case 1, pt_model.loss is available" at lines #112 and #120 of the nn_manager.py file.

    opened by Amforever 5
  • Can I use mgc extracted from WORLD?

    First of all, thanks for your amazing code. When I read the paper https://arxiv.org/pdf/1810.11946.pdf, I found that it used 60-dim MGC extracted by WORLD.

    Acoustic features, including 60 dimensions of Mel-generalized cepstral coefficients (MGCs) [18] and 1 dimension of F0, were extracted from the 48 kHz waveforms at a frame shift of 5 ms using WORLD [19].

    And in the config file project/01-nsf/hn-nsf/config.py, it becomes an 80-dim mel spectrogram. Can I use the feature described in the paper? Thanks in advance.

    opened by OnceJune 5
  • How to generate with frame shift 12.5ms

    Hi, I am running an experiment (cyc-noise-nsf-4) with a 12.5 ms frame shift and 50 ms frame length (to match the configuration of Tacotron). I only modified input_reso = [200, 200] in config.py and the corresponding arguments for extracting mel and F0. But the F0 of the synthesized audio sounds discontinuous. Can you help me? (screenshot attached)

    opened by Chunhui-Lu 5
  • The quality of longer wavs generated by hn-sinc-nsf worsens over time.

    Hello. I trained the hn-sinc-nsf model by running project/hn-sinc-nsf-9/00_demo.sh, and each generated WAV listed in test_list in config.py sounds good. But when I synthesized a longer WAV, the quality of the generated sound worsened over time.

    For example, I prepared a longer WAV file with a length of 1:52 by concatenating all WAVs listed in test_list 3 times in random order [1]. I extracted its acoustic features and synthesized a WAV file with the trained hn-sinc-nsf model using these features.

    From 1:41 to 1:47, this big WAV contains the data of slt_arctic_b0474 and slt_arctic_b0476, but their sound quality is inferior to that of the separately generated WAVs [2][3].

    This phenomenon occurs with the default 00_demo.sh settings (acoustic features: mel-spectrum, F0), with other acoustic features (mel-generalized cepstrum, band aperiodicity, F0), and with another dataset (the NIT-SONG070 singing voice dataset provided at the HTS webpage).

    Could anyone please advise me on how to avoid this trouble?

    1. https://drive.google.com/file/d/1konFc3QtgTNUhCUOgGULDRJjfM44Zkvl/view?usp=sharing
    2. https://drive.google.com/file/d/1dYDvsoGKZgmEl7BNlkX1HRuQzEL5Vf0m/view?usp=sharing
    3. https://drive.google.com/file/d/1ctLLzYoOn1t5lGRwtu79RIYAE3s3s3xJ/view?usp=sharing
    opened by taroushirani 5
  • Tacotron 2

    Hi, thank you for your great job. I am wondering: should I retrain Tacotron 2 with the same sample rate if I want to feed Tacotron 2 output to this project?

    opened by kikirizki 4
  • an analysis-synthesis example

    There is so much material in the tutorials, but I can't find something like the analysis and synthesis of an existing WAV file. I mean, given a WAV file, first analyze it, then use the result of the analysis to synthesize a new WAV file.

    opened by zhouyong64 3
  • Fail to run project/hn-sinc-nsf-9/00_demo.sh on Google Colaboratory

    Hello. I tried to run project/hn-sinc-nsf-9/00_demo.sh on Google Colaboratory, but it failed. The contents of log_err are as follows:

    /usr/local/lib/python3.6/dist-packages/torch/functional.py:516: UserWarning: stft will require the return_complex parameter be explicitly  specified in a future PyTorch release. Use return_complex=False  to preserve the current behavior or return_complex=True to return  a complex output. (Triggered internally at  /pytorch/aten/src/ATen/native/SpectralOps.cpp:653.)
      normalized, onesided, return_complex)
    Traceback (most recent call last):
      File "main.py", line 170, in <module>
        main()
      File "main.py", line 114, in main
        trn_set, val_set, checkpoint)
      File "/content/project-NN-Pytorch-scripts/core_scripts/nn_manager/nn_manager.py", line 336, in f_train_wrapper
        epoch_idx, optimizer, normtarget_f)
      File "/content/project-NN-Pytorch-scripts/core_scripts/nn_manager/nn_manager.py", line 153, in f_run_one_epoch
        loss_computed = loss_wrapper.compute(data_gen, normed_target)
      File "/content/project-NN-Pytorch-scripts/project/hn-sinc-nsf-9/model.py", line 957, in compute
        pad_mode="constant")
      File "/usr/local/lib/python3.6/dist-packages/torch/functional.py", line 516, in stft
        normalized, onesided, return_complex)
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
    

    It seems that torch.hann_window in class Loss returns a Tensor on the CPU, which is incompatible with outputs (on CUDA) at line 955 of project/hn-sinc-nsf-9/model.py. I uploaded the Jupyter notebook I used to gist.github.com [1], so please see it for details.

    1. https://gist.github.com/taroushirani/a263d1c33a23191ad0cedacf76f36409
    opened by taroushirani 3
  • How to generate fir filters for 24k, low pass and high pass?

    Hi, I am running an experiment with 24 kHz audio. Can you help me? https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts/blob/8c8318612e467c61c9d7d9315714e522bce3f2fe/project/hn-nsf/model.py#L594-L617

    opened by azraelkuan 2
  • fairseq

    Is fairseq version 0.10.2 used? When I run bash 00_demo.sh model-W2V-XLSR-ft-GF/config_train_asvspoof2019, it raises an error:

    Traceback (most recent call last):
      File "main.py", line 211, in <module>
        main()
      File "main.py", line 191, in main
        model = prj_model.Model(test_set.get_in_dim(),
      File "/home/hcl/ASVspoof_code/project-NN-Pytorch-scripts-master/project/07-asvspoof-ssl/model-W2V-XLSR-ft-GF/config_train_asvspoof2019/01/model.py", line 197, in __init__
        self.m_ssl = SSLModel(ssl_path, ssl_orig_output_dim)
      File "/home/hcl/ASVspoof_code/project-NN-Pytorch-scripts-master/project/07-asvspoof-ssl/model-W2V-XLSR-ft-GF/config_train_asvspoof2019/01/model.py", line 44, in __init__
        model, _, _ = fairseq.checkpoint_utils.load_model_ensemble_and_task([cp_path])
      File "/home/hcl/anaconda3/envs/torch1.6/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 279, in load_model_ensemble_and_task
        state = load_checkpoint_to_cpu(filename, arg_overrides)
      File "/home/hcl/anaconda3/envs/torch1.6/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 232, in load_checkpoint_to_cpu
        state = _upgrade_state_dict(state)
      File "/home/hcl/anaconda3/envs/torch1.6/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 420, in _upgrade_state_dict
        state["args"].task = "translation"
    AttributeError: 'NoneType' object has no attribute 'task'

    opened by beijita-yegucheng 1
Owner
Yamagishi and Echizen Laboratories, National Institute of Informatics, Japan