End-to-End Speech Processing Toolkit

ESPnet

Last update: Jan 4, 2023


Overview

ESPnet: end-to-end speech processing toolkit

system/pytorch ver.	1.3.1	1.4.0	1.5.1	1.6.0	1.7.1	1.8.1	1.9.0
ubuntu20/python3.9/pip
ubuntu20/python3.8/pip
ubuntu18/python3.7/pip
debian9/python3.7/conda
centos7/python3.7/conda
doc/python3.8

ESPnet is an end-to-end speech processing toolkit that focuses mainly on end-to-end speech recognition and end-to-end text-to-speech. ESPnet uses Chainer and PyTorch as its main deep learning engines, and follows Kaldi-style data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments.

Key Features

Kaldi style complete recipe

Supports a number of ASR recipes (WSJ, Switchboard, CHiME-4/5, Librispeech, TED, CSJ, AMI, HKUST, Voxforge, REVERB, etc.)
Supports a number of TTS recipes in a manner similar to the ASR recipes (LJSpeech, LibriTTS, M-AILABS, etc.)
Supports a number of ST recipes (Fisher-CallHome Spanish, Libri-trans, IWSLT'18, How2, Must-C, Mboshi-French, etc.)
Supports a number of MT recipes (IWSLT'16, the above ST recipes, etc.)
Supports a speech separation and recognition recipe (WSJ-2mix)
Supports a voice conversion recipe (VCC2020 baseline) (new!)
Supports a speech language understanding recipe (FSC baseline) (new!)

ASR: Automatic Speech Recognition

State-of-the-art performance in several ASR benchmarks (comparable/superior to hybrid DNN/HMM and CTC)
Hybrid CTC/attention based end-to-end ASR
- Fast/accurate training with CTC/attention multitask training
- CTC/attention joint decoding to boost monotonic alignment decoding
- Encoder: VGG-like CNN + BiRNN (LSTM/GRU), sub-sampling BiRNN (LSTM/GRU) or Transformer
Attention: Dot product, location-aware attention, variants of multihead
Incorporate RNNLM/LSTMLM/TransformerLM/N-gram trained only with text data
Batch GPU decoding
Transducer based end-to-end ASR
- Architecture:
  - RNN-based encoder and decoder.
  - Custom encoder and decoder supporting Transformer, Conformer (encoder), TDNN (encoder) and causal Conv1D (decoder) blocks.
  - VGG2L (RNN/custom encoder) and Conv2D (custom encoder) bottlenecks.
- Search algorithms:
  - Greedy search constrained to one emission by timestep.
  - Default beam search algorithm without prefix search.
  - Alignment-Length Synchronous decoding (Saon et al., 2020).
  - Time Synchronous Decoding (Saon et al., 2020).
  - N-step Constrained beam search modified from Kim et al., 2020.
  - modified Adaptive Expansion Search based on Kim et al. (2021) and NSC.
- Features:
  - Multi-task learning with various auxiliary tasks: CTC, Label smoothing, auxiliary RNN-T and symmetric KL divergence.
  - Transfer learning with acoustic model and/or language model.
  - Training with FastEmit regularization method.
Please refer to the tutorial page for complete documentation.
CTC segmentation
Non-autoregressive model based on Mask-CTC
ASR examples for supporting endangered language documentation (Please refer to egs/puebla_nahuatl and egs/yoloxochitl_mixtec for details)
Wav2Vec2.0 pretrained model as Encoder, imported from FairSeq.
Self-supervised learning representations as features, using upstream models in S3PRL in frontend.
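
The hybrid CTC/attention items above rest on two weighted sums: one over losses during multitask training and one over log-probabilities during joint decoding, as described in references [2] and [3]. The following is a minimal sketch of that arithmetic only, not ESPnet's implementation. The function names and the example scores are hypothetical; `mtlalpha` is named after the ESPnet1 training option of the same name.

```python
def multitask_loss(loss_ctc: float, loss_att: float, mtlalpha: float) -> float:
    """Interpolate the CTC and attention losses.

    mtlalpha = 0.0 trains with the attention loss only,
    mtlalpha = 1.0 trains with the CTC loss only.
    """
    if not 0.0 <= mtlalpha <= 1.0:
        raise ValueError("mtlalpha must be in [0, 1]")
    return mtlalpha * loss_ctc + (1.0 - mtlalpha) * loss_att


def joint_score(logp_ctc: float, logp_att: float, ctc_weight: float) -> float:
    """Score one hypothesis as a weighted sum of CTC and attention log-probabilities."""
    return ctc_weight * logp_ctc + (1.0 - ctc_weight) * logp_att


def pick_best(hyps, ctc_weight: float) -> str:
    """Return the text of the best hypothesis.

    hyps is a list of (text, logp_ctc, logp_att) tuples (illustrative values).
    """
    best = max(hyps, key=lambda h: joint_score(h[1], h[2], ctc_weight))
    return best[0]


if __name__ == "__main__":
    print(multitask_loss(loss_ctc=2.0, loss_att=4.0, mtlalpha=0.5))
    hyps = [("a", -1.0, -5.0), ("b", -3.0, -2.0)]
    print(pick_best(hyps, ctc_weight=0.3))
```

Raising `ctc_weight` makes the CTC score dominate, which favors hypotheses that follow a monotonic alignment with the audio.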

Demonstration

Real-time ASR demo with ESPnet2
Gradio Web Demo on Huggingface Spaces. Check out the Web Demo

TTS: Text-to-speech

Architecture
- Tacotron2
- Transformer-TTS
- FastSpeech
- FastSpeech2
- Conformer FastSpeech & FastSpeech2
- VITS
Multi-speaker & multi-language extension
- Pretrained speaker embedding (e.g., X-vector)
- Speaker ID embedding
- Language ID embedding
- Global style token (GST) embedding
- Mix of the above embeddings
End-to-end training
- End-to-end text-to-wav model (e.g., VITS)
- Joint training of text2mel and vocoder
Various language support
- En / Jp / Zh / De / Ru / and more...
Integration with neural vocoders
- Parallel WaveGAN
- MelGAN
- Multi-band MelGAN
- HiFiGAN
- StyleMelGAN
- Mix of the above models

Demonstration

Real-time TTS demo with ESPnet2

To train the neural vocoder, please check the following repositories:

NOTE:

We are moving to ESPnet2-based development for TTS.

If you are a beginner, we recommend using ESPnet2-TTS.

SE: Speech enhancement (and separation)

Single-speaker speech enhancement
Multi-speaker speech separation
Unified encoder-separator-decoder structure for time-domain and frequency-domain models
- Encoder/Decoder: STFT/iSTFT, Convolution/Transposed-Convolution
- Separators: BLSTM, Transformer, Conformer, DPRNN, Neural Beamformers, etc.
Flexible ASR integration: working as an individual task or as the ASR frontend
Easy to import pretrained models from Asteroid
- Both the pre-trained models from Asteroid and the specific configuration are supported.

Demonstration

Interactive SE demo with ESPnet2

ST: Speech Translation & MT: Machine Translation

State-of-the-art performance in several ST benchmarks (comparable/superior to cascaded ASR and MT)
Transformer based end-to-end ST (new!)
Transformer based end-to-end MT (new!)

VC: Voice conversion

Transformer and Tacotron2 based parallel VC using melspectrogram (new!)
End-to-end VC based on cascaded ASR+TTS (Baseline system for Voice Conversion Challenge 2020!)

SLU: Speech Language Understanding

Predicting intent either by directly classifying it as one of the intents or by decoding it character by character
Transformer & RNN based encoder-decoder model
Establishes SOTA results with spectral augmentation (performs better than the reported results of the pretrained model on the Fluent Speech Command Dataset)

DNN Framework

Flexible network architecture thanks to chainer and pytorch
Flexible front-end processing thanks to kaldiio and HDF5 support
Tensorboard based monitoring

ESPnet2

See ESPnet2.

Independent from Kaldi/Chainer, unlike ESPnet1
On the fly feature extraction and text processing when training
Supporting both DistributedDataParallel and DataParallel
Supporting multiple nodes training and integrated with Slurm or MPI
Supporting Sharded Training provided by fairscale
A template recipe which can be applied for all corpora
Possible to train any size of corpus without CPU memory error
ESPnet Model Zoo
Integrated with wandb

Installation

If you intend to do full experiments including DNN training, then see Installation.

If you just need the Python module only:

pip install espnet
# To install latest
# pip install git+https://github.com/espnet/espnet

You also need to install some additional packages.

pip install torch
pip install chainer==6.0.0 cupy==6.0.0    # [Option] If you'll use ESPnet1
pip install torchaudio                    # [Option] If you'll use enhancement task
pip install torch_optimizer               # [Option] If you'll use additional optimizers in ESPnet2

Some tasks require additional packages beyond those listed above. If you encounter an ImportError, install the missing package at that point.
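
To see which of the optional packages listed above are missing before a run fails with an ImportError, a small check along the following lines may help. This is a sketch, not part of ESPnet: the `OPTIONAL_MODULES` table and the `missing_modules` helper are hypothetical names, and the mapping simply restates the comments from the install commands above.

```python
import importlib.util

# Optional modules and the feature each one is needed for,
# restated from the pip commands above (hypothetical table).
OPTIONAL_MODULES = {
    "chainer": "ESPnet1",
    "cupy": "ESPnet1",
    "torchaudio": "enhancement task",
    "torch_optimizer": "additional optimizers in ESPnet2",
}


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(OPTIONAL_MODULES):
        print(f"{name} is not installed (needed for: {OPTIONAL_MODULES[name]})")
```

The check only looks the module up and does not import it, so it stays fast even for heavy packages.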

(ESPnet2) Once installed, run wandb login and set --use_wandb true to enable tracking runs using W&B.

Usage

See Usage.

Docker Container

Go to docker/ and follow the instructions.

Contribution

Thank you for taking the time to look at ESPnet! Any contributions to ESPnet are welcome, and feel free to post questions or requests in the issues. If this is your first contribution to ESPnet, please follow the contribution guide.

Results and demo

You can find useful tutorials and demos in the Interspeech 2019 Tutorial.

ASR results

We list the character error rate (CER) and word error rate (WER) of major ASR tasks.

Task	CER (%)	WER (%)	Pretrained model
Aishell dev/test	4.6/5.1	N/A	link
ESPnet2 Aishell dev/test	4.4/4.7	N/A	link
Common Voice dev/test	1.7/1.8	2.2/2.3	link
CSJ eval1/eval2/eval3	5.7/3.8/4.2	N/A	link
ESPnet2 CSJ eval1/eval2/eval3	4.5/3.3/3.6	N/A	link
HKUST dev	23.5	N/A	link
ESPnet2 HKUST dev	21.2	N/A	link
Librispeech dev_clean/dev_other/test_clean/test_other	N/A	1.9/4.9/2.1/4.9	link
ESPnet2 Librispeech dev_clean/dev_other/test_clean/test_other	0.6/1.5/0.6/1.4	1.7/3.4/1.8/3.6	link
Switchboard (eval2000) callhm/swbd	N/A	14.0/6.8	link
TEDLIUM2 dev/test	N/A	8.6/7.2	link
TEDLIUM3 dev/test	N/A	9.6/7.6	link
WSJ dev93/eval92	3.2/2.1	7.0/4.7	N/A
ESPnet2 WSJ dev93/eval92	1.1/0.8	2.8/1.8	link

Note that the performance of the CSJ, HKUST, and Librispeech tasks was significantly improved by using a wide network (#units = 1024) and, where necessary, large subword units, as reported by RWTH.

If you want to check the results of the other recipes, please check egs/<name_of_recipe>/asr1/RESULTS.md.
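
For readers unfamiliar with the two metrics in the table above, both are edit-distance error rates: the number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length. The sketch below illustrates the definition only. It is not the scoring tool used to produce the table, the function names are hypothetical, and dropping spaces before computing CER is an assumption (scoring tools differ on this).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Error rate in percent relative to the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(ref: str, hyp: str) -> float:
    """Word error rate over whitespace-separated words."""
    return error_rate(ref.split(), hyp.split())


def cer(ref: str, hyp: str) -> float:
    """Character error rate; spaces are dropped here (an assumption)."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


if __name__ == "__main__":
    print(wer("the sale of the hotels", "the sale of hotels"))
    print(cer("the sale", "the sail"))
```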

ASR demo

You can recognize speech in a WAV file using pretrained models. Go to a recipe directory and run utils/recog_wav.sh as follows:

# go to recipe directory and source path of espnet tools
cd egs/tedlium2/asr1 && . ./path.sh
# let's recognize speech!
recog_wav.sh --models tedlium2.transformer.v1 example.wav

where example.wav is a WAV file to be recognized. The sampling rate must be consistent with that of data used in training.
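
Since a mismatched sampling rate is an easy mistake to make, it can be worth checking the file before decoding. The following sketch reads the rate from a PCM WAV header with Python's standard `wave` module. The helper names are hypothetical, and the 16000 Hz value in the usage lines is only an example; the rate a given pretrained model expects is not stated here and must be taken from its recipe.

```python
import wave


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_sample_rate(path: str, expected_hz: int) -> int:
    """Raise ValueError if the file's sampling rate differs from expected_hz."""
    actual_hz = wav_sample_rate(path)
    if actual_hz != expected_hz:
        raise ValueError(
            f"{path}: sampling rate is {actual_hz} Hz, expected {expected_hz} Hz"
        )
    return actual_hz


if __name__ == "__main__":
    # 16000 is a placeholder; use the rate of the model's training data.
    check_sample_rate("example.wav", expected_hz=16000)
```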

The pretrained models available in the demo script are listed below.

Model	Notes
tedlium2.rnn.v1	Streaming decoding based on CTC-based VAD
tedlium2.rnn.v2	Streaming decoding based on CTC-based VAD (batch decoding)
tedlium2.transformer.v1	Joint-CTC attention Transformer trained on Tedlium 2
tedlium3.transformer.v1	Joint-CTC attention Transformer trained on Tedlium 3
librispeech.transformer.v1	Joint-CTC attention Transformer trained on Librispeech
commonvoice.transformer.v1	Joint-CTC attention Transformer trained on CommonVoice
csj.transformer.v1	Joint-CTC attention Transformer trained on CSJ
csj.rnn.v1	Joint-CTC attention VGGBLSTM trained on CSJ

SE results

We list results from three different models on WSJ0-2mix, which is one of the most widely used benchmark datasets for speech separation.

Model	STOI	SAR	SDR	SIR
TF Masking	0.89	11.40	10.24	18.04
Conv-Tasnet	0.95	16.62	15.94	25.90
DPRNN-Tasnet	0.96	18.82	18.29	28.92
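
As a rough guide to reading the SDR column, a signal-to-distortion ratio compares the energy of the reference signal with the energy of the estimation error, in dB. The sketch below uses a simplified energy-ratio form for illustration. It is an assumption that this simple form conveys the idea; the source does not say which evaluation tooling produced the table, and standard toolkits decompose the error further (which is where SIR and SAR come from). The function name is hypothetical.

```python
import math


def sdr_db(reference, estimate) -> float:
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_energy == 0:
        raise ValueError("reference signal has zero energy")
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


if __name__ == "__main__":
    reference = [1.0, 0.0, -1.0, 0.0]
    estimate = [0.9, 0.0, -0.9, 0.0]
    print(f"{sdr_db(reference, estimate):.2f} dB")
```

Higher is better: every 10 dB corresponds to a tenfold reduction in error energy relative to the signal.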

SE demos

You can try the interactive demo with Google Colab. Please click the following button to get access to the demos.

It is based on ESPnet2. Pretrained models are available for both speech enhancement and speech separation tasks.

ST results

We list 4-gram BLEU of major ST tasks.

end-to-end system

Task	BLEU	Pretrained model
Fisher-CallHome Spanish fisher_test (Es->En)	51.03	link
Fisher-CallHome Spanish callhome_evltest (Es->En)	20.44	link
Libri-trans test (En->Fr)	16.70	link
How2 dev5 (En->Pt)	45.68	link
Must-C tst-COMMON (En->De)	22.91	link
Mboshi-French dev (Fr->Mboshi)	6.18	N/A

cascaded system

Task	BLEU	Pretrained model
Fisher-CallHome Spanish fisher_test (Es->En)	42.16	N/A
Fisher-CallHome Spanish callhome_evltest (Es->En)	19.82	N/A
Libri-trans test (En->Fr)	16.96	N/A
How2 dev5 (En->Pt)	44.90	N/A
Must-C tst-COMMON (En->De)	23.65	N/A

If you want to check the results of the other recipes, please check egs/<name_of_recipe>/st1/RESULTS.md.
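
The "4-gram BLEU" reported above combines clipped n-gram precisions for n = 1 to 4 with a brevity penalty. The sketch below shows that computation at corpus level for a single reference per sentence. It is illustrative only: whitespace tokenization, no smoothing, and the function names are assumptions, so its numbers will not match the table, which was produced with the recipes' own scoring setup.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        ref_len += len(ref_tokens)
        hyp_len += len(hyp_tokens)
        for n in range(1, max_n + 1):
            ref_counts = ngram_counts(ref_tokens, n)
            hyp_counts = ngram_counts(hyp_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the sale of the hotels is part of the strategy"]
    hyps = ["the sale of the hotels is part of a strategy"]
    print(f"{corpus_bleu(refs, hyps):.2f}")
```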

ST demo

(New!) We made a new real-time E2E-ST + TTS demonstration in Google Colab. Please access the notebook from the following button and enjoy the real-time speech-to-speech translation!

You can translate speech in a WAV file using pretrained models. Go to a recipe directory and run utils/translate_wav.sh as follows:

# go to recipe directory and source path of espnet tools
cd egs/fisher_callhome_spanish/st1 && . ./path.sh
# download example wav file
wget -O - https://github.com/espnet/espnet/files/4100928/test.wav.tar.gz | tar zxvf -
# let's translate speech!
translate_wav.sh --models fisher_callhome_spanish.transformer.v1.es-en test.wav

where test.wav is a WAV file to be translated. The sampling rate must be consistent with that of data used in training.

The pretrained models available in the demo script are listed below.

Model	Notes
fisher_callhome_spanish.transformer.v1	Transformer-ST trained on Fisher-CallHome Spanish Es->En

MT results

Task	BLEU	Pretrained model
Fisher-CallHome Spanish fisher_test (Es->En)	61.45	link
Fisher-CallHome Spanish callhome_evltest (Es->En)	29.86	link
Libri-trans test (En->Fr)	18.09	link
How2 dev5 (En->Pt)	58.61	link
Must-C tst-COMMON (En->De)	27.63	link
IWSLT'14 test2014 (En->De)	24.70	link
IWSLT'14 test2014 (De->En)	29.22	link
IWSLT'16 test2014 (En->De)	24.05	link
IWSLT'16 test2014 (De->En)	29.13	link

TTS results

ESPnet2

You can listen to the generated samples in the following URL.

ESPnet2 TTS generated samples

Note that in the generation we use Griffin-Lim (wav/) and Parallel WaveGAN (wav_pwg/).

You can download pretrained models via espnet_model_zoo.

You can download pretrained vocoders via kan-bayashi/ParallelWaveGAN.

ESPnet1

NOTE: We are moving to ESPnet2-based development for TTS. Please check the latest results in the above ESPnet2 results.

You can listen to our samples on the demo page espnet-tts-sample. Here we list some notable ones:

You can download all of the pretrained models and generated samples:

Note that in the generated samples we use the following vocoders: Griffin-Lim (GL), WaveNet vocoder (WaveNet), Parallel WaveGAN (ParallelWaveGAN), and MelGAN (MelGAN). The neural vocoders are based on the following repositories.

kan-bayashi/ParallelWaveGAN: Parallel WaveGAN / MelGAN / Multi-band MelGAN
r9y9/wavenet_vocoder: 16 bit mixture of Logistics WaveNet vocoder
kan-bayashi/PytorchWaveNetVocoder: 8 bit Softmax WaveNet Vocoder with the noise shaping

If you want to build your own neural vocoder, please check the above repositories. kan-bayashi/ParallelWaveGAN provides a manual on how to decode the features of ESPnet-TTS models with neural vocoders.

Here we list all of the pretrained neural vocoders. Please download and enjoy the generation of high quality speech!

Model link	Lang	Fs [Hz]	Mel range [Hz]	FFT / Shift / Win [pt]	Model type
ljspeech.wavenet.softmax.ns.v1	EN	22.05k	None	1024 / 256 / None	Softmax WaveNet
ljspeech.wavenet.mol.v1	EN	22.05k	None	1024 / 256 / None	MoL WaveNet
ljspeech.parallel_wavegan.v1	EN	22.05k	None	1024 / 256 / None	Parallel WaveGAN
ljspeech.wavenet.mol.v2	EN	22.05k	80-7600	1024 / 256 / None	MoL WaveNet
ljspeech.parallel_wavegan.v2	EN	22.05k	80-7600	1024 / 256 / None	Parallel WaveGAN
ljspeech.melgan.v1	EN	22.05k	80-7600	1024 / 256 / None	MelGAN
ljspeech.melgan.v3	EN	22.05k	80-7600	1024 / 256 / None	MelGAN
libritts.wavenet.mol.v1	EN	24k	None	1024 / 256 / None	MoL WaveNet
jsut.wavenet.mol.v1	JP	24k	80-7600	2048 / 300 / 1200	MoL WaveNet
jsut.parallel_wavegan.v1	JP	24k	80-7600	2048 / 300 / 1200	Parallel WaveGAN
csmsc.wavenet.mol.v1	ZH	24k	80-7600	2048 / 300 / 1200	MoL WaveNet
csmsc.parallel_wavegan.v1	ZH	24k	80-7600	2048 / 300 / 1200	Parallel WaveGAN

If you want to use the above pretrained vocoders, make sure your feature settings match theirs exactly.
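
One way to catch a feature mismatch early is to compare the TTS model's feature settings against the vocoder's row in the table above before synthesis. The sketch below does this for two of the listed vocoders. The values are copied from the table; the dictionary keys (`fs`, `fmin`, `fmax`, `n_fft`, `n_shift`, `win_length`) and the helper name are hypothetical and may not match the keys used in an actual config file.

```python
# Settings copied from the table above; key names are illustrative assumptions.
VOCODER_FEATURES = {
    "ljspeech.parallel_wavegan.v2": {
        "fs": 22050, "fmin": 80, "fmax": 7600,
        "n_fft": 1024, "n_shift": 256, "win_length": None,
    },
    "jsut.parallel_wavegan.v1": {
        "fs": 24000, "fmin": 80, "fmax": 7600,
        "n_fft": 2048, "n_shift": 300, "win_length": 1200,
    },
}


def feature_mismatches(vocoder: str, tts_features: dict) -> dict:
    """Return {key: (tts_value, vocoder_value)} for every setting that differs."""
    expected = VOCODER_FEATURES[vocoder]
    return {
        key: (tts_features.get(key), value)
        for key, value in expected.items()
        if tts_features.get(key) != value
    }


if __name__ == "__main__":
    tts_features = {
        "fs": 24000, "fmin": 80, "fmax": 7600,
        "n_fft": 2048, "n_shift": 256, "win_length": 1200,
    }
    print(feature_mismatches("jsut.parallel_wavegan.v1", tts_features))
```

An empty result means every listed setting agrees; any entry in the result points at a setting to fix before running the vocoder.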

TTS demo

ESPnet2

You can try the real-time demo in Google Colab. Please access the notebook from the following button and enjoy the real-time synthesis!

Real-time TTS demo with ESPnet2

English, Japanese, and Mandarin models are available in the demo.

ESPnet1

NOTE: We are moving to ESPnet2-based development for TTS. Please check the latest demo in the above ESPnet2 demo.

You can try the real-time demo in Google Colab. Please access the notebook from the following button and enjoy the real-time synthesis.

Real-time TTS demo with ESPnet1

We also provide a shell script to perform synthesis. Go to a recipe directory and run utils/synth_wav.sh as follows:

# go to recipe directory and source path of espnet tools
cd egs/ljspeech/tts1 && . ./path.sh
# we use upper-case char sequence for the default model.
echo "THIS IS A DEMONSTRATION OF TEXT TO SPEECH." > example.txt
# let's synthesize speech!
synth_wav.sh example.txt

# also you can use multiple sentences
echo "THIS IS A DEMONSTRATION OF TEXT TO SPEECH." > example_multi.txt
echo "TEXT TO SPEECH IS A TECHNIQUE TO CONVERT TEXT INTO SPEECH." >> example_multi.txt
synth_wav.sh example_multi.txt

You can change the pretrained model as follows:

synth_wav.sh --models ljspeech.fastspeech.v1 example.txt

Waveform synthesis is performed with the Griffin-Lim algorithm and neural vocoders (WaveNet and ParallelWaveGAN). You can change the pretrained vocoder model as follows:

synth_wav.sh --vocoder_models ljspeech.wavenet.mol.v1 example.txt

The WaveNet vocoder provides very high quality speech, but generation is slow.

See more details or available models via --help.

synth_wav.sh --help

VC results

Transformer and Tacotron2 based VC

You can listen to some samples on the demo webpage.

Cascade ASR+TTS as one of the baseline systems of VCC2020

The Voice Conversion Challenge 2020 (VCC2020) adopts ESPnet to build an end-to-end based baseline system. In VCC2020, the objective is intra/cross lingual nonparallel VC. You can download converted samples of the cascade ASR+TTS baseline system here.

SLU results

ESPnet2

Transformer based SLU for Fluent Speech Command Dataset

In SLU, the objective is to infer the meaning or intent of a spoken utterance. The Fluent Speech Command Dataset describes an intent as a combination of three slot values: action, object, and location. You can see baseline results on this dataset here.

CTC Segmentation demo

ESPnet1

CTC segmentation determines utterance segments within audio files. Aligned utterance segments constitute the labels of speech datasets.

As a demo, we align the start and end of utterances within the audio file ctc_align_test.wav, using the example script utils/asr_align_wav.sh. To prepare, set up a data directory:

cd egs/tedlium2/align1/
# data directory
align_dir=data/demo
mkdir -p ${align_dir}
# wav file
base=ctc_align_test
wav=../../../test_utils/${base}.wav
# recipe files
echo "batchsize: 0" > ${align_dir}/align.yaml

cat << EOF > ${align_dir}/utt_text
${base} THE SALE OF THE HOTELS
${base} IS PART OF HOLIDAY'S STRATEGY
${base} TO SELL OFF ASSETS
${base} AND CONCENTRATE
${base} ON PROPERTY MANAGEMENT
EOF

Here, utt_text is the file containing the list of utterances. Choose a pre-trained ASR model that includes a CTC layer to find utterance segments:

# pre-trained ASR model
model=wsj.transformer_small.v1
mkdir -p ./conf && cp ../../wsj/asr1/conf/no_preprocess.yaml ./conf

../../../utils/asr_align_wav.sh \
    --models ${model} \
    --align_dir ${align_dir} \
    --align_config ${align_dir}/align.yaml \
    ${wav} ${align_dir}/utt_text

Segments are written to aligned_segments as a list of file/utterance name, utterance start and end times in seconds, and a confidence score. The confidence score is a probability in log space that indicates how well the utterance was aligned. If needed, remove bad utterances:

min_confidence_score=-5
awk -v ms=${min_confidence_score} '{ if ($5 > ms) {print} }' ${align_dir}/aligned_segments

The demo script utils/asr_align_wav.sh uses an already pretrained ASR model (see the list above for more models). For aligning large audio files, it is recommended to use models with RNN-based encoders (such as BLSTMP) rather than Transformer models, which consume a lot of memory on longer audio. The sample rate of the audio must be consistent with that of the data used in training; adjust with sox if needed. A full example recipe is in egs/tedlium2/align1/.

ESPnet2

CTC segmentation determines utterance segments within audio files. Aligned utterance segments constitute the labels of speech datasets.

As a demo, we align the start and end of utterances within the audio file ctc_align_test.wav. This can be done either directly from the Python command line or using the script espnet2/bin/asr_align.py.

From the Python command line interface:

# load a model with character tokens
from espnet_model_zoo.downloader import ModelDownloader
d = ModelDownloader(cachedir="./modelcache")
wsjmodel = d.download_and_unpack("kamo-naoyuki/wsj")
# load the example file included in the ESPnet repository
import soundfile
speech, rate = soundfile.read("./test_utils/ctc_align_test.wav")
# CTC segmentation
from espnet2.bin.asr_align import CTCSegmentation
aligner = CTCSegmentation( **wsjmodel , fs=rate )
text = """
utt1 THE SALE OF THE HOTELS
utt2 IS PART OF HOLIDAY'S STRATEGY
utt3 TO SELL OFF ASSETS
utt4 AND CONCENTRATE ON PROPERTY MANAGEMENT
"""
segments = aligner(speech, text)
print(segments)
# utt1 utt 0.26 1.73 -0.0154 THE SALE OF THE HOTELS
# utt2 utt 1.73 3.19 -0.7674 IS PART OF HOLIDAY'S STRATEGY
# utt3 utt 3.19 4.20 -0.7433 TO SELL OFF ASSETS
# utt4 utt 4.20 6.10 -0.4899 AND CONCENTRATE ON PROPERTY MANAGEMENT

Aligning also works with fragments of the text. For this, set the gratis_blank option that allows skipping unrelated audio sections without penalty. It's also possible to omit the utterance names at the beginning of each line, by setting kaldi_style_text to False.

aligner.set_config( gratis_blank=True, kaldi_style_text=False )
text = ["SALE OF THE HOTELS", "PROPERTY MANAGEMENT"]
segments = aligner(speech, text)
print(segments)
# utt_0000 utt 0.37 1.72 -2.0651 SALE OF THE HOTELS
# utt_0001 utt 4.70 6.10 -5.0566 PROPERTY MANAGEMENT

The script espnet2/bin/asr_align.py uses a similar interface. To align utterances:

# ASR model and config files from pretrained model (e.g. from cachedir):
asr_config=<path-to-model>/config.yaml
asr_model=<path-to-model>/valid.*best.pth
# prepare the text file
wav="test_utils/ctc_align_test.wav"
text="test_utils/ctc_align_text.txt"
cat << EOF > ${text}
utt1 THE SALE OF THE HOTELS
utt2 IS PART OF HOLIDAY'S STRATEGY
utt3 TO SELL OFF ASSETS
utt4 AND CONCENTRATE
utt5 ON PROPERTY MANAGEMENT
EOF
# obtain alignments:
python espnet2/bin/asr_align.py --asr_train_config ${asr_config} --asr_model_file ${asr_model} --audio ${wav} --text ${text}
# utt1 ctc_align_test 0.26 1.73 -0.0154 THE SALE OF THE HOTELS
# utt2 ctc_align_test 1.73 3.19 -0.7674 IS PART OF HOLIDAY'S STRATEGY
# utt3 ctc_align_test 3.19 4.20 -0.7433 TO SELL OFF ASSETS
# utt4 ctc_align_test 4.20 4.97 -0.6017 AND CONCENTRATE
# utt5 ctc_align_test 4.97 6.10 -0.3477 ON PROPERTY MANAGEMENT

The output of the script can be redirected to a segments file by adding the argument --output segments. Each line contains the file/utterance name, utterance start and end times in seconds, and a confidence score; optionally also the utterance text. The confidence score is a probability in log space that indicates how well the utterance was aligned. If needed, remove bad utterances:

min_confidence_score=-7
# here, we assume that the output was written to the file `segments`
awk -v ms=${min_confidence_score} '{ if ($5 > ms) {print} }' segments
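
The same filtering can be done in Python, which is convenient when the segments are already in memory after calling the aligner. This sketch mirrors the awk one-liner above: it keeps lines whose fifth whitespace-separated field (the confidence score) is greater than the threshold. The function name is hypothetical, and the sample lines are taken from the example output shown earlier.

```python
def filter_segments(lines, min_confidence_score: float):
    """Keep segment lines whose 5th field (confidence score) exceeds the threshold."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        if float(fields[4]) > min_confidence_score:
            kept.append(line)
    return kept


if __name__ == "__main__":
    segments = [
        "utt1 ctc_align_test 0.26 1.73 -0.0154 THE SALE OF THE HOTELS",
        "utt2 ctc_align_test 1.73 3.19 -0.7674 IS PART OF HOLIDAY'S STRATEGY",
        "utt4 ctc_align_test 4.20 4.97 -0.6017 AND CONCENTRATE",
    ]
    for line in filter_segments(segments, min_confidence_score=-0.7):
        print(line)
```

A suitable threshold depends on the model and the audio, so it is worth inspecting the score distribution before discarding utterances.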

See the module documentation for more information. For aligning large audio files, it is recommended to use models with RNN-based encoders (such as BLSTMP) rather than Transformer models, which consume a lot of memory on longer audio. The sample rate of the audio must be consistent with that of the data used in training; adjust with sox if needed.

References

[1] Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, and Tsubasa Ochiai, "ESPnet: End-to-End Speech Processing Toolkit," Proc. Interspeech'18, pp. 2207-2211 (2018)

[2] Suyoun Kim, Takaaki Hori, and Shinji Watanabe, "Joint CTC-attention based end-to-end speech recognition using multi-task learning," Proc. ICASSP'17, pp. 4835-4839 (2017)

[3] Shinji Watanabe, Takaaki Hori, Suyoun Kim, John R. Hershey and Tomoki Hayashi, "Hybrid CTC/Attention Architecture for End-to-End Speech Recognition," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 8, pp. 1240-1253, Dec. 2017

Citations

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
@inproceedings{hayashi2020espnet,
  title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
  author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
  booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7654--7658},
  year={2020},
  organization={IEEE}
}
@inproceedings{inaguma-etal-2020-espnet,
    title = "{ESP}net-{ST}: All-in-One Speech Translation Toolkit",
    author = "Inaguma, Hirofumi  and
      Kiyono, Shun  and
      Duh, Kevin  and
      Karita, Shigeki  and
      Yalta, Nelson  and
      Hayashi, Tomoki  and
      Watanabe, Shinji",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-demos.34",
    pages = "302--311",
}
@inproceedings{li2020espnet,
  title={{ESPnet-SE}: End-to-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
  author={Chenda Li and Jing Shi and Wangyou Zhang and Aswin Shanmugam Subramanian and Xuankai Chang and Naoyuki Kamo and Moto Hira and Tomoki Hayashi and Christoph Boeddeker and Zhuo Chen and Shinji Watanabe},
  booktitle={Proceedings of IEEE Spoken Language Technology Workshop (SLT)},
  pages={785--792},
  year={2021},
  organization={IEEE},
}

Comments

[not-for-merge] Transformer
I am currently working on the Transformer for ASR (https://arxiv.org/pdf/1706.03762.pdf). I am implementing it with minimum changes from the original script. If it works, I will adapt it to fit into the E2E module, meanwhile I will keep using a different script (e2e_transformer).

I am currently testing with the voxforge dataset on CPU, because I am having some memory issues. It seems that the multihead attention layers consume a huge amount of memory. When the model was training for MNT (utterances with a length of ~50), it consumed 5 GB of GPU memory. For ASR, the input length is more than 100, so the model requires more than 20 GB of GPU memory for training with reduced parameters.

I am training with: ./run.sh --stage 3 --ngpu 0 --verbose 1 --backend chainer --mtlalpha 0.0 --elayers 3 --batchsize 20 --maxlen_in 500 --epochs 2

Some changes: Implemented subsampling in the encoder (/4) I will test it also with subsampling by the may cause a memory error. Layers for Enc and dec reduced to 3. Once more, to avoid memory issues.

TODO:

Implement recog script.

Preliminary study
opened by Fhrozen 90
[ESPnet2] Transducer

Hi,

This PR adds vanilla RNN-T training + decoding (w/ all search algorithms supported) for ESPnet2. For now, it's quite straightforward and I duplicated and modified classes from ESPnet1: RNNTDecoder, BeamSearchTransducerESPnet2 and ErrorCalculatorTransESPnet2. I also added a run script and config files in the vivos recipe for testing purposes. The former will be removed in the future.

Concerning performances, I observed some degradations. I'll have to investigate, I may have made a mistake:

dataset	cer / wer (espnet1)	cer / wer (espnet2)
dev	17.4 / 39.7	18.9 / 45.1
test	18.4 / 38.9	21.1 / 45.8

@kamo-naoyuki I'm not sure what should be modified and future actions, feel free to assign me next tasks!

P.S: I'll extend espnet2 test later to cover RNN-T when we have a proper v1.
Refactoring New Features ASR ESPnet2

opened by b-flo 56
add wav2vec_encoder

This is the initial PR for importing the Wav2Vec2.0 model in ESPnet. Before the code review, there is an issue that FairSeq is now made as an optional package, however, I cannot pass the test. The error message is related to failure for "import fairseq". Can anyone give some suggestions?
Installation New Features ASR ESPnet2

opened by simpleoier 54
Adding L3DAS22 Task1 model to ESPNet-SE

As in the title, I am adding my re-implementation of the model we used to win the L3DAS22 speech enhancement challenge.

Should I make two different files for MISO+BF+MISO and DenseUNet? Also note that DenseUNet right now cannot easily be modified. Do you have any ideas on how we can make it more adaptable to e.g. different STFT window sizes?

NOTE: WIP so don't run the tests for now as they will fail.
ASR ESPnet2 SE ESPnet1

opened by popcornell 51
[WIP] transducer v4
This PR adds/modifies a bunch of things:

To do / in progress:

Default beam search:

[x] Optimization techniques

[x] prediction net caching

N-Step Constrained beam search (modified version of: https://arxiv.org/pdf/2002.03577.pdf):

[x] RNN-T

[x] RNN-T w/ att.

[x] T-T

[x] VIVOS decode config

[x] Voxforge decode config

[x] VIVOS results

[x] Voxforge results

[x] Optimization techniques

[x] prediction net caching

[ ] Code optimization

Time Synchronous Decoding (https://ieeexplore.ieee.org/document/9053040):

[x] RNN-T

[x] RNN-T w/ att.

[x] T-T

[x] VIVOS decode config

[x] Voxforge decode config

[x] VIVOS results

[x] Voxforge results

[x] Optimization techniques

[x] prediction net caching

[x] prediction net batching

[x] joint net caching

[ ] Code optimization

Alignment-Length Synchronous Decoding (https://ieeexplore.ieee.org/document/9053040):

[x] RNN-T

[x] RNN-T w/ att.

[x] T-T

[x] VIVOS decode config

[x] Voxforge decode config

[x] VIVOS results

[x] Voxforge results

[x] ! Fix bug final hypothesis empty with transformer (Edit: lazy patch for now, I'll investigate later)

[x] Optimization techniques

[x] prediction net caching

[x] prediction net batching

[x] joint net caching

[ ] Code optimization

Transformer-Transducer:

[x] Customizable architecture

[x] TDNN-BN-ReLU blocks for encoder part

[x] Conformer blocks for encoder part

[x] CausalConv1d for decoder part

General:

[x] Beam search interface (first version)

[x] Beam search tests (I'll add more tests in another PR I think)

[x] Benchmark: decoding speed (edit: only rough benchmark for now, complete will be done after optimizations)

[x] Fix mtl_mode (#2162)

[x] Fix attention weights saving condition + PlotAttentionReport selection for unified transducer architecture

[x] Fix CER/WER reporting CPU-GPU bug (#2232)

[ ] Fix CER/WER reporting multi GPUs bug (#2084)

[x] Documentation

[x] T-T w/ customizable architecture

[x] New features in main README

Recipe

[x] Voxforge: remove transducer run script + transfer learning scripts for voxforge (too complicated to maintain and documentation is now available for finetuning)

Note/Important: Modifications related to customizable architecture should be discussed. This version is only intended to start the discussion: it works correctly however it's not user-friendly. This feature would be better suited with espnet2 but I think it should be accessible in espnet1 for now as it is necessary to reproduce some (if not most) transformer-transducer related papers.

To be added later: prefix tree when dealing with BPE.

~P.S: I'm now focused on porting transducer to espnet2 so I'll probably keep remaining work on standby.~
Refactoring Documentation CI New Features ASR README ESPnet1
opened by b-flo 48
espnet2 ASR recipe
We're thinking of converting espnet ASR recipes to new espnet (espnet2) ASR recipes (https://github.com/espnet/espnet/tree/v.0.7.0/egs2). The following is a current assignment. I did not finish the assignment of some recipes, and if you volunteer to do it, please let me know!

@ftshijt, @Emrys365, @sas91, @YosukeHiguchi, @simpleoier, thanks a lot for helping with it! This is a temporary assignment. Please let me know if you have any requests for the assignment. Also, if you have any problems, comments on our new design, etc., you may use this issue.

[x] aishell @Emrys365

[x] ami @ftshijt

[ ] aurora4 @sas91

[x] babel @ftshijt

[x] chime4 @sas91

[ ] chime5 @b-flo

[x] commonvoice @ftshijt

[x] csj @YosukeHiguchi

[x] dirha_wsj @ruizhilijhu

[ ] fisher_callhome_spanish

[ ] fisher_swbd @YosukeHiguchi

[x] hkust @Emrys365

[x] how2 @b-flo

[ ] hub4_spanish @ruizhilijhu

[ ] iwslt18

[ ] iwslt19

[ ] jnas @YosukeHiguchi

[x] jsut @YosukeHiguchi

[ ] libri_trans

[ ] librispeech @simpleoier

[ ] must_c

[ ] reverb @sas91

[ ] ru_open_stt

[ ] swbd @YosukeHiguchi

[ ] tedlium2 @simpleoier

[ ] tedlium3 @simpleoier

[ ] timit @Emrys365

[x] vivos @b-flo

[x] voxforge @kamo-naoyuki

[x] yesno @b-flo

[x] wsj @kamo-naoyuki

Stale Recipe ASR ESPnet2
opened by sw005320 46
[ESPnet2] distributed training
Could anyone test this?

This PR is too complex to explain, so I'd like to show the examples only:

See: https://github.com/espnet/espnet/wiki/About-distributed-training

I added a drop_last argument to the batch samplers.

~~For training, drop_last is true. In Distributed training, mini-batch is divided by worldsize and each worker must have 1 or more batch-size. To avoid 0-batchsize, drop_last=true for training~~.

For training, drop_last is false.

For validation, drop_last is false. ~~Validation mode perform only at RANK==0 worker.~~

For inference, drop_last is false.

By default, drop_last is false.
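To make the effect of drop_last concrete, here is a minimal sketch. It is not ESPnet2's sampler code: `make_batches` and `shard_for_rank` are hypothetical helpers, and the strided split across workers is only an assumption about how a mini-batch might be divided by the world size.

```python
from typing import List, Sequence


def make_batches(keys: Sequence[str], batch_size: int, drop_last: bool = False) -> List[List[str]]:
    """Group utterance keys into mini-batches (hypothetical helper)."""
    batches = [list(keys[i:i + batch_size]) for i in range(0, len(keys), batch_size)]
    # With drop_last=True an incomplete final mini-batch is discarded.
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


def shard_for_rank(batch: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Return the part of one mini-batch handled by a worker (assumed strided split)."""
    return list(batch[rank::world_size])


keys = [f"utt{i}" for i in range(9)]
world_size = 2

kept = make_batches(keys, batch_size=4, drop_last=False)
dropped = make_batches(keys, batch_size=4, drop_last=True)

# Sizes of each worker's shard for the final mini-batch in both settings.
last_shards_kept = [len(shard_for_rank(kept[-1], r, world_size)) for r in range(world_size)]
last_shards_dropped = [len(shard_for_rank(dropped[-1], r, world_size)) for r in range(world_size)]
print(last_shards_kept, last_shards_dropped)
```

With 9 utterances, a batch size of 4 and 2 workers, the final mini-batch holds a single utterance, so one worker receives an empty shard unless that mini-batch is dropped. This is the zero-batch-size situation mentioned in the struck-through note.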

New Features ESPnet2
opened by kamo-naoyuki 41
Espnet2 transducer v2
Hi,

This PR is a draft for the new version of Transducer models in ESPnet2, separated from the main ASR task (CTC+Att). It's working but please note that :

This is not the final version, it's only to open discussion. If needed, I have some alternatives / other versions.

Some parts or features are removed compared to the previous version. It can be easily added but I would like to add them one by one with careful testing or feedback.

This draft may contain minor issues and typos, useless code, etc. Feel free to point out any weird/wrong parts.

Performance should be on par or better than previously. I also found out what caused the performance degradation for the Voxforge model (mainly due to initialization, and some small training differences). It may be worth extending the investigation though! @jeon30c Would it be possible for you to re-train a Librispeech model with this version to compare performance, please? @sw005320 Do you know if we have other models to compare? I'm not sure who already used the first version.

Also, after we are set on the task and model definition, I would like to at least make the encoder and decoder fully customizable (similar to the custom model in ESPnet1). Mainly, the changes would be :

Add unified Encoder containing PreEncoder (bottlenecks/input blocks) + BodyEncoder (supporting main nets/blocks + some bridge blocks)

Same for Decoder.

Refactor the BeamSearch / Scorer part to reflect changes and optimize for ESPnet2.

After that :

[x] Add tests

[x] Add documentation

Documentation CI ESPnet2 conflicts RNNT
opened by b-flo 40
Pytorch transformer (take 2)
This is a rework of #555 with the upstream Chainer implementation #655 on master.

[x] split modules as previous discussion in #555 #655

[x] add docstrings

[x] update asr train and recog with transformer

[x] implement dynamic module loading like preprocessing transform module (also in chainer)

[x] CTC/LM joint decoding

[x] consistent ASRInterface for all the E2E implementations

[x] add pytorch exp and RESULTS (maybe finish tomorrow)

Enhancement
opened by ShigekiKarita 40
Why is the multichannel data randomly processed by "chime4/asr1_multich" in v0.4.0?

I checked the procedure in chime4/asr1_multich of v0.4.0 and have some questions about the code at line 90 of espnet/espnet/nets/pytorch_backend/frontends/frontend.py.

It means that during training "use_beamformer" is randomly set to true or false, so not all of the data is processed by the beamformer.
I changed the code and found that the loss is worse than with the original code. Can anyone tell me why the data is processed like this? In my opinion, the beamformer is then not trained on all of the data, so the results cannot be better than when using all of the data.
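The behaviour described in this question can be pictured with the following minimal sketch. It is not the code in frontend.py: `choose_use_beamformer` is a hypothetical function, and both the uniform coin flip and the always-on behaviour outside training are assumptions made purely for illustration.

```python
import random


def choose_use_beamformer(training: bool, rng: random.Random) -> bool:
    """Decide whether to run the beamformer for one mini-batch (hypothetical)."""
    if not training:
        # Assumption: outside training the beamformer is always applied.
        return True
    # During training the flag is drawn at random, so only some
    # mini-batches pass through (and update) the beamformer.
    return rng.choice([False, True])


rng = random.Random(0)
flags = [choose_use_beamformer(training=True, rng=rng) for _ in range(1000)]
fraction_beamformed = sum(flags) / len(flags)
print(f"fraction of training batches beamformed: {fraction_beamformed:.2f}")
```

Under these assumptions roughly half of the training mini-batches bypass the beamformer, which is the situation the question asks about.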

Thank you
Question Stale

opened by Rpersie 38
Development plan of TTS recipes for v.0.5.0
Continue from #561

[x] Integrate neural vocoder #1081

[ ] GPU batch inference

[x] Multi-speaker Transformer #1001

[x] Multi-speaker FastSpeech #1006

[x] Add Transformer recipe

[x] JSUT #1009

[x] LibriTTS #1005

[x] Add Chinese recipe #1259

[ ] Re-design interface to be compatible with other types of embeddings

[ ] Integrate online text cleaning in training #998

If you have other suggestions, please let me know.
Help wanted Roadmap
opened by kan-bayashi 37
Can we get a cloned voice in real time?

Hello,

I have 2 quick questions about what can be done using TTS technology.

What is the minimum training time (in minutes) required to get a good result? Can the processing (after training) be instantaneous? I mean, can we get the cloned voice in real time? Happy new year, by the way!

Thank you!

opened by elmoundir-rohmat 0
[WIP] EURO uasr scripts
Revert the previous modification to tokenize_text.py script:

do tokenization only using tokenize_text

get the vocabulary later in pyscripts/text/combine_text_and_vocab.py

ESPnet2 Unsupervise
opened by DongjiGao 2
[WIP] Add S4 decoder in ESPnet2
Hi, this PR adds an implementation of the S4 decoder (https://arxiv.org/abs/2210.17098) and a sample config. The S4 codebase follows the official repository (https://github.com/HazyResearch/state-spaces), so the network configuration format is somewhat different from the models supported in ESPnet (e.g., Transformer decoder).

LibriSpeech 960h WER result using this PR

| | dev_clean | dev_other | test_clean | test_other |
|:-----------:|----------:|-----------:|-----------:|-----------:|
| without LM | 2.0 | 5.0 | 2.3 | 5.0 |
| with LM | 1.7 | 4.0 | 1.9 | 4.1 |
Installation ESPnet2 ESPnet1
opened by m-koichi 1
stage5 failed

bash: line 1: 25413 Killed ( python3 -m espnet2.bin.tts_train --collect_stats true --write_collected_feats true --use_preprocessor true --token_type phn --token_list dump/token_list/phn_g2p_en_no_space/tokens.txt --non_linguistic_symbols none --cleaner none --g2p g2p_en_no_space --normalize none --pitch_normalize none --energy_normalize none --train_data_path_and_name_and_type dump/raw/hz_test_train/text,text,text --train_data_path_and_name_and_type dump/raw/hz_test_train/wav.scp,speech,sound --valid_data_path_and_name_and_type dump/raw/hz_test_dev/text,text,text --valid_data_path_and_name_and_type dump/raw/hz_test_dev/wav.scp,speech,sound --train_shape_file exp/tts_hz_test/decode_use_teacher_forcingtrue_train.loss.ave/stats/logdir/train.3.scp --valid_shape_file exp/tts_hz_test/decode_use_teacher_forcingtrue_train.loss.ave/stats/logdir/valid.3.scp --output_dir exp/tts_hz_test/decode_use_teacher_forcingtrue_train.loss.ave/stats/logdir/stats.3 --config conf/tuning/train_conformer_fastspeech2_lanemb.yaml --feats_extract fbank --feats_extract_conf n_fft=2048 --feats_extract_conf hop_length=300 --feats_extract_conf win_length=1200 --feats_extract_conf fs=22050 --feats_extract_conf fmin=80 --feats_extract_conf fmax=7600 --feats_extract_conf n_mels=80 --pitch_extract_conf fs=22050 --pitch_extract_conf n_fft=2048 --pitch_extract_conf hop_length=300 --pitch_extract_conf f0max=400 --pitch_extract_conf f0min=80 --energy_extract_conf fs=22050 --energy_extract_conf n_fft=2048 --energy_extract_conf hop_length=300 --energy_extract_conf win_length=1200 --train_data_path_and_name_and_type exp/tts_hz_test/decode_use_teacher_forcingtrue_train.loss.ave/hz_test_train/durations,durations,text_int --valid_data_path_and_name_and_type exp/tts_hz_test/decode_use_teacher_forcingtrue_train.loss.ave/hz_test_dev/durations,durations,text_int --train_data_path_and_name_and_type dump/raw/hz_test_train/utt2lid,lids,text_int --valid_data_path_and_name_and_type 
dump/raw/hz_test_dev/utt2lid,lids,text_int ) 2>> exp/tts_hz_test/decode_use_teacher_forcingtrue_train.loss.ave/stats/logdir/stats.3.log >> exp/tts_hz_test/decode_use_teacher_forcingtrue_train.loss.ave/stats/logdir/stats.3.log

Did someone have this problem?
Question

opened by huaiche 4
Question: Voice conversion demo

Hi,

Is there any demo to reproduce the voice conversion results shown here: https://unilight.github.io/Publication-Demos/publications/transformer-vc/

Thanks

opened by SerhiiArtemuk 3

Bug in espnet2.main_funcs.pack_funcs.get_dict_from_cache: outpath = meta.parent.parent

Describe the bug

espnet_model_zoo modifies the downloaded files on the second run, although it should only read them. This is not process safe.

I tried to use https://zenodo.org/record/3966501 and got EOFError: Ran out of input errors from torch when I started multiple experiments simultaneously on the same network file system. To avoid having each experiment download and write the files (a lockfile doesn't work reliably on every network file system), I executed the ASR system once beforehand to trigger the download.

Then I found, in espnet2.main_funcs.pack_funcs.get_dict_from_cache, the code at https://github.com/espnet/espnet/blob/aa5cc02cf830620a5c517a9efddaaadd1bf803ad/espnet2/main_funcs/pack_funcs.py#L161 . However, in https://zenodo.org/record/3966501 the meta.yaml is in the root folder, so outpath will be the parent folder of the extracted files rather than the folder where the files are extracted. Here is the folder on my system with the downloaded files:

$ ~/deploy/espnet_model_zoo/espnet_model_zoo/653d10049fdc264f694f57b49849343e$ ls -1
asr_train_asr_transformer_e18_raw_bpe_sp_valid.acc.best.zip
asr_train_asr_transformer_e18_raw_bpe_sp_valid.acc.best.zip.lock
data
exp
meta.yaml
meta.yaml.lock
url
$ ~/deploy/espnet_model_zoo/espnet_model_zoo/653d10049fdc264f694f57b49849343e$ cat meta.yaml
espnet: 0.8.0
files:
  asr_model_file: exp/asr_train_asr_transformer_e18_raw_bpe_sp/54epoch.pth
  lm_file: exp/lm_train_lm_adam_bpe/17epoch.pth
python: "3.7.3 (default, Mar 27 2019, 22:11:17) \n[GCC 7.3.0]"
timestamp: 1600169559.061977
torch: 1.6.0
yaml_files:
  asr_train_config: exp/asr_train_asr_transformer_e18_raw_bpe_sp/config.yaml
  lm_train_config: exp/lm_train_lm_adam_bpe/config.yaml

so the following checks for whether the files exist fail:

https://github.com/espnet/espnet/blob/aa5cc02cf830620a5c517a9efddaaadd1bf803ad/espnet2/main_funcs/pack_funcs.py#L174-L176

and download_and_unpack executes the unpack function, which overwrites the existing files:

https://github.com/espnet/espnet_model_zoo/blob/4d5855ac04fa137da6ba2c45f3f5aab7a1b398a6/espnet_model_zoo/downloader.py#L402-L412

And this can break another process.
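The path arithmetic behind this report can be illustrated with a small sketch. It only mirrors the `outpath = meta.parent.parent` expression from the issue title and the directory listing shown earlier in this report; the cache root `/deploy/espnet_model_zoo` is a hypothetical location, and none of this is the actual espnet2 code.

```python
from pathlib import PurePosixPath

# Hypothetical cache root; the hash directory name is the one from the listing in this report.
cache_root = PurePosixPath("/deploy/espnet_model_zoo")
unpack_dir = cache_root / "653d10049fdc264f694f57b49849343e"

# For this archive, meta.yaml sits directly in the unpack directory.
meta = unpack_dir / "meta.yaml"
model_rel = PurePosixPath("exp/asr_train_asr_transformer_e18_raw_bpe_sp/54epoch.pth")

# As in the issue title: two .parent steps go one level too far here.
outpath = meta.parent.parent
looked_up = outpath / model_rel
actual = unpack_dir / model_rel

print(looked_up)
print(actual)
print(looked_up == actual)
```

Because `looked_up` and `actual` differ, an existence check based on `outpath` would miss files that are already unpacked, which is consistent with the unpack step running again.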

Task information:

Task: ASR
Recipe: librispeech
ESPnet2

To Reproduce

Execute this

from espnet_model_zoo.downloader import ModelDownloader
d = ModelDownloader()
d.download_and_unpack('Shinji Watanabe/librispeech_asr_train_asr_transformer_e18_raw_bpe_sp_valid.acc.best')

Then go to the download directory (in my case .../espnet_model_zoo/653d10049fdc264f694f57b49849343e) and check the timestamps of the files (e.g. with ls -ahl). Execute the code snippet again in a fresh Python shell and check the timestamps again. All files, except url and the folders, will have a new timestamp.
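The timestamp comparison can also be scripted instead of being read off ls by eye. The following is a minimal sketch using only the standard library; `snapshot_mtimes` and `changed_files` are hypothetical helper names, and the temporary directory merely stands in for the real download directory.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def snapshot_mtimes(root: Path) -> Dict[str, int]:
    """Map each file under root (relative path) to its modification time in ns."""
    return {
        str(p.relative_to(root)): p.stat().st_mtime_ns
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """List files that are new or whose modification time differs."""
    return sorted(name for name, mtime in after.items() if before.get(name) != mtime)


# Self-contained demonstration on a temporary directory standing in for the cache.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "meta.yaml").write_text("espnet: 0.8.0\n")
    (root / "url").write_text("placeholder\n")
    before = snapshot_mtimes(root)
    # Simulate a second run rewriting meta.yaml by bumping its mtime.
    stat = (root / "meta.yaml").stat()
    os.utime(root / "meta.yaml", ns=(stat.st_atime_ns, stat.st_mtime_ns + 1_000_000_000))
    after = snapshot_mtimes(root)
    rewritten = changed_files(before, after)

print(rewritten)
```

In a real check, one would call `snapshot_mtimes` on the download directory before and after the second `download_and_unpack` call; a non-empty result from `changed_files` would indicate that files were rewritten.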

Btw.: Is there an alternative to download_and_unpack with the same return value, but without download or unpack?

Error logs

Traceback (most recent call last):
  File "/.../python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File ".../python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "...", line 118, in <module>
    fire.Fire(main)
  File ".../python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File ".../fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File ".../fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "...", line 54, in main
    apply_asr = ESPnetASR(model_tag=model_tag).apply_asr
  File "...", line 143, in __init__
    speech2text = Speech2Text(  # Speech2Text.from_pretrained
  File ".../espnet2/bin/asr_inference.py", line 98, in __init__
    asr_model, asr_train_args = task.build_model_from_file(
  File ".../espnet2/tasks/abs_task.py", line 1826, in build_model_from_file
    model.load_state_dict(torch.load(model_file, map_location=device))
  File ".../torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File ".../torch/serialization.py", line 1002, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

Bug

opened by boeddeker 3

Releases(v.202211)

v.202211(Dec 11, 2022)
What's Changed

Update muskits update by @ftshijt in https://github.com/espnet/espnet/pull/4616

Muskit installation by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4617

Sync Muskits branch with Master by @ftshijt in https://github.com/espnet/espnet/pull/4640

Updates on Muskit Migration by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4631

Update Muskits branch by @ftshijt in https://github.com/espnet/espnet/pull/4662

Add stage 5 & stage 6 by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4649

Muskit: rename & reorganize features by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4668

Update Muskits branch by @ftshijt in https://github.com/espnet/espnet/pull/4671

Muskits CI fixing by @ftshijt in https://github.com/espnet/espnet/pull/4672

Muskits CI fix by @ftshijt in https://github.com/espnet/espnet/pull/4673

Muskits - apply isort by @ftshijt in https://github.com/espnet/espnet/pull/4677

Muskits CI fix by @ftshijt in https://github.com/espnet/espnet/pull/4678

Muskit: Add tokenizer by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4676

Muskits - various fix for CI test by @ftshijt in https://github.com/espnet/espnet/pull/4679

Muskit: add recipe ofuton by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4681

Muskits (CI fix) by @ftshijt in https://github.com/espnet/espnet/pull/4682

Fix CI issue in muskits by @ftshijt in https://github.com/espnet/espnet/pull/4687

Add dns_icassp22 Speech Enhancement Recipe by @slSeanWU in https://github.com/espnet/espnet/pull/4657

Singing Voice Synthesis Task for ESPnet by @ftshijt in https://github.com/espnet/espnet/pull/4670

Documentation of Tutorial and Muskits by @ftshijt in https://github.com/espnet/espnet/pull/4692

Add tests on MacOS and Windows (only installation) by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4669

Add missing entries in readme by @ftshijt in https://github.com/espnet/espnet/pull/4699

Support ST without texts in source language by @sophia1488 in https://github.com/espnet/espnet/pull/4688

Update ConvInput for Transducer by @b-flo in https://github.com/espnet/espnet/pull/4720

Small changes for standalone Transducer by @b-flo in https://github.com/espnet/espnet/pull/4722

Fix input block tutorial documentation for Transducer by @b-flo in https://github.com/espnet/espnet/pull/4724

Fix HF Pytest Errors by @siddhu001 in https://github.com/espnet/espnet/pull/4737

Update to puebla-nahuatl recipe (some minor fixes) by @ftshijt in https://github.com/espnet/espnet/pull/4713

Add espnet2 TTS recipe on M-AILABS by @Takaaki-Saeki in https://github.com/espnet/espnet/pull/4701

Update outdated enh config files by @Emrys365 in https://github.com/espnet/espnet/pull/4719

add src_sos & src_eos for mt task to address the index out of range w… by @simpleoier in https://github.com/espnet/espnet/pull/4736

Add g2pk_explicit_space tokenizer by @jonghwanhyeon in https://github.com/espnet/espnet/pull/4718

Fix JETS inference with GST (#4743) by @kan-bayashi in https://github.com/espnet/espnet/pull/4744

Update on Muskit by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4700

add fleurs conformer+sc-ctc results by @wanchichen in https://github.com/espnet/espnet/pull/4746

Add recipe for OCR task on IAM handwriting dataset by @kenzheng99 in https://github.com/espnet/espnet/pull/4707

Add Talromur2 recipe by @G-Thor in https://github.com/espnet/espnet/pull/4680

Add multi-channel enh_asr for CHiME-4 by @YoshikiMas in https://github.com/espnet/espnet/pull/4706

chunk_mask error by @aky15 in https://github.com/espnet/espnet/pull/4751

fix wav2vec2 encoder mask bug by @simpleoier in https://github.com/espnet/espnet/pull/4772

Add Hugging Face Transformers Decoder, Tokenizer and their example on SLURP by @akreal in https://github.com/espnet/espnet/pull/4099

[Recipe PR] MELD: Multimodal EmotionLines Dataset by @realzza in https://github.com/espnet/espnet/pull/4771

MultiIRIS follow up by @YoshikiMas in https://github.com/espnet/espnet/pull/4765

Add CATSLU results for XLS-R with mBART-50 by @akreal in https://github.com/espnet/espnet/pull/4782

Add MEDIA and PortMEDIA results for XLS-R with mBART-50 by @akreal in https://github.com/espnet/espnet/pull/4794

Add SLUE-VoxPopuli results for WavLM with mBART-50 by @akreal in https://github.com/espnet/espnet/pull/4777

Follow up for SLURP and CATSLU by @akreal in https://github.com/espnet/espnet/pull/4796

Update README in chime4/enh_asr1 by @YoshikiMas in https://github.com/espnet/espnet/pull/4795

fix parsing token_list by @imdanboy in https://github.com/espnet/espnet/pull/4778

Use torchaudio functions for beamforming related operations in torch 1.12.1+ by @Emrys365 in https://github.com/espnet/espnet/pull/4638

PIT E2E multi-speaker ASR and librimix recipe by @simpleoier in https://github.com/espnet/espnet/pull/4753

Fix an audio format issue in some enh recipes by @YoshikiMas in https://github.com/espnet/espnet/pull/4799

Fixing How2-2000h Data preparation and Seq Length Assert for Longformer Encoder by @roshansh-cmu in https://github.com/espnet/espnet/pull/4805

Adding MFA scripts for LJSpeech by @iamanigeeit in https://github.com/espnet/espnet/pull/4801

fix typo in espnet2_tutorial.md by @eltociear in https://github.com/espnet/espnet/pull/4811

[WIP] E-Branchformer Encoder in ESPnet2 by @kkim-asapp in https://github.com/espnet/espnet/pull/4812

Muskit update by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4783

New Contributors

@A-Quarter-Mile made their first contribution in https://github.com/espnet/espnet/pull/4617

@sophia1488 made their first contribution in https://github.com/espnet/espnet/pull/4688

@kenzheng99 made their first contribution in https://github.com/espnet/espnet/pull/4707

@realzza made their first contribution in https://github.com/espnet/espnet/pull/4771

@iamanigeeit made their first contribution in https://github.com/espnet/espnet/pull/4801

@eltociear made their first contribution in https://github.com/espnet/espnet/pull/4811

@kkim-asapp made their first contribution in https://github.com/espnet/espnet/pull/4812

Full Changelog: https://github.com/espnet/espnet/compare/v.202209...v.202211
v.202209(Oct 4, 2022)
What's Changed

Add dynamic mixing in the speech separation task. by @LiChenda in https://github.com/espnet/espnet/pull/4387

Added test script and usage for calculate_rtf.py script to ESPnet2 tutorial page by @espnetUser in https://github.com/espnet/espnet/pull/4560

Offline/Online (standalone) ESPnet2 Transducer by @b-flo in https://github.com/espnet/espnet/pull/4479

Unfix matplotlib version by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4576

use torch.finfo for dtype other than float by @wenzhe-nrv in https://github.com/espnet/espnet/pull/4584

Update recipe for slurp-entity by @ftshijt in https://github.com/espnet/espnet/pull/4585

Egs2 aesrc by @brianyan918 in https://github.com/espnet/espnet/pull/4592

update checks for bias in initialization by @LiChenda in https://github.com/espnet/espnet/pull/4574

[WIP] Update to fit the recent update in s3prl. by @simpleoier in https://github.com/espnet/espnet/pull/4593

Unfix numpy version by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4598

Update to fit the recent update in s3prl. by @simpleoier in https://github.com/espnet/espnet/pull/4600

Add improved results on FLEURS dataset by @wanchichen in https://github.com/espnet/espnet/pull/4596

Update mp4_to_wav.sh by @jaehyun-ko in https://github.com/espnet/espnet/pull/4605

Pass output_dir as str to wandb.init() by @jonghwanhyeon in https://github.com/espnet/espnet/pull/4607

Support enh_s2t joint training on multi-speaker data by @Emrys365 in https://github.com/espnet/espnet/pull/4566

Add ASR results for commonvoice zh_TW by @slSeanWU in https://github.com/espnet/espnet/pull/4612

Fix both utt2sid and utt2lid when removing long/short data by @jonghwanhyeon in https://github.com/espnet/espnet/pull/4609

recipe config update by @ftshijt in https://github.com/espnet/espnet/pull/4621

Add pytorch=1.12.1 to CI configurations by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4604

New SLU task by @siddhu001 in https://github.com/espnet/espnet/pull/4569

Joss paper: Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing by @neillu23 in https://github.com/espnet/espnet/pull/4620

Update conformer result of AMI corpus by @teinhonglo in https://github.com/espnet/espnet/pull/4629

Offline/Online Branchformer Transducer by @b-flo in https://github.com/espnet/espnet/pull/4582

Change to install numba using pip instead of conda by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4637

Add MixIT support. It is unsupervised only. Semi-supervised config is not available for now. by @simpleoier in https://github.com/espnet/espnet/pull/4619

Add 2-pass SLU code for FSC Challenge by @siddhu001 in https://github.com/espnet/espnet/pull/4636

CI fix and some other minor recipe fixes by @ftshijt in https://github.com/espnet/espnet/pull/4656

Update the title of plots to be y-label vs x-label by @pyf98 in https://github.com/espnet/espnet/pull/4647

Update VIVOS download link by @hieuthi in https://github.com/espnet/espnet/pull/4644

Add ASR recipe of MAGICDATA mandarin read speech by @tjysdsg in https://github.com/espnet/espnet/pull/4635

Amend to CI fix by @ftshijt in https://github.com/espnet/espnet/pull/4663

qasr update by @massabaali7 in https://github.com/espnet/espnet/pull/4642

Open_li110 for large-scale multilingual speech by @ftshijt in https://github.com/espnet/espnet/pull/4408

Fix the path of calculate_rft.py by @sw005320 in https://github.com/espnet/espnet/pull/4660

Fix importlib-metadata version by @kan-bayashi in https://github.com/espnet/espnet/pull/4686

Cmu arctic tts pretrain finetune by @soumimaiti in https://github.com/espnet/espnet/pull/4456

updated version to 202209 by @kan-bayashi in https://github.com/espnet/espnet/pull/4685

New Contributors

@wenzhe-nrv made their first contribution in https://github.com/espnet/espnet/pull/4584

@jaehyun-ko made their first contribution in https://github.com/espnet/espnet/pull/4605

@jonghwanhyeon made their first contribution in https://github.com/espnet/espnet/pull/4607

@slSeanWU made their first contribution in https://github.com/espnet/espnet/pull/4612

@massabaali7 made their first contribution in https://github.com/espnet/espnet/pull/4642

@soumimaiti made their first contribution in https://github.com/espnet/espnet/pull/4456

Full Changelog: https://github.com/espnet/espnet/compare/v.202207...v.202209
v.202207(Aug 2, 2022)
New Features

[New Features][ESPnet1][**ASR**] Add DDP support for v1 ASR training. #4430 by @lazykyama

[New Features][**ESPnet2**] Support tensorboard graph #4418 by @kamo-naoyuki

[New Features][ESPnet2][**ASR**] Branchformer Encoder in ESPnet2 #4400 by @pyf98

[New Features][ESPnet2][Diarization][**SE**] enh_diar joint model #4339 by @YushiUeda

[New Features][ESPnet2][**ESPnet1**] Calculate RTF and latency in espnet2 #4382 by @espnetUser

[New Features][ESPnet2][ESPnet1][**SE**] Add EnhPreprocessor for Speech Enhancement #4321 by @Emrys365

[New Features][ESPnet2][**SE**] Add DPTNet and WarmupStepLR scheduler #4449 by @Emrys365

[New Features][ESPnet2][**SE**] Add support for calculating losses on noise and dereverberated signals #4476 by @Emrys365

Recipe

[Recipe][**ESPnet2**] Aishell-2 GPU info #4501 by @jctian98

[Recipe][**ESPnet2**] Fix librispeech default path to signify auto download #4517 by @karthik19967829

[Recipe][**ESPnet2**] Recipe fix for PueblaNahuatl Recipe #4522 by @ftshijt

[Recipe][ESPnet2][ASR][**README**] Add Aishell-2 ASR Recipe for Espnet2 #4451 by @jctian98

[Recipe][ESPnet2][ASR][**README**] Add AmericasNLP 2022 baselines #4428 by @akreal

[Recipe][ESPnet2][ESPnet1][ASR][**Installation**] FLEURS ASR Recipe for ESPnet2 #4455 by @wanchichen

[Recipe][ESPnet2][ESPnet1][ASR][**README**] tedx_spanish_corpus egs2 recipe #4523 by @jessicah25

[Recipe][ESPnet2][ESPnet1][ASR][**SE**] Adding L3DAS22 Task1 model to ESPNet-SE #3994 by @popcornell

[Recipe][ESPnet2][ESPnet1][**ST**] Must_C v1 and v2 in egs2 #4306 by @brianyan918

[Recipe][ESPnet2][**README**] Dcase task1 Baseline #4317 by @siddhu001

[Recipe][ESPnet2][**README**] Report Aishell-2 Transducer results #4489 by @jctian98

[Recipe][ESPnet2][**README**] Update language codes in AmericasNLP 2022 baseline #4441 by @akreal

[Recipe][ESPnet2][**README**] Vox populi baseline #4478 by @siddhu001

[Recipe][ESPnet2][**SE**] L3DAS22 enhancement recipe #4269 by @neillu23

[Recipe][ESPnet2][**SE**] Update notes in the recipes for DNS challenges #4433 by @YoshikiMas

[Recipe][ESPnet2][SE][SLU][**ST**] LT-Spatialized and SLURP-Spatialized combined enhancement recipe #4268 by @neillu23

[Recipe][ESPnet2][**ST**] Add moses check for ST recipes #4417 by @ftshijt

[Recipe][ESPnet2][**TTS**] Add talromur recipe #4379 by @G-Thor

[Recipe][ESPnet2][**TTS**] Fix for issue #4401 #4402 by @G-Thor

[Recipe][ESPnet2][**TTS**] add pre-trained model jets in the recipe of ljspeech, kss #4406 by @imdanboy

Bugfix

[Bugfix][**ESPnet1**] fix the corrupted pretrained model #4490 by @wentaoxandry

[Bugfix][ESPnet1][**ESPnet2**] Fix an4 URL #4427 by @pyf98

[Bugfix][ESPnet1][ESPnet2][**RNNT**] Fix mAES with big vocab size #4312 by @b-flo

[Bugfix][**ESPnet2**] Adding __init__.py to espnet2/diar/layers and espnet2/diar/separator #4470 by @cycentum

[Bugfix][**ESPnet2**] Fix tensorboard-graph creation for multi gpu mode #4431 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Update char_tokenizer.py #4499 by @xiabingquan

[Bugfix][ESPnet2][ESPnet1][ASR][LM][MT][**TTS**] Fix Transducer LM fusion and add Logging for Transducer inference #4327 by @chintu619

[Bugfix][ESPnet2][**SE**] Fix a bug in enh unit test #4435 by @Emrys365

Enhancement

[Enhancement][**ESPnet2**] Optionize graph creation #4551 by @kan-bayashi

[Enhancement][ESPnet2][Installation][**TTS**] Add icelandic g2p #4384 by @G-Thor

[Enhancement][ESPnet2][**SE**] Add support of test-only criterions after each epoch #4381 by @Emrys365

[Enhancement][ESPnet2][**SSL**] raise more useful error in espnet2/asr/frontend/s3prl.py if s3prl is not installed #4480 by @popcornell

[Enhancement][ESPnet2][**TTS**] Add JETS AlignmentModule in calculate_all_attentions.py #4446 by @seastar105

Refactoring

[Refactoring][**ESPnet1**] Refactoring 'is_prefix' function #4530 by @jhlee9010

[Refactoring][ESPnet2][**ASR**] Zero_infinity option for ctc loss #4415 by @kamo-naoyuki

Others

[CI][ESPnet1][ESPnet2][**Installation**] Remove the version restriction for numpy #4419 by @kamo-naoyuki

[CI][**ESPnet2**] Changed to install espnet from wheel in the test_import CI test #4471 by @kamo-naoyuki

[CI][**Installation**] Temporary fixed numpy version #4464 by @kamo-naoyuki

[Documentation] Add notes on batch size and num of GPUs in ESPnet2 documentation #4436 by @pyf98

[Documentation][**ESPnet1**] Update decoder.py #4322 by @sw005320

[Documentation][**ESPnet2**] Add a note to follow the installation instructions #4477 by @akreal

Acknowledgements

Special thanks to @Emrys365, @G-Thor, @YoshikiMas, @YushiUeda, @akreal, @b-flo, @brianyan918, @chintu619, @cycentum, @espnetUser, @ftshijt, @imdanboy, @jctian98, @jessicah25, @jhlee9010, @kamo-naoyuki, @kan-bayashi, @karthik19967829, @lazykyama, @neillu23, @popcornell, @pyf98, @seastar105, @siddhu001, @sw005320, @wanchichen, @wentaoxandry, @xiabingquan.
v.202205(May 28, 2022)
New Features

[New Features][ESPnet1][ESPnet2][**ASR**] Add quantization in ESPnet2 for asr inference #4349 by @pyf98

[New Features][ESPnet2][**SE**] Add svoice recipe for wsj0-2mix speech separation #4257 by @nateanl

[New Features][ESPnet2][**SE**] Merge Deep Clustering and Deep Attractor Network to enh separator #4110 by @earthmanylf

[New Features][ESPnet2][**SE**] Some improvements to current enh functions #4251 by @Emrys365

[New Features][ESPnet2][SE][**Installation**] Import fast_bss_eval and update some time-domain losses for enh task #4256 by @LiChenda

[New Features][ESPnet2][**TTS**] add e2e tts model: JETS #4364 by @imdanboy

Bugfix

[Bugfix][**ESPnet1**] Fix minimum input length for Conv2dSubsampling2 in check_short_utt #4378 by @akreal

[Bugfix][ESPnet1][**ESPnet2**] Minor fixes for the intermediate loss usage and Mask-CTC decoding #4374 by @YosukeHiguchi

[Bugfix][**ESPnet2**] Fix #4396 #4398 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fix a bug in utterance_mvn #4304 by @Emrys365

[Bugfix][**ESPnet2**] Minor fix for Mask-CTC forward function #4347 by @YosukeHiguchi

[Bugfix][**ESPnet2**] Wandb Minor Fix for Model Resume #4329 by @roshansh-cmu

[Bugfix][**ESPnet2**] fix the enh_s2t_task argument in espnet2/bin/st_inference.py #4323 by @simpleoier

[Bugfix][ESPnet2][MT][**ST**] fix bug in mt/st templates for having separate token lists #4149 by @brianyan918

[Bugfix][ESPnet2][**Recipe**] Fix aishell3 data preparation script #4277 by @LanceaKing

[Bugfix][ESPnet2][**SE**] Fix a bug in stats aggregation when PITSolver is used #4343 by @Emrys365

[Bugfix][ESPnet2][**SE**] fix for enhancement model loading compatibility #4259 by @LiChenda

[Bugfix][ESPnet2][**ST**] bug fixes in ST recipes #4341 by @chintu619

[Bugfix][ESPnet2][**TTS**] Fix optional data names for TTS #4355 by @kan-bayashi

[Bugfix][ESPnet2][**TTS**] fix a bug in Mandarin pypinyin_g2p_phone #4206 by @WeiGodHorse

[Bugfix][ESPnet2][**TTS**] fix loss = NaN in VITS with mixed precision #4356 by @kan-bayashi

[Bugfix][ESPnet2][**streaming**] Add unit test to streaming ASR inference #4352 by @espnetUser

[Bugfix][**Installation**] fix s3prl install by using legacy version. Temporal solution. #4399 by @simpleoier

[Bugfix][**README**] Fix typo #4338 by @ftshijt

Enhancement

[Enhancement][ESPnet1][ESPnet2][ASR][SE][SLU][**ST**] enh_s2t joint model #4226 by @simpleoier

[Enhancement][**ESPnet2**] Add progress bar to phonemization #4320 by @G-Thor

[Enhancement][ESPnet2][**MT**] Update show_translation_result.sh to show all decoding results under the given exp directory #4330 by @pyf98

Recipe

[Recipe][ESPnet1][**ASR**] Accented English Speech Recognition Challenge 2020 recipe (AESRC2020) #3898 by @brianyan918

[Recipe][ESPnet1][ESPnet2][ASR][README][**Recipe**] Add MediaSpeech ASR recipe #4183 by @AshibaWu

[Recipe][ESPnet2][ASR][**README**] recipe for Microsoft speech corpus for Indian Languages #4191 by @navya-yarrabelly

[Recipe][ESPnet2][ASR][**README**] Accented French Openslr57 ASR recipe (ESPnet2) (part of Homework3 MNLP) #4280 by @DanBerrebbi

[Recipe][ESPnet2][ASR][**README**] Add Mask-CTC results #4180 by @YosukeHiguchi

[Recipe][ESPnet2][ASR][**README**] Add ml_openslr63 ASR recipe #4173 by @bharaniuk

[Recipe][ESPnet2][ASR][**README**] Adding new recipe for Burmese (OpenSLR80) #4182 by @JainSameer06

[Recipe][ESPnet2][ASR][**README**] add chime6 recipe #4332 by @simpleoier

[Recipe][ESPnet2][ASR][SE][**README**] add egs2/chime4/enh_asr1 recipe and results #4316 by @simpleoier

[Recipe][ESPnet2][README][**RNNT**] updated librispeech-asr with rnn-t results #4281 by @chintu619

[Recipe][ESPnet2][README][**SE**] 2021 Clarity Challenge recipe #4210 by @popcornell

[Recipe][ESPnet2][README][**SE**] Add AISHELL-4 ENH recipe #4249 by @Emrys365

[Recipe][ESPnet2][README][**SE**] Add ConferencingSpeech 2021 recipe to egs2 #4192 by @Emrys365

[Recipe][ESPnet2][README][**SE**] Add ICASSP2021 DNS Challenge 2 recipe #4253 by @YoshikiMas

[Recipe][ESPnet2][README][**SE**] Add INTERSPEECH 2021 DNS Challenge 3 recipe #4238 by @YoshikiMas

[Recipe][ESPnet2][README][**SE**] Add results of ICASSP2021 DNS Challenge 2 recipe #4309 by @YoshikiMas

[Recipe][ESPnet2][README][**SE**] Rename egs2/clarity21/enh_2021 to egs2/clarity21/enh1 #4328 by @Emrys365

[Recipe][ESPnet2][README][**SE**] add convtasnet recipe for dns_ins20 #4314 by @muqiaoy

[Recipe][ESPnet2][README][**SLU**] Harpervalley recipe #4315 by @YushiUeda

[Recipe][ESPnet2][README][**SLU**] SLUE Voxpopuli base recipe #4262 by @siddhu001

[Recipe][ESPnet2][README][**ST**] CoVOST2 recipes #4300 by @ftshijt

[Recipe][ESPnet2][SLU][**README**] Update SLU results for ICASSP #4283 by @siddhu001

Others

[CI][**Docker**] Github Action Trigger Docker Build #4295 by @Fhrozen

[CI][**Docker**] Github Action for Docker build #4219 by @Fhrozen

[CI][ESPnet1][ESPnet2][Installation][**README**] Add isort checking to the CI tests #4372 by @kamo-naoyuki

[CI][ESPnet1][ESPnet2][Installation][README][**mergify**] Add pytorch=1.10.2 and 1.11.0 to ci configurations #4348 by @kamo-naoyuki

[CI][ESPnet2][ASR][**SE**] add integration test and fix the decoding in enh_asr and enh_st #4310 by @simpleoier

[CI][ESPnet2][New Features][SLU][ST][**streaming**] Add streaming ST/SLU #4243 by @D-Keqi

[CI][ESPnet2][**ST**] Add Test Functions for ST Train and Inference #4324 by @ftshijt

[CI][**Installation**] update install_pesq.sh #4265 by @LiChenda

[Documentation][ESPnet2][README][**TTS**] Minor update for JETS #4369 by @kan-bayashi

[Documentation][**README**] Change the order of README #4289 by @ftshijt

[Documentation][**README**] Update README.md #4284 by @sw005320

Acknowledgements

Special thanks to @AshibaWu, @D-Keqi, @DanBerrebbi, @Emrys365, @Fhrozen, @G-Thor, @JainSameer06, @LanceaKing, @LiChenda, @WeiGodHorse, @YoshikiMas, @YosukeHiguchi, @YushiUeda, @akreal, @bharaniuk, @brianyan918, @chintu619, @earthmanylf, @espnetUser, @ftshijt, @imdanboy, @kamo-naoyuki, @kan-bayashi, @muqiaoy, @nateanl, @navya-yarrabelly, @popcornell, @pyf98, @roshansh-cmu, @siddhu001, @simpleoier, @sw005320.
v.202204(Apr 12, 2022)
News

From this version, we decided to use date-based versioning, e.g., v.202204.
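
Because the tag format changes at this release, scripts that sort or compare release tags may need to handle both styles. The following is a minimal sketch, not part of ESPnet: the function name `version_key` is hypothetical, and it assumes that old tags look like `v.MAJOR.MINOR.PATCH` and new tags look like `v.YYYYMM`, as the tags on this page suggest.

```python
import re
from typing import Tuple


def version_key(tag: str) -> Tuple[int, ...]:
    """Return a sortable key for a release tag (hypothetical helper).

    Assumed formats, inferred from the tags listed on this page:
      - old style:  v.MAJOR.MINOR.PATCH  (e.g. v.0.10.6)
      - date style: v.YYYYMM             (e.g. v.202204)
    Date-style tags sort after every old-style tag.
    """
    body = tag[2:] if tag.startswith("v.") else tag

    # Date-based tag: six digits, year followed by month.
    if re.fullmatch(r"\d{6}", body):
        year, month = int(body[:4]), int(body[4:])
        if not 1 <= month <= 12:
            raise ValueError(f"invalid month in tag: {tag!r}")
        return (1, year, month)

    # Old-style tag: three dot-separated integers.
    if re.fullmatch(r"\d+\.\d+\.\d+", body):
        major, minor, patch = (int(part) for part in body.split("."))
        return (0, major, minor, patch)

    raise ValueError(f"unrecognized tag format: {tag!r}")


tags = ["v.202204", "v.0.10.5", "v.0.10.6"]
ordered = sorted(tags, key=version_key)
```

The leading 0 or 1 in the key is a design choice that keeps the two schemes from being compared field by field, which would otherwise rank `202204` as a very large major version only by accident.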

New Features

[New Features][**ESPnet1**] added learnable fourier features #4029 by @popcornell

[New Features][ESPnet1][ESPnet2][**ASR**] Restricted Self Attention for E2E Speech Summarization #4071 by @roshansh-cmu

[New Features][ESPnet1][Installation][**README**] add lrs avsr recipe #4104 by @wentaoxandry

[New Features][ESPnet1][**README**] add lip reading sentences dataset code #4074 by @wentaoxandry

[New Features][ESPnet2][**ASR**] [ESPnet2] Intermediate/Self-conditioned CTC #4084 by @YosukeHiguchi

[New Features][ESPnet2][**ASR**] [WIP] [ESPnet2] Mask-CTC #4158 by @YosukeHiguchi

[New Features][ESPnet2][ASR][**README**] Add stochastic depth to conformer and share results on LibriSpeech 960h #4142 by @pyf98

[New Features][ESPnet2][**MT**] MT task for espnet2 with IWSLT14 recipe #4111 by @siddalmia

[New Features][ESPnet2][README][**SE**] Add DC-CRN complex masking and spectral mapping approach for speech enhancement #4127 by @Emrys365

[New Features][ESPnet2][README][**SE**] Add DCCRN separator #4097 by @Johnson-Lsx

[New Features][ESPnet2][README][**SE**] Add a new separator for speech enhancement/separation tasks #4062 by @LiChenda

[New Features][ESPnet2][README][**SE**] Add iFaSNet for enhancement/separation tasks. #4130 by @LiChenda

[New Features][ESPnet2][**SE**] Refactor DNN_Beamformer in espnet2 and add new beamformers #4082 by @Emrys365

Enhancement

[Enhancement][**ESPnet2**] Add an optional suffix to the averaged model file name #4067 by @pyf98

[Enhancement][**ESPnet2**] Update perturb_data_dir_speed.sh #4091 by @AmirHussein96

[Enhancement][ESPnet2][**ASR**] Add tests for Intermediate/Self-conditioned CTC #4117 by @YosukeHiguchi

[Enhancement][ESPnet2][**TTS**] Add option to use norm. feats over denorm. #4250 by @G-Thor

Recipe

[Recipe][ESPnet1][**RNNT**] [ESPNET1] Add the results of conformer-transducer for Librispeech #4080 by @eesungkim

[Recipe][ESPnet2][**ASR**] Add ASR recipe for VCTK dataset based on TTS's dataprep. #4088 by @kashikashi

[Recipe][ESPnet2][**ASR**] Add new conformer config with hop length 160 for LibriSpeech 960h #4162 by @pyf98

[Recipe][ESPnet2][**ASR**] Add new zh_openslr38 ASR recipe #4181 by @cuichenx

[Recipe][ESPnet2][**ASR**] Add transformer results for LibriSpeech 100h #4089 by @pyf98

[Recipe][ESPnet2][**ASR**] Added Marathi OpenSLR 64 recipe #4179 by @SujaySKumar

[Recipe][ESPnet2][**ASR**] Added recipe for Microsoft Speech Corpus (Indian languages) #4194 by @chintu619

[Recipe][ESPnet2][**ASR**] Automatic lyric recognition Recipe #4129 by @ftshijt

[Recipe][ESPnet2][**ASR**] ESPNET - LRS3 Recipe #4101 by @gdebayan

[Recipe][ESPnet2][**ASR**] bengali asr model with no finetuning #4047 by @dzeinali

[Recipe][ESPnet2][**MT**] IWSLT'14 Results using ESPnet2-MT #4132 by @pyf98

[Recipe][ESPnet2][**README**] Mandarin ISO id should be CMN instead of ZHO #4125 by @xinjli

[Recipe][ESPnet2][**README**] Update README.md #4037 by @dzeinali

[Recipe][ESPnet2][**README**] Update README.md #4121 by @dzeinali

[Recipe][ESPnet2][**README**] Update README.md for How2 2000h ASR,SUM #4155 by @roshansh-cmu

[Recipe][ESPnet2][**RNNT**] Create decode_rnnt_conformer.yaml #4058 by @sw005320

[Recipe][ESPnet2][**RNNT**] Create train_rnnt_conformer.yaml #4057 by @sw005320

[Recipe][ESPnet2][**SLU**] Add IEMOCAP results and configs #4100 by @YushiUeda

[Recipe][ESPnet2][**SLU**] Add new config and support for computing WER in SLUE-VoxCeleb #4152 by @siddhu001

[Recipe][ESPnet2][**SLU**] Add sentiment data preparation for IEMOCAP #4065 by @YushiUeda

[Recipe][ESPnet2][**SLU**] ESPnet2 swbd_sentiment recipe #4134 by @YushiUeda

[Recipe][ESPnet2][**ST**] egs2/iwslt22_dialect #4013 by @brianyan918

Bugfix

[Bugfix][CI][**ESPnet2**] Fix CI test failures related to torch_complex 0.4.0 #4112 by @Emrys365

[Bugfix][CI][**Installation**] fix doc ci by pinning jinja version #4239 by @xinjli

[Bugfix][**ESPnet2**] Fix n-gram decoding #4168 by @sw005320

[Bugfix][**ESPnet2**] bug fixes and efficient train/dev split in data prep of Microsoft Indian Languages recipe #4196 by @chintu619

[Bugfix][**ESPnet2**] fix errors in configs of librispeech ssl frontends #4098 by @simpleoier

[Bugfix][ESPnet2][ASR][**ST**] [bug patch] egs2/iwslt22_dialect #4049 by @brianyan918

[Bugfix][ESPnet2][MT][**ST**] Fix joint tokenization in st.sh #4143 by @pyf98

[Bugfix][ESPnet2][MT][**ST**] scoring fixes MT and ST #4146 by @siddalmia

[Bugfix][ESPnet2][**TTS**] Fix speaker normalization #4229 by @LanceaKing

[Bugfix][**Installation**] set gtn version #4122 by @brianyan918

[Bugfix][ESPnet1][**ESPnet2**] minor fixes in ST in espnet2 #4056 by @siddalmia

Others

[CI] Simplify vocoder compatibility test #4061 by @kan-bayashi

[CI][**Documentation**] Fix notebook in the official doc. #4171 by @ShigekiKarita

[Docker] Docker Updates #4064 by @Fhrozen

[Documentation] Add a checklist for PRs on recipe #4053 by @ftshijt

[Documentation] README Update for E2E Speech Summarization #4071 #4150 by @roshansh-cmu

[Documentation] Update the example PyTorch version in Installation doc #4116 by @pyf98

[Documentation] [documentation] fix minor typo in installation.md #4164 by @JDongian

[Documentation][**ESPnet1**] fix typo #4044 by @ooyamatakehisa

[Documentation][ESPnet1][ESPnet2][**ASR**] Add Huggingface-cli usage #4027 by @karthik19967829

Acknowledgements

Special thanks to @AmirHussein96, @Emrys365, @Fhrozen, @G-Thor, @JDongian, @Johnson-Lsx, @LanceaKing, @LiChenda, @ShigekiKarita, @SujaySKumar, @YosukeHiguchi, @YushiUeda, @brianyan918, @chintu619, @cuichenx, @dzeinali, @eesungkim, @ftshijt, @gdebayan, @kan-bayashi, @karthik19967829, @kashikashi, @ooyamatakehisa, @popcornell, @pyf98, @roshansh-cmu, @siddalmia, @siddhu001, @simpleoier, @sw005320, @wentaoxandry, @xinjli.
v.0.10.6(Feb 8, 2022)
New Features

[New Features][ESPnet2][TTS][Installation][**README**] [TTS] Support python-based toolkit for xvector extractors #4016 by @Fhrozen

[New Features][**ESPnet2**] Add SpecAug2 which supports variable maximum width in time masking #3902 by @pyf98

Recipe

[Recipe][ESPnet1][**ASR**] Add librispeech-100h recipe #3997 by @YosukeHiguchi

[Recipe][ESPnet1][**ASR**] Update egs/librispeech_100 #4036 by @YosukeHiguchi

[Recipe][ESPnet2][ASR][**README**] Scoring Mandarin / English separately for the SEAME corpus #3976 by @vectominist

[Recipe][ESPnet2][ASR][**README**] update LibriSpeech Pretrained models with SSLRs: results and huggingf… #3979 by @simpleoier

[Recipe][ESPnet2][ASR][README][**ST**] Speech translation framework (merging into master) #3987 by @ftshijt

[Recipe][ESPnet2][ASR][**TTS**] Update two recipes (googlei18n and hub4_spanish) #3895 by @ftshijt

[Recipe][ESPnet2][SLU][**README**] updated the results of Slue voxceleb #3929 by @siddhu001

[Recipe][ESPnet2][**ST**] Update the default setting for st #3993 by @ftshijt

Bugfix

[Bugfix][ESPnet1][**RNNT**] Fix bug for Conformer-T #4020 by @YosukeHiguchi

[Bugfix][ESPnet2][**Diarization**] Diarization: fix for convolutional input layer in the encoder #3957 by @alumae

[Bugfix][ESPnet2][**Diarization**] Two fixes to diarization evaluation scripts #3938 by @alumae

[Bugfix][ESPnet2][Diarization][**Recipe**] Fix issues in EEND-EDA & add Librimix_diar recipe #3900 by @YushiUeda

[Bugfix][ESPnet2][ESPnet1][ASR][**streaming**] streaming conformer bugfix #4025 by @jeon30c

[Bugfix][ESPnet2][**LM**] Bugfix for espnet2 ngram #4002 by @yaochie

[Bugfix][ESPnet2][**RNNT**] espnet2 asr inference bugfix for transducer #3943 by @jeon30c

[Bugfix][ESPnet2][**ST**] Bugfix for ST scoring #3972 by @ftshijt

Enhancement

[Enhancement][**ESPnet2**] cleaned tensorboard and stats logging for espnet2 #3910 by @siddalmia

[Enhancement][ESPnet2][**Diarization**] Add test codes for diarization #3953 by @YushiUeda

[Enhancement][ESPnet2][**streaming**] Add reference for streaming ASR #4014 by @D-Keqi

Others

[CI] remove the support of pytorch 1.3.1 #4038 by @sw005320

[CI][ESPnet1][**ESPnet2**] fix ci for librosa update #4043 by @ftshijt

[CI][**Installation**] Fix numpy version #3965 by @kan-bayashi

[CI][**Installation**] temporary fixed pypinyin version #3995 by @kan-bayashi

[Documentation][ESPnet1][ESPnet2][README][**SLU**] Add Sinhala E2E SLU Recipe #3890 by @karthik19967829

[Documentation][**README**] Update README.md #4039 by @sw005320

[ESPnet2][**README**] Update README.md #3931 by @sw005320

[ESPnet2][README][TTS][**Typo**] Fix typo in README.md #4024 by @kan-bayashi

Acknowledgements

Special thanks to @D-Keqi, @Fhrozen, @YosukeHiguchi, @YushiUeda, @alumae, @ftshijt, @jeon30c, @kan-bayashi, @karthik19967829, @pyf98, @siddalmia, @siddhu001, @simpleoier, @sw005320, @vectominist, @yaochie.

Full Changelog

https://github.com/espnet/espnet/compare/v.0.10.5...v.0.10.6
v.0.10.5(Dec 31, 2021)
New Features

[New Features][ESPnet1][**ASR**] Implement self-conditioned CTC #3856 by @komatta-san

[New Features][ESPnet2][ASR][CI][**Installation**] GTN CTC for ESPnet2 #3778 by @brianyan918

[New Features][ESPnet2][ASR][**Refactoring**] [ESPnet2] Transducer #2533 by @b-flo

[New Features][ESPnet2][README][**Recipe**] Frontends fusion (any type, any number, linear fusion only for now) for ASR in espnet2 #3824 by @DanBerrebbi

[New Features][ESPnet2][**SE**] Refactor loss computation in enhancement tasks. #3838 by @LiChenda

Recipe

[Recipe][ESPnet1][ESPnet2][ASR][**README**] updated the results of aidatatang_200zh #3925 by @sw005320

[Recipe][ESPnet1][**VC**] Various fixes of voice conversion recipes #3800 by @unilight

[Recipe][ESPnet2][ASR][**README**] Expanding egs2 of Tedlium2 #3795 by @D-Keqi

[Recipe][ESPnet2][ASR][**README**] Update an4 config #3913 by @pyf98

[Recipe][ESPnet2][ASR][**README**] aidatatang_200zh recipe #3892 by @sw005320

[Recipe][ESPnet2][**README**] Update README.md #3881 by @daisylab

[Recipe][ESPnet2][**README**] Update egs2/TEMPLATE/README.md #3793 by @kamo-naoyuki

[Recipe][ESPnet2][**README**] fix readme #3827 by @seastar105

[Recipe][ESPnet2][README][**Recipe**] Add ASR Recipe: Primewords_Chinese #3903 by @pyf98

[Recipe][ESPnet2][README][**Recipe**] Update MISP challenge ASR baseline and add AVSR baseline #3819 by @neillu23

[Recipe][ESPnet2][README][**SLU**] Fsc Maseeval scripts #3769 by @siddhu001

[Recipe][ESPnet2][README][**SLU**] Update Google Speechcommands (SLU recipe) #3915 by @pyf98

[Recipe][ESPnet2][README][**TTS**] ESPnet2 ARCTIC TTS #3791 by @peter-yh-wu

[Recipe][ESPnet2][README][**TTS**] Update README and add missing config #3917 by @kan-bayashi

[Recipe][ESPnet2][Recipe][**SLU**] Slue voxceleb Sentiment Analysis #3894 by @siddhu001

[Recipe][ESPnet2][**SE**] modified data type in enh.sh #3768 by @simpleoier

Bugfix

[Bugfix][ESPnet1][README][**RNNT**] Fix cache for Transducer search strategies + doc #3869 by @b-flo

[Bugfix][ESPnet1][**RNNT**] Fix recombine_hyps #3908 by @b-flo

[Bugfix][ESPnet1][**RNNT**] fix rnn-t ALSD beam search index bug #3794 by @maxwellzh

[Bugfix][ESPnet1][**RNNT**] fix the sort order in select_k_expansions() #3864 by @freewym

[Bugfix][**ESPnet2**] Bug fix for .gitignore and db fill up for CMU cluster #3891 by @siddalmia

[Bugfix][**ESPnet2**] Fix #3716 #3849 by @kan-bayashi

[Bugfix][**ESPnet2**] Merging asr_streaming.sh into asr.sh for laborotv egs2 #3868 by @D-Keqi

[Bugfix][**ESPnet2**] add `__init__.py` #3928 by @sw005320

[Bugfix][**ESPnet2**] fix small problem that used before defined in step 12 #3871 by @simpleoier

[Bugfix][**ESPnet2**] fix stft olens when win_lengths is not equal to n_fft #3812 by @IceCreamWW

[Bugfix][**ESPnet2**] update s3prl frontend w.r.t. recent modification in s3prl interface #3839 by @simpleoier

[Bugfix][ESPnet2][**TTS**] bugfix lang2lid in tts.sh #3906 by @imdanboy

[Bugfix][**Installation**] Fix #3783 #3786 by @kamo-naoyuki

Others

[CI] Fix G2P test failure in CI due to the dict update #3848 by @kan-bayashi

[CI][Documentation][ESPnet1][**ESPnet2**] Fixing issues about streaming Transformer/Conformer training #3880 by @D-Keqi

[CI][ESPnet1][ESPnet2][Installation][New Features][**README**] nbest rescoring with k2 #3567 by @glynpu

[Documentation][**README**] Update README.md #3893 by @sw005320

[Documentation][README][**SSL**] Add more docs about s3prl frontend #3796 by @simpleoier

[Documentation][README][**streaming**] Updating main README.md about streaming transformer #3855 by @D-Keqi

[ESPnet1][**RNNT**] Add exception for conformer decoder #3801 by @b-flo

[ESPnet2][README][**Typo**] Fix typo in README.md #3852 by @kan-bayashi

[ESPnet2][**SE**] add eps in beam-forming reference channel selection #3904 by @LiChenda

[ESPnet2][**SLU**] Add unit test for score_intent.py #3759 by @siddhu001

[ESPnet2][**ST**] Speech Translation Update #3860 by @ftshijt

[ESPnet2][TTS][Installation][**Refactoring**] Refactor Phonemizer-based G2P #3916 by @kan-bayashi

Acknowledgements

Special thanks to @D-Keqi, @DanBerrebbi, @IceCreamWW, @LiChenda, @b-flo, @brianyan918, @daisylab, @freewym, @ftshijt, @glynpu, @imdanboy, @kamo-naoyuki, @kan-bayashi, @komatta-san, @maxwellzh, @neillu23, @peter-yh-wu, @pyf98, @seastar105, @siddalmia, @siddhu001, @simpleoier, @sw005320, @unilight.
v.0.10.4(Nov 10, 2021)
New Features

[New Features][ESPnet1][ESPnet2][ASR][**README**] The code for Emiru's real streaming Transformer #3614 by @D-Keqi

[New Features][ESPnet1][MT][ST][**Installation**] Support sacreBLEU #3698 by @hirofumi0810

[New Features][ESPnet2][**ST**] ESPNet2 speech translation #3587 by @ftshijt

Enhancement

[Enhancement][ESPnet1][**ASR**] Fix e2e_asr_maskctc.py to make RTF computable #3634 by @eddiewng

[Enhancement][ESPnet2][Installation][**README**] HuggingFace Upload support for ESPnet2 tasks [cont.] #3677 by @Fhrozen

[Enhancement][ESPnet2][TTS][**Installation**] Add korean_jaso tokenizer and korean_cleaner #3588 by @windtoker

Bugfix

[Bugfix][ESPnet1][ASR][**RNNT**] Fix quantization for Transducer #3616 by @b-flo

[Bugfix][ESPnet2][ASR][**Recipe**] added download test set, small modifications for path of aishell #3663 by @teinhonglo

[Bugfix][**ESPnet2**] Do stft with librosa when neither MKL nor CUDA is available. #3668 by @CTinRay

[Bugfix][**ESPnet2**] [bug fixed] allow adding noise independently of rir, bug fixed in #3692 by @ranchlai

[Bugfix][ESPnet2][**Recipe**] Create Symlinks for 1-channel/2-channel tracks in chime4 #3699 by @neillu23

[Bugfix][ESPnet2][**Recipe**] Fix SWBD Data Prep Bug #3742 by @brianyan918

Recipe

[Recipe][ESPnet1][ASR][MT][**ST**] Add CoVoST2 recipe #3720 by @hirofumi0810

[Recipe][ESPnet2][ASR][**README**] MISP2021 E2E ASR Baseline #3738 by @neillu23

[Recipe][ESPnet2][ASR][**README**] Wenetspeech #3686 by @pengchengguo

[Recipe][ESPnet2][**SLU**] Add snips hubert feature training #3619 by @yuekaizhang

[Recipe][ESPnet2][**SLU**] Make scoring part more general #3715 by @siddhu001

[Recipe][ESPnet2][SLU][**README**] Add ESPnet-SLU Recipe: Google Speech Commands #3693 by @pyf98

[Recipe][ESPnet2][SLU][**README**] Add an ESPnet2 recipe for the Grabo SLU dataset #3669 by @pyf98

[Recipe][ESPnet2][SLU][**README**] CATSLU-MAPS: Added recipe #3685 by @SujaySKumar

[Recipe][ESPnet2][SLU][**README**] ESPnet2 Japanese dialogue act classification recipe #3667 by @YushiUeda

[Recipe][ESPnet2][SLU][**README**] Slurp SLU with bpe encoded transcripts #3674 by @siddhu001

[Recipe][ESPnet2][SLU][**README**] Slurp entity classification #3739 by @siddhu001

[Recipe][ESPnet2][**SSL**] Add eps in acc computation of HuBERT model #3713 by @simpleoier

[Recipe][ESPnet2][**TTS**] Change the timing of srctexts creation #3734 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] update kss recipe with VITS configuration #3660 by @windtoker

Others

[CI][ESPnet2][**Installation**] Fix tests in CI #3700 by @kan-bayashi

[CI][ESPnet2][SLU][**README**] Add Hubert pretrained ASR in FSC SLU #3653 by @siddhu001

[CI][**Installation**] Minor update for CI #3656 by @kan-bayashi

[Documentation][ESPnet1][README][RNNT][**Refactoring**] Refactor custom Transducer build #3697 by @b-flo

[Documentation][ESPnet2][**README**] Hugging Face support - Doc [cont.] #3709 by @Fhrozen

[Installation] Update pyopenjtalk version #3733 by @kan-bayashi

[README] Huggingface spaces ESPnet2-TTS web demo #3673 by @AK391

[README][**ESPnet2**] Add Huggingface model documentation #3714 by @siddhu001

[README][**ESPnet2**] Fix readme #3750 by @takenori-y

Acknowledgements

Special thanks to @AK391, @CTinRay, @D-Keqi, @Fhrozen, @SujaySKumar, @YushiUeda, @b-flo, @brianyan918, @eddiewng, @ftshijt, @hirofumi0810, @kan-bayashi, @neillu23, @pengchengguo, @pyf98, @ranchlai, @siddhu001, @simpleoier, @takenori-y, @teinhonglo, @windtoker, @yuekaizhang.
v.0.10.3(Oct 11, 2021)
New Features

[New Features][ESPnet1][RNNT][Installation][**README**] FastEmit support #3591 by @b-flo

[New Features][ESPnet2][**ASR**] Add ASR portable evaluation script #3569 by @kan-bayashi

[New Features][ESPnet2][**README**] EEND-EDA model for diarization task #3621 by @YushiUeda

Bugfix

[Bugfix][**ESPnet1**] Fix /usr/bin/env bash -e #3651 by @kamo-naoyuki

[Bugfix][**ESPnet1**] ctc loss using dropout layer since .eval() will not work for F.dropout #3539 by @zh794390558

[Bugfix][**ESPnet2**] Minor fix of evaluate_asr.sh #3596 by @kan-bayashi

[Bugfix][ESPnet2][**ASR**] wav2vec2_encoder bug fix #3545 by @simpleoier

[Bugfix][ESPnet2][README][**SSL**] Fix some issues of #3512 and add README.md to librispeech/ssl1 recipe. #3572 by @Jzmo

[Bugfix][ESPnet2][**TTS**] Bug fix the attribute registration in VITS generator #3573 by @kan-bayashi

[Bugfix][ESPnet2][**TTS**] Fix pyopenjtalk_g2p_accent(_with_pause) #3555 by @zzxiang

Recipe

[Recipe][ESPnet1][ASR][**RNNT**] Update Transducer recipes #3465 by @b-flo

[Recipe][ESPnet1][**ST**] Clean libri-trans #3540 by @hirofumi0810

[Recipe][ESPnet2][ASR][**README**] Dan aishell4 branch #3585 by @DanBerrebbi

[Recipe][ESPnet2][ASR][**README**] update pretrained models of librispeech using hubert/wav2vec2 #3568 by @simpleoier

[Recipe][ESPnet2][SLU][**README**] Add slu snips data recipe #3407 by @yuekaizhang

[Recipe][ESPnet2][**TTS**] Update GAN-TTS based configurations #3570 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Add initial VITS results for JSUT #3550 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Add つくよみちゃんコーパス recipe #3552 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] IndicSpeech TTS Scripts #3435 by @peter-yh-wu

[Recipe][ESPnet2][TTS][**README**] Update ESPnet2-TTS results #3578 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Update JSUT and JVS results #3553 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Update LJSpeech and CSMSC results #3560 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Update TTS results #3615 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Update TTS results #3648 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Update VCTK results #3581 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Update pre-trained model for TTS recipes #3590 by @ftshijt

[Recipe][ESPnet2][TTS][**README**] update kss recipe with new result. #3589 by @windtoker

[Recipe][ESPnet2][TTS][**Typo**] Fix typo egs2/jtubespeech/tts1 #3564 by @kan-bayashi

[Recipe][ESPnet2][TTS][**Typo**] Update JVS README #3554 by @kan-bayashi

Enhancement

[Enhancement][ESPnet2][SE][**Refactoring**] Add PyTorch Builtin Complex Support in the Speech Enhancement Task #3355 by @Emrys365

[Enhancement][ESPnet2][**TTS**] Hindi g2p #3579 by @peter-yh-wu

[Enhancement][ESPnet2][**TTS**] Unify spks / lids / spk_embed_dim type #3551 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Update evaluate_mcd.py script #3566 by @kan-bayashi

[Enhancement][ESPnet2][TTS][**Installation**] Add the installer of tdmelodic pyopenjtalk #3561 by @kan-bayashi

[Enhancement][ESPnet2][TTS][Installation][**README**] Update TTS objective eval scripts #3650 by @kan-bayashi

[Enhancement][ESPnet2][TTS][**README**] Add a new Japanese G2P for TTS #3558 by @kan-bayashi

[Enhancement][ESPnet2][TTS][**README**] Add a new english G2P #3597 by @kan-bayashi

Others

[CI] Add codecov config and flags. #3603 by @ShigekiKarita

[CI] Omit tools/ from code coverage. #3600 by @ShigekiKarita

[CI] Split test_integration.sh #3599 by @ShigekiKarita

[CI][ESPnet2][Installation][**Refactoring**] Make the installation of transformers optional #3622 by @kan-bayashi

[CI][**Installation**] Add no-check-certificate option in PESQ installation #3649 by @kan-bayashi

[CI][Installation][README][**mergify**] Change setup.py for pytorch1.9.1 #3636 by @kamo-naoyuki

[Documentation][ESPnet1][**RNNT**] Fix/improve doc(string)s related to Transducer model #3623 by @b-flo

[Documentation][ESPnet2][TTS][**README**] Update README of ESPnet2-TTS #3546 by @kan-bayashi

[Documentation][ESPnet2][TTS][**README**] Update TTS README #3565 by @kan-bayashi

[Documentation][ESPnet2][TTS][**README**] Update TTS fine-tuning README #3549 by @kan-bayashi

[Typo][**ESPnet2**] Minor bug in format_wav_scp.py #3575 by @ftshijt

[Typo][ESPnet2][**TTS**] update mismatch help info for tts #3602 by @ftshijt

Acknowledgements

Special thanks to @DanBerrebbi, @Emrys365, @Jzmo, @ShigekiKarita, @YushiUeda, @b-flo, @ftshijt, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @peter-yh-wu, @simpleoier, @windtoker, @yuekaizhang, @zh794390558, @zzxiang.
v.0.10.2(Sep 1, 2021)
News

Hubert training is now available!

Try with egs2/librispeech/ssl1

GAN-based TTS model is now available!

Joint text2mel and vocoder training

End-to-end text-to-wave model (VITS) training

Try with egs2/ljspeech/tts1

Support from_pretrained function!
    # e.g.
    from espnet2.bin.asr_inference import Speech2Text
    asr = Speech2Text.from_pretrained("model_tag")
    from espnet2.bin.tts_inference import Text2Speech
    tts = Text2Speech.from_pretrained("model_tag")
    from espnet2.bin.enh_inference import SeparateSpeech
    enh = SeparateSpeech.from_pretrained("model_tag")
    from espnet2.bin.diar_inference import DiarizeSpeech
    diar = DiarizeSpeech.from_pretrained("model_tag")

Please check the available pretrained models in espnet_model_zoo!
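
The four `from_pretrained` entry points listed above share the same call shape, so they can be wrapped in one small dispatcher. The sketch below is illustrative only: the helper `load_pretrained` and the table `PRETRAINED_ENTRYPOINTS` are hypothetical names, while the module paths and class names are copied from the release note. Actually loading a model presumably requires `espnet` (and, for model tags, `espnet_model_zoo`) to be installed, so the import is deferred until the helper is called.

```python
import importlib

# Task name -> (module path, class name), copied from the release note above.
# The short task keys ("asr", "tts", ...) are an assumption made for this sketch.
PRETRAINED_ENTRYPOINTS = {
    "asr": ("espnet2.bin.asr_inference", "Speech2Text"),
    "tts": ("espnet2.bin.tts_inference", "Text2Speech"),
    "enh": ("espnet2.bin.enh_inference", "SeparateSpeech"),
    "diar": ("espnet2.bin.diar_inference", "DiarizeSpeech"),
}


def load_pretrained(task: str, model_tag: str):
    """Load a pretrained ESPnet2 inference object for the given task.

    Hypothetical convenience wrapper; it only forwards to the
    from_pretrained class method named in the release note.
    """
    if task not in PRETRAINED_ENTRYPOINTS:
        known = ", ".join(sorted(PRETRAINED_ENTRYPOINTS))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}")

    module_name, class_name = PRETRAINED_ENTRYPOINTS[task]
    # Deferred import: fails with ImportError if espnet is not installed.
    module = importlib.import_module(module_name)
    return getattr(module, class_name).from_pretrained(model_tag)
```

With ESPnet installed, a call such as `load_pretrained("asr", "model_tag")` would be equivalent to the first pair of lines in the release note, with `"model_tag"` replaced by a real tag from espnet_model_zoo.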

New Features

[New Features][**ESPnet1**] Intermediate CTC + Stochastic depth #3274 by @jaesong

[New Features][**ESPnet2**] Add new trainer for GAN-based training #3436 by @kan-bayashi

[New Features][ESPnet2][**ASR**] Add Hubert model in Espnet2/Refactor from #3458 #3512 by @Jzmo

[New Features][ESPnet2][**ASR**] batch decode with k2 ctc #3433 by @glynpu

[New Features][ESPnet2][ASR][**SE**] Support from_pretrained for ASR and ENH #3535 by @kan-bayashi

[New Features][ESPnet2][**DIAR**] Support from_pretrained for DIAR #3537 by @YushiUeda

[New Features][ESPnet2][**SE**] Adding portable speech enhancement scripts for other tasks #3487 by @Emrys365

[New Features][ESPnet2][**TTS**] Add GAN-TTS task with VITS #3449 by @kan-bayashi

[New Features][ESPnet2][**TTS**] Support SID and LID inputs for TTS models #3490 by @kan-bayashi

[New Features][ESPnet2][**TTS**] Support from_pretrained function in Text2Speech #3532 by @kan-bayashi

[New Features][ESPnet2][**TTS**] Support parallel_wavegan vocoders in tts_inference.py #3513 by @kan-bayashi

[New Features][ESPnet2][**TTS**] Support joint training of text2mel and vocoder #3501 by @kan-bayashi

[New Features][ESPnet2][**TTS**] Support language ID input for espnet2 TTS #3489 by @kan-bayashi

[New Features][ESPnet2][**TTS**] Support speaker id input for TTS models #3452 by @kan-bayashi

Enhancement

[Enhancement][ESPnet2][CTC segmentation][**README**] Fix CTC Segmentation #3500 by @shirayu

[Enhancement][ESPnet2][**TTS**] Add VITS-related modules #3448 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Add cython code for VITS #3483 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Add joint training config example #3508 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Add melgan module for joint training #3516 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Add parallel wavegan module for joint training #3515 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Add style melgan module for joint training #3517 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Add vocoder modules related to VITS #3439 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Change Text2Speech class output format #3437 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Follow up of the support speaker id input #3453 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Support cleaner option in phn converter util #3450 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Support language id in VITS #3499 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Support linear spectrogram #3438 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Support new g2p functions for various languages #3463 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Update the TTS inference #3498 by @kan-bayashi

[Enhancement][ESPnet2][SLU][**README**] Add support for intent classification on SLURP dataset #3482 by @siddhu001

[Enhancement][ESPnet2][SLU][**README**] Add NLU post-encoder using Hugging Face Transformers #3410 by @akreal

Recipe

[Recipe][ESPnet1][**ASR**] Mucs21 subtask1 #3376 by @sanket0211

[Recipe][ESPnet2][ASR][**README**] Add Swahili ASR recipe #3485 by @akreal

[Recipe][ESPnet2][ASR][**README**] Rename swahili recipe to iwslt21_low_resource #3522 by @akreal

[Recipe][ESPnet2][DIAR][**README**] Modify ESPnet2 diarization recipe #3524 by @YushiUeda

[Recipe][ESPnet2][ESPnet1][**ASR**] Espnet2 mucs_subtask2 #3415 by @bloodraven66

[Recipe][ESPnet2][ESPnet1][**ASR**] mucs subtask1 #3417 by @bloodraven66

[Recipe][ESPnet2][**SE**] Add Voicebank (vctk_noisy) script #3486 by @neillu23

[Recipe][ESPnet2][**TTS**] Add missing configs for LibriTTS recipe #3455 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Update VITS config comments and settings #3528 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] aishell3 dataset preparation #3505 by @actboy

[Recipe][ESPnet2][TTS][**README**] Add CSS10 recipe for ESPnet2-TTS #3464 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Add JtubeSpeech Recipe #3459 by @Takaaki-Saeki

[Recipe][ESPnet2][TTS][**README**] Add SIWIS recipe #3460 by @takenori-y

[Recipe][ESPnet2][TTS][**README**] TTS recipe for J-KAC corpus #3468 by @TanUkkii007

[Recipe][ESPnet2][TTS][**README**] TTS recipes for thchs30 and aishell3 #3470 by @ftshijt

[Recipe][ESPnet2][TTS][**README**] Update JMD README #3531 by @takenori-y

[Recipe][ESPnet2][TTS][**README**] Update SIWIS README #3509 by @takenori-y

[Recipe][ESPnet2][SLU][**README**] Predict ASR transcript along with Intent for SLU #3480 by @siddhu001

[Recipe][ESPnet2][SLU][**README**] Update SWBD DA configuration #3425 by @akreal

Bugfix

[Bugfix][**ESPnet2**] Add return_complex=False for stft #3476 by @D-X-Y

[Bugfix][**ESPnet2**] Dynamic import for the ngram function #3420 by @ftshijt

[Bugfix][ESPnet2][README][**Recipe**] Add the GigaSpeech normalization and fix the WER #3519 by @chaisz19

[Bugfix][ESPnet2][**TTS**] Add duration and focus_rate in output dict #3469 by @kan-bayashi

[Bugfix][ESPnet2][**TTS**] Add missing symlink to trim_silence.py for ESPnet2 #3467 by @kan-bayashi

[Bugfix][ESPnet2][**TTS**] Fix wrong arguments in pretrained vocoder wrapper #3525 by @kan-bayashi

[Bugfix][ESPnet2][**TTS**] Revert wrongly removed lines in tts.sh #3503 by @kan-bayashi

[Bugfix][ESPnet2][TTS][**Typo**] Fix typo in hifigan #3504 by @kan-bayashi

Refactoring

[Refactoring][ESPnet1][ASR][RNNT][**README**] Transducer v5 #3217 by @b-flo

[Refactoring][ESPnet2][SE][**DIAR**] Remove prefix enh_ and diar_ #3538 by @kan-bayashi

[Refactoring][ESPnet2][**TTS**] Refactor TTS modules in ESPnet2 #3497 by @kan-bayashi

[Refactoring][ESPnet2][**TTS**] Remove the support of feats_type=fbank/stft in ESPnet2-TTS #3514 by @kan-bayashi

Others

[CI] Fix k2 version in CI using conda #3493 by @kan-bayashi

[CI] Fix test condition #3527 by @kan-bayashi

[CI][**Installation**] Update Sentencepiece and add python 3.9 to CI #3422 by @shirayu

[Docker] Docker Updates #3393 by @Fhrozen

[Documentation] Update the tutorial about maxlenratio usage #3523 by @akreal

[Documentation][ESPnet2][**TTS**] Update README.md #3502 by @kan-bayashi

[Installation][**README**] Added a link and a classifier for Python 3.9 #3440 by @shirayu

[Typo] Fix typos in "egs" #3447 by @shirayu

[Typo][**Documentation**] Fix typos in "doc" #3441 by @shirayu

[Typo][**Documentation**] Fix typos in "utils" #3442 by @shirayu

[Typo][ESPnet1][**MT**] Fix typos in "espnet" #3444 by @shirayu

[Typo][**ESPnet2**] Fix typos in "espnet2" #3443 by @shirayu

[Typo][ESPnet2][**README**] Fix typos in "egs2" #3445 by @shirayu

Acknowledgements

Special thanks to @D-X-Y, @Emrys365, @Fhrozen, @Jzmo, @Takaaki-Saeki, @TanUkkii007, @YushiUeda, @actboy, @akreal, @b-flo, @bloodraven66, @chaisz19, @ftshijt, @glynpu, @jaesong, @kan-bayashi, @neillu23, @sanket0211, @shirayu, @siddhu001, @takenori-y.
v.0.10.1 (Aug 12, 2021)
New Features

[New Features][**ESPnet2**] Porting existing pre-trained models to hugging face #3321 by @siddhu001

[New Features][ESPnet2][ASR][CI][**Installation**] k2_and_espnet2 #3358 by @glynpu

[New Features][ESPnet2][ASR][LM][**CI**] espnet2 ngram #3345 by @qmpzzpmq

[New Features][ESPnet2][**Installation**] add s3prl frontend #3187 by @simpleoier

Recipe

[Recipe][ESPnet1][**ASR**] Fix the iconv error in hkust data prep #3397 by @sw005320

[Recipe][ESPnet1][**ASR**] mucs subtask2 baseline recipes (e2e and kaldi) #3362 by @bloodraven66

[Recipe][ESPnet1][ESPnet2][**ASR**] JTubeSpeech recipe and hkust espnet1 #3406 by @sw005320

[Recipe][ESPnet1][**TTS**] CMU INDIC TTS #3347 by @peter-yh-wu

[Recipe][ESPnet2][**ASR**] ESPnet2 Recipe for Ksponspeech #3387 by @YushiUeda

[Recipe][ESPnet2][**ASR**] Fix gigaspeech pre-trained model link #3317 by @sw005320

[Recipe][ESPnet2][**ASR**] LRS2 lipreading recipe #3346 by @LiChenda

[Recipe][ESPnet2][**ASR**] OpenSLR Sundanese ASR #3344 by @peter-yh-wu

[Recipe][ESPnet2][**ASR**] Recipe of JTubeSpeech #3311 by @sw005320

[Recipe][ESPnet2][**ASR**] fix path error in local/score.sh in swbd #3349 by @wonkyuml

[Recipe][ESPnet2][**ASR**] updated javanese and sundanese readmes #3369 by @peter-yh-wu

[Recipe][ESPnet2][ASR][**Installation**] OpenSLR Javanese ASR #2960 by @peter-yh-wu

[Recipe][ESPnet2][**SLU**] Add initial Switchboard Dialogue Act classification recipe #3395 by @akreal

[Recipe][ESPnet2][**SLU**] FSC Espnet2 data preparation #3352 by @siddhu001

[Recipe][ESPnet2][**TTS**] Add HUI-audio-corpus-german recipe for ESPnet2-TTS #3375 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Add JMD recipe #3394 by @takenori-y

[Recipe][ESPnet2][**TTS**] Add RUSLAN recipe for ESPnet2-TTS #3378 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Support KSS dataset recipe for ESPnet2-TTS #3383 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Update HUI audio corpus german recipe #3381 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Update HUI-audio-corpus-german recipe results of ESPnet2-TTS #3391 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Update KSS dataset recipe results of ESPnet2-TTS #3400 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Update RUSLAN recipe results of ESPnet2-TTS #3390 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] indic tts without pretrained model #3401 by @peter-yh-wu

Enhancement

[Enhancement][**ESPnet2**] Update wav2vec2_encoder.py #3312 by @brotheroak

[Enhancement][ESPnet2][**TTS**] Add trim_silence for ESPnet2-TTS #3380 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Allow override default 'speed_control_alpha' parameter #3316 by @airenas

[Enhancement][ESPnet2][**TTS**] Support French G2P #3372 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Support German G2P #3371 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Support Korean G2P #3382 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Support Russian G2P #3377 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Support Spanish G2P #3373 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Update README about G2P #3374 by @kan-bayashi

Bugfix

[Bugfix][ESPnet1][**ESPnet2**] Fix a type error of swbd data preparation. #3324 by @pengchengguo

[Bugfix][ESPnet1][ESPnet2][**TTS**] Fixed label modification in Taco2 or Transformer-TTS with R > 1 #3392 by @kan-bayashi

[Bugfix][**ESPnet2**] fix a bug in OneCycleLR and CyclicLR #3319 by @sw005320

Others

[Typo][**ESPnet1**] Update batch_beam_search_online_sim.py #3367 by @aky15

[Typo][**ESPnet2**] Fixed typo in model name #3364 by @kan-bayashi

[Typo][**ESPnet2**] Update contextual_block_transformer_encoder.py #3354 by @aky15

Acknowledgements

Special thanks to @LiChenda, @YushiUeda, @airenas, @akreal, @aky15, @bloodraven66, @brotheroak, @glynpu, @kan-bayashi, @pengchengguo, @peter-yh-wu, @qmpzzpmq, @siddhu001, @simpleoier, @sw005320, @takenori-y, @wonkyuml.
v.0.10.0 (Jun 22, 2021)
From v.0.10.x, we drop support for PyTorch < 1.3.
See https://github.com/espnet/espnet/issues/3300 for more information.
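
As a rough illustration of the version requirement above, the following sketch checks whether a PyTorch version string meets the 1.3 minimum. It is not part of ESPnet: the names `MIN_TORCH`, `parse_version`, and `is_supported` are hypothetical, and the parsing only looks at the leading major.minor pair.

```python
import re

# Assumption: the minimum comes from the release note above (pytorch < 1.3 is dropped).
MIN_TORCH = (1, 3)


def parse_version(version: str) -> tuple:
    """Return the leading (major, minor) pair of a version string such as '1.9.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version: str) -> bool:
    """Return True if the given PyTorch version string is at least MIN_TORCH."""
    return parse_version(version) >= MIN_TORCH
```

In an environment with PyTorch installed, `is_supported(torch.__version__)` would report whether the installed version falls inside the supported range.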

New Features and Enhancement

[New Features][ESPnet1][ASR][**CI**] Dynamic quantization for decoding #3210 by @xu-gaopeng

[New Features][**ESPnet1**] Add quantize args #3280 by @xu-gaopeng

[Enhancement][ESPnet2][**README**] Update W&B integration #3278 by @AyushExel

[Enhancement][ESPnet2][**README**] Change the default value of use_wandb to False #3287 by @kamo-naoyuki

Bugfix

[Bugfix][**ESPnet1**] Fix some bugs in xml2stm.py #3252 by @AshrafMahdhi

[Bugfix][ESPnet1][**Recipe**] fix the required number of arguments #3249 by @AshrafMahdhi

[Bugfix][**ESPnet2**] Bug fix of accum_grad when grad-nan #3283 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fix #3255 #3257 by @tjysdsg

[Bugfix][**ESPnet2**] Fix bug when "--field -5" is passed to espnet2.bin.tokenize_text #3262 by @tjysdsg

[Bugfix][**ESPnet2**] Fix typo in asr.sh (espnet2) that might cause bug #3264 by @tjysdsg

[Bugfix][**ESPnet2**] Warn ignore_nan_grad with warpctc instead of error. #3298 by @ShigekiKarita

[Bugfix][ESPnet2][**TTS**] Fix a bug in the TTS transformer initialization #3251 by @sw005320

Recipe

[Recipe][ESPnet1][**ST**] Minor fix of Fisher-Callhome recipe #3305 by @hirofumi0810

[Recipe][ESPnet2][**ASR**] ESPnet2 Recipe for swbd #3269 by @yuekaizhang

[Recipe][ESPnet2][ASR][**README**] SWBD Result Update #3308 by @roshansh-cmu

[Recipe][ESPnet2][**SE**] Add scripts for DNS Interspeech 2020 in ESPNet-se #3259 by @neillu23

[Recipe][ESPnet2][SE][**README**] Pretrained model for vctk noisy reverberant recipe #3273 by @LiChenda

[Recipe][ESPnet2][SE][**README**] dns_ins20: Add README.md and real_recording testing data. #3281 by @neillu23

Refactoring

[Refactoring][ESPnet2][**ASR**] Update ctc.py #3292 by @200987299

[Refactoring][ESPnet1][ASR][MT][CI][**README**] Delete old pytorch dispatch in espnet1 #3301 by @ShigekiKarita

[Refactoring][CI][Documentation][Installation][**README**] Remove travis and add .github/workflows/doc.yml to deploy doc #3294 by @ShigekiKarita

[Refactoring][CI][Installation][**README**] Add pytorch 1.9.0 support and remove 1.0.1, 1.1.0, and 1.2.0 #3299 by @ShigekiKarita

Others

[Documentation][**ESPnet2**] Add a comment for disabling the attention plot #3258 by @sw005320

[ESPnet2][Installation][**mergify**] Follow up for #3299, about pytorch1.9.0 in ci #3310 by @kamo-naoyuki

Acknowledgements

Special thanks to @200987299, @AshrafMahdhi, @AyushExel, @LiChenda, @ShigekiKarita, @hirofumi0810, @kamo-naoyuki, @neillu23, @roshansh-cmu, @sw005320, @tjysdsg, @xu-gaopeng, @yuekaizhang.
v.0.9.10 (May 29, 2021)
New Features

[New Features][ESPnet1][ESPnet2][Installation][**README**] CTC Segmentation for ESPnet 2 #3087 by @lumaku

Bugfix

[Bugfix][**ESPnet1**] Fix merge_short_segments.py #3171 by @hirofumi0810

[Bugfix][**ESPnet1**] update layer norm to reflect the dimension variable #3193 by @sw005320

[Bugfix][ESPnet1][**ASR**] Fix a bug about variable spelling errors #3208 by @lzm0706

[Bugfix][ESPnet1][**ST**] Fix ST-TED data preparation #3167 by @hirofumi0810

[Bugfix][**ESPnet2**] Fix a bug of adding noise to the training data. #3220 by @pengchengguo

[Bugfix][**ESPnet2**] fix a bug in the CTC mode #3190 by @sw005320

[Bugfix][**ESPnet2**] fix typo for AdapterForSoundScpReader #3096 by @deciding

[Bugfix][**ESPnet2**] remove find_unused_parameters from DataParallel #3149 by @kamo-naoyuki

[Bugfix][ESPnet2][**ASR**] Changed to include nlsyms.txt in the pretrained model #3236 by @kamo-naoyuki

[Bugfix][ESPnet2][**ASR**] Fix missing nlsyms.txt for pretrained models #3234 by @lumaku

[Bugfix][ESPnet2][**ASR**] Workaround for missing nlsyms.txt #3235 by @kamo-naoyuki

[Bugfix][ESPnet1][ASR][**Installation**] GTN CTC bug fix, unit test, and installer #3199 by @brianyan918

[Bugfix][ESPnet2][**README**] Update README.md, edit wrong file link. #3164 by @xxjjvxb

Enhancement

[Enhancement] Added "trans_type" to utils/remove_longshortdata.sh and utils/update_json.sh #3148 by @teinhonglo

[Enhancement][ESPnet2][SE][**README**] Update the readme file for the SE demo page. #3225 by @LiChenda

[Enhancement][ESPnet2][ASR][**README**] update asr demo #3192 by @ftshijt

Recipe

[Recipe][ESPnet1][**ASR**] Fix segmentation in IWSLT21 ASR #3169 by @hirofumi0810

[Recipe][ESPnet1][**ASR**] Fix tokenization on TEDLIUM2 in IWSLT21 ASR recipe #3142 by @hirofumi0810

[Recipe][ESPnet1][**ASR**] fix add_to_datadir.py in mgb2 recipe #3238 by @AshrafMahdhi

[Recipe][ESPnet1][**ASR**] fix recipe bug for swbd #3174 by @yuekaizhang

[Recipe][ESPnet1][ASR][**RNNT**] Transducer configs & results for AISHELL-1 #3240 by @yusshino

[Recipe][ESPnet1][ASR][**ST**] Fix IWSLT21 recipe for test set evaluation #3155 by @hirofumi0810

[Recipe][ESPnet1][ESPnet2][**README**] endangered language recognition espnet2 recipe #3214 by @ftshijt

[Recipe][ESPnet1][**MT**] Add IWSLT21 MT recipe #3140 by @hirofumi0810

[Recipe][ESPnet1][**ST**] Add IWSLT21 ST recipe #3150 by @hirofumi0810

[Recipe][ESPnet1][**ST**] Fix IWSLT evaluation data preparation #3168 by @hirofumi0810

[Recipe][ESPnet1][**ST**] IWSLT21 punctuation restoration recipe #3145 by @hirofumi0810

[Recipe][ESPnet1][**ST**] Merge short segments in IWSLT test sets #3162 by @hirofumi0810

[Recipe][ESPnet1][**TTS**] Fix misspelling in ./egs/jsut/tts1/local/download.sh #3227 by @muramasa2

[Recipe][ESPnet2][**ASR**] Normalization for Open_li52 #3215 by @ftshijt

[Recipe][ESPnet2][**SE**] ESPnet-SE Recipe for noisy reverberant dataset #3243 by @LiChenda

[Recipe][ESPnet2][SE][**README**] Update recipes for speech enhancement task #3153 by @LiChenda

Acknowledgements

Special thanks to @AshrafMahdhi, @LiChenda, @brianyan918, @deciding, @ftshijt, @hirofumi0810, @kamo-naoyuki, @lumaku, @lzm0706, @muramasa2, @pengchengguo, @sw005320, @teinhonglo, @xxjjvxb, @yuekaizhang, @yusshino.
v.0.9.9 (Apr 7, 2021)
New Features

[New Features][**ESPnet2**] Speaker diarization implementation in ESPnet #2939 by @ftshijt

[New Features][**ESPnet2**] Adding gpu_max_cached_mem_GB in reporter's stats #3057 by @kamo-naoyuki

[New Features][**ESPnet2**] add --detect_anomaly option #3035 by @kamo-naoyuki

[New Features][ESPnet2][**SE**] Further update to speech enhancement task #2929 by @shincling

Bugfix

[Bugfix][**ESPnet1**] Fix a typo in the aishell config #3089 by @sw005320

[Bugfix][**ESPnet1**] Fix utils/speed_perturb.sh #3062 by @hirofumi0810

[Bugfix][**ESPnet1**] fix #3017 #3022 by @kamo-naoyuki

[Bugfix][ESPnet1][**RNNT**] Fix+update RNN encoder #3048 by @b-flo

[Bugfix][ESPnet1][**RNNT**] Minor fix for NSC #3030 by @b-flo

[Bugfix][**ESPnet2**] Fix #3072 #3073 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fix ESPnet2-TTS conformer backward compatibility #3108 by @kan-bayashi

[Bugfix][**ESPnet2**] Fix a bug when use_amp=True without fairscale #3029 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fix logging for pytorch>=1.8 #3056 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fixed backward compatibility issue of new conformer definition #3068 by @hfujihara

[Bugfix][**Installation**] Fix a bug of uninstalling typing #3058 by @kamo-naoyuki

[Bugfix][**Installation**] Fix setup.py to install filelock #3074 by @kamo-naoyuki

[Bugfix][**Installation**] fix the condition to install fairscale #3050 by @kamo-naoyuki

[Bugfix][Recipe][**ESPnet1**] Typo fixed for nahuatl recipe #3044 by @ftshijt

[Bugfix][Recipe][ESPnet1][**ASR**] Bugfix for download_and_untar for nahuatl #3049 by @ftshijt

[Bugfix][Recipe][ESPnet1][ESPnet2][**TTS**] Fix CSMSC download script #3109 by @kan-bayashi

[Bugfix][Recipe][ESPnet2][TTS][**README**] fixed typo #3121 #3123 by @kan-bayashi

Enhancement

[Enhancement][ASR][ESPnet1][**RNNT**] Update loss report #3110 by @b-flo

[Enhancement][ESPnet1][**RNNT**] Fix related to custom encoder and aux task #3045 by @b-flo

[Enhancement][ESPnet2][Documentation][Installation][**README**] modification of freezing option for Wav2Vec encoder, add documents #3036 by @simpleoier

Recipe

[Recipe][ESPnet1][**ASR**] added results and uploaded models #3063 by @sw005320

[Recipe][ESPnet1][ASR][**ST**] fix download for puebla-nahuatl #3039 by @ftshijt

[Recipe][ESPnet1][**MT**] Update IWSLT18 MT recipe #3071 by @hirofumi0810

[Recipe][ESPnet1][**ST**] IWSLT21-low-resource recipe #3023 by @ftshijt

[Recipe][ESPnet1][**ST**] Nahuatl Speech Translation #3034 by @ftshijt

[Recipe][ESPnet2][ASR][**README**] Added spgispeech recipe in espnet2 #2986 by @sw005320

[Recipe][ESPnet2][ASR][**README**] Update librispeech result #3082 by @kamo-naoyuki

[Recipe][ESPnet2][ASR][**README**] Updated ami ihm result #3091 by @kamo-naoyuki

[Recipe][ESPnet2][ASR][**README**] added a bpe10000 model and result #3060 by @sw005320

[Recipe][ESPnet2][ASR][**README**] gigaspeech #3077 by @sw005320

Refactoring

[Refactoring][**ESPnet1**] Refactor layer selection in Transformer #3024 by @hirofumi0810

[Refactoring][ESPnet1][MT][**ST**] Unify divide_lang.sh #3066 by @hirofumi0810

[Refactoring][**ESPnet2**] Make batch bins sampler faster #3106 by @kamo-naoyuki

[Refactoring][**Installation**] Use new pyopenjtalk version #3107 by @kan-bayashi

[Refactoring][ESPnet1][ESPnet2][Installation][Docker][**Documentation**] Change '#!/bin/bash' to '#!/usr/bin/env bash' #3059 by @kamo-naoyuki

Other

[CI][Installation][README][**mergify**] Using torch=1.8.1 in ci tests #3122 by @kamo-naoyuki

[CI][Installation][README][**mergify**] Adding pytorch=1.8.0 to the ci #3046 by @kamo-naoyuki

Acknowledgements

Special thanks to @b-flo, @ftshijt, @hfujihara, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @shincling, @simpleoier, @sw005320.
v.0.9.8 (Mar 1, 2021)
New Features

[New Features][ESPnet1][ASR][**RNNT**] Auxiliary task #2951 by @b-flo

[New Features][ESPnet1][**Recipe**] RTF calculation #2942 by @hirofumi0810

[New Features][**ESPnet2**] Supporting multiple optimizers in the default trainer #3014 by @kamo-naoyuki

[New Features][ESPnet2][**ASR**] Streaming Transformer ASR #2907 by @eml914

[New Features][ESPnet2][ASR][**Installation**] add wav2vec_encoder #2889 by @simpleoier

[New Features][ESPnet2][Documentation][Installation][**README**] Support sharded training of fairscale #2980 by @kamo-naoyuki

[New Features][ESPnet2][**SE**] Add SeparateSpeech API in espnet2/bin/enh_inference.py #2878 by @Emrys365

[New Features][ESPnet2][TTS][Installation][**README**] Support phonemizer for various language G2P #2959 by @kan-bayashi

Bugfix

[Bugfix][CI][**Installation**] Install warp-ctc using pip>=21.0 #2999 by @ysk24ok

[Bugfix][**ESPnet1**] Integration testing for asr_mix was using the wrong config. #3006 by @siddalmia

[Bugfix][ESPnet1][**ASR**] Fix model averaging #2910 by @b-flo

[Bugfix][ESPnet1][**ASR**] bug fixed for streaming transformer ASR #2981 by @eml914

[Bugfix][ESPnet1][**ASR**] builtin ctc modification #3001 by @siddalmia

[Bugfix][ESPnet1][ASR][**CI**] Fix transfer learning w/ pre-trained LM + finetuning tutorial #2967 by @b-flo

[Bugfix][ESPnet1][ASR][**RNNT**] Fix a condition in TSD #2965 by @b-flo

[Bugfix][ESPnet1][ASR][**Recipe**] fix egs/ljspeech/asr1 #2865 #2884 by @kan-bayashi

[Bugfix][ESPnet1][ASR][Recipe][**ST**] Fix bug in How2 recipe #2933 by @hirofumi0810

[Bugfix][ESPnet1][ASR][**Refactoring**] Fix data sorting in attention/CTC visualization #2883 by @hirofumi0810

[Bugfix][ESPnet1][**Docker**] Fix docker error caused by BeamSearchTransducer #2973 by @b-flo

[Bugfix][ESPnet1][**ESPnet2**] Fix bugs of our Conformer implementation. #2816 by @pengchengguo

[Bugfix][ESPnet1][ESPnet2][**Refactoring**] Fix arguments in dynamic and lightweight conv #3004 by @hirofumi0810

[Bugfix][ESPnet1][**RNNT**] fix out_dim definition #2915 by @b-flo

[Bugfix][ESPnet1][**TTS**] Fix attention plot bug #2984 #2985 by @kan-bayashi

[Bugfix][ESPnet1][**mergify**] swbd run.sh is including dev data in the training set #2977 by @brianyan918

[Bugfix][**ESPnet2**] Fix sharded_ddp mode #3015 by @kamo-naoyuki

[Bugfix][**ESPnet2**] bug fix for Wav2Vec encoder #2997 by @simpleoier

[Bugfix][ESPnet2][**Documentation**] Fix for sharded training with amp #2993 by @kamo-naoyuki

[Bugfix][ESPnet2][**Documentation**] Fix sharded training for multiple nodes #2994 by @kamo-naoyuki

[Bugfix][ESPnet2][**SE**] quick fix for librimix (SE) data preparation #2982 by @LiChenda

Recipe

[Recipe][ESPnet1][**ASR**] Fix dev set in IWSLT21 ASR recipe #3000 by @hirofumi0810

[Recipe][ESPnet1][**ASR**] IWSLT'21 ASR recipe #2934 by @hirofumi0810

[Recipe][ESPnet1][**ASR**] Update IWSLT21 ASR recipe #2987 by @hirofumi0810

[Recipe][ESPnet1][**ASR**] Update the pre-trained Conformer model link of Aishell-1 corpus. #2924 by @pengchengguo

[Recipe][ESPnet1][**ASR**] Update transformer training results on Common Voice dataset #2927 by @wenjie-p

[Recipe][ESPnet1][ASR][CI][Installation][**Refactoring**] Update IWSLT18 (ST-TED) ASR recipe #2916 by @hirofumi0810

[Recipe][ESPnet1][ASR][MT][ST][**README**] Must-C v2 recipe #2963 by @hirofumi0810

[Recipe][ESPnet1][ASR][MT][ST][**Refactoring**] Refactor Fisher-CallHome recipe #2904 by @hirofumi0810

[Recipe][ESPnet1][ASR][MT][ST][**Refactoring**] Refactor How2 recipe #2906 by @hirofumi0810

[Recipe][ESPnet1][ASR][MT][ST][**Refactoring**] Refactor Must-C recipe #2901 by @hirofumi0810

[Recipe][ESPnet1][ASR][MT][ST][**Refactoring**] Refactor libri-trans recipe #2903 by @hirofumi0810

[Recipe][ESPnet1][ASR][ST][**Refactoring**] Update IWSLT'19 recipe #2940 by @hirofumi0810

[Recipe][ESPnet1][ST][CI][**Refactoring**] Refactor ST recipes #2975 by @hirofumi0810

[Recipe][ESPnet1][ST][**Refactoring**] Refactor Mboshi-French corpus #2911 by @hirofumi0810

[Recipe][ESPnet2][**ASR**] Open-li52(add language id scoring & text case align for test set) #2938 by @ftshijt

[Recipe][ESPnet2][ASR][**README**] Add Russian open STT recipe for ESPnet2 #2972 by @akreal

[Recipe][ESPnet2][ASR][**README**] MLS (multi-lingual librispeech) recipe #2869 by @ftshijt

[Recipe][ESPnet2][ASR][**README**] Update espnet2 librispeech result #2966 by @kamo-naoyuki

[Recipe][ESPnet2][ASR][**README**] added nsc results #2937 by @sw005320

[Recipe][ESPnet2][ASR][**README**] fix librispeech model url #2976 by @kamo-naoyuki

[Recipe][ESPnet2][ASR][**README**] minor fix of li52 and nsc recipes #2936 by @sw005320

[Recipe][ESPnet2][ASR][**README**] update the results of open li52 recipe #2974 by @sw005320

[Recipe][ESPnet2][**SE**] Librimix separation results for Conv-Tasnet, 8k, min #2928 by @anogkongda

[Recipe][ESPnet2][SE][**README**] Espnet-SE, Speech enhancement recipes #2888 by @LiChenda

Enhancement

[Enhancement][ESPnet1][**ASR**] Auto Resampling to 16khz for pretrained models #2969 by @siddalmia

[Enhancement][ESPnet1][ASR][**RNNT**] Minor refactoring #2932 by @b-flo

[Enhancement][ESPnet1][ASR][RNNT][README][CI][**Documentation**] Refactoring RNNT #2887 by @b-flo

[Enhancement][ESPnet1][ESPnet2][ASR][LM][MT][**TTS**] Print total params and trainable params. #2996 by @siddalmia

[Enhancement][ESPnet1][**LM**] Add LM options like embedding dropout and tie weights #3010 by @siddalmia

[Enhancement][ESPnet1][ST][**Refactoring**] Add the latest RPE implementation to the ST task. #3005 by @pengchengguo

Other

[CI][README][**mergify**] Stop circle ci #2978 by @kamo-naoyuki

[Documentation] Update docs for ESPnet contributing (especially for recipes part) #2905 by @ftshijt

[Documentation] fix a typo #3016 by @Huang17

[Installation] Uninstall typing #2979 by @kamo-naoyuki

Acknowledgements

Special thanks to @Emrys365, @Huang17, @LiChenda, @akreal, @anogkongda, @b-flo, @brianyan918, @eml914, @ftshijt, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @pengchengguo, @siddalmia, @simpleoier, @sw005320, @wenjie-p, @ysk24ok.
v.0.9.7 (Jan 15, 2021)
New Features

[New Features][ESPnet1][**ASR**] Option for GTN CTC mode #2866 by @brianyan918

[New Features][ESPnet2][SE][**README**] Update to speech enhancement task #2649 by @LiChenda

[New Features][ESPnet2][ASR][**README**] Lightweight Sinc Convolutions for Espnet2 #2768 by @lumaku

[New Features][ESPnet2][**Documentation**] --freeze_param option #2787 by @kamo-naoyuki

[New Features][ESPnet2][TTS][**README**] Add a new G2P pyopenjtalk_accent_with_pause #2843 by @kan-bayashi

[New Features][ESPnet2][TTS][**README**] Add pyopenjtalk_accent g2p for ESPnet2 TTS #2781 by @ota

[New Features][ESPnet2][TTS][**README**] Support X-vector based multi-speaker TTS model in ESPnet2 #2800 by @kan-bayashi

Enhancement

[Enhancement][ESPnet1][**ESPnet2**] Add version info in args #2841 by @kan-bayashi

[Enhancement][ESPnet1][ESPnet2][**ASR**] AMI Recipe (Short UTT checker) #2802 by @ftshijt

[Enhancement][**Installation**] add default activate_python.sh #2788 by @kamo-naoyuki

[Enhancement][**Installation**] modified: check_install.py #2834 by @kamo-naoyuki

[Enhancement][Installation][Documentation][ESPnet1][**ESPnet2**] Change version info location #2840 by @kan-bayashi

Bugfix

[Bugfix][ESPnet1][**ASR**] fix greedy decoding #2812 by @b-flo

[Bugfix][ESPnet2][**ASR**] Fix the compatibility of the pretrained ASR model #2794 by @kan-bayashi

[Bugfix][**Installation**] Fix #2799 #2830 by @kamo-naoyuki

[Bugfix][**Installation**] Fix HTS engine installation #2825 by @kan-bayashi

[Bugfix][**Installation**] fix the incorrect $PATH setting in tools/extra_path.sh #2833 by @jumon

[Bugfix][Recipe][ESPnet1][**ASR**] Minor fixes in CSJ #2837 by @YosukeHiguchi

[Bugfix][Recipe][ESPnet1][**ASR**] fix recipe bug for librispeech #2735 by @yuekaizhang

[Bugfix][Recipe][ESPnet2][**ASR**] fix a config name #2729 by @sw005320

[Bugfix][Recipe][ESPnet2][ASR][**README**] Fix dirha_wsj recipe #2747 by @kamo-naoyuki

[Bugfix][Recipe][ESPnet2][**TTS**] Add missing decoding configs in LibriTTS recipe #2827 by @kan-bayashi

Recipe

[Recipe][ESPnet1][**ASR**] Add LibriSpeech Conformer results for LibriCSS #2861 by @akreal

[Recipe][ESPnet1][**ASR**] Update Commonvoice Recipe with Conformer Settings #2739 by @ftshijt

[Recipe][ESPnet1][**ASR**] Update Russian open STT recipe for v1.01 of the dataset #2776 by @akreal

[Recipe][ESPnet1][**ASR**] Update models and results of Conformer. #2765 by @pengchengguo

[Recipe][ESPnet1][ESPnet2][ASR][**README**] ESPnet2 recipe for commonvoice #2793 by @hchung12

[Recipe][ESPnet1][VC][**README**] VCC2020 database #2754 by @unilight

[Recipe][ESPnet2][ASR][**README**] Update Dirha WSJ result #2756 by @kamo-naoyuki

[Recipe][ESPnet2][ASR][**README**] espnet2 hkust recipe #2863 by @kamo-naoyuki

[Recipe][ESPnet2][ASR][**README**] update the AMI result in espnet2 #2817 by @sw005320

[Recipe][ESPnet2][ASR][**README**] updated the laborotv result #2750 by @sw005320

[Recipe][ESPnet2][ASR][**README**] Update reverb result #2876 by @kamo-naoyuki

[Recipe][ESPnet2][**ASR**] Minor fix of laborotv recipe #2877 by @hfujihara

[Recipe][ESPnet2][**TTS**] Fix total number of iterations #2813 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Add libritts recipe for ESPnet2 #2807 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Add x-vector based configs for VCTK #2808 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Minor update TTS README #2818 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Update JSUT TTS results #2792 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Update JSUT results #2809 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Update JSUT results #2871 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Update LibriTTS results #2842 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Update VCTK results #2814 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] Update libritts results #2828 by @kan-bayashi

[Recipe][ESPnet2][TTS][**README**] update latest CSMSC link address #2777 by @meowtech

Other

[CI][Documentation][**Installation**] Change warp-ctc and warp-transducer to extra #2748 by @kamo-naoyuki

[CI][**README**] Update ci setting #2848 by @kan-bayashi

[ASR][Documentation][**ESPnet2**] Sinc Convolutions - add documentation for plot_sinc_filters.py #2782 by @lumaku

[Documentation][**ESPnet1**] fixed some typos #2855 by @jumon

[Documentation][**Installation**] Update documentation #2757 by @kamo-naoyuki

[Installation][**Refactoring**] Move the dependencies coming from recipes #2740 by @kamo-naoyuki

Acknowledgements

Special thanks to @AdolfVonKleist, @LiChenda, @YosukeHiguchi, @akreal, @b-flo, @brianyan918, @ftshijt, @hchung12, @hfujihara, @jumon, @kamo-naoyuki, @kan-bayashi, @lumaku, @meowtech, @ota, @pengchengguo, @sw005320, @unilight, @yuekaizhang.
v.0.9.6 (Dec 1, 2020)
New Features

[New Features][**ESPnet2**] Wandb integration #2707 by @kamo-naoyuki

[New Features][ESPnet2][**ASR**] Add ignore_nan_grad option for CTC #2699 by @kamo-naoyuki

[New Features][ESPnet2][**SE**] Touching common modules before the main Enh PR #2705 by @LiChenda

Bugfix

[Bugfix][**ESPnet1**] bug fix for pytorch1.7 #2656 by @kamo-naoyuki

[Bugfix][ESPnet1][ESPnet2][**TTS**] Use nkf in CSMSC data prep #2726 by @kan-bayashi

[Bugfix][**ESPnet2**] Fix flooring for global_mvn.py #2623 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fix small bug of tensorboard part #2702 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fix wandb mode with multi gpus #2709 by @kamo-naoyuki

[Bugfix][ESPnet2][**TTS**] Fix token-averaged feature for the case when r > 1 #2704 by @kan-bayashi

Recipe

[Recipe][**ESPnet1**] Extend model averaging condition in run scripts #2613 by @b-flo

[Recipe][ESPnet1][**ASR**] Enable multi-thread processing of json files. #2681 by @Peidong-Wang

[Recipe][ESPnet1][**ASR**] Update KsponSpeech conformer results #2624 by @jubang0219

[Recipe][ESPnet1][**ASR**] Update Voxforge with Conformer results #2642 by @YosukeHiguchi

[Recipe][ESPnet1][**ASR**] lang was being used before being parsed for user input #2654 by @siddalmia

[Recipe][ESPnet1][ASR][ESPnet2][Installation][**README**] espnet2 reverb recipe #2691 by @kamo-naoyuki

[Recipe][ESPnet1][ASR][**README**] Update Switchboard with conformer results #2697 by @Emrys365

[Recipe][ESPnet1][ASR][**README**] add librispeech conformer w/ speed perturbation + specaug #2617 by @yuekaizhang

[Recipe][ESPnet2][**ASR**] ASR template recipe: --srctexts -> --lm_train_text, --bpe_train_text #2660 by @kamo-naoyuki

[Recipe][ESPnet2][**ASR**] Add $token_type to asr_tag and lm_tag #2625 by @kamo-naoyuki

[Recipe][ESPnet2][ASR][Installation][README][**Recipe**] Laborotv recipe #2703 by @sw005320

[Recipe][ESPnet2][ASR][**README**] Add AISHELL w/o LM result #2718 by @kamo-naoyuki

[Recipe][ESPnet2][ASR][**README**] ESPnet2 recipe for TIMIT #2568 by @sknadig

[Recipe][ESPnet2][ASR][**README**] JSUT conformer recipe achieving 12.0/13.9 CER(%) for dev/eval1 #2720 by @hchung12

[Recipe][ESPnet2][ASR][**README**] Update README.md #2659 by @sw005320

[Recipe][ESPnet2][ASR][**README**] Update WSJ result #2628 by @kamo-naoyuki

[Recipe][ESPnet2][ASR][**README**] espnet2 librispeech with conformer #2687 by @sw005320

[Recipe][ESPnet2][**README**] Corpus README in egs2 #2713 by @sw005320

[Recipe][ESPnet2][**README**] update egs2/README.md #2719 by @Emrys365

Enhancement

[Enhancement][Documentation][**ESPnet2**] Add --init_param option #2680 by @kamo-naoyuki

[Enhancement][ESPnet1][**ASR**] Save model snapshot at every epoch even if save_interval_iters > 0 - for model averaging #2637 by @sknadig

[Enhancement][**ESPnet2**] Update wandb part #2708 by @kamo-naoyuki

[Enhancement][ESPnet2][**ASR**] Add *_stats_dir options in asr.sh #2724 by @kan-bayashi

Documentation

[Documentation][ESPnet2][**README**] Update egs2 README #2723 by @kan-bayashi

[Documentation][ESPnet2][README][**TTS**] Update README about fine-tuning #2685 by @kan-bayashi

[Documentation][ESPnet2][README][**TTS**] Update TTS README.md #2650 by @kan-bayashi

Refactoring

[Refactoring][ESPnet1][ASR][**README**] Refactor Mask CTC non-autoregressive ASR #2223 by @YosukeHiguchi

[Refactoring][**ESPnet2**] Added unicode support for generated configs #2672 by @Piteryo

Others

[Installation] python setup.py install -> pip install -e #2619 by @kamo-naoyuki

[Installation][**Refactoring**] modify for zsh: tools/extra_path.sh #2696 by @kamo-naoyuki

[Docker] Docker flags for extra libraries (VC) #2622 by @Fhrozen

Acknowledgements

Special thanks to @Emrys365, @Fhrozen, @LiChenda, @Peidong-Wang, @Piteryo, @YosukeHiguchi, @b-flo, @hchung12, @jubang0219, @kamo-naoyuki, @kan-bayashi, @siddalmia, @sknadig, @sw005320, @yuekaizhang.
v.0.9.5 (Oct 31, 2020)
New Features

[New Features][ESPnet2][**TTS**] Support g2p=none for text with phonemes #2551 by @kan-bayashi

[New Features][ESPnet2][**TTS**] Add MCD evaluation script for ESPnet2-TTS #2554 by @kan-bayashi

[New Features][ESPnet1][**ST**] Conformer End-to-End Speech Translation #2523 by @hirofumi0810

Bugfix

[Bugfix][**ESPnet1**] CTC segmentation - package update #2566 by @lumaku

[Bugfix][ASR][**ESPnet1**] fix bug about att_ws in multi-enc case #2549 by @lzm0706

[Bugfix][**ESPnet1**] Conformer averaging model support for pytorch 1.6 #2604 by @siddalmia

[Bugfix][ESPnet1][**ASR**] Set built-in CTC for asr_recog #2588 by @lumaku

[Bugfix][ESPnet1][ASR][**Installation**] Transducer float16 loss bug fix #2496 by @GNroy

Refactoring

[Refactoring][ESPnet1][**ASR**] Refactor BeamSearchTransducer and ErrorCalculatorTrans #2538 by @b-flo

Recipe

[Recipe][ESPnet1][**ASR**] Alignment recipe for CSJ. #2531 by @jnishi

[Recipe][ESPnet1][**ASR**] New Recipe for KsponSpeech (Korean spontaneous speech; 969 hours) #2555 by @jubang0219

[Recipe][ESPnet1][**ASR**] Update TedLium3 conformer results #2600 by @LiChenda

[Recipe][ESPnet1][**ASR**] Update VIVOS models #2574 by @b-flo

[Recipe][ESPnet1][**ASR**] Update model link in Puebla-Nahuatl #2607 by @ftshijt

[Recipe][ESPnet1][**ASR**] Update tedlium2 with conformer results #2599 by @Emrys365

[Recipe][ESPnet1][**ASR**] update the JSUT recipe with conformer #2546 by @sw005320

[Recipe][ESPnet2][**ASR**] Add CSJ conformer config #2560 by @kan-bayashi

[Recipe][ESPnet2][**ASR**] Add CSJ conformer results #2552 by @kan-bayashi

[Recipe][ESPnet2][**ASR**] Small changes for aishell config #2586 by @kamo-naoyuki

[Recipe][ESPnet2][**ASR**] Update espnet2 AISHELL results #2580 by @kamo-naoyuki

[Recipe][ESPnet2][**ASR**] update JSUT espnet2 with pre-trained models #2563 by @sw005320

[Recipe][ESPnet2][**TTS**] Add JSSS recipe for ESPnet2-TTS #2558 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Update ESPnet2 TTS result #2542 by @kan-bayashi

CI

[CI][**Documentation**] Support espnet2/bin in sphinx doc. #2544 by @ShigekiKarita

[CI][Installation][**README**] Add pytorch1.7.0 ci test #2605 by @kamo-naoyuki

Other

[Installation] Install warpctc-pytorch wheel when torch version is 1.1 - 1.6 #2547 by @ysk24ok

[Installation] Modified requirements: "dataclasses; python_version < '3.7'", #2541 by @kamo-naoyuki

[Installation] Remove pip3 check in setup_python.sh #2567 by @ShigekiKarita

Acknowledgements

Special thanks to @Emrys365, @GNroy, @LiChenda, @ShigekiKarita, @b-flo, @ftshijt, @hirofumi0810, @jnishi, @jubang0219, @kamo-naoyuki, @kan-bayashi, @lumaku, @lzm0706, @siddalmia, @sw005320, @ysk24ok.
v.0.9.4 (Sep 30, 2020)
New Features

[New Features][ESPnet1][**ASR**] Transducer v4 #2444 by @b-flo

[New Features][**ESPnet2**] Support audio_format=flac.ark, wav.ark #2451 by @kamo-naoyuki

[New Features][ESPnet2][**ASR**] Support conformer encoder in ESPnet2 ASR #2515 by @kan-bayashi

Bugfix

[Bugfix][**ESPnet1**] Fixed IndexError in BatchBeamSearch.post_process() (#2483) #2484 by @kan-bayashi

[Bugfix][ESPnet1][**LM**] fix multigpu bug if pytorch>=1.5 #2492 by @kamo-naoyuki

[Bugfix][**ESPnet2**] remove cleaner #2529 by @kamo-naoyuki

[Bugfix][ESPnet2][**TTS**] Fix TTS inference bug for GST + Fastspeech2 #2498 by @kan-bayashi

Documentation

[Documentation] Update espnet2_tutorial.md #2528 by @kamo-naoyuki

[Documentation] Update espnet2_tutorial.md #2532 by @kamo-naoyuki

[Documentation] Update espnet2_tutorial.md #2534 by @kamo-naoyuki

[Documentation] Update notebook submodule #2499 by @kan-bayashi

[Documentation][**ESPnet1**] Small fixes for transducer #2514 by @b-flo

[Documentation][ESPnet2][README][**TTS**] Update ESPnet2 TTS README #2516 by @kan-bayashi

[Documentation][**README**] Update README #2504 by @kan-bayashi

[Documentation][README][**ESPnet1**] CTC segmentation - checks for blank chars and RNN models #2535 by @lumaku

Recipe

[Recipe][ESPnet1][**ASR**] add conformer results for librispeech #2510 by @yuekaizhang

[Recipe][ESPnet2][**ASR**] Update ESPnet2 CSJ Transformer results #2497 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Add results for ESPnet2 TTS #2503 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Update Transformer-TTS config #2494 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Update Transformer-TTS configs #2502 by @kan-bayashi

Refactoring

[Refactoring] Modify uttid to "${spkid}-${uttid}" for trn files #2527 by @kamo-naoyuki

[Refactoring][ESPnet1][ASR][**LM**] Remove all future lines #2481 by @ShigekiKarita

[Refactoring][ESPnet1][ASR][MT][**ST**] Unify arguments #2506 by @hirofumi0810

[Refactoring][ESPnet1][ESPnet2][**TTS**] Refactor length regulator to improve the speed #2482 by @kan-bayashi

[Refactoring][ESPnet1][MT][**ST**] Refactor decoding for translation tasks #2501 by @hirofumi0810

[Refactoring][**ESPnet2**] Change add_scalars to add_scalar for tensorboard SummaryWriter #2525 by @kamo-naoyuki
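
The uttid change in #2527 above concerns trn files, the sclite-style transcript format in which each line ends with the utterance ID in parentheses. A minimal sketch of the resulting line layout follows, assuming the ID is simply the speaker ID and the utterance ID joined by a hyphen; `format_trn_line` is a hypothetical helper, not a function from the toolkit.

```python
def format_trn_line(text: str, spkid: str, uttid: str) -> str:
    """Build one trn line: the transcript followed by "(spkid-uttid)".

    Hypothetical helper for illustration only.
    """
    return f"{text} ({spkid}-{uttid})"


# Example usage with made-up IDs.
print(format_trn_line("hello world", "spk1", "utt001"))
```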

CI

[CI][**ASR**] Make test_e2e_asr.py faster #2488 by @ShigekiKarita

[CI][**ASR**] Make test_e2e_asr_maskctc.py faster. #2493 by @ShigekiKarita

[CI][**ASR**] Make test_recog.py faster #2486 by @ShigekiKarita

[CI][ESPnet1][**ASR**] make test_e2e_asr_mulenc.py faster #2480 by @ruizhilijhu

[CI][ESPnet1][**Installation**] Update shellcheck url. #2500 by @ShigekiKarita

[CI][ESPnet2][**Installation**] Limit test execution time to 2.0 sec #2520 by @ShigekiKarita

[CI][**SE**] Make test_beamformer_net.py faster #2489 by @ShigekiKarita

[CI][**SE**] shorten test time for tasnet #2491 by @LiChenda

Other

[Installation] Update h5py version to avoid errors in Python3.8 #2519 by @shigabeev

[Docker] Docker Updates #2509 by @Fhrozen

Acknowledgements

Special thanks to @Fhrozen, @LiChenda, @ShigekiKarita, @b-flo, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @lumaku, @ruizhilijhu, @shigabeev, @yuekaizhang.
v.0.9.3 (Sep 15, 2020)
New Features

[New Features][**ESPnet2**] Implement --grad_clip_type #2399 by @kamo-naoyuki

[New Features][ESPnet2][**ASR**] Implement batch_score() method for ASR decoder and LM #2377 by @kamo-naoyuki

[New Features][ESPnet2][README][**TTS**] Support Conformer-based FastSpeech / FastSpeech2 #2413 by @kan-bayashi
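
The `--grad_clip_type` option from #2399 above presumably selects which p-norm is used when clipping gradients. The sketch below shows norm-based clipping in plain Python under that assumption. `clip_by_norm` and its arguments are hypothetical names, and a real trainer would operate on framework tensors instead of lists.

```python
import math
from typing import List


def clip_by_norm(
    grads: List[List[float]], max_norm: float, norm_type: float = 2.0
) -> float:
    """Scale `grads` in place so their global p-norm is at most `max_norm`.

    Returns the norm measured before clipping. Illustrative only.
    """
    flat = [g for group in grads for g in group]
    if not flat:
        return 0.0
    if math.isinf(norm_type):
        # Infinity norm: the largest absolute gradient value.
        total = max(abs(g) for g in flat)
    else:
        total = sum(abs(g) ** norm_type for g in flat) ** (1.0 / norm_type)
    scale = max_norm / (total + 1e-6)
    if scale < 1.0:
        for group in grads:
            for i in range(len(group)):
                group[i] *= scale
    return total
```

With `norm_type=2.0` this is the familiar L2 clipping; passing `float("inf")` limits the single largest component instead.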

Bugfix

[Bugfix][CI][ESPnet1][**ESPnet2**] make sure chainer independent #2411 by @kamo-naoyuki

[Bugfix][CI][ESPnet1][**Installation**] Revert ctc seg installation #2392 by @kan-bayashi

[Bugfix][CI][**Installation**] Fix the installation error in CI #2476 by @kan-bayashi

[Bugfix][ESPnet1][**ASR**] Lazy import chainer in asr_utils.py #2407 by @kamo-naoyuki

[Bugfix][ESPnet1][**ASR**] asr: Fix recog issue on Transformer CTC model #2394 by @jaesong

[Bugfix][ESPnet1][MT][**ST**] Fix score_bleu.sh #2400 by @hirofumi0810

[Bugfix][ESPnet1][README][**Typo**] fixed typo in egs/README.md #2473 by @mrazizi

[Bugfix][ESPnet1][**TTS**] lazy import chainer: espnet/nets/tts_interface.py #2409 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Add missing database in db.sh #2427 by @kan-bayashi

[Bugfix][**ESPnet2**] Fix the CommonPreprocessor_multi missing issue #2460 by @LiChenda

[Bugfix][**ESPnet2**] Minor fix of egs2/commonvoice/asr1/local/data.sh #2438 by @kamo-naoyuki

[Bugfix][**ESPnet2**] fix the directory for init_file_prefix #2412 by @kamo-naoyuki

[Bugfix][**ESPnet2**] fix typo of log_level choices #2472 by @glynpu

[Bugfix][ESPnet2][**ASR**] Add grep -H option #2388 by @kamo-naoyuki

[Bugfix][ESPnet2][**TTS**] Fix wrong sum axis in energy extraction #2469 by @kan-bayashi

[Bugfix][ESPnet2][**Typo**] Fix typo in help comment and docstrings #2470 by @kan-bayashi

[Bugfix][**Installation**] add warpctc_pytorch version==0.1.2 #2403 by @kamo-naoyuki

Documentation

[Documentation] Add bug report template #2396 by @sw005320

[Documentation] Add installation issue template #2397 by @sw005320

[Documentation] Update espnet2_distributed.md #2418 by @kamo-naoyuki

[Documentation] Update espnet2_distributed.md #2419 by @kamo-naoyuki

[Documentation] Update espnet2_training_option.md #2421 by @kamo-naoyuki

[Documentation] Update faq.md #2431 by @kamo-naoyuki

[Documentation] Update parallelization.md #2428 by @kamo-naoyuki

[Documentation][ESPnet2][**README**] Update README.md #2430 by @kamo-naoyuki

Enhancement

[Enhancement][ESPnet1][**ESPnet2**] Add -c option for multi GPUs mode for slurm.conf #2406 by @kamo-naoyuki

[Enhancement][ESPnet1][**Installation**] Install warpctc-pytorch wheel when torch version is 1.1, 1.2 or 1.3 #2453 by @ysk24ok

[Enhancement][ESPnet1][**README**] ADD CSJ RNN pretrained model #2452 by @jnishi

[Enhancement][**ESPnet2**] Update db.sh #2426 by @kamo-naoyuki

[Enhancement][ESPnet2][**TTS**] Update ESPnet2 TTS config #2468 by @kan-bayashi

[Enhancement][ESPnet2][**TTS**] Update and add fastspeech2 configs #2429 by @kan-bayashi

[Enhancement][**Installation**] Add sanity check for setup_cuda_env.sh #2389 by @kamo-naoyuki

[Enhancement][**Installation**] Change cudatoolkit to cuda if cuda_version=8.0 #2405 by @kamo-naoyuki

[Enhancement][**Installation**] Change to refer https://anaconda.org/pytorch/pytorch/files #2404 by @kamo-naoyuki

[Enhancement][**Installation**] Workaround for soundfile issue #2437 by @kamo-naoyuki

Recipe

[Recipe][ESPnet1][**ASR**] Add LibriCSS recipe #2246 by @akreal

[Recipe][ESPnet1][**ASR**] Update for the Official Split of YM Recipe #2435 by @ftshijt

[Recipe][ESPnet1][ESPnet2][**ASR**] Update CommonVoice for Latest Version #2455 by @ftshijt

[Recipe][ESPnet2][**ASR**] [zeroth korean] Not to use pipe format if feats_type=raw #2402 by @kamo-naoyuki

[Recipe][ESPnet2][ASR][**README**] espnet2 zeroth_korean recipe changing feats_type from fbank_pitch to raw. #2393 by @hchung12

[Recipe][ESPnet2][README][**TTS**] Add ESPnet2 TTS finetuning example recipe (JVS) #2465 by @kan-bayashi

CI

[CI] Add codecov actions. #2467 by @ShigekiKarita

[CI] Fix hangup of unittests #2424 by @kamo-naoyuki

[CI] Make espnet2 tts test faster #2461 by @kan-bayashi

[CI] Make test_e2e_{asr,st,mt}_{transformer,conformer}.py faster. #2464 by @ShigekiKarita

[CI] Update .gitignore #2434 by @kan-bayashi

[CI][**ESPnet1**] Make test_(batch_)beam_search.py faster. #2462 by @ShigekiKarita

[CI][**ESPnet1**] Support Debian9 and CentOS7 in Github Actions #2457 by @ShigekiKarita

[CI][ESPnet1][**Installation**] Fix HKUST recipe #2440 by @kamo-naoyuki

Acknowledgements

Special thanks to @LiChenda, @ShigekiKarita, @akreal, @ftshijt, @glynpu, @hchung12, @hirofumi0810, @jaesong, @jnishi, @kamo-naoyuki, @kan-bayashi, @mrazizi, @sw005320, @ysk24ok.
v.0.9.2 (Aug 31, 2020)
New Features

[New Features][**ESPnet1**] CTC segmentation #2301 by @lumaku

[New Features][**ESPnet2**] Support multiple averaged nbest models #2353 by @kamo-naoyuki

[New Features][**ESPnet2**] Support recursive add in pack_funcs and add images to packed model #2367 by @kamo-naoyuki
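
The idea behind averaging n-best models (#2353 above) is to take the element-wise mean of the parameters of several saved checkpoints. The following is a minimal sketch of that idea only, assuming each checkpoint is a mapping from parameter name to a flat list of floats; `average_state_dicts` is a hypothetical name, and the actual implementation works on framework tensors.

```python
from typing import Dict, List, Sequence


def average_state_dicts(
    states: Sequence[Dict[str, List[float]]]
) -> Dict[str, List[float]]:
    """Return the element-wise mean of several parameter dictionaries.

    Illustrative only: all dictionaries must share the same keys and
    each parameter must have the same length in every dictionary.
    """
    if not states:
        raise ValueError("at least one checkpoint is required")
    keys = states[0].keys()
    for state in states[1:]:
        if state.keys() != keys:
            raise ValueError("checkpoints have different parameter names")
    n = len(states)
    return {
        key: [sum(values) / n for values in zip(*(s[key] for s in states))]
        for key in keys
    }
```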

Bugfix

[Bugfix][ASR][**ESPnet1**] remove ff_scale from conformer constructor arguments #2356 by @koji-okabe-hub

[Bugfix][ASR][**ESPnet2**] use lm_exp instead of lm_tag for inference_tag #2352 by @kamo-naoyuki

[Bugfix][CI][ESPnet1][**Installation**] Remove ctc_segmentation temporary #2385 by @kan-bayashi

[Bugfix][**ESPnet1**] Fix import error of conformer module #2384 by @kan-bayashi

[Bugfix][**ESPnet1**] Fix issue https://github.com/espnet/espnet/issues/2211 #2219 by @Emrys365

[Bugfix][**ESPnet2**] Add missing \_\_init\_\_.py #2326 by @kan-bayashi

[Bugfix][**ESPnet2**] Fix --out_filename option: format_wav_scp.sh #2348 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fix amp #2362 by @kamo-naoyuki

[Bugfix][**ESPnet2**] add egs2/an4/asr1/local/path.sh #2343 by @kamo-naoyuki

[Bugfix][**ESPnet2**] fix recursive add: espnet2/main_funcs/pack_funcs.py #2369 by @kamo-naoyuki

[Bugfix][**ESPnet2**] remove unused import #2331 by @kamo-naoyuki

[Bugfix][ESPnet2][Installation][**Typo**] fix typo #2344 by @kamo-naoyuki

[Bugfix][ESPnet2][**README**] Fix typo #2372 by @Piteryo

[Bugfix][ESPnet2][**TTS**] make vietnamese_cleaner an option #2341 by @kamo-naoyuki

[Bugfix][**Installation**] Fix python version check for chainer #2342 by @kamo-naoyuki

[Bugfix][**Installation**] add undefined variable: check_pytorch_cuda_compatibility.py #2361 by @kamo-naoyuki

[Bugfix][**TTS**] Fix device allocation error in guided attention loss #2282 #2317 by @kan-bayashi

Documentation

[Documentation] updated comment on the documentation #2351 by @GauravPandey892

[Documentation][**ESPnet2**] Update TTS README #2316 by @kan-bayashi

[Documentation][ESPnet2][**README**] Update ESPnet2 TTS README #2376 by @kan-bayashi

[Documentation][ESPnet2][README][**TTS**] Update README #2330 by @kan-bayashi

[Documentation][**Installation**] Divide setup_python.sh into setup_venv.sh and setup_python.sh #2382 by @kamo-naoyuki

[Documentation][**Installation**] add a description about check install. #2360 by @sw005320

[Documentation][**README**] CTC segmentation - Demo #2347 by @lumaku

[Documentation][**README**] Update README.md #2379 by @kamo-naoyuki

Enhancement

[Enhancement][**ESPnet2**] Change the default inference model to averaged model instead of the best #2346 by @kamo-naoyuki

[Enhancement][ESPnet2][**TTS**] Add pitch and energy stats in packing #2350 by @kan-bayashi

[Enhancement][**Installation**] Add checking for pytorch-cuda compatibility in Makefile #2334 by @kamo-naoyuki

[Enhancement][**Installation**] Show raw error message when failed to import packages #2374 by @kamo-naoyuki

Refactoring

[Refactoring] Apply new version black #2366 by @kamo-naoyuki

[Refactoring][ASR][**ESPnet2**] Not to add _sp to $asr_exp if --asr_exp option is specified #2368 by @kamo-naoyuki

[Refactoring][CI][ESPnet1][ESPnet2][**Installation**] Add installers for sctk and sph2pipe and create tools/extra_path.sh #2332 by @kamo-naoyuki

[Refactoring][ESPnet1][**Recipe**] Disable preparation for lm in wsj recipe #2373 by @kamo-naoyuki

[Refactoring][**ESPnet2**] Update Task design #2345 by @kamo-naoyuki

[Refactoring][ESPnet2][**SE**] Remove unused option from enh.sh:--feats_normalize #2325 by @kamo-naoyuki

Recipe

[Recipe][ASR][**ESPnet1**] MGB-2 #2289 by @AmirHussein96

[Recipe][ASR][**ESPnet1**] Remove duplicated class definition of Conformer and update some new results of Aishell1 and Switchboard. #2364 by @pengchengguo

[Recipe][ASR][ESPnet2][**README**] ASR WSJ RESULT update: Tuning LM #2355 by @kamo-naoyuki

[Recipe][ASR][ESPnet2][**README**] add pretrained model link #2378 by @kamo-naoyuki

CI

[CI][**README**] Update ubuntu images in circle ci #2349 by @ShigekiKarita

[CI][**mergify**] Update .mergify.yml #2333 by @kamo-naoyuki

[CI][**mergify**] Update .mergify.yml #2354 by @kamo-naoyuki

Acknowledgements

Special thanks to @AmirHussein96, @Emrys365, @GauravPandey892, @Piteryo, @ShigekiKarita, @kamo-naoyuki, @kan-bayashi, @koji-okabe-hub, @lumaku, @pengchengguo, @sw005320.
v.0.9.1 (Aug 15, 2020)
New Features

[New Features] Add metric option to checkpoint averaging for Transformer #2259 by @hirofumi0810

[New Features][**ESPnet2**] Generate run.sh in the experiment dir for resuming #2284 by @kamo-naoyuki

[New Features][**ESPnet2**] Support larger num_iters_per_epoch than the number of batches in small corpus #2255 by @kamo-naoyuki

[New Features][**ESPnet2**] Support torch native automatic mixed precision for espnet2 #2257 by @kamo-naoyuki
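
Allowing `num_iters_per_epoch` to exceed the number of batches in a small corpus (#2255 above) implies that an epoch may wrap around the data. One way to picture this is shown below as a sketch under our own assumptions: the function `epoch_batches` is hypothetical, uses a fixed batch order, and does not reproduce the toolkit's actual iterator or shuffling.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def epoch_batches(
    batches: Sequence[T], num_iters_per_epoch: int, epoch: int
) -> List[T]:
    """Return the batches for a 1-based `epoch`, wrapping around the corpus.

    Illustrative only: each epoch continues where the previous one
    stopped and restarts from the first batch once the corpus is used up.
    """
    if not batches:
        raise ValueError("the corpus has no batches")
    start = (epoch - 1) * num_iters_per_epoch
    return [
        batches[(start + i) % len(batches)] for i in range(num_iters_per_epoch)
    ]
```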

Documentation

[Documentation] Update comments in MultiHeadAttention #2266 by @placebokkk

[Documentation][**ESPnet2**] append comment in reporter.py #2267 by @kamo-naoyuki

[Documentation][ESPnet2][README][**TTS**] Add ESPnet2 TTS recipe document #2312 by @kan-bayashi

Enhancement

[Enhancement][**ESPnet2**] Tensorboard stats between iterations #2252 by @kamo-naoyuki

Refactoring

[Refactoring][**ESPnet2**] Add some new features and a new recipe for the enhancement task #2238 by @Emrys365

[Refactoring][**Documentation**] Remove installation part of Python from Makefile #2245 by @kamo-naoyuki

Recipe

[Recipe][**ASR**] aidatatang conformer ESPnet1 recipe #2269 by @nzhoward

[Recipe][**ESPnet2**] espnet2 zeroth_korean recipe #2279 by @hchung12

Bugfix

[Bugfix] Fix #2295 #2311 by @kan-bayashi

[Bugfix] Minor fix for Makefile #2268 by @kamo-naoyuki

[Bugfix] Not to install cupy-cuda* for python>=3.8 #2277 by @kamo-naoyuki

[Bugfix] Remove channel: setup_anaconda.sh #2303 by @kamo-naoyuki

[Bugfix][**ASR**] ngram single decoding bug fix #2299 by @qmpzzpmq

[Bugfix][ASR][**ESPnet2**] Add missing \_\_init\_\_.py #2292 by @kamo-naoyuki

[Bugfix][ASR][**ESPnet2**] decode -> inference #2276 by @kamo-naoyuki

[Bugfix][ASR][**ESPnet2**] remove chainer dependency from show_asr_result.sh #2281 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Avoid illegal summary name for tensorboard #2294 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fix average_nbest_models for pytorch=1.6 #2283 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fix decode config extension in ESPnet2 CSJ recipe #2258 by @kan-bayashi

[Bugfix][**ESPnet2**] Fix for queue-freegpu.pl #2274 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fix samplers about min_batch_size #2305 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Workaround for SGE jobname issue #2253 by @kamo-naoyuki

[Bugfix][**ESPnet2**] add missing shebang #2306 by @kamo-naoyuki

[Bugfix][**ESPnet2**] fix bug of reporter #2263 by @kamo-naoyuki

[Bugfix][ESPnet2][**Recipe**] Update zeroth_korean #2308 by @kamo-naoyuki

[Bugfix][ESPnet2][**SE**] add --spk-num 1 #2285 by @kamo-naoyuki

[Bugfix][ESPnet2][**distributed**] Not to save config.yaml if rank!=0 #2287 by @kamo-naoyuki

Others

[CI] Remove unnecessary installation when CI #2307 by @kamo-naoyuki

[CI] Take integration tests into coverage #2254 by @ShigekiKarita

[CI][**ESPnet2**] Add coverage measure for espnet2 integration test #2256 by @kamo-naoyuki

[CI][**Installation**] Install wheel #2304 by @kamo-naoyuki

Acknowledgements

Special thanks to @Emrys365, @ShigekiKarita, @hchung12, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @nzhoward, @placebokkk, @qmpzzpmq.
v.0.9.0 (Aug 1, 2020)
New Features

[New Features][**ASR**] Non-autoregressive ASR with Mask CTC #2070 by @YosukeHiguchi

[New Features][**ASR**] Support Conformer model. #2144 by @pengchengguo

[New Features][ASR][**ST**] CTC posterior visualization during training #2221 by @hirofumi0810

[New Features][**ESPnet2**] Implement espnet2.bin.zenodo_upload #2168 by @kamo-naoyuki

[New Features][**ESPnet2**] Python API for inference #2092 by @kamo-naoyuki

[New Features][**ESPnet2**] Support TTS-Transformer in ESPnet2 #2134 by @kan-bayashi

[New Features][ESPnet2][**ASR**] Enable batch joint decoding with CTC in recog API v2 #2197 by @takaaki-hori

[New Features][ESPnet2][**SE**] Speech Enhancement Frontend for ESPNet2 Phase 1 #2124 by @LiChenda

[New Features][ESPnet2][**TTS**] Support FastSpeech for ESPnet2 TTS #2149 by @kan-bayashi

[New Features][ESPnet2][**TTS**] Support FastSpeech2 (+FastPitch) #2218 by @kan-bayashi

[New Features][ESPnet2][**TTS**] Support GST in ESPnet2 TTS #2139 by @kan-bayashi

[New Features][README][**ASR**] CTC forced alignment in E2E ASR Transformer model #2095 by @simpleoier

[New Features][**VC**] Voice Transformer Network #2064 by @unilight

Enhancement

[Enhancement] Fix error when downloading large files using download_from_google_drive.sh #2074 by @unilight

[Enhancement][**ASR**] added more beam search info #2130 by @sw005320

[Enhancement][**ESPnet2**] Change packed file of espnet2 to zip format #2161 by @kamo-naoyuki

[Enhancement][**ESPnet2**] Make read_text faster #2114 by @kamo-naoyuki

[Enhancement][**ESPnet2**] RESULTS.md -> README.md #2077 by @kamo-naoyuki

[Enhancement][**ESPnet2**] Remove long wave in template recipe #2075 by @kamo-naoyuki

[Enhancement][**ESPnet2**] Update ESPnet2 JSUT TTS recipe and TTS template #2110 by @kan-bayashi

[Enhancement][MT][**ST**] Fix ST/MT models for compatibility with ASR #2179 by @hirofumi0810

[Enhancement][**ST**] Add source case information to json files in ST task #2208 by @hirofumi0810

[Enhancement][**ST**] Refactor multi-task learning in ST #2202 by @hirofumi0810
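
Packing model files into a zip archive, as in #2161 above, can be done with the Python standard library alone. This is a minimal sketch assuming a flat archive layout; `pack_files` and the file names in the usage comment are hypothetical and do not describe the layout the toolkit's packing tool actually produces.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def pack_files(files: Iterable[Path], outpath: Path) -> None:
    """Write `files` into a compressed zip archive at `outpath`.

    Illustrative only: every file is stored under its base name, so two
    inputs with the same name would collide.
    """
    with zipfile.ZipFile(outpath, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path, arcname=path.name)


# Example usage (paths are made up):
# pack_files([Path("exp/config.yaml"), Path("exp/model.pth")], Path("model.zip"))
```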

Recipe

[Recipe][**ASR**] Add aidatatang_200zh recipe #2122 by @nzhoward

[Recipe][**ASR**] Add chime6 info #2250 by @sw005320

[Recipe][**ASR**] CHiME-6 recipe #2171 by @GNroy

[Recipe][**ASR**] Fix a bug in espnet wsj recipe. #2145 by @houwenxin

[Recipe][**ASR**] New Recipe for Yoloxóchitl-Mixtec (SLR89) #2085 by @ftshijt

[Recipe][**ASR**] Support averaging model for Conformer. #2244 by @pengchengguo

[Recipe][**ASR**] Updated model after tuning aidatatang_200zh recipe #2204 by @nzhoward

[Recipe][**ASR**] created a recipe to run asr on ljspeech #1996 by @ibkuroyagi

[Recipe][**ASR**] update model link (add pre-trained bpe model and lm model) #2101 by @ftshijt

[Recipe][ESPnet2][**ASR**] espnet2 librispeech recipe #2109 by @sw005320

[Recipe][ESPnet2][**ASR**] espnet2 librispeech v2 #2189 by @sw005320

[Recipe][ESPnet2][**ASR**] update espnet2 aishell results #2150 by @Cescfangs

[Recipe][ESPnet2][ASR][**TTS**] fix dev_set/eval_sets issues #2142 by @sw005320

[Recipe][ESPnet2][**TTS**] Add ESPnet2 CSMSC TTS recipe #2129 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Add ESPnet2 LJSpeech recipe #2117 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Add VCTK recipe for ESPnet2 TTS #2165 by @kan-bayashi

[Recipe][ESPnet2][**TTS**] Create espnet2 jsut/tts recipe #2047 by @kamo-naoyuki

Refactoring

[Refactoring][**ESPnet2**] Change stats_dir naming not to overwrite #2111 by @kan-bayashi

[Refactoring][**ESPnet2**] Move modules #2086 by @kamo-naoyuki

[Refactoring][**ESPnet2**] Remove $KALDI_ROOT/tools/env.sh from path.sh #2242 by @kamo-naoyuki

[Refactoring][**ESPnet2**] Several updates for pretrained models #2212 by @kamo-naoyuki

[Refactoring][**ESPnet2**] Update Makefile #2225 by @kamo-naoyuki

Documentation

[README] Fix URL in README #2090 by @kan-bayashi

[README] Update README about TTS #2079 by @kan-bayashi

[README] Update README.md #2046 by @kamo-naoyuki

[README] Update README.md #2067 by @kamo-naoyuki

[README] Update README.md #2243 by @kamo-naoyuki

[README] Update citation #2206 by @hirofumi0810

[README] Update installation.md #2233 by @kamo-naoyuki

[README][**ESPnet2**] Update egs2/TEMPLATE/README.md #2098 by @kamo-naoyuki

Bugfix

[Bugfix] Add cupy.done in make python #2091 by @kan-bayashi

[Bugfix] Append a missing space in cmd-line args in utils/dump_pcm.sh #2209 by @yistLin

[Bugfix] Fix Makefile #2097 by @kamo-naoyuki

[Bugfix] Fix minor bug of Makefile #2055 by @kamo-naoyuki

[Bugfix] Fix old model compatibility #2048 #2060 #2063 by @kan-bayashi

[Bugfix] Fix pretrained model #2053 #2069 by @kan-bayashi

[Bugfix] Fix pyopenjtalk installation #2108 by @kan-bayashi

[Bugfix] Fix typo in run.sh of TTS recipes #2216 by @hirofumi0810

[Bugfix] Update Makefile to disable cupy for cuda=10.2 or later #2230 by @kamo-naoyuki

[Bugfix] fix path of PESQ #2058 by @kamo-naoyuki

[Bugfix] scorerinterface warning English correction #2076 by @qmpzzpmq

[Bugfix][**CI**] Fix bug in attention plotting #2185 by @hirofumi0810

[Bugfix][**CI**] Freeze the matplotlib version with 3.1.0 #2181 by @sw005320

[Bugfix][**CI**] fix integration_test_ctc_align_wav.bats with a small model #2170 by @simpleoier

[Bugfix][**CI**] temporarily disable subsample 6 and 8 tests #2205 by @sw005320

[Bugfix][CI][MT][**ST**] Add integration test for ST/MT tasks #2210 by @hirofumi0810

[Bugfix][**ESPnet2**] Add missing path.sh in egs2/vctk/tts1 #2167 by @kan-bayashi

[Bugfix][**ESPnet2**] Fix TTS inference #2222 by @kan-bayashi

[Bugfix][**ESPnet2**] Fix tts_inference when feats_extract is None #2176 by @kan-bayashi

[Bugfix][**ESPnet2**] Fix bug for feats_type=extracted #2087 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fix bug of iterable dataset when num_workers>=1 #2081 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fix bug when espnet2/bin/tokenize_text.py --cutoff or --vocabulary_size is used #2158 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Fix log: benchmark -> deterministic #2080 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Implement configargparse in espnet2 #2157 by @kamo-naoyuki

[Bugfix][**ESPnet2**] Select torchaudio version according to torch version #2214 by @kamo-naoyuki

[Bugfix][**ESPnet2**] avoid UnboundLocalError when lm is not loaded #2227 by @kamo-naoyuki

[Bugfix][**ESPnet2**] fix #2050 #2051 by @kamo-naoyuki

[Bugfix][**ESPnet2**] fix #2198: PhonemeTokenizer can't perform with multiprocessing #2201 by @kamo-naoyuki

[Bugfix][**ESPnet2**] fix best_model_criterion: wsj/asr1/conf/tuning/train_lm.yaml #2153 by @kamo-naoyuki

[Bugfix][**ESPnet2**] fix bug of lm.py #2056 by @kamo-naoyuki

[Bugfix][**ESPnet2**] fix the stage number: enh.sh #2220 by @kamo-naoyuki

[Bugfix][**ESPnet2**] fix: decode_config -> inference_config #2239 by @kamo-naoyuki

[Bugfix][ESPnet2][**Recipe**] Not removing short/long utterances for eval_sets #2112 by @kamo-naoyuki

[Bugfix][ESPnet2][**SE**] Fix bugs in espnet2/enh and format related directory structures #2215 by @Emrys365

[Bugfix][ESPnet2][**TTS**] Fix feature extractor of TTS for compatibility #2102 by @kamo-naoyuki

Acknowledgements

Special thanks to @Cescfangs, @Emrys365, @GNroy, @LiChenda, @YosukeHiguchi, @ftshijt, @hirofumi0810, @houwenxin, @ibkuroyagi, @kamo-naoyuki, @kan-bayashi, @nzhoward, @pengchengguo, @qmpzzpmq, @simpleoier, @sw005320, @takaaki-hori, @unilight, @yistLin.
v.0.8.0 (Jun 16, 2020)
ESPnet2

[ESPnet2] Solve memory issue with super large corpus training #1972 by @kamo-naoyuki

[ESPnet2] Added model parameter count to trainer #1867 by @SeanNaren

[ESPnet2] Refactoring espnet2/utils/fileio.py -> espnet2/fileio #1807 by @kamo-naoyuki

New Features

[New Features] Lightweight and Dynamic Convolutions. #1599 by @yuyfujit

[New Features] Implement Ngram scorer #1946 by @qmpzzpmq

[New Features] resampling in utils/compute-fbank-feats.py and utils/compute-stft-feats.py #2035 by @kamo-naoyuki

Enhancement

[Enhancement] Ngram scorer update #1992 by @qmpzzpmq

Documentation

[Documentation] fix a typo for the decoder add_argument_group #2030 by @sw005320

[Documentation] Update multiple GPU descriptions. #2016 by @sw005320

[Documentation] Finetuning doc + freezing parameters option #1897 by @b-flo

Bugfix

[Bugfix] Fix memory issue when resuming #2040 by @kamo-naoyuki

[Bugfix] fixed typo in cmvn.py #1988 by @gullyboy007

[Bugfix] update notebook #1986 by @ShigekiKarita

[Bugfix] Fix freezing modules (when using multi-gpu) #1983 by @atozto9

[Bugfix] Fix BLEU/PPL calculation during training #2009 by @hirofumi0810

[Bugfix] Fix download file extension #2042 by @takenori-y

[Bugfix] fix tedlium2/3 model link #2032 by @sw005320

[Bugfix] Fix bug for pure Transformer-CTC #2023 by @hirofumi0810

[Bugfix] li42 recipe: add li42 results; fix bug in adding language id "zh_TW" #1950 by @houwenxin

CI

[CI] Add espnet2 in ci/doc.sh #1976 by @ShigekiKarita

[CI] Add test for pytorch1.5 #1881 by @kamo-naoyuki

Acknowledgements

Special thanks to @SeanNaren, @ShigekiKarita, @atozto9, @b-flo, @gullyboy007, @hirofumi0810, @houwenxin, @kamo-naoyuki, @qmpzzpmq, @sw005320, @takenori-y, @yuyfujit.
v.0.7.0 (May 24, 2020)
Now, the ESPnet project moves on to a new endeavor! We launched espnet2, which aims to improve modularity (chainer-free, kaldi-free), use a more customizable trainer, support distributed training, and achieve scalability. The effort is mainly led by @kamo-naoyuki, with his great efforts and leadership. This project is one of the outcomes of our ESPnet hackathon in Tokyo 2019, which involved a lot of discussion about the design, new features, and community contributions. espnet2 currently supports the main ASR recipes (with a well-designed recipe template) and a limited number of TTS recipes. We maintain both espnet1 and espnet2, but will gradually move our development to espnet2. The ESPnet project is further accelerated!

ESPnet2

[ESPnet2] keep the latest model #1769 by @kamo-naoyuki

[ESPnet2] Remove "E2E" from all comments #1766 by @kamo-naoyuki

[ESPnet2] Refactoring for ESPnetDataset #1758 by @kamo-naoyuki

[ESPnet2] Implement SpecAug for ESPnet2 #1746 by @kamo-naoyuki

[ESPnet2] Implement BatchBinSampler #1742 by @kamo-naoyuki

[ESPnet2] Support torch_optimizer #1739 by @kamo-naoyuki

[ESPnet2] Log rotation for launch.py #1737 by @kamo-naoyuki

[ESPnet2] Change the type of --chunk_length to str_or_int #1733 by @kamo-naoyuki

[ESPnet2] Change cudnn deterministic mode to default #1732 by @kamo-naoyuki

[ESPnet2] Add wsj results for espnet2 #1724 by @kamo-naoyuki

[ESPnet2] Show estimated time to finish #1717 by @kamo-naoyuki

[ESPnet2] Add --name option for training job #1714 by @kamo-naoyuki

[ESPnet2] Show the log file when the training process fails: espnet2.bin.launch.py #1713 by @kamo-naoyuki

[ESPnet2] --max_length -> --fold_length #1712 by @kamo-naoyuki

[ESPnet2] Double quoter for NCCL_SOCKET_IFNAME #1706 by @kamo-naoyuki

[ESPnet2] Save apex state in checkpoint and support apex optimizer #1705 by @kamo-naoyuki

[ESPnet2] Update asr.sh #1694 by @zh794390558

[ESPnet2] Update ctc.py #1688 by @zh794390558

[ESPnet2] Update launch.py #1681 by @zh794390558

[ESPnet2] Update asr.sh #1678 by @zh794390558

[ESPnet2] --keep_n_best_checkpoints -> --keep_nbest_models #1647 by @kamo-naoyuki

[ESPnet2] Avoid deprecated warning: reduction="none" #1510 by @kamo-naoyuki

[ESPnet2] Minor change for speed perturbation #1627 by @kamo-naoyuki

[ESPnet2] Fix how2 recipe #1620 by @kamo-naoyuki

[ESPnet2] Fix recipes #1617 by @kamo-naoyuki

[ESPnet2] Renaming #1610 by @kamo-naoyuki

[ESPnet2] Implement chunk iterator #1608 by @kamo-naoyuki

[ESPnet2] Update voxforge RESULTS #1601 by @kamo-naoyuki

[ESPnet2] vivos recipe: --audio_format wav #1592 by @kamo-naoyuki

[ESPnet2] Lower python requirements to 3.6 #1565 by @kamo-naoyuki

[ESPnet2] dirha_wsj recipe for espnet2 #1556 by @yuekaizhang

[ESPnet2] Update AISHELL ASR Recipe #1549 by @Emrys365

[ESPnet2] Remove short data #1531 by @kamo-naoyuki

[ESPnet2] [WIP] Update JSUT ASR Recipe #1529 by @YosukeHiguchi

[ESPnet2] Update HOW2 recipe #1522 by @b-flo

[ESPnet2] [WIP] Update CSJ ASR Recipe #1520 by @YosukeHiguchi

[ESPnet2] Change NoamLR to deprecated and implement WarmupLR #1519 by @kamo-naoyuki

[ESPnet2] Implement --max_cache_size option #1509 by @kamo-naoyuki

[ESPnet2] distributed training #1506 by @kamo-naoyuki

[ESPnet2] ESPNet2 Recipe Update -- commonvoice, babel, ami #1504 by @ftshijt

[ESPnet2] Refactoring #1494 by @kamo-naoyuki

[ESPnet2] Fix ci of flake8 part #1491 by @kamo-naoyuki

[ESPnet2] Tensorboard, --num_iters_per_epoch, etc. #1487 by @kamo-naoyuki

[ESPnet2] Fix espnet2.bin.pack #1486 by @kamo-naoyuki

[ESPnet2] show_result.sh #1478 by @kamo-naoyuki

[ESPnet2] Pack and Unpack model #1477 by @kamo-naoyuki

[ESPnet2] collect-stats mode, trainer class, etc. #1462 by @kamo-naoyuki

[ESPnet2] add test codes for asr decoders #1445 by @kamo-naoyuki

[ESPnet2] Integrate Griffin-Lim with tts_decode() #1442 by @kan-bayashi

[ESPnet2] Update ASR recipe #1439 by @kan-bayashi

[ESPnet2] Update TTS recipes #1430 by @kan-bayashi

[ESPnet2] Disable wer/cer calculation when training #1547 by @kamo-naoyuki

[ESPnet2] Change CTC default to builtin #1546 by @kamo-naoyuki

[ESPnet2] Update chime4 asr1 Recipe #1570 by @yuekaizhang

[ESPnet2] Create documentation for espnet2 #1710 by @kamo-naoyuki

[ESPnet2] shellcheck for local/data.sh #1524 by @kamo-naoyuki

[ESPnet2] commonvoice: RESULTS.md -> README.md #1797 by @kamo-naoyuki
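
The WarmupLR scheduler mentioned in #1519 above can be summarized, as far as we understand it, as a schedule that rises linearly for a number of warmup steps and then decays with the inverse square root of the step, peaking at the base learning rate. The sketch below assumes that form; the function name `warmup_lr` and its default value are illustrative and are not the ESPnet2 API.

```python
def warmup_lr(base_lr: float, step: int, warmup_steps: int = 25000) -> float:
    """Learning rate at a 1-based `step` for a warmup + inverse-sqrt schedule.

    Assumed form (illustrative only):
        lr = base_lr * warmup_steps**0.5
             * min(step**-0.5, step * warmup_steps**-1.5)
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return (
        base_lr
        * warmup_steps ** 0.5
        * min(step ** -0.5, step * warmup_steps ** -1.5)
    )
```

Unlike a Noam-style schedule, this form does not depend on the model dimension, so the peak value equals `base_lr` and is reached exactly at `step == warmup_steps`.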

Bugfix

[Bugfix] % -> percent: espnet2/tasks/abs_task.py #1767 by @kamo-naoyuki

[Bugfix] Fix gpu mode for tts_inference.py #1755 by @kamo-naoyuki

[Bugfix] Fix SubReporter #1748 by @kamo-naoyuki

[Bugfix] Fix calculate_all_attentions for espnet2 #1747 by @kamo-naoyuki

[Bugfix] Not to create the averaged model if --keep_nbest_models=1 #1744 by @kamo-naoyuki

[Bugfix] Fix --best_model_criterions #1743 by @kamo-naoyuki

[Bugfix] Fix the gpu device when resuming #1731 by @kamo-naoyuki

[Bugfix] Fix error log for espnet2/bin/launch.py #1730 by @kamo-naoyuki

[Bugfix] Disable CUDNN deterministic for CTC: espnet2/asr/ctc.py #1720 by @kamo-naoyuki

[Bugfix] Update default.py #1698 by @zh794390558

[Bugfix] Fix chunk iterator and refactoring for distributed training #1685 by @kamo-naoyuki

[Bugfix] Update vgg_rnn_encoder.py #1676 by @zh794390558

[Bugfix] [ESPnet2] chmod +x: run.sh for JSUT #1628 by @kamo-naoyuki

[Bugfix] [ESPnet2] Remove nlsyms when word scoring #1614 by @kamo-naoyuki

[Bugfix] [ESPnet2] Fix setup.sh #1596 by @kamo-naoyuki

[Bugfix] [ESPnet2] Fix launch.py for slurm #1588 by @kamo-naoyuki

[Bugfix] [ESPnet2] Fix ci for local/data.sh #1572 by @kamo-naoyuki

[Bugfix] [ESPnet2] Fix nj of scripts/audio/format_wav_scp.sh #1550 by @kamo-naoyuki

[Bugfix] [ESPnet2] Use load_scp_sequential in format_wav_scp.py #1541 by @kamo-naoyuki

[Bugfix] [ESPnet2] Minor fix for CSJ recipe #1540 by @kamo-naoyuki

[Bugfix] [ESPnet2] Fix transformer #1539 by @kamo-naoyuki

[Bugfix] [ESPnet2] fix rnn_type when bidirectional is used #1533 by @kamo-naoyuki

[Bugfix] [ESPnet2] Fix format_wav_scp.py #1532 by @kamo-naoyuki

[Bugfix] [ESPnet2] Fix bug of using GPU even if CPU mode #1526 by @kamo-naoyuki

[Bugfix] [ESPnet2] Fix --accum_grad #1525 by @kamo-naoyuki

[Bugfix] [ESPnet2] Fix voxforge config #1511 by @kamo-naoyuki

[Bugfix] [ESPnet2] Bug fix of splitting files for collect_stats mode #1505 by @kamo-naoyuki

[Bugfix] fix to use queue.conf #1431 by @sw005320

[Bugfix] [ESPnet2] Fix a bug in TTS #1428 by @kan-bayashi

[Bugfix] [ESPnet2] Refactor Encoder and Decoder and bug fix #1427 by @kamo-naoyuki

[Bugfix] [ESPnet2] Fix bug of text-chars converter #1426 by @kamo-naoyuki

[Bugfix] Optionize trans_type in egs/ljspeech/tts2 #1789 by @kan-bayashi

[Bugfix] bugfix in ljspeech/tts2 #1783 by @beckgom

[Bugfix] missing argument for local/data_prep.sh added #1782 by @beckgom

[Bugfix] avoid sentencepiece==0.1.90 #1923 by @kamo-naoyuki

[Bugfix] FIX E523,E541,E741 #1918 by @kamo-naoyuki

[Bugfix] fix reverse option for cmvn #1906 by @magictron

[Bugfix] Error handling for Transformer with CTC-based VAD #1875 by @takenori-y

[Bugfix] Revert deletion of init files #1842 by @Fhrozen

[Bugfix] fix the missing link of tedlium3 #1841 by @sw005320

[Bugfix] Add test for torch>1.1 #1840 by @kamo-naoyuki

[Bugfix] Fix #1808: change the argument order of --batch_type for collect stat… #1810 by @kamo-naoyuki

[Bugfix] Change to configargparse>=1.2.1 #1803 by @kamo-naoyuki

[Bugfix] typo fixed for attention type #1793 by @beckgom

[Bugfix] fix https://github.com/espnet/espnet/issues/1780 #1784 by @qmeeus

[Bugfix] Fix bug of espnet2 asr_inference.py #1952 by @kamo-naoyuki

[Bugfix] Minor fix of import place and comments #1959 by @kan-bayashi

New Features

[New Features] Add utils/translate_wav.sh #1530 by @ShigekiKarita

[New Features] Batch beam search V2 for Transformer (no CTC) #1402 by @ShigekiKarita

Enhancement

[Enhancement] Support multiple sentences in synth_wav.sh #1788 by @kan-bayashi

[Enhancement] fix+update transducer #1760 by @b-flo

Documentation

[Documentation] Update notebook #1963 by @kan-bayashi

[Documentation] Update installation manual #1960 by @kan-bayashi

[Documentation] Update installation.md #1957 by @kamo-naoyuki

[Documentation] Add note in synth_wav.sh #1785 by @kan-bayashi

[Documentation] Update docs #1954 #1955 by @kamo-naoyuki

[Documentation] Update docs #1938 by @kamo-naoyuki

[Documentation] docs: added fbank link to the experiment readme #1910 by @kdubovikov

Recipe

[Recipe] Added some TIMIT results #1819 by @sknadig

[Recipe] add recipe for French Polyphone: ELRA-S0030_02 #1711 by @AdolfVonKleist

[Recipe] Use espnet_tts_frontend #1794 by @kamo-naoyuki

CI

[CI] Use cache in actions #1917 by @ShigekiKarita

[CI] Apply black #1850 by @kamo-naoyuki

[CI] Create .mergify.yml #1813 by @kamo-naoyuki

Acknowledgements

Special thanks to @AdolfVonKleist, @Emrys365, @Fhrozen, @ShigekiKarita, @YosukeHiguchi, @beckgom, @b-flo, @ftshijt, @kamo-naoyuki, @kan-bayashi, @kdubovikov, @magictron, @qmeeus, @sknadig, @sw005320, @takenori-y, @yuekaizhang, @zh794390558.
v.0.6.3(Apr 7, 2020)
New Features

[New Features] VCC2020 baseline recipe #1641 by @unilight

[New Features] Embed defaultlm #1623 by @qmpzzpmq

Enhancement

[Enhancement] add test -d $(KALDI): tools/Makefile #1718 by @kamo-naoyuki

[Enhancement] Add option to load pretrained model in TTS #1639 by @kan-bayashi

[Enhancement] Add reverse_direction option to MT #1658 by @hirofumi0810

Recipe

[Recipe] Remove unnecessary lines on Fisher-CallHome Spanish #1650 by @hirofumi0810

[Recipe] Add the Aishell2 recipe for the master branch. #1615 by @pengchengguo

[Recipe] Reformat the RESULTS.md in vivos #1689 by @sw005320

Documentation

[Documentation] Added multiple GPU TIPS #1734 by @sw005320

[Documentation] added pure attention decoding TIPS #1725 by @sw005320

Docker

[Docker] Docker local updates #1677 by @Fhrozen

[Docker] Docker updates #1624 by @Fhrozen

Bugfix

[Bugfix] fix #1751 #1779 by @qmpzzpmq

[Bugfix] Fix v.0.3.0 pretrained Transformer model compatibility #1778 by @ShigekiKarita

[Bugfix] Fix torch.ctc not implemented in float16 by casting float32 #1777 by @ShigekiKarita

[Bugfix] Workaround for bug of configargparse==1.2 #1764 by @kamo-naoyuki

[Bugfix] change train_iter to be the dataloader object #1741 by @bobchennan

[Bugfix] fix #1634 #1719 by @kamo-naoyuki

[Bugfix] [VCC2020 baseline] Extra reference set #1684 by @unilight

[Bugfix] missing torch version in check_install.py #1675 by @beckgom

[Bugfix] Fix model link in the tedlium2 recipe #1662 by @sw005320

[Bugfix] Update Install for Pytorch version #1659 by @Fhrozen

[Bugfix] Fix lm compatibility for v2 #1653 by @kan-bayashi

[Bugfix] correct results with builtin CTC and PyTorch 1.3 in WSJ recipe #1652 by @Emrys365

[Bugfix] Fix lm backward compatibility #1649 by @kan-bayashi

[Bugfix] fix #1604 #1626 by @TitouanT

[Bugfix] Fix a bug in csmsc recipe #1618 by @kan-bayashi

[Bugfix] Update e2e_asr_common.py #1735 by @zh794390558

[Bugfix] remove non-available options #1738 by @sw005320

Acknowledgements

Special thanks to @Emrys365, @Fhrozen, @ShigekiKarita, @TitouanT, @beckgom, @bobchennan, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @pengchengguo, @qmpzzpmq, @sw005320, @unilight, @zh794390558.
v.0.6.2(Feb 25, 2020)
New Features

[New Features] Transducer v3 (w/ transformer support for encoder/decoder) #1422 by @b-flo

[New Features] Improving LM training (custom optimizer, custom scheduler, Transformer LM, etc) #1246 by @ShigekiKarita

Enhancement

[Enhancement] Add MelGAN pretrained model and support in demo notebook #1581 by @kan-bayashi

Recipe

[Recipe] Update fisher-callhome results #1606 by @hirofumi0810

[Recipe] Update run_rnnt.sh #1602 by @qmpzzpmq

[Recipe] Upload Must-C models #1594 by @hirofumi0810

[Recipe] Upload Libri trans models #1569 by @hirofumi0810

[Recipe] Upload How2 models #1568 by @hirofumi0810

[Recipe] Add Mboshi-French corpus #1545 by @hirofumi0810

[Recipe] Update WSJ results using PyTorch 1.3.1 and builtin CTC #1527 by @Emrys365

[Recipe] [WIP] IWSLT2016 Recipe #1492 by @butsugiri

[Recipe] Update for Common Voice recipe & Multilingual training recipe #1485 by @ftshijt

[Recipe] [WIP] DiPCo Recipe #1472 by @Fhrozen

Documentation

[Documentation] Support markdown-table for sphinx #1611 by @kamo-naoyuki

[Documentation] update docs & README.md #1605 by @kamo-naoyuki

[Documentation] fix a link within README.md #1584 by @sw005320

[Documentation] Add MT result #1576 by @butsugiri

[Documentation] update readme to include Linux installation guides from CI #1567 by @sw005320

[Documentation] Update WSJ results in the main README.md #1537 by @Emrys365

Bugfix

[Bugfix] Fix a typo in AMI script? #1595 by @HuangZiliAndy

[Bugfix] ru_open_stt recipe bug fix #1589 by @qmpzzpmq

[Bugfix] Fix pure CTC decoding #1580 by @takaaki-hori

[Bugfix] fix snapshot/model test condition #1577 by @IceCreamWW

[Bugfix] Fix IWSLT16 Script Permission #1543 by @butsugiri

[Bugfix] Fix bug in MT training script #1515 by @hirofumi0810

[Bugfix] Use Markdown table instead for WER results #1514 by @lijunzh

[Bugfix] Fix a compatibility problem with PyTorch 1.3.0 in ESPnet (v0.6.0) #1421 by @Emrys365

Acknowledgements

Special thanks to @Emrys365, @Fhrozen, @HuangZiliAndy, @IceCreamWW, @ShigekiKarita, @b-flo, @butsugiri, @ftshijt, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @lijunzh, @qmpzzpmq, @sw005320, @takaaki-hori.
v.0.6.1(Jan 5, 2020)
Happy new year!

New Features

[New Features] Transformer NMT #1479 by @hirofumi0810

[New Features] Support knowledge distillation in FastSpeech training #1415 by @kan-bayashi

[New Features] Support attention constraint for Tacotron 2 #1407 by @kan-bayashi

Enhancement

[Enhancement] Add focus rate logging in decoding #1412 by @kan-bayashi

[Enhancement] Support Tacotron 2 as a teacher of FastSpeech #1406 by @kan-bayashi

[Enhancement] Support length-weighted normalization in loss calculation #1397 by @kan-bayashi

[Enhancement] Transformer End-to-End Speech Translation #1348 by @hirofumi0810

Recipe

[Recipe] Add LM training/decoding in swbd recipe #1463 by @YosukeHiguchi

[Recipe] Add Fisher-CallHome asr1b recipe #1390 by @hirofumi0810

[Recipe] RECIPE JESC for MT #1346 by @Fhrozen

Documentation

[Documentation] added interspeech 2019 tutorial link and performed spell check #1476 by @sw005320

[Documentation] Updated README in ljspeech about FastSpeech training #1468 by @kan-bayashi

[Documentation] Add knowledge dist based FastSpeech link in README #1465 by @kan-bayashi

Refactoring

[Refactoring] Unify TTS Transformer mask with ASR Transformer #1470 by @kan-bayashi

Bugfix

[Bugfix] fixed a small problem in run.sh #1466 by @Peidong-Wang

[Bugfix] Fix wrong SC2026 fixing #1458 by @kan-bayashi

[Bugfix] Fix multi-encoder ASR integration test #1432 by @ShigekiKarita

[Bugfix] Fix wrong type float -> int #1413 by @kan-bayashi

[Bugfix] Fix missing key error in Tacotron2 #1408 by @kan-bayashi

[Bugfix] TransformerST on Fisher-Callhome #1398 by @hirofumi0810

[Bugfix] fix rnnlm load bug #1391 by @Cescfangs

[Bugfix] Fix gradient accumulation #1388 by @hirofumi0810

Acknowledgements

Special thanks to @Cescfangs, @Fhrozen, @Peidong-Wang, @ShigekiKarita, @YosukeHiguchi, @hirofumi0810, @kan-bayashi, @sw005320.
v.0.6.0(Nov 21, 2019)
New Features

[New Features] Support Parallel WaveGAN #1333 by @kan-bayashi

[New Features] Support save snapshot by iteration #1204 by @fanlu

[New Features] Multi-encoder architecture with hierarchical attention and per-encoder CTC #1193 by @ruizhilijhu

[New Features] Support multiple inputs #1180 by @ruizhilijhu

[New Features] Add E2E-ST specific modules #1139 by @hirofumi0810

Enhancement

[Enhancement] Fixing compatibility problems with PyTorch 1.3.0 in ESPnet (v0.5.3) #1343 by @Emrys365

[Enhancement] Change log level info -> warning about batchsize #1336 by @kan-bayashi

[Enhancement] Support batch decoding for streaming E2E #1270 by @takenori-y

[Enhancement] Implement attention cache in Transformer for faster decoding #1240 by @ShigekiKarita

Bugfix

[Bugfix] Fix pretrained model URL for master #1351 by @kan-bayashi

[Bugfix] Return parser in add_arguments method for transducer #1337 by @b-flo

[Bugfix] Disabling nonlinear activation of the last encoder layer #1323 by @simpleoier

[Bugfix] Fixed error: "Expected object of device type cuda but got device type cpu" in decoder of transducer #1315 by @rai4

[Bugfix] Fix ASR eval for TTS in the case of trans_type=phn #1368 by @kan-bayashi

[Bugfix] Make --preprocess_conf optional in pack_model.sh #1365 by @kan-bayashi

[Bugfix] Remove set start method to fix #1290 #1363 by @kan-bayashi

[Bugfix] Fix pretrained model URL #1354 by @kan-bayashi

[Bugfix] Fix pretrained model URL #1350 by @kan-bayashi

[Bugfix] Fix TTS transformer attention weight calculation in inference #1331 by @kan-bayashi

[Bugfix] Fix decoding for chainer transformer #1101 by @Fhrozen

Recipe

[Recipe] Update libri_trans asr recipe #1344 by @hirofumi0810

[Recipe] Update LJSpeech to limit frequency range #1330 by @kan-bayashi

[Recipe] IWSLT19 Speech Translation recipe #1169 by @hirofumi0810

[Recipe] Must-C NMT recipe #1168 by @hirofumi0810

[Recipe] How2 NMT recipe #1165 by @hirofumi0810

[Recipe] Update how2 recipe #1148 by @hirofumi0810

[Recipe] Pre-trained CSJ model #1341 by @takenori-y

[Recipe] TTS: add FastSpeech config and result for jsut #1321 by @r9y9

[Recipe] ASR Common Voice recipe update #1241 by @ftshijt

Documentation

[Documentation] Update notebook submodule #1367 by @kan-bayashi

[Documentation] Fix sphinx warning of TTS modules #1366 by @kan-bayashi

[Documentation] Update notebook and add to Sphinx document #1364 by @kan-bayashi

[Documentation] Update notebook #1352 by @kan-bayashi

[Documentation] Doc for Chainer transformer #1017 by @Fhrozen

[Documentation] Update README #1342 by @takenori-y

Refactoring

[Refactoring] Indirect call for training method [chainer] #1256 by @Fhrozen

[Refactoring] Refactor transformer for transformer LM #1223 by @Fhrozen

[Refactoring] Refine NMT #1152 by @hirofumi0810

[Refactoring] Small changes in chainer backend #1110 by @Fhrozen

[Refactoring] Format Chainer E2E transformer forward (fixed) #1034 by @Fhrozen

Acknowledgements

Special thanks to @Emrys365, @Fhrozen, @ShigekiKarita, @b-flo, @fanlu, @ftshijt, @hirofumi0810, @kan-bayashi, @r9y9, @rai4, @ruizhilijhu, @simpleoier, @takenori-y.
v.0.5.4(Oct 30, 2019)
Bugfix

[Bugfix] Fixed pretrained model URL in CSMSC recipe #1314 by @kan-bayashi

[Bugfix] Fix CSMSC wavenet link #1298 by @kan-bayashi

[Bugfix] Minor fix of FastSpeech #1295 by @kan-bayashi

[Bugfix] [bug fixing] Using inplace masked_fill_() #1273 by @Emrys365

[Bugfix] Fix RuntimeError in setting spawn multiple times #1267 by @kan-bayashi

[Bugfix] Use spawn in multiprocessing to fix #404 #1251 by @kan-bayashi

Documentation

[Documentation] Update README.md #1309 by @kan-bayashi

[Documentation] Fix docstrings #1288 by @kan-bayashi

[Documentation] Fixed a typo in swbd asr1 #1220 by @Shujian2015

[Documentation] update notebook #1219 by @ShigekiKarita

Recipe

[Recipe] Update VAIS1000 recipe RESULTS.md #1308 by @kan-bayashi

[Recipe] Fix VAIS1000 recipe #1305 by @kan-bayashi

[Recipe] Update CSMSC results #1299 by @kan-bayashi

[Recipe] Add vais1000 recipe - Vietnamese TTS #1283 by @enamoria

[Recipe] Add VIVOS recipe - Vietnamese ASR #1271 by @hieuthi

[Recipe] Add JNAS tts1 recipe #1269 by @kan-bayashi

[Recipe] Support Polish speakers in M-AILABS #1265 by @kan-bayashi

[Recipe] Add TWEB recipe #1263 by @kan-bayashi

[Recipe] Update M-AILABS results #1262 by @kan-bayashi

[Recipe] Add CSMSC recipe #1259 by @kan-bayashi

[Recipe] Add JVS recipe #1258 by @kan-bayashi

[Recipe] Add CMU Arctic recipes #1257 by @kan-bayashi

[Recipe] Add M-AILABS pretrained models #1229 by @kan-bayashi

New Features

[New Features] Add eval-interval-epochs for the tiny dataset #1306 by @kan-bayashi

[New Features] ASR-based CER/WER eval for TTS #1190 by @potato-inoue

Enhancement

[Enhancement] Add Mandarin Pretrained Wavenet #1292 by @kan-bayashi

[Enhancement] Add pretrained models: JSUT and LibriTTS #1260 by @r9y9

[Enhancement] Improved JSUT TTS recipe #1216 by @r9y9

Acknowledgements

Special thanks to @Emrys365, @ShigekiKarita, @Shujian2015, @enamoria, @hieuthi, @kan-bayashi, @potato-inoue, @r9y9.