NeMo: a toolkit for conversational AI

Overview

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Introduction

NeMo is a toolkit for creating Conversational AI applications.

NeMo product page.

Introductory video.

The toolkit comes with extendable collections of pre-built modules and ready-to-use models for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Speech Synthesis (TTS).

Built for speed, NeMo can utilize NVIDIA's Tensor Cores and scale out training to multiple GPUs and multiple nodes.

Requirements

  1. Python 3.6 or above
  2. PyTorch 1.7.1 or above
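
To quickly verify that your environment meets these requirements, you can run a check like the following (a minimal sketch; it assumes python is on your PATH and PyTorch is already installed):

python -c "import sys; assert sys.version_info >= (3, 6), sys.version"
python -c "import torch; print(torch.__version__)"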

Installation

Pip

Use this installation mode if you want the latest released version.

apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython
pip install nemo_toolkit[all]==1.0.0b3

Pip from source

Use this installation mode if you want a version from a particular GitHub branch (e.g., main).

apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython
python -m pip install git+https://github.com/NVIDIA/NeMo.git@{BRANCH}#egg=nemo_toolkit[all]

From source

Use this installation mode if you are contributing to NeMo.

apt-get update && apt-get install -y libsndfile1 ffmpeg
git clone https://github.com/NVIDIA/NeMo
cd NeMo
./reinstall.sh

Docker containers

The easiest way to start training with NeMo is by using NeMo's container. It has all requirements and NeMo 1.0.0b3 already installed.

docker run --gpus all -it --rm --shm-size=8g \
-p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit \
stack=67108864 --device=/dev/snd nvcr.io/nvidia/nemo:1.0.0b3

If you choose to work with the main branch, we recommend using NVIDIA's PyTorch container version 20.11-py3 and then installing from GitHub.

docker run --gpus all -it --rm -v <nemo_github_folder>:/NeMo --shm-size=8g \
-p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit \
stack=67108864 --device=/dev/snd nvcr.io/nvidia/pytorch:20.11-py3

Examples

The simplest application with NeMo (runs in Google Colab; no local installation necessary).
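
If you prefer to try the API locally instead, the sketch below loads a pretrained ASR model and transcribes an audio file. This is a minimal, illustrative sketch: the model name "QuartzNet15x5Base-En" and sample.wav are placeholders, and the exact API may differ between releases.

python - <<'EOF'
import nemo.collections.asr as nemo_asr

# Download a pretrained ASR model from NGC (cached locally after the first call).
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")

# Transcribe a local audio file (mono 16 kHz WAV works best).
print(asr_model.transcribe(paths2audio_files=["sample.wav"]))
EOF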

Many more examples can be found in the "examples" folder.

Documentation

| Version | Status | Description |
| ------- | ------ | ----------- |
| Latest | Documentation Status | Documentation of the latest (i.e. main) branch |
| Stable | Documentation Status | Documentation of the stable (i.e. v1.0.0b1) branch |

Getting help with NeMo

FAQ can be found on NeMo's Discussions board. You are welcome to ask questions or start discussions there.

Tutorials

The best way to get started with NeMo is to check out one of our tutorials.

Most NeMo tutorials can be run on Google's Colab.

To run tutorials:

  • Click on Colab link (see table below)
  • Connect to an instance with a GPU (Runtime -> Change runtime type -> select "GPU" for hardware accelerator)
| Domain | Title | GitHub URL |
| ------ | ----- | ---------- |
| NeMo | Simple Application with NeMo | Voice swap app |
| NeMo | Exploring NeMo Fundamentals | NeMo primer |
| NeMo Models | Exploring NeMo Model Construction | NeMo models |
| ASR | ASR with NeMo | ASR with NeMo |
| ASR | ASR with Subword Tokenization | ASR with Subword Tokenization |
| ASR | Speech Commands | Speech commands |
| ASR | Speaker Recognition and Verification | Speaker Recognition and Verification |
| ASR | Online Noise Augmentation | Online noise augmentation |
| ASR | Beam Search and External Language Model Rescoring | Beam search and external language model rescoring |
| NLP | Using Pretrained Language Models for Downstream Tasks | Pretrained language models for downstream tasks |
| NLP | Exploring NeMo NLP Tokenizers | NLP tokenizers |
| NLP | Text Classification (Sentiment Analysis) with BERT | Text Classification (Sentiment Analysis) |
| NLP | Question Answering with SQuAD | Question answering SQuAD |
| NLP | Token Classification (Named Entity Recognition) | Token classification: named entity recognition |
| NLP | Joint Intent Classification and Slot Filling | Joint Intent and Slot Classification |
| NLP | GLUE Benchmark | GLUE benchmark |
| NLP | Punctuation and Capitalization | Punctuation and capitalization |
| NLP | Named Entity Recognition - BioMegatron | Named Entity Recognition - BioMegatron |
| NLP | Relation Extraction - BioMegatron | Relation Extraction - BioMegatron |
| TTS | Speech Synthesis | TTS inference |
| TTS | Speech Synthesis | Tacotron2 training |
| Tools | CTC Segmentation | CTC Segmentation |
| Tools | Text Normalization for Text To Speech | Text Normalization |

Contributing

We welcome community contributions! Please refer to CONTRIBUTING.md for the process.

License

NeMo is released under the Apache 2.0 license.

Comments
  • T5 pipeline parallel

    T5 pipeline parallel

    What does this PR do ?

    Adds pipeline parallel training support to T5.

    Collection: NLP

    Changelog

    • TBD

    Usage

    Set pipeline_model_parallel_size=2 (or 4) in megatron_t5_config.yaml, as sketched below.
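
    As a rough sketch, the override could also be passed on the command line when launching training; the script path below is an assumption based on the NeMo examples layout, not part of this PR:

    python examples/nlp/language_modeling/megatron_t5_pretraining.py \
        --config-name=megatron_t5_config \
        model.pipeline_model_parallel_size=2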

    Before your PR is "Ready for review"

    Pre checks:

    • [ ] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [ ] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    opened by MaximumEntropy 70
  • NLP refactoring - Stage 2

    NLP refactoring - Stage 2

    Signed-off-by: Evelina Bakhturina [email protected]

    Stage 2 of NLP refactoring:

    • Cleaning up and restructuring of functions and files in nlp collection.
    • Cleaning up losses:
      • Added a weighting option to LossAggregatorNM
      • Moved LossAggregatorNM to losses.py in the backend common
      • Split JointIntentSlotLoss into two separate common losses and removed it
      • Merged MaskedLanguageModelingLossNM, PaddedSmoothedCrossEntropyLossNM and SmoothedCrossEntropyLoss into a unified SmoothedCrossEntropyLoss
      • Renamed QuestionAnsweringLoss to the more general SpanningLoss
      • Renamed TRADEMaskedCrossEntropy to the more general MaskedXEntropyLoss
      • Removed TokenClassificationLoss, CrossEntropyLoss3D and JointIntentSlotLoss
      • Added weighting and masking support to CrossEntropyLossNM
      • Added dynamic port sizes to CrossEntropyLossNM
      • Renamed CrossEntropyLoss to CrossEntropyLossNM to prevent confusion with PyTorch's CrossEntropyLoss
    opened by ekmb 60
  • Dialogue task

    Dialogue task

    What does this PR do ?

    Add various functionalities to dialogue domain for NeMo

    Collection: NLP

    Changelog

    1. Support Zero Shot Intent Recognition
    2. Further refactored Dialogue module
    3. Implement Dialogue GPT Generation Model
    4. Support MS Marco Data Processor
    5. Implement Dialogue S2S Generation Model (HF fully supported, Megatron training supported, inference pending integration of common generation API)
    6. Support System Response Generation using user utterance and system slots based on SGD dataset
    7. Support Design Data Processor
    8. Implement HF BART based classifier into zero shot intent model
    9. Implement Dialogue Nearest Neighbour Model
    10. Refactor Dialogue SGD Data Processor to make interface with models cleaner
    11. Update Nearest Neighbour Model and ZeroShotIntentModel to support SGD dataset and ZeroShot Datasets
    12. Support Mellon QA Data Processor
    13. Add Documentation and Tutorial

    See details in the NVIDIA-only dev log.

    Usage

    • You can potentially add a usage example below
    # Add a code snippet demonstrating how to use this 
    

    Before your PR is "Ready for review"

    Pre checks:

    • [x] Make sure you read and followed Contributor guidelines
    • [x] Did you write any new necessary tests?
    • [x] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [x] New Feature
    • [ ] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    opened by Zhilin123 58
  • Binarized memmap dataloader for Megatron NMT, Inference and checkpoint -> nemo

    Binarized memmap dataloader for Megatron NMT, Inference and checkpoint -> nemo

    What does this PR do ?

    1. Adds megatron memory-mapped dataloaders to NMT.
    2. Adds an inference script and config with a translate() method.

    Collection: NLP

    Changelog

    • Add a new dataset class for megatron memmap dataset.
    • Add an inference script with the associated yaml config.
    • Change the use_tarred_dataset arg to a generic dataset_type arg that can take [text, tarred, bin_memmap, text_memmap]

    Usage

    • Set dataset_type: bin_memmap in the YAML config (a command-line sketch follows below).
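
    A minimal command-line sketch of this setting; the script path and file-name arguments are assumptions for illustration, not from the PR:

    python examples/nlp/machine_translation/megatron_nmt_training.py \
        model.train_ds.dataset_type=bin_memmap \
        model.train_ds.src_file_name=train.src \
        model.train_ds.tgt_file_name=train.tgt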

    Before your PR is "Ready for review"

    Pre checks:

    • [ ] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [ ] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    opened by MaximumEntropy 54
  • Adding Conformer model

    Adding Conformer model

    Here are some of the main changes:

    • Added the modules needed for Conformer
    • Added use_start_end_tokens to the data layers to support dropping these tokens
    • Updated our CTC loss to support different reduction methods, including 'mean_batch'. Users may select other reduction approaches from PyTorch via the ctc_reduction param added to the model config (see the sketch after this list).
    • Added a log_prediction parameter to the model's config to control whether prediction samples are shown in the output
    • Added an LSTM decoder and Swish activation
    • Added NoamScheduler
    • Added a subsampling module which supports VGGNet and striding approaches for subsampling
    • Added multi-head attention and relative multi-head attention, along with positional embedding and relative positional embedding
    • Fixed a bug in the data layer which added some padding after normalization (fixing this bug would fail the tests! going to investigate it)
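
    A hedged sketch of setting these options when launching CTC training; the script path is an assumption based on the NeMo examples layout:

    python examples/asr/speech_to_text.py \
        model.ctc_reduction=mean_batch \
        model.log_prediction=true
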
    opened by VahidooX 52
  • Adding cache-aware streaming Conformer with look-ahead support

    Adding cache-aware streaming Conformer with look-ahead support

    What does this PR do ?

    Adding cache-aware streaming Conformer training and inference with look-ahead support. This is achieved by training a model with a limited effective right context and then performing streaming with activation caching. Limiting the right context reduces accuracy compared to an offline model, but it gives better accuracy and significantly higher throughput than buffer-based streaming by dropping the duplicated computations that buffer-based streaming incurs. A large right context decreases the WER while increasing the latency.

    It supports the following three modes:

    1. Fully causal model with zero look-ahead and zero latency
    2. Regular look-ahead
    3. Chunk-aware look-ahead with a small duplication in computations

    It supports both Conformer-CTC and Conformer-Transducer. Both can be trained with the regular scripts using the config files in the following folder: NeMo/examples/asr/conf/conformer/streaming/

    A model trained in streaming mode can be evaluated with the following script: NeMo/examples/asr/conf/conformer/streaming/speech_to_text_streaming_infer.py

    This script simulates streaming inference for a single audio file or a manifest of audio files. Streaming can run in multi-stream mode (batched inference) over a manifest file to speed things up. The script can also compare the results with offline evaluation and report the differences in both WER and the models' outputs.

    The accuracy of the model is exactly the same in offline evaluation and in streaming. In offline mode the whole audio is passed through the model, while in streaming the audio is passed chunk by chunk.

    Changelog

    • Added frame-wise streaming Conformer models with look-ahead support and caching mechanism for streaming inference.

    Usage

    • You can potentially add a usage example below
    # Add a code snippet demonstrating how to use this 
    
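    As an illustration, a streaming-inference run could look like the sketch below (modeled on the cache-aware streaming script's flags as used elsewhere in this repo; the model name and manifest path are placeholders):

    python examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py \
        --asr_model=stt_en_conformer_ctc_small \
        --manifest_file=manifest.json \
        --batch_size=16 \
        --compare_vs_offline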

    PR Type:

    • [x] New Feature
    • [ ] Bugfix
    • [ ] Documentation
    opened by VahidooX 49
  • [NeMoMegatron] Pipeline parallelism for GPT

    [NeMoMegatron] Pipeline parallelism for GPT

    PR to add pipeline parallelism to GPT using fwd/bwd functions from Apex.

    FP32, FP16, and BF16 are all working now.

    When using pipeline parallelism, it is recommended to use BF16 + Megatron amp O2:

    model.megatron_amp_O2=True
    trainer.precision='bf16'
    
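    Put together, a launch might look like the following sketch; the script path is an assumption based on the NeMo examples layout:

    python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
        trainer.precision=bf16 \
        model.megatron_amp_O2=True \
        model.pipeline_model_parallel_size=2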

    TODOs

    • under review

    Known issues

    • complete method will be supported in a subsequent PR
    • prompt tuning temporarily disabled, use NeMo 1.6 if needed
    • when using tensor parallelism only, we are still using sync grad all-reduce, which reduces perf. This will be fixed in NeMo 1.8.
    opened by ericharper 49
  • Maximum sample-based training for Megatron NMT and Text Memmap based Seq2seq Pre-training

    Maximum sample-based training for Megatron NMT and Text Memmap based Seq2seq Pre-training

    What does this PR do ?

    1. Trains Megatron-based NMT models based on a maximum number of samples.
    2. Adds support for text_memmap and csv_memmap in Megatron encoder-decoder models (T5, BART, UL2).

    Collection: NLP

    Usage

    Add to command line

      model.data.data_impl=text_mmap \
      +model.data.data_impl_kwargs.newline_int=10 \
      +model.data.data_impl_kwargs.header_lines=0 \
      +model.data.data_impl_kwargs.workers=null \
      +model.data.data_impl_kwargs.sort_dataset_paths=False
    
    # Add a code snippet demonstrating how to use this 
    
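    For instance, appended to a Megatron NMT training invocation (the script path is an assumption based on the NeMo examples layout; the overrides are the ones listed above):

    python examples/nlp/machine_translation/megatron_nmt_training.py \
        model.data.data_impl=text_mmap \
        +model.data.data_impl_kwargs.newline_int=10 \
        +model.data.data_impl_kwargs.header_lines=0 \
        +model.data.data_impl_kwargs.workers=null \
        +model.data.data_impl_kwargs.sort_dataset_paths=False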

    Before your PR is "Ready for review"

    Pre checks:

    • [ ] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [ ] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    opened by MaximumEntropy 46
  • Tn es

    Tn es

    What does this PR do ?

    Adds text normalization for the Spanish language to nemo_text_normalization.

    Collection: Nemo Text Normalization

    Changelog

    • Adds text normalization for the Spanish language. Verbalizers and classifiers are available for the following classes:
      • Cardinal
      • Decimal
      • Ordinal
      • Fraction
      • Money
      • Measure
      • Date
      • Time
      • Electronic
      • Whitelist

    Also includes a localization option, es-amer, which changes formatting rules to accommodate tendencies in Central American orthography (e.g., use of periods to group cardinals instead of commas, as is customary in other Spanish-speaking locales).

    Includes updated es_pytests for text normalization and edits to export_grammar.sh and normalize.py to allow deployment. All tests have passed in the NeMo Docker environment.

    # Add a code snippet demonstrating how to use this 
    
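    A hedged sketch of invoking Spanish normalization via the updated normalize.py; the flag names are assumptions, so check the script's argument parser for the exact interface:

    python nemo_text_processing/text_normalization/normalize.py \
        --language=es \
        --text="El tren sale a las 10:30"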

    Before your PR is "Ready for review"

    Pre checks:

    • [x] Make sure you read and followed Contributor guidelines
    • [x] Did you write any new necessary tests?
    • [x] Did you add or update any necessary documentation?
    • [x] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [x] New Feature

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

      • N.B. Sparrowhawk deployment is altering whitespace following periods, so TN is unable to manage am/pm together with time zones. That is, "10 a.m. est" will cause an error, while "10.00 h est" is stable. This issue is Sparrowhawk-related, so we have been unable to debug it.
    opened by bonham79 46
  • Neural Graphs

    Neural Graphs

    This (3rd!) PR follows the proposal from "Neural Graphs" design doc: https://docs.google.com/document/d/1218tRm2XtfLbYJvoepnbg3ET1CkJ72CCBTGAUZ1G7gY/edit#

    Additionally, it assumes that a graph is developed for training/inference, so it changes the mode of the "connected" modules during its build.

    • [x] Application State (singleton)
    • [x] Registration of a Neural Graph
    • [x] Recording of operation/modules forming a Graph
    • [x] Input port binding - with default port name and option to provide a new name (manual)
    • [x] Output port binding - with default port name and option to provide a new name (manual)
    • [x] Graph nesting
    • [x] Summary of graph/modules in a graph
    • [x] Export of a graph to YML file
    • [x] Import of a graph from YML file
    • [x] Built-in handling of training/inference modes
    • [x] Serialization of NeuralTypes for connections/inputs/outputs
    • [x] Graphs with loops
    • [x] Extended train() signature enabling to pass the "training_graph"

    And a whole bunch of unit tests covering different aspects, from simple binding to "nesting of deserialized graph with input and output port bound into a graph with different ports bound" to "a graph with a loop"...

    opened by tkornuta-nvidia 46
  • Text memmap dataset

    Text memmap dataset

    Signed-off-by: Micha Livne [email protected]

    What does this PR do ?

    Includes a mechanism to retire older index files by updating the internal idx version.

    Indexed 1443990774 samples in 147 files using 6 workers.

    Loading speed

    [NeMo I 2022-04-29 00:23:22 text_memmap_dataset:85] Time loading 147 mem-mapped files: 0:00:04.395558
    
    In [9]: len(ds)
    Out[9]: 1443990774
    # Timing without tokenizer
    In [10]: %timeit -n 1000  ds[np.random.randint(len(ds))]
    555 µs ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    # Timing with 'byte-level' tokenizer
    In [20]: %timeit -n 1000  ds[np.random.randint(len(ds))]
    724 µs ± 19.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    


    Changelog

    • Added TextMemMapDataset
    • Added CSVMemMapDataset
    • Retired MegatronDataset
    • Added nemo/collections/nlp/data/language_modeling/text_memmap_dataset.py to preprocess indices (otherwise this happens on the fly at first run)
    • Added nemo/collections/nlp/data/machine_translation/sequence_to_sequence_dataset.py
    • Added scripts/nlp_language_modeling/build_index_memmap_data.py

    Usage

    Example for caching index files:

    NeMo/scripts/nlp_language_modeling/build_index_memmap_data.py *.txt
    

    Index files will be created when instantiating a memory mapped dataset if missing.

    Before your PR is "Ready for review"

    Pre checks:

    • [ ] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [ ] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    opened by michalivne 45
  • Don't add output directory twice when creating shared sentencepiece tokenizer

    Don't add output directory twice when creating shared sentencepiece tokenizer

    Signed-off-by: Patrick Simianer [email protected]

    What does this PR do ?

    As the title says, this is a small fix: the output dir was already added to encoder_tokenizer_model on line 789 of the same file.

    Collection: NLP

    Changelog

    • Bugfix for creating shared sentencepiece tokenizer.

    Usage

    The error can be triggered by running preprocessing with NeMo for an MT data set:

    #!/bin/bash
    
    python ../nemo/examples/nlp/machine_translation/enc_dec_nmt.py \
      -cn aayn_base \
      do_training=false \
      model.preproc_out_dir=./preproc_dir/ \
      model.train_ds.use_tarred_dataset=true \
      model.train_ds.lines_per_dataset_fragment=1000000 \
      model.train_ds.num_batches_per_tarfile=200 \
      model.train_ds.src_file_name=../europarl-v7.de-en.en \
      model.train_ds.tgt_file_name=../europarl-v7.de-en.de \
      model.validation_ds.src_file_name=../valid.en \
      model.validation_ds.tgt_file_name=../valid.de \
      model.encoder_tokenizer.vocab_size=32000 \
      model.decoder_tokenizer.vocab_size=32000 \
      model.encoder_tokenizer.library=sentencepiece \
      model.encoder_tokenizer.training_sample_size=9999 \
      model.decoder_tokenizer.library=sentencepiece \
      model.decoder_tokenizer.training_sample_size=9999 \
      ~model.test_ds \
      trainer.accelerator='cpu' \
      +trainer.fast_dev_run=true \
      exp_manager=null
    

    Before your PR is "Ready for review"

    Pre checks:

    • [x] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [x] Bugfix
    • [ ] Documentation

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    NLP 
    opened by pks 0
  • Sanitize params before DLLogger log_hyperparams

    Sanitize params before DLLogger log_hyperparams

    What does this PR do ?

    Allows DLLogger to work with hyperparameter types that are not built-in containers.

    Before your PR is "Ready for review"

    Pre checks:

    • [x] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [x] Bugfix
    • [ ] Documentation

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    opened by milesial 0
  • Esperanto example

    Esperanto example

    What does this PR do ?

    Adds an ASR example for training an Esperanto Conformer-CTC-large model.

    Collection: ASR

    Changelog

    • Adds Esperanto example to docs/source/asr/examples/

    Before your PR is "Ready for review"

    Pre checks:

    • [x] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [ ] Bugfix
    • [x] Documentation
    ASR 
    opened by andrusenkoau 0
  • fix: clamp keep input size in update_cache for causal conv

    fix: clamp keep input size in update_cache for causal conv

    What does this PR do ?

    Sometimes in CausalConv1D.update_cache, input_x_keep ends up having no frames (i.e., a size of [M, N, 0]). Make sure that we keep at least one frame.

    Collection: asr

    Changelog

    • Add specific line by line info of high level changes in this PR.

    Usage

    • You can potentially add a usage example below
    # Add a code snippet demonstrating how to use this 
    

    Before your PR is "Ready for review"

    Pre checks:

    • [ ] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [x] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas. @VahidooX

    Additional Information

    To reproduce the original issue in main run:

    #!/bin/bash
    python examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py \
        --asr_model=stt_en_conformer_ctc_small \
        --chunk_size=100 \
        --shift_size=50 \
        --left_chunks=2 \
        --online_normalization \
        --manifest_file=/datasets/ls_test_other/transcripts.local.json \
        --batch_size=16 \
        --compare_vs_offline \
        --use_amp \
        --debug_mode
    

    Error output:

    ...
    Traceback (most recent call last):
      File "examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py", line 393, in <module>
        main()
      File "examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py", line 349, in main
        streaming_tran, offline_tran = perform_streaming(
      File "examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py", line 154, in perform_streaming
        ) = asr_model.conformer_stream_step(
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/parts/mixins/mixins.py", line 475, in conformer_stream_step
        (encoded, encoded_len, cache_last_channel_next, cache_last_time_next) = self.encoder.cache_aware_stream_step(
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/parts/mixins/streaming.py", line 61, in cache_aware_stream_step
        encoder_output = self(
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/core/classes/common.py", line 1087, in __call__
        outputs = wrapped(*args, **kwargs)
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/modules/conformer_encoder.py", line 471, in forward
        audio_signal = layer(
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/parts/submodules/conformer_modules.py", line 191, in forward
        x = self.conv(x, pad_mask=pad_mask, cache=cache_last_time, cache_next=cache_last_time_next)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/parts/submodules/conformer_modules.py", line 350, in forward
        x = self.depthwise_conv(x, cache=cache, cache_next=cache_next)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/parts/submodules/causal_convs.py", line 162, in forward
        x = self.update_cache(x, cache=cache, cache_next=cache_next)
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/parts/submodules/causal_convs.py", line 158, in update_cache
        cache_next[self._cache_id, :, :, -cache_keep_size:] = input_x_kept[:, :, -cache_keep_size:]
    RuntimeError: The expanded size of the tensor (1) must match the existing size (0) at non-singleton dimension 2.  Target sizes: [16, 176, 1].  Tensor sizes: [16, 176, 0]
    
    ASR 
    opened by messiaen 0
  • ASR evaluator

    ASR evaluator

    What does this PR do ?

    Add a one line overview of what this PR aims to accomplish.

    Collection: [Note which collection this PR will affect]

    Changelog

    • Add specific line by line info of high level changes in this PR.

    Usage

    • You can potentially add a usage example below
    # Add a code snippet demonstrating how to use this 
    

    Before your PR is "Ready for review"

    Pre checks:

    • [ ] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [ ] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    ASR 
    opened by fayejf 0
  • [ASR] Use a subset of manifest when using `scatter` shard strategy

    [ASR] Use a subset of manifest when using `scatter` shard strategy

    What does this PR do ?

    When using shard_strategy='scatter', this PR loads only a subset of lines from the manifest file. This may reduce the time to process a manifest file by an order of magnitude.

    Opening as a draft to get feedback on whether there are any underlying assumptions that may be broken by this change.

    Collection: ASR

    Changelog

    | File | Change |
    | ---- | ------ |
    | manifest.py::item_iter | Load only a subset of lines if shard_strategy == 'scatter' |
    | collections.py::ASRAudioText | Forward shard_strategy, global_rank and world_size to manifest.item_iter |
    | collections.py::AudioText | Use rank and world size to restore the original data list length for shard_strategy='scatter' |
    | audio_to_text.py | Forward shard_strategy, global_rank and world_size to collections.ASRAudioText |
    | utils.py | Added a function to get the number of lines from a text file |
    | test_utils.py | Added a unit test for the utility function above |
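
    As a hedged illustration, the strategy would be selected through the dataset config when training on tarred data; the script path, file names, and exact override keys below are assumptions:

    python examples/asr/asr_ctc/speech_to_text_ctc_bpe.py \
        model.train_ds.is_tarred=true \
        model.train_ds.tarred_audio_filepaths="audio_{0..127}.tar" \
        model.train_ds.manifest_filepath=tarred_manifest.json \
        model.train_ds.shard_strategy=scatter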

    Before your PR is "Ready for review"

    Pre checks:

    • [x] Make sure you read and followed Contributor guidelines
    • [x] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [x] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    ASR common 
    opened by anteju 0
Releases (v1.14.0)
  • v1.14.0(Dec 24, 2022)

    Highlights

    NeMo ASR

    • Hybrid CTC + Transducer loss ASR #5364
    • Sampled Softmax RNNT (Enables large vocab RNNT, for speech translation and multilingual ASR) #5216
    • ASR Adapters hyper parameter search scripts #5159
    • RNNT {ONNX, TorchScript} x GPU export infer #5248
    • Exportable MelSpectrogram (TorchScript) #5512
    • Audio To Audio Dataset Processor #5196
    • Multi Channel Audio Transcription #5479
    • Silence Augmentation #5476

    NeMo Megatron

    • Support for the Mixture of Experts for T5
    • Fix PTL model size output for GPT-3 and BERT
    • BERT with Tensor Parallelism & Pipeline Parallel Support

    NeMo Core

    • Hydra Multirun core support + NeMo HP optim in YAML #5159

    Detailed Changelogs

    ASR

    Changelog
    • [Tools][ASR] Tool for generating data using simulated RIRs by @anteju :: PR: #5158
    • Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
    • Add Gradio App to ASR Docs by @titu1994 :: PR: #5270
    • Add support for Sampled Softmax for RNNT Joint by @titu1994 :: PR: #5216
    • Speed up HF data processing script for ASR by @titu1994 :: PR: #5330
    • bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
    • Add cpWER for evaluation of ASR with diarization by @tango4j :: PR: #5279
    • Fix for getting tokenizer in character-based ASR models when using tarred dataset by @jonghwanhyeon :: PR: #5442
    • Refactor/unify ASR offline and buffered inference by @fayejf :: PR: #5440
    • Standalone diarization+ASR evaluation script by @tango4j :: PR: #5439
    • [ASR] Transcribe for multi-channel signals by @anteju :: PR: #5479
    • Add Silence Augmentation by @fayejf :: PR: #5476
    • add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
    • add RNN-T loss implemented by PyTorch and test code by @hainan-xv :: PR: #5312
    • [ASR] AudioToAudio datasets and related test by @anteju :: PR: #5196
    • Add StreamingFeatureBufferer class for real-life streaming decoding by @tango4j :: PR: #5534
    • Pool stats with padding by @1-800-BAD-CODE :: PR: #5403
    • Adding Hybrid RNNT-CTC model by @VahidooX :: PR: #5364
    • Fix ASR Buffered inference scripts by @titu1994 :: PR: #5552
    • Add wer details - insertion, deletion, substitution rate by @fayejf :: PR: #5557
    • Add support for Time Stamp calculation using transcribe_speech.py by @titu1994 :: PR: #5568
    • [STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models by @andrusenkoau :: PR: #5639

    TTS

    Changelog
    • [TTS] Fastpitch energy condition and refactoring by @subhankar-ghosh :: PR: #5218
    • [TTS] HiFi-TTS Download Script by @oleksiivolk :: PR: #5241
    • [TTS] Add Mandarin/English Bilingual Recipe for Training Fastpitch Models by @yuekaizhang :: PR: #5208
    • [TTS] fixed type of filepath and rename openslr. by @XuesongYang :: PR: #5276
    • [TTS] replace obsolete torch_tts unit test marker with run_only_on('CPU') by @XuesongYang :: PR: #5307
    • [TTS] bugfix IPAG2P and refactor to remove duplicate process. by @XuesongYang :: PR: #5304
    • Update path to get_data.py in TTS tutorial by @redoctopus :: PR: #5311
    • [TTS] Replace IPA lambda arguments with locale string by @rlangman :: PR: #5298
    • [TTS] expand to support flexible dictionary entry formats in IPAG2P. by @XuesongYang :: PR: #5318
    • [TTS] update organization of model checkpoints and their pointers. by @XuesongYang :: PR: #5327
    • [TTS] bugfix for the script of generating mels from fastpitch. by @XuesongYang :: PR: #5344
    • [TTS] Add Spanish model documentation by @rlangman :: PR: #5390
    • [TTS] Add Spanish FastPitch training configs by @rlangman :: PR: #5383
    • [TTS] replace pitch normalization params with ??? by @XuesongYang :: PR: #5392
    • [TTS] Create script for processing TTS training audio by @rlangman :: PR: #5262
    • [TTS] remove useless logic for set_tokenizer. by @XuesongYang :: PR: #5430
    • [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue by @borisfom :: PR: #5358
    • JOC Optimization in FastPitch by @subhankar-ghosh :: PR: #5450
    • [TTS] Support speaker level pitch normalization by @rlangman :: PR: #5455
    • TTS tutorial update: use speaker 9017 instead of 6097 by @redoctopus :: PR: #5532
    • [TTS] Remove unused TTS eval function by @redoctopus :: PR: #5605
    • [TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. by @XuesongYang :: PR: #5596
    • [TTS][DOC] add notes about automatic conversion to target sampling ra… by @XuesongYang :: PR: #5624
    • [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
    • [TTS][ZH] bugfix for ngc cli installation. by @XuesongYang :: PR: #5652
    • [TTS][ZH] fix broken link for the script. by @XuesongYang :: PR: #5666

    NLP / NMT

    Changelog
    • Option to pad the last validation input sequence if its smaller than the encoder sequence length for MegatronGPT by @anmolgupt :: PR: #5243
    • Fixes bugs with loss averaging with for Megatron GPT by @shanmugamr1992 :: PR: #5329
    • Fixing bug in Megatron BERT when loss mask is all zeros by @shanmugamr1992 :: PR: #5424
    • support to disable sequence length + 1 input tokens for each sample in MegatronGPT by @anmolgupt :: PR: #5363
    • [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414
    • Bug fix/gpt by @shanmugamr1992 :: PR: #5493
    • prompt tuning fix for unscale grad errors by @arendu :: PR: #5523
    • Bert sequence parallel support by @shanmugamr1992 :: PR: #5494
    • NLP docs fixes by @vsl9 :: PR: #5528
    • Switch order of args in optimizer_step override by @ericharper :: PR: #5549
    • Upgrade to 22.11 by @ericharper :: PR: #5550
    • Merge r1.13.0 main by @ericharper :: PR: #5570
    • some tokenizers do not have additional_special_tokens_ids attribute by @arendu :: PR: #5642
    • Remove cell output from tutorial by @ericharper :: PR: #5689

    Text Normalization / Inverse Text Normalization

    Changelog
    • [ITN] fix year date graph, cardinals extension for hundreds by @ekmb :: PR: #5435
    • [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414

    Export

    Changelog
    • Fixed the onnx bug in conformer for non-streaming models. by @VahidooX :: PR: #5242
    • Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
    • Fixes for Conformer-xl export by @borisfom :: PR: #5309
    • Remove onnx graphsurgery from Dockerfile by @titu1994 :: PR: #5320
    • add exportable mel spec by @1-800-BAD-CODE :: PR: #5512

    General Improvements

    Changelog
    • bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
    • Fix setting up of learning rate scheduler by @PeganovAnton :: PR: #5444
    • Better patch hydra by @titu1994 :: PR: #5591
    • [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
    • Add fully torch.jit.script-able speaker clustering module by @tango4j :: PR: #5191
    • Update perturb.py by @stevehuang52 :: PR: #5231
    • remove CV requirements. by @XuesongYang :: PR: #5233
    • checks for accepted adapter type at module level by @arendu :: PR: #5194
    • fix hypotheses return by @nithinraok :: PR: #5253
    • Support for inserting additional subsampling in conformer encoder by @shan18 :: PR: #5224
    • update tutorials to use meeting config as default and VAD by @nithinraok :: PR: #5237
    • Specifying audio signal dropout separately for the Conformer Encoder by @shan18 :: PR: #5263
    • created by @bmwshop :: PR: #5268
    • Fix failing speaker counting for short audio samples by @tango4j :: PR: #5267
    • O2bert + apex pipeline functions by @shanmugamr1992 :: PR: #5221
    • Upperbound PTL by @titu1994 :: PR: #5302
    • Update Interface(s) phonetic entry by @blisc :: PR: #5212
    • add label inference support to EncDecSpeakerLabel class by @nithinraok :: PR: #5278
    • Add italian model checkpoints by @Kipok :: PR: #5315
    • Text Memmap Parsing Improvements by @michalivne :: PR: #5265
    • Update librosa signature in HF processing script by @titu1994 :: PR: #5321
    • Force wav file format for audio_filepath by @titu1994 :: PR: #5323
    • Updates to T0 Dataset and Model by @MaximumEntropy :: PR: #5201
    • [DOC] add sphinx-copybutton requirement to copy button on code snippets. by @XuesongYang :: PR: #5326
    • Add support for Hydra multirun to NeMo by @titu1994 :: PR: #5159
    • typo fix by @arendu :: PR: #5328
    • add precommit hood to automatic sort entries in requirements. by @XuesongYang :: PR: #5333
    • Add speaker clustering arguments to forward function by @tango4j :: PR: #5306
    • Fixing de-autocast by @borisfom :: PR: #5319
    • [Bugfix] Added rm -f / wget- nc command to avoid bash error in multispeaker sim notebook by @tango4j :: PR: #5292
    • [DOC] added ipython dependency to support IPython.sphinxext extension by @XuesongYang :: PR: #5345
    • Bug fix (removing old compute consumed samples) by @shanmugamr1992 :: PR: #5355
    • removed uninstall nemo_cv and nemo_simple_gan and relax numba version… by @XuesongYang :: PR: #5332
    • Enable mlflow logger by @whrichd :: PR: #4893
    • Fix Python type hints according to Python Docs by @artbataev :: PR: #5370
    • Distributed optimizer support for BERT by @timmoon10 :: PR: #5305
    • SpeakerClustering: fix tensor dimennsions in forward() by @virajkarandikar :: PR: #5387
    • add squad by @arendu :: PR: #5407
    • added python and c++ alignment code by @yzhang123 :: PR: #5346
    • Add MoE support for T5 model (w/o expert parallel) by @aklife97 :: PR: #5409
    • Fix for concat map dataset by @1-800-BAD-CODE :: PR: #5133
    • Support for finetuning and finetuning inference with .ckpt files & batch size refactoring by @MaximumEntropy :: PR: #5339
    • update doc in terms of get_label for lang id model by @fayejf :: PR: #5366
    • Debug support for interleaved pipeline parallelism with the distributed Adam optimizer by @timmoon10 :: PR: #5236
    • Create codeql.yml by @titu1994 :: PR: #5445
    • Update codeql.yml by @titu1994 :: PR: #5449
    • Fix support for legacy sentencepiece models by @Numeri :: PR: #5406
    • Update docs with Comparison tool info, and slightly change .sh for ea… by @Jorjeous :: PR: #5182
    • Add float32 type casting for get_samples function by @tango4j :: PR: #5399
    • Add missing import in transcribe_utils.py by @jonghwanhyeon :: PR: #5487
    • Add auto-labeler by @SeanNaren :: PR: #5498
    • Add more glob patterns for labeler by @SeanNaren :: PR: #5504
    • Fix issues with PL 1.8 by @SeanNaren :: PR: #5353
    • [BugFix] Removing tokens from decoding timestamp by @tango4j :: PR: #5481
    • Upperbound the torchmetrics version by @SeanNaren :: PR: #5537
    • Data parallel collect results by @michalivne :: PR: #5547
    • Fix log-rank-0-only logic by @mikolajblaz :: PR: #5555
    • Fixed Docker build by @borisfom :: PR: #5562
    • Patch hydra launch by @titu1994 :: PR: #5589
    • Fix race condition bug with hydra multirun by @titu1994 :: PR: #5594
    • Update Dockerfile to use numba==0.53.1 by @stevehuang52 :: PR: #5614
    • Fixed a missing import for gather_objects by @michalivne :: PR: #5622
  • v1.13.0(Dec 7, 2022)

    Highlights

    NeMo ASR

    • Spoken Language Understanding (SLU) models based on Conformer encoder and transformer decoder
    • Support for codeswitched manifests during training
    • Support for Language ID during inference for ML models
    • Support of cache-aware streaming for offline models
    • Word confidence estimation for CTC & RNNT greedy decoding

    NeMo Megatron

    • Interleaved Pipeline schedule
    • Transformer Engine for GPT
    • HF T5v1.1 -> NeMo-Megatron conversion and finetuning/p-tuning
    • IA3 and Adapter Tuning (Tensor + Pipeline Parallel)
    • Pipeline Parallel Support for T5 Prompt Learning
    • MegatronNMT export

    NeMo TTS

    • TTS introductory tutorial
    • Phonemizer/espeak removal (Spanish/German)
    • Char-only support for Spanish/German models
    • Documentation Refactor

    NeMo Core

    • Upgrade to NGC PyTorch 22.09 container
    • Add pre-commit hooks
    • Exponential moving average (EMA) of weights during training

    Detailed Changelogs

    Container

    For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.09
    

    Known Issues

    Issues
    • pytests for RadTTSModel_export_to_torchscript fail intermittently due to random input values. Fixed in main.

    ASR

    Changelog
    • Add docs tutorial on kinyarwanda asr by @bene-ges :: PR: #4953
    • Asr codeswitch by @bmwshop :: PR: #4821
    • Add test for nested ASR model by @titu1994 :: PR: #5002
    • Greedy decoding confidence for CTC and RNNT by @GNroy :: PR: #4931
    • [ASR][Tools] RIR corpus generator by @anteju :: PR: #4927
    • Add Squeezeformer CTC model checkpoints on Librispeech by @titu1994 :: PR: #5121
    • adding loss normalization options to rnnt joint by @bmwshop :: PR: #4829
    • Asr concat dataloader by @bmwshop :: PR: #5108
    • Added ASR model comparison to SDE by @Jorjeous :: PR: #5043
    • Add scripts for converting Spoken Wikipedia to asr dataset by @bene-ges :: PR: #5138
    • ASR confidence bug fix for older Python versions by @GNroy :: PR: #5180
    • Update ASR Scores and Results by @titu1994 :: PR: #5254
    • [STT] Add Ru ASR Conformer-CTC and Conformer-Transducer by @ssh-meister :: PR: #5340

    TTS

    Changelog
    • [TTS] Adding speaker embedding conditioning in fastpitch by @subhankar-ghosh :: PR: #4986
    • [TTS] Remove PhonemizerTokenizer by @rlangman :: PR: #4990
    • [TTS] FastPitch speaker interpolation by @subhankar-ghosh :: PR: #4997
    • RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
    • [TTS] remove phonemizer.py by @XuesongYang :: PR: #5090
    • [TTS] Add NeMo TTS Primer Tutorial by @rlangman :: PR: #4933
    • [TTS] Add SpanishCharsTokenizer by @rlangman :: PR: #5135
    • Fixes for docs/typos + remove max_utts parameter from tarred datasets as it causes hang in training by @Kipok :: PR: #5118
    • refactor TTS documentation organization and add new contents. by @XuesongYang :: PR: #5137
    • [TTS][DOC] update models trained on HifiTTS dataset. by @XuesongYang :: PR: #5173
    • [TTS] Fix TTS Primer image markup by @rlangman :: PR: #5192
    • [TTS] deprecate TextToWaveform base class. by @XuesongYang :: PR: #5205
    • [TTS] remove the avoidance of circular imports by @XuesongYang :: PR: #5214
    • [TTS] remove LinVocoder and apply Vocoder as parent class. by @XuesongYang :: PR: #5206
    • [TTS] unify requirements_tts.txt and requirements_torch_tts.txt by @XuesongYang :: PR: #5232
    • Minor typo fixes in TTS tutorial by @redoctopus :: PR: #5266
    • Radtts 1.13 by @borisfom :: PR: #5451
    • Radtts 1.13 plus by @borisfom :: PR: #5457

    NLP / NMT

    Changelog
    • IA3 support for GPT and T5 by @arendu :: PR: #4909
    • Fix and refactor consumed samples save/restore for Megatron models. by @MaximumEntropy :: PR: #5077
    • Remove unsupported arguments from MegatronNMT by @MaximumEntropy :: PR: #5065
    • Update megatron interface to dialogue by @Zhilin123 :: PR: #4936
    • gpt ia3 CI tests by @arendu :: PR: #5140
    • Fix NMT Eval Sampler by @aklife97 :: PR: #5154
    • Add interleaved pipeline schedule to GPT by @ericharper :: PR: #5025
    • fix for bug in bignlp by @arendu :: PR: #5172
    • Fixes some args that were not removed properly for multilingual Megatron NMT by @MaximumEntropy :: PR: #5142
    • Fix absolute path in GPT Adapter CI tests by @arendu :: PR: #5184
    • Add ability to configure drop last batch for validation datasets with MegatronGPT by @shanmugamr1992 :: PR: #5067
    • Megatron Export Update by @Davood-M :: PR: #5343
    • Fix GPT generation when using sentencepiece tokenizer by @MaximumEntropy :: PR: #5413
    • Disable sync_batch_comm in validation_step for GPT by @ericharper :: PR: #5397
    • Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
    • Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475

    Text Normalization / Inverse Text Normalization

    Changelog
    • [Chinese text normalization] speed up graph building by @pengzhendong :: PR: #5128

    NeMo Tools

    Changelog
    • Added ASR model comparison to SDE by @Jorjeous :: PR: #5043

    Export

    Changelog
    • Fix export bug by @VahidooX :: PR: #5009
    • RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
    • Support TorchScript export for Squeezeformer by @titu1994 :: PR: #5164
    • Expose keep_initializers_as_inputs to Exportable class by @pks :: PR: #5052
    • Fix the self-attention export bug for cache-aware streaming Conformer by @VahidooX :: PR: #5114
    • replace ColumnParallelLinear with nn.Linear in export_utils by @arendu :: PR: #5217
    • Megatron Export Update by @Davood-M :: PR: #5343
    • Fix Conformer Export in 1.13.0 (cherry-pick from main) by @artbataev :: PR: #5446
    • export_utils bugfix by @Davood-M :: PR: #5480
    • Export fixes for Riva by @borisfom :: PR: #5496

    General Improvements and Bugfixes

    Changelog
    • don't use bfloat16 when in jit by @bmwshop :: PR: #5051
    • Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
    • Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475
    • Pin Transformers version to fix CI by @SeanNaren :: PR: #4955
    • Fix changelog builder (#4962) by @titu1994 :: PR: #4963
    • Checkpoint averaging class fix by @michalivne :: PR: #4946
    • Add ability to give seperate datasets for test, train and validation by @shanmugamr1992 :: PR: #4798
    • Add simple pre-commit file by @SeanNaren :: PR: #4983
    • Import pycuda.autoprimaryctx or pycuda.autoinit to init pycuda execut… by @liji-nv :: PR: #4951
    • Improvements to AMI script by @SeanNaren :: PR: #4974
    • clean warnings from tests and CI runs, and prepare for upgrade to PTL 1.8 by @nithinraok :: PR: #4830
    • Update libraries by @titu1994 :: PR: #5010
    • add close inactive issues and PRs github action. by @XuesongYang :: PR: #5015
    • Fix filename extraction in vad_utils.py by @GKPr0 :: PR: #4999
    • Add black to pre-commit by @SeanNaren :: PR: #5027
    • [CI] Enable previous build abort when new commit pushed by @SeanNaren :: PR: #5041
    • Tutorials and Docs for Multi-scale Diarization Decoder by @tango4j :: PR: #4930
    • Refactor output directory for MSDD Inference Notebook by @SeanNaren :: PR: #5044
    • text_memmap dataset index range testing fix by @michalivne :: PR: #5034
    • fix undefined constant in code example by @bene-ges :: PR: #5046
    • Text generation refactor and RETRO text generation implementation by @yidong72 :: PR: #4985
    • Lids by @bmwshop :: PR: #4820
    • Add datasets folder, add diarization datasets voxconverse/aishell by @SeanNaren :: PR: #5042
    • Fix the bugs in cache-aware streaming Conformer by @VahidooX :: PR: #5032
    • Bug fix - Limit val batches set to 1.0 by @shanmugamr1992 :: PR: #5023
    • [bug_fix] kv_channels is used when available by @arendu :: PR: #5066
    • Add spe_split_by_unicode_script arg by @piraka9011 :: PR: #5072
    • Transformer Engine Integration by @ericharper :: PR: #5104
    • Text memmap dataset index memory efficiency by @michalivne :: PR: #5056
    • Add NGC links for Aligner and FastPitch by @redoctopus :: PR: #5235
    • Fix link to inference notebook by @redoctopus :: PR: #5247
    • Fix links to speaker identification notebook by @SeanNaren :: PR: #5260
    • Fix bug into Dialogue tutorial by @Zhilin123 :: PR: #5277
    • PCLA tutorial typo fix by @jubick1337 :: PR: #5288
    • Fix dialogue tutorial bug by @Zhilin123 :: PR: #5297
    • small bugfix for r1.13.0 by @fayejf :: PR: #5310
    • Add italian model checkpoints by @Kipok :: PR: #5316
    • Pcla tutorial fixes by @jubick1337 :: PR: #5313
    • Fix issue with HF Model upload tutorial by @titu1994 :: PR: #5359
    • P&C LA tutorial fixes by @jubick1337 :: PR: #5354
    • Add SDP documentation by @erastorgueva-nv :: PR: #5274
    • [Bugfix] Added rm -f / wget- nc command in multispeaker sim notebook to r1.13.0 by @tango4j :: PR: #5375
    • Rename Speech Dataset Processor to Speech Data Processor by @erastorgueva-nv :: PR: #5378
    • fix for num worker 0 causing issues in losses after 1 epoch by @arendu :: PR: #5379
    • Fixed bug in notebook by @vadam5 :: PR: #5382
    • Force MHA QKV onto fp32 by @titu1994 :: PR: #5391
    • Fix for prompt table restore error by @vadam5 :: PR: #5393
    • Fix activation checkpoint args for T5 by @MaximumEntropy :: PR: #5410
    • Temporary hard code fix in PTL for CUDA Error by @yaoyu-33 :: PR: #5421
    • disable pc test by @ekmb :: PR: #5426
    • Revert Temporary hard code fix in PTL for CUDA Error by @yaoyu-33 :: PR: #5431
    • Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False by @MaximumEntropy :: PR: #5420
    • Add num layers check for full activation checkpointing by @MaximumEntropy :: PR: #5470
    • Cherry Pick T5 finetuning changes into 1.13 by @MaximumEntropy :: PR: #5478
    • T5 Eval bugfix by @Davood-M :: PR: #5521
    • added set_start_method + function param bugfix by @Davood-M :: PR: #5539
    • Remove notebook by @ericharper :: PR: #5548
    • Remove broadcast from T5 prompt learning inference by @MaximumEntropy :: PR: #5558
    • Fix all gather while writing to a file during T5 finetuning by @MaximumEntropy :: PR: #5561
  • v1.12.0(Oct 10, 2022)

    Container

    For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.08

    ASR

    Changelog
    • Add support for RNNT Char/Word Timestamp Calculation by @titu1994 :: PR: #4665
    • add conditional logic to rnnt_wer to handle when arrays have no elements by @mgoldey :: PR: #4776
    • fix handling of the final word for rnnt word timestamps by @mgoldey :: PR: #4779
    • amend rnnt word timestamps by @mgoldey :: PR: #4782
    • fix type error in rnnt_wer.py, rnnt_wer_bpe.py, wer_bpe.py by @hainan-xv :: PR: #4822
    • add kab language asr models by @nithinraok :: PR: #4819
    • [Tutorial][ASR][Fix] Data paths in ASR with NeMo tutorial by @anteju :: PR: #4845
    • [ASR] Fix for multi-channel signals in AudioSegment by @anteju :: PR: #4824
    • [ASR] Generate multichannel noise by @anteju :: PR: #4870
    • Fix asr model order by @nithinraok :: PR: #4959
    • Fix ASR issues by @titu1994 :: PR: #4984
    • Fix diarization ASR inference link in notebook by @SeanNaren :: PR: #5016
    • Code switching by @KunalDhawan :: PR: #4784
    • Release SOTA Lang ID model by @fayejf :: PR: #5080
    • Stateless decoder for RNN-T by @hainan-xv :: PR: #4710
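
    Many of the ASR entries above concern RNN-T decoding and timestamps; as context, a minimal transcription sketch, assuming a pretrained NGC checkpoint and a local audio file (both names illustrative):

    import nemo.collections.asr as nemo_asr

    # Load a pretrained CTC Conformer checkpoint (name illustrative) and transcribe.
    asr_model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_large")
    print(asr_model.transcribe(["sample.wav"]))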

    TTS

    Changelog
    • [TTS] use consistent spline interpolation for fastpitch and hifigan. by @XuesongYang :: PR: #4679
    • TTS tokenizers moved to collections.common.tokenizers by @AlexGrinch :: PR: #4690
    • [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
    • ARP to IPA mapping, g2p_encode for IPATokenizer by @ekmb :: PR: #4850
    • IPA G2P bugfixes by @redoctopus :: PR: #4869
    • [TTS] add missing WikiHomograph data entries to CMUdict, updates to match new ipa set by @ekmb :: PR: #4886
    • [TTS] fix wrong g2p path. by @XuesongYang :: PR: #4902
    • [TTS] FastPitch training: speed up align_prior_matrix calculation by @racoiaws :: PR: #4718
    • [TTS] fix broken tutorial for MixerTTS. by @XuesongYang :: PR: #4949
    • [TTS] bugfix 'EnglishPhonemesTokenizer' object has no attribute 'encode_from_g2p' by @XuesongYang :: PR: #4992
    • [TTS] added missing German phoneme tokenizer by @XuesongYang :: PR: #5070
    • [TTS] fixed wrong val loss for epoch 0 and inconsistent metrics names by @XuesongYang :: PR: #5087
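
    The TTS fixes above center on the FastPitch/HiFi-GAN stack; a minimal two-stage inference sketch, assuming pretrained NGC checkpoint names (illustrative) and a 22.05 kHz vocoder:

    import soundfile as sf
    from nemo.collections.tts.models import FastPitchModel, HifiGanModel

    # Spectrogram generator and vocoder (checkpoint names illustrative).
    spec_gen = FastPitchModel.from_pretrained("tts_en_fastpitch").eval()
    vocoder = HifiGanModel.from_pretrained("tts_en_hifigan").eval()

    tokens = spec_gen.parse("Hello world")
    spectrogram = spec_gen.generate_spectrogram(tokens=tokens)
    audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
    sf.write("hello.wav", audio.to("cpu").detach().numpy()[0], samplerate=22050)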

    NLP / NMT

    Changelog
    • Fix bug intent slot classification tokenizer to dialogue by @Zhilin123 :: PR: #4694
    • Intent slot model onnx export test by @Zhilin123 :: PR: #4731
    • Fix megatron p tuning notebook by @nithinraok :: PR: #4741
    • Add support for Apex distributed Adam optimizer with GPT-3 by @timmoon10 :: PR: #4487
    • Fixes NLPModel's load from checkpoint due to PTL private function changes by @MaximumEntropy :: PR: #4755
    • Adapter tuning for Megatron GPT models by @arendu :: PR: #4717
    • Megatron Encoder Decoder models with RPE and PP > 2 by @MaximumEntropy :: PR: #4663
    • add kab language asr models by @nithinraok :: PR: #4819
    • add chinese to language doc and fix bug by @yzhang123 :: PR: #4834
    • Spoken Language Identification by @fayejf :: PR: #4846
    • Fix decoding bug for megatron enc-dec models with O2 by @MaximumEntropy :: PR: #4989
    • Updating Megatron LM conversion according to PTL 1.7 by @Davood-M :: PR: #5038
    • Adding RETRO model Faiss sharding index and KNN sharding index by @yidong72 :: PR: #4713
    • MLP Prompt Learning Encoder by @vadam5 :: PR: #4849
    • Update the prompt learning to handle large language models by @yidong72 :: PR: #4906

    Text Normalization / Inverse Text Normalization

    Changelog
    • [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
    • [Chinese text normalization] Chinese TN part in text_normalization by @mzxcpp :: PR: #4826
    • Fix zh tn by @yzhang123 :: PR: #5035
    • Bug fixes for parallel mp3 to wav conversion, PC notebook, update Readme for TN requirements by @ekmb :: PR: #5047
    • Added P&C lexical audio model by @jubick1337 :: PR: #4802
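
    For orientation on the TN/ITN entries, a minimal normalization sketch with nemo_text_processing, assuming the English grammars and the pynini dependency are installed:

    from nemo_text_processing.text_normalization.normalize import Normalizer
    from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

    tn = Normalizer(input_case="cased", lang="en")
    itn = InverseNormalizer(lang="en")

    print(tn.normalize("It costs $5.", verbose=False))                    # written -> spoken form
    print(itn.inverse_normalize("it costs five dollars", verbose=False))  # spoken -> written form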

    Export

    Changelog
    • Intent slot model onnx export test by @Zhilin123 :: PR: #4731
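
    The export entry above goes through NeMo's common Exportable interface; a minimal sketch, assuming an exportable pretrained checkpoint (name illustrative), where the output format is inferred from the file extension:

    from nemo.collections.asr.models import EncDecCTCModel

    # Any Exportable NeMo model works the same way (checkpoint name illustrative).
    model = EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")
    model.export("quartznet.onnx")  # .onnx extension selects ONNX export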

    General Improvements

    Changelog
    • Fix logger reference by @SeanNaren :: PR: #4786
    • Fix error with class method reference in msdd by @SeanNaren :: PR: #4865
    • Add sync for logging calls to ensure aggregation across devices by @SeanNaren :: PR: #4876
    • Fix saving the last checkpoint when using val check interval by @SeanNaren :: PR: #4905
    • Add support for skipping validation on resume + extend saving last ckpt test by @SeanNaren :: PR: #4922
    • Move trainer calls for ssl models to training and validation steps only by @sam1373 :: PR: #4685
    • Change Num Partitions size expansion fix by @aklife97 :: PR: #4719
    • upgrade to PTL 1.7 by @nithinraok :: PR: #4672
    • Fixing outputs of infer() and use of NeMo length regulator helper by @borisfom :: PR: #4724
    • bug fix: enable async grad reduction when DP > 1 by @erhoo82 :: PR: #4740
    • Add LayerNorm1P, weight decay for LN and unscaled initialization by @mikolajblaz :: PR: #4743
    • Data Simulator by @chooper1 :: PR: #4686
    • jenkins data simulator fix by @nithinraok :: PR: #4751
    • Multiscale Diarization Decoder (MSDD) model and module files by @tango4j :: PR: #4650
    • Fix logging in gradient clipping with PTL 1.7.2 by @MaximumEntropy :: PR: #4769
    • Fix checkpoint restoring by @nithinraok :: PR: #4777
    • avoid data clipping after convolution with rir samples by @nithinraok :: PR: #4806
    • Fixed in_features dim if bidirectional is True by @farisalasmary :: PR: #4588
    • Fix float/integer type error in WER.update() by @fujimotos :: PR: #4816
    • [Speech Data Explorer] An option to explicitly specify the base dir by @anteju :: PR: #4678
    • adding instancenorm as an option for conv normalization by @bmwshop :: PR: #4827
    • Fix small spelling mistakes by @SeanNaren :: PR: #4839
    • [Tutorials] Fix matplotlib version and directory name in Multispeaker_Simulator by @anteju :: PR: #4804
    • Update diarization folder structure by @tango4j :: PR: #4823
    • Missing types in clustering by @SeanNaren :: PR: #4858
    • add new models by @Jorjeous :: PR: #4852
    • Fix decoding for T5 models with RPE by @MaximumEntropy :: PR: #4847
    • Update Speaker Diarization notebooks with unknown oracle_num_speakers by @fayejf :: PR: #4861
    • Fix mha bug by @yzhang123 :: PR: #4859
    • Updates to adapter training by @arendu :: PR: #4842
    • Changes to MSDD code after review, fix test log call by @SeanNaren :: PR: #4881
    • Fixed output of BERT to be [batch x seq x hidden] by @michalivne :: PR: #4887
    • Add AMI dataset script by @SeanNaren :: PR: #4864
    • Update label_models.py by @stevehuang52 :: PR: #4891
    • Update tutorials.rst for question answering by @Zhilin123 :: PR: #4895
    • removed unused imports for all domains. by @XuesongYang :: PR: #4901
    • Fix ptl_load_state not providing cls by @MaximumEntropy :: PR: #4914
    • Remove unused cv collection by @okuchaiev :: PR: #4907
    • Add mixed-representation config to PhonemizerTokenizer by @rlangman :: PR: #4904
    • Fix implicit bug in _AudioLabelDataset by @stevehuang52 :: PR: #4923
    • Fix and refactor label models by @fayejf :: PR: #4913
    • Sparrowhawk deployment fix by @ekmb :: PR: #4928
    • Upgrade to NGC PyTorch 22.08 Container by @ericharper :: PR: #4929
    • Fixes for Cherry Picked PRs by @titu1994 :: PR: #4962
    • Fix cherry pick workflow by @ericharper :: PR: #4964
    • check for active conda environment by @nithinraok :: PR: #4970
    • fix label models restoring issue from weighted cross entropy by @nithinraok :: PR: #4968
    • Add simple pre-commit file (#4983) by @SeanNaren :: PR: #4995
    • Fix bug in Squeezeformer Conv block by @titu1994 :: PR: #5011
    • Fix bugs by @Zhilin123 :: PR: #5036
    • Add black to pre-commit (#5027) by @SeanNaren :: PR: #5045
    • Fix bug in question answering tutorial by @Zhilin123 :: PR: #5049
    • Missing fixes from r1.11.0 to T5 finetuning eval by @MaximumEntropy :: PR: #5054
    • P&C docs by @jubick1337 :: PR: #5068
    • probabilites -> probabilities by @nithinraok :: PR: #5078
    • Notebook bug fixes by @vadam5 :: PR: #5084
    • update strategy in notebook from ddp_fork to dp by @Zhilin123 :: PR: #5088
    • Fix Unhashable type list for Numba Cuda spec augment kernel by @titu1994 :: PR: #5093
    • Remove numba import by @titu1994 :: PR: #5095
    • T5 prompt learning fixes missing from r1.11.0 merge by @MaximumEntropy :: PR: #5075
    • T5 Decoding with PP > 2 fix by @MaximumEntropy :: PR: #5091
    • Multiprocessing fix by @jubick1337 :: PR: #5106
    • [Bug fix] PC lexical + audio by @ekmb :: PR: #5109
    • bugfix: pybtex.database.InvalidNameString: Too many commas in author … by @XuesongYang :: PR: #5112

  • v1.11.0(Sep 8, 2022)

    Container

    For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.07

    ASR

    Changelog
    • Add ASR CTC Decoding module by @titu1994 :: PR: #4342
    • Fixing bugs in calling method ctc_decoder_predictions_tensor. by @VahidooX :: PR: #4414
    • Fixed WER initialization in ASR_with_Nemo notebook by @anteju :: PR: #4523
    • Update signature of Hypothesis alignments by @titu1994 :: PR: #4511
    • Add support for ASR Adapter Auxiliary Losses by @titu1994 :: PR: #4480
    • Catalan ASR NGC Resource by @stevehuang52 :: PR: #4576
    • Add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
    • Add DALI char dataset support to SSL model by @piraka9011 :: PR: #4592
    • Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
    • Update Offline ASR with CTC Decoding by @titu1994 :: PR: #4608
    • Add Squeezeformer to ASR by @titu1994 :: PR: #4416
    • Fix ASR notebooks by @titu1994 :: PR: #4738
    • Add pretrained ASR models for Croatian by @anteju :: PR: #4682
    • Dataloader, collector, loss and metric for multiscale diarization decoder by @tango4j :: PR: #4187
    • Multilingual VAD model by @fayejf :: PR: #4734
    • Adding support for models trained with full context for cache-aware streaming. by @VahidooX :: PR: #4687
    • Fp16 support for Conformer by @bmwshop :: PR: #4571
    • Tiny VAD refactoring for postprocessing by @fayejf :: PR: #4625
    • Add silence handling for speaker diarization pipeline by @nithinraok :: PR: #4512
    • Add Bucketing support to TarredAudioToClassificationLabelDataset by @entn-at :: PR: #4465
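
    Several ASR entries above touch speaker diarization and verification; as a pointer, a minimal speaker-verification sketch, assuming a pretrained speaker model and two local audio files (names illustrative):

    import nemo.collections.asr as nemo_asr

    speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained("titanet_large")
    # Returns (and prints) whether the two utterances come from the same speaker.
    decision = speaker_model.verify_speakers("speaker1.wav", "speaker2.wav")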

    TTS

    Changelog
    • Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
    • Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
    • Add static method decorator. by @XuesongYang :: PR: #4443
    • Fix typo in HiFi-GAN config's max steps by @XuesongYang :: PR: #4450
    • Relaxed support for both CPUs and GPUs by @XuesongYang :: PR: #4461
    • Multi-speaker fastpitch model training recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4413
    • Created the finetuning Hifigan 44100Hz recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4478
    • Fix dataset parameter typo on tacotron2 example yaml by @saarus72 :: PR: #4471
    • Update cmudict by @jasro23 :: PR: #4510
    • Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
    • Fix off-by-1 bug in Beta Binomial Prior by @rlangman :: PR: #4616
    • G2P Aligner by @redoctopus :: PR: #4604
    • RADTTS ADLR-NEMO porting by @MikyasDesta :: PR: #4538
    • Fixed wrong pronunciations for r1.11. by @XuesongYang :: PR: #4677
    • Incremented the version number to 22.08 in tutorials. by @XuesongYang :: PR: #4684
    • Bugfix for missing configs. by @XuesongYang :: PR: #4725
    • Fix pynini install in TTS tutorials by @redoctopus :: PR: #4729
    • Updated config with a German IPA phoneme tokenizer by @XuesongYang :: PR: #4756
    • Add multi-speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4763
    • Add single male speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4770
    • Deprecated old scripts for ljspeech. by @XuesongYang :: PR: #4780
    • Fix MixerTTS data loading index error by @redoctopus :: PR: #4811
    • G2P docs by @ekmb :: PR: #4841
    • NMESC speaker counting algorithm update by @tango4j :: PR: #4500

    NLP / NMT

    Changelog
    • Add O2 support for RETRO model by @yidong72 :: PR: #4411
    • Add MTEncDec Finetune support by @aklife97 :: PR: #4540
    • Fix metric setup for finetuning without a test set by @MaximumEntropy :: PR: #4585
    • T0 model and dataset by @MaximumEntropy :: PR: #4598
    • Add prompt learning for T5 by @HeyyyyyyG :: PR: #4391
    • Add MuTransfer Capability to RETRO model pretraining by @yidong72 :: PR: #4643
    • Label Smoothing in VocabParallelCrossEntropy by @MaximumEntropy :: PR: #4602
    • Megatron BART BOS / EOS bug fix by @michalivne :: PR: #4495
    • GPT Prompt Learning Improvements by @vadam5 :: PR: #4496
    • Megatron perceiver with tensor parallelism only by @MaximumEntropy :: PR: #4318
    • Refactor for punctuation model by @jubick1337 :: PR: #4367
    • Update megatron prompt learning interface to dialogue by @Zhilin123 :: PR: #4545
    • Removed NLPDDPPlugin Import check by @vadam5 :: PR: #4555
    • Option to disregard document boundaries for t5, bart, ul2 by @MaximumEntropy :: PR: #4481
    • Add Tokenization and Normalization pre-processing script for NMT by @aklife97 :: PR: #4557
    • Integrating support for GPT/T5/BART for Question Answering by @ameyasm1154 :: PR: #4532
    • NeMo Megatron: Add sequence parallelism and selective activation checkpointing (rebased) by @ericharper :: PR: #4380
    • Update megatron t5 interface to dialogue by @Zhilin123 :: PR: #4626
    • Additional sentencepiece args - Byte fallback, split digits, split_on_whitespace by @MaximumEntropy :: PR: #4525
    • Maximum sample-based training for Megatron NMT and Text Memmap based Seq2seq Pre-training by @MaximumEntropy :: PR: #4396
    • NeMo Megatron Doc updates1 by @okuchaiev :: PR: #4633
    • Asymmetric Encoder and Decoder Configuration for Megatron Models by @MaximumEntropy :: PR: #4568
    • Add sentencepiece legacy arg to megatron tokenizer configs by @MaximumEntropy :: PR: #4659
    • Megatron encode function with RPE fix by @MaximumEntropy :: PR: #4692
    • Updates to NeMo Megatron OSS docs by @okuchaiev :: PR: #4709
    • Changes to make Megatron NMT exportable by @Davood-M :: PR: #4499
    • fix bug relating to ddp strategy in joint intent slot classification … by @Zhilin123 :: PR: #4762
    • Fix qa notebook typos and branch by @ericharper :: PR: #4788
    • Colab py37 compatibility megatron by @Zhilin123 :: PR: #4791
    • added/fixed export for Megatron models by @Davood-M :: PR: #4712
    • Fix providing glue in seq2seq eval by @MaximumEntropy :: PR: #4843
    • Fix Megatron NMT consumed samples and ckpt_to_nemo split rank by @MaximumEntropy :: PR: #4884
    • Fixing Megatron BERT output dimensions to [batch x sec x hidden] by @michalivne :: PR: #4894
    • Prompt Learning Inference Improvements by @vadam5 :: PR: #4566
    • MegaMolBART Compatibility by @michalivne :: PR: #4603

    Text Normalization / Inverse Text Normalization

    Changelog
    • Add ITN pt by @guidefloripa :: PR: #4516
    • add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
    • Fix ITN pt by @guidefloripa :: PR: #4623
    • Bug fix hundred in Audio-based, added method to split text into sentences by @ekmb :: PR: #4610
    • Fix itn pt time by @guidefloripa :: PR: #4630
    • Pin lightning version to be < 1.7.0 by @MaximumEntropy :: PR: #4660
    • G2P for OOV and heteronyms by @ekmb :: PR: #4624
    • Publish pretrained itn t5 model for English by @bene-ges :: PR: #4748
    • Added MLM Scoring by @yzhang123 :: PR: #4476

    Export

    Changelog
    • update fastpitch to add export controls by @blisc :: PR: #4509
    • Fix Fastpitch Export by @blisc :: PR: #4676
    • Changes to make Megatron NMT exportable by @Davood-M :: PR: #4499
    • Added/fixed export for Megatron models by @Davood-M :: PR: #4712

    Bugfixes

    Changelog
    • Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
    • Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
    • Fix tarred dataset len when num shards is not divisible by workers by @itzsimpl :: PR: #4553
    • Fix multiple dev/test datasets after restoring from checkpoint by @PeganovAnton :: PR: #4636
    • Fix/need different cache dirs for different datasets by @PeganovAnton :: PR: #4640
    • Improve mAES algorithm with patches by @titu1994 :: PR: #4662

    General Improvements

    Changelog
    • Option to disable mp in VAD via num_workers=1 by @gkucsko :: PR: #4317
    • Remove redundant bias expand by @xrennvidia :: PR: #4382
    • Add option for specifying wandb save_dir from config by @shan18 :: PR: #4379
    • Quick wav2vec fix. In-place operation adding convolutional positions … by @bonham79 :: PR: #4383
    • Fixing import error in some cases by @borisfom :: PR: #4401
    • Update with new conformer checkpoints. by @VahidooX :: PR: #4417
    • Wav2vec fix by @bonham79 :: PR: #4467
    • Relative Audio Paths by @stevehuang52 :: PR: #4470
    • Allow Noam lr scheduler to run for more than max_steps by @alancucki :: PR: #4472
    • Support for Different LRs with Param Groups by @stevehuang52 :: PR: #4508
    • Fix runtime check by @borisfom :: PR: #4501
    • Update finetune label models by @nithinraok :: PR: #4504
    • Weighted bucketing by @tbartley94 :: PR: #4530
    • Relative Audio Path by @stevehuang52 :: PR: #4520
    • Fix duplex inference with grammars by @ekmb :: PR: #4517
    • Add nsys profiling by @ericharper :: PR: #4539
    • Remove the variable that is not used in the context. by @XuesongYang :: PR: #4547
    • Adding multispeaker fastpitch and hifigan en model links to available… by @subhankar-ghosh :: PR: #4550
    • Add length ratio filtering script by @MaximumEntropy :: PR: #4551
    • Relative audio path in speech data explorer by @anteju :: PR: #4570
    • Dividing generative question-answering CI tests by @ameyasm1154 :: PR: #4600
    • Updating the default parameters in the example adapters config file by @shan18 :: PR: #4607
    • Improve normalize_batch ValueError message by @piraka9011 :: PR: #4614
    • Support listing Hugging Face model info by @titu1994 :: PR: #4619
    • Update diarization data loader to train meeting data by @tango4j :: PR: #4567
    • Fix HF check for model card info by @titu1994 :: PR: #4628
    • Add Github Action for auto webpage build by @titu1994 :: PR: #4645
    • Empty commit by @titu1994 :: PR: #4646
    • Force git config for doc build by @titu1994 :: PR: #4647
    • Correct branch name for github page source by @titu1994 :: PR: #4648
    • Adding lang id to shard by @bmwshop :: PR: #4649
    • Fix special tokens in vocab to arguments of constructor by @gwarmstrong :: PR: #4631
    • Fix apex for r1.11 by @michalivne :: PR: #4666
    • Update readme by @nithinraok :: PR: #4667
    • Removed trailing spaces in CI test by @vadam5 :: PR: #4671
    • Pynini dependency fix by @ekmb :: PR: #4674
    • Fix for incorrect batch size issue while decoding by @rilango :: PR: #4675
    • Fix to fetch config file by @nithinraok :: PR: #4699
    • Fix notebook for buffered inference by @titu1994 :: PR: #4703
    • Prompt Learning Notebook Bug Fix by @vadam5 :: PR: #4689
    • Add psutils to mock imports by @ericharper :: PR: #4728
    • Update Aligner model and tutorial to add NGC checkpoint loading by @redoctopus :: PR: #4714
    • Updated docs and doc paths by @vadam5 :: PR: #4754
    • Update r1.11 to new heteronyms list by @redoctopus :: PR: #4745
    • Update CMUdict with more recent 0.7b entries by @redoctopus :: PR: #4768
    • Add pynini to Docker container by @artbataev :: PR: #4733
    • Fix tutorial formatting by @redoctopus :: PR: #4778
    • Fix initializing weights from ptl ckpt with exclude by @sam1373 :: PR: #4807
    • T5 prompt learning fixes by @MaximumEntropy :: PR: #4771
    • Updated inference code and squad scripts by @vadam5 :: PR: #4835
    • Fix uppercasing mismatch for IPA heteronyms by @redoctopus :: PR: #4860
    • Set the number of workers to 0 for validation and test sets in all enc-dec models by @MaximumEntropy :: PR: #4790
    • Fix mha by @yzhang123 :: PR: #4866
    • ipa bug fix by @ekmb :: PR: #4871
    • Added utf8 encoding by @vadam5 :: PR: #4892
    • Fix question answering docs r1p11 by @Zhilin123 :: PR: #4897
  • v1.10.0(Jul 1, 2022)

    Container

    For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.05

    Known Issues

    Issues
    • Tutorial: Fastpitch_Training_GermanTTS.ipynb is experimental and still being tested.

    ASR

    Changelog
    • Multilang asr tutorial by @bmwshop :: PR: #3931
    • Add ASR with Adapters Tutorial by @titu1994 :: PR: #4149
    • Add support for Decoder + Joint Adapters for ASR by @titu1994 :: PR: #4189
    • updating PretrainedModelInfo and benchmark sheet for ASR models by @krishnacpuvvada :: PR: #4259
    • Remove verbose flag from Dali Index Creator by @titu1994 :: PR: #4309
    • updating PretrainedModelInfo for ASR SSL models by @krishnacpuvvada :: PR: #4292
    • Adding docs for ASR SSL by @krishnacpuvvada :: PR: #4303
    • Add ASR Scores to Docs by @titu1994 :: PR: #4412
    • [ASR] Replace all paths with /content/ by @titu1994 :: PR: #4427
    • added conformer mandarin model. by @VahidooX :: PR: #4201
    • Runtime audio segment sampling for SSL by @krishnacpuvvada :: PR: #4126

    TTS

    Changelog
    • [TTS] Add volume passthrough to fp for riva by @blisc :: PR: #4167
    • Update TTS Configs from LAMB to AdamW by @redoctopus :: PR: #4233
    • Add benchmark=false to all TTS configs by @redoctopus :: PR: #4263
    • [TTS] add staticmethod decoration for BetaBinomialInterpolator by @XuesongYang :: PR: #4319
    • [TTS] capture exception of non-supported windows. by @XuesongYang :: PR: #4320
    • [TTS] enforced pin_memory = True by @XuesongYang :: PR: #4341
    • [TTS] Training Fastpitch on German text and phonemes and finetuning HiFi-GAN on predicted mels by @aroraakshit :: PR: #4266
    • IPA support for TTS by @redoctopus :: PR: #4310
    • Bits of RADTTS support by @borisfom :: PR: #4343

    NLP / NMT

    Changelog
    • Megatron NMT Restore from T5/BART and finetune by @MaximumEntropy :: PR: #3977
    • Binarized memmap dataloader for Megatron NMT, Inference and checkpoint -> nemo by @MaximumEntropy :: PR: #4137
    • Use unique names for temporary directories in punctuation and capitalization tests by @PeganovAnton :: PR: #4298
    • Removes debug logging statements in Megatron NMT by @MaximumEntropy :: PR: #4312
    • Raise error if trainer object is None for MegatronBaseModel by @MaximumEntropy :: PR: #4356
    • Punctuation and capitalization tests race condition by @PeganovAnton :: PR: #4399
    • unify intent slot dataset util functions in tutorials by @Zhilin123 :: PR: #4445
    • Fix for TP=2,PP=2 decoding with megatron encoder-decoder models by @MaximumEntropy :: PR: #4484
    • Add RETRO model for pretraining by @yidong72 :: PR: #4121
    • Add async grad allreduce and chunk optimization by @xrennvidia :: PR: #4084
    • Implements the UL2 Dataset and config by @MaximumEntropy :: PR: #4184
    • Add RETRO indexed dataset and inference by @yidong72 :: PR: #4220
    • Finetune T5 on the prefix-lm objective by @MaximumEntropy :: PR: #4328
    • Fuse bias with geglu in ParallelMLP by @xrennvidia :: PR: #4213
    • Support larger datasets for question answering by @Zhilin123 :: PR: #4205
    • Refactor bias act fusion by @MaximumEntropy :: PR: #4376
    • Prompt Learning Pipeline Parallel by @vadam5 :: PR: #4291
    • Text memmap dataset by @michalivne :: PR: #4068
    • Fuse grad division into async grad allreduce by @xrennvidia :: PR: #4327

    Text Normalization / Inverse Text Normalization

    Changelog
    • [TN] WFST to normalize punctuation by @ekmb :: PR: #4108
    • [TN/TTS] Add graph to tag IPA words/sentences in square brackets and leave them unchanged by @ekmb :: PR: #4323
    • Tn tutorial by @yzhang123 :: PR: #4090
    • [TN] WFST to normalize punctuation by @ekmb :: PR: #4108
    • Tn add rules by @yzhang123 :: PR: #4302
    • [TN/TTS] Add graph to tag IPA words/sentences in square brackets and leave them unchanged by @ekmb :: PR: #4323
    • Tn install by @yzhang123 :: PR: #4055
    • Fix electronic bug, new time ITN rule by @ekmb :: PR: #4355
    • [TN] Bug fix: expand serial coverage of unknown symbol, remove constraints from word graph by @ekmb :: PR: #4463
    • Configure T5 finetuning metrics by @MaximumEntropy :: PR: #4122

    Export

    Changelog
    • Added support for subnet export by @borisfom :: PR: #4299

    Core

    Changelog
    • Add Module-level Adapters, Save-Restore and tests by @titu1994 :: PR: #4114
    • Add NeMo Adapters tutorial to Core by @titu1994 :: PR: #4311
    • NeMo Model to HF Hub Upload Tutorial by @titu1994 :: PR: #4322
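
    The Core entries above extend NeMo's adapter support; a minimal sketch of the adapter workflow, assuming an adapter-compatible checkpoint and an illustrative adapter name and dimension:

    import nemo.collections.asr as nemo_asr
    from nemo.collections.common.parts.adapter_modules import LinearAdapterConfig

    model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_small")
    adapter_cfg = LinearAdapterConfig(in_features=model.cfg.encoder.d_model, dim=32)
    model.add_adapter(name="my_domain", cfg=adapter_cfg)
    model.set_enabled_adapters("my_domain", enabled=True)

    model.freeze()                     # freeze base model weights...
    model.unfreeze_enabled_adapters()  # ...and train only the adapter parameters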

    General Improvements and Fixes

    Changelog
    • Update container to 22.05 by @ericharper :: PR: #4329
    • Fix PTL step calculation by @titu1994 :: PR: #4307
    • [NLP] P&C Fix multi node cache issue, add pynini guard by @ekmb :: PR: #4410
    • NeMo Megatron GPT Unit Tests by @ericharper :: PR: #4099
    • Add the PP2 GPT eval CI test by @yidong72 :: PR: #4168
    • BigNLP perf regression fix by @MaximumEntropy :: PR: #4267
    • Fixes for Megatron Base Model Artifacts by @MaximumEntropy :: PR: #4248
    • Fix a wrong description in offline_diarization_with_asr.yaml by @tango4j :: PR: #4141
    • bugfix for import error in Offline_ASR_with_VAD_for_CTC_models by @fayejf :: PR: #4424
    • [Fix] ASR RNNT Tutorial by @stevehuang52 :: PR: #4352
    • [TTS] Fix Hifigan finetune tutorial by @subhankar-ghosh :: PR: #4182
    • [Bugfix][TTS] wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4432
    • [bugfix][TTS] pitch, voiced_mask, prob_voiced have the same values. by @XuesongYang :: PR: #4435
    • [TTS] [bugfix] German FastPitch HiFi-GAN tutorial and lr by @aroraakshit :: PR: #4459
    • [TTS] [bugfix] update indentation by @aroraakshit :: PR: #4468
    • Fix some 's' cases for IPA G2P by @redoctopus :: PR: #4460
    • Fix ASR Typos in tutorials by @titu1994 :: PR: #4384
    • Use unique names for temporary directories in punctuation and capitalization tests by @PeganovAnton :: PR: #4298
    • Punctuation and capitalization tests race condition by @PeganovAnton :: PR: #4399
    • Dialogue tasks unit test by @Zhilin123 :: PR: #4112
    • fix error by @yzhang123 :: PR: #4120
    • fix typo by @stevehuang52 :: PR: #4134
    • Fix cmudict typo: phoneme YI1 -> IY1 in NVME by @redoctopus :: PR: #4139
    • transcribe: scan directories recursively by @virajkarandikar :: PR: #4159
    • Add 44KHz yaml file for Fastpitch training by @subhankar-ghosh :: PR: #4161
    • [bugfix] consistent highfreq to both fastpitch and hifigan in their 44100 configs. by @XuesongYang :: PR: #4177
    • Upperbound OmegaConf by @titu1994 :: PR: #4191
    • Prompt tokenization bugfix by @vadam5 :: PR: #4197
    • Updated to Prompt Learning Model to Use Distributed Sampler by @vadam5 :: PR: #4208
    • Freesound fixes by @virajkarandikar :: PR: #4155
    • Patch Hydra by @titu1994 :: PR: #4202
    • Prompt Learning Model Saving Changes by @vadam5 :: PR: #4212
    • Speakertasks manifest by @yzhang123 :: PR: #4185
    • SSL Multi-loss Update by @sam1373 :: PR: #4186
    • Support load_adapters with just adapter_name by @titu1994 :: PR: #4255
    • Add special tokens to existing (trained) SentencePiece models by @aklife97 :: PR: #4203
    • Fixing the speed slow-down for speech models. by @VahidooX :: PR: #4260
    • Fix and add functions in speaker utils by @tango4j :: PR: #4138
    • pt container 1.10->1.11.0 by @ekmb :: PR: #4273
    • ssl fixes by @sam1373 :: PR: #4268
    • Save Virtual Prompt Weights Only by @vadam5 :: PR: #4237
    • add 'relative positional embedding (RPE)' feature - re-creating after… by @khcs :: PR: #4256
    • Docs CSS: Update h4 tag style for the right side bar by @nickolyamba :: PR: #4284
    • Fix Docs CSS: align docs left and increase width for large screens by @nickolyamba :: PR: #4154
    • remove redundant condition for fastpitch. by @XuesongYang :: PR: #4281
    • [Add] automaticly resolving relative audio path by @stevehuang52 :: PR: #4277
    • forcing conv subsampling to 32 bit by @bmwshop :: PR: #4293
    • Add library name and version when downloading from the Hugging Face Hub by @osanseviero :: PR: #4304
    • clear access registry when adding if not empty by @sam1373 :: PR: #4306
    • [collections] bugfix for capturing NotImplementedError of non-supported sup data types. by @XuesongYang :: PR: #4297
    • Adjust lr for AdamW from LAMB default by @redoctopus :: PR: #4308
    • Fix bugs in indexed dataset exam script by @yidong72 :: PR: #4325
    • Torchaudio installation fix by @GNroy :: PR: #4330
    • Speedup the speech commands dataset processing script by @shan18 :: PR: #4347
    • fix wrong requirement by @yzhang123 :: PR: #4349
    • Refactored path to manifest by @treacker :: PR: #4251
    • Fix the post LN bug by @yidong72 :: PR: #4350
    • [Fix] Hanging for Fully Randomized Bucketing by @stevehuang52 :: PR: #4348
    • Auto-switch the input dimensions in the conformer encoder adapter to correct value by @shan18 :: PR: #4354
    • Set headscale false by @MaximumEntropy :: PR: #4364
    • Add wandb as dependency by @titu1994 :: PR: #4365
    • Fix trainer.global_steps in WandB logging by @titu1994 :: PR: #4366
    • Finetuning changes for BART by @MaximumEntropy :: PR: #4003
    • Make position embedding expansion specific to a batch to avoid checkpoint size mismatches by @MaximumEntropy :: PR: #4357
    • Correct support for dataclasses in default module dim by @titu1994 :: PR: #4372
    • Fix no attribute 'pad_id' bug when pre-processing by @yidong72 :: PR: #4377
    • Question answering bug fix by @Zhilin123 :: PR: #4381
    • Docs for NeMo Adapters by @titu1994 :: PR: #4369
    • Update NeMo docs by @titu1994 :: PR: #4397
    • Fixing import error in some cases by @borisfom :: PR: #4402
    • Fix tutorial typos and docs by @titu1994 :: PR: #4415
    • Add reconfigure on validation epoch start by @MaximumEntropy :: PR: #4393
    • Re-apply fixes from r1.9.0 by @redoctopus :: PR: #4425
    • Fix hanging issue by multiprocessing in SD tutorial and add ETA for VAD processing by @fayejf :: PR: #4405
    • Fix notebook text by @yidong72 :: PR: #4438
    • Update dialogue tutorial version by @Zhilin123 :: PR: #4437
    • Docs: Add table overflow handling by @nickolyamba :: PR: #4441
    • Docs: Decrease Font Size on Tables by @nickolyamba :: PR: #4444
    • Notebook bug fix: add subfolder by @ekmb :: PR: #4442
    • Fix typo in HiFi-GAN config's max steps by @redoctopus :: PR: #4446
    • Updated notebook to fix batch configuration and precision bugs by @vadam5 :: PR: #4447
    • fix branch in link by @ekmb :: PR: #4454
    • t5-rpe-fix targeting r1.10.0; raise exception for PP>2. by @khcs :: PR: #4469
    • Add kwargs to exact string match by @MaximumEntropy :: PR: #4479
  • v1.9.0(Jun 3, 2022)

    Container

    For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.04

    ASR

    Changelog
    • Fix changed function name in offline vad asr notebook by @fayejf :: PR: #4007
    • NeMo Adapters Support + ASR Adapters by @titu1994 :: PR: #3942
    • Update ASR configs with num_workers and pin_memory by @titu1994 :: PR: #4270
    • Verbose k2 install, skip if failed by @GNroy :: PR: #4289
    • Torch conversion for VAD-Diarization pipeline by @tango4j :: PR: #3930
    • Multiprocess improvements by @nithinraok :: PR: #4127

    TTS

    Changelog
    • Tn tts e by @ekmb :: PR: #3988
    • Remove AudioToCharWithPriorAndPitchDataset dependency from fastpitch by @subhankar-ghosh :: PR: #4008
    • Deprecation by @blisc :: PR: #4082
    • FastPitch FT notebook - Improving Speech Quality clarifications by @redoctopus :: PR: #3954

    NLP / NMT

    Changelog
    • Option to remove bias terms from Megatron transformers by @MaximumEntropy :: PR: #3973
    • Add NMT method to translate with TN/ITN pre/post-processing by @MaximumEntropy :: PR: #4009
    • Fix Punctuation and Capitalization model batching. An issue with shuffling. by @PeganovAnton :: PR: #4050
    • Fix GPT model parallel eval by @yidong72 :: PR: #4054
    • Updating with main by @jpilaul :: PR: #4073
    • Cherry-pick fix for megatron ckpt conversion script when using BCP by @ericharper :: PR: #4089
    • Check implicit grad acc in GLUE dataset building by @MaximumEntropy :: PR: #4123
    • Fix/punctuation avoid overwriting tmp files by @PeganovAnton :: PR: #4144
    • Fix/punctuation/trainer required for setting test data by @PeganovAnton :: PR: #4199
    • Raise error if bicleaner is not installed in NMT Data preprocessing notebook by @MaximumEntropy :: PR: #4264
    • Fix epoch end for NeMo NMT by @MaximumEntropy :: PR: #4265
    • Update YAML with trainer.benchmark=False for NLP by @MaximumEntropy :: PR: #4261
    • Add NMT method to translate with TN/ITN pre/post-processing by @MaximumEntropy :: PR: #4009
    • Continuous prompt refactor by @vadam5 :: PR: #3877
    • T5 finetuning for generic small text-to-text datasets by @MaximumEntropy :: PR: #4032

    Text Normalization / Inverse Text Normalization

    Changelog
    • Tn special text support by @yzhang123 :: PR: #3969
    • Tn update numbers by @yzhang123 :: PR: #3992
    • Tn tts e by @ekmb :: PR: #3988
    • Itn vi by @yzhang123 :: PR: #4029
    • Refactor tn data folder, and update of measure by @yzhang123 :: PR: #4028
    • Remove conda dependency for tn by @yzhang123 :: PR: #4057
    • Tn electronic by @yzhang123 :: PR: #4053
    • ThutmoseTaggerModel, a new model for inverse text normalization by @bene-ges :: PR: #4011
    • Tutorial on ITN with Thutmose tagger and small fixes by @bene-ges :: PR: #4117
    • Cleaned up TN/ITN doc by @yzhang123 :: PR: #4119
    • Update default for SH by @ekmb :: PR: #4135
    • Update ContextNet version by @titu1994 :: PR: #4207

    NeMo Tools

    Changelog
    • Added exception handling for audio player in SDE by @vsl9 :: PR: #4077

    NeMo Core

    Changelog
    • Support pre-extracted nemo checkpoint for restoration by @titu1994 :: PR: #4061
    • Fix type checking to be compatible with named tuples by @artbataev :: PR: #3986
    • Update num worker calculation due to PTL flag changes by @redoctopus :: PR: #4056
    • Refresh NeMo documentation to Sphinx Book Theme by @titu1994 :: PR: #3996
    • Generalize adapter merge strategy for future adapters by @titu1994 :: PR: #4091

    General Improvements

    Changelog
    • Fix Punctuation and Capitalization model batching. An issue with shuffling. by @PeganovAnton :: PR: #4050
    • Fix restoring from checkpoint for case when is provided by @PeganovAnton :: PR: #4136
    • Fix/punctuation avoid overwriting tmp files by @PeganovAnton :: PR: #4144
    • Fix/punctuation/trainer required for setting test data by @PeganovAnton :: PR: #4199
    • Ability to set log_prediction to false by @bmwshop :: PR: #3929
    • Glu activation variants by @MaximumEntropy :: PR: #3951
    • Ranking merge by @yzhang123 :: PR: #3906
    • Fix path in doc by @nithinraok :: PR: #3979
    • Adding fisher audio conversion script from old NeMo branch by @jbalam-nv :: PR: #3991
    • improvements to get_commonvoice_data script by @bmwshop :: PR: #3999
    • Bugfix and variable name change for clustering code by @tango4j :: PR: #4023
    • Exp manager log rank 0 only arguments by @MaximumEntropy :: PR: #4026
    • Force import test on PR by @titu1994 :: PR: #4037
    • Drop support for kaldi-io by @titu1994 :: PR: #4042
    • Cherry pick HF integration and bug fixes from 1.8.1 by @ericharper :: PR: #4052
    • Make saving prompt encoder embeddings non-configurable by @vadam5 :: PR: #4071
    • Replace sampled tokens with EOD after EOD has been sampled once by @vadam5 :: PR: #4070
    • Added answer only loss for prompt learning by @vadam5 :: PR: #4069
    • added stacking suport to conformer. by @VahidooX :: PR: #4045
    • Update LJSpeech whitelist file path by @redoctopus :: PR: #4078
    • Added check for microbatch calculator by @vadam5 :: PR: #4043
    • Prompt Learning Docs by @vadam5 :: PR: #4046
    • Fix link to prompt tuning page by @SeanNaren :: PR: #4081
    • Add docs for by @titu1994 :: PR: #4079
    • Dialogue task by @Zhilin123 :: PR: #3884
    • RMSNorm, Normformer and fixes from merging 1.8.0 into main by @MaximumEntropy :: PR: #4048
    • Correct link to PTL by @titu1994 :: PR: #4088
    • Added encoder and decoder modules for RETRO model by @yidong72 :: PR: #4038
    • Upgrade container to NGC PyTorch 22.04 by @ericharper :: PR: #4085
    • Tarred fix label models by @nithinraok :: PR: #4092
    • Fix link to tutorial in dialogue docs by @Zhilin123 :: PR: #4093
    • Prompt learning Notebook by @vadam5 :: PR: #4031
    • Add more papers by @yzhang123 :: PR: #4097
    • Ignore speakers with few utterances by @nithinraok :: PR: #3722
    • Access mixin by @sam1373 :: PR: #4098
    • Add CharParser for Cyrillic letters by @karpov-nick :: PR: #4101
    • Restored tests previously disabled for 22.03 base by @borisfom :: PR: #4109
    • Add augmentation to label models by @nithinraok :: PR: #4113
    • Fix register artifacts by @ramanathan831 :: PR: #4116
    • Fix typo by @yzhang123 :: PR: #4140
    • bug_fix_diarization_manifest_creation by @yzhang123 :: PR: #4125
    • Tacotron2 retrain by @treacker :: PR: #4103
    • WaveGlow input type fixes by @redoctopus :: PR: #4151
    • Notebooks' link, typo and import fix by @fayejf :: PR: #4158
    • Thutmose tagger bug fixes by @bene-ges :: PR: #4162
    • Update speaker docs by @nithinraok :: PR: #4164
    • Set plugin to None when no apex by @ekmb :: PR: #4171
    • Fix doc by @yzhang123 :: PR: #4152
    • Small import name fix by @fayejf :: PR: #4180
    • Rename folder VAD -> vad by @fayejf :: PR: #4163
    • Fix the server key value problem in the notebook by @yidong72 :: PR: #4196
    • Pin omegaconf for r1.9.0 by @ericharper :: PR: #4195
    • Fix cherrypicks by @titu1994 :: PR: #4204
    • Fix bugs for dialogue tutorial by @Zhilin123 :: PR: #4211
    • Tacotron2 1.9.0 bugfixes by @redoctopus :: PR: #4209
    • Add docs for Thutmose Tagger by @bene-ges :: PR: #4173
    • Dialogue tutorial fix by @Zhilin123 :: PR: #4221
    • Fix syntax error in ipynb-file by @bene-ges :: PR: #4228
    • Fix JSON serialization problem by @yidong72 :: PR: #4235
    • Prompt Learning Typo Fixes by @vadam5 :: PR: #4238
    • Fixing bug 3642622 by @pasandi20 :: PR: #4250
    • Fix broken link in the tutorial by @bene-ges :: PR: #4257
    • Prompt learning notebook bugfix by @vadam5 :: PR: #4262
    • Fix missing validation dataset, whitelist certain keywords for datasets by @titu1994 :: PR: #4269
    • Set Save on train end to false by @vadam5 :: PR: #4274
    • Updated config to fix CI test OOM error by @vadam5 :: PR: #4279
    • Changed total virtual prompt tokens by @vadam5 :: PR: #4295
  • v1.8.2(Apr 26, 2022)

    Known Issues

    • Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.

    TTS

    • Fastpitch Tutorial fix by @subhankar-ghosh :: PR: #4044
  • v1.8.1(Apr 22, 2022)

    Known Issues

    • Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.

    TTS

    • Restore_buffer bug fix and update NeMo checkpoint URL by @subhankar-ghosh :: PR: #4041

    Hugging Face Hub Integration

    • Add support for Huggingface Hub to NeMo by @titu1994 :: PR: #4030
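
    A hedged sketch of what the Hub integration enables (not necessarily this PR's exact API; repo and file names are illustrative): download a .nemo checkpoint from the Hugging Face Hub and restore it locally.

    from huggingface_hub import hf_hub_download
    import nemo.collections.asr as nemo_asr

    path = hf_hub_download(repo_id="nvidia/stt_en_conformer_ctc_large",  # illustrative repo
                           filename="stt_en_conformer_ctc_large.nemo")   # illustrative file
    model = nemo_asr.models.ASRModel.restore_from(path)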

    Bug Fixes

    • Added apex import guard back
    • Patch commons.py by @ericharper :: PR: #4039
    • Fixing pretrained name by @borisfom :: PR: #4022
    • Add back Citrinet zh by @titu1994 :: PR: #4040
  • v1.8.0(Apr 20, 2022)

    Known Issues

    Issues
    • Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.
    • pytests for Vietnamese inverse text normalization are failing. Fixed in main.

    Container

    For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.03
    

    ASR

    Changelog
    • ASR SSL Update by @sam1373 :: PR: #3714
    • Polylang asr by @bmwshop :: PR: #3721
    • Test grad accumulation for RNNT loss by @titu1994 :: PR: #3731
    • Add readme files describing model execution flow for ASR tasks by @titu1994 :: PR: #3812
    • add fr asr ckpt to doc by @yzhang123 :: PR: #3809
    • Fix asr tests in 22.02 by @titu1994 :: PR: #3823
    • Add new pretrained Spanish ASR models by @erastorgueva-nv :: PR: #3830
    • Documentation updates for ASR by @titu1994 :: PR: #3846
    • Offline VAD+ASR tutorial by @fayejf :: PR: #3828
    • Added Hindi and Marathi Models in Nemo pretrained ASR_CTC_BPE models … by @meghmak13 :: PR: #3856
    • Add a missing line to ASR_with_NeMo.ipynb by @lifefeel :: PR: #3908
    • Multilang asr models by @bmwshop :: PR: #3907
    • added stt_en_conformer_transducer_large_ls to NGC by @VahidooX :: PR: #3920
    • Fix DALI test on 22.03 by @titu1994 :: PR: #3911
    • Adding RNN encoder for LSTM-Transducer and LSTM-CTC models by @VahidooX :: PR: #3886
    • Fix issue with Segfault in ASR models by @titu1994 :: PR: #3956
    • Added Mandarin pretrained Conformer-Transducer-Large model trained on AISHELL2. by @VahidooX :: PR: #3970

    TTS

    Changelog
    • Bump TTS deprecation version to 1.9 by @blisc :: PR: #3955
    • Add pinned pynini and scipy installs to TTS training tutorial by @redoctopus :: PR: #3967
    • Compatibility override to load_state_dict for old TTS checkpoints by @redoctopus :: PR: #3978

    NLP / NMT

    Changelog
    • Use worker processes for data preprocessing by @crcrpar :: PR: #3665
    • Set find_unused_parameters to False in GPT example script by @ericharper :: PR: #3837
    • GPT multinode eval by @ericharper :: PR: #3821
    • Fix MegatronPretrainingRandomSampler by taking into account by @crcrpar :: PR: #3826
    • Add slot filling into DST Generative model by @Zhilin123 :: PR: #3695
    • Disable nvfuser for gpt by @ericharper :: PR: #3845
    • Multi-Label Joint Intent Slot Classification by @chenrichard10 :: PR: #3742
    • fix bug in intent/slot model reloading by @carolmanderson :: PR: #3874
    • Make test_gpt_eval unit test less strict by @yidong72 :: PR: #3898
    • Comment gpt resume ci test by @MaximumEntropy :: PR: #3901
    • Neural Machine Translation with Megatron Transformer Models (Tensor Parallel and Tarred Datasets Only) by @MaximumEntropy :: PR: #3861
    • Megatron support by @ramanathan831 :: PR: #3893
    • Populate the GPT/BERT with uploaded models by @yidong72 :: PR: #3885
    • Megatron BART by @michalivne :: PR: #3666
    • Additional Japanese processor for NMT that uses MeCab segmentation. Fix for BLEU in one-many NMT by @MaximumEntropy :: PR: #3889
    • NMT GRPC sever URL fix by @MaximumEntropy :: PR: #3918
    • Megatron legacy conversion support by @ramanathan831 :: PR: #3919
    • Update max_epochs on megatron configs by @ericharper :: PR: #3958
    • Fix NMT variable passing bug by @aklife97 :: PR: #3985
    • Fix nemo megatron restore with artifacts by @ericharper :: PR: #3997
    • Fix megatron notebook by @ramanathan831 :: PR: #4004
    • Megatron work-arounds by @borisfom :: PR: #3998
    • Add T5 model P-tuning support by @yidong72 :: PR: #3768
    • Make index mappings dir configurable by @ericharper :: PR: #3868
    • T5 pipeline parallel by @MaximumEntropy :: PR: #3750

    Text Normalization / Inverse Text Normalization

    Changelog
    • Tn es by @bonham79 :: PR: #3632
    • Fix single GPU training issue + change deprecated Lightning args by @aklife97 :: PR: #4010

    Export

    Changelog
    • Conformer WARs for TRT8.2 by @borisfom :: PR: #3787
    • bert_module: fix inputs of export model by @virajkarandikar :: PR: #3815
    • Exports 22.03 war by @borisfom :: PR: #3957

    Bugfixes

    Changelog
    • patch librosa deprecation and fix by @fayejf :: PR: #3818

    General Improvements

    Changelog
    • Pynini pip by @yzhang123 :: PR: #3726
    • upgrade PTL trainer flags by @nithinraok :: PR: #3589
    • Updated Speech Data Explorer by @vsl9 :: PR: #3710
    • Fix spelling error in num_workers parameter to actually set number of dataset workers specified in yaml configs by @themikem :: PR: #3800
    • Support for Camembert Huggingface bert-like models by @itzsimpl :: PR: #3799
    • Update to 22.02 by @ericharper :: PR: #3771
    • Fixing the defaults of conformer models in the config files by @VahidooX :: PR: #3836
    • Fix T5 Encoder Mask while decoding by @MaximumEntropy :: PR: #3838
    • fix: multilingual transcribe does not require lang id param by @bmwshop :: PR: #3833
    • Misc improvements by @titu1994 :: PR: #3843
    • Change container by @MaximumEntropy :: PR: #3844
    • Making gender assignment random for cardinals, fractions, and decimal… by @bonham79 :: PR: #3759
    • Jenkinsfile test changes by @chenrichard10 :: PR: #3879
    • Adding a RegEx tokenizers by @michalivne :: PR: #3839
    • enable bias+dropout+add fusion with nvfuser at inference by @erhoo82 :: PR: #3869
    • Add text_generation_util to support TopK, TopP sampling + Tabular Data Generation. by @yidong72 :: PR: #3834
    • Ptl requirements bound by @MaximumEntropy :: PR: #3903
    • doc links update by @ekmb :: PR: #3891
    • add citations by @yzhang123 :: PR: #3902
    • Update NeMo CI to 22.03 by @MaximumEntropy :: PR: #3900
    • Add domain groups to changelog builder by @titu1994 :: PR: #3904
    • add input threshold by @yzhang123 :: PR: #3913
    • improvements to commonvoice data script by @bmwshop :: PR: #3892
    • fixes to the cleanup flag by @bmwshop :: PR: #3921
    • Upgrade to PTL 1.6.0 by @ericharper :: PR: #3890
    • JSON output from diarization now includes sentences. Optimized senten… by @demsarjure :: PR: #3897
    • Stateless timer fix for PTL 1.6 by @MaximumEntropy :: PR: #3925
    • fix save_best missing chpt bug, update for setup_tokenizer() changes by @ekmb :: PR: #3932
    • Fix tarred sentence dataset length by @MaximumEntropy :: PR: #3941
    • remove old doc by @ekmb :: PR: #3946
    • Fix issues with librosa deprecations by @titu1994 :: PR: #3950
    • Fix notebook bugs for branch r1.8.0 by @yidong72 :: PR: #3948
    • Fix global batch fit loop by @ericharper :: PR: #3936
    • Refactor restorefrom by @ramanathan831 :: PR: #3927
    • Fix variable name and move models to CPU in Change partition by @aklife97 :: PR: #3972
    • Fix notebook error by @yidong72 :: PR: #3975
    • Notebook Bug Fixes for r1.8.0 by @vadam5 :: PR: #3989
    • Fix compat override for TalkNet Aligner by @redoctopus :: PR: #3993
    • docs fixes by @ekmb :: PR: #3987
    • Fixes val_check_interval, skip loading train data during eval by @MaximumEntropy :: PR: #3968
    • LogProb calculation performance fix by @yidong72 :: PR: #3984
    • Fix P-Tune T5 model by @yidong72 :: PR: #4001
    • Fix the broadcast shape mismatch by @yidong72 :: PR: #4017
    • Add known issues to notebook by @ericharper :: PR: #4024
  • v1.7.2(Mar 17, 2022)

    GPT Bugfixes

    • GPT dataloader improvements and fixes by @crcrpar :: PRs #3826 , #3665
    • Disable nvfuser by @ericharper :: PR #3845
    • Set find_unused_parameters to False by @ericharper :: PR #3837

    T5 XNLI Example

    • T5 xnli eval by @yaoyu-33 :: PR: #3848
  • v1.7.1(Mar 8, 2022)

    Known Issues

    • find_unused_parameters should be False when training GPT: #3837

    Bugfixes

    • revert changes by @yzhang123 :: PR: #3785
    • Fixed soft prompt eval loading bug by @vadam5 :: PR: #3805
    • mT5 whole word masking and T5 finetuning config fixes by @MaximumEntropy :: PR: #3776
    • Raise error if FP16 training is tried with O2 recipe. by @ericharper :: PR: #3806
  • v1.7.0(Mar 2, 2022)

    Known Issues

    • Megatron GPT training with O2 and FP16 is bugged. FP16 with O1 still works.
    • find_unused_parameters should be False when training GPT: #3837
    • FastPitch training may result in stalled GPUs. Users will have to manually kill their runs and continue training from the latest checkpoint.
    • mT5 issue with whole word masking, see #3776
    • T5 finetuning config issue, see #3776
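
    For the find_unused_parameters issue above, a minimal workaround sketch, assuming this release's NLPDDPPlugin (PR #3478 made the flag configurable); the trainer arguments are illustrative:

    from pytorch_lightning import Trainer
    from nemo.collections.nlp.parts.nlp_overrides import NLPDDPPlugin

    # Pass the DDP plugin with find_unused_parameters disabled when training GPT.
    trainer = Trainer(gpus=8, plugins=[NLPDDPPlugin(find_unused_parameters=False)])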

    Container

    NOTE: From NeMo 1.7.0 onwards, NeMo containers follow the YY.MM naming convention, where the YY.MM value is based on the base container. For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.01
    

    ASR

    • Wav2vec by @tbartley94 :: PR: #3297
    • Fix bug in multi-checkpoint loading by @sam1373 :: PR: #3536
    • Add HuggingFace Datasets to NeMo ASR Dataset script by @titu1994 :: PR: #3513
    • Add support for Gradient Clipping (clamp) in RNNT Numba loss by @titu1994 :: PR: #3550
    • Enable Tarred Dataset Support for NVIDIA DALI by @titu1994 :: PR: #3485
    • Add initial support for Buffered RNNT Scripts by @titu1994 :: PR: #3602
    • Significantly speed up RNNT loss on CUDA by @titu1994 :: PR: #3653
    • Fixing the bug in the stateful rnnt decoder. by @VahidooX :: PR: #3673
    • Add Buffered RNNT with LCS Merge algorithm by @titu1994 :: PR: #3669
    • Asr noise data scripts by @jbalam-nv :: PR: #3660
    • ASR SSL update by @sam1373 :: PR: #3746
    • Add randomized bucketing by @VahidooX :: PR: #3445
    • Self-supervised tutorial & update by @sam1373 :: PR: #3344
    • Updated conformer models. by @VahidooX :: PR: #3741
    • Added speaker identification script with cosine and neural classifier… by @nithinraok :: PR: #3672
    • Fix in clustering diarizer by @nithinraok :: PR: #3701
    • Add a function that writes cluster label in diarization pipeline by @tango4j :: PR: #3643

    TTS

    • port UnivNet to NeMo TTS collection by @L0SG :: PR: #3186
    • E2E TTS fixes by @redoctopus :: PR: #3508
    • New structure for TTS datasets in scripts/dataset_processing, VocoderDataset, update TTSDataset by @Oktai15 :: PR: #3484
    • Deprecate some TTS models and TTS datasets by @Oktai15 :: PR: #3576
    • Fix bugs in HiFi-GAN (scheduler, optimizers) and add input_example() in Mixer-TTS/Mixer-TTS-X by @Oktai15 :: PR: #3564
    • Update UnivNet, HiFi-GAN and WaveGlow, small fixes in Mixer-TTS, FastPitch and Exportable by @Oktai15 :: PR: #3585
    • Fix typo in FastPitch config (pitch_avg -> pitch_mean) by @eyentei :: PR: #3593
    • Fix incorrect usage of TTSDataset in some files and fix one-line bug in NVIDIA's CMUDict by @Oktai15 :: PR: #3594
    • Convert entry from UTF-16 to UTF-8 by @redoctopus :: PR: #3597
    • remove CheckInstall by @blisc :: PR: #3577
    • Fix UnivNet LibriTTS pretrained location by @m-toman :: PR: #3615
    • FastPitch training tutorial by @subhankar-ghosh :: PR: #3631
    • Update Aligner, add new methods to AlignmentEncoder by @Oktai15 :: PR: #3641
    • Add Mixed Representation Training by @blisc :: PR: #3473
    • Add speakerID to libritts/get_data.py by @subhankar-ghosh :: PR: #3662
    • Update TTS tutorials, Simplification of testing Mixer-TTS and FastPitch by @Oktai15 :: PR: #3680
    • Clean FastPitch_Finetuning.ipynb notebook by @Oktai15 :: PR: #3698
    • Add cache_size to BetaBinomialInterpolator, fix bugs in TTS tutorials and FastPitch by @Oktai15 :: PR: #3706
    • Fix bugs in VocoderDataset and TTSDataset by @Oktai15 :: PR: #3713
    • Fix bugs in E2E TTS, Mixer-TTS and FastPitch by @Oktai15 :: PR: #3740

    NLP / NMT

    • NLPDDPPlugin find_unused_parameters is configurable by @mlgill :: PR: #3478
    • Megatron encoder-decoder refactor by @michalivne :: PR: #3542
    • Finetuning NeMo Megatron T5 Models on GLUE by @MaximumEntropy :: PR: #3408
    • Pipeline parallelism for GPT by @ericharper :: PR: #3388
    • Generalized the P-tuning method to support various NLP tasks by @yidong72 :: PR: #3623
    • Megatron_LM checkpoint to NeMo checkpoint support by @yidong72 :: PR: #3692
    • Bugfix for GPT eval by @ericharper :: PR: #3744
    • Yuya/megatron t5 glue eval by @yaoyu-33 :: PR: #3751
    • Enforce legacy tokenizer for sentencepiece to add special tokens for T5 by @MaximumEntropy :: PR: #3457
    • Added P-Tuning method by @yidong72 :: PR: #3488
    • O2 style mixed precision training for T5 by @MaximumEntropy :: PR: #3664
    • LM adapted T5 dataset by @MaximumEntropy :: PR: #3654
    • Fix consumed samples calculation + PTune Model bugs by @yidong72 :: PR: #3738
    • Add pipeline support to eval methods by @ericharper :: PR: #3684
    • XNli benchmark by @yidong72 :: PR: #3693
    • Refactor dialogue state tracking for modelling/dataset interoperability by @Zhilin123 :: PR: #3526
    • Changes to support mean n-gram size masking for T5 by @MaximumEntropy :: PR: #3646
    • Dialogue state tracking refactor by @Zhilin123 :: PR: #3667
    • Parallel prompt tuning by @vadam5 :: PR: #3670
    • GEGLU activation for T5 by @MaximumEntropy :: PR: #3694

    Text Normalization / Inverse Text Normalization

    • Text normalization takes too much time for a string which contains a lot of dates by @PeganovAnton :: PR: #3451
    • ITN bug fixes, ip address, card num support, whitelist clean up by @ekmb :: PR: #3574
    • Fix tn bugs by @yzhang123 :: PR: #3580
    • add serial number to itn by @yzhang123 :: PR: #3584
    • ITN: SH bug fixes for telephone by @ekmb :: PR: #3592
    • Tn bug 1.7.0 by @yzhang123 :: PR: #3730
    • TN docs update by @ekmb :: PR: #3735

    Export

    • Update UnivNet, HiFi-GAN and WaveGlow, small fixes in Mixer-TTS, FastPitch and Exportable by @Oktai15 :: PR: #3585
    • Conformer onnx fix by @borisfom :: PR: #3524
    • Add onnx support for speaker models by @nithinraok :: PR: #3650
    • Jasper mask/export fix by @borisfom :: PR: #3691

    Bugfixes

    • Text normalization takes too much time for a string which contains a lot of dates by @PeganovAnton :: PR: #3451
    • Dialogue state tracking refactor/ SGDGEN patch 2 by @Zhilin123 :: PR: #3674
    • lower bound PTL to 1.5.10 and remove last ckpt patch fix by @nithinraok :: PR: #3690

    Improvements

    • Wfst tutorial by @tbartley94 :: PR: #3479
    • Update CMUdict with ADLR version pronunciations by @redoctopus :: PR: #3446
    • Fix docs by @yzhang123 :: PR: #3523
    • Add docstring to UnivNetModel by @L0SG :: PR: #3529
    • Increase lower bound due to security vulnerability by @ericharper :: PR: #3537
    • Add Change Log builder to NeMo by @titu1994 :: PR: #3527
    • Bugfix, need to freeze the model by @yidong72 :: PR: #3540
    • Bucketing quick fix by @tbartley94 :: PR: #3543
    • More fixes to SentencePiece for T5 by @MaximumEntropy :: PR: #3515
    • Update CONTRIBUTING.md by @Oktai15 :: PR: #3569
    • Update pr template and re-add Changelog builder by @titu1994 :: PR: #3575
    • Apex quick fix by @ekmb :: PR: #3591
    • Upgrade to 22.01 container by @ericharper :: PR: #3571
    • Fix typo and update minimal version of scipy by @Oktai15 :: PR: #3604
    • Add env variable to force transformers to run offline during CI by @ericharper :: PR: #3607
    • Correctly install NeMo wheel by @titu1994 :: PR: #3599
    • Fix wheel build by @titu1994 :: PR: #3610
    • Fixed EH and error reporting in restore_from by @borisfom :: PR: #3583
    • Clarifying documentation by @itzsimpl :: PR: #3616
    • Improve docs for finetuning by @titu1994 :: PR: #3622
    • Add NeMo version to all new .nemo files by @titu1994 :: PR: #3605
    • Update numba if NVIDIA_PYTORCH_VERSION not correct by @itzsimpl :: PR: #3614
    • Remove @experimental decorator in diarization related files. by @tango4j :: PR: #3625
    • Remove compression from .nemo files by @okuchaiev :: PR: #3626
    • Update adobe analytics by @ericharper :: PR: #3645
    • Add ssl tutorial to tutorial docs page by @sam1373 :: PR: #3649
    • Fix number of channels>1 issue by @ekmb :: PR: #3652
    • Fixed the bug in bucketing. by @VahidooX :: PR: #3663
    • Adding guard by @yzhang123 :: PR: #3655
    • Add tutorial paths by @titu1994 :: PR: #3651
    • Folder name update by @ekmb :: PR: #3671
    • Test HF online for SGD-GEN only by @MaximumEntropy :: PR: #3681
    • Update Librosa support to 0.9 by @titu1994 :: PR: #3682
    • Comment out numba in 22.01 release by @titu1994 :: PR: #3685
    • Fix failing tests inside of the 22.01 container in PR 3571 by @fayejf :: PR: #3609
    • Fixed Apex guard when imported classes are used for default values by @michalivne :: PR: #3700
    • Update citrinet_512.yaml by @Jorjeous :: PR: #3642
    • update torchaudio in Dockerfile to match torch version by @GNroy :: PR: #3637
    • Enforce import tests on the three domains by @titu1994 :: PR: #3702
    • Audio based norm speed up by @ekmb :: PR: #3703
    • Fix device on notebook by @titu1994 :: PR: #3732
    • pynini pip by @yzhang123 :: PR: #3729
    • Removed fp16 converting in complete method by @dimapihtar :: PR: #3709
    • Mirror AN4 while CMU servers are down by @titu1994 :: PR: #3743
    • Fix SSL configs for 1.7 by @sam1373 :: PR: #3748
    • Punct process bug fix by @ekmb :: PR: #3747
    • Specify gpus in SSL notebook by @sam1373 :: PR: #3753
    • Duplex model inference fix, money encoder fix by @ekmb :: PR: #3754
    • Update decoding strategy docs and override general value for tutorials by @titu1994 :: PR: #3755
    • Fix directories in ssl notebook by @sam1373 :: PR: #3758
    • Update Tacotron2_Training.ipynb by @blisc :: PR: #3769
    • Fix dockerfile by @yzhang123 :: PR: #3778
    • Prompt-Tuning-Documentation by @vadam5 :: PR: #3777
    • Prompt tuning bug fix by @vadam5 :: PR: #3780
    Source code(tar.gz)
    Source code(zip)
  • v1.6.2(Feb 5, 2022)

  • v1.6.1(Feb 2, 2022)

    Bug Fixes

    • Fix embedding name for verifying speakers #3578
    • Add rank check and barrier helpers compilation for megatron dataset #3581
    • Add apex import guards #3579
    Source code(tar.gz)
    Source code(zip)
  • v1.6.0(Jan 29, 2022)

    ASR

    • Add new features to ASR with diarization with modified tutorial and README. by @tango4j :: PR: #3007
    • Enable stateful decoding of RNNT over multiple transcribe calls by @titu1994 :: PR: #3037
    • Move vocabs from asr to common by @Oktai15 :: PR: #3084
    • Adding parallel transcribe for ASR models - supports multi-gpu/multi-node by @VahidooX :: PR: #3017
    • CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
    • Adding pretrained French ASR models to ctc_bpe and rnnt_bpe listings by @tbartley94 :: PR: #3225
    • adding german conformer ctc and rnnt by @yzhang123 :: PR: #3242
    • Add aishell and fisher dataset processing scripts for ASR by @jbalam-nv :: PR: #3203
    • Better default for RNNT greedy decoding by @titu1994 :: PR: #3332
    • Add uniform ASR evaluation script for all models by @titu1994 :: PR: #3334
    • CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
    • Updates on ASR with diarization util files by @tango4j :: PR: #3359
    • Asr fr by @tbartley94 :: PR: #3404
    • Refactor ASR Examples Directory by @titu1994 :: PR: #3392
    • Asr patches by @titu1994 :: PR: #3443
    • Properly support -1 for labels in ctc char models by @titu1994 :: PR: #3487

    TTS

    • MixerTTS, MixerTTSDataset and small updates in tts tokenizers by @Oktai15 :: PR: #2859
    • ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
    • Update name of files to one style in TTS folder by @Oktai15 :: PR: #3189
    • Update TTS Dataset, FastPitch with TTS dataset and small improvements in HiFiGAN by @Oktai15 :: PR: #3205
    • Add Beta-binomial Interpolator to TTSDataset by @Oktai15 :: PR: #3230
    • Normalizer to TTS models, TTS tokenizer updates, AxisKind updates by @Oktai15 :: PR: #3271
    • Update Mixer-TTS, FastPitch and TTSDataset by @Oktai15 :: PR: #3366
    • Minor Updates to TTS Finetuning by @blisc :: PR: #3455

    NLP / NMT

    • NMT timing and tokenizer stats utils by @michalivne :: PR: #3004
    • Add offsets calculation to MegatronGPTModel.complete method by @dimapihtar :: PR: #3117
    • NMT checkpoint averaging by @michalivne :: PR: #3096
    • NMT validation examples with inputs by @michalivne :: PR: #3194
    • Improve data pipeline for punctuation capitalization model and make other useful changes by @PeganovAnton :: PR: #3159
    • Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
    • NLP text augmentation by @michalivne :: PR: #3291
    • Adding Megatron NeMo Bert support by @yidong72 :: PR: #3303
    • Added Script to convert Megatron LM to .nemo file by @yidong72 :: PR: #3371
    • Support Changing Number of Tensor Parallel Partitions for Megatron by @aklife97 :: PR: #3365
    • Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
    • T5 Pre-training in NeMo using Megatron by @MaximumEntropy :: PR: #3036
    • NMT MIM mean variance fix by @michalivne :: PR: #3385
    • NMT Shared Embeddings Weights by @michalivne :: PR: #3340
    • Make saving .nemo during on_train_end configurable by @ericharper :: PR: #3427
    • Byte-level Multilingual NMT by @aklife97 :: PR: #3368
    • BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
    • NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
    • (1) O2-style mixed precision recipe, (2) Persistent layer-norm, (3) Grad scale hysteresis, (4) gradient_as_bucket_view by @erhoo82 :: PR: #3259

    Text Normalization / Inverse Text Normalization

    • Tn clean upsample by @yzhang123 :: PR: #3024
    • Tn add nn wfst and doc by @yzhang123 :: PR: #3135
    • Update english tn ckpt by @yzhang123 :: PR: #3143
    • WFST_tutorial for ITN development by @tbartley94 :: PR: #3128
    • German TN wfst by @yzhang123 :: PR: #3174
    • Add ITN Vietnamese by @binh234 :: PR: #3217
    • WFST TN updates by @ekmb :: PR: #3235
    • Itn german refactor by @yzhang123 :: PR: #3262
    • Tn german deterministic by @yzhang123 :: PR: #3308
    • TN updates by @ekmb :: PR: #3285
    • Added double digits to EN ITN by @yzhang123 :: PR: #3321
    • TN_non_deterministic optimized by @ekmb :: PR: #3343
    • Missing init for TN German by @ekmb :: PR: #3355
    • Ru TN by @ekmb :: PR: #3390
    • Update ContextNet models trained on more datasets by @titu1994 :: PR: #3440

    NeMo Tools

    • CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
    • Updated NumPy SDE requirement by @vsl9 :: PR: #3442

    Export

    • ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
    • CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072

    Documentation

    • Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
    • Tn add nn wfst and doc by @yzhang123 :: PR: #3135
    • Add apex into by @PeganovAnton :: PR: #3214
    • Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
    • Nemo container docker building instruction - merge to main by @fayejf :: PR: #3236
    • Doc link fixes by @nithinraok :: PR: #3264
    • French ASR Doc updates by @tbartley94 :: PR: #3322
    • german asr doc page update by @yzhang123 :: PR: #3325
    • update docs and replace speakernet with titanet in tutorials by @nithinraok :: PR: #3405
    • Asr fr by @tbartley94 :: PR: #3404
    • Update copyright to 2022 by @ericharper :: PR: #3426
    • Update Speech Classification - VAD doc by @fayejf :: PR: #3430
    • Update speaker diarization docs by @tango4j :: PR: #3419
    • NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
    • Add verification helper function and update docs by @nithinraok :: PR: #3514
    • Prompt tuning documentation by @vadam5 :: PR: #3541

    Bugfixes

    • Fixed wrong tgt_length for timing by @michalivne :: PR: #3050
    • Update nltk version with a CVE fix by @thomasdhc :: PR: #3054
    • Fix README by @ericharper :: PR: #3070
    • Transformer Decoder: Fix swapped input name issue by @aklife97 :: PR: #3066
    • Fixes bugs in collect_tokenizer_dataset_stats.py by @michalivne :: PR: #3060
    • Attribute is not working in . by @PeganovAnton :: PR: #3099
    • Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
    • A quick fix for issue #3094 index out-of-bound when truncating long text to max_seq_length by @bugface :: PR: #3131
    • Fixed two typos by @bene-ges :: PR: #3157
    • Merge r1.5.0 bugfixes to main by @ericharper :: PR: #3173
    • LJSpeech alignment scripts fixed for latest MFA by @m-toman :: PR: #3177
    • Add apex into by @PeganovAnton :: PR: #3214
    • Patch omegaconf for cfg by @fayejf :: PR: #3224
    • Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
    • CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
    • Fix Masked SE for Citrinets + export Limited Context Citrinet by @titu1994 :: PR: #3216
    • Fix text length type in TTSDataset for beta_binomial_interpolator by @Oktai15 :: PR: #3233
    • Fix cast type in _se_pool_step_script related functions by @Oktai15 :: PR: #3239
    • Doc link fixes by @nithinraok :: PR: #3264
    • Escape chars fix by @ekmb :: PR: #3253
    • Fix asr output - eval mode by @nithinraok :: PR: #3274
    • Remove ArrayLike because it is not supported in numpy 1.18 by @PeganovAnton :: PR: #3282
    • Fix megatron_gpt_ckpt_to_nemo.py with torch distributed by @yaoyu-33 :: PR: #3278
    • Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
    • Tn en money fix by @yzhang123 :: PR: #3290
    • Fixing the bucketing_batch_size bug. by @VahidooX :: PR: #3294
    • Adaptive fixed positional embeddings by @michalivne :: PR: #3263
    • Fix specaugment time start for numba kernel by @titu1994 :: PR: #3299
    • Fix for Stalled ASR training/eval on Pytorch 1.10+ (multigpu/multinode) by @titu1994 :: PR: #3304
    • Fix bucketing list bug. by @VahidooX :: PR: #3315
    • Fix MixerTTS types and dimensions by @Oktai15 :: PR: #3330
    • Fix German and Vietnamese grammar by @yzhang123 :: PR: #3331
    • Fix readme to show cmd by @yzhang123 :: PR: #3345
    • Fix speaker label models training convergence by @nithinraok :: PR: #3354
    • Tqdm get datasets by @bmwshop :: PR: #3358
    • Fixed future masking in cross attention of Perceiver by @michalivne :: PR: #3314
    • Fixed the bug of fixed-size bucketing. by @VahidooX :: PR: #3364
    • Fix minor problems in punctuation and capitalization model by @PeganovAnton :: PR: #3376
    • Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
    • fixed the bug of bucketing when fixed-size batch is used. by @VahidooX :: PR: #3399
    • TalkNet Fix by @stasbel :: PR: #3092
    • Fix linear annealing not annealing lr to min_lr by @MaximumEntropy :: PR: #3400
    • Resume training on SLURM multi-node multi-gpu by @itzsimpl :: PR: #3374
    • Fix running token classification in multinode setting by @PeganovAnton :: PR: #3413
    • Fix order of lang checking to ignore input langs by @MaximumEntropy :: PR: #3417
    • NMT MIM mean variance fix by @michalivne :: PR: #3385
    • Fix bug for missing variable by @MaximumEntropy :: PR: #3437
    • Asr patches by @titu1994 :: PR: #3443
    • Prompt tuning loss mask fix by @vadam5 :: PR: #3438
    • BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
    • Fix hysteresis loading by @MaximumEntropy :: PR: #3460
    • Fix the tutorial notebooks bug by @yidong72 :: PR: #3465
    • Fix the errors/bugs in ASR with diarization tutorial by @tango4j :: PR: #3461
    • WFST Punct post fix + punct tutorial fixes by @ekmb :: PR: #3469
    • Process correctly label ids dataset parameter + standardize type of label ids model attribute + minor changes (error messages, typing) by @PeganovAnton :: PR: #3471
    • file name fix - Segmentation tutorial by @ekmb :: PR: #3474
    • Patch fix for the multiple last checkpoints issue by @nithinraok :: PR: #3468
    • Fix bug with arguments for TalkNet's preprocessor by @Oktai15 :: PR: #3481
    • Fix description by @PeganovAnton :: PR: #3482
    • typo fix in diarization notebooks by @nithinraok :: PR: #3480
    • Fix checkpoint converter in O2 style by @yaoyu-33 :: PR: #3486
    • Remove pickled features from tarred dataset by @PeganovAnton :: PR: #3491
    • Fix link to NGC page for ASR by @titu1994 :: PR: #3512
    • vad typo fix by @fayejf :: PR: #3490
    • fixed the num_classes bug of conv decoder. by @VahidooX :: PR: #3525
    • Fixed section typo by @vadam5 :: PR: #3522
    • Fixed duplicate cell bug by @vadam5 :: PR: #3518
    • Fix bug in inference tts notebook by @Oktai15 :: PR: #3532
    • Fix nmt resume by @ericharper :: PR: #3539
    • TN bug fix by @ekmb :: PR: #3538
    • Fix bug with pretrained method in Inference_ModelSelect.ipynb by @Oktai15 :: PR: #3546
    • Fix an issue with wandb not displaying updated config changes by @titu1994 :: PR: #3552

    Improvements

    • Remove STFT checks due to min PT version of 1.10 by @titu1994 :: PR: #3034
    • Add a stateless timer to specify max_time per run instead of global m… by @MaximumEntropy :: PR: #3056
    • (1) reduce the validation loss within an epoch, (2) convert global-bat… by @erhoo82 :: PR: #3055
    • Timer class monitors total time (train + validation + testing) to monitor when to end training by @MaximumEntropy :: PR: #3061
    • Add new by @PeganovAnton :: PR: #2963
    • Add PUBLICATIONS.md by @titu1994 :: PR: #3051
    • Hg cache by @yzhang123 :: PR: #3080
    • Add sequence axis to AxisKind.from_str() and improve time axis by @Oktai15 :: PR: #3090
    • Add logging to LS script by @titu1994 :: PR: #3141
    • Modify speaker input by @nithinraok :: PR: #3100
    • Typo correction in README.rst by @satpalsr :: PR: #3103
    • Self-supervised pre-training for speech models by @sam1373 :: PR: #3139
    • Add AISHELL 2 processing script by @titu1994 :: PR: #3195
    • Add support for multi-speaker FastPitch export by @ryanleary :: PR: #3192
    • Reduce number of log files for large runs by @blisc :: PR: #3191
    • Add support to modify nemo cache directory by @titu1994 :: PR: #3208
    • Add Pitch, Duration Tensors for Riva by @blisc :: PR: #3207
    • Upgrade to NVIDIA PyTorch 21.11 Container by @ericharper :: PR: #3234
    • Add WMT21 paper to Publications by @MaximumEntropy :: PR: #3256
    • Support for gecko tool by @nithinraok :: PR: #3266
    • Adding adaptive bucketing for tarred datasets. by @VahidooX :: PR: #3222
    • Initial refactor by @borisfom :: PR: #3272
    • Refactored prepare_for_export calls to ensure input size of example i… by @borisfom :: PR: #3305
    • Replacing outdated exports scripts by @borisfom :: PR: #3311
    • Batch implementation by @dimapihtar :: PR: #3276
    • Multiscale processing feature for speaker diarization by @tango4j :: PR: #3296
    • Add titanet by @nithinraok :: PR: #3333
    • update sparrowhawk export grammars to be able to skip pynini by @yzhang123 :: PR: #3346
    • Prompt tuning by @vadam5 :: PR: #3309
    • Remove wordninja by @ekmb :: PR: #3363
    • Repair arbitrary file or folder deletion vulnerability by @haby0 :: PR: #3362
    • Moved shebangs to the first line by @davidalami :: PR: #3361
    • Added new method for logprobs computation by @dimapihtar :: PR: #3329
    • Update speaker collate functions by @nithinraok :: PR: #3381
    • Cache_hf by @ekmb :: PR: #3406
    • Update to NVIDIA PyTorch 21.12 Container by @ericharper :: PR: #3424
    • Working around Pytorch exporter issue with expand() by @borisfom :: PR: #3422
    • Remove apex by @ekmb :: PR: #3428
    • Vad infer refactor by @fayejf :: PR: #3394
    • Update LJSpeech preprocessing by @Oktai15 :: PR: #3423
    • Preprocess an entire folder of .json or .json.gz files into a single .bin and .idx file. by @MaximumEntropy :: PR: #3425
    • TimingCallback default buffer_size=1 by @michalivne :: PR: #3439
    • Extending input_example() to take max batch and dimension arguments by @borisfom :: PR: #3429
    • Refactor data preprocessing script by @yzhang123 :: PR: #3444
    • Test only if the model was trained on single GPU for accurate results. by @titu1994 :: PR: #3470
    • Upper bound ptl for r1.6.0, lower bound numpy in general by @ericharper :: PR: #3466
    • Add Apex import guard by @ericharper :: PR: #3467
    • Adding missing init files by @yzhang123 :: PR: #3505
    • Typos by @ekmb :: PR: #3504
    • Update titanet conf by @nithinraok :: PR: #3507
    • Raise PTL upper bound on r1.6.0 by @ericharper :: PR: #3510
    • Enforce utf-8 on all file r/w by @titu1994 :: PR: #3520
    • Pushing updated WFST Tutorial to r1.6.0 by @tbartley94 :: PR: #3521
    • WFST tutorial update by @tbartley94 :: PR: #3531
    • Update nvidia container check by @ericharper :: PR: #3535
    • Remove extra instance during restore by @ericharper :: PR: #3551
    • Remove wordtokenizer example from NLP tokenizer notebook by @aklife97 :: PR: #3477
    Source code(tar.gz)
    Source code(zip)
  • v1.5.1(Dec 4, 2021)

    Features

    • Minor updates to expose speaker id, pitch, and duration on export of FastPitch #3192, #3207
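
    This export path goes through NeMo's Exportable interface. A minimal, hedged sketch ("tts_en_fastpitch" is an assumed NGC model name; check the exact name for your version):

    import nemo.collections.tts as nemo_tts

    # Load a pretrained FastPitch model from NGC and export it to ONNX.
    spec_model = nemo_tts.models.FastPitchModel.from_pretrained(model_name="tts_en_fastpitch")
    spec_model.export("fastpitch.onnx")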

    Known Issues

    • Training of speaker models converges very slowly due to a bug (fixed in main: #3354)
    • ASR training does not reach adequate WER due to a bug in Numba Spec Augment (fixed in main: #3299). For details, refer to https://github.com/NVIDIA/NeMo/issues/3288#issuecomment-1000766337 . As a temporary workaround, disable Numba Spec Augment by setting the flag at https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/modules/audio_preprocessing.py#L471 to False in the SpecAugment section of your YAML config, as sketched below. The fix will be part of 1.6.0.
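
    A hedged sketch of applying the workaround, assuming the flag in audio_preprocessing.py is named use_numba_spec_augment (verify against your NeMo version):

    from omegaconf import OmegaConf

    # Load an ASR training config (e.g. citrinet_512.yaml) and fall back to
    # the pure-PyTorch SpecAugment implementation.
    cfg = OmegaConf.load("citrinet_512.yaml")
    cfg.model.spec_augment.use_numba_spec_augment = False
    OmegaConf.save(cfg, "citrinet_512_patched.yaml")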
    Source code(tar.gz)
    Source code(zip)
  • v1.5.0(Nov 20, 2021)

    Features

    • Megatron GPT pre-training with tensor model parallelism #2975
    • NMT encoder and decoder with different hidden size #2856
    • Logging timing of train/val/test steps #2936
    • Logging NMT encoder and decoder timing #2956
    • Logging timing per sentence length and tokenized text statistics #3004
    • Upgrade to PyTorch Lightning 1.5.0, bfloat support #2975
    • French Inverse Text Normalization #2921
    • Bucketing of tarred datasets for ASR models #2999
    • ASR with diarization #3007
    • Adding parallel transcribe for ASR models - supports multi-gpu/multi-node #3017

    Documentation Updates

    • RNNT

    Contributors

    @ericharper @michalivne @MaximumEntropy @VahidooX @titu1994 @blisc @okuchaiev @tango4j @erastorgueva-nv @fayejf @vadam5 @ekmb @yaoyu-33 @nithinraok @erhoo82 @tbartley94 @PeganovAnton @madhukarkm @yzhang123 (Please let us know if you have contributed to this release and we have missed you here.)

    Source code(tar.gz)
    Source code(zip)
  • v1.4.0(Oct 2, 2021)

    Features

    • Improved speaker clustering #2729
    • Upgrade to NVIDIA PyTorch 21.08 container #2799
    • RNNT mAES beam search support #2802
    • Transfer learning for new speakers #2684
    • Simplify speaker scripts #2777
    • Perceiver-encoder architecture #2737
    • Relative paths in tarred datasets #2776
    • Torch only TTS package #2643
    • Inverse text normalization for Spanish #2489

    Tutorial Notebooks

    • Duration and pitch control for TTS #2700

    Bug fixes

    • Fixed max delta generation #2727
    • Waveglow export #2671, #2699

    Contributors

    @tango4j @titu1994 @paarthneekhara @nithinraok @michalivne @erastorgueva-nv @borisfom @blisc (some contributors may not be listed explicitly)

    Source code(tar.gz)
    Source code(zip)
  • v1.3.0(Aug 27, 2021)

    Added

    • RNNT Exportable to ONNX #2510
    • Multi-batch inference support for speaker diarization #2522
    • DALI Integration for char/subword ASR #2567
    • VAD Postprocessing #2636
    • Perceiver encoder for NMT #2621
    • gRPC NMT server #2656
    • German ITN #2486
    • Russian TN and ITN #2519
    • Save/restore connector #2592
    • PTL 1.4+ #2600

    Tutorial Notebooks

    • Non-English downstream NLP task #2532
    • RNNT Basics #2651

    Bug Fixes

    • NMESC clustering for very small audio files #2566

    Contributors

    @pasandi20 @ekmb @nithinraok @titu1994 @ryanleary @yzhang123 @ericharper @michalivne @MaximumEntropy @fayejf (some contributors may not be listed explicitly)

    Source code(tar.gz)
    Source code(zip)
  • v1.2.0(Jul 30, 2021)

    Added

    • Improve performance of speaker clustering (#2445)
    • Update Conformer for ONNX conversion (#2439)
    • Mean and length normalization for better embeddings in speaker verification and diarization (#2397)
    • FastEmit RNNT Loss Numba for reducing latency (#2374)
    • Multiple datasets, right to left models, noisy channel re-ranking, ensembling for NMT (#2379)
    • Byte level tokenization (#2365)
    • Bottleneck with attention bridge for more efficient NMT training (#2390)
    • Tutorial notebook for NMT data cleaning and preprocessing (#2467)
    • Streaming Conformer inference script for long audio files (#2373)
    • Res2Net Ecapa equivalent implementation for speaker verification and diarization (#2468)
    • Update end-to-end tutorial notebook to use CitriNet (#2457)

    Contributors

    @nithinraok @tango4j @jbalam-nv @titu1994 @MaximumEntropy @mchrzanowski @michalivne @fayejf @okuchaiev

    (some contributors may not be listed explicitly)

    Known Issues

    • import nemo.collections.nlp as nemo_nlp will result in an error. This will be patched in the upcoming version. As a workaround, please import the individual modules you need directly.
    Source code(tar.gz)
    Source code(zip)
  • v1.1.0(Jul 2, 2021)

    NeMo 1.1.0 is the first release in our new monthly release cadence. Monthly releases will focus on adding new features that enable new NeMo Models or improve existing ones.

    Added

    • Pretrained Megatron-LM encoders (including model parallel) for NMT (#2238)
    • RNNT Numba loss (#1995)
    • Enable multiple models to be restored (#2245)
    • Audio based text normalization (#2285)
    • Multilingual NMT (#2160)
    • FastPitch export (#2355)
    • ASR fine-tuning tutorial for other languages (#2346)

    Bugfixes

    • HiFiGan Export (#2279)
    • OmegaConf forward compatibility (#2319)

    Documentation

    • ONNX export documentation (#2330)

    Contributors

    @borisfom @MaximumEntropy @ericharper @aklife97 @titu1994 @ekmb @yzhang123 @blisc

    (some contributors may not be listed explicitly)

    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Jun 11, 2021)

  • v1.0.1(Jun 9, 2021)

  • v1.0.0(Jun 3, 2021)

    Release 1.0.0

    NeMo 1.0.0 is a stable version of the "1.0.0 release candidate". It substantially improves overall quality and documentation, adds support for new tasks such as neural machine translation, and includes many new models pretrained in different languages. As a mature tool for ASR and TTS, it also adds new features for text normalization and denormalization, dataset creation based on CTC-segmentation, and speech data exploration. These updates make it easier for researchers in academia and industry to develop and train new conversational AI models.
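
    As a hedged illustration of the text normalization and denormalization features, assuming the nemo_text_processing package that ships with this release:

    from nemo_text_processing.text_normalization.normalize import Normalizer
    from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

    # Normalization: written form -> spoken form (e.g. for TTS front ends).
    normalizer = Normalizer(input_case="cased", lang="en")
    print(normalizer.normalize("The meeting is at 10:30 on 1/5/2021.", verbose=False))

    # Inverse normalization: spoken form -> written form (e.g. for ASR output).
    inverse_normalizer = InverseNormalizer(lang="en")
    print(inverse_normalizer.inverse_normalize("twenty three dollars", verbose=False))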

    To install this specific version from pip, run:

    apt-get update && apt-get install -y libsndfile1 ffmpeg
    pip install Cython
    pip install nemo_toolkit[all]==1.0.0
    
    Source code(tar.gz)
    Source code(zip)
  • v1.0.0rc1(Apr 7, 2021)

    Release 1.0.0rc1

    This release contains major new models, features and docs improvements. It is a "candidate" release for 1.0.0.

    To install from pip, run:

    apt-get update && apt-get install -y libsndfile1 ffmpeg
    pip install Cython
    pip install nemo_toolkit[all]==1.0.0rc1
    

    It adds the following model architectures:

    • CitriNet and Conformer-CTC for ASR
    • HiFiGan, MelGan, GlowTTS, UniGlow, SqueezeWave for TTS

    In the NLP collection, a neural machine translation (NMT) task has been added with Transformer-based models. This release includes pre-trained NMT models for these language pairs, in both directions (see the sketch after this list):

    • En<->Es
    • En<->Ru
    • En<->Zh
    • En<->De
    • En<->Fr
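
    A hedged sketch of using one of these pairs (the model name "nmt_en_de_transformer12x2" is an assumption based on NGC naming):

    import nemo.collections.nlp as nemo_nlp

    # Download a pretrained En->De NMT model from NGC and translate a sentence.
    nmt_model = nemo_nlp.models.MTEncDecModel.from_pretrained(model_name="nmt_en_de_transformer12x2")
    print(nmt_model.translate(["NeMo is a toolkit for conversational AI."]))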

    For the ASR task, we also added QuartzNet models trained on the following languages from Mozilla's Common Voice dataset: Zh, Ru, Es, Pl, Ca, It, Fr and De. In total, this release adds 60 new pre-trained models.
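
    A hedged sketch of loading one of these checkpoints ("stt_es_quartznet15x5" is an assumed NGC model name):

    import nemo.collections.asr as nemo_asr

    # Download a pretrained Spanish QuartzNet model and transcribe a local file.
    asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="stt_es_quartznet15x5")
    print(asr_model.transcribe(paths2audio_files=["sample_es.wav"]))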

    This release also adds new NeMo tools for:

    • Text normalization
    • Dataset Creation Tool Based on CTC-Segmentation
    • Speech Data Explorer

    Known Issues

    This version is not compatible with PyTorch 1.8.*; please use PyTorch 1.7.* or our container.

    Source code(tar.gz)
    Source code(zip)
    1-100_roman_numeral_table_spanish.csv(8.87 KB)
    Screen.Shot.2021-04-08.at.2.23.25.PM.png(86.93 KB)
    test_data.tar.gz(9.96 MB)
    test_data.tar.gz-stable.gz(7.00 MB)
  • v1.0.0b4(Feb 16, 2021)

    Release 1.0.0b4

    This release is compatible with Jarvis and TLT public beta. It also updates versions of many dependencies and contains minor bug fixes over 1.0.0b3.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0b3(Dec 11, 2020)

    Release 1.0.0b3

    This release contains minor bug fixes over 1.0.0b2. It sets compatible version ranges for Hugging Face Transformers and Pytorch Lightning packages.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0b2(Nov 17, 2020)

    Release 1.0.0b2

    This release contains stability improvements and bug fixes. It also adds beam search support for CTC-based ASR models.

    Highlights

    • Added beam search and external LM rescoring support for character-based CTC ASR models.
    • Switch to Pytorch Lightning version 1.0.5 or above.
    • Switch to Hydra version 1.0.3 or above.
    • Increase NVIDIA Pytorch container version to 20.09

    Known Issues

    This version will not work with Hugging Face transformers library version >=4.0.0. Please make sure your installed transformers version satisfies >=3.1.0,<4.0.0.

    The toolkit is early-version software.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0b1(Oct 5, 2020)

    Release 1.0.0b1

    This release is a major re-design compared to the previous version. All NeMo models and modules are now compatible out of the box with PyTorch and PyTorch Lightning. Every NeMo model is a LightningModule that comes equipped with all supporting infrastructure for training and reproducibility. Every NeMo model has an example configuration file and a corresponding script that contains all configurations needed for training (see the sketch below). NeMo, PyTorch Lightning, and Hydra give all NeMo models the same look and feel, so it is easy to do conversational AI research across multiple domains. New models, such as speaker verification and Megatron, have been added.
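
    A minimal sketch of this pattern, assuming one of the example configs shipped in the repository (the config path shown is illustrative):

    import pytorch_lightning as pl
    from omegaconf import OmegaConf
    import nemo.collections.asr as nemo_asr

    # Every example script pairs a model with a config; trainer settings and
    # model settings both live in the same YAML file.
    cfg = OmegaConf.load("examples/asr/conf/config.yaml")
    trainer = pl.Trainer(**cfg.trainer)
    model = nemo_asr.models.EncDecCTCModel(cfg=cfg.model, trainer=trainer)
    trainer.fit(model)  # works because every NeMo model is a LightningModule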

    Highlights

    • Pytorch Lightning based Core
    • Hydra and Omegaconf configuration management
    • All of a model's files are tarred together into a single .nemo file, making it easy for users to download models automatically from NGC (see the sketch after this list)
    • NGC now hosts a collection of all NeMo assets in one place
    • New Models & tutorials
      • ASR: SpeakerNet speaker verification model
      • NLP: BioMegatron, a state-of-the-art model trained on biomedical tasks
    • ASR, NLP and TTS tutorials as interactive notebooks
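
    A hedged sketch of the .nemo packaging and NGC download flow from the highlights above ("QuartzNet15x5Base-En" is a pretrained model name on NGC):

    import nemo.collections.asr as nemo_asr

    # from_pretrained downloads and unpacks a .nemo file from NGC.
    model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
    model.save_to("quartznet.nemo")  # tars config + weights into a single file
    restored = nemo_asr.models.EncDecCTCModel.restore_from("quartznet.nemo")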

    Known Issues

    The toolkit is early-version software, with breaking changes compared to the previous version.

    Resolved Issues

    All models and modules can be used anywhere torch.nn.Module is expected.
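
    For example (a sketch; the pretrained model name is taken from NGC):

    import torch
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
    assert isinstance(model, torch.nn.Module)
    # Standard PyTorch utilities therefore work directly:
    n_params = sum(p.numel() for p in model.parameters())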

    Source code(tar.gz)
    Source code(zip)
  • v0.11.0(Jul 10, 2020)
