End-to-end speech recognition toolkit
This is an E2E ASR toolkit modified from ESPnet1 (version 0.9.9).
This is the official implementation of the paper:
Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI
It is also the official implementation of the paper:
Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model
We achieve state-of-the-art results on two of the most popular Mandarin datasets, Aishell-1 and Aishell-2.
Please feel free to change / modify the code as you like. :)
Update
- 2021/12/29: Released the first version, which contains all MMI-related features, including the MMI training criterion, the MMI Prefix Score (for attention-based encoder-decoder, AED) and the MMI Alignment Score (for neural transducer, NT).
- 2022/1/6: Released the word-level N-gram LM scorer.
Environment:
The main dependencies of this code can be divided into three parts: `kaldi`, `espnet` and `k2`.
1. `kaldi` is mainly used for feature extraction. To install kaldi, please follow the instructions here.
2. `espnet` is an open-source end-to-end speech recognition toolkit. Please follow the instructions here to install its environment (a consolidated sketch of steps 2.2–3.1 is given after this list).
   2.1. PyTorch, cudatoolkit and many other dependencies will be installed automatically during this process.
   2.2. If you are going to use NT models, you are recommended to install the warp-transducer wrapper. Please run `${ESPNET_ROOT}/tools/installer/install_warp-transducer.sh`.
   2.3. Once you have installed the espnet environment successfully, please run `pip uninstall espnet` to remove the `espnet` library, so that our code is used instead.
   2.4. Also link `kaldi` into `${ESPNET_ROOT}`: `ln -s ${KALDI_ROOT} ${ESPNET_ROOT}`
3. `k2` is a Python-based FST library. Please follow the instructions here to install it. The GPU version is required.
   3.1. To use the word-level N-gram LM, please also install `kaldilm`.
4. There might be some dependency conflicts when building the environment. We report ours below as a reference:
   4.1. OS: CentOS 7; GCC 7.3.1; Python 3.8.10; CUDA 10.1; PyTorch 1.7.1; k2-fsa 1.2 (very old by now)
   4.2. Other Python libraries are listed in `requirement.txt` (it is not recommended to build the environment directly from this file).
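For convenience, here is a minimal sketch of steps 2.2–3.1. The k2 source-build commands and the `kaldilm` package name follow the k2-fsa project's usual instructions, but verify them against the current docs for your CUDA version:

```bash
# Sketch only; adapt paths and versions to your own setup.
cd ${ESPNET_ROOT}/tools/installer
bash install_warp-transducer.sh     # optional: RNN-T loss wrapper, needed for NT models

pip uninstall espnet                # remove the pip-installed espnet so this repo's code is used
ln -s ${KALDI_ROOT} ${ESPNET_ROOT}  # expose kaldi inside the espnet root

# Build k2 from source so it matches your local CUDA/PyTorch (GPU version required)
git clone https://github.com/k2-fsa/k2.git
cd k2 && python3 setup.py install

pip install kaldilm                 # needed for the word-level N-gram LM
```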
Results
Currently we have released examples on the Aishell-1 and Aishell-2 datasets.
With the MMI training & decoding methods and the word-level N-gram LM, we achieve the results below on Aishell-1 and Aishell-2. All numbers are CER%.
Model | Aishell-1-dev | Aishell-1-test | Aishell-2-ios | Aishell-2-android | Aishell-2-mic |
---|---|---|---|---|---|
AED | 4.73 | 5.32 | 5.73 | 6.56 | 6.53 |
AED + MMI + Word Ngram | 4.08 | 4.45 | 5.26 | 6.22 | 5.92 |
NT | 4.41 | 4.81 | 5.70 | 6.75 | 6.58 |
NT + MMI + Word Ngram | 3.86 | 4.18 | 5.06 | 6.08 | 5.98 |
(The example on Librispeech is not fully prepared yet.)
Get Started
Take Aishell-1 as an example; the working process for other examples is very similar.
Prepare data and LMs
```bash
cd ${ESPNET_ROOT}/egs/aishell1
source path.sh
bash prepare.sh # prepare the data and LMs
```
Split the json file of the training data into one piece per GPU (we use 8 GPUs):
```bash
python3 espnet_utils/splitjson.py -p 8 dump/train_sp/deltafalse/data.json # -p: number of pieces, one per GPU
```
Training and decoding for the NT model:
```bash
bash nt.sh # train the neural transducer model
```
Training and decoding for the AED model:
```bash
bash aed.sh # train the attention-based encoder-decoder model
```
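Both scripts are stage-based, and decoding is stage 2 (as used in the hints below). Assuming that convention, you can skip training and only decode an already-trained model:

```bash
bash aed.sh --stage 2 # decode only, with the default scorers
```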
Several Hints:
- Please change the paths in `path.sh` accordingly before you start.
- Please change the `data` variable in `prepare.sh` to configure your data path.
- Our code runs in DDP style. Before you start, you need to set the following variables manually. We assume the PyTorch distributed API works well on your machine.
```bash
export HOST_GPU_NUM=x # number of GPUs on each host
export HOST_NUM=x # number of hosts
export NODE_NUM=x # number of GPUs in total (on all hosts)
export INDEX=x # index of this host
export CHIEF_IP=xx.xx.xx.xx # IP of the master host
```
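For instance, a single-host run with 8 GPUs (values here are illustrative) would be configured as:

```bash
export HOST_GPU_NUM=8     # 8 GPUs on this host
export HOST_NUM=1         # a single host
export NODE_NUM=8         # 8 GPUs in total
export INDEX=0            # this host is the first (and only) one
export CHIEF_IP=127.0.0.1 # the master host is the local machine
```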
- Multiple choices are available during decoding (we take `aed.sh` as an example, but the usage of `nt.sh` is the same).
To use the MMI-related scorers, you need to train the model with the MMI auxiliary criterion first.
To use the MMI Prefix Score (in AED) or the MMI Alignment Score (in NT):
```bash
bash aed.sh --stage 2 --mmi-weight 0.2
```
To use any external LM, you need to train it in advance (as implemented in `prepare.sh`).
To use the word-level N-gram LM:
```bash
bash aed.sh --stage 2 --word-ngram-weight 0.4
```
To use the character-level N-gram LM:
```bash
bash aed.sh --stage 2 --ngram-weight 1.0
```
To use a neural network LM:
```bash
bash aed.sh --stage 2 --lm-weight 1.0
```
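These scorers are not mutually exclusive; the "+ MMI + Word Ngram" rows in the results table combine them. Assuming the weights can be passed together in one run (not verified here; tune the values on your dev set), a combined decode would look like:

```bash
bash aed.sh --stage 2 --mmi-weight 0.2 --word-ngram-weight 0.4
```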
Reference
kaldi: https://github.com/kaldi-asr/kaldi
ESPnet: https://github.com/espnet/espnet
k2-fsa: https://github.com/k2-fsa/k2
Citations
```
@article{tian2021consistent,
  title={Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI},
  author={Tian, Jinchuan and Yu, Jianwei and Weng, Chao and Zhang, Shi-Xiong and Su, Dan and Yu, Dong and Zou, Yuexian},
  journal={arXiv preprint arXiv:2112.02498},
  year={2021}
}

@misc{tian2022improving,
  title={Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model},
  author={Jinchuan Tian and Jianwei Yu and Chao Weng and Yuexian Zou and Dong Yu},
  year={2022},
  eprint={2201.01995},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
Authorship
Jinchuan Tian; [email protected] or [email protected]
Jianwei Yu; [email protected] (supervisor)
Chao Weng; [email protected]
Yuexian Zou; [email protected]