Production First and Production Ready End-to-End Speech Recognition Toolkit

Overview

WeNet

中文版 (Chinese version)


Discussions | Docs | Papers | Runtime (x86) | Runtime (android) | Pretrained Models

We share neural Net together.

The main motivation of WeNet is to close the gap between research and production end-to-end (E2E) speech recognition models, to reduce the effort of productionizing E2E models, and to explore better E2E models for production.

Highlights

  • Production first and production ready: The Python code of WeNet meets the requirements of TorchScript, so a model trained with WeNet can be exported directly by Torch JIT and run with LibTorch for inference. There is no gap between the research model and the production model: neither model conversion nor additional code is required for inference (a minimal export sketch follows this list).
  • Unified solution for streaming and non-streaming ASR: WeNet implements the Unified Two-Pass (U2) framework to achieve an accurate, fast, and unified E2E model, which is favorable for industry adoption.
  • Portable runtime: Several demos will be provided to show how to host WeNet-trained models on different platforms, including x86 servers and on-device Android.
  • Lightweight: WeNet is designed specifically for E2E speech recognition, with clean and simple code. It is based entirely on PyTorch and its ecosystem, and it has no dependency on Kaldi, which simplifies installation and usage.
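
A minimal sketch of the export path described above, assuming a trained WeNet model instance is available as model (the variable name is illustrative, not the exact training-script code):

# Hedged sketch: export a trained WeNet model to TorchScript for LibTorch inference.
import torch

script_model = torch.jit.script(model)   # WeNet modules are written to be TorchScript-scriptable
script_model.save("final.zip")           # the saved archive is what the C++ LibTorch runtime loads
reloaded = torch.jit.load("final.zip")   # optional: reload in Python to sanity-check the export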

Performance Benchmark

Please see examples/$dataset/s0/README.md for benchmarks on the different speech datasets.

Installation

  • Clone the repo
git clone https://github.com/wenet-e2e/wenet.git
cd wenet
conda create -n wenet python=3.8
conda activate wenet
pip install -r requirements.txt
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c conda-forge
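
A quick sanity check that the PyTorch installation is usable (plain PyTorch calls, independent of WeNet):

import torch
print(torch.__version__)          # should report the version installed above
print(torch.cuda.is_available())  # True if the CUDA driver and toolkit are set up correctly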
  • Optionally, if you want to use the x86 runtime or a language model (LM), you have to build the runtime as follows; otherwise, you can skip this step.
# runtime build requires cmake 3.14 or above
cd runtime/server/x86
mkdir build && cd build && cmake .. && cmake --build .

Discussion & Communication

Please visit Discussions for further discussion.

For Chinese users, you can also scan the QR code on the left to follow our official WeNet account. We created a WeChat group for better discussion and quicker responses. Please scan the personal QR code on the right, and the person behind it will invite you to the chat group.

If you can not access the QR image, please access it on gitee.

Or you can directly discuss on Github Issues.

Contributors

Acknowledgments

  1. We borrowed a lot of code from ESPnet for transformer-based modeling.
  2. We borrowed a lot of code from Kaldi for WFST-based decoding for LM integration.
  3. We referred to EESEN for building the TLG-based graph for LM integration.
  4. We referred to OpenTransformer for Python batch inference of E2E models.

Citations

@inproceedings{yao2021wenet,
  title={WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit},
  author={Yao, Zhuoyuan and Wu, Di and Wang, Xiong and Zhang, Binbin and Yu, Fan and Yang, Chao and Peng, Zhendong and Chen, Xiaoyu and Xie, Lei and Lei, Xin},
  booktitle={Proc. Interspeech},
  year={2021},
  address={Brno, Czech Republic},
  organization={IEEE}
}

@article{zhang2020unified,
  title={Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition},
  author={Zhang, Binbin and Wu, Di and Yao, Zhuoyuan and Wang, Xiong and Yu, Fan and Yang, Chao and Guo, Liyong and Hu, Yaguang and Xie, Lei and Lei, Xin},
  journal={arXiv preprint arXiv:2012.05481},
  year={2020}
}

@article{wu2021u2++,
  title={U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition},
  author={Wu, Di and Zhang, Binbin and Yang, Chao and Peng, Zhendong and Xia, Wenjing and Chen, Xiaoyu and Lei, Xin},
  journal={arXiv preprint arXiv:2106.05642},
  year={2021}
}
Comments
  • BUG for ONNX inference

    BUG for ONNX inference

    When I run inference with u2++_conformer, after executing just 50 wav files, a bug is thrown as below:

    I0515 00:10:40.909694 146005 decoder_main.cc:67] num frames 1118
    I0515 00:10:41.026697 146005 decoder_main.cc:86] Partial result: 在机关
    I0515 00:10:41.056061 146005 decoder_main.cc:86] Partial result: 在机关服务
    I0515 00:10:41.085124 146005 decoder_main.cc:86] Partial result: 在机关围剿
    I0515 00:10:41.110785 146005 decoder_main.cc:86] Partial result: 在机关围剿和
    I0515 00:10:41.136417 146005 decoder_main.cc:86] Partial result: 在机关围剿和
    I0515 00:10:41.176227 146005 decoder_main.cc:86] Partial result: 在机关围剿和工程
    I0515 00:10:41.217715 146005 decoder_main.cc:86] Partial result: 在机关围剿和工程多处的
    I0515 00:10:41.251241 146005 decoder_main.cc:86] Partial result: 在机关围剿和工程多处的战斗中
    I0515 00:10:41.282459 146005 decoder_main.cc:86] Partial result: 在机关围剿和工程多处的战斗中太勇敢
    I0515 00:10:41.311969 146005 decoder_main.cc:86] Partial result: 在机关围剿和工程多处的战斗中太勇敢坚定
    I0515 00:10:41.341024 146005 decoder_main.cc:86] Partial result: 在机关围剿和工程多处的战斗中太勇敢坚定是
    I0515 00:10:41.398414 146005 decoder_main.cc:86] Partial result: 在机关围剿和工程多处的战斗中太勇敢坚定是一军的
    I0515 00:10:41.429834 146005 decoder_main.cc:86] Partial result: 在机关围剿和工程多处的战斗中太勇敢坚定是一军的
    I0515 00:10:41.462321 146005 decoder_main.cc:86] Partial result: 在机关围剿和工程多处的战斗中太勇敢坚定是一军的主要将领
    Segmentation fault (core dumped)

    This file should be processed completely; I will dig deeper to locate the bug.

    onnxruntime: 1.10.0 and 1.11.1

    opened by Fred-cell 26
  • LibTorch gpu cmake error

    LibTorch gpu cmake error

    Hello, when I execute "mkdir build && cd build && cmake -DGRPC=ON ..", the following error is reported. Native environment: CentOS 7.9, nvidia: 11.3, cuda version: 11


    (wenet_gpu) [ZYJ@localhost build]$ cmake -DGPU=ON ..
    -- The C compiler identification is GNU 4.8.5
    -- The CXX compiler identification is GNU 4.8.5
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Populating libtorch
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /home/ZYJ/WeNet/wenet_gpu/wenet/runtime/LibTorch/fc_base/libtorch-subbuild
    [ 11%] Performing download step (download, verify and extract) for 'libtorch-populate'
    -- verifying file...
         file='/home/ZYJ/WeNet/wenet_gpu/wenet/runtime/LibTorch/fc_base/libtorch-subbuild/libtorch-populate-prefix/src/libtorch-shared-with-deps-1.10.0%2Bcu113.zip'
    -- File already exists and hash match (skip download):
         file='/home/ZYJ/WeNet/wenet_gpu/wenet/runtime/LibTorch/fc_base/libtorch-subbuild/libtorch-populate-prefix/src/libtorch-shared-with-deps-1.10.0%2Bcu113.zip'
         SHA256='0996a6a4ea8bbc1137b4fb0476eeca25b5efd8ed38955218dec1b73929090053'
    -- extracting...
         src='/home/ZYJ/WeNet/wenet_gpu/wenet/runtime/LibTorch/fc_base/libtorch-subbuild/libtorch-populate-prefix/src/libtorch-shared-with-deps-1.10.0%2Bcu113.zip'
         dst='/home/ZYJ/WeNet/wenet_gpu/wenet/runtime/LibTorch/fc_base/libtorch-src'
    -- extracting... [tar xfz]
    -- extracting... [analysis]
    -- extracting... [rename]
    -- extracting... [clean up]
    -- extracting... done
    [ 22%] No patch step for 'libtorch-populate'
    [ 33%] No update step for 'libtorch-populate'
    [ 44%] No configure step for 'libtorch-populate'
    [ 55%] No build step for 'libtorch-populate'
    [ 66%] No install step for 'libtorch-populate'
    [ 77%] No test step for 'libtorch-populate'
    [ 88%] Completed 'libtorch-populate'
    [100%] Built target libtorch-populate
    -- Looking for pthread.h
    -- Looking for pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    -- Found CUDA: /usr/local/cuda-11.3 (found version "11.3")
    -- Caffe2: CUDA detected: 11.3
    -- Caffe2: CUDA nvcc is: /usr/local/cuda-11.3/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /usr/local/cuda-11.3
    CMake Error at fc_base/libtorch-src/share/cmake/Caffe2/public/cuda.cmake:75 (message):
      Caffe2: Couldn't determine version from header:
    Change Dir: /home/ZYJ/WeNet/wenet_gpu/wenet/runtime/LibTorch/build/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/gmake cmTC_3d968/fast

    /usr/bin/gmake -f CMakeFiles/cmTC_3d968.dir/build.make CMakeFiles/cmTC_3d968.dir/build

    gmake[1]: Entering directory '/home/ZYJ/WeNet/wenet_gpu/wenet/runtime/LibTorch/build/CMakeFiles/CMakeTmp'

    Building CXX object CMakeFiles/cmTC_3d968.dir/detect_cuda_version.cc.o

    /usr/bin/c++ -I/usr/local/cuda-11.3/include -std=c++14 -pthread -fPIC -o CMakeFiles/cmTC_3d968.dir/detect_cuda_version.cc.o -c /home/ZYJ/WeNet/wenet_gpu/wenet/runtime/LibTorch/build/detect_cuda_version.cc

    c++: error: unrecognized command line option ‘-std=c++14’

    gmake[1]: *** [CMakeFiles/cmTC_3d968.dir/detect_cuda_version.cc.o] Error 1

    gmake[1]: Leaving directory '/home/ZYJ/WeNet/wenet_gpu/wenet/runtime/LibTorch/build/CMakeFiles/CMakeTmp'

    gmake: *** [cmTC_3d968/fast] Error 2

    Call Stack (most recent call first): fc_base/libtorch-src/share/cmake/Caffe2/Caffe2Config.cmake:88 (include) fc_base/libtorch-src/share/cmake/Torch/TorchConfig.cmake:68 (find_package) cmake/libtorch.cmake:52 (find_package) CMakeLists.txt:35 (include)

    -- Configuring incomplete, errors occurred! See also "/home/ZYJ/WeNet/wenet_gpu/wenet/runtime/LibTorch/build/CMakeFiles/CMakeOutput.log". See also "/home/ZYJ/WeNet/wenet_gpu/wenet/runtime/LibTorch/build/CMakeFiles/CMakeError.log".


    What should I do, please?

    opened by zhaoyinjiang9825 16
  • Streaming performance issues on upgrading to release v2.0.0

    Streaming performance issues on upgrading to release v2.0.0

    Describe the bug: On updating to release v2.0.0, I've been noticing some performance issues when running real-time audio streams against a quantized e2e model (no LM) via runtime/server/x86/bin/websocket_server_main. For some stretches of time, performance may be comparable between v1 and v2, but there are points where I can expect to see upwards of 20s delay on a given response. Outside of a few minor updates related to the switch, nothing else (e.g. resource allocations) has been changed on my end.

    Thus far, I haven't been able to pinpoint much of a pattern to the lag, except that it seems to consistently happen (in addition to other times) at the start of the stream. Have you observed any similar performance issues between v1 and v2, or is there some v2-specific runtime configuration I may have missed?

    Expected behavior: Comparable real-time performance between releases v1 and v2.

    Screenshots: The following graphs show the results from a single test. The x-axes represent the progression of the audio file being tested, and the y-axes represent round-trip response times from wenet minus some threshold, i.e. any data points above 0 indicate additional round-trip latency above an acceptable threshold (in my case, 500ms). As you can see, in the v1 graph responses are largely generated and returned below the threshold time (with the exception of a few final-marked transcripts). However, in the v2 graph, there are several lengthy periods during which responses take an unusually long time to return (I've capped the graph at 2s for clearer viewing, but in reality responses are taking up to 20s to return).

    Wenet v1 Snag_4f1b28

    Wenet v2 Snag_508b72

    Additional context: Both tests were run with wenet hosted via AWS ECS/EC2. So far as I've seen, increasing CPU + memory allocations to the wenet container doesn't seem to resolve the issue.

    opened by kangnari 16
  • onnx runtime error 2: not enough space: expected 318080, got 314240

    onnx runtime error 2: not enough space: expected 318080, got 314240

    Describe the bug: This may be a Triton server problem. It occurs after deploying the GPU production service (Triton server) provided in the code. When testing the encoder module directly, I send fbank features straight to the server. If three threads issue concurrent requests, each with a random number of steps (i.e., the fbank time dimensions have different lengths), transcription is fairly slow but no error is raised. My guess is that because the steps of each request differ, they cannot be grouped into a batch, and the server-side dynamic_batching spends a long time waiting to form one. So I set max_queue_delay_microseconds to 70000, i.e., stop waiting for a batch after 70 ms and predict immediately. With this setting, the client occasionally raises the following exception:

    Traceback (most recent call last):
      File "debug_encoder.py", line 30, in input_numpy
        response = triton_client.infer("encoder",
      File "/opt/conda/lib/python3.8/site-packages/tritonclient/grpc/__init__.py", line 1156, in infer
        raise_error_grpc(rpc_error)
      File "/opt/conda/lib/python3.8/site-packages/tritonclient/grpc/__init__.py", line 62, in raise_error_grpc
        raise get_error_grpc(rpc_error) from None
    tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] onnx runtime error 2: not enough space: expected 318080, got 314240

    The steps of the three requested fbank features were 482, 497, and 485, dims is 80, and batch_size is 1; 318080 is exactly 497 x 80 x 8, i.e., the model hit this out-of-space error while predicting the 497-step request. In repeated concurrent requests the error is intermittent, and after it appears, subsequent requests may still succeed. If requests are sent one by one instead of concurrently, there is no error; if the concurrent requests all have a fixed size, there is also no error. The error only appears when concurrent requests have varying lengths and max_queue_delay_microseconds is small.
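
    A hypothetical client-side workaround sketch, based only on the observation above that fixed-size concurrent requests do not fail (the issue does not show the client code beyond the traceback): pad every request's fbank features to a shared number of frames, and, if the deployed encoder takes a separate lengths input, still send the true length so the padded frames can be masked.

    import numpy as np

    def pad_fbank(feats: np.ndarray, target_len: int) -> np.ndarray:
        # feats: (steps, 80) fbank features; zero-pad along the time axis up to target_len frames
        pad = target_len - feats.shape[0]
        return np.pad(feats, ((0, pad), (0, 0))) if pad > 0 else feats

    # e.g. pad the 482/497/485-step requests to one common length before calling triton_client.infer(...)
    padded = pad_fbank(np.random.randn(482, 80).astype(np.float32), 512)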

    Desktop (please complete the following information):

    • triton server:21.11
    • Server: 16 GB RAM, 16 GB GPU memory, T4 GPU; it should not be a case of insufficient GPU or host memory
    opened by piekey1994 15
  • Runtime: words containing non-ASCII characters are concatenated without space

    Runtime: words containing non-ASCII characters are concatenated without space

    The runtime outputs decoded words containing non-ASCII characters as concatenated with neighbouring words: e.g. "aa ää xx yy" is transformed to "aaääxx yy".

    This is caused by the code block starting at https://github.com/wenet-e2e/wenet/blob/604231391c81efdf06454dbc99406bbc06cb030d/runtime/core/decoder/torch_asr_decoder.cc#L217

    I understand that this is done in order to output Chinese "words" correctly (i.e., without spaces). However, this should at least be configurable, as currently it breaks wenet runtime for most other languages (i.e. those that have words with non-ASCII characters and where words are separated by spaces in the orthography).
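
    A rough Python sketch of the configurable behavior being requested (illustrative only, not the runtime's actual C++ code): join decoded pieces with spaces except between CJK characters.

    def is_cjk(ch: str) -> bool:
        # rough check covering CJK Unified Ideographs only; illustrative, not exhaustive
        return '\u4e00' <= ch <= '\u9fff'

    def join_pieces(pieces):
        out = ""
        for piece in pieces:
            if out and not (is_cjk(out[-1]) and is_cjk(piece[0])):
                out += " "
            out += piece
        return out

    print(join_pieces(["aa", "ää", "xx", "yy"]))  # -> "aa ää xx yy"
    print(join_pieces(["在", "机关"]))             # -> "在机关"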

    opened by alumae 14
  • cmake compile server/x86 error

    cmake compile server/x86 error

    Describe the bug

    environment: centos7
    gcc version 7.5.0
    cmake version: 3.18.3
    CUDA version: 10.2
    gpu version:  Quadro RTX 8000
    
    
    install steps:
    $ conda create -n wenet python=3.8
    $ conda activate wenet
    $ pip install -r requirements.txt
    $ conda install pytorch==1.6.0 cudatoolkit=10.2 torchaudio -c pytorch
    
    $ cd wenet/runtime/server/x86/
    $ mkdir build && cd build && cmake .. && cmake --build .
    

    ERROR is as follows:

    [ 50%] Linking CXX executable ctc_prefix_beam_search_test
    /home4/md510/cmake-3.18.3/bin/cmake -E cmake_link_script CMakeFiles/ctc_prefix_beam_search_test.dir/link.txt --verbose=1
    /home3/md510/gcc-7.5.0/bin/g++  -std=c++14 -pthread -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -DC10_USE_GLOG -L/cm/shared/apps/cuda10.2/toolkit/10.2.89/lib64 CMakeFiles/ctc_prefix_beam_search_test.dir/decoder/ctc_prefix_beam_search_test.cc.o -o ctc_prefix_beam_search_test   -L/home3/md510/w2020/wenet_20210512/wenet/runtime/server/x86/build/openfst/lib  -Wl,-rpath,/home3/md510/w2020/wenet_20210512/wenet/runtime/server/x86/build/openfst/lib:/home3/md510/w2020/wenet_20210512/wenet/runtime/server/x86/fc_base/libtorch-src/lib lib/libgtest_main.a lib/libgmock.a libdecoder.a lib/libgtest.a ../fc_base/libtorch-src/lib/libtorch.so -Wl,--no-as-needed,/home3/md510/w2020/wenet_20210512/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so -Wl,--as-needed ../fc_base/libtorch-src/lib/libc10.so -lpthread -Wl,--no-as-needed,/home3/md510/w2020/wenet_20210512/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch.so -Wl,--as-needed ../fc_base/libtorch-src/lib/libc10.so kaldi/libkaldi-decoder.a kaldi/libkaldi-lat.a kaldi/libkaldi-util.a kaldi/libkaldi-base.a libutils.a -lfst 
    /home3/md510/w2020/wenet_20210512/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `lgammaf@GLIBC_2.23'
    /home3/md510/w2020/wenet_20210512/wenet/runtime/server/x86/fc_base/libtorch-src/lib/libtorch_cpu.so: undefined reference to `lgamma@GLIBC_2.23'
    collect2: error: ld returned 1 exit status
    gmake[2]: *** [ctc_prefix_beam_search_test] Error 1
    gmake[2]: Leaving directory `/home3/md510/w2020/wenet_20210512/wenet/runtime/server/x86/build'
    gmake[1]: *** [CMakeFiles/ctc_prefix_beam_search_test.dir/all] Error 2
    gmake[1]: Leaving directory `/home3/md510/w2020/wenet_20210512/wenet/runtime/server/x86/build'
    gmake: *** [all] Error 2
    
    

    Could you help me solve it?

    opened by shanguanma 14
  • DLL load failed while importing _wenet: The specified module could not be found.

    DLL load failed while importing _wenet: The specified module could not be found.

    I installed wenet with pip install wenet, and the installation reported success. I used the example program to do recognition. The program is as follows:

    import sys
    import wenet

    def get_text_from_wav(dir, wav):
        model_dir = dir
        wav_file = wav
        decoder = wenet.Decoder(model_dir)
        ans = decoder.decode_wav(wav_file)
        print(ans)

    if __name__ == '__main__':
        dir = "./models"
        wav = "./1.wav"
        get_text_from_wav(dir, wav)

    But running it fails with the following error:

    Traceback (most recent call last):
      File "D:\codes\speech2word\main.py", line 2, in <module>
        import wenet
      File "D:\codes\speech2word\venv\lib\site-packages\wenet\__init__.py", line 1, in <module>
        from .decoder import Decoder  # noqa
      File "D:\codes\speech2word\venv\lib\site-packages\wenet\decoder.py", line 17, in <module>
        import _wenet
    ImportError: DLL load failed while importing _wenet: The specified module could not be found.

    How can I solve this?

    opened by billqu01 13
  • [Draft] Cache control v2

    [Draft] Cache control v2

    This is not a merge-ready PR; I am just pushing my testing code for discussion and further evaluation (such as GPU perf, ONNX export, ...).

    Performance on CPU (Intel i7-10510U @ 1.80GHz): RTF improves from 0.1 to 0.07, about a 30% improvement. [image]

    Detailed descriptions (in Chinese): https://horizonrobotics.feishu.cn/sheets/shtcniLh77AgP6NJAXhd5UHXDwh

    Test code:

    bash rtf.sh --api 1 > log.txt.1
    bash rtf.sh --api 2 > log.txt.2
    grep "RTF:" log.txt.1
    grep "RTF:" log.txt.2
    

    u2++_conformer.zip: https://horizonrobotics.feishu.cn/file/boxcnO50Ea8m0rR2p9FwJ8ZHEIc words.txt: https://horizonrobotics.feishu.cn/file/boxcnBpSEOWoBSIgLdlHetsjOFd

    opened by xingchensong 13
  • Use DDP training to get stuck

    Use DDP training to get stuck

    Describe the bug

    Training gets stuck when using DDP with my own WeNet code and my own data. It hangs (GPU utilization 100%) at the beginning of the second epoch every time. After debugging, it was found to be stuck at this position:

    # wenet/utils/executor.py
    with torch.cuda.amp.autocast(scaler is not None):
        loss, loss_att, loss_ctc = model(
            feats, feats_lengths, target, target_lengths)
    

    Environment

    CentOS Linux release 7.8.2003 (Core)
    GPU Driver Version: 450.80.02
    CUDA Version: 10.2
    torch==1.8.0
    torchaudio==1.8.1
    torchvision==0.9.0

    Some Attempts

    I made some more attempts later and found:

    • 1 GPU: no problem; multi-GPU: stuck
    • static batch: no problem; dynamic batch: stuck
    • conformer: no problem; unified_conformer: stuck

    Other attempts:

    • Upgrading the PyTorch version to 1.9.0 or 1.10.0 is useless
    • Setting num_workers=0/1 is useless
    • V100 -> P40 is useless
    • Sleeping for 1 minute after completing an epoch is useless
    • NCCL gets completely stuck without an error log
    • GLOO error log:

    2021-12-07 11:36:17,011 INFO Epoch 0 CV info cv_loss 115.3632936241356
    2021-12-07 11:36:17,011 INFO Epoch 1 TRAIN info lr 6.08e-06
    2021-12-07 11:36:17,014 INFO using accumulate grad, new batch size is 8 times larger than before
    2021-12-07 11:36:17,335 INFO Epoch 0 CV info cv_loss 115.36239801458647
    2021-12-07 11:36:17,335 INFO Epoch 1 TRAIN info lr 6.200000000000001e-06
    2021-12-07 11:36:17,338 INFO using accumulate grad, new batch size is 8 times larger than before
    2021-12-07 11:36:17,579 INFO Epoch 0 CV info cv_loss 115.36309641650827
    2021-12-07 11:36:17,579 INFO Epoch 1 TRAIN info lr 5.96e-06
    2021-12-07 11:36:17,582 INFO using accumulate grad, new batch size is 8 times larger than before
    2021-12-07 11:36:17,926 INFO Epoch 0 CV info cv_loss 115.36275817930736
    2021-12-07 11:36:17,926 INFO Checkpoint: save to checkpoint exp/conformer/0.pt
    2021-12-07 11:36:18,889 INFO Epoch 1 TRAIN info lr 6.32e-06
    2021-12-07 11:36:18,892 INFO using accumulate grad, new batch size is 8 times larger than before
    terminate called after throwing an instance of 'gloo::EnforceNotMet'
      what():  [enforce fail at /opt/conda/conda-bld/pytorch_1614378062065/work/third_party/gloo/gloo/transport/tcp/pair.cc:490] op.preamble.length <= op.nbytes. 939336 vs 4
    ./run.sh: line 165:  7108 Aborted                 (core dumped) python wenet/bin/train.py --gpu $gpu_id --config $train_config --data_type $data_type --symbol_table $dict --train_data data/$train_set/data.list --cv_data data/dev/data.list ${checkpoint:+--checkpoint $checkpoint} --model_dir $dir --ddp.init_method $init_method --ddp.world_size $world_size --ddp.rank $rank --ddp.dist_backend $dist_backend --num_workers 8 $cmvn_opts --pin_memory
    /homepath/envs/anaconda3/lib/python3.8/multiprocessing/process.py:108: ResourceWarning: unclosed file <_io.BufferedReader name='/homepath/tools/wenet-uio/examples/aishell/s0/data/train/shards/shards_000000002.tar'>
      self._target(*self._args, **self._kwargs)
    ResourceWarning: Enable tracemalloc to get the object allocation traceback
    /homepath/envs/anaconda3/lib/python3.8/multiprocessing/process.py:108: ResourceWarning: unclosed file <_io.BufferedReader name='/homepath/tools/wenet-uio/examples/aishell/s0/data/train/shards/shards_000000110.tar'>
      self._target(*self._args, **self._kwargs)
    ResourceWarning: Enable tracemalloc to get the object allocation traceback
    /homepath/envs/anaconda3/lib/python3.8/multiprocessing/process.py:108: ResourceWarning: unclosed file <_io.BufferedReader name='/homepath/tools/wenet-uio/examples/aishell/s0/data/train/shards/shards_000000112.tar'>
      self._target(*self._args, **self._kwargs)
    ResourceWarning: Enable tracemalloc to get the object allocation traceback
    /homepath/envs/anaconda3/lib/python3.8/multiprocessing/process.py:108: ResourceWarning: unclosed file <_io.BufferedReader name='/homepath/tools/wenet-uio/examples/aishell/s0/data/train/shards/shards_000000075.tar'>
      self._target(*self._args, **self._kwargs)
    ResourceWarning: Enable tracemalloc to get the object allocation traceback
    /homepath/envs/anaconda3/lib/python3.8/multiprocessing/process.py:108: ResourceWarning: unclosed file <_io.BufferedReader name='/homepath/tools/wenet-uio/examples/aishell/s0/data/train/shards/shards_000000001.tar'>
      self._target(*self._args, **self._kwargs)
    ResourceWarning: Enable tracemalloc to get the object allocation traceback
    /homepath/envs/anaconda3/lib/python3.8/multiprocessing/process.py:108: ResourceWarning: unclosed file <_io.BufferedReader name='/homepath/tools/wenet-uio/examples/aishell/s0/data/train/shards/shards_000000086.tar'>
      self._target(*self._args, **self._kwargs)
    ResourceWarning: Enable tracemalloc to get the object allocation traceback
    Traceback (most recent call last):
      File "wenet/bin/train.py", line 277, in <module>
        main()
      File "wenet/bin/train.py", line 250, in main
        executor.train(model, optimizer, scheduler, train_data_loader, device,
      File "/homepath/tools/wenet-uio/wenet/utils/executor.py", line 71, in train
        loss.backward()
      File "/homepath/envs/anaconda3/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/homepath/envs/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: [/opt/conda/conda-bld/pytorch_1614378062065/work/third_party/gloo/gloo/transport/tcp/pair.cc:575] Connection closed by peer [11.88.165.7]:54008
    Traceback (most recent call last):
      File "wenet/bin/train.py", line 277, in <module>
        main()
      File "wenet/bin/train.py", line 250, in main
        executor.train(model, optimizer, scheduler, train_data_loader, device,
      File "/homepath/tools/wenet-uio/wenet/utils/executor.py", line 71, in train
        loss.backward()
      File "/homepath/envs/anaconda3/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/homepath/envs/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: Application timeout caused pair closure
    

    To Reproduce

    Finally, I pulled the latest wenet code and reproduced the above problem with the aishell recipe:

    data_type=shard
    train_config=conf/train_unified_conformer.yaml
    cmvn=false
    dynamic batch
    accum_grad=8
    

       How should this be solved? Thank you.

    opened by 601222543 12
  • Decoding hangs when using LM rescoring

    Decoding hangs when using LM rescoring

    I'm following this tutorial to use LM rescoring for decoding: https://github.com/wenet-e2e/wenet/blob/23a61b212bf2c3886546925913f5574f779f474a/examples/librispeech/s0/run.sh#L234

    I didn't re-train a model; instead, I used the pre-trained conformer model. I had no problem building the TLG.fst, but ./tools/decode.sh hangs forever when evaluating on the test set. Could you provide any suggestions on where the problem might be and how to debug this?

    The below is the code I used for LM rescoring (I took this out from run.sh):

    pretrained_model=wenet/models/20210216_conformer_exp
    dict=$pretrained_model/words.txt
    bpemodel=$pretrained_model/train_960_unigram5000
    
    lm=data/local/lm
    lexicon=data/local/dict/lexicon.txt
    mkdir -p $lm
    mkdir -p data/local/dict
    
    # 7.1 Download & format LM
    which_lm=3-gram.pruned.1e-7.arpa.gz
    if [ ! -e ${lm}/${which_lm} ]; then
        wget http://www.openslr.org/resources/11/${which_lm} -P ${lm}
    fi
    echo "unzip lm($which_lm)..."
    gunzip -k ${lm}/${which_lm} -c > ${lm}/lm.arpa
    echo "Lm saved as ${lm}/lm.arpa"
    
    # 7.2 Prepare dict
    unit_file=$dict
    bpemodel=$bpemodel
    # use $dir/words.txt (unit_file) and $dir/train_960_unigram5000 (bpemodel)
    # if you download pretrained librispeech conformer model
    cp $unit_file data/local/dict/units.txt
    if [ ! -e ${lm}/librispeech-lexicon.txt ]; then
        wget http://www.openslr.org/resources/11/librispeech-lexicon.txt -P ${lm}
    fi
    echo "build lexicon..."
    tools/fst/prepare_dict.py $unit_file ${lm}/librispeech-lexicon.txt \
        $lexicon $bpemodel.model
    echo "lexicon saved as '$lexicon'"
    
    # 7.3 Build decoding TLG
    tools/fst/compile_lexicon_token_fst.sh \
       data/local/dict data/local/tmp data/local/lang
    tools/fst/make_tlg.sh data/local/lm data/local/lang data/lang_test || exit 1;
    
    # 7.4 Decoding with runtime
    echo "Start decoding..."
    fst_dir=data/lang_test
    dir=$pretrained_model
    recog_set="test_clean"
    for test in ${recog_set}; do
        ./tools/decode.sh --nj 2 \
            --beam 10.0 --lattice_beam 5 --max_active 7000 --blank_skip_thresh 0.98 \
            --ctc_weight 0.5 --rescoring_weight 1.0 --acoustic_scale 1.2 \
            --fst_path $fst_dir/TLG.fst \
            data/$test/wav.scp.10 data/$test/text.10 $dir/final.zip $fst_dir/words.txt \
            $dir/lm_with_runtime_${test}
        tail $dir/lm_with_runtime_${test}/wer
    done
    
    opened by boliangz 12
  • macOS M1 support?

    macOS M1 support?

    [ 96%] Linking CXX shared library libwenet_api.dylib
    ld: warning: ignoring file ../../../fc_base/libtorch-src/lib/libtorch.dylib, building for macOS-arm64 but attempting to link with file built for macOS-x86_64

    opened by jinfagang 11
  • When using libtorch, gpu decoding is slower than cpu.

    When using libtorch, gpu decoding is slower than cpu.

    When using the GPU to decode, GPU memory gets allocated but GPU utilization only rises after a long time. For example, when decoding 600 utterances, decoding progresses very slowly until about the 100th utterance, and then speeds up from the point when GPU utilization rises. Increasing the number of threads in decoder_main.cc makes it faster, but I'd like to fix the problem for the single-threaded case. What should I do?

    CPU: 24 cores, GPU: RTX A5000 (24 GB) x 2, Ubuntu 20.04.4

    opened by hms1205 0
  • Quantized model under checkpoint mode performs quite different from the one under jit mode

    Quantized model under checkpoint mode performs quite different from the one under jit mode

    I have trained an ASR model and converted it into a quantized model in both jit mode (named asr_quant.zip) and checkpoint mode (named asr_quant_checkpoint.pt). But the results from the jit mode and the checkpoint mode are quite different.

    Quantized model in jit mode: test Final result: 甚至出现交易几乎停滞的情况

    Quantized model in checkpoint mode: INFO BAC009S0764W0121 ▁LAWS骑钰阐易ISH燕▁CRITIC▁QUANTITY▁GOING骑燕▁MORE鲨ANSISH致▁GOING燕▁GOING燕▁GOING▁DESIRED▁GOING▁BREATH▁CRITIC俏尺骑▁GOING骑▁PERFECTION燕▁GOING燕▁SH燕▁SH谊▁PERFECTION敷唬诊▁SH定▁OVEN▁ORDERS尹O▁IGNORISH▁PRESIDENTO锣OKA▁PERFECTIONISH燕▁EIGHTEEN笛燕何▁PERFECTION▁INFORMEDLAND何骑▁PRETTY燕湿O▁PERFECTION尺O燕汐辆女何燕翼鲨O▁PERFECTION▁FIRST架燕绘翼盘锣▁THIS▁PRETTY▁SONG▁PERFECTION唬▁INFORMED障渲▁EIGHTEEN锣燕咏劈赌盘涉燕轧▁ABSORB汐O▁PERFECTION锣▁EIGHTEEN燕▁SH燕▁SH敷▁PRESIDENT书敷诊唬治唬唯轧辆▁IGNOR▁DOESN▁PERFECTION▁IGNOR洒翼O▁SAVE▁FIRST▁KISS▁PERFECTION锣▁PERFECTION备惭骑企洒▁PERFECTION洒慌▁SH▁CANDLE▁CHIN▁CANDLE企▁CHIN▁LIBERTY锣▁WEATHER▁FIRST▁COUNTRY敷▁CLERK
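
    For reference, a typical PyTorch dynamic-quantization sketch; this is only an assumption about how such a quantized model might have been produced (the issue does not show the exact commands), with model standing for a trained WeNet ASR model instance:

    import torch

    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8)               # quantize Linear layers dynamically
    torch.jit.script(quantized).save('asr_quant.zip')              # "jit mode" export
    torch.save(quantized.state_dict(), 'asr_quant_checkpoint.pt')  # "checkpoint mode" export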

    opened by PPGGG 2
  • No recognition output and no error on Windows

    No recognition output and no error on Windows

    python version = 3.8.5

    First I installed the runtime: pip install wenetruntime

    Then the script is as follows:

    import sys
    import torch
    import wenetruntime as wenet

    wav_file = sys.argv[1]
    decoder = wenet.Decoder(lang='chs')
    ans = decoder.decode_wav(wav_file)
    print(ans)

    When I run the script with an audio.wav file, there is no output at all and no error message; the script just exits. Does anyone know why? Am I missing some environment configuration?

    opened by zhhl9101 1
  • Efficient Conformer implementation

    Efficient Conformer implementation

    This PR contains our implementation of Efficient Conformer for the WeNet encoder structure and runtime.

    • Original paper: https://arxiv.org/pdf/2109.01163.pdf
    • Original code: https://github.com/burchim/EfficientConformer

    At 58.com Inc, using Efficient Conformer reduces CER by a relative 6% compared to Conformer and increases inference speed by 10% (CPU JIT runtime). Combined with int8 quantization, inference speed can be improved by 50~70%. More details on our work: https://mp.weixin.qq.com/s/7T1gnNrVmKIDvQ03etltGQ

    Added features

    • [X] Efficient Conformer Encoder structure
      • [X] StrideConformerEncoderLayer for "Progressive Downsampling to the Conformer encoder"
      • [X] GroupedRelPositionMultiHeadedAttention for "Grouped Attention"
      • [X] Conv2dSubsampling2 for 1/2 Convolution Downsampling
    • [X] Recognize and JIT export
      • [X] forward_chunk and forward_chunk_by_chunk in wenet/efficient_conformer/encoder.py
    • [X] Streaming inference at JIT runtime
      • [X] TorchAsrModelEfficient in runtime/core/decoder for Progressive Downsampling
    • [X] Configuration file of Aishell-1
      • [X] train_u2++_efficonformer_v1.yaml for our online deployment
      • [X] train_u2++_efficonformer_v2.yaml for Original paper

    Developers

    • Efficient Conformer Encoder structure: ( Yaru Wang & Wei Zhou )
    • Recognize and JIT export: ( Wei Zhou )
    • Streaming inference at JIT runtime: ( Yongze Li )
    • Configuration file of Aishell-1: ( Wei Zhou )

    TODO

    • [ ] ONNX export and runtime
    • [x] Aishell-1 experiment
    opened by zwglory 2
  • Export ONNX fail  with export_onnx_gpu.py

    Export ONNX fail with export_onnx_gpu.py

    error.log: the attached error.log was produced with verbose logging enabled.

    I tried different onnxruntime versions, and they all gave the same errors. A simplified log follows:

    python3 wenet/bin/export_onnx_gpu.py --config=/home/ricky/heqing/8w-hours/squeezeformer-8whr-avg2/train.yaml --checkpoint=/home/ricky/heqing/8w-hours/squeezeformer-8whr-avg2/avg_10_156000_13_196000.pt --cmvn_file=/home/ricky/heqing/8w-hours/squeezeformer-8whr-avg2/global_cmvn --ctc_weight=0.5 --output_onnx_dir=/tmp
    Failed to import k2 and icefall. Notice that they are necessary for hlg_onebest and hlg_rescore
    Update ctc weight to 0.5
    /home/ricky/wenet_train_res/wenet_tools_git/wenet/utils/mask.py:213: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      max_len = max_len if max_len > 0 else lengths.max().item()
    /home/ricky/wenet_train_res/wenet_tools_git/wenet/transformer/embedding.py:96: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      assert offset + size < self.max_len
    /home/ricky/wenet_train_res/wenet_tools_git/wenet/squeezeformer/attention.py:187: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      if cache.size(0) > 0:
    /home/ricky/wenet_train_res/wenet_tools_git/wenet/squeezeformer/attention.py:119: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      if mask.size(2) > 0:  # time2 > 0
    /home/ricky/wenet_train_res/wenet_tools_git/wenet/squeezeformer/convolution.py:140: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      if mask_pad.size(2) > 0:  # time > 0
    /home/ricky/wenet_train_res/wenet_tools_git/wenet/squeezeformer/convolution.py:171: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      if mask_pad.size(2) > 0:  # time > 0
    /home/ricky/wenet_train_res/wenet_tools_git/wenet/squeezeformer/subsampling.py:159: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      if L - T < 0:
    [0, 0, 0]
    [-1, -1, -1]
    2022-12-21 19:30:33.720608405 [W:onnxruntime:, constant_folding.cc:150 ApplyImpl] Unsupported output type of N11onnxruntime22SequenceTensorTypeBaseE. Can't constant fold SequenceEmpty node 'SequenceEmpty_2506'
    2022-12-21 19:30:33.768034651 [W:onnxruntime:, constant_folding.cc:150 ApplyImpl] Unsupported output type of N11onnxruntime22SequenceTensorTypeBaseE. Can't constant fold SequenceEmpty node 'SequenceEmpty_2506'
    2022-12-21 19:30:33.812875437 [W:onnxruntime:, constant_folding.cc:150 ApplyImpl] Unsupported output type of N11onnxruntime22SequenceTensorTypeBaseE. Can't constant fold SequenceEmpty node 'SequenceEmpty_2506'
    2022-12-21 19:30:35.151413519 [E:onnxruntime:, sequential_executor.cc:333 Execute] Non-zero status code returned while running MatMul node. Name:'MatMul_2528' Status Message: Not satisfied: K_ == right_shape[right_num_dims - 2] || transb && K_ == right_shape[right_num_dims - 1] matmul_helper.h:42 ComputeMatMul dimension mismatch
    Traceback (most recent call last):
      File "wenet/bin/export_onnx_gpu.py", line 574, in <module>
        onnx_config = export_enc_func(model, configs, args, logger, encoder_onnx_path)
      File "wenet/bin/export_onnx_gpu.py", line 331, in export_offline_encoder
        ort_outs = ort_session.run(None, ort_inputs)
      File "/home/ricky/anaconda3/envs/wenet/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 124, in run
        return self.sess.run(output_names, input_feed, run_options)
    onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running MatMul node. Name:'MatMul_2528' Status Message: Not satisfied: K == right_shape[right_num_dims - 2] || transb && K_ == right_shape[right_num_dims - 1] matmul_helper.h:42 ComputeMatMul dimension mismatch

    opened by rickychanhoyin 9
  • undefined value chunk_masks: in squeezformer

    undefined value chunk_masks: in squeezformer

    I just pulled the latest wenet code and tried out Squeezeformer. Training fails with the log attached below. Any suggestion would be helpful. Thanks.

    the number of model params: 135,220,418
    Traceback (most recent call last):
      File "wenet/bin/train.py", line 309, in <module>
        main()
      File "wenet/bin/train.py", line 205, in main
        script_model = torch.jit.script(model)
      File "/home/bsen/miniconda3/envs/wenet/lib/python3.8/site-packages/torch/jit/_script.py", line 1257, in script
        return torch.jit._recursive.create_script_module(
      File "/home/bsen/miniconda3/envs/wenet/lib/python3.8/site-packages/torch/jit/_recursive.py", line 451, in create_script_module
        return create_script_module_impl(nn_module, concrete_type, stubs_fn)
      File "/home/bsen/miniconda3/envs/wenet/lib/python3.8/site-packages/torch/jit/_recursive.py", line 517, in create_script_module_impl
        create_methods_and_properties_from_stubs(concrete_type, method_stubs, property_stubs)
      File "/home/bsen/miniconda3/envs/wenet/lib/python3.8/site-packages/torch/jit/_recursive.py", line 368, in create_methods_and_properties_from_stubs
        concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
      File "/home/bsen/miniconda3/envs/wenet/lib/python3.8/site-packages/torch/jit/_recursive.py", line 869, in compile_unbound_method
        create_methods_and_properties_from_stubs(concrete_type, (stub,), ())
      File "/home/bsen/miniconda3/envs/wenet/lib/python3.8/site-packages/torch/jit/_recursive.py", line 368, in create_methods_and_properties_from_stubs
        concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
    RuntimeError: undefined value chunk_masks:
    File "/home/bsen/wenet_new/examples/squeezformer/wenet/squeezeformer/encoder.py", line 379
            pos_emb = recover_pos_emb
            mask_pad = recover_mask_pad
            xs = xs.masked_fill(~chunk_masks[:, 0, :].unsqueeze(-1), 0.0)
                                 ~~~~~~~~~~~ <--- HERE

            factor = self.calculate_downsampling_factor(i)

    'SqueezeformerEncoder.forward_chunk' is being compiled since it was called from 'ASRModel.forward_encoder_chunk'
    File "/home/bsen/wenet_new/examples/squeezformer/wenet/transformer/asr_model.py", line 776

        """
        return self.encoder.forward_chunk(xs, offset, required_cache_size,
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                          att_cache, cnn_cache)
                                          ~~~~~~~~~~~~~~~~~~~~ <--- HERE
    
    opened by senbukai0203 2
Releases (v2.1.0)
  • v2.1.0(Nov 25, 2022)

    What's Changed

    • allow instantiating multiple models in #1580
    • do not pack libtorch.so in the python binding to reduce wheel size, in #1573 and #1576
    • support iOS by @Ma-Dan in #1549 🛫
    • support HLG decode by @aluminumbox in #1521 💯
    • support squeezeformer by @yygle in #1519 👍
    • support XPU by @imoisture in #1455 🚀
    • and so on ...
    Source code(tar.gz)
    Source code(zip)
  • v2.0.1(Jun 21, 2022)

  • v2.0.0(Jun 14, 2022)

    The following features are stable.

    • [x] U2++ framework for better accuracy
    • [x] n-gram + WFST language model solution
    • [x] Context biasing(hotword) solution
    • [x] Very big data training support with UIO
    • [x] More dataset support, including WenetSpeech, GigaSpeech, HKUST and so on.
    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Jun 21, 2021)

    Model

    • propose and support U2++, as the following figure shows, which uses both forward and backward information in training and decoding.

    image

    • support dynamic left-chunk training and decoding, so the history chunk can be limited at decoding time to save memory and computation.
    • support distributed training.

    Dataset

    Now we support the following five standard speech datasets, and we achieved SOTA or close-to-SOTA results on them:

    | Dataset     | Language | Data (h) | Test Set   | CER/WER | SOTA           |
    |-------------|----------|----------|------------|---------|----------------|
    | aishell-1   | Chinese  | 200      | test       | 4.36    | 4.36 (WeNet)   |
    | aishell-2   | Chinese  | 1000     | test_ios   | 5.39    | 5.39 (WeNet)   |
    | multi-cn    | Chinese  | 2385     | /          | /       | /              |
    | librispeech | English  | 1000     | test_clean | 2.66    | 2.10 (EspNet)  |
    | gigaspeech  | English  | 10000    | test       | 11.0    | 10.80 (EspNet) |

    Productivity

    Here are some features related to productivity.

    • LM support. Here is the system design for LM support. WeNet can work with or without an LM, depending on your application/scenario.

    image

    • timestamp support.
    • n-best support.
    • endpoint support.
    • gRPC support
    • further refine x86 server and on-device android recipe.
    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(Feb 4, 2021)
