DeepSpeech - Easy-to-use Speech Toolkit including SOTA ASR pipeline, influential TTS with text frontend and End-to-End Speech Simultaneous Translation.

Overview


PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.

Speech Recognition
Input Audio Recognition Result

I knocked at the door on the ancient side of the building.

我认为跑步最重要的就是给我带来了身体健康。
Speech Translation (English to Chinese)
Input Audio Translations Result

我 在 这栋 建筑 的 古老 门上 敲门。
Text-to-Speech
Input Text Synthetic Audio
Life was like a box of chocolates, you never know what you're gonna get.
早上好,今天是2020/10/29,最低温度是-3°C。
季姬寂,集鸡,鸡即棘鸡。棘鸡饥叽,季姬及箕稷济鸡。鸡既济,跻姬笈,季姬忌,急咭鸡,鸡急,继圾几,季姬急,即籍箕击鸡,箕疾击几伎,伎即齑,鸡叽集几基,季姬急极屐击鸡,鸡既殛,季姬激,即记《季姬击鸡记》。

For more synthesized audios, please refer to PaddleSpeech Text-to-Speech samples.

Punctuation Restoration
| Input Text | Output Text |
| 今天的天气真不错啊你下午有空吗我想约你一起去吃饭 | 今天的天气真不错啊!你下午有空吗?我想约你一起去吃饭。 |

🔥 Hot Activities

Features

Through an easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial applications and academic research, covering training, inference and testing modules as well as the deployment process. More specifically, this toolkit features:

  • 📦 Ease of Use: a low barrier to installation, and a CLI is available to quick-start your journey.
  • 🏆 Aligned with the State-of-the-Art: we provide high-speed and ultra-lightweight models, as well as cutting-edge technology.
  • 💯 Rule-based Chinese frontend: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt to the Chinese context.
  • A Variety of Functions that Vitalize both Industry and Academia:
    • 🛎️ Implementation of critical audio tasks: this toolkit contains audio functions like Audio Classification, Speech Translation, Automatic Speech Recognition, Text-to-Speech Synthesis, etc.
    • 🔬 Integration of mainstream models and datasets: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also the model list for more details.
    • 🧩 Cascaded model applications: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural Language Processing (NLP) and Computer Vision (CV).

Recent Update

  • 🤗 2021.12.14: Our PaddleSpeech ASR and TTS Demos on Hugging Face Spaces are available!
  • 👏🏻 2021.12.10: PaddleSpeech CLI is available for Audio Classification, Automatic Speech Recognition, Speech Translation (English to Chinese) and Text-to-Speech.

Community

  • Scan the QR code below with WeChat to join the official technical exchange group. We look forward to your participation.

Installation

We strongly recommend installing PaddleSpeech on Linux with python>=3.7. So far, Linux supports the CLI for all our tasks, while Mac OSX and Windows only support the PaddleSpeech CLI for Audio Classification, Speech-to-Text and Text-to-Speech. To install PaddleSpeech, please see installation.
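
The recommendation above can be checked before installing. The following is a minimal sketch, not part of PaddleSpeech: the function name check_environment is hypothetical, and mapping the task names to the CLI sub-commands (cls, asr, st, tts, text) used in the Quick Start is an assumption.

```python
import platform
import sys

# Sub-command names are taken from the Quick Start; the grouping follows the
# paragraph above (Linux: all tasks, Mac OSX and Windows: a subset).
ALL_TASKS = ("cls", "asr", "st", "tts", "text")
LIMITED_TASKS = ("cls", "asr", "tts")


def check_environment(system=None, version=None):
    """Report whether Python meets the recommendation and which CLI tasks apply.

    `system` and `version` default to the running interpreter; they are
    parameters only so the check can be tried with other values.
    """
    system = system or platform.system()
    version = tuple(version or sys.version_info[:2])
    return {
        "python_ok": version >= (3, 7),
        "cli_tasks": ALL_TASKS if system == "Linux" else LIMITED_TASKS,
    }
```

Calling check_environment() with no arguments inspects the current machine; platform.system() returns "Linux", "Darwin" or "Windows".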

Quick Start

Developers can try our models with the PaddleSpeech command line. Change --input to test your own audio/text.

Audio Classification

paddlespeech cls --input input.wav

Automatic Speech Recognition

paddlespeech asr --lang zh --input input_16k.wav

Speech Translation (English to Chinese)

(not currently supported on Mac or Windows)

paddlespeech st --input input_16k.wav

Text-to-Speech

paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --output output.wav

Text Postprocessing

  • Punctuation Restoration
    paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭

For more command lines, please see: demos
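
For scripting, the commands above can also be driven from Python. The following is a minimal sketch, assuming the paddlespeech executable is on PATH. The helper names build_command and run are hypothetical, and only the sub-commands and flags shown above are used.

```python
import subprocess

# Sub-commands that appear in the Quick Start above.
SUBCOMMANDS = {"cls", "asr", "st", "tts", "text"}


def build_command(subcommand, input_value, **options):
    """Build the argument list for one `paddlespeech` call (hypothetical helper)."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown sub-command: {subcommand!r}")
    argv = ["paddlespeech", subcommand]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    argv += ["--input", input_value]
    return argv


def run(subcommand, input_value, **options):
    """Run the CLI and return its standard output; requires PaddleSpeech to be installed."""
    argv = build_command(subcommand, input_value, **options)
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout
```

For example, build_command("asr", "input_16k.wav", lang="zh") yields the same arguments as the Automatic Speech Recognition command above. Passing a list to subprocess.run avoids shell quoting problems with Chinese input text.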

If you want to try more functions like training and tuning, please have a look at Speech-to-Text Quick Start and Text-to-Speech Quick Start.

Model List

PaddleSpeech supports a series of the most popular models. They are summarized in released models, together with the available pretrained models.

Speech-to-Text contains Acoustic Model, Language Model, and Speech Translation, with the following details:

| Speech-to-Text Module Type | Dataset | Model Type | Link |
| Speech Recognition | Aishell | DeepSpeech2 RNN + Conv based Models | deepspeech2-aishell |
| | | Transformer based Attention Models | u2.transformer.conformer-aishell |
| | Librispeech | Transformer based Attention Models | deepspeech2-librispeech / transformer.conformer.u2-librispeech / transformer.conformer.u2-kaldi-librispeech |
| Alignment | THCHS30 | MFA | mfa-thchs30 |
| Language Model | | Ngram Language Model | kenlm |
| | TIMIT | Unified Streaming & Non-streaming Two-pass | u2-timit |
| Speech Translation (English to Chinese) | TED En-Zh | Transformer + ASR MTL | transformer-ted |
| | | FAT + Transformer + ASR MTL | fat-st-ted |

Text-to-Speech in PaddleSpeech mainly contains three modules: Text Frontend, Acoustic Model and Vocoder. Acoustic Model and Vocoder models are listed as follows:

| Text-to-Speech Module Type | Model Type | Dataset | Link |
| Text Frontend | | | tn / g2p |
| Acoustic Model | Tacotron2 | LJSpeech | tacotron2-ljspeech |
| | Transformer TTS | | transformer-ljspeech |
| | SpeedySpeech | CSMSC | speedyspeech-csmsc |
| | FastSpeech2 | AISHELL-3 / VCTK / LJSpeech / CSMSC | fastspeech2-aishell3 / fastspeech2-vctk / fastspeech2-ljspeech / fastspeech2-csmsc |
| Vocoder | WaveFlow | LJSpeech | waveflow-ljspeech |
| | Parallel WaveGAN | LJSpeech / VCTK / CSMSC | PWGAN-ljspeech / PWGAN-vctk / PWGAN-csmsc |
| | Multi Band MelGAN | CSMSC | Multi Band MelGAN-csmsc |
| | Style MelGAN | CSMSC | Style MelGAN-csmsc |
| | HiFiGAN | CSMSC | HiFiGAN-csmsc |
| Voice Cloning | GE2E | Librispeech, etc. | ge2e |
| | GE2E + Tacotron2 | AISHELL-3 | ge2e-tactron2-aishell3 |
| | GE2E + FastSpeech2 | AISHELL-3 | ge2e-fastspeech2-aishell3 |

Audio Classification

| Task | Dataset | Model Type | Link |
| Audio Classification | ESC-50 | PANN | pann-esc50 |

Punctuation Restoration

| Task | Dataset | Model Type | Link |
| Punctuation Restoration | IWSLT2012_zh | Ernie Linear | iwslt2012-punc0 |

Documents

Normally, Speech SoTA, Audio SoTA and Music SoTA give you an overview of the hot academic topics in the related areas. To focus on the tasks in PaddleSpeech, you will find the following guidelines helpful for grasping the core ideas.

The Text-to-Speech module was originally called Parakeet and has now been merged into this repository. If you are interested in academic research on this task, please see the TTS research overview. This document is also a good guide to the pipeline components.

Citation

To cite PaddleSpeech for research, please use the following format.

@misc{ppspeech2021,
title={PaddleSpeech, a toolkit for audio processing based on PaddlePaddle.},
author={PaddlePaddle Authors},
howpublished = {\url{https://github.com/PaddlePaddle/PaddleSpeech}},
year={2021}
}

Contribute to PaddleSpeech

You are warmly welcome to submit questions in discussions and bug reports in issues! We would also highly appreciate your contributions to this project!

Contributors

Acknowledgement

PaddleSpeech also depends on many open source repositories. See references for more information.

License

PaddleSpeech is provided under the Apache-2.0 License.

Comments
  • Deploy problem with Chinese

    I finally got the server and client running in Docker. When I then spoke some Chinese, this error appeared:

    Exception happened during processing of request from ('127.0.0.1', 59312)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
        self.process_request(request, client_address)
      File "/usr/lib/python2.7/SocketServer.py", line 318, in process_request
        self.finish_request(request, client_address)
      File "/usr/lib/python2.7/SocketServer.py", line 331, in finish_request
        self.RequestHandlerClass(request, client_address, self)
      File "/usr/lib/python2.7/SocketServer.py", line 652, in __init__
        self.handle()
      File "deploy/demo_server.py", line 108, in handle
        (finish_time - start_time, transcript))
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-48: ordinal not in range(128)

    Personally, I think the recognition result could be saved to a file instead of being printed. Of course, it would be even better if the authors could fix the printing problem.
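
The UnicodeEncodeError in this report comes from encoding Chinese text with the ASCII codec. The report itself is from Python 2.7. The sketch below uses Python 3 only to illustrate the same failure and one common workaround, encoding the transcript explicitly as UTF-8 before writing it. The helper name write_transcript is hypothetical and this is not the fix applied in the project.

```python
import io


def write_transcript(stream, transcript):
    """Write a text transcript to a binary stream as UTF-8 bytes."""
    stream.write(transcript.encode("utf-8"))
    stream.write(b"\n")


# Encoding non-ASCII text with the ASCII codec fails, as in the report.
try:
    "你好".encode("ascii")
except UnicodeEncodeError as err:
    ascii_error = str(err)

# Encoding explicitly as UTF-8 works whatever the console's default codec is.
buffer = io.BytesIO()
write_transcript(buffer, "你好")
```

The same helper works with a file opened in binary mode, which matches the suggestion in the report to store the result in a file.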

    opened by yyhlvdl 43
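  • Discussion of PaddleSpeech installation problems (Windows)

    No matter which installation method I use, and even with the required C++ tools installed, I always get this error:

    Failed to build pyworld webrtcvad bottleneck ERROR: Could not build wheels for pyworld, bottleneck, which is required to install pyproject.toml-based projects

    I have tried all sorts of fixes found online without success. Any help would be sincerely appreciated!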

    Installation 
    opened by qibinran 40
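  • Error when training on my own dataset with the official model as the pretrained model

    I used the official Chinese model shown below as the pretrained model to train on my own dataset, and got an error. (image)

    The error message is as follows: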

    Traceback (most recent call last):
      File "train.py", line 118, in <module>
        main()
      File "train.py", line 114, in main
        train()
      File "train.py", line 109, in train
        test_off=args.test_off)
      File "/DeepSpeech/model_utils/model.py", line 307, in train
        pre_epoch = self.init_from_pretrained_model(exe, train_program)
      File "/DeepSpeech/model_utils/model.py", line 161, in init_from_pretrained_model
        filename="params.pdparams")
      File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 784, in load_params
        filename=filename)
      File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 668, in load_vars
        filename=filename)
      File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 727, in load_vars
        format(orig_shape, each_var.name, new_shape))
    RuntimeError: Shape not matching: the Program requires a parameter with a shape of ((1312L, 3072L)), while the loaded parameter (namely [ layer_2_forward_fc_weight ]) has a shape of  ((1312, 6144)).
    Failed in training!
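
One possible reading of the two shapes, offered as an assumption rather than a confirmed diagnosis: if the width of this weight is three times the RNN layer size (as is typical when a layer feeds the three gates of a GRU), then 3072 corresponds to a layer size of 1024 and 6144 to a layer size of 2048. That would mean the training configuration and the pretrained model use different rnn_layer_size values. The configuration dump in the aishell deploy report further down lists rnn_layer_size: 2048 and use_gru: True. The sketch below only does the arithmetic; the factor of 3 and the helper name are assumptions.

```python
def implied_layer_size(weight_width, gates=3):
    """Return the layer size implied by a weight width, assuming width = gates * size."""
    if weight_width % gates:
        raise ValueError("width is not a multiple of the assumed gate count")
    return weight_width // gates


required = implied_layer_size(3072)  # width the training program requires
loaded = implied_layer_size(6144)    # width stored in the pretrained parameters
```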
    
    opened by yeyupiaoling 35
  • download_lm_en.sh broken

    Hi all,

    I got an error when trying to run /models/lm/download_lm_en.sh to download http://paddlepaddle.bj.bcebos.com/model_zoo/speech/common_crawl_00.prune01111.trie.klm

    {
    "code": "AccountOverdue",
    "message": "Your request is denied because there is an overdue bill of your account.",
    "requestId": "4b684141-1175-4691-a9cd-52c458a94845"
    }
    
    opened by haoqiang 30
  • Deploy problem with aishell

    I used your released aishell model directly and ran python deploy/demo_server.py, which produced this error:

    root@095d9ada1b1d:/DeepSpeech# python deploy/demo_server.py
    -----------  Configuration Arguments -----------
    alpha: 2.15
    beam_size: 500
    beta: 0.35
    cutoff_prob: 1.0
    cutoff_top_n: 40
    decoding_method: ctc_beam_search
    host_ip: localhost
    host_port: 8086
    lang_model_path: models/lm/zh_giga.no_cna_cmn.prune01244.klm
    mean_std_path: asset/preprocess/mean_std.npz
    model_path: asset/train/params.tar.gz
    num_conv_layers: 2
    num_rnn_layers: 3
    rnn_layer_size: 2048
    share_rnn_weights: False
    specgram_type: linear
    speech_save_dir: demo_cache
    use_gpu: True
    use_gru: True
    vocab_path: asset/preprocess/vocab.txt
    warmup_manifest: asset/preprocess/test
    ------------------------------------------------
    I1205 10:14:34.175657    15 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=1 
    [INFO 2017-12-05 10:14:35,626 layers.py:2606] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
    [INFO 2017-12-05 10:14:35,626 layers.py:3133] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
    [INFO 2017-12-05 10:14:35,627 layers.py:7224] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
    [INFO 2017-12-05 10:14:35,627 layers.py:2606] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
    [INFO 2017-12-05 10:14:35,628 layers.py:3133] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
    [INFO 2017-12-05 10:14:35,628 layers.py:7224] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
    -----------------------------------------------------------
    Warming up ...
    ('Warm-up Test Case %d: %s', 0, u'asset/data/aishell/wav/test/S0765/BAC009S0765W0205.wav')
    [INFO 2017-12-05 10:14:42,337 model.py:230] begin to initialize the external scorer for decoding
    [INFO 2017-12-05 10:14:50,941 model.py:241] language model: is_character_based = 1, max_order = 5, dict_size = 0
    [INFO 2017-12-05 10:14:50,941 model.py:242] end initializing scorer. Start decoding ...
    Traceback (most recent call last):
      File "deploy/demo_server.py", line 224, in <module>
        main()
      File "deploy/demo_server.py", line 220, in main
        start_server()
      File "deploy/demo_server.py", line 204, in start_server
        num_test_cases=3)
      File "deploy/demo_server.py", line 143, in warm_up_test
        (finish_time - start_time, transcript))
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 40-94: ordinal not in range(128)
    

    So I commented out the transcript and ran it again, and it was then able to continue. However:

    [INFO 2017-12-05 10:46:41,054 model.py:230] begin to initialize the external scorer for decoding
    [INFO 2017-12-05 10:46:42,193 model.py:241] language model: is_character_based = 1, max_order = 5, dict_size = 0
    [INFO 2017-12-05 10:46:42,193 model.py:242] end initializing scorer. Start decoding ...
    Response Time: 1174.020508
    ('Warm-up Test Case %d: %s', 1, u'asset/data/aishell/wav/test/S0767/BAC009S0767W0141.wav')
    

    A single file takes 1174 s, which is a very long time. Is there any way to speed this up?

    opened by yyhlvdl 26
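  • Memory usage of run_train.sh keeps growing on Mac. Is there a leak?

    I want to train DeepSpeech on my own corpus. During training, system memory usage keeps rising until the swap files fill up the disk. However, the memory usage of the python process itself does not increase, so I do not know what is consuming the memory.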

    (paddle)loong@MacBook-Pro:~/l/lab/py/ml/baidu/wav on master$ sh run_train.sh 
    -----------  Configuration Arguments -----------
    augment_conf_path: arg.config
    batch_size: 4
    dev_manifest: data/manifest.train
    init_model_path: None
    is_local: 1
    learning_rate: 5e-05
    max_duration: 27.0
    mean_std_path: data/mean_std.npz
    min_duration: 0.0
    num_conv_layers: 2
    num_iter_print: 100
    num_passes: 40
    num_proc_data: 16
    num_rnn_layers: 3
    output_model_dir: ./models
    rnn_layer_size: 1024
    share_rnn_weights: 0
    shuffle_method: batch_shuffle_clipped
    specgram_type: linear
    test_off: 0
    train_manifest: data/manifest.train
    trainer_count: 1
    use_gpu: 0
    use_gru: 0
    use_sortagrad: 1
    vocab_path: data/vocab.txt
    ------------------------------------------------
    I0202 21:40:14.468683 2907198400 Util.cpp:166] commandline:  --use_gpu=0 --rnn_use_batch=True --log_clipping=True --trainer_count=1 
    [INFO 2018-02-02 21:40:14,484 layers.py:2689] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
    [INFO 2018-02-02 21:40:14,485 layers.py:3251] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
    [INFO 2018-02-02 21:40:14,487 layers.py:7409] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
    [INFO 2018-02-02 21:40:14,488 layers.py:2689] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
    [INFO 2018-02-02 21:40:14,490 layers.py:3251] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
    [INFO 2018-02-02 21:40:14,493 layers.py:7409] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
    I0202 21:40:14.751354 2907198400 GradientMachine.cpp:94] Initing parameters..
    I0202 21:40:15.287204 2907198400 GradientMachine.cpp:101] Init parameters done.
    

    opened by kvinwang 19
  • bash run.sh

    (base) root@a8e4df74e22d:/DeepSpeech/DeepSpeech-develop/examples/tiny# bash run.sh
    /root/anaconda3/lib/python3.8/site-packages/paddle/fluid/framework.py:297: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
      warnings.warn(
    Skip downloading and unpacking. Data already exists in /DeepSpeech/DeepSpeech-develop/examples/tiny/../..//examples/dataset/aishell.
    Creating manifest data/manifest ...

    /root/anaconda3/lib/python3.8/site-packages/paddle/fluid/framework.py:297: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
      warnings.warn(
    ----------- Configuration Arguments -----------
    count_threshold: 0
    manifest_paths: ['data/manifest.train', 'data/manifest.dev']
    vocab_path: data/vocab.txt

    /root/anaconda3/lib/python3.8/site-packages/paddle/fluid/framework.py:297: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
      warnings.warn(
    ----------- Configuration Arguments -----------
    manifest_path: data/manifest.train
    num_samples: 2000
    output_path: data/mean_std.npz
    specgram_type: linear

    Aishell data preparation done.
    using 2 gpus...
    /root/anaconda3/lib/python3.8/site-packages/paddle/fluid/framework.py:297: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
      warnings.warn(
    Traceback (most recent call last):
      File "/DeepSpeech/DeepSpeech-develop/examples/tiny/../..//deepspeech/exps/deepspeech2/bin/train.py", line 26, in <module>
        from deepspeech.exps.deepspeech2.config import get_cfg_defaults
      File "/DeepSpeech/DeepSpeech-develop/deepspeech/exps/deepspeech2/config.py", line 16, in <module>
        from deepspeech.models.deepspeech2 import DeepSpeech2Model
      File "/DeepSpeech/DeepSpeech-develop/deepspeech/models/deepspeech2.py", line 31, in <module>
        from deepspeech.modules.ctc import CTCDecoder
      File "/DeepSpeech/DeepSpeech-develop/deepspeech/modules/ctc.py", line 23, in <module>
        from deepspeech.decoders.swig_wrapper import Scorer
      File "/DeepSpeech/DeepSpeech-develop/deepspeech/decoders/swig_wrapper.py", line 16, in <module>
        import swig_decoders
    ModuleNotFoundError: No module named 'swig_decoders'
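
The run stops because the swig_decoders module cannot be imported in this environment. A check like the following can be run before starting training so that a missing module is reported up front. This is a minimal sketch: the helper name missing_modules is hypothetical, and how swig_decoders should be built or installed is not stated in this report.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, missing_modules(["swig_decoders"]) returns an empty list once the module is installed. The helper is meant for top-level names only, because importlib.util.find_spec imports parent packages when given a dotted name.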

    opened by zhangyifei1 18
  • Parallel WaveGAN with CSMSC error

    I followed this example for training: https://github.com/PaddlePaddle/PaddleSpeech/tree/19f67e1f564f1dcd49b89159b39bb4a34b7b6cdd/examples/csmsc/voc1

    The dataset has already been downloaded:

    (base) root@ff21c21bf0ea:/opt/PaddleSpeech/examples/csmsc/voc1# ll ~/datasets/
    total 1156
    drwxr-sr-x 3 root   users 593920 Feb 25 09:31 ./
    drwsrwsr-x 1 jovyan users   4096 Feb 28 06:06 ../
    drwxr-sr-x 3 root   users 577536 Feb 25 09:31 BZNSYP/
    

    The baker_alignment_tone.tar.gz file has also been extracted into the current directory, so the conditions in the README are met:

    Assume the path to the dataset is ~/datasets/BZNSYP. Assume the path to the MFA result of CSMSC is ./baker_alignment_tone. Run the command below to
    
    source path.
    preprocess the dataset.
    train the model.
    synthesize wavs.
    synthesize waveform from metadata.jsonl.
    ./run.sh
    

    When running run.sh, I got the error messages below, and I do not know how to fix them:

    (base) root@ff21c21bf0ea:/opt/PaddleSpeech/examples/csmsc/voc1# ./run.sh
    Generate durations.txt from MFA results ...
    Extract features ...
    /home/jovyan/datasets/BZNSYP
    Get features' stats ...
    Traceback (most recent call last):
      File "/opt/PaddleSpeech/utils/compute_statistics.py", line 109, in <module>
        main()
      File "/opt/PaddleSpeech/utils/compute_statistics.py", line 84, in main
        with jsonlines.open(args.metadata, 'r') as reader:
      File "/opt/conda/lib/python3.8/site-packages/jsonlines/jsonlines.py", line 623, in open
        fp = builtins.open(file, mode=mode + "t", encoding=encoding)
    FileNotFoundError: [Errno 2] No such file or directory: 'dump/train/raw/metadata.jsonl'
    Normalize ...
    Traceback (most recent call last):
      File "/opt/PaddleSpeech/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/../normalize.py", line 133, in <module>
        main()
      File "/opt/PaddleSpeech/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/../normalize.py", line 81, in main
        with jsonlines.open(args.metadata, 'r') as reader:
      File "/opt/conda/lib/python3.8/site-packages/jsonlines/jsonlines.py", line 623, in open
        fp = builtins.open(file, mode=mode + "t", encoding=encoding)
    FileNotFoundError: [Errno 2] No such file or directory: 'dump/train/raw/metadata.jsonl'
    Traceback (most recent call last):
      File "/opt/PaddleSpeech/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/../normalize.py", line 133, in <module>
        main()
      File "/opt/PaddleSpeech/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/../normalize.py", line 81, in main
        with jsonlines.open(args.metadata, 'r') as reader:
      File "/opt/conda/lib/python3.8/site-packages/jsonlines/jsonlines.py", line 623, in open
        fp = builtins.open(file, mode=mode + "t", encoding=encoding)
    FileNotFoundError: [Errno 2] No such file or directory: 'dump/dev/raw/metadata.jsonl'
    Traceback (most recent call last):
      File "/opt/PaddleSpeech/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/../normalize.py", line 133, in <module>
        main()
      File "/opt/PaddleSpeech/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/../normalize.py", line 81, in main
        with jsonlines.open(args.metadata, 'r') as reader:
      File "/opt/conda/lib/python3.8/site-packages/jsonlines/jsonlines.py", line 623, in open
        fp = builtins.open(file, mode=mode + "t", encoding=encoding)
    FileNotFoundError: [Errno 2] No such file or directory: 'dump/test/raw/metadata.jsonl'
    

    dump/train/raw/metadata.jsonl should be generated by run.sh, right? But in fact the file was never generated.
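
In this log, the statistics and normalization steps fail only because the earlier feature extraction step left no metadata.jsonl files behind. A check like the one below, run right after feature extraction, would surface the problem at the step that caused it. This is a minimal sketch: the helper name missing_metadata is hypothetical, and the dump/<subset>/raw/metadata.jsonl layout is taken from the paths in the tracebacks.

```python
from pathlib import Path


def missing_metadata(dump_dir, subsets=("train", "dev", "test")):
    """Return the expected metadata.jsonl paths that do not exist under dump_dir."""
    expected = [Path(dump_dir) / name / "raw" / "metadata.jsonl" for name in subsets]
    return [path for path in expected if not path.is_file()]
```

If missing_metadata("dump") returns a non-empty list, the feature extraction output and its input dataset path are the first things to inspect.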

    opened by yJun-Chen 17
  • Installation error

    Hello, I found today that paddlepaddle can now be installed with pip, so I installed it with pip on Ubuntu 16.04. However, at this step:

    git clone https://github.com/PaddlePaddle/models.git
    cd models/deep_speech_2
    sh setup.sh

    both setup.sh and the swig setup.sh fail with this error:

    openfst-1.6.3/src/include/fst/union.h:33:40: warning: typedef ‘using StateId = typename Arc::StateId’ locally defined but not used [-Wunused-local-typedefs]
    using StateId = typename Arc::StateId;
    ^
    error: command 'gcc' failed with exit status 1

    Could this be fixed? After all, installing paddlepaddle with pip is much more convenient than installing it with docker.

    opened by yyhlvdl 17
  • After upgrading to a 3070, the python3 process hangs at 100% CPU

    I first tested on a 1080 Ti and the program ran normally. Environment: running in docker, paddlepaddle_gpu-1.8.5.post107, CUDA 10.1, cuDNN 7.5

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 418.39 Driver Version: 418.39 CUDA Version: 10.1 |
    |-------------------------------+----------------------+----------------------+
    | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
    | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
    |===============================+======================+======================|
    | 0 GeForce GTX 108... Off | 00000000:08:00.0 Off | N/A |
    | 40% 27C P0 58W / 250W | 0MiB / 11178MiB | 0% Default |
    +-------------------------------+----------------------+----------------------+
    | 1 GeForce GTX 108... Off | 00000000:09:00.0 Off | N/A |
    | 47% 36C P0 57W / 250W | 0MiB / 11178MiB | 0% Default |
    +-------------------------------+----------------------+----------------------+
    | 2 GeForce GTX 108... Off | 00000000:89:00.0 Off | N/A |
    | 49% 30C P0 54W / 250W | 0MiB / 11178MiB | 0% Default |
    +-------------------------------+----------------------+----------------------+
    | 3 GeForce GTX 108... Off | 00000000:8A:00.0 Off | N/A |
    | 55% 29C P0 53W / 250W | 0MiB / 11178MiB | 5% Default |
    +-------------------------------+----------------------+----------------------+

    Another machine was upgraded to a 3070. The environment is the same, except that the driver had to be upgraded:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 460.84 Driver Version: 460.84 CUDA Version: 11.2 |
    |-------------------------------+----------------------+----------------------+
    | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
    | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
    | | | MIG M. |
    |===============================+======================+======================|
    | 0 GeForce RTX 3070 Off | 00000000:82:00.0 Off | N/A |
    | 0% 33C P8 6W / 220W | 303MiB / 7982MiB | 0% Default |
    | | | N/A |
    +-------------------------------+----------------------+----------------------+

    CUDA 10.1 is installed, as shown by nvcc -V:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2019 NVIDIA Corporation
    Built on Fri_Feb__8_19:08:17_PST_2019
    Cuda compilation tools, release 10.1, V10.1.105

    After the program starts, it hangs, with the python3 process at 100% CPU.

    W0617 04:06:56.153712 194 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 86, Driver API Version: 11.2, Runtime API Version: 10.0
    W0617 04:06:56.153888 194 device_context.cc:260] device: 0, cuDNN Version: 7.5.
    W0617 04:06:56.512885 194 device_context.h:155] WARNING: device: 0. The installed Paddle is compiled with CUDNN 7.6, but CUDNN version in your machine is 7.5, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
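
A likely contributing factor, not confirmed in this thread: the log reports CUDA Capability 86 together with Runtime API Version 10.0, and native support for compute capability 8.6 (RTX 30 series) was only added in CUDA 11.1. With an older runtime the GPU code may have to be compiled on the fly at startup, which can look like a long CPU-bound hang. The sketch below encodes only the two capabilities relevant to this report; the table and the helper name are illustrative assumptions, not a PaddlePaddle API.

```python
# Minimum CUDA toolkit release with native support for a compute capability.
# Only the two GPUs mentioned in this report are listed.
MIN_CUDA_FOR_CAPABILITY = {
    61: (8, 0),   # GTX 1080 Ti (Pascal)
    86: (11, 1),  # RTX 3070 (Ampere)
}


def runtime_supports(capability, runtime_version):
    """Return True if the CUDA runtime version natively supports the capability."""
    return tuple(runtime_version) >= MIN_CUDA_FOR_CAPABILITY[capability]
```

With the values from the log, runtime_supports(86, (10, 0)) is False, while the 1080 Ti setup, runtime_supports(61, (10, 0)), is True.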

    image

    Stale Deployment 
    opened by hebo1982 16
  • train loss : nan during training

    While training on aishell with bash ./local/run_train.sh:

    epoch: 0 , batch : 6700 , train loss : 64.74555
    epoch: 0 , batch : 6800 , train loss : nan
    epoch: 0 , batch : 6900 , train loss : nan
    .......

    What causes train loss : nan, and does it have any impact?

    nvidia-smi has looked normal the whole time.
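
Once a loss becomes NaN, later updates are usually NaN as well, so the values after that point carry no training signal. Commonly tried mitigations are a lower learning rate, gradient clipping, or skipping the offending batch. The sketch below shows only the detection step, so that a training loop can stop or skip as soon as the loss stops being finite. It is a generic illustration with a hypothetical helper name, not PaddleSpeech code.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None
```

Applied to the values logged above, first_non_finite([64.74555, float("nan"), float("nan")]) points at the second logged entry, the one for batch 6800.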

    Question 
    opened by monkeycc 15
  • File "/root/anaconda3/envs/rpa/lib/python3.8/site-packages/yacs/config.py", line 141, in __getattr__ raise AttributeError(name) AttributeError: preprocess_config

    The command I ran was:

    CUDA_VISIBLE_DEVICES=1 ./local/test_wav.sh conf/deepspeech2.yaml conf/tuning/decode.yaml exp/deepspeech2/checkpoints/avg_1 data/demo_01_03.wav,
    

    The error reported was:

    Traceback (most recent call last):
      File "/data/rshm/PaddleSpeech-develop/paddlespeech/s2t/exps/deepspeech2/bin/test_wav.py", line 201, in <module>
        main(config, args)
      File "/data/rshm/PaddleSpeech-develop/paddlespeech/s2t/exps/deepspeech2/bin/test_wav.py", line 169, in main
        main_sp(config, args)
      File "/data/rshm/PaddleSpeech-develop/paddlespeech/s2t/exps/deepspeech2/bin/test_wav.py", line 163, in main_sp
        exp = DeepSpeech2Tester_hub(config, args)
      File "/data/rshm/PaddleSpeech-develop/paddlespeech/s2t/exps/deepspeech2/bin/test_wav.py", line 42, in __init__
        self.preprocess_conf = config.preprocess_config
      File "/root/anaconda3/envs/rpa/lib/python3.8/site-packages/yacs/config.py", line 141, in __getattr__
        raise AttributeError(name)
    AttributeError: preprocess_config

    Bug S2T 
    opened by shumeirao 1
  • [S2T] PaddleSpeech illegal instruction 4 on Apple Silicon M1

    This error is very persistent on Apple Silicon M1 - I have tried alternative installations with pip and docker with the same error.

    The safest installation due to better dependency checks seems to be with conda.

    Describe the bug After installation, any paddlespeech command throws the error message:

    paddlespeech help
    Illegal instruction: 4
    

    To Reproduce Steps to reproduce the behavior:

    CONDA_SUBDIR=osx-64 conda create -n paddle pip python=3.10 sox libsndfile swig bzip2
    conda activate paddle
    
    # install paddlepaddle with conda for osx64
    CONDA_SUBDIR=osx-64 conda install paddlepaddle --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
    
    # due to thrown error for invalid wheel for opencv-python
    pip install --upgrade pip --force
    pip install paddleocr --upgrade
    
    # due to installation issue https://github.com/PaddlePaddle/PaddleSpeech/issues/2687
    pip install paddleaudio==1.0.1
    
    # final paddlespeech installation
    pip install paddlespeech
    

    Expected behavior A clear and concise description of what you expected to happen.

    Screenshots If applicable, add screenshots to help explain your problem.

    Environment (please complete the following information):

    • OS: [e.g. Ubuntu]
    • GCC/G++ Version [e.g. 8.3]
    • Python Version [e.g. 3.7]
    • PaddlePaddle Version [e.g. 2.0.0]
    • Model Version [e.g. 2.0.0]
    • GPU/DRIVER Information [e.g. Tesla V100-SXM2-32GB/440.64.00]
    • CUDA/CUDNN Version [e.g. cuda-10.2]
    • MKL Version
    • TensorRT Version

    Additional context Add any other context about the problem here.

    Bug S2T 
    opened by agilebean 1
  • [TTS] JETS -> E2E FastSpeech2 + HiFiGAN

    opened by yt605155624 0
  • [TTS] iSTFTNet -> speed up HiFiGAN !!

    opened by yt605155624 0
Releases(r1.3.0)
  • r1.3.0(Dec 14, 2022)

    Highlights

    S2T

    • Support U2/U2++ Conformer dy2static, and U2/U2++ C++ High Performance Streaming ASR Deployment. @zh794390558
    • Add Wav2vec2ASR-en, wav2vec2.0 fine-tuning for ASR on LibriSpeech. @Zth9730
    • Add Whisper CLI and Demos, support multi language recognition and translation. @zxcd
    • Add Wav2vec2 CLI and Demos, support ASR and Feature Extraction. @Zth9730
    • Add whisper. #2640 #2704 by @zxcd
    • Fix gpu training hang. #2478 by @Zth9730
    • Support u2++ based cli and server. #2489 #2510 by @Zth9730
    • Add wav2vec2-en. #2518 #2527 #2637 by @Zth9730
    • Add wav2vec2-zh cli. #2697 by @Zth9730

    T2S

    • Add seek for BytesIO. https://github.com/PaddlePaddle/PaddleSpeech/pull/2484 by @ZapBird
    • Add mix finetune. https://github.com/PaddlePaddle/PaddleSpeech/pull/2525 https://github.com/PaddlePaddle/PaddleSpeech/pull/2647 by @lym0302
    • Add streaming TTS fastdeploy serving. https://github.com/PaddlePaddle/PaddleSpeech/pull/2528 by @HexToString
    • Add SSML for Chinese Text Frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/2531 by @david-95
    • Add end-to-end Prosody Prediction pipeline (including using prosody labels in Acoustic Model). https://github.com/PaddlePaddle/PaddleSpeech/pull/2548 https://github.com/PaddlePaddle/PaddleSpeech/pull/2615 https://github.com/PaddlePaddle/PaddleSpeech/pull/2693 by @WongLaw
    • Add Adversarial Loss for Chinese English mixed TTS. https://github.com/PaddlePaddle/PaddleSpeech/pull/2588 by @lym0302
    • Fix frontend bugs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2539 https://github.com/PaddlePaddle/PaddleSpeech/pull/2606 by @yt605155624
    • Add TN for English unit. https://github.com/PaddlePaddle/PaddleSpeech/pull/2629 by @WongLaw
    • Add male voice for TTS. https://github.com/PaddlePaddle/PaddleSpeech/pull/2660 by @lym0302
    • Add double byte char for zh normalization. https://github.com/PaddlePaddle/PaddleSpeech/pull/2661 by @david-95
    • Add TTS Paddle-Lite x86 inference. https://github.com/PaddlePaddle/PaddleSpeech/pull/2636 https://github.com/PaddlePaddle/PaddleSpeech/pull/2667 by @yt605155624
    • Add greek char and fix #2571. https://github.com/PaddlePaddle/PaddleSpeech/pull/2683 by @david-95
    • Add Slim for TTS. https://github.com/PaddlePaddle/PaddleSpeech/pull/2729 by @yt605155624

    Audio

    • Move paddlespeech/audio to paddleaudio. https://github.com/PaddlePaddle/PaddleSpeech/pull/2706 by @SmileGoat

    Demo

    • Add TTSAndroid demo. https://github.com/PaddlePaddle/PaddleSpeech/pull/2703 by @yt605155624

    New Contributors

    • @ZapBird made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2484
    • @HexToString made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2528
    • @dahu1 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2554
    • @kFoodie made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2664
    • @zxcd made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2640
    • @michael-skynorth made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2666
    • @heyudage made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2688

    Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.2.0...r1.3.0

  • r1.2.0(Oct 10, 2022)

    S2T

    • Fix conformer/transformer multi GPU training. #2327 #2334 #2336 #2372 by @Zth9730
    • Fix deepspeech2 decode_wav. #2351 by @Zth9730
    • Support BiTransformer decoder. #2415 by @Zth9730

    T2S

    • Update VITS to support VITS and its voice cloning training on AISHELL-3. #2268 by @HighCWu
    • Add ERNIE-SAT synthesize_e2e. #2287 #2316 #2355 #2378 #2432 by @yt605155624
    • Specify the input data type of G2PW. #2288 by @kslz
    • Add TTS finetune example. #2297 #2385 #2418 #2430 by @lym0302
    • Fix Chinese English mixed TTS frontend. #2299 #2493 by @lym0302
    • Add words into polyphonic.yaml for g2pW. #2300 by @david-95
    • Update the quantifier unit in Text Normalization. #2308 by @pengzhendong
    • Fix Chinese frontend bugs. #2312 #2323 by @david-95
    • Add AISHELL-3 Voice Cloning with ECAPA-TDNN speaker encoder. #2359 #2429 by @yt605155624
    • Add pre-install doc for G2P and TN, update version of pypinyin. #2364 by @WongLaw
    • Add tools to compare two test results of G2P to show differences. #2367 by @david-95
    • Revise must_neural_tone_words. #2370 by @WongLaw
    • Add type-hint for g2pW. #2390 by @yt605155624
    • Replaced fixed path with path variable in MFA. #2416 by @WongLaw
    • Solve "unknown format: 3" for wavfile.write(). #2422 by @zhoupc2015

    Text

    • Create preprocess.py for Punctuation Restoration. #2295 by @THUzyt21

    Demo

    • Add Voice Cloning, TTS finetune, and ERNIE-SAT in speech_web. #2412 #2451 by @iftaken

    Server

    • Add num_decoding_left_chunks in streaming_asr_server's config. #2337 by @THUzyt21
    • Removed useless spk_id in speech_server and streaming_tts_server, support Chinese English mixed TTS server engine. #2380 by @WongLaw

    Doc

    • Add Chinese doc and language switcher for metaverse, style_fs2 and story_talker. #2357 by @WongLaw
    • Update API docs. #2406 by @yt605155624
    • Add finetune demos in readthedocs. #2411 by @yt605155624

    Test

    • Add barrier for distributed training using multiple machines. #2309 #2311 by @sneaxiy
    • Fix prepare.sh for PWGAN TIPC. #2376 by @yuehuayingxueluo

    Other

    • Format paddlespeech with pre-commit. #2331 by @yt605155624

    Acknowledgements

    Special thanks to @yt605155624 @lym0302 @THUzyt21 @iftaken @Zth9730 @zhoupc2015 @WongLaw @david-95 @pengzhendong @kslz @HighCWu @yuehuayingxueluo @sneaxiy @SmileGoat

    New Contributors

    • @HighCWu made their first contribution in #2268
    • @pengzhendong made their first contribution in #2308
    • @Zth9730 made their first contribution in #2327
    • @WongLaw made their first contribution in #2357
    • @yuehuayingxueluo made their first contribution in #2376
    • @zhoupc2015 made their first contribution in #2422

    Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.1.0...r1.2.0

  • r1.1.0(Aug 19, 2022)

    S2T

    • Add wer tools. https://github.com/PaddlePaddle/PaddleSpeech/pull/1709
    • Optimize the attention cache; use 0-dim tensors for model export. https://github.com/PaddlePaddle/PaddleSpeech/pull/2124
    • Fix cnn cache dy2st shape. https://github.com/PaddlePaddle/PaddleSpeech/pull/2168

    TTS

    • Fix random speaker embedding bug in voice clone. https://github.com/PaddlePaddle/PaddleSpeech/pull/1828 by @jerryuhoo
    • Add VITS model. https://github.com/PaddlePaddle/PaddleSpeech/pull/1855 https://github.com/PaddlePaddle/PaddleSpeech/pull/1957 https://github.com/PaddlePaddle/PaddleSpeech/pull/2040
    • Add kunlun support for speedyspeech. https://github.com/PaddlePaddle/PaddleSpeech/pull/1879 by @QingshuChen
    • Normalize wav max value to 1 in preprocess. https://github.com/PaddlePaddle/PaddleSpeech/pull/1887 by @jerryuhoo
    • Remove fluid dependence in TTS. https://github.com/PaddlePaddle/PaddleSpeech/pull/1940
    • Add onnx models for aishell3/ljspeech/vctk's tts3/voc1/voc5. https://github.com/PaddlePaddle/PaddleSpeech/pull/2068
    • Add TTS static/onnx models in pretrained_models.py. https://github.com/PaddlePaddle/PaddleSpeech/pull/2074
    • Add Ernie SAT model. https://github.com/PaddlePaddle/PaddleSpeech/pull/2052 https://github.com/PaddlePaddle/PaddleSpeech/pull/2117
    • Add Chinese English mixed TTS frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/2143
    • Add Chinese English mixed TTS example. https://github.com/PaddlePaddle/PaddleSpeech/pull/2234
    • Fix English text frontend bug. https://github.com/PaddlePaddle/PaddleSpeech/pull/2235 by @david-95
    • Add g2pW to Chinese frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/2230 by @BarryKCL
    • Fix text frontend bugs. https://github.com/PaddlePaddle/PaddleSpeech/pull/1912 https://github.com/PaddlePaddle/PaddleSpeech/pull/2250 https://github.com/PaddlePaddle/PaddleSpeech/pull/2254 https://github.com/PaddlePaddle/PaddleSpeech/pull/2255 https://github.com/PaddlePaddle/PaddleSpeech/pull/2272

    Speechx

    • Add custom asr script. https://github.com/PaddlePaddle/PaddleSpeech/pull/1946
    • Refactor frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/2003
    • Deepspeech2 to onnx. https://github.com/PaddlePaddle/PaddleSpeech/pull/2034
    • Refactor audio/data/feature cache. https://github.com/PaddlePaddle/PaddleSpeech/pull/1638
    • Frontend refactor. https://github.com/PaddlePaddle/PaddleSpeech/pull/1640
    • Fix nnet itf header. https://github.com/PaddlePaddle/PaddleSpeech/pull/1641
    • Refactor speech egs. https://github.com/PaddlePaddle/PaddleSpeech/pull/1707
    • Refactor egs and more egs for TLG wfst graph build. https://github.com/PaddlePaddle/PaddleSpeech/pull/1715
    • Speed up ngram building. https://github.com/PaddlePaddle/PaddleSpeech/pull/1729
    • Update speechx install doc. https://github.com/PaddlePaddle/PaddleSpeech/pull/1736
    • Fix nnet input and output name. https://github.com/PaddlePaddle/PaddleSpeech/pull/1740
    • Update wfst graph. https://github.com/PaddlePaddle/PaddleSpeech/pull/1742
    • Fix model params path name. https://github.com/PaddlePaddle/PaddleSpeech/pull/1750
    • Remove fluid tools for onnx export. https://github.com/PaddlePaddle/PaddleSpeech/pull/2116

    Audio

    • Refactor paddleaudio to paddlespeech.audio. https://github.com/PaddlePaddle/PaddleSpeech/pull/2007
    • Add webdataset in paddlespeech.audio. https://github.com/PaddlePaddle/PaddleSpeech/pull/2062

    Server

    • Remove extra logs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2111 https://github.com/PaddlePaddle/PaddleSpeech/pull/2113
    • Change streaming tts servers' fs from 24k to models' fs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2121
    • Fix bug in engine_warmup. https://github.com/PaddlePaddle/PaddleSpeech/pull/2171 by @Betterman-qs
    • Replace default vocoder in server with mb_melgan. https://github.com/PaddlePaddle/PaddleSpeech/pull/2214
    • Fix bug in streaming_asr_server with punctuation restoration. https://github.com/PaddlePaddle/PaddleSpeech/pull/2244
    • Rename time_s and time_ns to time_b and time_nb. https://github.com/PaddlePaddle/PaddleSpeech/pull/2133
    • More accurate decoding. https://github.com/PaddlePaddle/PaddleSpeech/pull/2128

    CLI

    • Add paddlespeech.resource module. https://github.com/PaddlePaddle/PaddleSpeech/pull/1917
    • Dynamic cli commands registration. https://github.com/PaddlePaddle/PaddleSpeech/pull/1959
    • Fix unnecessary download. https://github.com/PaddlePaddle/PaddleSpeech/pull/2103
    • Remove extra logs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2084 https://github.com/PaddlePaddle/PaddleSpeech/pull/2085 https://github.com/PaddlePaddle/PaddleSpeech/pull/2107
    • Add Chinese English mixed TTS CLI. https://github.com/PaddlePaddle/PaddleSpeech/pull/2249
    • Add onnxruntime infer for CLI. https://github.com/PaddlePaddle/PaddleSpeech/pull/2222

    Demo

    • Add speech web demo. https://github.com/PaddlePaddle/PaddleSpeech/pull/2039 https://github.com/PaddlePaddle/PaddleSpeech/pull/2080
    • Add kws cli and demo. https://github.com/PaddlePaddle/PaddleSpeech/pull/2063
    • Use paddle web for streaming asr. https://github.com/PaddlePaddle/PaddleSpeech/pull/2105
    • Add custom asr script. https://github.com/PaddlePaddle/PaddleSpeech/pull/1946
    • More cli for speech demos. https://github.com/PaddlePaddle/PaddleSpeech/pull/2138

    Doc

    • Add API doc. https://github.com/PaddlePaddle/PaddleSpeech/pull/2075
    • Format tts doc string for read the docs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2115

    Others

    • Fix CPU Dockerfile. https://github.com/PaddlePaddle/PaddleSpeech/pull/2172 by @BrightXiaoHan
    • Add PaddleSpeech Dockerfile for hard mode of installation. https://github.com/PaddlePaddle/PaddleSpeech/pull/2127 by @buchongyu2

    Acknowledgements

    Special thanks to @buchongyu2 @BrightXiaoHan @BarryKCL @Betterman-qs @david-95 @jerryuhoo @QingshuChen @iftaken @zh794390558 @Jackwaterveg @lym0302 @SmileGoat @yt605155624

    New Contributors

    • @QingshuChen made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1879
    • @Zhangjingyu06 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1951
    • @ryanrussell made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1976
    • @freeliuzc made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2044
    • @vpegasus made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2043
    • @dependabot made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2061
    • @raycool made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2109
    • @YDX-2147483647 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2125
    • @chenkui164 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2130
    • @0x45f made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2162
    • @Doubledongli made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2167
    • @Betterman-qs made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2171
    • @BrightXiaoHan made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2172
    • @THUzyt21 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2202
    • @david-95 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2235
    • @BarryKCL made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2230

    Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.0.0...r1.1.0

  • r1.0.0(May 13, 2022)

    Highlight

    More

    ASR

    • DeepSpeech2 streaming model aishell cer 6.66%
    • DeepSpeech2 streaming model wenetspeech cer: 15.2% (test_net, w/o LM), 24.17% (test_meeting, w/o LM), 5.3% (aishell, w/ LM)
    • Conformer aishell cer 4.64%
    • Conformer streaming model aishell cer 5.44%
    • Conformer streaming model wenetspeech cer: 11.0% (test_net), 18.79% (test_meeting)
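
    The cer figures above are character error rates: the character-level edit distance between the recognized text and the reference transcript, divided by the reference length. The sketch below illustrates that definition only. It is not the scoring tool used to produce these numbers; the function names are hypothetical, and stripping spaces before comparison is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using one DP row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length.

    Assumption: spaces are ignored, which suits Chinese transcripts.
    """
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One dropped character out of five reference characters.
    print(cer("我认为跑步", "我认为跑"))
```

    Because insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.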

    Speechx

    KWS

    • [KWS] Add kws example on HeySnips dataset. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1558
    • [KWS] Update KWS example. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1783

    Audio

    • [Audio] rename paddleaudio to audio, since it conflicts with pkg name by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1758
    • [Audio] Fix mcd issue. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1658
    • [Audio] Remove mcd. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1659
    • [Audio] Add VoxCeleb dataset for speaker recognition.
    • [Audio] Add HeySnips dataset for keyword spotting.

    What's Changed

    • [R1.0][asr][server]add vector server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1845
    • [R1.0][asr][server]join streaming asr and punc server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1846
    • [R1.0]asr streaming server add time stamp by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1850
    • [R1.0][tts][server] update readme by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1852
    • [R1.0] update cli by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1854
    • [r1.0] update version to r1.0.0 by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1857
    • [R1.0] Add doc for wenetspeech model (ds2 online, conformer online) by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1862
    • [R1.0][server] improve server code by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1866
    • [R1.0][asr][server]update the streaming asr readme by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1871
    • [R1.0] Update released model info (Wenetspeech ds2 online, conformer online) by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1869
    • [R1.0]fix server doc and decode_method by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1889
    • [speechx] add custom_streaming_asr @SmileGoat #1891
    • [speechx] speedup ngram building @zh794390558 #1729
    • [speechx] refactor egs and more egs for TLG wfst graph build @zh794390558 #1715
    • [speechx]add aishell test script & json parser & no db norm linear feature & json2kaldi type cmvn @SmileGoat #1676
    • [speechx] Add websocket & make it work @SmileGoat #1720
    • [speechx] Frontend refactor @SmileGoat #1640
    • [Speechx] add tlg decoder @SmileGoat #1599

    Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.0.0a...r1.0.0

  • r1.0.0a(Apr 28, 2022)

    Highlight

    • Release Streaming ASR and Streaming TTS system for industrial application.
    • Support KWS model
    • Deepspeech2 streaming model aishell cer 6.66%
    • Conformer aishell cer 4.64%
    • Conformer streaming model aishell cer 5.44%
    • SpeechX Deepspeech2 streaming with WFST

    What's Changed

    • [speechx] refactor audio/data/feature cache by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1638
    • [speechx] Frontend refactor by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1640
    • [speechx] fix nnet itf header by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1641
    • [TTS]add license and reference for some models by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1642
    • [Doc] supplement note by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1643
    • [vec][search] update search demo README by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1644
    • [speechx]refactor linear feature:unify vector & remove redundant function & add remained_wav cache shift wav by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1649
    • [Audio] Fix mcd issue. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1658
    • [Audio] Remove mcd. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1659
    • [vec]update the speaker verification model by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1663
    • [ASR] update ds2 online model by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1668
    • [TTS]fix preprocess bug, test=tts by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1660
    • update README, test=doc by @iftaken in https://github.com/PaddlePaddle/PaddleSpeech/pull/1672
    • [Punc] Update RESULTS.md. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1675
    • [CLI] update ds2 online model in cli by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1674
    • [CLI] ASR: Add duration limitation for asr by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1666
    • [vec]add speaker verification score method by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1646
    • [TTS]add onnx inference for fastspeech2 + hifigan/mb_melgan by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1665
    • [doc]update readme by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1680
    • [WebSocket] fixed online model md5 error, test=doc by @WilliamZhang06 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1682
    • [speechx]add aishell test script & json parser & no db norm linear feature & json2kaldi type cmvn by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1676
    • [server] add stream tts server by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1652
    • [speechx]remove mutable in audio_cache by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1687
    • [Doc] update readme for aishell/asr0 by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1677
    • [vec] add speaker diarization pipeline by @ccrrong in https://github.com/PaddlePaddle/PaddleSpeech/pull/1651
    • [vec]voxceleb convert dataset format to paddlespeech by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1630
    • [Speechx] add tlg decoder by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1599
    • [vec]add vector necessary note, test=doc by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1690
    • Revert "[WebSocket] fixed online model md5 error, test=doc" by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1691
    • [WebSocket] added online web client, test=doc by @WilliamZhang06 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1692
    • Fix the misspelling of the word "speech" in the example/aishell directory by @buchongyu2 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1694
    • Fix the misspelling of the word "hack" by @buchongyu2 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1697
    • [TTS]change NLC to NCL in speedyspeech, test=tts by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1693
    • [doc]fix typo, test=doc by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1698
    • [doc]add pwgan onnx model, test=doc by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1700
    • [WebSocket] added online asr doc and online asr command line, test=doc by @WilliamZhang06 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1701
    • [vec][server] vpr demo support by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1696
    • [speechx] refactor speech egs by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1707
    • [asr]add wer tools by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1709
    • [asr][websocket]fix the ws send bug, cache buffer, text=doc by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1710
    • [TTS]add fastspeech2 cnndecoder onnx model by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1712
    • [speechx] refactor egs and more egs for TLG wfst graph build by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1715
    • [vec][score] add plda model by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1681
    • [CLI]update cli, test=doc by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1716
    • [server] add streaming am infer by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1713
    • [speechx] Add websocket & make it work by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1720
    • [asr][websocket] add asr conformer websocket server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1704
    • [vec][loss] add NCE Loss from RNNLM by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1719
    • [vec][loss] add FocalLoss to deal with class imbalances by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1722
    • [TTS]restructure syn_utils.py, test=tts by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1723
    • [TTS]add paddle device set for ort and inference by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1727
    • [vec] add GRL to domain adaptation by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1725
    • [speechx] speedup ngram building by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1729
    • [asr] Add new cer tools by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1673
    • [speechx]add websocket lib by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1732
    • [speechx]update speechx install doc by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1736
    • [Doc] perfect the packing scripts by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1735
    • [Doc]renew the released model by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1739
    • [asr][websocket]add streaming asr demo by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1737
    • [speechx] fix nnet input and output name by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1740
    • [ASR] remove redundant log by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1741
    • [speechx] update wfst graph by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1742
    • [speechx] Add recognizer_test_main script by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1743
    • [vec][doc]update the voxceleb readme.md, test=doc by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1744
    • [ASR] fix CER tools by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1747
    • [Doc] Fix release_model info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1746
    • [Doc] Update released model info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1748
    • Update released model info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1749
    • [speechx] fix model params path name by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1750
    • [speechx] fix linear-spectrogram-wo-db-norm-ol read feature issue by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1751
    • [TTS]fix wavernn white noise bug for paddle develop(2.3) by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1752
    • [server] add onnx tts engine by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1733
    • [TTS]Update paddle2onnx by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1754
    • [Setup] to r1.0.0a by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1759
    • [audio] rename paddleaudio to audio, since it conflicts with pkg name by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1758
    • [speechx] to_float32, fix shell script by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1757
    • [vec] bug fix to adapt VUE by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1760
    • [asr][websocket]fix the streaming asr server bug, server client by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1761
    • [speechx] fbank and mfcc by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1765
    • format code by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1764
    • [CLI] Add conformer_aishell, conformer_online_aishell by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1767
    • [speechx]make cmvn global in run.sh by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1768
    • [ASR] ds2: add log_interval and fix lr problem when resume training by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1766
    • [speechx] set nnet param by flags by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1769
    • [server] add streaming tts demos by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1771
    • [server] fix tts streaming server by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1774
    • [KWS]Add kws example on HeySnips dataset. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1558
    • [text][server]add text punc server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1772
    • [ASR] fix asr cli infer by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1770
    • [vec] add GE2E to support unlabeled data training by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1731
    • [ASR] fix time restriction in test_cli.sh by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1777
    • [ASR] Replace fbank by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1776
    • [CLI] add color for test_cli by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1778
    • [speechx] add success log in run.sh by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1779
    • [KWS]Update KWS example. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1783
    • [server] update readme by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1782
    • [Doc] Update ds2online model info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1781
    • [CLI] renew ds2 online model by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1786
    • [speechx] fix speechx ws server to return dummy partial result by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1787
    • [asr][server]asr client add punctuation server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1784
    • [asr] patch func to var by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1788
    • [asr][server]fix client parse the asr result bug by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1789
    • [Bug fix] fix test_cli by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1794
    • [vec] update readme by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1796
    • [R1.0]update the streaming output and punc default ip, port by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1800
    • Renew ds2 online model [cer 6.66%] by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1802
    • [R1.0] update the streaming asr server readme by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1810
    • [R1.0] Renew ds2 online doc info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1809
    • [server] update streaming demos readme by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1806
    • [R1.0]update the paddlespeech_client asr_online cli by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1818
    • [r1.0][doc] fix readme by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1825

    New Contributors

    • @iftaken made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1672
    • @ccrrong made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1651
    • @buchongyu2 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1694

    Acknowledgements

    Special thanks to @zh794390558 @Honei @Jackwaterveg @lym0302 @qingen @GT-ZhangAcer @yt605155624 @WilliamZhang06 @SmileGoat @ccrrong

    Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r0.2.0...r1.0.0a

  • r0.2.0(Apr 1, 2022)

    S2T

    • Replace kaldi_fbank with paddleaudio #1612
    • Support CTC decoder online #821 #1626
    • Improve accuracy of Conformer. Support using kaiming Uniform as default initialization. #1577

    TTS

    • Add SpeedySpeech multi-speaker support for synthesize_e2e.py. https://github.com/PaddlePaddle/PaddleSpeech/pull/1370 by @jerryuhoo
    • Add WaveRNN for CSMSC dataset. https://github.com/PaddlePaddle/PaddleSpeech/pull/1379
    • Add Tacotron2 for CSMSC / LJSpeech datasets. https://github.com/PaddlePaddle/PaddleSpeech/pull/1314 / https://github.com/PaddlePaddle/PaddleSpeech/pull/1416
    • Add GE2E Tacotron2 Voice Cloning for AISHELL3 dataset. https://github.com/PaddlePaddle/PaddleSpeech/pull/1419
    • Update text frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/1506
    • Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets. https://github.com/PaddlePaddle/PaddleSpeech/pull/1549 / https://github.com/PaddlePaddle/PaddleSpeech/pull/1581 / https://github.com/PaddlePaddle/PaddleSpeech/pull/1587
    • Add NPU support for TransformerTTS. #1593 by @windstamp
    • Add CNN Decoder for Streaming Fastspeech2. https://github.com/PaddlePaddle/PaddleSpeech/pull/1634

    Audio

    • Add paddleaudio.compliance modules that offers audio feature APIs aligned with Kaldi and Librosa. #1518
    • Unittest and benchmark for audio feature APIs. #1548
    • [Audio] - [audio] refactor audio arch #1494 by @zh794390558
    • [Audio] - [audio] dtw metric #1493 by @zh794390558
    • [Audio] - [audio] fix compliance bug #1597 by @zh794390558

    Deployment

    • [Deployment] - [speechx] high performance inference of speech task #1496 by @SmileGoat @zh794390558
    • [Deployment] - [Speechx]fix normalizer bug #1600 #1621 #1619 #1633 #1635 by @SmileGoat
    • [Deployment] - [speechx] refactor speechx #1631 #1616 #1576 #1572 #1541 by @zh794390558
    • [Deployment] - [speechx] simplify cmake compiler #1538 #1536 #1535 by @zh794390558

    Server

    • [server] - [websocket] added online asr engine #1627 by @WilliamZhang06
    • [server] - [server] added engine type and asr inference #1475 by @WilliamZhang06
    • [server] - [Server] added asr engine #1413 by @WilliamZhang06
    • [server] - [Server] added engine factory and config #1399 by @WilliamZhang06
    • [server] - [server] added engine framework #1383 by @WilliamZhang06
    • [server] - [server] update readme #1604 by @lym0302
    • [server] - [server] add server cls #1554 by @lym0302
    • [server] - [server] add paddlespeech_server stats #1510 by @lym0302
    • [server] - [server] add cli #1466 by @lym0302
    • [server] - [server] add tts postprocess #1411 by @lym0302
    • [server] - [server] tts server #1386 by @lym0302

    Vector

    • [vector] - [vector] ecapa-tdnn on voxceleb #1523 by @Honei

    CLI

    • Batch input supported. #1460
    • TTS: Add WaveRNN for CSMSC dataset.
    • TTS: Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets.
    • Vector: add speaker verification demo and doc #1605 by @Honei

    Demo

    • [Demo] - [vec][search] update client image url #1628 by @qingen
    • [Demo] - [server] add server demo #1480 by @lym0302
    • [Demo] - [vec][search] add audio similarity search #1609 by @qingen

    Acknowledgements

    Special thanks to @WilliamZhang06 @yt605155624 @windstamp @Jackwaterveg @Honei @SmileGoat @KPatr1ck @zh794390558 @lym0302 @qingen

  • r0.1.2(Feb 25, 2022)

  • r0.1.1(Jan 14, 2022)

    New Features

    CLI

    • Add cli stats. #1274
    • Add unit test. #1321
    • ASR: Support English: Add transformer_librispeech model. #1297
    • ASR: Support 4 decoding methods: ctc_greedy_search, ctc_beam_search, attention, attention_rescoring. #1297
    • ASR & ST: Use the unified config. #1305 / #1312
    • ASR: Refactor the code. #1260 by @AdamBear
    • TTS: Support long input text by default. #1241
    • TTS: Add Style MelGAN and HiFiGAN. #1241
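
    Of the four ASR decoding methods listed above, ctc_greedy_search is the simplest to illustrate: take the best label for each frame, merge consecutive repeats, then drop the blank label. The sketch below shows that technique in plain Python. It is not PaddleSpeech's implementation; the input layout (one list of scores per frame) and blank_id=0 are assumptions made for the example.

```python
def ctc_greedy_search(frame_scores, blank_id=0):
    """Greedy CTC decoding over per-frame label scores.

    frame_scores: list of frames, each a list of scores indexed by label id
                  (assumed layout for this sketch).
    blank_id:     id of the CTC blank label (assumed to be 0 here).
    """
    # Step 1: pick the highest-scoring label in every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]

    # Step 2: merge consecutive repeats and remove blanks.
    decoded = []
    prev = None
    for label in best_path:
        if label != prev and label != blank_id:
            decoded.append(label)
        prev = label
    return decoded


if __name__ == "__main__":
    scores = [
        [0.1, 0.8, 0.1],    # label 1
        [0.2, 0.7, 0.1],    # label 1 (repeat, merged)
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.6, 0.3],    # label 1 (kept: a blank separates it)
        [0.1, 0.2, 0.7],    # label 2
        [0.0, 0.3, 0.7],    # label 2 (repeat, merged)
        [0.8, 0.1, 0.1],    # blank
    ]
    print(ctc_greedy_search(scores))
```

    The blank between the second and fourth frames is what allows the same label to be emitted twice. Without it, the repeats would merge into a single output.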

    ASR

    • Refactor configs in examples. #1225

    TTS

    • Fix some frontend bugs. #1262 by @JiehangXie / #1310
    • Add speaker embedding and speaker id for style fastspeech2 inference. #1197 by @jerryuhoo
    • Add support for finetuning speedyspeech. #1302 by @jerryuhoo / #1322 / #1337
    • Update VCTK Parallel WaveGAN. #1294
    • Update Multi Band MelGAN. #1272

    ST

    • Refactor configs in examples. #1225

    Text

    • Refactor Punctuation Restoration example. #1215

    Docs

    • Add topic note for releasing python packages
    • Add TTS papers. #1330
    • Add Frontend G2P topic. #1254

    Others

    • Update released models and results. #1306

    Acknowledgements

    @zh794390558 @yt605155624 @Jackwaterveg @KPatr1ck @Mingxue-Xu @JiehangXie @grasswolfs @jerryuhoo @AdamBear @LittleChenCc @JamesLim-sy

  • r0.1.0(Dec 23, 2021)

    Features

    CLI : New Feature

    • Easy installation with pip: pip install paddlespeech
    • CLI to quickly explore ASR, TTS, audio classification, speech translation and punctuation restoration.

    ASR

    • Join CTC LM decoder
    • Transformer LM model
    • Improve DeepSpeech2 online model
    • Refactor some configs

    TTS

    CLS

    • Add audio classification example on ESC-50 and custom dataset.
    • Add audio tagging demo based on PANNs and Audioset labels.

    ST

    • ST-MTL
    • FAT-ST-MTL

    Docs

    • Add quick start
    • Add Read the Docs
    • Improve installation documentation
    • Add README for each example

    Demos

    • Audio_tagging
    • Automatic_video_subtitiles
    • Metaverse
    • Punctuation_restoration
    • Speech_recognition
    • Speech_translation
    • Story_talker
    • Style_fs2
    • Text_to_speech

    Others

    • Update released models and results

    Acknowledgements

    @zh794390558 @KPatr1ck @Jackwaterveg @yt605155624 @Mingxue-Xu @grasswolfs @jerryuhoo

  • v2.1.1(Aug 16, 2021)

    1. ctc alignment
    2. refactor data pipeline
    3. autolog for deepspeech test
    4. refactor checkpoint save/load
    5. deepspeech online model
    6. mfa alignment example
    7. add text normalization example
    8. TLG for aishell
    9. more datasets: thchs30, aidatatang, timit etc.
    10. 8k speech example
    11. ted en-zh st example
    12. more utils
  • v2.1.0(Jun 29, 2021)

  • v1.1(Feb 25, 2021)

  • v1.0(Feb 25, 2021)

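Asr abc - Automatic speech recognition (ASR) for Chinese

A simple speech recognition example, mainly used for classroom demonstrations. Create a Python virtual environment (verified on Linux and macOS). # If you already have a Python 3.6 environment, skip this step and use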

LIyong.Guo 8 Nov 11, 2022
Ukrainian TTS (text-to-speech) using Coqui TTS

Ukrainian TTS (text-to-speech)

Yurii Paniv 85 Dec 26, 2022
Vad-sli-asr - Python scripts for a speech processing pipeline with Voice Activity Detection (VAD)

VAD-SLI-ASR Python scripts for a speech processing pipeline with Voice Activity

Dynamics of Language 14 Dec 9, 2022
An end to end ASR Transformer model training repo

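END TO END ASR TRANSFORMER This project is an end-to-end speech recognition system built on the basic Transformer structure of 6 encoder layers + 6 decoder layers. Model Instructions 1. Data preparation: download the data yourself and arrange it in the following file structure: ├── data │ ├── train │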

旷视天元 MegEngine 10 Jul 19, 2022
PyTorch implementation of Microsoft's text-to-speech system FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Chung-Ming Chien 1k Dec 30, 2022
The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

VAENAR-TTS This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis". Sa

THUHCSI 138 Oct 28, 2022
vits chinese, tts chinese, tts mandarin

vits chinese, tts chinese, tts mandarin. The easiest-to-train speech synthesis system with the best audio quality to date

AmorTX 12 Dec 14, 2022
Pytorch-NLU, a Chinese text classification and sequence annotation toolkit, supports multi-class and multi-label classification of Chinese long and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging and word segmentation.

Pytorch-NLU, a Chinese text classification and sequence annotation toolkit, supports multi-class and multi-label classification of Chinese long and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging and word segmentation.

null 186 Dec 24, 2022
STT for TorchScript is a port of Coqui STT based on DeepSpeech to PyTorch.

st3 STT for TorchScript is a port of Coqui STT based on DeepSpeech to PyTorch. Currently it supports converting pbmm models to pt scripts with integra

Vlad Ki 8 Oct 18, 2021
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge This is an implementation of the paper,

Mutian He 19 Oct 14, 2022
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation In this repo you can find the code of the Supervised Hybrid Audio Segmentatio

Machine Translation @ UPC 21 Dec 20, 2022
API for the GPT-J language model 🦜. Including a FastAPI backend and a streamlit frontend

gpt-j-api An API to interact with the GPT-J language model. You can use and test the model in two different ways: Streamlit web app at http://api.v

Víctor Gallego 276 Dec 31, 2022
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Soohwan Kim 26 Dec 14, 2022
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Soohwan Kim 86 Jun 11, 2021
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

Contributing to OpenSpeech OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform ta

Openspeech TEAM 513 Jan 3, 2023
End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.0.1 1.1.0 1.2.0 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 ubuntu18/python3.8/pip ubuntu18

ESPnet 5.9k Jan 3, 2023