DeepSpeech - Easy-to-use Speech Toolkit including SOTA ASR pipeline, influential TTS with text frontend and End-to-End Speech Simultaneous Translation.

Last update: Jan 3, 2023

Related tags

Text Data & NLP text-to-speech transformer speech-recognition speech-to-text ngram language-model callcenter conformer deepspeech speech-translation punctuation-restoration ctc-decode fastspeech2 parallel-wavegan mandarin-language text-frontend streaming-asr speech-alignment

Overview


PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.

Speech Recognition

Input Audio	Recognition Result
	I knocked at the door on the ancient side of the building.
	我认为跑步最重要的就是给我带来了身体健康。

Speech Translation (English to Chinese)

Input Audio	Translation Result
	我在这栋建筑的古老门上敲门。

Text-to-Speech

Input Text	Synthetic Audio
Life was like a box of chocolates, you never know what you're gonna get.
早上好，今天是2020/10/29，最低温度是-3°C。
季姬寂，集鸡，鸡即棘鸡。棘鸡饥叽，季姬及箕稷济鸡。鸡既济，跻姬笈，季姬忌，急咭鸡，鸡急，继圾几，季姬急，即籍箕击鸡，箕疾击几伎，伎即齑，鸡叽集几基，季姬急极屐击鸡，鸡既殛，季姬激，即记《季姬击鸡记》。

For more synthesized audios, please refer to PaddleSpeech Text-to-Speech samples.

Punctuation Restoration

Input Text	Output Text
今天的天气真不错啊你下午有空吗我想约你一起去吃饭	今天的天气真不错啊！你下午有空吗？我想约你一起去吃饭。

🔥 Hot Activities

2021.12.21~12.24

4-Day Live Course: an in-depth interpretation of PaddleSpeech!

Course videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130

Features

With an easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial applications and academic research, covering training, inference and testing modules as well as the deployment process. More specifically, this toolkit features:

📦 Ease of Use: a low barrier to installation, and a CLI to quick-start your journey.
🏆 Aligned with the State of the Art: we provide high-speed and ultra-lightweight models as well as cutting-edge technology.
💯 Rule-based Chinese frontend: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt to the Chinese context.
A Variety of Functions that Vitalize both Industry and Academia:
- 🛎️ Implementation of critical audio tasks: this toolkit contains audio functions like Audio Classification, Speech Translation, Automatic Speech Recognition, Text-to-Speech Synthesis, etc.
- 🔬 Integration of mainstream models and datasets: the toolkit implements modules that cover the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also the model list for more details.
- 🧩 Cascaded model applications: as an extension of the traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural Language Processing (NLP) and Computer Vision (CV).

Recent Update

🤗 2021.12.14: Our PaddleSpeech ASR and TTS Demos on Hugging Face Spaces are available!
👏🏻 2021.12.10: PaddleSpeech CLI is available for Audio Classification, Automatic Speech Recognition, Speech Translation (English to Chinese) and Text-to-Speech.

Community

Scan the QR code below with WeChat to join the official technical exchange group. We look forward to your participation.

Installation

We strongly recommend installing PaddleSpeech on Linux with python>=3.7. At present, Linux supports the CLI for all our tasks, while Mac OSX and Windows only support the PaddleSpeech CLI for Audio Classification, Speech-to-Text and Text-to-Speech. To install PaddleSpeech, please see installation.

Quick Start

Developers can try our models with the PaddleSpeech command line. Change --input to test your own audio/text.

Audio Classification

paddlespeech cls --input input.wav

Automatic Speech Recognition

paddlespeech asr --lang zh --input input_16k.wav

Speech Translation (English to Chinese)

(not supported on Mac and Windows for now)

paddlespeech st --input input_16k.wav

Text-to-Speech

paddlespeech tts --input "你好，欢迎使用飞桨深度学习框架！" --output output.wav

A web demo for Text-to-Speech is integrated into Hugging Face Spaces with Gradio. See Demo: TTS Demo

Text Postprocessing

Punctuation Restoration

paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
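The commands above can also be assembled programmatically, for example when driving the CLI from a script. The following is a minimal sketch that only uses the sub-commands and flags shown in this Quick Start; build_command is a hypothetical helper, not part of PaddleSpeech, and nothing is executed here.

```python
import shlex

# Hypothetical helper: builds the argument list for one of the CLI calls
# listed in the Quick Start. Only flags that appear in the text are used.
def build_command(subcommand, input_value, **options):
    """Return the argv list for a `paddlespeech <subcommand>` invocation."""
    argv = ["paddlespeech", subcommand]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    argv += ["--input", input_value]
    return argv

commands = [
    build_command("cls", "input.wav"),
    build_command("asr", "input_16k.wav", lang="zh"),
    build_command("st", "input_16k.wav"),
    build_command("tts", "你好，欢迎使用飞桨深度学习框架！", output="output.wav"),
    build_command("text", "今天的天气真不错啊你下午有空吗我想约你一起去吃饭", task="punc"),
]

# Print shell-quoted versions; pass an argv list to subprocess.run(argv, check=True)
# to actually execute one of them on a machine where PaddleSpeech is installed.
for argv in commands:
    print(shlex.join(argv))
```

Passing a list to subprocess.run avoids shell quoting problems with the Chinese input text.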

For more command lines, please see: demos

If you want to try more functions like training and tuning, please have a look at Speech-to-Text Quick Start and Text-to-Speech Quick Start.

Model List

PaddleSpeech supports a series of the most popular models. They are summarized in released models, along with the available pretrained models.

Speech-to-Text contains Acoustic Model, Language Model, and Speech Translation, with the following details:

Speech-to-Text Module Type	Dataset	Model Type	Link
Speech Recognition	Aishell	DeepSpeech2 RNN + Conv based Models	deepspeech2-aishell
	Aishell	Transformer based Attention Models	u2.transformer.conformer-aishell
	Librispeech	Transformer based Attention Models	deepspeech2-librispeech / transformer.conformer.u2-librispeech / transformer.conformer.u2-kaldi-librispeech
Alignment	THCHS30	MFA	mfa-thchs30
Language Model	Ngram Language Model		kenlm
Language Model	TIMIT	Unified Streaming & Non-streaming Two-pass	u2-timit
Speech Translation (English to Chinese)	TED En-Zh	Transformer + ASR MTL	transformer-ted
Speech Translation (English to Chinese)	TED En-Zh	FAT + Transformer + ASR MTL	fat-st-ted

Text-to-Speech in PaddleSpeech mainly contains three modules: Text Frontend, Acoustic Model and Vocoder. Acoustic Model and Vocoder models are listed as follows:

Text-to-Speech Module Type	Model Type	Dataset	Link
Text Frontend			tn / g2p
Acoustic Model	Tacotron2	LJSpeech	tacotron2-ljspeech
	Transformer TTS	LJSpeech	transformer-ljspeech
	SpeedySpeech	CSMSC	speedyspeech-csmsc
	FastSpeech2	AISHELL-3 / VCTK / LJSpeech / CSMSC	fastspeech2-aishell3 / fastspeech2-vctk / fastspeech2-ljspeech / fastspeech2-csmsc
Vocoder	WaveFlow	LJSpeech	waveflow-ljspeech
	Parallel WaveGAN	LJSpeech / VCTK / CSMSC	PWGAN-ljspeech / PWGAN-vctk / PWGAN-csmsc
	Multi Band MelGAN	CSMSC	Multi Band MelGAN-csmsc
	Style MelGAN	CSMSC	Style MelGAN-csmsc
	HiFiGAN	CSMSC	HiFiGAN-csmsc
Voice Cloning	GE2E	Librispeech, etc.	ge2e
	GE2E + Tacotron2	AISHELL-3	ge2e-tactron2-aishell3
	GE2E + FastSpeech2	AISHELL-3	ge2e-fastspeech2-aishell3

Audio Classification

Task	Dataset	Model Type	Link
Audio Classification	ESC-50	PANN	pann-esc50

Punctuation Restoration

Task	Dataset	Model Type	Link
Punctuation Restoration	IWSLT2012_zh	Ernie Linear	iwslt2012-punc0

Documents

Normally, Speech SoTA, Audio SoTA and Music SoTA give you an overview of the hot academic topics in the related areas. To focus on the tasks in PaddleSpeech, you will find the following guidelines helpful for grasping the core ideas.

The Text-to-Speech module was originally called Parakeet and has now been merged into this repository. If you are interested in academic research on this task, please see the TTS research overview. Also, this document is a good guide to the pipeline components.

Citation

To cite PaddleSpeech for research, please use the following format.

@misc{ppspeech2021,
title={PaddleSpeech, a toolkit for audio processing based on PaddlePaddle.},
author={PaddlePaddle Authors},
howpublished = {\url{https://github.com/PaddlePaddle/PaddleSpeech}},
year={2021}
}

Contribute to PaddleSpeech

You are warmly welcome to submit questions in discussions and bug reports in issues! We also highly appreciate it if you are willing to contribute to this project!

Acknowledgement

Many thanks to yeyupiaoling/PPASR/PaddlePaddle-DeepSpeech/VoiceprintRecognition-PaddlePaddle/AudioClassification-PaddlePaddle for years of attention, constructive advice and great help.
Many thanks to AK391 for TTS web demo on Huggingface Spaces using Gradio.
Many thanks to mymagicpower for the Java implementation of ASR upon short and long audio files.
Many thanks to JiehangXie/PaddleBoBo for developing Virtual Uploader(VUP)/Virtual YouTuber(VTuber) with PaddleSpeech TTS function.
Many thanks to 745165806/PaddleSpeechTask for contributing Punctuation Restoration model.

Besides, PaddleSpeech depends on a lot of open source repositories. See references for more information.

License

PaddleSpeech is provided under the Apache-2.0 License.

Comments

Deployment problem with Chinese

I finally got the server and client running in Docker. When I then spoke a sentence in Chinese, the following error appeared:

Exception happened during processing of request from ('127.0.0.1', 59312)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 318, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 652, in __init__
    self.handle()
  File "deploy/demo_server.py", line 108, in handle
    (finish_time - start_time, transcript))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-48: ordinal not in range(128)

Personally, I think the recognition result could be saved to a file; there is no need to print it. Of course, it would be even better if the author could fix the printing problem.
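The traceback comes from Python 2.7, where writing a non-ASCII transcript to an ASCII-only output stream most likely triggers the error. The sketch below reproduces the same error class in Python 3 and shows one possible workaround, encoding to UTF-8 explicitly before writing. write_utf8 and the sample transcript are hypothetical and are not taken from demo_server.py.

```python
import io

# Hypothetical transcript and message, shaped like the log line in the traceback.
transcript = "今天天气不错"
message = "Response Time: %f, Transcript: %s" % (0.5, transcript)

# Encoding non-ASCII text with the ASCII codec raises the reported error class.
try:
    message.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

def write_utf8(stream, text):
    """Write text to a binary stream as UTF-8 bytes, independent of the locale."""
    stream.write(text.encode("utf-8"))
    stream.write(b"\n")

# A BytesIO stands in for a log file or a binary stdout here.
buffer = io.BytesIO()
write_utf8(buffer, message)
print(ascii_ok, len(buffer.getvalue()))
```

Setting the PYTHONIOENCODING environment variable to utf-8 is another commonly used option when the script itself cannot be changed.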

opened by yyhlvdl 43
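Discussion of PaddleSpeech installation problems (Windows)

No matter which installation method I use, and even with the required C++ tools installed, I always get this error:

Failed to build pyworld webrtcvad bottleneck
ERROR: Could not build wheels for pyworld, bottleneck, which is required to install pyproject.toml-based projects

I have tried all kinds of fixes found online without success. Any help would be sincerely appreciated!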
Installation

opened by qibinran 40

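Error when training on my own dataset with the officially provided model as the pretrained model

I used the official Chinese model below as the pretrained model to train on my own dataset, and got an error.

The error message is as follows: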

Traceback (most recent call last):
  File "train.py", line 118, in <module>
    main()
  File "train.py", line 114, in main
    train()
  File "train.py", line 109, in train
    test_off=args.test_off)
  File "/DeepSpeech/model_utils/model.py", line 307, in train
    pre_epoch = self.init_from_pretrained_model(exe, train_program)
  File "/DeepSpeech/model_utils/model.py", line 161, in init_from_pretrained_model
    filename="params.pdparams")
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 784, in load_params
    filename=filename)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 668, in load_vars
    filename=filename)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/io.py", line 727, in load_vars
    format(orig_shape, each_var.name, new_shape))
RuntimeError: Shape not matching: the Program requires a parameter with a shape of ((1312L, 3072L)), while the loaded parameter (namely [ layer_2_forward_fc_weight ]) has a shape of  ((1312, 6144)).
Failed in training!
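A shape mismatch like this can be caught before loading by comparing the shapes the program expects with those stored in the checkpoint. The sketch below uses the two shapes from the error message. find_shape_mismatches is a hypothetical helper, and the factor of 3 (a GRU projection being three times rnn_layer_size) is an assumption about the model, not something the traceback confirms.

```python
def find_shape_mismatches(expected, loaded):
    """Map each shared parameter name to (expected_shape, loaded_shape) when they differ."""
    mismatches = {}
    for name, shape in expected.items():
        if name in loaded and tuple(loaded[name]) != tuple(shape):
            mismatches[name] = (tuple(shape), tuple(loaded[name]))
    return mismatches

# Shapes copied from the RuntimeError above.
program_shapes = {"layer_2_forward_fc_weight": (1312, 3072)}
checkpoint_shapes = {"layer_2_forward_fc_weight": (1312, 6144)}
mismatches = find_shape_mismatches(program_shapes, checkpoint_shapes)

# Assumption: the projection width is GATES * rnn_layer_size for a GRU layer.
GATES = 3
for name, (want, got) in mismatches.items():
    print(name,
          "program implies rnn_layer_size", want[1] // GATES,
          "checkpoint implies rnn_layer_size", got[1] // GATES)
```

Under that assumption the training configuration would correspond to rnn_layer_size 1024 and the pretrained model to 2048, so aligning the network settings with those of the released model would be the first thing to check.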

opened by yeyupiaoling 35

download_lm_en.sh broken
Hi all,

I got an error when trying to run /models/lm/download_lm_en.sh to download http://paddlepaddle.bj.bcebos.com/model_zoo/speech/common_crawl_00.prune01111.trie.klm

{ "code": "AccountOverdue", "message": "Your request is denied because there is an overdue bill of your account.", "requestId": "4b684141-1175-4691-a9cd-52c458a94845" }
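Because the server answers with a small JSON document instead of the model file, a download script could check the body before saving it. The following is a minimal sketch of such a check; parse_error_payload is a hypothetical helper, and the "code" and "message" keys are simply the ones visible in the response above.

```python
import json

def parse_error_payload(body):
    """Return the decoded dict if body is a JSON error object, otherwise None."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        # Binary model data is not valid UTF-8 JSON, so it ends up here.
        return None
    if isinstance(payload, dict) and "code" in payload and "message" in payload:
        return payload
    return None

# The response body quoted in the report, and a stand-in for binary file content.
error_body = (b'{ "code": "AccountOverdue", "message": "Your request is denied because '
              b'there is an overdue bill of your account.", '
              b'"requestId": "4b684141-1175-4691-a9cd-52c458a94845" }')
binary_body = b"\x00\x01\x02 not json"

print(parse_error_payload(error_body)["code"])
print(parse_error_payload(binary_body))
```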
opened by haoqiang 30

Deployment problem with AISHELL

I used the AISHELL model you released directly and ran python deploy/demo_server.py, which produced an error:

root@095d9ada1b1d:/DeepSpeech# python deploy/demo_server.py
-----------  Configuration Arguments -----------
alpha: 2.15
beam_size: 500
beta: 0.35
cutoff_prob: 1.0
cutoff_top_n: 40
decoding_method: ctc_beam_search
host_ip: localhost
host_port: 8086
lang_model_path: models/lm/zh_giga.no_cna_cmn.prune01244.klm
mean_std_path: asset/preprocess/mean_std.npz
model_path: asset/train/params.tar.gz
num_conv_layers: 2
num_rnn_layers: 3
rnn_layer_size: 2048
share_rnn_weights: False
specgram_type: linear
speech_save_dir: demo_cache
use_gpu: True
use_gru: True
vocab_path: asset/preprocess/vocab.txt
warmup_manifest: asset/preprocess/test
------------------------------------------------
I1205 10:14:34.175657    15 Util.cpp:166] commandline:  --use_gpu=True --trainer_count=1 
[INFO 2017-12-05 10:14:35,626 layers.py:2606] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-05 10:14:35,626 layers.py:3133] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-05 10:14:35,627 layers.py:7224] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2017-12-05 10:14:35,627 layers.py:2606] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-05 10:14:35,628 layers.py:3133] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2017-12-05 10:14:35,628 layers.py:7224] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
-----------------------------------------------------------
Warming up ...
('Warm-up Test Case %d: %s', 0, u'asset/data/aishell/wav/test/S0765/BAC009S0765W0205.wav')
[INFO 2017-12-05 10:14:42,337 model.py:230] begin to initialize the external scorer for decoding
[INFO 2017-12-05 10:14:50,941 model.py:241] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2017-12-05 10:14:50,941 model.py:242] end initializing scorer. Start decoding ...
Traceback (most recent call last):
  File "deploy/demo_server.py", line 224, in <module>
    main()
  File "deploy/demo_server.py", line 220, in main
    start_server()
  File "deploy/demo_server.py", line 204, in start_server
    num_test_cases=3)
  File "deploy/demo_server.py", line 143, in warm_up_test
    (finish_time - start_time, transcript))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 40-94: ordinal not in range(128)

So I commented out the transcript and ran it again, after which it could continue. However:

[INFO 2017-12-05 10:46:41,054 model.py:230] begin to initialize the external scorer for decoding
[INFO 2017-12-05 10:46:42,193 model.py:241] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2017-12-05 10:46:42,193 model.py:242] end initializing scorer. Start decoding ...
Response Time: 1174.020508
('Warm-up Test Case %d: %s', 1, u'asset/data/aishell/wav/test/S0767/BAC009S0767W0141.wav')

A single file takes 1174 s. That is a very long time. Is there any way to speed it up?

opened by yyhlvdl 26

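Memory usage keeps growing with run_train.sh on Mac. Is there a leak?

I want to train DeepSpeech on my own corpus. During training, system memory usage keeps climbing until the swap files fill up the disk. However, the memory usage of the python process itself does not increase, so I cannot tell what is consuming the memory.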

(paddle)loong@MacBook-Pro:~/l/lab/py/ml/baidu/wav on master$ sh run_train.sh 
-----------  Configuration Arguments -----------
augment_conf_path: arg.config
batch_size: 4
dev_manifest: data/manifest.train
init_model_path: None
is_local: 1
learning_rate: 5e-05
max_duration: 27.0
mean_std_path: data/mean_std.npz
min_duration: 0.0
num_conv_layers: 2
num_iter_print: 100
num_passes: 40
num_proc_data: 16
num_rnn_layers: 3
output_model_dir: ./models
rnn_layer_size: 1024
share_rnn_weights: 0
shuffle_method: batch_shuffle_clipped
specgram_type: linear
test_off: 0
train_manifest: data/manifest.train
trainer_count: 1
use_gpu: 0
use_gru: 0
use_sortagrad: 1
vocab_path: data/vocab.txt
------------------------------------------------
I0202 21:40:14.468683 2907198400 Util.cpp:166] commandline:  --use_gpu=0 --rnn_use_batch=True --log_clipping=True --trainer_count=1 
[INFO 2018-02-02 21:40:14,484 layers.py:2689] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-02-02 21:40:14,485 layers.py:3251] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-02-02 21:40:14,487 layers.py:7409] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-02-02 21:40:14,488 layers.py:2689] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-02-02 21:40:14,490 layers.py:3251] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-02-02 21:40:14,493 layers.py:7409] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
I0202 21:40:14.751354 2907198400 GradientMachine.cpp:94] Initing parameters..
I0202 21:40:15.287204 2907198400 GradientMachine.cpp:101] Init parameters done.

opened by kvinwang 19

bash run.sh

(base) root@a8e4df74e22d:/DeepSpeech/DeepSpeech-develop/examples/tiny# bash run.sh
/root/anaconda3/lib/python3.8/site-packages/paddle/fluid/framework.py:297: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
  warnings.warn(
Skip downloading and unpacking. Data already exists in /DeepSpeech/DeepSpeech-develop/examples/tiny/../..//examples/dataset/aishell.
Creating manifest data/manifest ...

/root/anaconda3/lib/python3.8/site-packages/paddle/fluid/framework.py:297: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
  warnings.warn(
----------- Configuration Arguments -----------
count_threshold: 0
manifest_paths: ['data/manifest.train', 'data/manifest.dev']
vocab_path: data/vocab.txt

/root/anaconda3/lib/python3.8/site-packages/paddle/fluid/framework.py:297: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
  warnings.warn(
----------- Configuration Arguments -----------
manifest_path: data/manifest.train
num_samples: 2000
output_path: data/mean_std.npz
specgram_type: linear

Aishell data preparation done.
using 2 gpus...
/root/anaconda3/lib/python3.8/site-packages/paddle/fluid/framework.py:297: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
  warnings.warn(
Traceback (most recent call last):
  File "/DeepSpeech/DeepSpeech-develop/examples/tiny/../..//deepspeech/exps/deepspeech2/bin/train.py", line 26, in <module>
    from deepspeech.exps.deepspeech2.config import get_cfg_defaults
  File "/DeepSpeech/DeepSpeech-develop/deepspeech/exps/deepspeech2/config.py", line 16, in <module>
    from deepspeech.models.deepspeech2 import DeepSpeech2Model
  File "/DeepSpeech/DeepSpeech-develop/deepspeech/models/deepspeech2.py", line 31, in <module>
    from deepspeech.modules.ctc import CTCDecoder
  File "/DeepSpeech/DeepSpeech-develop/deepspeech/modules/ctc.py", line 23, in <module>
    from deepspeech.decoders.swig_wrapper import Scorer
  File "/DeepSpeech/DeepSpeech-develop/deepspeech/decoders/swig_wrapper.py", line 16, in <module>
    import swig_decoders
ModuleNotFoundError: No module named 'swig_decoders'
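The run only fails after data preparation has finished, so checking for the compiled decoder module up front would surface the problem earlier. Below is a minimal sketch using the standard library; missing_modules is a hypothetical helper, and swig_decoders is the module name taken from the traceback.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Check the module named in the traceback before starting a long pipeline.
required = ["swig_decoders"]
missing = missing_modules(required)
if missing:
    print("Not importable, install or build first:", ", ".join(missing))
else:
    print("All required modules are importable.")
```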

opened by zhangyifei1 18

Parallel WaveGAN with CSMSC error

I followed this example for training: https://github.com/PaddlePaddle/PaddleSpeech/tree/19f67e1f564f1dcd49b89159b39bb4a34b7b6cdd/examples/csmsc/voc1

The dataset has already been downloaded:

(base) root@ff21c21bf0ea:/opt/PaddleSpeech/examples/csmsc/voc1# ll ~/datasets/
total 1156
drwxr-sr-x 3 root   users 593920 Feb 25 09:31 ./
drwsrwsr-x 1 jovyan users   4096 Feb 28 06:06 ../
drwxr-sr-x 3 root   users 577536 Feb 25 09:31 BZNSYP/

The baker_alignment_tone.tar.gz file has also been extracted into the current directory, so the conditions in the README are met:

Assume the path to the dataset is ~/datasets/BZNSYP. Assume the path to the MFA result of CSMSC is ./baker_alignment_tone. Run the command below to

source path.
preprocess the dataset.
train the model.
synthesize wavs.
synthesize waveform from metadata.jsonl.
./run.sh

When running run.sh, I got the error messages below and do not know how to resolve them:

(base) root@ff21c21bf0ea:/opt/PaddleSpeech/examples/csmsc/voc1# ./run.sh
Generate durations.txt from MFA results ...
Extract features ...
/home/jovyan/datasets/BZNSYP
Get features' stats ...
Traceback (most recent call last):
  File "/opt/PaddleSpeech/utils/compute_statistics.py", line 109, in <module>
    main()
  File "/opt/PaddleSpeech/utils/compute_statistics.py", line 84, in main
    with jsonlines.open(args.metadata, 'r') as reader:
  File "/opt/conda/lib/python3.8/site-packages/jsonlines/jsonlines.py", line 623, in open
    fp = builtins.open(file, mode=mode + "t", encoding=encoding)
FileNotFoundError: [Errno 2] No such file or directory: 'dump/train/raw/metadata.jsonl'
Normalize ...
Traceback (most recent call last):
  File "/opt/PaddleSpeech/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/../normalize.py", line 133, in <module>
    main()
  File "/opt/PaddleSpeech/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/../normalize.py", line 81, in main
    with jsonlines.open(args.metadata, 'r') as reader:
  File "/opt/conda/lib/python3.8/site-packages/jsonlines/jsonlines.py", line 623, in open
    fp = builtins.open(file, mode=mode + "t", encoding=encoding)
FileNotFoundError: [Errno 2] No such file or directory: 'dump/train/raw/metadata.jsonl'
Traceback (most recent call last):
  File "/opt/PaddleSpeech/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/../normalize.py", line 133, in <module>
    main()
  File "/opt/PaddleSpeech/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/../normalize.py", line 81, in main
    with jsonlines.open(args.metadata, 'r') as reader:
  File "/opt/conda/lib/python3.8/site-packages/jsonlines/jsonlines.py", line 623, in open
    fp = builtins.open(file, mode=mode + "t", encoding=encoding)
FileNotFoundError: [Errno 2] No such file or directory: 'dump/dev/raw/metadata.jsonl'
Traceback (most recent call last):
  File "/opt/PaddleSpeech/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/../normalize.py", line 133, in <module>
    main()
  File "/opt/PaddleSpeech/paddlespeech/t2s/exps/gan_vocoder/parallelwave_gan/../normalize.py", line 81, in main
    with jsonlines.open(args.metadata, 'r') as reader:
  File "/opt/conda/lib/python3.8/site-packages/jsonlines/jsonlines.py", line 623, in open
    fp = builtins.open(file, mode=mode + "t", encoding=encoding)
FileNotFoundError: [Errno 2] No such file or directory: 'dump/test/raw/metadata.jsonl'

dump/train/raw/metadata.jsonl should be generated by run.sh, but in fact the file was never generated.
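The statistics and normalization steps all fail for the same reason: the feature extraction step left no metadata.jsonl behind. A small pre-flight check between the two stages makes that visible at once. The sketch below assumes the dump/<split>/raw/metadata.jsonl layout seen in the tracebacks; missing_metadata is a hypothetical helper.

```python
from pathlib import Path

def missing_metadata(dump_dir, splits=("train", "dev", "test")):
    """Return the metadata.jsonl paths under dump_dir that do not exist."""
    missing = []
    for split in splits:
        path = Path(dump_dir) / split / "raw" / "metadata.jsonl"
        if not path.is_file():
            missing.append(path)
    return missing

# Run from the example directory after the "Extract features" step.
for path in missing_metadata("dump"):
    print("missing:", path)
```

If all three files are missing, the extraction step itself is the place to look, for example the dataset path or the alignment directory it was given.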

opened by yJun-Chen 17

Installation error

Hello. Today I found that paddlepaddle can now be installed with pip, so I installed it with pip on Ubuntu 16.04. However, at this step:

git clone https://github.com/PaddlePaddle/models.git
cd models/deep_speech_2
sh setup.sh

both this setup.sh and the setup.sh for swig fail with the error:

openfst-1.6.3/src/include/fst/union.h:33:40: warning: typedef ‘using StateId = typename Arc::StateId’ locally defined but not used [-Wunused-local-typedefs]
using StateId = typename Arc::StateId;
^
error: command 'gcc' failed with exit status 1

Could this be fixed? After all, installing paddlepaddle with pip is much more convenient than installing it with Docker.

opened by yyhlvdl 17
After upgrading to a 3070, the python3 process hangs at 100% CPU

I first tested on 1080 Ti cards, where the program runs normally. Environment: running in Docker, paddlepaddle_gpu-1.8.5.post107, CUDA 10.1, cuDNN 7.5

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:08:00.0 Off |                  N/A |
| 40%   27C    P0    58W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:09:00.0 Off |                  N/A |
| 47%   36C    P0    57W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:89:00.0 Off |                  N/A |
| 49%   30C    P0    54W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:8A:00.0 Off |                  N/A |
| 55%   29C    P0    53W / 250W |      0MiB / 11178MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+

Another machine was upgraded to a 3070. The environment is the same, except that the driver had to be upgraded:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.84       Driver Version: 460.84       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3070    Off  | 00000000:82:00.0 Off |                  N/A |
|  0%   33C    P8     6W / 220W |    303MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

CUDA 10.1 is installed; checking with nvcc -V gives:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105

After the program starts, it hangs, with the python3 process at 100% CPU:

W0617 04:06:56.153712 194 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 86, Driver API Version: 11.2, Runtime API Version: 10.0
W0617 04:06:56.153888 194 device_context.cc:260] device: 0, cuDNN Version: 7.5.
W0617 04:06:56.512885 194 device_context.h:155] WARNING: device: 0. The installed Paddle is compiled with CUDNN 7.6, but CUDNN version in your machine is 7.5, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
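The log reports compute capability 86 together with a 10.0 runtime. One thing worth checking is whether that runtime supports the GPU architecture at all. The sketch below encodes a few minimum CUDA toolkit versions per compute capability as I understand NVIDIA's published support; treat the table values as assumptions to verify, and note that a mismatch is only a plausible cause of the hang, not a confirmed one. runtime_supports is a hypothetical helper.

```python
# Minimum CUDA toolkit version per compute capability (assumed values; verify
# against NVIDIA's documentation before relying on them).
MIN_CUDA_FOR_CAPABILITY = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1080 Ti
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3070
}

def runtime_supports(capability, runtime_version):
    """Return True if the CUDA runtime is new enough for the given capability."""
    required = MIN_CUDA_FOR_CAPABILITY[capability]
    return runtime_version >= required

# Values from the two machines described above.
print("GTX 1080 Ti with runtime 10.0:", runtime_supports((6, 1), (10, 0)))
print("RTX 3070 with runtime 10.0:", runtime_supports((8, 6), (10, 0)))
```

If the check fails, a Paddle build compiled against a newer CUDA version would be the natural next thing to try.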

Stale Deployment

opened by hebo1982 16
train loss : nan during training

While training on AISHELL with bash ./local/run_train.sh:

epoch: 0 , batch : 6700 , train loss : 64.74555
epoch: 0 , batch : 6800 , train loss : nan
epoch: 0 , batch : 6900 , train loss : nan
.......

What causes train loss : nan, and does it have any effect?

nvidia-smi looks normal the whole time.
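Once the loss becomes NaN it stays NaN, so it helps to locate the first non-finite value and stop or roll back there. The sketch below scans logged (batch, loss) pairs like the ones above; first_non_finite is a hypothetical helper, not part of the training script.

```python
import math

def first_non_finite(logged):
    """Return the batch index of the first NaN or infinite loss, or None if all are finite."""
    for batch, loss in logged:
        if not math.isfinite(loss):
            return batch
    return None

# Values taken from the log excerpt above.
logged = [(6700, 64.74555), (6800, float("nan")), (6900, float("nan"))]
first_bad = first_non_finite(logged)
print("first non-finite loss logged at batch", first_bad)
```

Since the log prints every 100 batches, the divergence happened somewhere after batch 6700 and no later than batch 6800. A lower learning rate or gradient clipping are the usual things to try, and a checkpoint from before that point is the one to resume from.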
Question

opened by monkeycc 15
File "/root/anaconda3/envs/rpa/lib/python3.8/site-packages/yacs/config.py", line 141, in __getattr__ raise AttributeError(name) AttributeError: preprocess_config
The command I ran was:

CUDA_VISIBLE_DEVICES=1 ./local/test_wav.sh conf/deepspeech2.yaml conf/tuning/decode.yaml exp/deepspeech2/checkpoints/avg_1 data/demo_01_03.wav

The error reported was:

Traceback (most recent call last):
  File "/data/rshm/PaddleSpeech-develop/paddlespeech/s2t/exps/deepspeech2/bin/test_wav.py", line 201, in <module>
    main(config, args)
  File "/data/rshm/PaddleSpeech-develop/paddlespeech/s2t/exps/deepspeech2/bin/test_wav.py", line 169, in main
    main_sp(config, args)
  File "/data/rshm/PaddleSpeech-develop/paddlespeech/s2t/exps/deepspeech2/bin/test_wav.py", line 163, in main_sp
    exp = DeepSpeech2Tester_hub(config, args)
  File "/data/rshm/PaddleSpeech-develop/paddlespeech/s2t/exps/deepspeech2/bin/test_wav.py", line 42, in __init__
    self.preprocess_conf = config.preprocess_config
  File "/root/anaconda3/envs/rpa/lib/python3.8/site-packages/yacs/config.py", line 141, in __getattr__
    raise AttributeError(name)
AttributeError: preprocess_config
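The bare AttributeError only names the missing key. A guarded lookup that also lists the keys the loaded config does contain makes it easier to see whether the YAML file and the script are out of sync. The sketch below uses a small stand-in class that mimics the attribute access seen in the traceback; Config, require and the batch_size key are hypothetical and do not come from yacs or PaddleSpeech.

```python
class Config(dict):
    """Stand-in for a yacs-style config node: attribute access backed by dict keys."""

    def __getattr__(self, name):
        if name in self:
            return self[name]
        raise AttributeError(name)

def require(config, key):
    """Return config.<key>, or raise a KeyError that lists the available keys."""
    value = getattr(config, key, None)
    if value is None:
        raise KeyError(f"config has no '{key}'; available keys: {sorted(config)}")
    return value

# Hypothetical config that lacks the key the script expects.
config = Config(batch_size=4)
try:
    require(config, "preprocess_config")
except KeyError as err:
    print(err)
```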
Bug S2T
opened by shumeirao 1
[S2T] PaddleSpeech illegal instruction 4 on Apple Silicon M1
This error is very persistent on Apple Silicon M1 - I have tried alternative installations with pip and docker with the same error.

The safest installation due to better dependency checks seems to be with conda.

Describe the bug

After installation, any paddlespeech command throws the error message:

paddlespeech help
Illegal instruction: 4

To Reproduce

Steps to reproduce the behavior:

CONDA_SUBDIR=osx-64 conda create -n paddle pip python=3.10 sox libsndfile swig bzip2
conda activate paddle
# install paddlepaddle with conda for osx64
CONDA_SUBDIR=osx-64 conda install paddlepaddle --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
# due to thrown error for invalid wheel for opencv-python
pip install --upgrade pip --force
pip install paddleocr --upgrade
# due to installation issue https://github.com/PaddlePaddle/PaddleSpeech/issues/2687
pip install paddleaudio==1.0.1
# final paddlespeech installation
pip install paddlespeech

Bug S2T
opened by agilebean 1
[TTS] JETS -> E2E FastSpeech2 + HiFiGAN
E2E FastSpeech2 + HiFiGAN

Paper: JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech

Also check for more information about E2E-training TTS:

https://github.com/PaddlePaddle/PaddleSpeech/issues/1699

feature request T2S good first issue
opened by yt605155624 0
[TTS] iSTFTNet -> speed up HiFiGAN !!
speed up HiFiGAN !!

Paper: iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform

Code: https://github.com/rishikksh20/iSTFTNet-pytorch

Please compare with original HiFiGAN to get the diff: https://github.com/jik876/hifi-gan

After this work, you can also modify VITS to VITS_iSTFT
feature request T2S good first issue
opened by yt605155624 0

Releases(r1.3.0)

r1.3.0(Dec 14, 2022)
Highlights

S2T

Support U2/U2++ Conformer dy2static, and U2/U2++ C++ High Performance Streaming ASR Deployment. @zh794390558

Add Wav2vec2ASR-en, wav2vec2.0 fine-tuning for ASR on LibriSpeech. @Zth9730

Add Whisper CLI and Demos, support multi language recognition and translation. @zxcd

Add Wav2vec2 CLI and Demos, support ASR and Feature Extraction. @Zth9730

Add whisper. #2640 #2704 by @zxcd

Fix gpu training hang. #2478 by @Zth9730

Support u2++ based cli and server. #2489 #2510 by @Zth9730

Add wav2vec2-en. #2518 #2527 #2637 by @Zth9730

Add wav2vec2-zh cli. #2697 by @Zth9730

T2S

Add seek for BytesIO. https://github.com/PaddlePaddle/PaddleSpeech/pull/2484 by @ZapBird

Add mix finetune. https://github.com/PaddlePaddle/PaddleSpeech/pull/2525 https://github.com/PaddlePaddle/PaddleSpeech/pull/2647 by @lym0302

Add streaming TTS fastdeploy serving. https://github.com/PaddlePaddle/PaddleSpeech/pull/2528 by @HexToString

Add SSML for Chinese Text Frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/2531 by @david-95

Add end-to-end Prosody Prediction pipeline (including using prosody labels in Acoustic Model). https://github.com/PaddlePaddle/PaddleSpeech/pull/2548 https://github.com/PaddlePaddle/PaddleSpeech/pull/2615 https://github.com/PaddlePaddle/PaddleSpeech/pull/2693 by @WongLaw

Add Adversarial Loss for Chinese English mixed TTS. https://github.com/PaddlePaddle/PaddleSpeech/pull/2588 by @lym0302

Fix frontend bugs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2539 https://github.com/PaddlePaddle/PaddleSpeech/pull/2606 by @yt605155624

Add TN for English unit. https://github.com/PaddlePaddle/PaddleSpeech/pull/2629 by @WongLaw

Add male voice for TTS. https://github.com/PaddlePaddle/PaddleSpeech/pull/2660 by @lym0302

Add double byte char for zh normalization. https://github.com/PaddlePaddle/PaddleSpeech/pull/2661 by @david-95

Add TTS Paddle-Lite x86 inference. https://github.com/PaddlePaddle/PaddleSpeech/pull/2636 https://github.com/PaddlePaddle/PaddleSpeech/pull/2667 by @yt605155624

Add greek char and fix #2571. https://github.com/PaddlePaddle/PaddleSpeech/pull/2683 by @david-95

Add Slim for TTS. https://github.com/PaddlePaddle/PaddleSpeech/pull/2729 by @yt605155624

Audio

Move paddlespeech/audio to paddleaudio. https://github.com/PaddlePaddle/PaddleSpeech/pull/2706 by @SmileGoat

Demo

Add TTSAndroid demo. https://github.com/PaddlePaddle/PaddleSpeech/pull/2703 by @yt605155624

New Contributors

@ZapBird made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2484

@HexToString made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2528

@dahu1 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2554

@kFoodie made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2664

@zxcd made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2640

@michael-skynorth made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2666

@heyudage made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2688

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.2.0...r1.3.0
r1.2.0(Oct 10, 2022)
S2T

Fix conformer/transformer multi GPU training. #2327 #2334 #2336 #2372 by @Zth9730

Fix deepspeech2 decode_wav. #2351 by @Zth9730

Support BiTransformer decoder. #2415 by @Zth9730

T2S

Update VITS to support VITS and its voice cloning training on AISHELL-3. #2268 by @HighCWu

Add ERNIE-SAT synthesize_e2e. #2287 #2316 #2355 #2378 #2432 by @yt605155624

Specify the input data type of G2PW. #2288 by @kslz

Add TTS finetune example. #2297 #2385 #2418 #2430 by @lym0302

Fix Chinese English mixed TTS frontend. #2299 #2493 by @lym0302

Add words into polyphonic.yaml for g2pW. #2300 by @david-95

Update the quantifier unit in Text Normalization. #2308 by @pengzhendong

Fix Chinese frontend bugs. #2312 #2323 by @david-95

Add AISHELL-3 Voice Cloning with ECAPA-TDNN speaker encoder. #2359 #2429 by @yt605155624

Add pre-install doc for G2P and TN, update version of pypinyin. #2364 by @WongLaw

Add tools to compare two test results of G2P to show differences. #2367 by @david-95

Revise must_neural_tone_words. #2370 by @WongLaw

Add type-hint for g2pW. #2390 by @yt605155624

Replaced fixed path with path variable in MFA. #2416 by @WongLaw

Solve "unknown format: 3" for wavfile.write(). #2422 by @zhoupc2015

Text

Create preprocess.py for Punctuation Restoration. #2295 by @THUzyt21

Demo

Add Voice Cloning, TTS finetune, and ERNIE-SAT in speech_web. #2412 #2451 by @iftaken

Server

Add num_decoding_left_chunks in streaming_asr_server's config. #2337 by @THUzyt21

Remove unused spk_id in speech_server and streaming_tts_server; support Chinese English mixed TTS server engine. #2380 by @WongLaw

Doc

Add Chinese doc and language switcher for metaverse, style_fs2 and story_talker. #2357 by @WongLaw

Update API docs. #2406 by @yt605155624

Add finetune demos in readthedocs. #2411 by @yt605155624

Test

Add barrier for distributed training using multiple machines. #2309 #2311 by @sneaxiy

Fix prepare.sh for PWGAN TIPC. #2376 by @yuehuayingxueluo

Other

Format paddlespeech with pre-commit. #2331 by @yt605155624

Acknowledgements

Special thanks to @yt605155624 @lym0302 @THUzyt21 @iftaken @Zth9730 @zhoupc2015 @WongLaw @david-95 @pengzhendong @kslz @HighCWu @yuehuayingxueluo @sneaxiy @SmileGoat

New Contributors

@HighCWu made their first contribution in #2268

@pengzhendong made their first contribution in #2308

@Zth9730 made their first contribution in #2327

@WongLaw made their first contribution in #2357

@yuehuayingxueluo made their first contribution in #2376

@zhoupc2015 made their first contribution in #2422

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.1.0...r1.2.0
r1.1.0(Aug 19, 2022)
S2T

Add wer tools. https://github.com/PaddlePaddle/PaddleSpeech/pull/1709

Add optimized attention cache; use 0-dim tensor for model export. https://github.com/PaddlePaddle/PaddleSpeech/pull/2124

Fix cnn cache dy2st shape. https://github.com/PaddlePaddle/PaddleSpeech/pull/2168
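
The "Add wer tools" entry above does not describe the tool itself. As general background, word error rate is usually computed as the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below illustrates that definition only; the function names `edit_distance` and `wer` are hypothetical and are not taken from the PaddleSpeech tool.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

For character error rate (the CER figures quoted elsewhere in these notes), the same distance would be applied to characters instead of whitespace-separated words.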

TTS

Fix random speaker embedding bug in voice clone. https://github.com/PaddlePaddle/PaddleSpeech/pull/1828 by @jerryuhoo

Add VITS model. https://github.com/PaddlePaddle/PaddleSpeech/pull/1855 https://github.com/PaddlePaddle/PaddleSpeech/pull/1957 https://github.com/PaddlePaddle/PaddleSpeech/pull/2040

Add kunlun support for speedyspeech. https://github.com/PaddlePaddle/PaddleSpeech/pull/1879 by @QingshuChen

Normalize wav max value to 1 in preprocess. https://github.com/PaddlePaddle/PaddleSpeech/pull/1887 by @jerryuhoo

Remove fluid dependence in TTS. https://github.com/PaddlePaddle/PaddleSpeech/pull/1940

Add onnx models for aishell3/ljspeech/vctk's tts3/voc1/voc5. https://github.com/PaddlePaddle/PaddleSpeech/pull/2068

Add TTS static/onnx models in pretrained_models.py. https://github.com/PaddlePaddle/PaddleSpeech/pull/2074

Add Ernie SAT model. https://github.com/PaddlePaddle/PaddleSpeech/pull/2052 https://github.com/PaddlePaddle/PaddleSpeech/pull/2117

Add Chinese English mixed TTS frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/2143

Add Chinese English mixed TTS example. https://github.com/PaddlePaddle/PaddleSpeech/pull/2234

Fix English text frontend bug. https://github.com/PaddlePaddle/PaddleSpeech/pull/2235 by @david-95

Add g2pW to Chinese frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/2230 by @BarryKCL

Fix text frontend bugs. https://github.com/PaddlePaddle/PaddleSpeech/pull/1912 https://github.com/PaddlePaddle/PaddleSpeech/pull/2250 https://github.com/PaddlePaddle/PaddleSpeech/pull/2254 https://github.com/PaddlePaddle/PaddleSpeech/pull/2255 https://github.com/PaddlePaddle/PaddleSpeech/pull/2272
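
One TTS entry above reads "Normalize wav max value to 1 in preprocess". A minimal sketch of what peak normalization generally means follows; it assumes samples are plain floats, and the helper name `peak_normalize` and the `eps` guard are illustrative assumptions, not the code from the PR.

```python
def peak_normalize(samples, eps=1e-9):
    """Scale samples so that the largest absolute value becomes 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak < eps:
        # Silent (or empty) input: return a copy unchanged to avoid dividing by zero.
        return list(samples)
    return [s / peak for s in samples]
```

Scaling by the peak keeps the waveform shape intact and only changes its overall level.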

Speechx

Add custom asr script. https://github.com/PaddlePaddle/PaddleSpeech/pull/1946

Refactor frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/2003

Convert deepspeech2 to onnx. https://github.com/PaddlePaddle/PaddleSpeech/pull/2034

Refactor audio/data/feature cache. https://github.com/PaddlePaddle/PaddleSpeech/pull/1638

Frontend refactor. https://github.com/PaddlePaddle/PaddleSpeech/pull/1640

Fix nnet itf header. https://github.com/PaddlePaddle/PaddleSpeech/pull/1641

Refactor speech egs. https://github.com/PaddlePaddle/PaddleSpeech/pull/1707

Refactor egs and more egs for TLG wfst graph build. https://github.com/PaddlePaddle/PaddleSpeech/pull/1715

Speed up ngram building. https://github.com/PaddlePaddle/PaddleSpeech/pull/1729

Update speechx install doc. https://github.com/PaddlePaddle/PaddleSpeech/pull/1736

Fix nnet input and output name. https://github.com/PaddlePaddle/PaddleSpeech/pull/1740

Update wfst graph. https://github.com/PaddlePaddle/PaddleSpeech/pull/1742

Fix model params path name. https://github.com/PaddlePaddle/PaddleSpeech/pull/1750

Remove fluid tools for onnx export. https://github.com/PaddlePaddle/PaddleSpeech/pull/2116

Audio

Refactor paddleaudio to paddlespeech.audio. https://github.com/PaddlePaddle/PaddleSpeech/pull/2007

Add webdataset in paddlespeech.audio. https://github.com/PaddlePaddle/PaddleSpeech/pull/2062

Server

Remove extra logs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2111 https://github.com/PaddlePaddle/PaddleSpeech/pull/2113

Change streaming tts servers' fs from 24k to models' fs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2121

Fix bug in engine_warmup. https://github.com/PaddlePaddle/PaddleSpeech/pull/2171 by @Betterman-qs

Replace default vocoder in server with mb_melgan. https://github.com/PaddlePaddle/PaddleSpeech/pull/2214

Fix bug in streaming_asr_server with punctuation restoration. https://github.com/PaddlePaddle/PaddleSpeech/pull/2244

Rename time_s and time_ns to time_b and time_nb. https://github.com/PaddlePaddle/PaddleSpeech/pull/2133

Improve decoding accuracy. https://github.com/PaddlePaddle/PaddleSpeech/pull/2128

CLI

Add paddlespeech.resource module. https://github.com/PaddlePaddle/PaddleSpeech/pull/1917

Dynamic cli commands registration. https://github.com/PaddlePaddle/PaddleSpeech/pull/1959

Fix unnecessary download. https://github.com/PaddlePaddle/PaddleSpeech/pull/2103

Remove extra logs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2084 https://github.com/PaddlePaddle/PaddleSpeech/pull/2085 https://github.com/PaddlePaddle/PaddleSpeech/pull/2107

Add Chinese English mixed TTS CLI. https://github.com/PaddlePaddle/PaddleSpeech/pull/2249

Add onnxruntime infer for CLI. https://github.com/PaddlePaddle/PaddleSpeech/pull/2222

Demo

Add speech web demo. https://github.com/PaddlePaddle/PaddleSpeech/pull/2039 https://github.com/PaddlePaddle/PaddleSpeech/pull/2080

Add kws cli and demo. https://github.com/PaddlePaddle/PaddleSpeech/pull/2063

Use paddle web for streaming asr. https://github.com/PaddlePaddle/PaddleSpeech/pull/2105

Add custom asr script. https://github.com/PaddlePaddle/PaddleSpeech/pull/1946

More cli for speech demos. https://github.com/PaddlePaddle/PaddleSpeech/pull/2138

Doc

Add API doc. https://github.com/PaddlePaddle/PaddleSpeech/pull/2075

Format tts doc string for read the docs. https://github.com/PaddlePaddle/PaddleSpeech/pull/2115

Others

Fix CPU Dockerfile. https://github.com/PaddlePaddle/PaddleSpeech/pull/2172 by @BrightXiaoHan

Add PaddleSpeech Dockerfile for hard mode of installation. https://github.com/PaddlePaddle/PaddleSpeech/pull/2127 by @buchongyu2

Acknowledgements

Special thanks to @buchongyu2 @BrightXiaoHan @BarryKCL @Betterman-qs @david-95 @jerryuhoo @QingshuChen @iftaken @zh794390558 @Jackwaterveg @lym0302 @SmileGoat @yt605155624

New Contributors

@QingshuChen made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1879

@Zhangjingyu06 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1951

@ryanrussell made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1976

@freeliuzc made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2044

@vpegasus made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2043

@dependabot made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2061

@raycool made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2109

@YDX-2147483647 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2125

@chenkui164 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2130

@0x45f made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2162

@Doubledongli made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2167

@Betterman-qs made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2171

@BrightXiaoHan made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2172

@THUzyt21 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2202

@david-95 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2235

@BarryKCL made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/2230

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.0.0...r1.1.0
r1.0.0(May 13, 2022)
Highlight

Release PP-ASR: Streaming ASR with timestamp and punctuation restoration, uses WenetSpeech Streaming Conformer and DeepSpeech2 ASR model.

Release PP-TTS: Streaming TTS system for industrial application.

Release PP-VPR: Industrial Voiceprint Recognition system and ECAPA-TDNN model.

Custom ASR demo: applying for transportation reimbursement.

Support MDTC KWS model

More

ASR

DeepSpeech2 streaming model aishell cer 6.66%

DeepSpeech2 streaming model wenetspeech cer: 15.2% (test_net, w/o LM), 24.17% (test_meeting, w/o LM), 5.3% (aishell, w/ LM)

Conformer aishell cer 4.64%

Conformer streaming model aishell cer 5.44%

Conformer streaming model wenetspeech cer: 11.0% (test_net), 18.79% (test_meeting)

Speechx

[SpeechX] DeepSpeech2 streaming with WFST in streaming asr example

[SpeechX] Add websocket example

[SpeechX] custom asr, apply reimbursement for transportation demo

KWS

[KWS] Add kws example on HeySnips dataset. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1558

[KWS] Update KWS example. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1783

Audio

[Audio] rename paddleaudio to audio, since it conflicts with the pkg name by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1758

[Audio] Fix mcd issue. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1658

[Audio] Remove mcd. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1659

[Audio] Add VoxCeleb dataset for speaker recognition.

[Audio] Add HeySnips dataset for keyword spotting.

What's Changed

[R1.0][asr][server]add vector server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1845

[R1.0][asr][server]join streaming asr and punc server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1846

[R1.0]asr streaming server add time stamp by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1850

[R1.0][tts][server] update readme by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1852

[R1.0] update cli by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1854

[r1.0] update version to r1.0.0 by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1857

[R1.0] Add doc for wenetspeech model (ds2 online, conformer online) by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1862

[R1.0][server] improve server code by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1866

[R1.0][asr][server]update the streaming asr readme by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1871

[R1.0] Update released model info (Wenetspeech ds2 online, conformer online) by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1869

[R1.0]fix server doc and decode_method by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1889

[speechx] add custom_streaming_asr @SmileGoat #1891

[speechx] speedup ngram building @zh794390558 #1729

[speechx] refactor egs and more egs for TLG wfst graph build @zh794390558 #1715

[speechx]add aishell test script & json parser & no db norm linear feature & json2kaldi type cmvn @SmileGoat #1676

[speechx] Add websocket & make it work @SmileGoat #1720

[speechx] Frontend refactor @SmileGoat #1640

[Speechx] add tlg decoder @SmileGoat #1599

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r1.0.0a...r1.0.0
r1.0.0a(Apr 28, 2022)
Highlight

Release Streaming ASR and Streaming TTS system for industrial application.

Support KWS model

Deepspeech2 streaming model aishell cer 6.66%

Conformer aishell cer 4.64%

Conformer streaming model aishell cer 5.44%

SpeechX Deepspeech2 streaming with WFST

What's Changed

[speechx] refactor audio/data/feature cache by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1638

[speechx] Frontend refactor by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1640

[speechx] fix nnet itf header by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1641

[TTS]add license and reference for some models by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1642

[Doc] supplement note by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1643

[vec][search] update search demo README by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1644

[speechx]refactor linear feature:unify vector & remove redundant function & add remained_wav cache shift wav by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1649

[Audio] Fix mcd issue. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1658

[Audio] Remove mcd. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1659

[vec]update the speaker verification model by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1663

[ASR] update ds2 online model by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1668

[TTS]fix preprocess bug, test=tts by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1660

update README, test=doc by @iftaken in https://github.com/PaddlePaddle/PaddleSpeech/pull/1672

[Punc] Update RESULTS.md. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1675

[CLI] update ds2 online model in cli by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1674

[CLI] ASR: Add duration limitation for asr by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1666

[vec]add speaker verification score method by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1646

[TTS]add onnx inference for fastspeech2 + hifigan/mb_melgan by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1665

[doc]update readme by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1680

[WebSocket] fixed online model md5 error , test=doc by @WilliamZhang06 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1682

[speechx]add aishell test script & json parser & no db norm linear feature & json2kaldi type cmvn by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1676

[server] add stream tts server by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1652

[speechx]remove mutable in audio_cache by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1687

[Doc] update readme for aishell/asr0 by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1677

[vec] add speaker diarization pipeline by @ccrrong in https://github.com/PaddlePaddle/PaddleSpeech/pull/1651

[vec]voxceleb convert dataset format to paddlespeech by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1630

[Speechx] add tlg decoder by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1599

[vec]add vector necessary note, test=doc by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1690

Revert "[WebSocket] fixed online model md5 error , test=doc" by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1691

[WebSocket] added online web client, test=doc by @WilliamZhang06 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1692

Fix the misspelling of the word "speech" in the example/aishell directory by @buchongyu2 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1694

Fix the misspelling of the word "hack" by @buchongyu2 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1697

[TTS]change NLC to NCL in speedyspeech, test=tts by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1693

[doc]fix typo, test=doc by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1698

[doc]add pwgan onnx model, test=doc by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1700

[WebSocket] added online asr doc and online asr command line, test=doc by @WilliamZhang06 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1701

[vec][server] vpr demo support by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1696

[speechx] refactor speech egs by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1707

[asr]add wer tools by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1709

[asr][websocket]fix the ws send bug, cache buffer, text=doc by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1710

[TTS]add fastspeech2 cnndecoder onnx model by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1712

[speechx] refactor egs and more egs for TLG wfst graph build by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1715

[vec][score] add plda model by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1681

[CLI]update cli, test=doc by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1716

[server] add streaming am infer by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1713

[speechx] Add websocket & make it work by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1720

[asr][websocket] add asr conformer websocket server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1704

[vec][loss] add NCE Loss from RNNLM by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1719

[vec][loss] add FocalLoss to deal with class imbalances by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1722

[TTS]restructure syn_utils.py, test=tts by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1723

[TTS]add paddle device set for ort and inference by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1727

[vec] add GRL to domain adaptation by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1725

[speechx] speedup ngram building by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1729

[asr] Add new cer tools by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1673

[speechx]add websocket lib by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1732

[speechx]update speechx install doc by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1736

[Doc] perfect the packing scripts by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1735

[Doc]renew the released mode by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1739

[asr][websocket]add streaming asr demo by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1737

[speechx] fix nnet input and output name by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1740

[ASR] remove redundant log by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1741

[speechx] update wfst graph by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1742

[speechx] Add recognizer_test_main script by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1743

[vec][doc]update the voxceleb readme.md, test=doc by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1744

[ASR] fix CER tools by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1747

[Doc] Fix release_model info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1746

[Doc] Update released model info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1748

Update released model info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1749

[speechx] fix model params path name by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1750

[speechx] fix linear-spectrogram-wo-db-norm-ol read feature issue by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1751

[TTS]fix wavernn white noise bug for paddle develop(2.3) by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1752

[server] add onnx tts engine by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1733

[TTS]Update paddle2onnx by @yt605155624 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1754

[Setup] to r1.0.0a by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1759

[audio] rename paddleaudio to audio, since it conflicts with the pkg name by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1758

[speechx] to_float32, fix shell script by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1757

[vec] bug fix to adapt VUE by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1760

[asr][websocket]fix the streaming asr server bug, server client by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1761

[speechx] fbank and mfcc by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1765

format code by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1764

[CLI] Add conformer_aishell, conformer_online_aishell by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1767

[speechx]make cmvn global in run.sh by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1768

[ASR] ds2: add log_interval and fix lr problem when resume training by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1766

[speechx] set nnet param by flags by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1769

[server] add streaming tts demos by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1771

[server] fix tts streaming server by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1774

[KWS]Add kws example on HeySnips dataset. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1558

[text][server]add text punc server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1772

[ASR] fix asr cli infer by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1770

[vec] add GE2E to support unlabeled data training by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1731

[ASR] fix time restriction in test_cli.sh by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1777

[ASR] Replace fbank by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1776

[CLI] add color for test_cli by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1778

[speechx] add success log in run.sh by @SmileGoat in https://github.com/PaddlePaddle/PaddleSpeech/pull/1779

[KWS]Update KWS example. by @KPatr1ck in https://github.com/PaddlePaddle/PaddleSpeech/pull/1783

[server] update readme by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1782

[Doc] Update ds2online model info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1781

[CLI] renew ds2 online model by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1786

[speechx] fix speechx ws server to return dummy partial result by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1787

[asr][server]asr client add punctuation server by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1784

[asr] patch func to var by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1788

[asr][server]fix client parse the asr result bug by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1789

[Bug fix] fix test_cli by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1794

[vec] update readme by @qingen in https://github.com/PaddlePaddle/PaddleSpeech/pull/1796

[R1.0]update the streaming output and punc default ip, port by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1800

Renew ds2 online model [cer 6.66%] by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1802

[R1.0] update the streaming asr server readme by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1810

[R1.0] Renew ds2 online doc info by @Jackwaterveg in https://github.com/PaddlePaddle/PaddleSpeech/pull/1809

[server] update streaming demos readme by @lym0302 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1806

[R1.0]update the paddlespeech_client asr_online cli by @Honei in https://github.com/PaddlePaddle/PaddleSpeech/pull/1818

[r1.0][doc] fix readme by @zh794390558 in https://github.com/PaddlePaddle/PaddleSpeech/pull/1825

New Contributors

@iftaken made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1672

@ccrrong made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1651

@buchongyu2 made their first contribution in https://github.com/PaddlePaddle/PaddleSpeech/pull/1694

Acknowledgements

Special thanks to @zh794390558 @Honei @Jackwaterveg @lym0302 @qingen @GT-ZhangAcer @yt605155624 @WilliamZhang06 @SmileGoat @ccrrong

Full Changelog: https://github.com/PaddlePaddle/PaddleSpeech/compare/r0.2.0...r1.0.0a
r0.2.0(Apr 1, 2022)
S2T

Replace kaldi_fbank with paddleaudio #1612

Support CTC decoder online #821 #1626

Improve accuracy of Conformer. Support using kaiming Uniform as default initialization. #1577
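
The Conformer entry above mentions Kaiming uniform initialization without further detail. As a rough illustration of the general technique, weights are drawn from U(-bound, bound) with bound = gain * sqrt(3 / fan_in). The gain of sqrt(2) (the usual ReLU choice) and the names `kaiming_uniform_bound` and `kaiming_uniform` are assumptions made for this sketch; the exact settings used in PR #1577 are not stated in these notes.

```python
import math
import random


def kaiming_uniform_bound(fan_in: int, gain: float = math.sqrt(2.0)) -> float:
    """Half-width of the uniform range: gain * sqrt(3 / fan_in)."""
    if fan_in <= 0:
        raise ValueError("fan_in must be positive")
    return gain * math.sqrt(3.0 / fan_in)


def kaiming_uniform(fan_in, fan_out, gain=math.sqrt(2.0), seed=None):
    """Return a fan_out x fan_in weight matrix (list of lists) sampled from U(-bound, bound)."""
    rng = random.Random(seed)
    bound = kaiming_uniform_bound(fan_in, gain)
    return [
        [rng.uniform(-bound, bound) for _ in range(fan_in)]
        for _ in range(fan_out)
    ]
```

Tying the range to fan_in keeps the variance of each layer's output roughly constant as depth grows, which is the usual motivation for this scheme.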

TTS

Add SpeedySpeech multi-speaker support for synthesize_e2e.py. https://github.com/PaddlePaddle/PaddleSpeech/pull/1370 by @jerryuhoo

Add WaveRNN for CSMSC dataset. https://github.com/PaddlePaddle/PaddleSpeech/pull/1379

Add Tacotron2 for CSMSC / LJSpeech datasets. https://github.com/PaddlePaddle/PaddleSpeech/pull/1314 / https://github.com/PaddlePaddle/PaddleSpeech/pull/1416

Add GE2E Tacotron2 Voice Cloning for AISHELL3 dataset. https://github.com/PaddlePaddle/PaddleSpeech/pull/1419

Update text frontend. https://github.com/PaddlePaddle/PaddleSpeech/pull/1506

Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets. https://github.com/PaddlePaddle/PaddleSpeech/pull/1549 / https://github.com/PaddlePaddle/PaddleSpeech/pull/1581 / https://github.com/PaddlePaddle/PaddleSpeech/pull/1587

Add NPU support for TransformerTTS. #1593 by @windstamp

Add CNN Decoder for Streaming Fastspeech2. https://github.com/PaddlePaddle/PaddleSpeech/pull/1634

Audio

Add paddleaudio.compliance modules that offers audio feature APIs aligned with Kaldi and Librosa. #1518

Unittest and benchmark for audio feature APIs. #1548

[Audio] - [audio] refactor audio arch #1494 by @zh794390558

[Audio] - [audio] dtw metric #1493 by @zh794390558

[Audio] - [audio] fix compliance bug #1597 by @zh794390558
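
The "dtw metric" entry above refers to dynamic time warping. For readers unfamiliar with it, the sketch below computes a classic DTW cost between two 1-D sequences, using absolute difference as the local cost. This is a generic textbook formulation under those assumptions, with a hypothetical function name, and is not the paddleaudio implementation.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two non-empty 1-D numeric sequences."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Because one element may align with several elements of the other sequence, sequences that differ only in timing (for example a repeated frame) get a cost of zero.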

Deployment

[Deployment] - [speechx] high performance inference of speech task #1496 by @SmileGoat @zh794390558

[Deployment] - [Speechx] fix normalizer bug #1600 #1621 #1619 #1633 #1635 by @SmileGoat

[Deployment] - [speechx] refactor speechx #1631 #1616 #1576 #1572 #1541 by @zh794390558

[Deployment] - [speechx] simplify cmake compiler #1538 #1536 #1535 by @zh794390558

server

[server] - [websocket] added online asr engine #1627 by @WilliamZhang06

[server] - [server] added engine type and asr inference #1475 by @WilliamZhang06

[server] - [Server] added asr engine #1413 by @WilliamZhang06

[server] - [Server] added engine factory and config #1399 by @WilliamZhang06

[server] - [server] added engine framework #1383 by @WilliamZhang06

[server] - [server] update readme #1604 by @lym0302

[server] - [server] add server cls #1554 by @lym0302

[server] - [server] add paddlespeech_server stats #1510 by @lym0302

[server] - [server] add cli #1466 by @lym0302

[server] - [server] add tts postprocess #1411 by @lym0302

[server] - [server] tts server #1386 by @lym0302

vector

[vector] - [vector] ecapa-tdnn on voxceleb #1523 by @Honei

CLI

Batch input supported. #1460

TTS: Add WaveRNN for CSMSC dataset.

TTS: Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets.

Vector: add speaker verification demo and doc #1605 by @Honei

Demo

[Demo] - [vec][search] update client image url #1628 by @qingen

[Demo] - [server] add server demo #1480 by @lym0302

[Demo] - [vec][search] add audio similarity search #1609 by @qingen

Acknowledgements

Special thanks to @WilliamZhang06 @yt605155624 @windstamp @Jackwaterveg @Honei @SmileGoat @KPatr1ck @zh794390558 @lym0302 @qingen
r0.1.2(Feb 25, 2022)
Bug Fix:

Pin librosa to version 0.8.1 (librosa==0.8.1) to resolve the compatibility issue caused by the librosa upgrade. #1426

r0.1.1(Jan 14, 2022)
New Features

CLI :

Add cli stats. #1274

Add unit test. #1321

ASR: Support English: Add transformer_librispeech model. #1297

ASR: Support 4 decoding methods: ctc_greedy_search, ctc_beam_search, attention, attention_rescoring. #1297

ASR & ST: Use the unified config. #1305 / #1312

ASR: Refactor the code. #1260 by @AdamBear

TTS: Support long input text by default. #1241

TTS: Add Style MelGAN and HiFiGAN. #1241
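
Of the four ASR decoding methods listed above, ctc_greedy_search is the simplest to illustrate. A minimal sketch of the general technique: pick the best-scoring token in each frame, collapse consecutive repeats, then drop blanks. The function below is a hypothetical stand-alone version operating on plain lists; treating index 0 as the blank is an assumption, and none of this is PaddleSpeech's own code.

```python
def ctc_greedy_search(frame_scores, blank_id=0):
    """Greedy CTC decoding over per-frame score vectors.

    frame_scores: list of frames, each a list of per-token scores.
    Returns the decoded token ids.
    """
    # Step 1: best token per frame (argmax).
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # Step 2: collapse consecutive repeats, then remove blanks.
    tokens = []
    prev = None
    for tok in best_path:
        if tok != prev and tok != blank_id:
            tokens.append(tok)
        prev = tok
    return tokens
```

A blank between two identical tokens is what allows a genuinely repeated token to survive the collapse step.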

ASR

Refactor configs in examples. #1225

TTS

Fix some frontend bugs. #1262 by @JiehangXie / #1310

Add speaker embedding and speaker id for style fastspeech2 inference. #1197 by @jerryuhoo

Add support for finetuning speedyspeech. #1302 by @jerryuhoo / #1322 / #1337

Update VCTK Parallel WaveGAN. #1294

Update Multi Band MelGAN. #1272

ST

Refactor configs in examples. #1225

Text

Refactor Punctuation Restoration example. #1215

Docs

Add topic note for releasing python packages

Add TTS papers. #1330

Add Frontend G2P topic. #1254

Others

Update released models and results. #1306

Acknowledgements

@zh794390558 @yt605155624 @Jackwaterveg @KPatr1ck @Mingxue-Xu @JiehangXie @grasswolfs @jerryuhoo @AdamBear @LittleChenCc @JamesLim-sy
r0.1.0(Dec 23, 2021)
Features

CLI : New Feature

Easy install by pip: pip install paddlespeech

CLI to quickly explore ASR, TTS, audio classification, speech translation and punctuation restoration.

ASR

Join CTC LM decoder

paper link

Transformer LM model

Improve DeepSpeech2 online model

Refactor some configs

TTS

Merge Parakeet into PaddleSpeech

Add FastSpeech2-Conformer

paper link: fastspeech2, conformer

example link

Add Multi Band MelGAN

paper link

example link

Add HiFiGAN

paper link

example link

Add Style MelGAN

paper link

example link

Add FastSpeech2 Voice Cloning with GE2E (SV2TTS)

paper link

example link

CLS

Add audio classification example on ESC-50 and custom dataset.

Add audio tagging demo based on PANNs and Audioset labels.

ST

ST-MTL

FAT-ST-MTL

Docs

Add quick start

Add Read the Docs documentation

Improve installation documentation

Add README for each example

Demos

Audio_tagging

Automatic_video_subtitiles

Metaverse

Punctuation_restoration

Speech_recognition

Speech_translation

Story_talker

Style_fs2

Text_to_speech

Others

Update released models and results

Acknowledgements

@zh794390558 @KPatr1ck @Jackwaterveg @yt605155624 @Mingxue-Xu @grasswolfs @jerryuhoo
v2.1.1(Aug 16, 2021)
ctc alignment

refactor data pipeline

autolog for deepspeech test

refactor checkpoint save/load

deepspeech online model

mfa alignment example

add text normalization example

TLG for aishell

more datasets: thchs30, aidatatang, timit etc.

8k speech example

ted en-zh st example

more utils

v2.1.0(Jun 29, 2021)
Transformer/Conformer Offline/Online ASR

Unified CTC Loss for DS2 model and Transformer Model

v1.1(Feb 25, 2021)

paddle 1.8.x with python2
v1.0(Feb 25, 2021)

master latest code


Comments

While training on aishell with bash ./local/run_train.sh:

epoch: 0 , batch : 6700 , train loss : 64.74555
epoch: 0 , batch : 6800 , train loss : nan
epoch: 0 , batch : 6900 , train loss : nan
.......
