A demo of chinese asr

Last update: Dec 9, 2021

Related tags

Overview

chinese_asr_demo

一个端到端的中文语音识别模型训练、测试框架具备数据预处理、模型训练、解码、计算wer等等功能

训练数据

训练数据采用thchs_30，下载地址：http://www.openslr.org/18/ 下载后将data_thchs30解压到data文件中

使用模型

bi-lstm + ctc 可直接将语音转换为中文

生成训练文件格式

字典

_ 作为ctc的blank

预处理

1、提取80维fbank特征

2、标准化：计算特征的均值和方差进行标准化，加快模型收敛速度，因为计算标准化的速度较慢，已将参数放到data_file/thchs_30/stand_nor.txt文件中，可直接调用

模型下载

bi_lstm_150_nl，测试集wer为29.278%

百度云链接：https://pan.baidu.com/s/1VVavLKLeY584HudHtC5uIQ

提取码：n18y

计划

1、加入CNN降维度

2、更新tranformer，conformer模型

3、更新其他数据集的训练方法

4、更新其他解码方法

A minimal Conformer ASR implementation adapted from ESPnet.

Conformer ASR A minimal Conformer ASR implementation adapted from ESPnet. Introduction I want to use the pre-trained English ASR model provided by ESP

3 Jan 24, 2022

Paddlespeech Streaming ASR GUI

Paddlespeech-Streaming-ASR-GUI Introduction A paddlespeech Streaming ASR GUI. Us

3 Jan 5, 2022

Vad-sli-asr - A Python scripts for a speech processing pipeline with Voice Activity Detection (VAD)

VAD-SLI-ASR Python scripts for a speech processing pipeline with Voice Activity

14 Dec 9, 2022

Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors"

SWRM Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors" Clone Clone th

14 Jan 3, 2023

pkuseg多领域中文分词工具; The pkuseg toolkit for multi-domain Chinese word segmentation

pkuseg：一个多领域中文分词工具包 (English Version) pkuseg 是基于论文[Luo et. al, 2019]的工具包。其简单易用，支持细分领域分词，有效提升了分词准确度。目录主要亮点编译和安装各类分词工具包的性能对比使用方式论文引用作者常见问题及解答主要

6k Dec 29, 2022

Python library for processing Chinese text

SnowNLP: Simplified Chinese Text Processing SnowNLP是一个python写的类库，可以方便的处理中文文本内容，是受到了TextBlob的启发而写的，由于现在大部分的自然语言处理库基本都是针对英文的，于是写了一个方便处理中文的类库，并且和TextBlob

6k Jan 2, 2023

Chinese segmentation library

What is loso? loso is a Chinese segmentation system written in Python. It was developed by Victor Lin ([email protected]) for Plurk Inc. Copyright &

82 Jun 28, 2022

a chinese segment base on crf

Genius Genius是一个开源的python中文分词组件，采用 CRF(Conditional Random Field)条件随机场算法。 Feature 支持python2.x、python3.x以及pypy2.x。支持简单的pinyin分词支持用户自定义break 支持用户自定义合并词

237 Nov 4, 2022

Chinese NewsTitle Generation Project by GPT2.带有超级详细注释的中文GPT2新闻标题生成项目。

GPT2-NewsTitle 带有超详细注释的GPT2新闻标题生成项目 UpDate 01.02.2021 从网上收集数据，将清华新闻数据、搜狗新闻数据等新闻数据集，以及开源的一些摘要数据进行整理清洗，构建一个较完善的中文摘要数据集。数据集清洗时，仅进行了简单地规则清洗。

785 Dec 29, 2022

Releases(python)

python(Nov 19, 2021)

Source code(tar.gz)
Source code(zip)

Owner

GitHub

Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU，一个中文文本分类、序列标注工具包，支持中文长文本、短文本的多类、多标签分类任务，支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

186 Dec 24, 2022

A collection of Classical Chinese natural language processing models, including Classical Chinese related models and resources on the Internet.

GuwenModels: 古文自然语言处理模型合集, 收录互联网上的古文相关模型及资源. A collection of Classical Chinese natural language processing models, including Classical Chinese related models and resources on the Internet.