Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

Yi-Chang Chen

Last update: Aug 23, 2022

Related tags

Overview

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

| paper | dataset | pretrained detection model |

Authors: Yi-Chang Chen, Chun-Yen Cheng, Chien-An Chen, Ming-Chieh Sung and Yi-Ren Yeh

Due to the recent advances of natural language processing, several works have applied the pre-trained masked language model (MLM) of BERT to the post-correction of speech recognition. However, existing pre-trained models only consider the semantic correction while the phonetic features of words is neglected. The semantic-only post-correction will consequently decrease the performance since homophonic errors are fairly common in Chinese ASR. In this paper, we proposed a novel approach to collectively exploit the contextualized representation and the phonetic information between the error and its replacing candidates to alleviate the error rate of Chinese ASR. Our experiment results on real world speech recognition datasets showed that our proposed method has evidently lower CER than the baseline model, which utilized a pre-trained BERT MLM as the corrector.

Honors

Our paper won the best paper of ROCLING 2021.

Getting Started

Dependency

This work was tested with PyTorch 1.7.0, CUDA 10.1, python 3.6 and Ubuntu 16.04.
requirements : requirements.txt

pip install -r requirements.txt

Download pretrained model

Download pretrained detection model on AISHELL3: https://storage.googleapis.com/esun-ai/bert_detection.zip

mkdir saved_models
cd saved_models
wget https://storage.googleapis.com/esun-ai/bert_detection.zip
unzip bert_detection.zip
cd ..

Test Phonetic MLM

python src/test_phonetic_mlm.py --config configs/config_phonetic_mlm.py --json data/aishell3_test.json

Inference Phonetic MLM

python src/predict_phonetic_mlm.py --config configs/config_phonetic_mlm.py --text_path misc/demo.txt

Train Your Own Detection Model

Train BERT detection model

python src/train_typo_detector.py --config configs/config_detect.py

Test BERT detection model

python src/test_typo_detector.py --config configs/config_detect.py --checkpoint saved_models/bert_detection/best_f1.pth --json data/aishell3_test.json

Inference BERT detection model

python src/predict_typo_detector.py --config configs/config_detect.py --checkpoint saved_models/bert_detection/best_f1.pth --text_path misc/demo.txt

Citation

Please consider citing this work in your publications if it helps your research.

@inproceedings{chen-etal-2021-integrated,
    title = "Integrated Semantic and Phonetic Post-correction for {C}hinese Speech Recognition",
    author = "Chen, Yi-Chang and Cheng, Chun-Yen and Chen, Chien-An and Sung, Ming-Chieh and Yeh, Yi-Ren",
    booktitle = "Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)",
    month = oct,
    year = "2021",
    address = "Taoyuan, Taiwan",
    publisher = "The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)",
    url = "https://aclanthology.org/2021.rocling-1.13",
    pages = "95--102",
    abstract = "Due to the recent advances of natural language processing, several works have applied the pre-trained masked language model (MLM) of BERT to the post-correction of speech recognition. However, existing pre-trained models only consider the semantic correction while the phonetic features of words is neglected. The semantic-only post-correction will consequently decrease the performance since homophonic errors are fairly common in Chinese ASR. In this paper, we proposed a novel approach to collectively exploit the contextualized representation and the phonetic information between the error and its replacing candidates to alleviate the error rate of Chinese ASR. Our experiment results on real world speech recognition datasets showed that our proposed method has evidently lower CER than the baseline model, which utilized a pre-trained BERT MLM as the corrector.",
}

The deployment framework aims to provide a simple, lightweight, fast integrated, pipelined deployment framework that ensures reliability, high concurrency and scalability of services.

savior是一个能够进行快速集成算法模块并支持高性能部署的轻量开发框架。能够帮助将团队进行快速想法验证（PoC），避免重复的去github上找模型然后复现模型；能够帮助团队将功能进行流程拆解，很方便的提高分布式执行效率；能够有效减少代码冗余，减少不必要负担。

125 Dec 22, 2022

PyDeepFakeDet is an integrated and scalable tool for Deepfake detection.

PyDeepFakeDet An integrated and scalable library for Deepfake detection research. Introduction PyDeepFakeDet is an integrated and scalable Deepfake de

49 Dec 11, 2022

ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library ERISHA is a multilingual multispeaker expressive speech synthesis framework. It ca

43 Nov 27, 2022

ivadomed is an integrated framework for medical image analysis with deep learning.

Repository on the collaborative IVADO medical imaging project between the Mila and NeuroPoly labs.

144 Dec 19, 2022

Discretized Integrated Gradients for Explaining Language Models (EMNLP 2021)

Discretized Integrated Gradients for Explaining Language Models (EMNLP 2021) Overview of paths used in DIG and IG. w is the word being attributed. The

17 Oct 27, 2022

Official PyTorch implementation of "AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks"

AASIST This repository provides the overall framework for training and evaluating audio anti-spoofing systems proposed in 'AASIST: Audio Anti-Spoofing

56 Jan 2, 2023

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

Related tags

Overview

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

Honors

Getting Started

Dependency

Download pretrained model

Test Phonetic MLM

Inference Phonetic MLM

Train Your Own Detection Model

Train BERT detection model

Test BERT detection model

Inference BERT detection model

Citation

You might also like...

The deployment framework aims to provide a simple, lightweight, fast integrated, pipelined deployment framework that ensures reliability, high concurrency and scalability of services.

PyDeepFakeDet is an integrated and scalable tool for Deepfake detection.

ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

ivadomed is an integrated framework for medical image analysis with deep learning.

Discretized Integrated Gradients for Explaining Language Models (EMNLP 2021)

Official PyTorch implementation of "AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks"

Examples of using f2py to get high-speed Fortran integrated with Python easily

A lane detection integrated Real-time Instance Segmentation based on YOLACT (You Only Look At CoefficienTs)

Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition

Owner

Yi-Chang Chen

Phonetic PosteriorGram (PPG)-Based Voice Conversion (VC)

Release of SPLASH: Dataset for semantic parse correction with natural language feedback in the context of text-to-SQL parsing

African language Speech Recognition - Speech-to-Text

Chinese Mandarin tts text-to-speech 中文 (普通话) 语音合成 , by fastspeech 2 , implemented in pytorch, using waveglow as vocoder,

Chinese clinical named entity recognition using pre-trained BERT model

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes (CVPR2021)

[AAAI22] Reliable Propagation-Correction Modulation for Video Object Segmentation

A simple tutoral for error correction task, based on Pytorch

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

Related tags

Overview

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

Honors

Getting Started

Dependency

Download pretrained model

Test Phonetic MLM

Inference Phonetic MLM

Train Your Own Detection Model

Train BERT detection model

Test BERT detection model

Inference BERT detection model

Citation

You might also like...

The deployment framework aims to provide a simple, lightweight, fast integrated, pipelined deployment framework that ensures reliability, high concurrency and scalability of services.

PyDeepFakeDet is an integrated and scalable tool for Deepfake detection.

ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

ivadomed is an integrated framework for medical image analysis with deep learning.

Discretized Integrated Gradients for Explaining Language Models (EMNLP 2021)

Official PyTorch implementation of "AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks"

Examples of using f2py to get high-speed Fortran integrated with Python easily

A lane detection integrated Real-time Instance Segmentation based on YOLACT (You Only Look At CoefficienTs)

Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition

Owner

Yi-Chang Chen

Phonetic PosteriorGram (PPG)-Based Voice Conversion (VC)

Release of SPLASH: Dataset for semantic parse correction with natural language feedback in the context of text-to-SQL parsing

African language Speech Recognition - Speech-to-Text

Chinese Mandarin tts text-to-speech 中文 (普通话) 语音 合成 , by fastspeech 2 , implemented in pytorch, using waveglow as vocoder,

Chinese clinical named entity recognition using pre-trained BERT model

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes (CVPR2021)

[AAAI22] Reliable Propagation-Correction Modulation for Video Object Segmentation

A simple tutoral for error correction task, based on Pytorch

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

Chinese Mandarin tts text-to-speech 中文 (普通话) 语音合成 , by fastspeech 2 , implemented in pytorch, using waveglow as vocoder,