Romanian Automatic Speech Recognition from the ROBIN project

RACAI

Last update: Jan 1, 2023

Related tags

Deep Learning text-to-speech pytorch romanian automatic-speech-recognition asr deepspeech kenlm

Overview

RobinASR

This repository contains Robin's Automatic Speech Recognition (RobinASR) for the Romanian language based on the DeepSpeech2 architecture, together with a KenLM language model to imporve the transcriptions.

The pretrained text-to-speech model can be downloaded from here and the pretrained KenLM can be downloaded from here.

Also, make sure to visit:

A demo of the ASR system available in the RELATE platform: https://relate.racai.ro/index.php?path=robin/asr
A post-processing web service allowing hyphenation and basic capitalization restoration: https://github.com/racai-ai/RobinASRHyphenationCorrection

Installation

Docker

Download the pretrained text-to-speech model and the pretrained KenLM at the above links, and copy them in a models directory inside this repository.
Build the docker image using the Dockerfile. Make sure that deepspeech_pytorch/configs/inference_config.py has the desired configuration.

docker build --tag RobinASR .

Run the docker image.

docker run --gpus all -p 8888:8888 --net=host --ipc=host RobinASR

From Source

You must have Python 3.6+ and PyTorch 1.5.1+ installed in your system. Also. Cuda 10.1+ is required if you want to use the (recommended) GPU version.
Clone the repository and install its dependencies:

git clone https://github.com/racai-ai/RobinASR.git
cd RobinASR
pip3 install -r requirements.txt
pip3 install -e .

Install Nvidia Apex:

git clone --recursive https://github.com/NVIDIA/apex.git
cd apex && pip install .

If you want to use Beam Search and the KenLM language model, you must install CTCDecode:

git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install .

Inference Server

Firstly, take a look at the configuration file in deepspeech_pytorch/configs/inference_config.py and make sure that the configuration meets your requirements. Then, run the following command:

python3 server.py

Train a New Model

You must create 3 csv manifest files (train, valid and test) that contain on each line the the path to a wav file and the path to its corresponding transcription, separated by commas:

path_to_wav1,path_to_txt1
path_to_wav2,path_to_txt2
path_to_wav3,path_to_txt3
...

Then you must modify correspondingly with your configuration the file located at deepspeech_pytorch/configs/train_config.py and start training with:

python train.py

Acknowledgments

We would like to thank Sean Narnen for making his DeepSpeech2 implementation publicly-available. We used a lot of his code in our implementation.

Cite

If you are using this repository, please cite the following paper as a thank you to the authors:

Avram, A.M., Păiș, V. and Tufis, D., 2020, October. Towards a Romanian end-to-end automatic speech recognition based on Deepspeech2. In Proc. Rom. Acad. Ser. A (Vol. 21, pp. 395-402).

or in BibTeX format:

@inproceedings{avram2020towards,
  title={Towards a Romanian end-to-end automatic speech recognition based on Deepspeech2},
  author={Avram, Andrei-Marius and Păiș, Vasile and Tufiș, Dan},
  booktitle={Proceedings of the Romanian Academy, Series A},
  pages={395--402},
  year={2020}
}

Speech Recognition using DeepSpeech2.

deepspeech.pytorch Implementation of DeepSpeech2 for PyTorch using PyTorch Lightning. The repo supports training/testing and inference using the DeepS

2k Jan 4, 2023

Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition"

Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition" Pre-trained Deep Convo

5 Nov 11, 2022

A real-time speech emotion recognition application using Scikit-learn and gradio

Speech-Emotion-Recognition-App A real-time speech emotion recognition application using Scikit-learn and gradio. Requirements librosa==0.6.3 numpy sou

6 Oct 4, 2022

Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions

APSIPA-SER-with-A-and-T This code is the implementation of Speech Emotion Recognition (SER) with acoustic and linguistic features. The network model i

3 Jan 4, 2023

A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

14 Dec 2, 2022

Comments

Error 404

Hello Andrei and thanks for sharing the source code and setup instructions for RobinASR. We have successfully setup the environment with the DockerHub and have it up and running. However, we had hard time on accessing the service and using it as we are constantly getting a 404 error. We would appreciate any help regarding this matter.

As a side note, having checked out the demo page - the RobinASR works really good and is an excellent offline solution for Romanian SR. Is there a way we could commercially setup this service on our premises?

Many thanks, P

opened by pbochan 6
Training phase question ?

Will the train phase, use the "deepspeech_final.pth" model or can it be made to enhance this model ? I am asking because I just want to improve the model not build one from scratch.

Thanks

opened by danionescu0 2

Romanian Automatic Speech Recognition from the ROBIN project

Related tags

Overview

RobinASR

Installation

Docker

From Source

Inference Server

Train a New Model

Acknowledgments

Cite

You might also like...

Speech Recognition using DeepSpeech2.

Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition"

A real-time speech emotion recognition application using Scikit-learn and gradio

Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions

A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

Speech Recognition is an important feature in several applications used such as home automation, artificial intelligence

[NeurIPS 2020] Official repository for the project "Listening to Sound of Silence for Speech Denoising"

Code for the paper "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021)

Face-Recognition-Attendence-System - This face recognition Attendence system using Python

Comments

Error 404

Training phase question ?

Owner

RACAI

Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition

SpecAugmentPyTorch - A Pytorch (support batch and channel) implementation of GoogleBrain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

African language Speech Recognition - Speech-to-Text

Automatic self-diagnosis program (python required)Automatic self-diagnosis program (python required)

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

Automatic number plate recognition using tech: Yolo, OCR, Scene text detection, scene text recognation, flask, torch

PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

AI grand challenge 2020 Repo (Speech Recognition Track)

Code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition"