END TO END ASR TRANSFORMER
- An end-to-end speech recognition system built on the standard Transformer architecture with 6 encoder layers and 6 decoder layers.
Model
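The model follows the usual 6-layer encoder / 6-layer decoder Transformer. The exact hyper-parameters live in the training script; the sketch below only illustrates how such a model can be wired up with `torch.nn.Transformer`. The `d_model=512`, `n_mels=80`, and `vocab_size` values, and the omission of positional encoding, are simplifying assumptions rather than the project's actual settings.

```python
import torch
import torch.nn as nn


class ASRTransformer(nn.Module):
    """Sketch of a 6-encoder / 6-decoder Transformer for speech recognition.

    Hyper-parameters are illustrative assumptions; positional encoding is
    omitted for brevity but is needed in a real model.
    """

    def __init__(self, vocab_size, n_mels=80, d_model=512):
        super().__init__()
        self.feat_proj = nn.Linear(n_mels, d_model)         # spectrogram frame -> model dim
        self.tok_embed = nn.Embedding(vocab_size, d_model)  # token id -> model dim
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            dim_feedforward=2048, dropout=0.1,
        )
        self.out_proj = nn.Linear(d_model, vocab_size)      # model dim -> token logits

    def forward(self, mels, tokens):
        # mels:   (T_audio, batch, n_mels)  mel-scale spectrogram frames
        # tokens: (T_text,  batch)          decoder input token ids
        src = self.feat_proj(mels)
        tgt = self.tok_embed(tokens)
        # causal mask so each output position only attends to earlier tokens
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tokens.size(0)
        ).to(mels.device)
        out = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out_proj(out)                           # (T_text, batch, vocab_size)
```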
Instructions
- 1. Data preparation:
- Download the data yourself and arrange it in the directory structure below (a small layout sanity check is sketched after the tree):
├── data
│   ├── train
│   ├── dev
│   └── test
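The corpus itself is not bundled with the project; a quick check that the expected split directories are in place might look like the following (the `data` root path is an assumption):

```python
from pathlib import Path

# Assumed location of the downloaded corpus relative to the project root.
DATA_ROOT = Path("data")

for split in ("train", "dev", "test"):
    split_dir = DATA_ROOT / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"expected split directory is missing: {split_dir}")
    n_files = sum(1 for p in split_dir.rglob("*") if p.is_file())
    print(f"{split}: {n_files} files")
```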
- 2. Data preprocessing:
- Run `prepare_data.py` to preprocess the data. This produces the full vocabulary, a mel-scale spectrogram for each audio sample, and the token ids for each transcript (a minimal sketch of this kind of feature extraction follows below).
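The internals of `prepare_data.py` are not reproduced here; the sketch below only illustrates the kind of feature extraction and text tokenization it performs, using torchaudio. The 16 kHz sample rate, 80 mel bins, window/hop sizes, and the character-level `vocab` are assumptions.

```python
import torch
import torchaudio

# Assumed feature settings: 16 kHz audio, 25 ms window / 10 ms hop, 80 mel bins.
SAMPLE_RATE = 16000
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=400, hop_length=160, n_mels=80
)

def extract_mels(wav_path):
    """Return a (frames, n_mels) mel-scale spectrogram for one utterance."""
    waveform, sr = torchaudio.load(wav_path)
    if sr != SAMPLE_RATE:
        waveform = torchaudio.functional.resample(waveform, sr, SAMPLE_RATE)
    mels = mel_transform(waveform)   # (channels, n_mels, frames)
    return mels[0].transpose(0, 1)   # (frames, n_mels), first channel only

def text_to_ids(text, vocab):
    """Map a transcript to token ids; a character-level vocab is assumed here,
    the project may tokenize differently."""
    return torch.tensor([vocab[ch] for ch in text if ch in vocab], dtype=torch.long)
```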
- 3. Model training:
- Run `train_transformer.py --ngpus 8` to train the Transformer network. The network takes mel-scale spectrograms as input and outputs token ids (a sketch of one training step follows below).
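The real training loop, including the multi-GPU handling behind `--ngpus`, lives in `train_transformer.py`; the sketch below only illustrates one teacher-forced training step on a single device. `PAD_ID` is an assumed padding token id, and `model` / `optimizer` stand in for an instance like the `ASRTransformer` sketch above with a standard optimizer such as Adam.

```python
import torch
import torch.nn as nn

PAD_ID = 0  # assumed padding token id
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)

def train_step(model, optimizer, mels, tokens):
    """One teacher-forced step on a (spectrogram, token-id) batch.

    mels:   (T_audio, batch, n_mels) mel-scale spectrogram frames
    tokens: (T_text,  batch) token ids including <sos>, <eos>, and padding
    """
    decoder_in = tokens[:-1]          # shift right: model sees <sos> y_1 ... y_{T-1}
    target = tokens[1:]               # and must predict y_1 ... <eos>
    logits = model(mels, decoder_in)  # (T_text-1, batch, vocab)
    loss = criterion(logits.reshape(-1, logits.size(-1)), target.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```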
- 4. Model inference:
- Run `evlauate.py` to evaluate accuracy on the dev and test sets (a greedy-decoding sketch follows below).
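The evaluation script is not reproduced here; the sketch below illustrates greedy decoding of a single utterance with a model like the one in the Model section. `SOS_ID`, `EOS_ID`, and `MAX_LEN` are assumed token ids and an assumed length cap.

```python
import torch

SOS_ID, EOS_ID = 1, 2   # assumed start/end-of-sentence token ids
MAX_LEN = 200           # assumed cap on decoded length

@torch.no_grad()
def greedy_decode(model, mels):
    """Greedy decoding of one utterance; mels has shape (T_audio, 1, n_mels)."""
    model.eval()
    hyp = torch.full((1, 1), SOS_ID, dtype=torch.long)   # (T_text=1, batch=1)
    for _ in range(MAX_LEN):
        logits = model(mels, hyp)                        # (T_text, 1, vocab)
        next_id = logits[-1, 0].argmax().item()
        hyp = torch.cat([hyp, torch.tensor([[next_id]])], dim=0)
        if next_id == EOS_ID:
            break
    return hyp[1:, 0].tolist()                           # decoded token ids, <sos> dropped
```

The dev/test score is then obtained by comparing the decoded token ids (or the text they map back to) against the reference transcripts.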
Acknowledgements
References
- Ashish Vaswani et al., "Attention Is All You Need," 2017.
- Abdel-rahman Mohamed et al., "Transformers with convolutional context for ASR," arXiv, 2019.
- Albert Zeyer et al., "Improved Training of End-to-end Attention Models for Speech Recognition," Interspeech, 2018.