This repository describes our reproducible framework for assessing self-supervised representation learning from speech

Last update: Aug 24, 2022

Related tags

Text Data & NLP Interspeech2021

Overview

LeBenchmark: a reproducible framework for assessing SSL from speech

Self-Supervised Learning (SSL) from large amounts of unlabeled data has been successfully explored for image and natural language processing. Recent work has also investigated SSL from speech, with notable success in improving performance on downstream tasks such as automatic speech recognition (ASR). While this work suggests it is possible to reduce dependence on labeled data for building efficient speech systems, the approaches were evaluated mostly on ASR and under multiple, heterogeneous experimental settings (most of them for English). This makes it difficult to compare SSL approaches objectively and to evaluate their impact on building speech systems.

In this repository, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It includes not only ASR tasks (high- and low-resource) but also spoken language understanding, speech translation, and emotion recognition. It also targets speech technologies in a language other than English: French. SSL models of different sizes are trained from carefully sourced and documented datasets.

Our pre-trained SSL models for French are available through this HuggingFace link: https://huggingface.co/LeBenchmark

Our benchmark tasks are available in the following directories:

ASR: Automatic Speech Recognition

SLU: Spoken Language Understanding

AER: Automatic Emotion Recognition

AST: Automatic Speech Translation

Detailed descriptions of the experiments and results are given in our paper: https://arxiv.org/pdf/2104.11462.pdf

(this page is still under construction)

Comments

AER/FeatureExtraction/DatasetHandling/Recola_46.py does not work
Hi,

I am trying to get the AER scripts working on my computer for the RECOLA dataset. I successfully created the environment from the environment.yml file and preprocessed the data with the Preprocess.py script, but I have been unable to create the data.json file with the Recola_46.py script.

Indeed, every time I launch the script, I get the following error:

    Traceback (most recent call last):
      File "./FeatureExtraction/DatasetHandling/Recola_46.py", line 1, in <module>
        from DataClasses import *
    ModuleNotFoundError: No module named 'DataClasses'

I reinstalled the dataclasses library, which has shipped with Python since 3.7, but the error did not change.

I also changed the import to lowercase (from dataclasses import * instead of from DataClasses import *), but then I got this error:

    Traceback (most recent call last):
      File "./FeatureExtraction/DatasetHandling/Recola_46.py", line 66, in <module>
        main()
      File "./FeatureExtraction/DatasetHandling/Recola_46.py", line 22, in main
        fileDict = AudioSample()
    NameError: name 'AudioSample' is not defined

I therefore believe that the script relies on a custom module, DataClasses.py, which is imported by the line below and is not present in the repository: https://github.com/LeBenchmark/Interspeech2021/blob/08bcd974c864f5a39477928a1a91d37e7635596e/AER/FeatureExtraction/DatasetHandling/Recola_46.py#L1
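
For reference, here is a minimal sketch of why the two imports behave differently. It assumes a hypothetical local DataClasses.py that defines an AudioSample class; the stand-in file written below and the helper import_local_dataclasses are illustrations only, since the real file's content is not known.

```python
import dataclasses
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Python matches module names exactly as written in the import statement,
# so "DataClasses" is a different module from the standard-library "dataclasses".
# The standard-library module does not define AudioSample, which explains the NameError.
stdlib_has_audio_sample = hasattr(dataclasses, "AudioSample")
print(stdlib_has_audio_sample)  # False

# With no DataClasses.py on the import path, the module cannot be located,
# which explains the ModuleNotFoundError.
found_before = importlib.util.find_spec("DataClasses") is not None


def import_local_dataclasses(directory: Path):
    """Import DataClasses.py from `directory` by temporarily adding it to sys.path."""
    sys.path.insert(0, str(directory))
    try:
        importlib.invalidate_caches()
        return importlib.import_module("DataClasses")
    finally:
        sys.path.remove(str(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the missing file (NOT the authors' code).
    (Path(tmp) / "DataClasses.py").write_text("class AudioSample:\n    pass\n")
    module = import_local_dataclasses(Path(tmp))
    sample = module.AudioSample()
    print(type(sample).__name__)
```

If this reading is right, the script should work once the real DataClasses.py sits next to Recola_46.py or in a directory on the import path.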

Could you please confirm this and, if possible, provide the corresponding file?
opened by clmpt 7

add results table in the task READMEs ....

Many thanks @SinaAlisamir for putting some content in the README. Maybe you could also add your results table, as @formiel did for AST. This is useful for users who may want to replicate our findings.

opened by LeBenchmark 2
