Conferencing Speech Challenge

Overview

ConferencingSpeech 2021 challenge

This repository contains the dataset lists and scripts required for the ConferencingSpeech 2021 challenge. For more details about the challenge, please see our website.

Details

  • baseline, this folder contains the baseline system, including an inference model exported to ONNX and the inference scripts;

  • eval, this folder contains evaluation scripts to calculate PESQ, STOI, and SI-SNR;

  • selected_lists, the selected training speech and noise file names from aishell-1, aishell-3, librispeech-360, VCTK, MUSAN, and Audioset. Each participant is only allowed to use the selected speech and noise data below:

    • selected_lists/dev/circle.name: utterance names of the circular-array RIR waves in the dev set
    • selected_lists/dev/linear.name: utterance names of the linear-array RIR waves in the dev set
    • selected_lists/dev/non_uniform.name: utterance names of the non-uniform linear-array RIR waves in the dev set
    • selected_lists/dev/clean.name: utterance names of the clean waves used in the dev set
    • selected_lists/dev/noise.name: utterance names of the noise waves used in the dev set
    • selected_lists/train/aishell_1.name: utterance names from aishell-1 used in the train set
    • selected_lists/train/aishell_3.name: utterance names from aishell-3 used in the train set
    • selected_lists/train/librispeech_360.name: utterance names from librispeech-360 used in the train set
    • selected_lists/train/vctk.name: utterance names from VCTK used in the train set
    • selected_lists/train/audioset.name: utterance names from Audioset used in the train set
    • selected_lists/train/musan.name: utterance names from MUSAN used in the train set
    • selected_lists/train/circle.name: names of the circular-array RIR waves in the train set
    • selected_lists/train/linear.name: names of the linear-array RIR waves in the train set
    • selected_lists/train/non_uniform.name: names of the non-uniform linear-array RIR waves in the train set
  • simulation, the simulation scripts; see its ReadMe for how to use them

    • simulation/mix_wav.py: simulates the dev and train sets
    • simulation/prepare.sh: uses selected_lists/*/*.name to select the needed waves from the downloaded raw data; you can also select them with your own scripts
    • simulation/quick_select.py: quickly selects the files listed in a name list, instead of using grep -r -f
    • simulation/challenge_rirgenerator.py: the script used to simulate the RIRs of the train and dev sets
    • simulation/data/dev_circle_simu_mix.config: dev circle set simulation setup, including clean wave, noise wave, RIR wave, SNR, volume scale, and start point
    • simulation/data/dev_linear_simu_mix.config: dev linear set simulation setup, including clean wave, noise wave, RIR wave, SNR, volume scale, and start point
    • simulation/data/dev_non_uniform_linear_simu_mix.config: dev non-uniform linear set simulation setup, including clean wave, noise wave, RIR wave, SNR, volume scale, and start point
    • simulation/data/train_simu_circle.config: train circle set simulation setup, with the same fields as above; please download it from Dropbox
    • simulation/data/train_simu_linear.config: train linear set simulation setup, with the same fields as above; please download it from Dropbox
    • simulation/data/train_simu_non_uniform.config: train non-uniform linear set simulation setup, with the same fields as above; please download it from Dropbox
  • requirements.txt, the Python dependencies
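
Each line of the *_simu_mix.config files (clean wave, noise wave, RIR wave, SNR, volume scale, start point) drives one simulated mixture. The SNR, volume-scale, and start-point fields can be sketched as follows. This is a simplified single-channel illustration (the function name and exact behavior are assumptions, not the repo's API); the real simulation/mix_wav.py also convolves the signals with multi-channel RIRs:

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db, volume_scale=1.0, start=0):
    """Mix a noise segment into clean speech at a target SNR (sketch)."""
    # Take the noise segment from `start`, looping it if it is too short.
    seg = noise[start:start + len(clean)]
    if len(seg) < len(clean):
        reps = int(np.ceil(len(clean) / max(len(seg), 1)))
        seg = np.tile(seg, reps)[:len(clean)]
    # Scale the noise so the clean/noise power ratio matches snr_db.
    clean_pow = np.mean(clean ** 2)
    noise_pow = np.mean(seg ** 2) + 1e-12
    gain = np.sqrt(clean_pow / (noise_pow * 10 ** (snr_db / 10)))
    # Apply the overall volume scale to the final mixture.
    return (clean + gain * seg) * volume_scale
```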

Notes:

1. The paths in the *.config files should be replaced with the correct paths of your audio files.
2. The training config files have been released together with the challenge data.

Requirements

Python 3.6 or above

pip install -r requirements.txt

If you simulate the RIRs yourself with our scripts, you should also install:

pyrirgen

Code license

Apache 2.0

Comments
  • Generating the synth examples. Step 3 not clear.

    In simulation/README.md:

    What does step 3 mean:

    Attention to the data/[dev | train]_[linear|circle]_simu_mix.config . In the config file path should be replaced with the corresponding path.

    do we have to produce a script for replacing the paths with our own paths? If so, can you include in the repo the script you used to replace the paths, so that each participant doesn't have to write their own? (i am lazy :) ).

    opened by popcornell 4
  • The version mismatch of VCTK.

    Hi, all

    Without the VCTK in my cluster, I used the given download link to download the VCTK corpus. However, I found that the given link may not be the correct one.

    The VCTK corpus is now updated to version 0.92, which is what the links in https://github.com/ConferencingSpeech/ConferencingSpeech2021/blob/master/simulation/ReadMe.md and https://github.com/ConferencingSpeech/ConferencingSpeech2021/blob/master/ReadMe.md point to. The new version of VCTK is actually quite different from the original, widely used version 0.80 (which I believe is also used by the official baseline, inferred from the selected_lists content). The new version has two tracks of audio with _mic1.flac or _mic2.flac suffixes, while the VCTK files in selected_lists have a .wav suffix.

    So, I think maybe the correct version for the VCTK is 0.80, which could be downloaded from the link: https://datashare.ed.ac.uk/handle/10283/2651. Please check it.

    Also, the 0.80 version contains both the raw recordings and the waves with the silence removed, and they use the same file names, e.g. p376_295.wav.
    I'd also like to know which one should be used, because quick_select.py may encounter some "accident" in this part.

    Thanks.

    opened by shincling 2
  • Update ReadMe.md

    In your Dropbox these files are called train_simu_circle.config, train_simu_linear.config and train_simu_non_uniform.config. Also, these configs contain RIR waves.

    opened by vvasily 1
  • add flag to dynamic mixing; add gpus configuration.

    Hello,

    I came across some errors when running the simulation scripts; I fixed them and the scripts ran successfully.

    1. writing scp file out.
    2. adding quick select in rir preparation.
    3. fixing other mistakes.

    I also added a flag to choose whether to use dynamic mixing or not.

    I also found some errors when using multiple GPUs, so I added a configuration item in train.yml to set the GPU ids to use.

    opened by pkufool 1
  • Fix simulation files

    Hello,

    I tried to run the scripts and came across some errors that I believe are mistakes in the code (please correct me if I'm wrong). I also took the liberty of adding an argument to prepare.sh to make it more usable. I fixed the errors and the scripts ran successfully.

    I would also like you to clarify a point that is unclear to me. The Training_set.zip downloaded from Dropbox contains linear_rir.zip, non_linear_rir.zip and circle_rir.zip. Am I correct to assume that these files have to be unzipped and renamed linear, non_uniform and circle to match the expectations of prepare.sh and the selected_lists folder? And is it correct that these folders in the Training_set directory also contain the RIRs for the development test set?

    Finally, can you give an estimate of the amount of storage space needed to store the simulated data?

    opened by JorisCos 1
  • Can you use PR?

    How about using a PR to commit the sources, even for internal changes? This will make it easy for us to review the changes in the source code. (In other words, it is not easy to review the changes if you commit directly to master without a PR.)

    opened by sw005320 1
  • Cannot get the dataset

    We have already registered for the competition using an educational mailbox. However, we have not received the sharing code and do not have permission to download the data. Could you help us fix this?

    opened by JMCheng-SEU 0
  • exported by onnx and inference scripts?

    Hi, thanks for the challenge and the baseline. I saw the description "baseline, this folder contains baseline system include inference model exported by onnx and inference scripts;" But where are the ONNX model and the inference scripts, actually?

    Thanks a lot.

    opened by sailor88128 0
  • Missing data in Audioset

    Hello,

    I was trying to run the simulation with the given selected_lists, but I found that some of the IDs for Audioset are not accessible now. Below I list some of them (I haven't checked all of the sample IDs):

    HKTIe6piDOI
    M7GmqUqVQEA
    Hm20kZ7QzO0
    oz3LrVaXMb4
    6-kHUulyCog
    TGd5kPDdN_I
    IjoePLT_cFw
    dKK-JaIzwS4
    Cmhpj4MJ_hQ
    NbBM82N1Xos
    2JoJ_1agmTk
    8YIELHXpf3g
    AdLiRtpI01s
    AgVZ65Hr9rw
    4fh52mLYBYw
    KKoTQfro920
    L6DFGW6jeV8
    X61ftZ590Uc
    pK1ucosjoRo
    Lpzx6N2aCMY
    lnWP_zWFpBg
    mg2rhu_HHR0
    

    For example, if you go to https://www.youtube.com/watch?v=6-kHUulyCog, it says the video is unavailable. If you go to https://www.youtube.com/watch?v=Lpzx6N2aCMY, it says the video has become private.

    Could you release the unavailable samples in Audioset directly, or just change the selected list for Audioset?

    opened by Emrys365 6
  • Task2 : RIR file

    According to the RIR files, it seems that there is no correlation between the different microphone arrays.

    The rooms are different and the source positions are different.

    Is this useful for task 2?

    opened by cwghnu 3
  • Bugs in the simulation code

    opened by Emrys365 4