Conferencing Speech Challenge

Overview

ConferencingSpeech 2021 challenge

This repository contains the dataset lists and scripts required for the ConferencingSpeech 2021 challenge. For more details about the challenge, please see our website.

Details

  • baseline, this folder contains the baseline system, including an inference model exported to ONNX and the inference scripts;

  • eval, this folder contains evaluation scripts to calculate PESQ, STOI, and SI-SNR;

  • selected_lists, the selected training speech and noise file names from aishell-1, aishell-3, librispeech-360, VCTK, MUSAN, and Audioset. Each participant is only allowed to use the selected speech and noise data below:

    • selected_lists/dev/circle.name: utterance names of the circular-array RIR waves in the dev set
    • selected_lists/dev/linear.name: utterance names of the linear-array RIR waves in the dev set
    • selected_lists/dev/non_uniform.name: utterance names of the non-uniform linear-array RIR waves in the dev set
    • selected_lists/dev/clean.name: utterance names of the clean waves used in the dev set
    • selected_lists/dev/noise.name: utterance names of the noise waves used in the dev set
    • selected_lists/train/aishell_1.name: utterance names from aishell-1 used in the train set
    • selected_lists/train/aishell_3.name: utterance names from aishell-3 used in the train set
    • selected_lists/train/librispeech_360.name: utterance names from librispeech-360 used in the train set
    • selected_lists/train/vctk.name: utterance names from VCTK used in the train set
    • selected_lists/train/audioset.name: utterance names from Audioset used in the train set
    • selected_lists/train/musan.name: utterance names from MUSAN used in the train set
    • selected_lists/train/circle.name: names of the circular-array RIR waves in the train set
    • selected_lists/train/linear.name: names of the linear-array RIR waves in the train set
    • selected_lists/train/non_uniform.name: names of the non-uniform linear-array RIR waves in the train set
  • simulation, the simulation scripts; see its ReadMe for how to use them

    • simulation/mix_wav.py: simulates the dev and train sets
    • simulation/prepare.sh: uses selected_lists/*/*.name to select the needed waves from the downloaded raw data; you can also select them with your own scripts
    • simulation/quick_select.py: quickly selects the files listed in a name list, instead of using grep -r -f
    • simulation/challenge_rirgenerator.py: the script used to simulate the RIRs of the train and dev sets
    • simulation/data/dev_circle_simu_mix.config: dev circle set simulation setup, including clean wave, noise wave, RIR wave, SNR, volume scale, and start point
    • simulation/data/dev_linear_simu_mix.config: dev linear set simulation setup, including clean wave, noise wave, RIR wave, SNR, volume scale, and start point
    • simulation/data/dev_non_uniform_linear_simu_mix.config: dev non-uniform linear set simulation setup, including clean wave, noise wave, RIR wave, SNR, volume scale, and start point
    • simulation/data/train_simu_circle.config: train circle set simulation setup, with the same fields as above; please download it from Dropbox
    • simulation/data/train_simu_linear.config: train linear set simulation setup, with the same fields as above; please download it from Dropbox
    • simulation/data/train_simu_non_uniform.config: train non-uniform linear set simulation setup, with the same fields as above; please download it from Dropbox
  • requirements.txt, the Python dependencies
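
Each line of the *_simu_mix.config files (clean wave, noise wave, RIR wave, SNR, volume scale, start point) drives one simulated mixture. The SNR, volume-scale, and start-point fields can be sketched as follows. This is a simplified single-channel illustration (the function name and exact behavior are assumptions, not the repo's API); the real simulation/mix_wav.py also convolves the signals with multi-channel RIRs:

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db, volume_scale=1.0, start=0):
    """Mix a noise segment into clean speech at a target SNR (sketch)."""
    # Take the noise segment from `start`, looping it if it is too short.
    seg = noise[start:start + len(clean)]
    if len(seg) < len(clean):
        reps = int(np.ceil(len(clean) / max(len(seg), 1)))
        seg = np.tile(seg, reps)[:len(clean)]
    # Scale the noise so the clean/noise power ratio matches snr_db.
    clean_pow = np.mean(clean ** 2)
    noise_pow = np.mean(seg ** 2) + 1e-12
    gain = np.sqrt(clean_pow / (noise_pow * 10 ** (snr_db / 10)))
    # Apply the overall volume scale to the final mixture.
    return (clean + gain * seg) * volume_scale
```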

Notes:

1. The paths in the *.config files should be replaced with the correct paths of your audio files.
2. The training config files have been released together with the challenge data.

Requirements

Python 3.6 or above

pip install -r requirements.txt

If you simulate the RIRs yourself with our scripts, you should also install:

pyrirgen

Code license

Apache 2.0

Comments
  • Generating the synth examples. Step 3 not clear.

    In simulation/README.md:

    What does step 3 mean:

    Attention to the data/[dev | train]_[linear|circle]_simu_mix.config . In the config file path should be replaced with the corresponding path.

    do we have to produce a script for replacing the paths with our own paths? If so, can you include in the repo the script you used to replace the paths, so that each participant doesn't have to write their own? (i am lazy :) ).

    opened by popcornell 4
  • The version mismatch of VCTK.

    Hi, all

    Without the VCTK in my cluster, I used the given download link to download the VCTK corpus. However, I found that the given link may not be the correct one.

    The VCTK corpus is now updated to version 0.92, which is what the links in https://github.com/ConferencingSpeech/ConferencingSpeech2021/blob/master/simulation/ReadMe.md and https://github.com/ConferencingSpeech/ConferencingSpeech2021/blob/master/ReadMe.md point to. The new version of VCTK is actually quite different from the original, widely used version 0.80 (which I believe is also used by the official baseline, inferred from the selected_lists content). The new version has two tracks of audio with _mic1.flac or _mic2.flac suffixes, while the VCTK files in selected_lists have a .wav suffix.

    So, I think maybe the correct version for the VCTK is 0.80, which could be downloaded from the link: https://datashare.ed.ac.uk/handle/10283/2651. Please check it.

    Also, the 0.80 version contains both the raw recordings and the waves with the silence removed, and they use the same file names, e.g. p376_295.wav.
    I'd also like to know which one should be used, because quick_select.py may encounter some "accident" in this part.

    Thanks.

    opened by shincling 2
  • Update ReadMe.md

    In your Dropbox these files are called train_simu_circle.config, train_simu_linear.config and train_simu_non_uniform.config. Also, these configs contain RIR waves.

    opened by vvasily 1
  • add flag to dynamic mixing; add gpus configuration.

    Hello,

    I came across some errors when running the simulation scripts; I fixed them and the scripts ran successfully.

    1. writing scp file out.
    2. adding quick select in rir preparation.
    3. fixing other mistakes.

    I also added a flag to choose whether to use dynamic mixing or not.

    I also found some errors when using multiple GPUs, so I added a configuration item in train.yml to set the GPU ids to use.

    opened by pkufool 1
  • Fix simulation files

    Hello,

    I tried to run the scripts and came across some errors that I believe are mistakes in the code (please correct me if I'm wrong). I also took the liberty of adding an argument to prepare.sh to make it more usable. I fixed the errors and the scripts ran successfully.

    I would also like you to clarify a point that is unclear to me. The Training_set.zip downloaded from Dropbox contains linear_rir.zip, non_linear_rir.zip and circle_rir.zip. Am I correct to assume that these files have to be unzipped and renamed linear, non_uniform and circle to match the expectations of prepare.sh and the selected_lists folder? And is it correct that these folders in the Training_set directory also contain the RIRs for the development test set?

    Finally, can you give an estimate of the amount of storage space needed to store the simulated data?

    opened by JorisCos 1
  • Can you use PR?

    How about using a PR to commit the sources, even for internal changes? This will make it easy for us to review the changes in the source code. (In other words, it is not easy to review the changes if you commit directly to master without a PR.)

    opened by sw005320 1
  • Cannot get the dataset

    We have already registered for the competition using an educational mailbox. However, we have not received the sharing code and do not have permission to download the data. Could you help us fix this?

    opened by JMCheng-SEU 0
  • exported by onnx and inference scripts?

    Hi, thanks for the challenge and the baseline. I saw the description "baseline, this folder contains baseline system include inference model exported by onnx and inference scripts;" But where are the ONNX model and the inference scripts, actually?

    Thanks a lot.

    opened by sailor88128 0
  • Missing data in Audioset

    Hello,

    I was trying to run the simulation with the given selected_lists, but I found that some of the IDs for Audioset are not accessible now. Below I list some of them (I haven't checked all of the sample IDs):

    HKTIe6piDOI
    M7GmqUqVQEA
    Hm20kZ7QzO0
    oz3LrVaXMb4
    6-kHUulyCog
    TGd5kPDdN_I
    IjoePLT_cFw
    dKK-JaIzwS4
    Cmhpj4MJ_hQ
    NbBM82N1Xos
    2JoJ_1agmTk
    8YIELHXpf3g
    AdLiRtpI01s
    AgVZ65Hr9rw
    4fh52mLYBYw
    KKoTQfro920
    L6DFGW6jeV8
    X61ftZ590Uc
    pK1ucosjoRo
    Lpzx6N2aCMY
    lnWP_zWFpBg
    mg2rhu_HHR0
    

    For example, if you go to https://www.youtube.com/watch?v=6-kHUulyCog, it says the video is unavailable. If you go to https://www.youtube.com/watch?v=Lpzx6N2aCMY, it says the video has become private.

    Could you release the unavailable samples in Audioset directly, or just change the selected list for Audioset?

    opened by Emrys365 6
  • Task2 : RIR file

    According to the RIR files, it seems that there is no correlation between the different microphone arrays.

    The rooms are different and the source positions are different.

    Is this useful for task 2?

    opened by cwghnu 3
  • Bugs in the simulation code

    opened by Emrys365 4