Couple Learning for SED
This repository provides the data and source code for the sound event detection (SED) task.
The improvement brought by the Couple Learning method is verified against the dcase20-task4 baseline.
For information about dcase20-task4, please see its GitHub repository.
For information about Couple Learning, please see the paper: Couple Learning: Mean Teacher method with pseudo-labels improves semi-supervised deep learning results.
Couple Learning model
More information is available in the PLG-MT_run folder.
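The paper title names two ingredients: a Mean Teacher model and pseudo-labels. The sketch below only illustrates those two generic ideas on plain Python lists; it is not the code in PLG-MT_run and does not show how Couple Learning combines them. The function names, the `alpha` value, and the `threshold` value are assumptions chosen for illustration.

```python
def ema_update(teacher_weights, student_weights, alpha=0.999):
    """Mean Teacher style update: the teacher is an exponential moving
    average (EMA) of the student weights. `alpha` is a hypothetical value."""
    return [
        alpha * t + (1.0 - alpha) * s
        for t, s in zip(teacher_weights, student_weights)
    ]


def pseudo_labels(probabilities, threshold=0.5):
    """Turn predicted probabilities into hard 0/1 labels by thresholding.
    Thresholding is one common way to build pseudo-labels; the threshold
    here is an assumption, not a value taken from the paper."""
    return [1 if p >= threshold else 0 for p in probabilities]


if __name__ == "__main__":
    teacher = ema_update([1.0, 2.0], [0.0, 0.0], alpha=0.9)
    print(teacher)
    print(pseudo_labels([0.1, 0.5, 0.9]))
```

For the actual training procedure and hyperparameters, refer to the paper and the PLG-MT_run folder.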
Reproducing the results
See the PLG-MT_run folder.
Python >= 3.6, pytorch >= 1.0, cudatoolkit >= 9.0, pandas >= 0.24.1, scipy >= 1.2.1, pysoundfile >= 0.10.2, scaper >= 1.3.5, librosa >= 0.6.3, youtube-dl >= 2019.4.30, tqdm >= 4.31.1, ffmpeg >= 4.1, dcase_util >= 0.2.5, sed-eval >= 0.2.1, psds-eval >= 0.1.0, desed >= 1.3.0
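To check an existing environment against the minimum versions listed above, a small script along these lines may help. It is a sketch under a few assumptions: it uses `importlib.metadata`, which requires Python 3.8 or later; the distribution names in `REQUIREMENTS` (for example `torch` for pytorch and `SoundFile` for pysoundfile) are assumed pip names and may differ in a conda environment; and ffmpeg and cudatoolkit are left out because they are not Python packages.

```python
import re
from importlib import metadata  # standard library, Python 3.8+

# Minimum versions copied from the dependency list above.
# Keys are ASSUMED pip distribution names, not confirmed by this repository.
REQUIREMENTS = {
    "torch": "1.0",
    "pandas": "0.24.1",
    "scipy": "1.2.1",
    "SoundFile": "0.10.2",
    "scaper": "1.3.5",
    "librosa": "0.6.3",
    "youtube_dl": "2019.4.30",
    "tqdm": "4.31.1",
    "dcase_util": "0.2.5",
    "sed_eval": "0.2.1",
    "psds_eval": "0.1.0",
    "desed": "1.3.0",
}


def parse_version(text):
    """Turn '1.2.1' or '1.13.1+cu117' into a tuple of ints for comparison."""
    public = text.split("+")[0]  # drop a local suffix such as '+cu117'
    return tuple(int(part) for part in re.findall(r"\d+", public))


def check(requirements):
    """Return {name: status}, where status is 'ok', 'too old (...)' or 'missing'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = "ok" if ok else "too old ({})".format(installed)
    return report


if __name__ == "__main__":
    for name, status in check(REQUIREMENTS).items():
        print("{:<12} {}".format(name, status))
```

The version parser is deliberately simple and only compares the numeric components, so pre-release tags are not handled precisely.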
A simplified installation procedure is given below for a Python 3.6 based Anaconda distribution on a Linux system:
- Install Anaconda
- launch
(recommended line by line)
All the scripts to get the data (soundbank, generated, separated) are in the scripts folder, and they use Python files from data_generation.
Scripts to generate the dataset
In the scripts/ folder, you can find the different steps to:
- Download recorded data and synthetic material
- Generate synthetic soundscapes
- Reverberate synthetic data (not used in the baseline)
- Separate sources of recorded and synthetic mixtures
It is likely that you will have download issues with the real recordings. At the end of the download, please send a mail with the TSV files created in the missing_files folder.
However, if none of the audio files have been downloaded, the cause is probably an internet or proxy problem. See Desed repo or Desed_website for more info.
Base dataset
The dataset for sound event detection of DCASE2020 task 4 is composed of:
- Train:
  - weak (DESED, recorded, 1 578 files)
  - unlabel_in_domain (DESED, recorded, 14 412 files)
  - synthetic soundbank (DESED, synthetic, 2060 background (SINS only) + 1006 foreground files)
- Validation (DESED, recorded, 1 168 files):
  - test2018 (288 files)
  - eval2018 (880 files)
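As a quick sanity check on the figures above, the counts can be written down as a small table and summed. The dictionary layout and the names `RECORDED`, `SOUNDBANK`, and `total` are illustrative only; the numbers are copied from the list.

```python
# File counts copied from the dataset description above.
RECORDED = {
    "train": {
        "weak": 1578,
        "unlabel_in_domain": 14412,
    },
    "validation": {
        "test2018": 288,
        "eval2018": 880,
    },
}

# Synthetic soundbank: background (SINS only) and foreground files.
SOUNDBANK = {"background": 2060, "foreground": 1006}


def total(counts):
    """Sum the file counts of one group."""
    return sum(counts.values())


if __name__ == "__main__":
    # The two validation subsets add up to the stated 1 168 files.
    print("validation:", total(RECORDED["validation"]))
    print("recorded train:", total(RECORDED["train"]))
    print("soundbank:", total(SOUNDBANK))
```

After downloading, comparing these totals with the number of files actually present on disk is a simple way to spot missing recordings.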
Baselines dataset
SED baseline
- Train:
  - weak
  - unlabel_in_domain
  - synthetic20/soundscapes (split into train/valid, 80%/20%)
- Validation:
  - validation
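The 80%/20% train/valid split of the synthetic soundscapes can be pictured with the sketch below. It is a generic, seeded random split and not necessarily how the baseline code performs it; the function name `split_train_valid`, the `seed` parameter, and the example filenames are hypothetical.

```python
import random


def split_train_valid(filenames, valid_fraction=0.2, seed=0):
    """Return (train, valid) lists from a reproducible random split.

    Sorting first makes the result independent of the input order,
    and the fixed seed makes it repeatable between runs.
    """
    files = sorted(filenames)
    rng = random.Random(seed)
    rng.shuffle(files)
    n_valid = int(round(len(files) * valid_fraction))
    return files[n_valid:], files[:n_valid]


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    names = ["{:02d}.wav".format(i) for i in range(10)]
    train, valid = split_train_valid(names)
    print(len(train), len(valid))
```

Fixing the seed keeps the validation subset stable across experiments, which makes results easier to compare.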