Reading list for research topics in sound event detection

Introduction

Sound event detection (SED) aims to process a continuous acoustic signal and convert it into symbolic descriptions of the sound events present in the auditory scene. It can be used in a variety of applications, including context-based indexing and retrieval in multimedia databases, unobtrusive monitoring in health care, and surveillance. More recently (since 2017), learning acoustic information from weak annotations, i.e. labels that name the events in a recording without marking their onset and offset times, has been formulated as a way to exploit the large amount of multimedia data available. This reading list consists of papers that use weak annotation to learn symbolic descriptions of the sound events in audio.
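
To make the weak-annotation setting concrete, here is a minimal Python sketch, not taken from the code of any paper listed here, of how a clip-level (weak) label relates to frame-level predictions. It assumes a model has already produced per-frame probabilities for one event class. The function names, the example probabilities and the 0.5 threshold are all illustrative assumptions. The three pooling functions (max, average, linear softmax) are common choices of the kind compared in the pooling papers listed further down.

```python
from typing import List, Tuple


def max_pool(frame_probs: List[float]) -> float:
    """Clip-level probability = the most confident frame."""
    return max(frame_probs)


def average_pool(frame_probs: List[float]) -> float:
    """Clip-level probability = mean over all frames."""
    return sum(frame_probs) / len(frame_probs)


def linear_softmax_pool(frame_probs: List[float]) -> float:
    """Weight each frame by its own probability: sum(p^2) / sum(p)."""
    total = sum(frame_probs)
    return sum(p * p for p in frame_probs) / total if total > 0 else 0.0


def frames_to_events(frame_probs: List[float],
                     threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Turn frame probabilities into (onset, offset) frame indices, offset exclusive."""
    events = []
    onset = None
    for i, p in enumerate(frame_probs):
        if p >= threshold and onset is None:
            onset = i                      # event starts at this frame
        elif p < threshold and onset is not None:
            events.append((onset, i))      # event ended at the previous frame
            onset = None
    if onset is not None:                  # event still active at the end of the clip
        events.append((onset, len(frame_probs)))
    return events


# Hypothetical per-frame probabilities for one class in one clip.
frame_probs = [0.1, 0.2, 0.9, 0.8, 0.1]

# During training, only this pooled value is compared with the weak label.
clip_prob = max_pool(frame_probs)
# At inference time, the frame-level output gives the localisation.
events = frames_to_events(frame_probs)
print(clip_prob, events)
```

The choice of pooling function determines which frames contribute to the clip-level prediction during training: max pooling relies on a single frame, average pooling treats every frame equally, and linear softmax sits in between.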

Papers covering multiple sub-areas are listed in each relevant section. If I missed any areas, papers, or datasets, please let me know or feel free to make a pull request.

Maintained by Soham Deshmukh

Recent Content

INTERSPEECH 2021 papers added
ICASSP 2021 papers added

Research papers

Survey papers

Sound event detection and time–frequency segmentation from weakly labelled data, TASLP 2019

Areas

Learning formulation

Weakly supervised scalable audio content analysis, ICME 2016

Audio Event Detection using Weakly Labeled Data, 24th ACM Multimedia Conference 2016

An approach for self-training audio event detectors using web data, 25th EUSIPCO 2017

A joint detection-classification model for audio tagging of weakly labelled data, ICASSP 2017

Connectionist Temporal Localization for Sound Event Detection with Sequential Labeling, ICASSP 2019

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection, ArXiv 2020

A Sequential Self Teaching Approach for Improving Generalization in Sound Event Recognition, ICML 2020

Non-Negative Matrix Factorization-Convolutional Neural Network (NMF-CNN) For Sound Event Detection, ArXiv 2020

Duration robust weakly supervised sound event detection, ICASSP 2020

SeCoST: Sequential Co-Supervision for Large Scale Weakly Labeled Audio Event Detection, ICASSP 2020

Guided Learning for Weakly-Labeled Semi-Supervised Sound Event Detection, ICASSP 2020

Unsupervised Contrastive Learning of Sound Event Representations, ICASSP 2021

Sound Event Detection Based on Curriculum Learning Considering Learning Difficulty of Events, ICASSP 2021

Comparison of Deep Co-Training and Mean-Teacher Approaches for Semi-Supervised Audio Tagging, ICASSP 2021

Enhancing Audio Augmentation Methods with Consistency Learning, ICASSP 2021

Network Architecture

Weakly-supervised audio event detection using event-specific Gaussian filters and fully convolutional networks, ICASSP 2017

Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data, NIPS Workshop on Machine Learning for Audio 2017

Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network, ICASSP 2018

Orthogonality-Regularized Masked NMF for Learning on Weakly Labeled Audio Data, ICASSP 2018

Sound event detection and time–frequency segmentation from weakly labelled data, TASLP 2019

Attention-based Atrous Convolutional Neural Networks: Visualisation and Understanding Perspectives of Acoustic Scenes, ICASSP 2019

Sound Event Detection of Weakly Labelled Data With CNN-Transformer and Automatic Threshold Optimization, TASLP 2020

DD-CNN: Depthwise Disout Convolutional Neural Network for Low-complexity Acoustic Scene Classification, ArXiv 2020

Effective Perturbation based Semi-Supervised Learning Method for Sound Event Detection, INTERSPEECH 2020

Weakly-Supervised Sound Event Detection with Self-Attention, ICASSP 2020

Improving Deep Learning Sound Events Classifiers using Gram Matrix Feature-wise Correlations, ICASSP 2021

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection, ICASSP 2021

AST: Audio Spectrogram Transformer, INTERSPEECH 2021

Event Specific Attention for Polyphonic Sound Event Detection, INTERSPEECH 2021

Pooling functions

Adaptive Pooling Operators for Weakly Labeled Sound Event Detection, TASLP 2018

Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks, INTERSPEECH 2018

A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling, ICASSP 2019

Hierarchical Pooling Structure for Weakly Labeled Sound Event Detection, INTERSPEECH 2019

Weakly labelled audioset tagging with attention neural networks, TASLP 2019

Sound event detection and time–frequency segmentation from weakly labelled data, TASLP 2019

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection, ArXiv 2020

A Global-Local Attention Framework for Weakly Labelled Audio Tagging, ICASSP 2021

Missing or noisy audio

Sound event detection and time–frequency segmentation from weakly labelled data, TASLP 2019

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection, ArXiv 2020

Improving weakly supervised sound event detection with self-supervised auxiliary tasks, INTERSPEECH 2021

Data Augmentation

SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification, INTERSPEECH 2021

Generative Learning

Acoustic Scene Generation with Conditional SampleRNN, ICASSP 2019

Representation Learning

Towards Learning a Universal Non-Semantic Representation of Speech, INTERSPEECH 2020

Contrastive Predictive Coding of Audio with an Adversary, INTERSPEECH 2020

ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection, ICASSP 2021

FRILL: A Non-Semantic Speech Embedding for Mobile Devices, INTERSPEECH 2021

Multi-Task Learning

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection, ArXiv 2020

Multi-Task Learning and post processing optimisation for sound event detection, DCASE 2019

Label-efficient audio classification through multitask learning and self-supervision, ICLR 2019

Few-Shot Learning

Few-Shot Audio Classification with Attentional Graph Neural Networks, INTERSPEECH 2019

Continual Learning of New Sound Classes Using Generative Replay, WASPAA 2019

Few-Shot Sound Event Detection, ICASSP 2020

Few-Shot Continual Learning for Audio Classification, ICASSP 2021

Unsupervised and Semi-Supervised Few-Shot Acoustic Event Classification, ICASSP 2021

Knowledge Transfer

Do sound event representations generalize to other audio tasks? A case study in audio transfer learning, INTERSPEECH 2021

Transfer learning of weakly labelled audio, WASPAA 2017

Knowledge Transfer from Weakly Labeled Audio Using Convolutional Neural Network for Sound Events and Scenes, ICASSP 2018

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition, TASLP 2020

Polyphonic SED

A first attempt at polyphonic sound event detection using connectionist temporal classification, ICASSP 2017

Polyphonic Sound Event Detection with Weak Labeling, Thesis 2018

Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy, DCASE 2019

Evaluation of Post-Processing Algorithms for Polyphonic Sound Event Detection, WASPAA 2019

Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection, TASLP 2020

Joint task

A Joint Separation-Classification Model for Sound Event Detection of Weakly Labelled Data, ICASSP 2018

A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling, INTERSPEECH 2020

Loss function

Impact of Sound Duration and Inactive Frames on Sound Event Detection Performance, ICASSP 2021

Audio and Visual

A Light-Weight Multimodal Framework for Improved Environmental Audio Tagging, ICASSP 2018

Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data, IJCAI 2020

Labelling unlabelled videos from scratch with multi-modal self-supervision, NeurIPS 2020

Audio-Visual Event Recognition Through the Lens of Adversary, ICASSP 2021

Audio and Text [Audio Captioning]

Automated audio captioning with recurrent neural networks, WASPAA 2017

Audio caption: Listen and tell, ICASSP 2018

AudioCaps: Generating captions for audios in the wild, NAACL 2019

Audio Captioning Based on Combined Audio and Semantic Embeddings, ISM 2020

Clotho: An Audio Captioning Dataset, ICASSP 2020

A Transformer-based Audio Captioning Model with Keyword Estimation, INTERSPEECH 2020

Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events, ICASSP 2021

Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags, ICASSP 2021

Strongly and Weakly labelled data

Audio event and scene recognition: A unified approach using strongly and weakly labeled data, IJCNN 2017

Others

Sound Event Detection Using Point-Labeled Data, WASPAA 2019

Dataset

| Task | Dataset | Source | Num. Files |
| --- | --- | --- | --- |
| Sound Event Classification | ESC-50 | freesound.org | 2k files |
| Sound Event Classification | DCASE17 Task 4 | YouTube videos | 2k files |
| Sound Event Classification | US8K | freesound.org | 8k files |
| Sound Event Classification | FSD50K | freesound.org | 50k files |
| Sound Event Classification | AudioSet | YouTube videos | 2M files |
| COVID-19 Detection using Coughs | DiCOVA | Volunteers recording audio via a website | 1k files |
| Few-shot Bioacoustic Event Detection | DCASE21 Task 5 | audio | 4k+ files |
| Acoustic Scene Classification | DCASE18 Task 1 | Recorded by TUT | 1.5k files |
| Various | VGG-Sound | Web videos | 200k files |
| Audio Captioning | Clotho | freesound.org | 5k files |
| Audio Captioning | AudioCaps | YouTube videos | 51k files |
| Action Recognition | UCF101 | Web videos | 13k files |
| Unlabeled | YFCC100M | Yahoo videos | 1M files |

Other audio-based datasets to consider
DCASE dataset list

Workshops/Conferences/Journals

List of past (archived) workshops and ongoing workshops, conferences, and journals:

| Venue | Link |
| --- | --- |
| Machine Learning for Audio Signal Processing, NIPS 2017 workshop | https://nips.cc/Conferences/2017/Schedule?showEvent=8790 |
| MLSP: Machine Learning for Signal Processing | https://ieeemlsp.cc/ |
| WASPAA: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics | https://www.waspaa.com |
| ICASSP: IEEE International Conference on Acoustics, Speech and Signal Processing | https://2021.ieeeicassp.org/ |
| INTERSPEECH | https://www.interspeech2021.org/ |
| IEEE/ACM Transactions on Audio, Speech and Language Processing | https://dl.acm.org/journal/taslp |
| DCASE | http://dcase.community/ |

Tutorials

Sound Event Detection: A Tutorial

Resources

Computational Analysis of Sound Scenes and Events

More

If you are interested in audio captioning, K. Drossos maintains a detailed reading list on the topic.

Owner
Soham
Applied Scientist at Microsoft