Code for Discriminative Sounding Objects Localization (NeurIPS 2020)

Overview

Discriminative Sounding Objects Localization

Code for our NeurIPS 2020 paper Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching (previously titled Learning to Discriminatively Localize Sounding Objects in a Cocktail-party Scenario). The code is implemented in PyTorch with Python 3.

Requirements

  • PyTorch 1.1
  • torchvision
  • scikit-learn
  • librosa
  • Pillow
  • opencv
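
These can be installed with pip, for example as below (note that opencv is published on PyPI as opencv-python, and the exact PyTorch/torchvision pins may depend on your CUDA setup):

    pip install torch torchvision scikit-learn librosa Pillow opencv-python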

Running Procedure

For experiments on MUSIC or AudioSet-instrument, the training and evaluation procedures are similar; they are respectively under the folders music-exp and audioset-instrument. Here, we take the experiments on the MUSIC dataset as an example.

Data Preparation

  • Download the dataset, e.g., MUSIC, and split it into training/validation/testing sets. Specifically, for stage-one training, please use solo_training_1.txt. For stage-two training, we use the music clips in solo_training_2.txt to synthesize the cocktail-party scenarios.

  • Extract frames at 4 fps (see the OpenCV sketch after this list) by running

    python3 data/cut_video.py
    
  • Extract 1-second audio clips and turn them into log-mel spectrograms (see the librosa sketch after this list) by running

    python3 data/cut_audio.py
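
For reference, frame extraction at 4 fps looks roughly like the OpenCV sketch below (paths and file naming are illustrative assumptions; data/cut_video.py is the authoritative version):

    import cv2
    import os

    def extract_frames(video_path, out_dir, target_fps=4):
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        native_fps = cap.get(cv2.CAP_PROP_FPS) or 25  # fall back if fps is unknown
        step = max(int(round(native_fps / target_fps)), 1)
        idx = saved = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % step == 0:  # keep roughly target_fps frames per second
                cv2.imwrite(os.path.join(out_dir, '%06d.jpg' % saved), frame)
                saved += 1
            idx += 1
        cap.release()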
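
Similarly, a minimal sketch of the log-mel extraction with librosa (the sample rate, mel-band count, and clip handling are illustrative assumptions; data/cut_audio.py defines the real parameters):

    import librosa

    def log_mel_clips(wav_path, sr=16000, n_mels=64):
        """Yield a log-mel spectrogram for each 1-second clip of an audio file."""
        y, _ = librosa.load(wav_path, sr=sr)
        for start in range(0, len(y) - sr + 1, sr):
            clip = y[start:start + sr]
            mel = librosa.feature.melspectrogram(y=clip, sr=sr, n_mels=n_mels)
            yield librosa.power_to_db(mel)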
    

The sounding object bounding box annotations on solo and duet are stored in music-exp/solotest.json and music-exp/duettest.json, and the data and annotations of the synthetic set are available at https://zenodo.org/record/4079386#.X4PFodozbb2 . The AudioSet-instrument balanced-subset bounding box annotations are in audioset-instrument/audioset_box.json.

Training

Stage one
training_stage_one.py [-h]

optional arguments:
  [--batch_size]      training batch size
  [--learning_rate]   learning rate
  [--epoch]           total number of training epochs
  [--evaluate]        run evaluation only instead of training
  [--use_pretrain]    whether to initialize from a checkpoint
  [--ckpt_file]       path of the checkpoint file to resume from
  [--use_class_task]  whether to use localization-classification alternating training
  [--class_iter]      number of classification training iterations per epoch
  [--mask]            mask threshold that separates object from background
  [--cluster]         number of clusters for discrimination

python3 training_stage_one.py
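
For instance, a stage-one run might be launched as follows (the flag values are illustrative assumptions, MUSIC having 11 instrument categories; check the script's argparse definitions for the exact formats):

    python3 training_stage_one.py --batch_size 32 --learning_rate 0.0001 --epoch 15 --use_class_task 1 --class_iter 3 --mask 0.05 --cluster 11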

After the stage-one training, we will get the cluster pseudo labels and the object dictionary of different classes in the folder ./obj_features, which are then used in the stage-two training as the category-aware object representation reference.
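
For intuition, clustering object features into pseudo labels and a center dictionary could look like the scikit-learn sketch below (the feature file name, feature shape, and cluster count are illustrative assumptions, not the repository's exact code):

    import numpy as np
    from sklearn.cluster import KMeans

    # obj_feats: (N, D) object-region features pooled from the audiovisual network
    obj_feats = np.load('obj_features/obj_feats.npy')  # hypothetical file name

    kmeans = KMeans(n_clusters=11, random_state=0).fit(obj_feats)
    pseudo_labels = kmeans.labels_               # one pseudo label per sample
    object_dictionary = kmeans.cluster_centers_  # one center representation per cluster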

Stage two
training_stage_two.py [-h]

optional arguments:
  [--batch_size]      training batch size
  [--learning_rate]   learning rate
  [--epoch]           total number of training epochs
  [--evaluate]        run evaluation only instead of training
  [--use_pretrain]    whether to initialize from a checkpoint
  [--ckpt_file]       path of the checkpoint file to resume from

python3 training_stage_two.py
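
For instance, resuming from a stage-one checkpoint might look as follows (the values and the checkpoint path are illustrative; see the tips in the Comments section below for the checkpoint naming pattern):

    python3 training_stage_two.py --batch_size 32 --learning_rate 0.0001 --epoch 20 --use_pretrain 1 --ckpt_file path/to/location_cluster_net_0xx_xxxx_av_class.pth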

Evaluation

Stage one

We first generate the localization results and save them as a .pkl file, then calculate the metrics, IoU and AUC, and also generate visualizations, by running

python3 test.py
python3 tools.py
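
For reference, the IoU between a predicted localization map and a binary ground-truth mask can be computed along the following lines (a simplified sketch; the relative 0.5 threshold is an assumption, and test.py/tools.py remain the authoritative implementation):

    import numpy as np

    def mask_iou(heatmap, gt_mask, thresh=0.5):
        """IoU between a thresholded localization map and a binary gt mask."""
        pred = heatmap >= thresh * heatmap.max()  # binarize relative to the peak
        inter = np.logical_and(pred, gt_mask).sum()
        union = np.logical_or(pred, gt_mask).sum()
        return inter / union if union > 0 else 0.0
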
Stage two

For the evaluation of stage two, i.e., class-aware sounding object localization in multi-source scenes, we first match the cluster pseudo labels generated in stage one with the ground-truth labels, so as to assign one object category to each center representation in the object dictionary, by running

python3 match_cluster.py

It is necessary to manually check that the matching between object categories and center representations is one-to-one.
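
One way to sanity-check this assignment is Hungarian matching on the cluster-vs-category co-occurrence matrix. The snippet below is a self-contained toy sketch (the label arrays are made up for illustration), not the actual match_cluster.py logic:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # toy pseudo labels from stage one and the corresponding ground-truth categories
    pseudo_labels = np.array([0, 0, 1, 1, 2, 2])
    gt_labels = np.array([2, 2, 0, 0, 1, 1])

    # co-occurrence counts between clusters (rows) and categories (columns)
    confusion = np.zeros((3, 3))
    for p, g in zip(pseudo_labels, gt_labels):
        confusion[p, g] += 1

    # maximize total agreement by minimizing the negated counts
    cluster_ids, class_ids = linear_sum_assignment(-confusion)
    mapping = {int(c): int(k) for c, k in zip(cluster_ids, class_ids)}
    print(mapping)  # {0: 2, 1: 0, 2: 1}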

Then we generate the localization results and calculate the metrics, CIoU, AUC, and NSA, by running

python3 test_stage_two.py
python3 eval.py

Results

The two tables respectively show our model's performance on single-source and multi-source scenarios.

The following figures show the category-aware localization results in multi-source scenes. Green boxes mark the sounding objects, while red boxes mark the silent ones.

Comments
  • Tips about the evaluation on MUSIC-solo dataset

    There are some tips for evaluating the model on the MUSIC-solo dataset in stage one:

    1. Train for 10–15 epochs.
    2. When evaluating the localization performance, select the checkpoint saved after the localization step, e.g., a .pth file named like "location_cluster_net_0xx_xxxx_av_local.pth".
    3. For the training of stage two, use the checkpoint saved after the classification step of stage one as the pretrained model, e.g., a .pth file named like "location_cluster_net_0xx_xxxx_av_class.pth".
    opened by echo0409 21
  • Some problems in tool.py

    I don't know how to use the function 'visualize' in tool.py, which is as follows:

        def visualize(images, cams, e):
            # images: (batchsize, h, w, 3)
            # cams:   (batchsize, 1, h, w)
            images = images.cpu().numpy()
            cams = cams.detach().cpu().numpy()
            for i in range(images.shape[0]):
                cor = 'cor' if i % 2 == 0 else 'not'
                cam = cams[i, 0]
                image = images[i]
                # recover image to 0-1
                image = image * np.array([0.229, 0.224, 0.225]) + np.array([0.485, 0.456, 0.406])
                image = np.clip(image, 0, 1)
                # generate heatmap
                cam = cam * 255
                cam = cv2.applyColorMap(cam.astype(np.uint8), cv2.COLORMAP_JET)
                cam = cam[:, :, ::-1] / 255
                plt.imsave('vis/img' + str(e) + '_' + str(i) + cor + '.jpg', 0.5 * cam + 0.5 * image)

    I don't see the function called anywhere in tool.py, and I'm confused. I would truly appreciate it if you could explain it to me!

    opened by Li-dot6789 5
  • Bounding Box Annotation for MUSIC dataset

    As mentioned in the paper, Faster RCNN was used to detect the 15 instrument categories of the AudioSet-instrument dataset, but the MUSIC dataset contains some other instruments outside those categories. How are these extra objects annotated?

    opened by krantiparida 2
  • The AUC is so low in the evaluation for stage two

    I ran training_stage_two.py for 5 epochs with learning_rate 1e-4 and batch_size 32; the training accuracy is nearly 0.875 and the evaluation accuracy is nearly 0.8. Then I ran match_cluster.py and test_stage_two.py (using the pretrained weights acquired from training_stage_two.py). After this, I ran eval.py, but the AUC is much lower than yours (about 0.05), and the visualized results are very inaccurate. I have trained many times but the AUC is always this low, and I cannot find the reason. Sorry to bother you, but could you give me some advice on this?

    I would appreciate it if you could contact me via WeChat (L18355043548) or email ([email protected]).

    opened by Li-dot6789 1
  • Inconsistent annotations in solotest.json with data

    Dear authors,

    Thank you so much for releasing the code! I encountered a problem when doing the stage-one evaluation in music-exp: the annotation names in solotest.json are not consistent with the processed data.

    For example, for the video "1Ju4S0qeDqw" in solotest.json, the original frame rate is 25 fps, so frame_116 should correspond to around the 2900th frame (116*25). In my data folder (I followed cut_video.py) this holds, but solotest.json points to the frame 3480.jpg. Based on my understanding, 3480.jpg should correspond to the 139th second of the video (3480/25 = 139.2). Other annotations are in similar situations. Can you give some tips on how to deal with this? Thanks!

    opened by Pixie412 1
  • What is the "path" in test_stage_two.py line 67?

    When I run test_stage_two.py, it needs a variable "path", but "path" is not provided by the dataloader (syn_dataset.py, line 86). I would like to ask about the meaning of "path" and how I can get it.

    opened by YenanLiu 0
  • Timestamp for Synthetic-Music Dataset

    I wonder if you could provide the timestamps of the videos that are used to synthesize the Synthetic-Music dataset. I want to test it using different audio lengths and different image scales. Thanks.

    opened by StarrySilence1 0
  • Where to download the dataset MUSIC?

    Could you please offer the website for downloading the MUSIC dataset? The name MUSIC is too generic to search for this dataset. Thank you so much.

    opened by HansonZhuang 0
  • Experiment problem with the AudioSet-instrument dataset

    Hi, thanks for your excellent work. I want to reproduce the experiments on the AudioSet-instrument dataset, but I find there are only video ids for the train and val sets. Could you share the video ids for the test set? Additionally, if convenient, could you share the AudioSet-instrument dataset with me? Looking forward to your help. Thanks a lot!

    opened by huangsiyong 1
  • Clarification about sounding object annotations

    For our NeurIPS 2021 paper, we first detected all the candidate objects that appear in the visual scene, then manually filtered out the silent ones, keeping only the sounding ones. Hence, the annotated objects in the .json file are all sounding ones.

    Great thanks to Triantafyllos for pointing this out!

    opened by DTaoo 0