PyTorch code for training MM-DistillNet for multimodal knowledge distillation

Overview

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

MM-DistillNet is a novel framework that performs multi-object detection and tracking using only ambient sound at inference time. The framework leverages our new MTA loss function, which facilitates the distillation of information from multimodal teachers (RGB, thermal, and depth) into an audio-only student network.
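
The exact MTA formulation is given in the paper; purely as an illustration of the underlying idea of aligning a student's attention maps with those of several modality teachers, a minimal PyTorch sketch (the attention definition and MSE matching here are assumptions, not the paper's loss) could look like this:

import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    # Spatial attention: channel-wise mean of squared activations, L2-normalized.
    att = feat.pow(2).mean(dim=1)              # (B, C, H, W) -> (B, H, W)
    return F.normalize(att.flatten(1), dim=1)  # (B, H*W)

def multi_teacher_attention_loss(student_feat, teacher_feats):
    # Match the student's attention to each teacher's (e.g. RGB, thermal, depth)
    # attention map; features are assumed to share the same spatial size.
    s_att = attention_map(student_feat)
    losses = [F.mse_loss(s_att, attention_map(t)) for t in teacher_feats]
    return torch.stack(losses).mean()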

Illustration of MM-DistillNet

This repository contains the PyTorch implementation of our CVPR 2021 paper There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge. The repository builds on the PyTorch-YOLOv3, Metrics, and Yet-Another-EfficientDet-Pytorch codebases.

If you find the code useful for your research, please consider citing our paper:

@inproceedings{riverahurtado2021mmdistillnet,
  title={There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge},
  author={Rivera Valverde, Francisco and Valeria Hurtado, Juana and Valada, Abhinav},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  year={2021}
}

Demo

http://rl.uni-freiburg.de/research/multimodal-distill

System Requirements

  • Linux
  • Python 3.7
  • PyTorch 1.3
  • CUDA 10.1

IMPORTANT NOTE: These requirements are not necessarily mandatory. However, we have only tested the code under the above settings and cannot provide support for other setups.
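
A quick, generic way to verify that your environment matches these versions (this check is not part of the repository):

import torch

print("torch:", torch.__version__)            # expect 1.3.x
print("built for CUDA:", torch.version.cuda)  # expect 10.1
print("CUDA available:", torch.cuda.is_available())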

Installation

a. Create a conda virtual environment.

git clone https://github.com/robot-learning-freiburg/MM-DistillNet.git
cd MM-DistillNet
conda create -n mmdistillnet_env python=3.7
conda activate mmdistillnet_env

b. Install dependencies

pip install -r requirements.txt

Prepare datasets and configure run

We also supply our large-scale multimodal dataset with over 113,000 time-synchronized frames of RGB, depth, thermal, and audio modalities, available at http://multimodal-distill.cs.uni-freiburg.de/#dataset

Please make sure the extracted data is available in a directory named data.

The binary download contains the folder structure expected by our scripts. The path where the binary was extracted must be updated in the configuration file, in this case configs/mm-distillnet.cfg.
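
As a quick sanity check that the configured path resolves, you can read the file with Python's configparser; note that the section and key names below are hypothetical illustrations, so inspect configs/mm-distillnet.cfg for the real ones:

import configparser
import os

cfg = configparser.ConfigParser()
cfg.read("configs/mm-distillnet.cfg")
# "dataset"/"data_path" are hypothetical names used only for illustration.
data_path = cfg.get("dataset", "data_path", fallback="data")
print(data_path, "exists:", os.path.isdir(data_path))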

You will also need to download our trained teacher models, available here. Kindly download these files and place them in the current directory under the name trained_models. The directory structure should look something like this:

>ls
configs/  evaluate.py  images/  LICENSE  logs/  mp3_to_pkl.py  README.md  requirements.txt  setup.cfg  src/  train.py trained_models/

>ls trained_models
LICENSE.txt              README.txt                             yet-another-efficientdet-d2-embedding.pth  yet-another-efficientdet-d2-rgb.pth
mm-distillnet.0.pth.tar  yet-another-efficientdet-d2-depth.pth  yet-another-efficientdet-d2.pth            yet-another-efficientdet-d2-thermal.pth
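
Before training, it can be worth sanity-checking that the downloaded checkpoints deserialize. A minimal sketch (the internal layout of the checkpoints is not documented here, so this only peeks at the top-level keys):

import torch

ckpt = torch.load("trained_models/mm-distillnet.0.pth.tar", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # peek at the stored entries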

Additionally, the file configs/mm-distillnet.cfg contains support for different parallelization strategies and GPU/CPU execution (using PyTorch's DataParallel and DistributedDataParallel).
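
For reference, the two strategies differ in how the model is wrapped; below is a generic PyTorch sketch (not code from this repository, which configures this via the cfg file):

import torch
import torch.nn as nn

model = nn.Linear(128, 9)  # stand-in for the student network

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model.cuda())  # single process, multiple GPUs
# DistributedDataParallel instead uses one process per GPU (launched e.g.
# via torch.distributed.launch) after initializing a process group:
#   torch.distributed.init_process_group(backend="nccl")
#   model = nn.parallel.DistributedDataParallel(model.cuda())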

Due to disk-space constraints, we provide an mp3 version of the audio files. Librosa is known to be slow with mp3 files, so we also provide an mp3-to-pickle conversion utility. The idea is that, before training, we convert each audio file to a spectrogram and store it in a pickle file.

python mp3_to_pkl.py --dir <path to the dataset>
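
Conceptually, the per-file conversion amounts to the following sketch (the spectrogram type and parameters here are illustrative assumptions; see mp3_to_pkl.py for the actual settings):

import pickle
import librosa

def mp3_to_pickle(mp3_path, pkl_path):
    y, sr = librosa.load(mp3_path, sr=None)  # decoding the mp3 is the slow part
    spec = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512)
    with open(pkl_path, "wb") as f:
        pickle.dump(spec, f)                 # cheap to reload during training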

Training and Evaluation

Training Procedure

Edit the config file appropriately in configs folder. Our best recipe is found under configs/mm-distillnet.cfg.

python train.py --config <path to your config file>
To run on the full dataset, we train our method using 4 GPUs with 2.4 GB of memory each (the expected runtime is 7 days). After training, the best model is stored as best.pth.tar under the experiment output path. This file can be used to evaluate the performance of the model.

Evaluation Procedure

Evaluate the performance of the model (Our best model can be found under trained_models/mm-distillnet.0.pth.tar):

python evaluate.py --config <path to your config file> --checkpoint <path to your checkpoint file>

Results

The evaluation results of our method, after Bayesian optimization, are listed below (more details can be found in the paper):

| Method | KD | mAP@Avg | mAP@0.5 | mAP@0.75 | CDx | CDy |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| StereoSoundNet [4] | RGB | 44.05 | 62.38 | 41.46 | 3.00 | 2.24 |
| MM-DistillNet | RGB | 61.62 | 84.29 | 59.66 | 1.27 | 0.69 |

Pre-Trained Models

Our best pre-trained model can be found at the dataset installation path.

Acknowledgements

We have used utility functions from other open-source projects. We especially thank the authors of:

  • PyTorch-YOLOv3
  • Metrics
  • Yet-Another-EfficientDet-Pytorch

Contacts

License

For academic usage, the code is released under the GPLv3 license. For any commercial purpose, please contact the authors.

Comments
  • Question in Evaluate

    Question in Evaluate

    Hello there! First of all, thank you for your outstanding work! I have a problem when reproducing your work.

    I used the following command to evaluate: python evaluate.py --config configs/mm-distillnet.cfg --checkpoint trained_models/mm-distillnet.0.pth.tar

    But I get bad performance. Can you help me figure out how to improve it?

    Thanks!

    opened by muzhaohui 7
  • Question in Train

    Question in Train

    Hello there! First of all, thank you for your outstanding work! I have a problem when reproducing your work.

    I use the model and the config you provided for training, but the results are very poor.

    It stopped at epoch 21.

    mAP is only 48. Is there a difference between the dataset you used and the one provided? Because evaluating with the model you provided (distillnet.0.pth.tar) is even worse than this!

    So could the best model you provided be the wrong one?

    opened by muzhaohui 6
  • Question about dataset structure

    Question about dataset structure

    Hello.

    Thank you so much for this dataset, it is very large and well thought out!

    I have a question about the structure of the dataset. The audio files are supposed to be of the form audio/audio_<mic_number_from_0_to_7>_<timestamp>.mp3.

    When I untar the audio directories, most files follow this pattern, but some are of the form audio/audio_<mic_number_from_0_to_7>_<timestamp>_<extra_number>.mp3, where there is another number after the timestamp.

    For example in /drive_day_2020_04_14_15_56_26/audio there is audio_0_1586873154_433877998_1.mp3 and audio_0_1586873154_433877998_4.mp3 and when I diff them, they seem to be the same file.

    Why is this the case? Can I just ignore all but one when processing the audio?

    Thanks!

    opened by drydenwiebe 3
  • question in Evaluate

    question in Evaluate

    Hello there!

    First of all, thank you for your outstanding work! I have a problem when reproducing your work.

    Your GT is generated through the teacher network, so when the teacher network's performance changes, the GT changes accordingly. Do you have a more accurate GT? Or can you tell me how to measure the performance of the student model more accurately?

    Thanks!

    opened by muzhaohui 1
  • Something wrong with the dataset download path

    Something wrong with the dataset download path

    I want to download the dataset, but an error occurred: HTTP request sent, awaiting response... 502 Bad Gateway. 2021-05-20 00:49:02 ERROR 502: Bad Gateway.

    The URL of the dataset, http://multimodal-distill.cs.uni-freiburg.de/#dataset, also cannot be opened.

    Could you fix this problem?

    opened by zhouweii234 1
  • Source codes and datasets are missing

    Source codes and datasets are missing

    Hi,

    I have tried to run the following code: python train.py --config ./configs

    Then, got the following error: ModuleNotFoundError: No module named 'src.fullcnn_net'.

    After checking utils.py, I feel the following source files are missing:

    src.fullcnn_net, src.loss.ABLoss, src.loss.MTALoss, src.loss.KLLoss, src.loss.GroupAttentionLoss, src.loss.MultiTeacherPairWiseSimilarityLoss, src.loss.MultiTeacherContrastiveAttentionLoss, src.loss.MultiTeacherTrippletAttentionLoss, src.loss.CRDLoss, src.loss.NSTLoss, src.loss.PKTLoss, src.loss.SimilarityLoss, src.loss.RankingLoss

    Also, I am not sure how to download/access the datasets. I do not see any binary files downloaded inside the folder.

    Thanks for your help.

    opened by as4mz 1
  • Microphone array configurations

    Microphone array configurations

    Thanks for the impressive results. I have some problems on the microphone array.

    1. What is the separation among the different microphones?
    2. Are they co-located with all the other sensors? Will the position affect the model much?
    opened by sunwell1994 0
  • Could not unzip dataset

    Could not unzip dataset

    Thank you very much for sharing the dataset. I found it really interesting to try this project. However, I have an issue with the shared dataset.

    After downloading the 84 files and joining them into mavd_dataset.tar.gz using the command cat mavd_dataset.tar.gz.part-* > mavd_dataset.tar.gz, I could not extract it, even though I tried several options:

    1. gzip -d mavd_dataset.tar.gz gzip: mavd_dataset.tar.gz: not in gzip format

    2. tar -xzvf mavd_dataset.tar.gz gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error is not recoverable: exiting now

    opened by tuantdang 0
  • About train

    About train

    Thank you very much for your dataset and code. I encountered this problem when training the model:

    Traceback (most recent call last):
      File "F:/py_pro/MM-DistillNet-main/sec/optimization/train_methods.py", line 318
        logits_s, features_s = self.student_model(audio)
      File "D:\ProgramData\Anaconda3\envs\MM-DistillNet-main\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "F:\py_pro\MM-DistillNet-main\src\YetAnotherEfficientDet.py", line 670, in forward
        _, p3, p4, p5 = self.backbone_net(inputs)
      File "D:\ProgramData\Anaconda3\envs\MM-DistillNet-main\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "F:\py_pro\MM-DistillNet-main\src\YetAnotherEfficientDet.py", line 556, in forward
        x = self.model._conv_stem(x)
      File "D:\ProgramData\Anaconda3\envs\MM-DistillNet-main\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "F:\py_pro\MM-DistillNet-main\src\YetAnotherEfficientNet.py", line 54, in forward
        x = F.pad(x, [left, right, top, bottom])
      File "D:\ProgramData\Anaconda3\envs\MM-DistillNet-main\lib\site-packages\torch\nn\functional.py", line 3998, in _pad
        assert len(pad) // 2 <= input.dim(), "Padding length too large"
    RuntimeError: Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.FloatTensor) should be the same.

    I can't solve this problem. Did I make an error in processing the audio files?
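
    This error usually means the input tensor is float64 (e.g. a NumPy spectrogram converted with torch.from_numpy) while the model weights are float32. A minimal sketch of the likely fix, assuming that is the cause here, is to cast the input before the forward pass:

    import numpy as np
    import torch

    spec = np.random.rand(1, 1, 64, 64)       # NumPy arrays are float64 by default
    x = torch.from_numpy(spec)                # -> torch.DoubleTensor
    model = torch.nn.Conv2d(1, 8, kernel_size=3)
    out = model(x.float())                    # cast to float32 to match the weights;
                                              # without .float() this raises the same
                                              # dtype-mismatch RuntimeError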

    opened by liushibei 0
  • About loading the data

    About loading the data

    Hello, I have downloaded the data you provided and put it under the "data" folder as "mavd_dataset.tar.gz". However, when running train.py I encountered the issue detailed below. I wonder what "train_all.txt" is. Do I need to uncompress the archive or do anything else?

    Traceback (most recent call last):
      File "train.py", line 316, in <module>
        train_multimodal_detection(config)
      File "train.py", line 149, in train_multimodal_detection
        mode="train",
      File "/home/jcli/MM-DistillNet/src/datasets/MultimodalDetection.py", line 92, in __init__
        super().__init__(config=config, mode=mode, classes=self.classes)
      File "/home/jcli/MM-DistillNet/src/datasets/BaseDataset.py", line 106, in __init__
        self.ids = self.get_id_list()
      File "/home/jcli/MM-DistillNet/src/datasets/MultimodalDetection.py", line 111, in get_id_list
        self.ids = [id.strip() for id in open(id_list_path)]
    FileNotFoundError: [Errno 2] No such file or directory: 'data/train_all.txt'

    opened by Frankie123421 0
  • Issue with mp3_to_pkl.py

    Issue with mp3_to_pkl.py

    Hi, first I must thank you for the great work and for making the dataset available. I was trying to convert the dataset to pkl, and at about 24% I got an error. I am not sure how to fix this.

    Appreciate any help in this.

    opened by eyeris 0
  • Is the code for evaluating the tracking performance available?

    Is the code for evaluating the tracking performance available?

    I think evaluate.py evaluates the object detection performance. I could not find the code for evaluating tracking performance, such as ID switches, MOTA, and MOTP.

    opened by KawhiZhao 0
  • The thermal teacher

    The thermal teacher

    Hi,

    Thanks for sharing this great work!

    I just met some problems with the thermal teacher. When I load the thermal teacher model, I found that most of the weights were not updated from the initialized EfficientDet. The evaluate.log output is:

    using path=trained_models/yet-another-efficientdet-d2-thermal.pth ModelDict Update:174/1076

    I also found that the thermal teacher's batch_labels is always empty.

    So can you teach me how to load the thermal teacher model correctly?

    Thank you very much!

    opened by LE0J-Song 0