MusCaps: Generating Captions for Music Audio
Ilaria Manco1 2, Emmanouil Benetos1, Elio Quinton2, Gyorgy Fazekas1
1 Queen Mary University of London, 2 Universal Music Group
This repository is the official implementation of "MusCaps: Generating Captions for Music Audio" (IJCNN 2021). In this work, we propose an encoder-decoder model to generate natural language descriptions of music audio. We provide code to train our model on any dataset of (audio, caption) pairs, together with code to evaluate the generated descriptions on a set of automatic metrics (BLEU, METEOR, ROUGE, CIDEr, SPICE, SPIDEr).
Setup
The code was developed in Python 3.7 on Linux CentOS 7 and training was carried out on an RTX 2080 Ti GPU. Other GPUs and platforms have not been fully tested.
Clone the repo
git clone https://github.com/ilaria-manco/muscaps
cd muscaps
You'll need to have the libsndfile library installed. All other requirements, including the code package, can be installed with
pip install -r requirements.txt
pip install -e .
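As a quick sanity check that the installation worked (this assumes the package is importable as muscaps and that the soundfile bindings, installed with the requirements, can locate libsndfile):
# Optional sanity check: both imports should succeed if the setup above worked
python -c "import soundfile, muscaps; print('setup OK')"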
Project structure
root
├── configs                      # Config files
│   ├── datasets
│   ├── models
│   └── default.yaml
├── data                         # Folder to save data (input data, pretrained model weights, etc.)
│   ├── audio_encoders
│   ├── datasets
│   │   └── dataset_name
│   └── ...
├── muscaps
│   ├── caption_evaluation_tools # Translation metrics eval on audio captioning
│   ├── datasets                 # Dataset classes
│   ├── models                   # Model code
│   ├── modules                  # Model components
│   ├── scripts                  # Python scripts for training, evaluation etc.
│   ├── trainers                 # Trainer classes
│   └── utils                    # Utils
└── save                         # Saved model checkpoints, logs, configs, predictions
    └── experiments
        ├── experiment_id1
        └── ...
Dataset
The dataset used in our experiments is private and cannot be shared, but details on how to prepare an equivalent music captioning dataset are provided in the data README.
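Purely as an illustration of what an (audio, caption) pairs dataset could look like (the folder layout, file names and fields below are hypothetical; the data README is the authoritative reference):
# Hypothetical example layout under data/datasets/ (illustration only)
mkdir -p data/datasets/my_dataset/audio
# my_dataset/
# ├── audio/           # one audio file per track, e.g. track_001.mp3
# └── captions.json    # list of {"audio": <path>, "caption": <text>} pairs
# Example captions.json entry (hypothetical):
#   {"audio": "audio/track_001.mp3", "caption": "A mellow acoustic guitar ballad with soft vocals."}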
Pre-trained audio feature extractors
For the audio feature extraction component, MusCaps uses CNN-based audio tagging models such as musicnn. In our experiments, we use @minzwon's implementation and pre-trained models, which you can download from the official repo. For example, to obtain the weights for the HCNN model trained on the MagnaTagATune dataset, run the following commands:
mkdir data/audio_encoders
cd data/audio_encoders/
wget https://github.com/minzwon/sota-music-tagging-models/raw/master/models/mtat/hcnn/best_model.pth
mv best_model.pth mtt_hcnn.pth
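To check the download, you can try loading the file as a PyTorch checkpoint (a quick sanity check, assuming torch is already installed via requirements.txt and that you are still in data/audio_encoders/):
# Optional: confirm the checkpoint can be deserialised by PyTorch
python -c "import torch; print(type(torch.load('mtt_hcnn.pth', map_location='cpu')))"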
Training
Dataset, model and training configurations are set in the respective yaml files in configs. Some of the fields can be overridden by arguments in the CLI (for more details on this, refer to the training script).
To train the model with the default configs, simply run
cd muscaps/scripts/
python train.py <baseline/attention> --feature_extractor <musicnn/hcnn> --pretrained_model <msd/mtt> --device_num <gpu_number>
This will generate an experiment_id and create a new folder in save/experiments where the output will be saved.
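For example, to train the attention model with the HCNN feature extractor pretrained on MagnaTagATune (the checkpoint downloaded above) on GPU 0:
python train.py attention --feature_extractor hcnn --pretrained_model mtt --device_num 0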
If you wish to resume training from a saved checkpoint, run
python train.py <baseline/attention> --experiment_id <experiment_id> --device_num <gpu_number>
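For example, with a made-up experiment ID (substitute the one printed by your own training run):
# "attention_exp_01" is a hypothetical ID used here for illustration
python train.py attention --experiment_id attention_exp_01 --device_num 0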
Evaluation
To evaluate a model saved under <experiment_id> on the captioning task, run
cd muscaps/scripts/
python caption.py <experiment_id> --metrics True
Cite
@misc{manco2021muscaps,
title={MusCaps: Generating Captions for Music Audio},
author={Ilaria Manco and Emmanouil Benetos and Elio Quinton and Gyorgy Fazekas},
year={2021},
eprint={2104.11984},
archivePrefix={arXiv}
}
Acknowledgements
This repo reuses some code from the following repos:
- sota-music-tagging-models by @minzwon
- caption-evaluation-tools by @audio-captioning
- mmf by @facebookresearch
- a-PyTorch-Tutorial-to-Image-Captioning by @sgrvinod
- allennlp by @allenai
Contact
If you have any questions, please get in touch: [email protected].