Self-Supervised Contrastive Learning of Music Spectrograms

Overview

Self-Supervised Music Analysis

Self-Supervised Contrastive Learning of Music Spectrograms

Dataset

Songs on the Billboard Year End Hot 100 were collected from the years 1960-2020. This list tracks the top songs of the US market for a given calendar year based on aggregating metrics including streaming plays, physical and digital purchases, radio plays, etc. In total the dataset includes 5737 songs, excluding some songs which could not be found and some which are duplicates across multiple years. It’s worth noting that the types of songs that are able to make it onto this sort of list represent a very narrow subset of the overall variety of the US music market, let alone the global music market. So while we can still learn some interesting things from this dataset, we shouldn’t mistake it for being representative of music in general.

Raw audio files were processed into spectrograms using a synchrosqueeze CWT algorithm from the ssqueezepy python library. Some additional cleaning and postprocessing was done and the spectrograms were saved as grayscale images. These images are structured so that the Y axis which spans 256 pixels represents a range of frequencies from 30Hz – 12kHz with a log scale. The X axis represents time with a resolution of 200 pixels per second. Pixel intensity therefore encodes the signal energy at a particular frequency at a moment in time.

The full dataset can be found here: https://www.kaggle.com/tpapp157/billboard-hot-100-19602020-spectrograms

Model and Training

A 30 layer ResNet styled CNN architecture was used as the primary feature extraction network. This was augmented with learned position embeddings along the frequency axis inserted at regular block intervals. Features were learned in a completely self-supervised fashion using Contrastive Learning. Matched pairs were taken as random 256x1024 pixel crops (corresponding to ~5 seconds of audio) from each song with no additional augmentations.

Output feature vectors have 512 channels representing a 64 pixel span (~0.3 seconds of audio).

Results

The entirety of each song was processed via the feature extractor with the resulting song matrix averaged across the song length into a single vector. UMAP is used for visualization and HDBSCAN for cluster extraction producing the following plot:

Each color represents a cluster (numbered 0-16) of similar songs based on the learned features. Immediately we can see a very clear structure in the data, showing the meaningful features have been learned. We can also color the points by year of release:

Points are colored form oldest (dark) to newest (light). As expected, the distribution of music has changed over the last 60 years. This gives us some confidence that the learned features are meaningful but let’s try a more specific test. A gradient boosting regressor model is trained on the learned features to predict the release year of a song.

The model achieves an overall mean absolute error of ~6.2 years. The violin and box plots show the distribution of predictions for songs in each year. This result is surprisingly good considering we wouldn’t expect a model get anywhere near perfect. The plot shows some interesting trends in how the predicted median and overall variance shift from year to year. Notice, for example, the high variance and rapid median shift across the years 1990 to 2000 compared to the decades before and after. This hints at some potential significant changes in the structure of music during this decade. Those with a knowledge of modern musical history probably already have some ideas in mind. Again, it’s worth noting that this dataset represents generically popular music which we would expect to lag behind specific music trends (probably by as much as 5-10 years).

Let’s bring back the 17 clusters that were identified previously and look at the distribution of release years of songs in each cluster. The black grouping labeled -1 captures songs which were not strongly allocated to any particular cluster and is simply included for completeness.

Here again we see some interesting trends of clusters emerging, peaking, and even dying out at various points in time. Aligning with out previous chart, we see four distinct clusters (7, 10, 11, 12) die off in the 90s while two brand new clusters (3, 4) emerge. Other clusters (8, 9, 15), interestingly, span most or all of the time range.

We can also look at the relative allocation of songs to clusters by year to get a better sense of the overall size of each cluster.

Cluster Samples

So what exactly are these clusters? I’ve provided links below to ten representative songs from each cluster so you can make your own qualitative evaluation. Before going further and listening to these songs I want to encourage you loosen your preconceived notions of musical genre. Popular conception of musical genres typically includes non-musical aspects like lyrics, theme, particular instruments, artist demographics, singer accent, year of release, marketing, etc. These aspects are not captured in the dataset and therefore not represented below but with an open ear you may find examples of songs that you considered to be different genres are actually quite musically similar.

Cluster 0

Cluster 1

Cluster 2

Cluster 3

Cluster 4

Cluster 5

Cluster 6

Cluster 7

Cluster 8

Cluster 9

Cluster 10

Cluster 11

Cluster 12

Cluster 13

Cluster 14

Cluster 15

Cluster 16

You might also like...
This repository is the official implementation of Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning (NeurIPS21).
This repository is the official implementation of Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning (NeurIPS21).

Core-tuning This repository is the official implementation of ``Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regular

[CVPR 2021]
[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models Codes for this paper The Lottery Tickets Hypo

 Patch Rotation: A Self-Supervised Auxiliary Task for Robustness and Accuracy of Supervised Models
Patch Rotation: A Self-Supervised Auxiliary Task for Robustness and Accuracy of Supervised Models

Patch-Rotation(PatchRot) Patch Rotation: A Self-Supervised Auxiliary Task for Robustness and Accuracy of Supervised Models Submitted to Neurips2021 To

PyTorch implementation of ECCV 2020 paper "Foley Music: Learning to Generate Music from Videos "

Foley Music: Learning to Generate Music from Videos This repo holds the code for the framework presented on ECCV 2020. Foley Music: Learning to Genera

Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

Noise Contrastive Estimation for pyTorch Overview This repository contains a re-implementation of the Noise Contrastive Estimation algorithm, implemen

[ICML 2021] “ Self-Damaging Contrastive Learning”, Ziyu Jiang, Tianlong Chen, Bobak Mortazavi, Zhangyang Wang
[ICML 2021] “ Self-Damaging Contrastive Learning”, Ziyu Jiang, Tianlong Chen, Bobak Mortazavi, Zhangyang Wang

Self-Damaging Contrastive Learning Introduction The recent breakthrough achieved by contrastive learning accelerates the pace for deploying unsupervis

Code for CoMatch: Semi-supervised Learning with Contrastive Graph Regularization

CoMatch: Semi-supervised Learning with Contrastive Graph Regularization (Salesforce Research) This is a PyTorch implementation of the CoMatch paper [B

PyTorch implementation of
PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally)

PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally)

Supervised Contrastive Learning for Downstream Optimized Sequence Representations
Supervised Contrastive Learning for Downstream Optimized Sequence Representations

SupCL-Seq 📖 Supervised Contrastive Learning for Downstream Optimized Sequence representations (SupCS-Seq) accepted to be published in EMNLP 2021, ext

Owner
null
The Self-Supervised Learner can be used to train a classifier with fewer labeled examples needed using self-supervised learning.

Published by SpaceML • About SpaceML • Quick Colab Example Self-Supervised Learner The Self-Supervised Learner can be used to train a classifier with

SpaceML 92 Nov 30, 2022
Saeed Lotfi 28 Dec 12, 2022
Dense Contrastive Learning (DenseCL) for self-supervised representation learning, CVPR 2021.

Dense Contrastive Learning for Self-Supervised Visual Pre-Training This project hosts the code for implementing the DenseCL algorithm for se

Xinlong Wang 491 Jan 3, 2023
A PyTorch implementation of "Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning", IJCAI-21

MERIT A PyTorch implementation of our IJCAI-21 paper Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning. Depen

Graph Analysis & Deep Learning Laboratory, GRAND 32 Jan 2, 2023
Implementation for paper "Towards the Generalization of Contrastive Self-Supervised Learning"

Contrastive Self-Supervised Learning on CIFAR-10 Paper "Towards the Generalization of Contrastive Self-Supervised Learning", Weiran Huang, Mingyang Yi

Weiran Huang 13 Nov 30, 2022
Official PyTorch implementation of "Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Recognition" in AAAI2022.

AimCLR This is an official PyTorch implementation of "Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Reco

Gty 44 Dec 17, 2022
PyTorch implementation of DirectCLR from paper Understanding Dimensional Collapse in Contrastive Self-supervised Learning

DirectCLR DirectCLR is a simple contrastive learning model for visual representation learning. It does not require a trainable projector as SimCLR. It

Meta Research 49 Dec 21, 2022
Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

UniSpeech The family of UniSpeech: UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR UniSpeech-

Microsoft 282 Jan 9, 2023
PyTorch implementation of Self-supervised Contrastive Regularization for DG (SelfReg)

SelfReg PyTorch official implementation of Self-supervised Contrastive Regularization for Domain Generalization (SelfReg, https://arxiv.org/abs/2104.0

null 64 Dec 16, 2022
Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

ConSERT Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer Requirements torch==1.6.0

Yan Yuanmeng 478 Dec 25, 2022