## CLAP - Contrastive Language-Audio Pretraining
In due time this repo will be full of lovely things, I hope.
Feel free to check out the Issues if you're interested in contributing. Leave a note saying what interests you. :)
```python
from jax import random, numpy as np
from clap.clap import CLAP

# random keys for mock data and parameter initialization
key1, key2 = random.split(random.PRNGKey(0), 2)

# mock text (token ids) and audio (feature sequences), batch size 2
text = random.randint(key1, (2, 16), 0, 256)
audio = random.uniform(key1, (2, 8, 512))

# padding masks for both modalities
text_mask = np.ones((2, 16), dtype = bool)
audio_mask = np.ones((2, 8), dtype = bool)

model = CLAP(
    text_vocab = 256,
    text_dim = 512,
    text_depth = 6,
    text_heads = 8,
    audio_dim = 512,
    audio_depth = 6,
    audio_heads = 8
)

params = model.init(key2, text, audio, text_mask, audio_mask)
loss = model.apply(params, text, audio, text_mask, audio_mask)

# after a lot of training

sim = model.apply(params, text, audio, text_mask, audio_mask, return_loss = False) # (2, 2)
```
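Since `model.apply` returns a scalar contrastive loss, it can be differentiated directly with `jax.grad`. Below is a minimal sketch of one gradient step on the batch above; the learning rate and the plain SGD update via `jax.tree_util.tree_map` are illustrative assumptions, not part of this repo.

```python
import jax

# loss as a function of the parameters only, closing over the batch defined above
def loss_fn(params):
    return model.apply(params, text, audio, text_mask, audio_mask)

grads = jax.grad(loss_fn)(params)

# plain SGD update; in practice you would likely use an optimizer library instead
learning_rate = 3e-4
params = jax.tree_util.tree_map(lambda p, g: p - learning_rate * g, params, grads)
```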
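Once trained, the `(2, 2)` similarity matrix can be used for cross-modal retrieval by taking an argmax along one axis. Which axis corresponds to text and which to audio depends on how CLAP lays out the matrix, so the orientation assumed in the comments below is illustrative.

```python
# assuming rows index the text batch and columns the audio batch
best_audio_per_text = np.argmax(sim, axis = -1) # closest audio clip for each text
best_text_per_audio = np.argmax(sim, axis = 0)  # closest text for each audio clip
```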