Contrastive Language-Audio Pretraining

Overview

CLAP

In due time this repo will be full of lovely things, I hope.

Feel free to check out the Issues if you're interested in contributing. Leave a note saying what interests you. :)

from jax import random, numpy as np
from clap.clap import CLAP

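# use a separate PRNG key for each random draw and for parameter init
key1, key2, key3 = random.split(random.PRNGKey(0), 3)

text = random.randint(key1, (2, 16), 0, 256)  # (batch, text_len) token ids
audio = random.uniform(key3, (2, 8, 512))     # (batch, audio_len, audio_dim)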

text_mask = np.ones((2, 16), dtype = bool)
audio_mask = np.ones((2, 8), dtype = bool)

model = CLAP(
    text_vocab = 256,
    text_dim = 512,
    text_depth = 6,
    text_heads = 8,
    audio_dim = 512,
    audio_depth = 6,
    audio_heads = 8
)

params = model.init(key2, text, audio, text_mask, audio_mask)
loss = model.apply(params, text, audio, text_mask, audio_mask)

# after a lot of training

sim = model.apply(params, text, audio, text_mask, audio_mask, return_loss = False) # (2, 2)

Comments
  • Set Up Package and Directory Structure

    Would be great if this was set up with the basics of a Python package and any conveniences that make sense. Since there are a couple of different subcomponents in this project, we probably also want to figure out a directory structure so that people working on different aspects aren't stepping on toes.

    good first issue 
    opened by cfoster0 6
  • Add dataloader class

    I've been told JAX plays nice with other frameworks, so a PyTorch dataloader is honestly fine. Should load in text (possibly using https://github.com/leogao2/lm_dataformat) and spectrograms from disk in order to construct batches of paired data.

    PyTorch 
    opened by cfoster0 3
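
For the dataloader issue above, here is a minimal sketch of the batching step, assuming each example is a pair of a token-ID array and a spectrogram of shape (frames, features). The function name `collate_pairs` and the default lengths are hypothetical; the output shapes mirror the `text`, `audio`, `text_mask`, and `audio_mask` arguments in the README example.

```python
import numpy as np

def collate_pairs(examples, max_text_len=16, max_audio_len=8):
    """Pad (token_ids, spectrogram) pairs into fixed-size arrays plus boolean masks.

    Hypothetical helper: each spectrogram is assumed to have shape (frames, features).
    """
    batch = len(examples)
    n_feats = examples[0][1].shape[1]
    text = np.zeros((batch, max_text_len), dtype=np.int32)
    audio = np.zeros((batch, max_audio_len, n_feats), dtype=np.float32)
    text_mask = np.zeros((batch, max_text_len), dtype=bool)
    audio_mask = np.zeros((batch, max_audio_len), dtype=bool)
    for i, (tokens, spec) in enumerate(examples):
        t = min(len(tokens), max_text_len)    # crop text that is too long
        a = min(spec.shape[0], max_audio_len) # crop audio that is too long
        text[i, :t] = tokens[:t]
        audio[i, :a] = spec[:a]
        text_mask[i, :t] = True               # True marks real (non-padding) positions
        audio_mask[i, :a] = True
    return text, audio, text_mask, audio_mask
```

A function of this shape could be passed as `collate_fn` to a PyTorch `DataLoader`, with the resulting NumPy arrays handed to JAX.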
  • Add Spectrogram Transformer

    Copy the Transformer-in-Transformer architecture, with some modifications. First, the patches should always cover the full vertical (Mel filters) range. So the outer patch size might be 80 x 8. The inner patch size should be 80 x 1, which will give us frame-level information. Second, both should use rotary embeddings for relative positional encoding.

    Feel free to take inspiration from: https://github.com/NZ99/transformer_in_transformer_flax https://github.com/lucidrains/transformer-in-transformer

    JAX 
    opened by cfoster0 3
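
To make the patch layout in the Spectrogram Transformer issue concrete, here is a small sketch that cuts a spectrogram into outer patches spanning every Mel bin and 8 frames, each made of 8 inner patches of one frame. The (n_mels, n_frames) input layout, the function name `to_patches`, and the choice to drop leftover frames are assumptions for illustration.

```python
import numpy as np

def to_patches(spec, outer_width=8):
    """Split a spectrogram into full-height patches.

    spec: (n_mels, n_frames) array (assumed layout).
    Returns (n_outer, outer_width, n_mels): entry [o, i] is one inner patch,
    i.e. a single frame covering every Mel bin; entry [o] is one outer patch.
    """
    n_mels, n_frames = spec.shape
    n_outer = n_frames // outer_width
    spec = spec[:, : n_outer * outer_width]  # drop trailing frames that do not fill a patch
    frames = spec.T                          # (n_frames, n_mels)
    return frames.reshape(n_outer, outer_width, n_mels)
```

With 80 Mel filters this gives outer patches of 80 x 8 values and inner patches of 80 x 1, as described above.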
  • Add Text Transformer

    Char or unigram-level transformer (though from the perspective of the transformer, it's just a sequence of token IDs). Likely does not need to have a context greater than, say, 128 tokens as the dataset clips are typically less than 10 seconds of audio, so we'll be cropping to that size or less anyway. Use rotary relative positional embeddings.

    JAX 
    opened by cfoster0 3
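
Both transformer issues ask for rotary embeddings. As a rough illustration in NumPy (not the repo's implementation), the sketch below rotates consecutive feature pairs by position-dependent angles, so that the dot product between a rotated query and a rotated key depends only on their relative offset. The function names and the base of 10000 are assumptions.

```python
import numpy as np

def rotary_angles(seq_len, dim, base=10000.0):
    """Rotation angle for every (position, feature pair): shape (seq_len, dim // 2)."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(np.arange(seq_len), inv_freq)

def apply_rotary(x, angles):
    """Rotate each consecutive pair of features of x (seq_len, dim) by its angle."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

In an attention layer this would be applied to queries and keys before computing attention scores, with `dim` equal to the per-head dimension.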
  • Add Tokenization

    Tokenize the text either via chars or unigrams. We still need to figure out the most appropriate method: ideally we want to be able to accommodate all English text, punctuation, emojis, and potentially other text.

    good first issue 
    opened by cfoster0 3
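
One option for the tokenization issue, offered as an assumption rather than the method the project settled on, is byte-level tokenization: encoding text as UTF-8 bytes covers punctuation, emojis, and other scripts with a fixed vocabulary of 256, which happens to match `text_vocab = 256` in the README example. The names `encode` and `decode` and the padding scheme are hypothetical.

```python
def encode(text, max_len=128, pad_id=0):
    """Encode text as UTF-8 byte IDs, cropped and padded to max_len, with a mask."""
    ids = list(text.encode("utf-8"))[:max_len]
    n_pad = max_len - len(ids)
    mask = [True] * len(ids) + [False] * n_pad  # True marks real bytes
    return ids + [pad_id] * n_pad, mask

def decode(ids, mask):
    """Invert encode(); bytes of a character cut off by cropping are dropped."""
    data = bytes(i for i, m in zip(ids, mask) if m)
    return data.decode("utf-8", errors="ignore")
```

Since the padding ID shares a value with the NUL byte, the mask (not the ID) is what distinguishes padding from content.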
  • Add training loop

    An optax training loop with an input pipeline will let us take the model code and run training with datasets.

    https://github.com/deepmind/optax

    Shouldn't be too terrible, especially once we've got a dataloading class set up.

    JAX 
    opened by cfoster0 2
  • Add Config System

    It would be good to have a config system for the model. This should specify model shapes and sizes, microbatch size, hyperparameters, and anything else we'd want stored in the Weights and Biases config for a run.

    documentation good first issue 
    opened by cfoster0 2
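
One lightweight way to approach the config issue is a pair of dataclasses, sketched below. The model fields copy the keyword arguments from the README example; the training fields (`microbatch_size`, `learning_rate`, `seed`) and their defaults are hypothetical placeholders.

```python
from dataclasses import dataclass, asdict, field

@dataclass(frozen=True)
class ModelConfig:
    # Field names mirror the CLAP(...) keyword arguments in the README example.
    text_vocab: int = 256
    text_dim: int = 512
    text_depth: int = 6
    text_heads: int = 8
    audio_dim: int = 512
    audio_depth: int = 6
    audio_heads: int = 8

@dataclass(frozen=True)
class TrainConfig:
    model: ModelConfig = field(default_factory=ModelConfig)
    microbatch_size: int = 8     # hypothetical default
    learning_rate: float = 3e-4  # hypothetical default
    seed: int = 0                # hypothetical default

config = TrainConfig()
config_dict = asdict(config)     # nested plain dict, suitable for logging
```

The model could then be built with `CLAP(**asdict(config.model))`, and `config_dict` could be passed as the `config` argument of `wandb.init` so the run records every setting.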
  • TFRecords

    Switch dataset formatting to TFRecords from the current version, which just uses .pt files within a directory. This should make streaming possible, which may be important for real training. It is slightly awkward because the preprocessing is in PyTorch and the training is in JAX. I'm told the best way to read TFRecords during training is through tf.data.

    enhancement 
    opened by cfoster0 1
  • Add Pod Orchestration

    https://github.com/kingoflolz/mesh-transformer-jax

    Use the pod orchestration code from here. Effectively, we should borrow everything and modify the transformer_shard and tfrecord_loader files.

    JAX 
    opened by cfoster0 1
  • Add TFRecord Handling

    Should write datasets to TFRecords. Probably some format where the spectrogram tensor and its padded, tokenized text tensor are stored as a single example.

    opened by cfoster0 1
  • Switch datasets to TFRecords

    Both preprocess.py and train.py now use TFRecords as the format for pre-processed data. Would close #25. Currently untested on GCP, but it works with TFRecords at local paths.

    opened by cfoster0 0
  • Evaluations

    We can start thinking about how to evaluate our models once trained. The simplest evaluation would be contrastive loss over some held-out set with a fixed batch size. Another would be to evaluate how well the CLAP score correlates with MOS (Mean Opinion Score), which is the gold standard subjective eval in the audio NN literature. We could probably also try linear probe training on the Google Speech Commands dataset.

    help wanted 
    opened by cfoster0 1
  • Alternative Loss Functions

    As an alternative to the symmetric cross-entropy loss, we could optimize for alignment and uniformity on the unit hypersphere, as was done in this paper:

    https://arxiv.org/abs/2005.10242

    The code itself is like 4 lines of PyTorch, with the caveat that we'd need to write a pdist JAX function in order to use it.

    enhancement JAX 
    opened by cfoster0 1
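
A possible JAX port of the alignment and uniformity losses from the paper linked in the issue above is sketched here, including a simple pairwise-distance helper. The helper name `pdist_squared` and the default values of `alpha` and `t` are assumptions for illustration; the inputs are expected to be L2-normalized embeddings with one row per example.

```python
import jax.numpy as jnp

def pdist_squared(x):
    """Squared Euclidean distance for every unordered pair of rows: (n, d) -> (n*(n-1)/2,)."""
    i, j = jnp.triu_indices(x.shape[0], k=1)
    return jnp.sum((x[i] - x[j]) ** 2, axis=-1)

def align_loss(x, y, alpha=2):
    """Mean distance (raised to alpha) between paired embeddings x[i] and y[i]."""
    return jnp.mean(jnp.sum((x - y) ** 2, axis=1) ** (alpha / 2))

def uniform_loss(x, t=2):
    """Log of the mean Gaussian potential over all pairs; lower means more spread out."""
    return jnp.log(jnp.mean(jnp.exp(-t * pdist_squared(x))))
```

For CLAP, `x` and `y` would be the text and audio embeddings of matched pairs, and the uniformity term would be computed for each modality separately.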
Owner
Charles Foster