HF's ML for Audio study group

Vaibhav Srivastav

Last update: Jan 1, 2023

Overview

Hugging Face Machine Learning for Audio Study Group

Welcome to the ML for Audio Study Group. Through a series of presentations, paper readings, and discussions, we'll explore applications of machine learning in the audio domain. Some examples:

Generating synthetic sound from a given text (think of conversational assistants).
Transcribing audio signals to text.
Removing noise from an audio recording.
Separating different sources of audio.
Identifying which speaker is talking.
And much more!

We suggest you join the community Discord at http://hf.co/join/discord, and we look forward to meeting you in the #ml-4-audio-study-group channel 🤗. Remember, this is a community effort, so make this space your own!

Organisation

We'll kick off with some basics and then collaboratively decide the further direction of the group.

Before each session:

Read/watch related resources

During each session, you can:

Ask questions in the forum
Give a short (~10-15 min) presentation on the topic (agree beforehand)

Before/after:

Keep discussing and asking questions about the topic (#ml-4-audio-study channel on Discord)
Share interesting resources

Schedule

Date	Topics	Resources (to read before)
Dec 14, 2021	Kickoff + overview of audio-related use cases (video, questions)	The 3 DL Frameworks for e2e Speech Recognition that power your devices
Dec 21, 2021	Intro to Audio; Automatic Speech Recognition Deep Dive (video, questions)	Intro to Audio for FastAI Sections 1 and 2; Speech and Language Processing 26.1-26.5
Jan 4, 2022	Text to Speech Deep Dive (video, questions)	Intro to Audio & ASR Notebooks; Speech and Language Processing 26.6
Jan 18, 2022	pyctcdecode: A simple & fast STT prediction decoding algorithm (demo, slides, questions)	Beam search CTC decoding; pyctcdecode
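
As a warm-up for the Jan 18 session, here is a minimal sketch of greedy (best-path) CTC decoding, the simple baseline that beam search decoders improve on. It is not pyctcdecode's API: the function name `ctc_greedy_decode`, the `_` blank symbol, and the toy per-frame scores are all assumptions made up for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; real models define their own


def ctc_greedy_decode(frame_scores, vocab, blank=BLANK):
    """Pick the best symbol per frame, merge repeats, then drop blanks."""
    # Take the highest-scoring vocabulary entry for every time frame.
    best_path = [
        vocab[max(range(len(scores)), key=scores.__getitem__)]
        for scores in frame_scores
    ]
    # Merge consecutive duplicates (CTC emits a symbol over several frames).
    collapsed = [symbol for symbol, _ in groupby(best_path)]
    # Remove blanks, which only serve to separate genuine repeats.
    return "".join(symbol for symbol in collapsed if symbol != blank)


# Toy example: 6 frames over a 4-symbol vocabulary (scores are invented).
vocab = ["_", "a", "c", "t"]
frame_scores = [
    [0.1, 0.1, 0.7, 0.1],    # c
    [0.1, 0.1, 0.7, 0.1],    # c (repeat, merged)
    [0.6, 0.2, 0.1, 0.1],    # blank
    [0.1, 0.8, 0.05, 0.05],  # a
    [0.1, 0.1, 0.1, 0.7],    # t
    [0.1, 0.1, 0.1, 0.7],    # t (repeat, merged)
]
print(ctc_greedy_decode(frame_scores, vocab))  # prints "cat"
```

Greedy decoding keeps only the single best symbol per frame. Beam search decoding instead tracks several candidate transcripts at once, which is what makes it possible to rescore them with a language model.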

Supplementary Resources

Use these resources if you want to solidify a concept or go further down the speech processing rabbit hole.

General Resources

Slides from LSA352: Slides (no videos available)
Slides from CS224S (Latest): Slides (no videos available)
Speech & Language Processing Book (Chapters 25 & 26) - E-book

Research Papers

Speech Recognition Papers: GitHub repo
Speech Synthesis Papers: GitHub repo

Toolkits

SpeechBrain - GitHub repo
Toucan - GitHub repo
ESPnet - GitHub repo

Demos

Add interesting effects to your audio files - Hugging Face Spaces
Generate speech from text (TTS) - Hugging Face Spaces
Generate text from speech (ASR) - Hugging Face Spaces

Comments

YOUR-TTS Paper overview

Hi, I think it would be nice to have a presentation on the paper YOUR-TTS: https://arxiv.org/pdf/2112.02418.pdf, as it combines many advanced elements of TTS (speaker embedding, language embedding).

It would also be interesting to see how Monotonic Alignment Search and Affine Coupling Layers work at both training and inference. It is just an idea that I thought could be of interest. Feel free to disregard this if you don't think it is suitable for a talk, that's ok.

That's all, thanks!

opened by ignacio-ferreira-dev 1

Music Generation & Processing

Context

AI-generated music is not new; however, recent advancements in TTS have led to more fine-grained control over the generated sound.

Duration

10 minutes

Talk flow

Motivation

Model Architecture

Sample sounds

Possible pitfalls

Next steps

Key takeaways

The audience will get an understanding of TTS and how one can tweak the architecture to gain prosodic control over the generated speech.

P.S. The details above are just suggestions, and you can tweak them at your convenience. Just copy this template and respond with what you would like to cover.

If you would like to present on this topic or suggest a speaker, please leave a comment below :)

opened by Vaibhavs10 3
