Hugging Face Machine Learning for Audio Study Group
Welcome to the ML for Audio Study Group. Through a series of presentations, paper readings, and discussions, we'll explore how Machine Learning is applied in the audio domain. Some examples are:
- Generating synthetic sound from a given text (think of conversational assistants).
- Transcribing audio signals to text.
- Removing noise from an audio recording.
- Separating different sources of audio.
- Identifying which speaker is talking.
- And much more!
We suggest you join the community Discord at http://hf.co/join/discord. We look forward to meeting you in the #ml-4-audio-study-group channel.
Organisation
We'll kick off with some basics and then collaboratively decide the further direction of the group.
Before each session:
- Read/watch related resources
During each session, you can:
- Ask questions in the forum
- Give a short (~10-15 min) presentation on the topic (agree beforehand)
Before/after:
- Keep discussing and asking questions about the topic (#ml-4-audio-study channel on Discord)
- Share interesting resources
Schedule
Date | Topics | Resources (To read before) |
---|---|---|
Dec 14, 2021 | Kickoff + Overview of Audio-related use cases (video, questions) | The 3 DL Frameworks for e2e Speech Recognition that power your devices |
Dec 21, 2021 | | |
Jan 4, 2022 | Text to Speech Deep Dive (video, questions) | |
Jan 18, 2022 | pyctcdecode: A simple & fast STT prediction decoding algorithm (demo, slides, questions) | |
Supplementary Resources
Use these resources if you want to solidify a concept or go further down the speech processing rabbit hole.
General Resources
- Slides from LSA352: Slides (no videos available)
- Slides from CS224S (Latest): Slides (no videos available)
- Speech & Language Processing Book (Chapters 25 & 26) - E-book
Research Papers
- Speech Recognition Papers: GitHub repo
- Speech Synthesis Papers: GitHub repo
Toolkits
- SpeechBrain - GitHub repo
- Toucan - GitHub repo
- ESPnet - GitHub repo
Demos
- Add interesting effects to your audio files - Hugging Face Spaces
- Generate speech from text (TTS) - Hugging Face Spaces
- Generate text from speech (ASR) - Hugging Face Spaces