333 Python Audio-unit Libraries

Audio-Visual Generalized Few-Shot Learning with Prototype-Based Co-Adaptation

Audio-Visual Generalized Few-Shot Learning with Prototype-Based Co-Adaptation The code repository for "Audio-Visual Generalized Few-Shot Learning with

3 Jun 27, 2022

Official PyTorch implementation of "AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks"

AASIST This repository provides the overall framework for training and evaluating audio anti-spoofing systems proposed in 'AASIST: Audio Anti-Spoofing

56 Jan 2, 2023

An audio-solving python funcaptcha solving module

funcapsolver funcapsolver is a funcaptcha audio-solving module, which allows captchas to be interacted with and solved with the use of google's speech

8 Nov 21, 2022

Time-stretch audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.

22 Jul 7, 2022

text to speech toolkit. 好用的中文语音合成工具箱，包含语音编码器、语音合成器、声码器和可视化模块。

ttskit Text To Speech Toolkit: 语音合成工具箱。安装 pip install -U ttskit 注意可能需另外安装的依赖包：torch，版本要求torch=1.6.0,=1.7.1，根据自己的实际环境安装合适cuda或cpu版本的torch。 ttskit的

483 Jan 4, 2023

Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration This repo contains only model Implementation of Zero-Shot Text-to-Speech for Text

33 Sep 22, 2022

Unofficial PyTorch implementation of Google AI's VoiceFilter system

VoiceFilter Note from Seung-won (2020.10.25) Hi everyone! It's Seung-won from MINDs Lab, Inc. It's been a long time since I've released this open-sour

883 Jan 7, 2023

The pytest framework makes it easy to write small tests, yet scales to support complex functional testing

The pytest framework makes it easy to write small tests, yet scales to support complex functional testing for applications and libraries. An example o

9.6k Jan 6, 2023

Django-Audiofield is a simple app that allows Audio files upload, management and conversion to different audio format (mp3, wav & ogg), which also makes it easy to play audio files into your Django application.

Django-Audiofield Description: Django Audio Management Tools Maintainer: Areski Contributors: list of contributors Django-Audiofield is a simple app t

167 Nov 10, 2022

Music source separation is a task to separate audio recordings into individual sources

Music Source Separation Music source separation is a task to separate audio recordings into individual sources. This repository is an PyTorch implmeme

958 Jan 3, 2023

Gateware for the Terasic/Arrow DECA board, to become a USB2 high speed audio interface

DECA USB Audio Interface DECA based USB 2.0 High Speed audio interface Status / current limitations enumerates as class compliant audio device on Linu

16 Mar 21, 2022

txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.

3.1k Dec 31, 2022

Classifying audio using Wavelet transform and deep learning

Audio Classification using Wavelet Transform and Deep Learning A step-by-step tutorial to classify audio signals using continuous wavelet transform (C

17 Nov 29, 2022

A tool to fuck a video/audio quality using FFmpeg

Media quality fucker A tool to fuck a video/audio quality using FFmpeg How to use Download the source Download Python Extract FFmpeg Put what you want

8 Jan 25, 2022

Reading list for research topics in sound event detection

Sound event detection aims at processing the continuous acoustic signal and converting it into symbolic descriptions of the corresponding sound events present at the auditory scene.

64 Jan 5, 2023

pedalboard is a Python library for adding effects to audio.

pedalboard is a Python library for adding effects to audio. It supports a number of common audio effects out of the box, and also allows the use of VST3® and Audio Unit plugin formats for third-party effects.

3.9k Jan 2, 2023

A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.

About This repository provides data and code for the paper: Scalable Data Annotation Pipeline for High-Quality Large Speech Datasets Development (subm

86 Dec 7, 2022

GA SEI Unit 4 project backend for Bloom.

Grow Your OpportunitiesTM Background Watch the Bloom Intro Video At Bloom, we believe every job seeker deserves an opportunity to find meaningful work

3 Sep 20, 2021

praudio provides audio preprocessing framework for Deep Learning audio applications

praudio provides objects and a script for performing complex preprocessing operations on entire audio datasets with one command.

105 Dec 26, 2022

The official implementation of the Interspeech 2021 paper WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution.

WSRGlow The official implementation of the Interspeech 2021 paper WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution. Audio sa

96 Jan 3, 2023

PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

samplernn-pytorch A PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. It's based on the reference implem

261 Dec 14, 2022

Simple torch.nn.module implementation of Alias-Free-GAN style filter and resample

Alias-Free-Torch Simple torch module implementation of Alias-Free GAN. This repository including Alias-Free GAN style lowpass sinc filter @filter.py A

64 Dec 22, 2022

We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances to transformations applied to both the audio and video streams.

Multi-Modal Self-Supervision using GDT and StiCa This is an official pytorch implementation of papers: Multi-modal Self-Supervision from Generalized D

42 Dec 9, 2022

Easy to use Audio Tagging in PyTorch

Audio Classification, Tagging & Sound Event Detection in PyTorch Progress: Fine-tune on audio classification Fine-tune on audio tagging Fine-tune on s

15 Dec 22, 2022

FPGA based USB 2.0 high speed audio interface featuring multiple optical ADAT inputs and outputs

ADAT USB Audio Interface FPGA based USB 2.0 High Speed audio interface featuring multiple optical ADAT inputs and outputs Status / current limitations

78 Dec 31, 2022

This app converts an pdf file into the audio file.

PDF-to-Audio This app takes an pdf as an input and convert it into audio, and the library text-to-speech starts speaking the preffered page given in t

3 Aug 4, 2021

MoviePy is a Python library for video editing, can read and write all the most common audio and video formats

MoviePy is a Python library for video editing: cutting, concatenations, title insertions, video compositing (a.k.a. non-linear editing), video processing, and creation of custom effects. See the gallery for some examples of use.

10k Jan 8, 2023

A Telegram Bot to Play Audio in Voice Chats With Youtube and Deezer support. Supports Live streaming from youtube Supports Mega Radio Fm Streamings

Bot To Stream Musics on PyTGcalls with Channel Support. A Telegram Bot to Play Audio in Voice Chats With Supports Live streaming from youtube and Mega

37 Dec 15, 2022

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

AST: Audio Spectrogram Transformer Introduction Citing Getting Started ESC-50 Recipe Speechcommands Recipe AudioSet Recipe Pretrained Models Contact I

603 Jan 7, 2023

TalkNet: Audio-visual active speaker detection Model

Is someone talking? TalkNet: Audio-visual active speaker detection Model This repository contains the code for our ACM MM 2021 paper, TalkNet, an acti

142 Dec 14, 2022

efficient neural audio synthesis in the waveform domain

neural waveshaping synthesis real-time neural audio synthesis in the waveform domain paper • website • colab • audio by Ben Hayes, Charalampos Saitis,

169 Dec 23, 2022

Telegram Radio - A User-bot who continuously play random audio files (from the famous telegram music channel @mveargasm) in the intended voice chat.

MvEargasmDJ: This is my submission for the Telegram Radio Project of Baivaru. Which required a userbot to continiously play random audio files from th

24 Nov 12, 2022

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling @ INTERSPEECH 2021 Accepted

NU-Wave — Official PyTorch Implementation NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling Junhyeok Lee, Seungu Han @ MINDsLab Inc

242 Dec 23, 2022

Classify bird species based on their songs using SIamese Networks and 1D dilated convolutions.

The goal is to classify different birds species based on their songs/calls. Spectrograms have been extracted from the audio samples and used as features for classification.

9 Dec 27, 2022

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Fre-GAN Vocoder Fre-GAN: Adversarial Frequency-consistent Audio Synthesis Training: python train.py --config config.json Citation: @misc{kim2021frega

93 Dec 17, 2022

Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)

AudioCLIP Extending CLIP to Image, Text and Audio This repository contains implementation of the models described in the paper arXiv:2106.13043. This

458 Jan 2, 2023

Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code

simple_diarizer Simplified diarization pipeline using some pretrained models. Made to be a simple as possible to go from an input audio file to diariz

65 Dec 30, 2022

This repository contains a PyTorch implementation of "AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis".

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis | Project Page | Paper | PyTorch implementation for the paper "AD-NeRF: Audio

551 Dec 29, 2022

AudioCLIP Extending CLIP to Image, Text and Audio

AudioCLIP Extending CLIP to Image, Text and Audio This repository contains implementation of the models described in the paper arXiv:2106.13043. This

458 Jan 2, 2023

Use android as mic/speaker for ubuntu

Pulse Audio Control Panel Platforms Requirements sudo apt install ffmpeg pactl (already installed) Download Download the AppImage from release page ch

19 Dec 1, 2022

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

WaveGrad2 - PyTorch Implementation PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. Status (202

59 Dec 6, 2022

spafe: Simplified Python Audio-Features Extraction

spafe aims to simplify features extractions from mono audio files. The library can extract of the following features: BFCC, LFCC, LPC, LPCC, MFCC, IMFCC, MSRCC, NGCC, PNCC, PSRCC, PLP, RPLP, Frequency-stats etc. It also provides various filterbank modules (Mel, Bark and Gammatone filterbanks) and other spectral statistics.

310 Jan 1, 2023

In this repository, I have developed an end to end Automatic speech recognition project. I have developed the neural network model for automatic speech recognition with PyTorch and used MLflow to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

End to End Automatic Speech Recognition In this repository, I have developed an end to end Automatic speech recognition project. I have developed the

22 Nov 13, 2022

AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations

AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations. Each modality’s augmentations are contained within its own sub-library. These sub-libraries include both function-based and class-based transforms, composition operators, and have the option to provide metadata about the transform applied, including its intensity.

4.6k Jan 9, 2023

A Youtube audio player for your terminal

AudioLine A lightweight Youtube audio player for your terminal Explore the docs » View Demo · Report Bug · Request Feature · Send a Pull Request About

26 Jan 4, 2023

Audio augmentations library for PyTorch for audio in the time-domain

Audio augmentations library for PyTorch for audio in the time-domain, with support for stochastic data augmentations as used often in self-supervised / contrastive learning.

166 Jan 8, 2023

Pynguin, The PYthoN General UnIt Test geNerator is a test-generation tool for Python

Pynguin, the PYthoN General UnIt test geNerator, is a tool that allows developers to generate unit tests automatically.

Chair of Software Engineering II, Uni Passau

997 Jan 6, 2023

A telegram bot that can send you high-quality audio 🎧🎧🎧

Music downloader bot Still under development Please Report issues to improve this repo.I will try to fix bugs in next update Music downloader bot is a

36 Dec 6, 2022

An audio guide for destroying oracles in Destiny's Vault of Glass raid

prophet An audio guide for destroying oracles in Destiny's Vault of Glass raid. This project allows you to make any encounter with oracles without hav

24 Sep 15, 2022

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling For Official repo of NU-Wave: A Diffusion Probabilistic Model for Neural Audio Up

38 Oct 11, 2022

Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)

MusCaps: Generating Captions for Music Audio Ilaria Manco1 2, Emmanouil Benetos1, Elio Quinton2, Gyorgy Fazekas1 1 Queen Mary University of London, 2

57 Dec 7, 2022

Audio processor to map oracle notes in the VoG raid in Destiny 2 to call outs.

vog_oracles Audio processor to map oracle notes in the VoG raid in Destiny 2 to call outs. Huge thanks to mzucker on GitHub for the note detection cod

19 Sep 29, 2022

VMD Audio/Text control with natural language

This repository is a proof of principle for performing Molecular Dynamics analysis, in this case with the program VMD, via natural language commands.

13 Jun 9, 2022

Accompanying code for our paper "Point Cloud Audio Processing"

Point Cloud Audio Processing Krishna Subramani1, Paris Smaragdis1 1UIUC Paper For the necessary libraries/prerequisites, please use conda/anaconda to

17 Nov 17, 2022

Python codes for Lite Audio-Visual Speech Enhancement.

Lite Audio-Visual Speech Enhancement (Interspeech 2020) Introduction This is the PyTorch implementation of Lite Audio-Visual Speech Enhancement (LAVSE

85 Dec 1, 2022

Telegram bot + userbot for streaming audio in group calls.

Calls Music — Telegram bot + userbot for streaming audio in group calls ✍🏻 Requirements FFmpeg Python 3.7+ 🚀 Deployment 🛠 Configuration Copy exampl

30 May 17, 2021

Ward is a modern test framework for Python with a focus on productivity and readability.

1k Dec 31, 2022

Hide Your Secret Message in any Wave Audio File.

HiddenWave Embedding secret messages in wave audio file What is HiddenWave Hiddenwave is a python based program for simple audio steganography. You ca

99 Dec 28, 2022

Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and convert them into audio. Here I have used Google-text-to-speech library popularly known as gTTS library to convert text file to .mp3 file. Hope you like my project!

Text to speech (using Python) Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and co

19 Jun 30, 2022

Code for paper 'Audio-Driven Emotional Video Portraits'.

Audio-Driven Emotional Video Portraits [CVPR2021] Xinya Ji, Zhou Hang, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu [Project] [Paper] G

197 Dec 31, 2022

Identify the emotion of multiple speakers in an Audio Segment

MevonAI - Speech Emotion Recognition

111 Jan 7, 2023

Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

Word2Wave is a simple method for text-controlled GAN audio generation. You can either follow the setup instructions below and use the source code and CLI provided in this repo or you can have a play around in the Colab notebook provided. Note that, in both cases, you will need to train a WaveGAN model first

91 Dec 23, 2022

Unofficial PyTorch implementation of Google AI's VoiceFilter system

VoiceFilter Note from Seung-won (2020.10.25) Hi everyone! It's Seung-won from MINDs Lab, Inc. It's been a long time since I've released this open-sour

881 Jan 3, 2023

Data manipulation and transformation for audio signal processing, powered by PyTorch

torchaudio: an audio library for PyTorch The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the

1.9k Jan 8, 2023

Contrastive Language-Audio Pretraining

CLAP Contrastive Language-Audio Pretraining In due time this repo will be full of lovely things, I hope. Feel free to check out the Issues if you're i

83 Dec 1, 2022

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation This is a demo implementation of BYOL for Audio (BYOL-A), a self-sup

160 Jan 4, 2023

Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021) Hang Zhou, Yasheng Sun, Wayne Wu, Chen Cha

628 Dec 28, 2022

Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution

FAU Implementation of the paper: Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution. Yingruo

78 Nov 29, 2022

Pytorch port of Google Research's LEAF Audio paper

leaf-audio-pytorch Pytorch port of Google Research's LEAF Audio paper published at ICLR 2021. This port is not completely finished, but the Leaf() fro

80 Oct 31, 2022

Diverse Branch Block: Building a Convolution as an Inception-like Unit

Diverse Branch Block: Building a Convolution as an Inception-like Unit (PyTorch) (CVPR-2021) DBB is a powerful ConvNet building block to replace regul

253 Dec 24, 2022

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

NVIDIA DALI The NVIDIA Data Loading Library (DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It provi

4.2k Jan 8, 2023

Python Audio-unit Resources

Python audio-unit Libraries

Audio-Visual Generalized Few-Shot Learning with Prototype-Based Co-Adaptation

Official PyTorch implementation of "AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks"

An audio-solving python funcaptcha solving module

Time-stretch audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.

text to speech toolkit. 好用的中文语音合成工具箱，包含语音编码器、语音合成器、声码器和可视化模块。

Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Unofficial PyTorch implementation of Google AI's VoiceFilter system

The pytest framework makes it easy to write small tests, yet scales to support complex functional testing

Django-Audiofield is a simple app that allows Audio files upload, management and conversion to different audio format (mp3, wav & ogg), which also makes it easy to play audio files into your Django application.

Music source separation is a task to separate audio recordings into individual sources

Gateware for the Terasic/Arrow DECA board, to become a USB2 high speed audio interface

txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.

Classifying audio using Wavelet transform and deep learning

A tool to fuck a video/audio quality using FFmpeg

Reading list for research topics in sound event detection

pedalboard is a Python library for adding effects to audio.

A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.

GA SEI Unit 4 project backend for Bloom.

praudio provides audio preprocessing framework for Deep Learning audio applications

The official implementation of the Interspeech 2021 paper WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution.

PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Simple torch.nn.module implementation of Alias-Free-GAN style filter and resample

We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances to transformations applied to both the audio and video streams.

Easy to use Audio Tagging in PyTorch

FPGA based USB 2.0 high speed audio interface featuring multiple optical ADAT inputs and outputs

This app converts an pdf file into the audio file.

MoviePy is a Python library for video editing, can read and write all the most common audio and video formats

A Telegram Bot to Play Audio in Voice Chats With Youtube and Deezer support. Supports Live streaming from youtube Supports Mega Radio Fm Streamings

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

TalkNet: Audio-visual active speaker detection Model

efficient neural audio synthesis in the waveform domain

Telegram Radio - A User-bot who continuously play random audio files (from the famous telegram music channel @mveargasm) in the intended voice chat.

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling @ INTERSPEECH 2021 Accepted

Classify bird species based on their songs using SIamese Networks and 1D dilated convolutions.

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)

Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code

This repository contains a PyTorch implementation of "AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis".

AudioCLIP Extending CLIP to Image, Text and Audio

Use android as mic/speaker for ubuntu

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

spafe: Simplified Python Audio-Features Extraction

AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations

A Youtube audio player for your terminal

Audio augmentations library for PyTorch for audio in the time-domain

Pynguin, The PYthoN General UnIt Test geNerator is a test-generation tool for Python

A telegram bot that can send you high-quality audio 🎧🎧🎧

An audio guide for destroying oracles in Destiny's Vault of Glass raid

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling

Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)

Audio processor to map oracle notes in the VoG raid in Destiny 2 to call outs.

VMD Audio/Text control with natural language

Accompanying code for our paper "Point Cloud Audio Processing"

Python codes for Lite Audio-Visual Speech Enhancement.

Telegram bot + userbot for streaming audio in group calls.

Ward is a modern test framework for Python with a focus on productivity and readability.

Hide Your Secret Message in any Wave Audio File.

Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and convert them into audio. Here I have used Google-text-to-speech library popularly known as gTTS library to convert text file to .mp3 file. Hope you like my project!

Code for paper 'Audio-Driven Emotional Video Portraits'.

Identify the emotion of multiple speakers in an Audio Segment

Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

Unofficial PyTorch implementation of Google AI's VoiceFilter system

Data manipulation and transformation for audio signal processing, powered by PyTorch

Contrastive Language-Audio Pretraining

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)

Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution

Pytorch port of Google Research's LEAF Audio paper

Diverse Branch Block: Building a Convolution as an Inception-like Unit

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

ARCH models in Python

About A python based Apple Quicktime protocol，you can record audio and video from real iOS devices

Implementation of "Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021" in PyTorch

Python audio and music signal processing library

A library for augmenting annotated audio data

Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals

LibXtract is a simple, portable, lightweight library of audio feature extraction functions.

C++ library for audio and music analysis, description and synthesis, including Python bindings