438 Python Voice-synthesis Libraries

Invert and perturb GAN images for test-time ensembling

GAN Ensembling Project Page | Paper | Bibtex Ensembling with Deep Generative Views. Lucy Chai, Jun-Yan Zhu, Eli Shechtman, Phillip Isola, Richard Zhan

93 Dec 8, 2022

A PyTorch Implementation of the paper - Choi, Woosung, et al. "Investigating u-nets with various intermediate blocks for spectrogram-based singing voice separation." 21th International Society for Music Information Retrieval Conference, ISMIR. 2020.

Investigating U-NETS With Various Intermediate Blocks For Spectrogram-based Singing Voice Separation A Pytorch Implementation of the paper "Investigat

63 Nov 14, 2022

Binaural Speech Synthesis

Binaural Speech Synthesis This repository contains code to train a mono-to-binaural neural sound renderer. If you use this code or the provided datase

135 Dec 18, 2022

An unofficial implementation of the paper "AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss".

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss This is an unofficial implementation of AutoVC based on the official one. The reposi

27 Jun 16, 2022

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction.

TalkNet 2 [WIP] TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Predictio

69 Dec 17, 2022

(Arxiv 2021) NeRF--: Neural Radiance Fields Without Known Camera Parameters

NeRF--: Neural Radiance Fields Without Known Camera Parameters Project Page | Arxiv | Colab Notebook | Data Zirui Wang¹, Shangzhe Wu², Weidi Xie², Min

411 Dec 26, 2022

MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images

Main repo for ECCV 2020 paper MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images. visual.cs.brown.edu/matryodshka

75 Dec 13, 2022

《Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis》(2021)

Image2Reverb Image2Reverb is an end-to-end neural network that generates plausible audio impulse responses from single images of acoustic environments

48 Nov 27, 2022

Common Voice Dataset explorer

Common Voice Dataset Explorer Common Voice Dataset is by Mozilla Made during huggingface finetuning week Usage pip install -r requirements.txt streaml

22 Nov 16, 2022

Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

NeX: Real-time View Synthesis with Neural Basis Expansion Project Page | Video | Paper | COLAB | Shiny Dataset We present NeX, a new approach to novel

536 Dec 20, 2022

Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

NeX: Real-time View Synthesis with Neural Basis Expansion Project Page | Video | Paper | COLAB | Shiny Dataset We present NeX, a new approach to novel

537 Jan 5, 2023

Telegram Voice Chat UserBot made with Pyrogram and MarshalX/tgcalls with playlist and Heroku support

Telegram Voice Chat UserBot A Telegram UserBot to Play Audio in Voice Chats. This is also the source code of the userbot which is being used for playi

164 Nov 12, 2022

Async client API for the Telegram Group Calls

PyTgCalls This project allow to make Telegram group call with MTProto Api using Pyrogram and WebRTC, this is possible thanks to the power of NodeJS's

185 Jan 3, 2023

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework (CVPR 2021 oral)

MTLFace This repository contains the PyTorch implementation and the dataset of the paper: When Age-Invariant Face Recognition Meets Face Age Synthesis

120 Jan 5, 2023

Awesome multilingual OCR toolkits based on PaddlePaddle （practical ultra lightweight OCR system, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices）

English | 简体中文 Introduction PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and a

27.5k Jan 8, 2023

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, Korean, Chinese, German and Easy to adapt for other languages)

🤪 TensorFlowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, Melgan, Multiband-Melgan, FastSpeech, FastSpeech2 based-on TensorFlow 2. With Tensorflow 2, we can speed-up training/inference progress, optimizer further by using fake-quantize aware and pruning, make TTS models can be run faster than real-time and be able to deploy on mobile devices or embedded systems.

3k Jan 4, 2023

Voice Conversion by CycleGAN (语音克隆/语音转换)：CycleGAN-VC3

CycleGAN-VC3-PyTorch 中文说明 | English This code is a PyTorch implementation for paper: CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectr

110 Dec 24, 2022

[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing

Anycost GAN video | paper | website Anycost GANs for Interactive Image Synthesis and Editing Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zh

726 Dec 28, 2022

Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals

Welcome to MARSYAS. MARSYAS is a software framework for rapid prototyping of audio applications, with flexibility and extensibility as primary concer

364 Oct 31, 2022

C++ library for audio and music analysis, description and synthesis, including Python bindings

Essentia Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval released under the Affero GPL license.

Music Technology Group - Universitat Pompeu Fabra

2.3k Jan 3, 2023

Telegram Voice Chat Music Player UserBot Written with Pyrogram Smart Plugin and tgcalls

Telegram Voice Chat UserBot A Telegram UserBot to Play Audio in Voice Chats. This is also the source code of the userbot which is being used for playi

7 May 21, 2022

PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 2021

Neural Scene Flow Fields PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 20

585 Jan 4, 2023

SU Music Player — The first open-source PyTgCalls based Pyrogram bot to play music in voice chats

SU Music Player — The first open-source PyTgCalls based Pyrogram bot to play music in voice chats Note Neither this, or PyTgCalls are fully

58 Jan 2, 2023

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

LightSpeech UnOfficial PyTorch implementation of LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search.

54 Dec 3, 2022

Graphical interface to control granular sound synthesis.

Granular sound synthesis interface SoundGrain is a graphical interface where users can draw and edit trajectories to control granular sound synthesis

122 Dec 10, 2022

Official implementation of the paper Image Generators with Conditionally-Independent Pixel Synthesis https://arxiv.org/abs/2011.13775

CIPS -- Official Pytorch Implementation of the paper Image Generators with Conditionally-Independent Pixel Synthesis Requirements pip install -r requi

Multimodal Lab @ Samsung AI Center Moscow

201 Dec 21, 2022

The earliest beta version of pytgcalls on Linux x86_64 and ARM64! Use in production at your own risk!

Public beta test. Use in production at your own risk! tgcalls - a python binding for tgcalls (c++ lib by Telegram); pytgcalls - library connecting pyt

21 Jan 13, 2022

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

10.8k Feb 18, 2021

Python interface to the WebRTC Voice Activity Detector

py-webrtcvad This is a python interface to the WebRTC Voice Activity Detector (VAD). It is compatible with Python 2 and Python 3. A VAD classifies a p

1.5k Dec 22, 2022

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

⚠️ Checkout develop branch to see what is coming in pyannote.audio 2.0: a much smaller and cleaner codebase Python-first API (the good old pyannote-au

2.1k Dec 31, 2022

Pyrogram bot to automate streaming music in voice chats

Pyrogram bot to automate streaming music in voice chats Help If you face an error, want to discuss this project or get support for it, join it's group

124 Oct 21, 2022

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Non-Rigid Neural Radiance Fields This is the official repository for the project "Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synt

296 Dec 29, 2022

Official implementation of the ICLR 2021 paper

You Only Need Adversarial Supervision for Semantic Image Synthesis Official PyTorch implementation of the ICLR 2021 paper "You Only Need Adversarial S

272 Dec 28, 2022

Clubhouse API written in Python. Standalone client included. For reference and education purposes only.

clubhouse-py is originally developed for the sake of interoperability. Standalone client is also created with very basic features, including but not limited to the audio-chat

1.7k Jan 5, 2023

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

15.3k Jan 3, 2023

Telegram Voice-Chat Bot Written In Python Using Pyrogram.

Telegram Voice-Chat Bot Telegram Voice-Chat Bot To Play Music From Various Sources In Your Group Support All linux based os. Windows Mac Diagram Requi

314 Dec 29, 2022

A framework for joint super-resolution and image synthesis, without requiring real training data

SynthSR This repository contains code to train a Convolutional Neural Network (CNN) for Super-resolution (SR), or joint SR and data synthesis. The met

83 Jan 1, 2023

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

15.3k Dec 30, 2022

Python Voice-synthesis Resources

Python voice-synthesis Libraries

Invert and perturb GAN images for test-time ensembling

A PyTorch Implementation of the paper - Choi, Woosung, et al. "Investigating u-nets with various intermediate blocks for spectrogram-based singing voice separation." 21th International Society for Music Information Retrieval Conference, ISMIR. 2020.

Binaural Speech Synthesis

An unofficial implementation of the paper "AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss".

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction.

(Arxiv 2021) NeRF--: Neural Radiance Fields Without Known Camera Parameters

MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images

《Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis》(2021)

Common Voice Dataset explorer

Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

Telegram Voice Chat UserBot made with Pyrogram and MarshalX/tgcalls with playlist and Heroku support

Async client API for the Telegram Group Calls

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework (CVPR 2021 oral)

Awesome multilingual OCR toolkits based on PaddlePaddle （practical ultra lightweight OCR system, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices）

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, Korean, Chinese, German and Easy to adapt for other languages)

Voice Conversion by CycleGAN (语音克隆/语音转换)：CycleGAN-VC3

[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing

Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals

C++ library for audio and music analysis, description and synthesis, including Python bindings

Telegram Voice Chat Music Player UserBot Written with Pyrogram Smart Plugin and tgcalls

PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 2021

SU Music Player — The first open-source PyTgCalls based Pyrogram bot to play music in voice chats

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

Graphical interface to control granular sound synthesis.

Official implementation of the paper Image Generators with Conditionally-Independent Pixel Synthesis https://arxiv.org/abs/2011.13775

The earliest beta version of pytgcalls on Linux x86_64 and ARM64! Use in production at your own risk!

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Python interface to the WebRTC Voice Activity Detector

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Pyrogram bot to automate streaming music in voice chats

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Official implementation of the ICLR 2021 paper

Clubhouse API written in Python. Standalone client included. For reference and education purposes only.

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Telegram Voice-Chat Bot Written In Python Using Pyrogram.

A framework for joint super-resolution and image synthesis, without requiring real training data

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Python Voice-synthesis Resources

Python voice-synthesis Libraries

Invert and perturb GAN images for test-time ensembling

A PyTorch Implementation of the paper - Choi, Woosung, et al. "Investigating u-nets with various intermediate blocks for spectrogram-based singing voice separation." 21th International Society for Music Information Retrieval Conference, ISMIR. 2020.

Binaural Speech Synthesis

An unofficial implementation of the paper "AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss".

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction.

(Arxiv 2021) NeRF--: Neural Radiance Fields Without Known Camera Parameters

MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images

《Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis》(2021)

Common Voice Dataset explorer

Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

Telegram Voice Chat UserBot made with Pyrogram and MarshalX/tgcalls with playlist and Heroku support

Async client API for the Telegram Group Calls

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework (CVPR 2021 oral)

Awesome multilingual OCR toolkits based on PaddlePaddle （practical ultra lightweight OCR system, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices）

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, Korean, Chinese, German and Easy to adapt for other languages)

Voice Conversion by CycleGAN (语音克隆/语音转换)：CycleGAN-VC3

[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing

Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals

C++ library for audio and music analysis, description and synthesis, including Python bindings

Telegram Voice Chat Music Player UserBot Written with Pyrogram Smart Plugin and tgcalls

PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 2021

SU Music Player — The first open-source PyTgCalls based Pyrogram bot to play music in voice chats

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

Graphical interface to control granular sound synthesis.

Official implementation of the paper Image Generators with Conditionally-Independent Pixel Synthesis https://arxiv.org/abs/2011.13775

The earliest beta version of pytgcalls on Linux x86_64 and ARM64! Use in production at your own risk!

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Python interface to the WebRTC Voice Activity Detector

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Pyrogram bot to automate streaming music in voice chats

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Official implementation of the ICLR 2021 paper

Clubhouse API written in Python. Standalone client included. For reference and education purposes only.

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Telegram Voice-Chat Bot Written In Python Using Pyrogram.

A framework for joint super-resolution and image synthesis, without requiring real training data

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Related tags