499 Python Listening-to-Sound-of-Silence-for-Speech-Denoising Libraries

499 Repositories

Python Listening-to-Sound-of-Silence-for-Speech-Denoising Libraries

Official implementation of Meta-StyleSpeech and StyleSpeech

Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation Dongchan Min, Dong Bok Lee, Eunho Yang, and Sung Ju Hwang This is an official code

168 Dec 28, 2022

Voice Conversion Using Speech-to-Speech Neuro-Style Transfer

This repo contains the official implementation of the VAE-GAN from the INTERSPEECH 2020 paper Voice Conversion Using Speech-to-Speech Neuro-Style Transfer.

93 Jan 5, 2023

This Project is based on NLTK It generates a RANDOM WORD from a predefined list of words, From that random word it read out the word, its meaning with parts of speech , its antonyms, its synonyms

This Project is based on NLTK(Natural Language Toolkit) It generates a RANDOM WORD from a predefined list of words, From that random word it read out the word, its meaning with parts of speech , its antonyms, its synonyms

2 Nov 17, 2021

SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning We propose a SASE mode

1 Nov 20, 2021

A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

Reddit text to speech generator A basic reddit tts video generator Current functionality Generate videos for subs based on comments,(askreddit) so rea

17 Dec 19, 2022

extract unpack asset file (form unreal engine 4 pak) with extenstion *.uexp which contain awb/acb (cri/cpk like) sound or music resource

Uexp2Awb extract unpack asset file (form unreal engine 4 pak) with extenstion .uexp which contain awb/acb (cri/cpk like) sound or music resource. i ju

6 Jun 22, 2022

Convert text to morse code and play morse code sound.

Convert text(english) to morse codes and play morse sound!

5 Jul 15, 2022

We have built a Voice based Personal Assistant for people to access files hands free in their device using natural language processing.

Voice Based Personal Assistant We have built a Voice based Personal Assistant for people to access files hands free in their device using natural lang

2 Nov 13, 2021

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit. It provides easy-to-use, low-overhead, first-class Python wrappers for t

922 Dec 31, 2022

Aydin is a user-friendly, feature-rich, and fast image denoising tool

Aydin is a user-friendly, feature-rich, and fast image denoising tool that provides a number of self-supervised, auto-tuned, and unsupervised image denoising algorithms.

99 Dec 14, 2022

Implementation for the paper: Invertible Denoising Network: A Light Solution for Real Noise Removal (CVPR2021).

Invertible Image Denoising This is the PyTorch implementation of paper: Invertible Denoising Network: A Light Solution for Real Noise Removal (CVPR 20

157 Dec 25, 2022

Uses Google's gTTS module to easily create robo text readin' on command.

Tool to convert text to speech, creating files for later use. TTRS uses Google's gTTS module to easily create robo text readin' on command.

0 Jun 20, 2021

Meta-TTS: Meta-Learning for Few-shot SpeakerAdaptive Text-to-Speech

Meta-TTS: Meta-Learning for Few-shot SpeakerAdaptive Text-to-Speech This repository is the official implementation of "Meta-TTS: Meta-Learning for Few

128 Dec 25, 2022

Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models"

Introduction Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models". In this work, we demonstrate that existi

7 Nov 1, 2022

Azure Neural Speech Service TTS

Written in Python using the Azure Speech SDK. App.py provides an easy way to create an Text-To-Speech request to Azure Speech and download the wav file. Azure Neural Voices Text-To-Speech enables fluid, natural-sounding text to speech that matches the patterns and intonation of human voices.

4 Dec 14, 2022

A simple Speech Emotion Recognition (SER) API created using Flask and running in a Docker container.

emovoz Introduction A simple Speech Emotion Recognition (SER) API created using Flask and running in a Docker container. The SER system was built with

2 Nov 11, 2022

This repository details the steps in creating a Part of Speech tagger using Trigram Hidden Markov Models and the Viterbi Algorithm without using external libraries.

POS-Tagger This repository details the creation of a Part-of-Speech tagger using Trigram Hidden Markov Models to predict word tags in a word sequence.

1 Dec 9, 2021

Obsei is a low code AI powered automation tool.

Obsei is a low code AI powered automation tool. It can be used in various business flows like social listening, AI based alerting, brand image analysis, comparative study and more .

782 Dec 31, 2022

A python script to acquire multiple aws ec2 instances in a forensically sound-ish way

acquire_ec2.py The script acquire_ec2.py is used to automatically acquire AWS EC2 instances. The script needs to be run on an EC2 instance in the same

31 Sep 10, 2022

Pre-1.0 door/chest sound injector for Minecraft

doorjector Pre-1.0 door/chest sound injector for Minecraft. While the game is running, doorjector hotswaps the new sounds for the old right before the

1 Nov 20, 2021

Code and models for "Rethinking Deep Image Prior for Denoising" (ICCV 2021)

DIP-denosing This is a code repo for Rethinking Deep Image Prior for Denoising (ICCV 2021). Addressing the relationship between Deep image prior and e

36 Dec 29, 2022

Demo of using Auto Encoder for Image Denoising

2 Aug 4, 2022

Official implementation of Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

Deep-Rep-MFIR Official implementation of Deep Reparametrization of Multi-Frame Super-Resolution and Denoising Publication: Deep Reparametrization of M

39 Jan 4, 2023

A thing to simplify listening for PG notifications with asyncpg

18 Dec 23, 2022

Cryptocurrency application that displays instant cryptocurrency prices and reads prices with the Google Text-to-Speech library.

📈 Cryptocurrency Price App 💰 ◽ Cryptocurrency application that displays instant cryptocurrency prices and reads prices with the Google Text-to-Speec

2 Nov 8, 2021

Code for the Paper: Alexandra Lindt and Emiel Hoogeboom.

Discrete Denoising Flows This repository contains the code for the experiments presented in the paper Discrete Denoising Flows [1]. To give a short ov

3 Oct 9, 2022

A thing to simplify listening for PG notifications with asyncpg

asyncpg-listen This library simplifies usage of listen/notify with asyncpg: Handles loss of a connection Simplifies notifications processing from mult

18 Dec 23, 2022

Ukrainian TTS (text-to-speech) using Coqui TTS

title emoji colorFrom colorTo sdk app_file pinned Ukrainian TTS 🐸 green green gradio app.py false Ukrainian TTS 📢 🤖 Ukrainian TTS (text-to-speech)

85 Dec 26, 2022

On-device speech-to-index engine powered by deep learning.

30 Nov 24, 2022

Code for sound field predictions in domains with impedance boundaries. Used for generating results from the paper

11 Dec 17, 2022

Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising

1.2k Dec 29, 2022

Code for paper: An Effective, Robust and Fairness-awareHate Speech Detection Framework

BiQQLSTM_HS Code and data for paper: Title: An Effective, Robust and Fairness-awareHate Speech Detection Framework. Authors: Guanyi Mou and Kyumin Lee

2 Dec 27, 2022

Sign-to-Speech for Sign Language Understanding: A case study of Nigerian Sign Language

Sign-to-Speech for Sign Language Understanding: A case study of Nigerian Sign Language This repository contains the code, model, and deployment config

16 Oct 23, 2022

Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity Indic TTS Samples can be found at https://peter-yh-wu.github.io/cross-

1 Nov 12, 2022

A denoising diffusion probabilistic model synthesises galaxies that are qualitatively and physically indistinguishable from the real thing.

Realistic galaxy simulation via score-based generative models Official code for 'Realistic galaxy simulation via score-based generative models'. We us

32 Dec 20, 2022

Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together

SpeechMix Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together. Introduction For the same input: from datas

31 Nov 7, 2022

PyTorch implementation of "A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing"

Implementation of the Sheffield entry for the first Clarity enhancement challenge (CEC1) This repository contains the PyTorch implementation of "A Two

10 Aug 19, 2022

Distort a video using Seam Carving (video) and Vibrato effect (sound)

Distort videos Applies a Seam Carving algorithm (aka liquid rescale) on every frame of a video, and a vibrato effect on the audio to distort the video

6 Dec 6, 2022

Installation, test and evaluation of Scribosermo speech-to-text engine

Scribosermo STT Setup Scribosermo is a LGPL licensed, open-source speech recognition engine to "Train fast Speech-to-Text networks in different langua

3 Jun 20, 2022

Text to speech converter with GUI made in Python.

Text-to-speech-with-GUI Text to speech converter with GUI made in Python. To run this download the zip file and run the main file or clone this repo.

1 Nov 15, 2021

Azure Text-to-speech service for Home Assistant

Azure Text-to-speech service for Home Assistant The Azure text-to-speech platform uses online Azure Text-to-Speech cognitive service to read a text wi

2 Aug 6, 2022

Terminal-based audio-to-text converter

att Terminal-based audio-to-text converter Project description A terminal-based audio-to-text converter written in python, enabling you to convert .wa

4 Dec 15, 2022

Simple, hackable offline speech to text - using the VOSK-API.

844 Jan 7, 2023

An 8D music player made to enjoy Halloween this year!🤘

HAPPY HALLOWEEN buddy! Split Player Hello There! Welcome to SplitPlayer... Supposed To Be A 8DPlayer.... You Decide.... It can play the ordinary audio

1 Nov 4, 2021

A simple python script to play bell sound in your system infinitely, just for fun and experimental purposes

1 Oct 29, 2021

My first Minecraft CPU. Created in collaboration with Peer Carnes as a final project in CS 281: Architecture and Assembly at the University of Puget Sound

Minecraft CPU This is my first ever Minecraft CPU, created in collaboration with Peer Carnes. We created a custom assembly language, including an asse

4 Oct 10, 2022

Simple Speech to Text, Text to Speech

Simple Speech to Text, Text to Speech 1. Download Repository Opsi 1 Download repository ini, extract di lokasi yang diinginkan Opsi 2 Jika sudah famil

5 Dec 28, 2021

The PyTorch based implementation of continuous integrate-and-fire (CIF) module.

CIF-PyTorch This is a PyTorch based implementation of continuous integrate-and-fire (CIF) module for end-to-end (E2E) automatic speech recognition (AS

24 Dec 29, 2022

Cobra is a highly-accurate and lightweight voice activity detection (VAD) engine.

On-device voice activity detection (VAD) powered by deep learning.

88 Dec 16, 2022

Code Release for the paper "TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation"

TriBERT This repository contains the code for the NeurIPS 2021 paper titled "TriBERT: Full-body Human-centric Audio-visual Representation Learning for

8 Aug 31, 2022

SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.

The SpeechBrain Toolkit SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch. The goal is to create a single, flexible, and us

5.1k Jan 2, 2023

Some toy examples of score matching algorithms written in PyTorch

toy_gradlogp This repo implements some toy examples of the following score matching algorithms in PyTorch: ssm-vr: sliced score matching with variance

21 Dec 26, 2022

🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 | 한국어 State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrai

77.4k Jan 5, 2023

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Cross-Speaker-Emotion-Transfer - PyTorch Implementation PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Conditio

114 Jan 8, 2023

🎵 A repository for manually annotating files to create labeled acoustic datasets for machine learning.

28 Dec 22, 2022

A flask application to predict the speech emotion of any .wav file.

This is a speech emotion recognition app. It will allow you to train a modular MLP model with the RAVDESS dataset, and then use that model with a flask application to predict the speech emotion of any .wav file.

2 Dec 15, 2021

Open-source Laplacian Eigenmaps for dimensionality reduction of large data in python.

Fast Laplacian Eigenmaps in python Open-source Laplacian Eigenmaps for dimensionality reduction of large data in python. Comes with an wrapper for NMS

17 Jul 9, 2022

Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"

LDNet Author: Wen-Chin Huang (Nagoya University) Email: [email protected] This is the official implementation of the paper "LDNet

40 Nov 20, 2022

A simple Blog Using Django Framework and Used IBM Cloud Services for Text Analysis and Text to Speech

ElhamBlog Cloud Computing Course first assignment. A simple Blog Using Django Framework and Used IBM Cloud Services for Text Analysis and Text to Spee

5 Dec 6, 2022

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge This is an implementation of the paper,

19 Oct 14, 2022

Minimal GUI for accessing the Watson Text to Speech service.

Description Minimal graphical application for accessing the Watson Text to Speech service. Requirements Python 3 plus all dependencies listed in requi

1 Oct 22, 2021

Recognition of 38 speech commands in russian. Based on Yandex Cup 2021 ML Challenge: ASR

Speech_38_ru_commands Recognition of 38 speech commands in russian. Based on Yandex Cup 2021 ML Challenge: ASR Программа умеет распознавать 38 ключевы

9 May 5, 2022

A Python/Pytorch app for easily synthesising human voices

Voice Cloning App A Python/Pytorch app for easily synthesising human voices Documentation Discord Server Video guide Voice Sharing Hub FAQ's System Re

840 Jan 4, 2023

The end-to-end platform for building voice products at scale

Picovoice Made in Vancouver, Canada by Picovoice Picovoice is the end-to-end platform for building voice products on your terms. Unlike Alexa and Goog

318 Jan 7, 2023

On-device speech-to-intent engine powered by deep learning

Rhino Made in Vancouver, Canada by Picovoice Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from spoken commands within a giv

510 Dec 30, 2022

A python script that can play .mp3 URLs upon the ringing or motion detection of a Ring doorbell. The sound plays through Sonos speakers.

Ring x Sonos A python script that plays .mp3 files whenever a doorbell is rung or a doorbell detects motion. Features Music! Authors @braden Running T

0 Nov 12, 2021

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

CTC Decoding Algorithms Update 2021: installable Python package Python implementation of some common Connectionist Temporal Classification (CTC) decod

736 Jan 3, 2023

Production First and Production Ready End-to-End Speech Recognition Toolkit

2.7k Jan 4, 2023

glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Glow-Speak glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end. Installation git clone https://g

8 Dec 25, 2022

Identify the emotion of multiple speakers in an Audio Segment

MevonAI - Speech Emotion Recognition Identify the emotion of multiple speakers in a Audio Segment Report Bug · Request Feature Try the Demo Here Table

110 Dec 3, 2022

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge This is an implementation of the paper,

19 Oct 14, 2022

Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"

LDNet Author: Wen-Chin Huang (Nagoya University) Email: [email protected] This is the official implementation of the paper "LDNet

40 Nov 20, 2022

Replication Package for AequeVox:Automated Fariness Testing for Speech Recognition Systems

AequeVox Replication Package for AequeVox:Automated Fariness Testing for Speech Recognition Systems README under development. Python Packages Required

2 Aug 28, 2022

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

Taming Visually Guided Sound Generation • [Project Page] • [ArXiv] • [Poster] • • Listen for the samples on our project page. Overview We propose to t

226 Jan 3, 2023

A Collection of Papers and Codes for ICCV2021 Low Level Vision and Image Generation

196 Jan 5, 2023

End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 1.9.0 ubuntu20/python3.9/pip ubuntu20/python3.8/p

5.9k Jan 4, 2023

CPC-big and k-means clustering for zero-resource speech processing

The CPC-big model and k-means checkpoints used in Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing.

5 Nov 23, 2022

Coarse implement of the paper "A Simultaneous Denoising and Dereverberation Framework with Target Decoupling", On DNS-2020 dataset, the DNSMOS of first stage is 3.42 and second stage is 3.47.

SDDNet Coarse implement of the paper "A Simultaneous Denoising and Dereverberation Framework with Target Decoupling", On DNS-2020 dataset, the DNSMOS

43 Nov 21, 2022

Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition

Light-SERNet This is the Tensorflow 2.x implementation of our paper "Light-SERNet: A lightweight fully convolutional neural network for speech emotion

29 Nov 12, 2022

Maix Speech AI lib, including ASR, chat, TTS etc.

Maix-Speech 中文 | English Brief Now only support Chinese, See 中文 Build Clone code by: git clone https://github.com/sipeed/Maix-Speech Compile x86x64 c

267 Dec 25, 2022

translate using your voice

speech-to-text-translator Usage translate using your voice description this project makes translating a word easy, all you have to do is speak and...

1 Oct 18, 2021

To lazy to read your homework ? Get it done with LOL

LOL To lazy to read your homework ? Get it done with LOL Needs python 3.x L:::::::::L OO:::::::::OO L:::::::::L L:::::::

4 Dec 8, 2022

PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop

PocketSphinx 5prealpha This is PocketSphinx, one of Carnegie Mellon University's open source large vocabulary, speaker-independent continuous speech r

3.2k Dec 28, 2022

LeBenchmark: a reproducible framework for assessing SSL from speech

11 Nov 30, 2022

AutoSub is a CLI application to generate subtitle files (.srt, .vtt, and .txt transcript) for any video file using Mozilla DeepSpeech.

AutoSub About Motivation Installation Docker How-to example How it works TO-DO Contributing References About AutoSub is a CLI application to generate

414 Jan 6, 2023

Self-Supervised Speech Pre-training and Representation Learning Toolkit.

What's New Sep 2021: We host a challenge in AAAI workshop: The 2nd Self-supervised Learning for Audio and Speech Processing! See SUPERB official site

1.6k Jan 8, 2023

UniSpeech - Large Scale Self-Supervised Learning for Speech

UniSpeech The family of UniSpeech: UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR UniSpeech-

33 Oct 14, 2021

FG-transformer-TTS Fine-grained style control in transformer-based text-to-speech synthesis

LST-TTS Official implementation for the paper Fine-grained style control in transformer-based text-to-speech synthesis. Submitted to ICASSP 2022. Audi

64 Dec 30, 2022

This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

CPC_DeepCluster This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEEC

2 Sep 15, 2022

Python Listening-to-Sound-of-Silence-for-Speech-Denoising Resources

Python Listening-to-Sound-of-Silence-for-Speech-Denoising Libraries

Official implementation of Meta-StyleSpeech and StyleSpeech

Voice Conversion Using Speech-to-Speech Neuro-Style Transfer

This Project is based on NLTK It generates a RANDOM WORD from a predefined list of words, From that random word it read out the word, its meaning with parts of speech , its antonyms, its synonyms

SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

extract unpack asset file (form unreal engine 4 pak) with extenstion *.uexp which contain awb/acb (cri/cpk like) sound or music resource

Convert text to morse code and play morse code sound.

We have built a Voice based Personal Assistant for people to access files hands free in their device using natural language processing.

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.

Aydin is a user-friendly, feature-rich, and fast image denoising tool

Implementation for the paper: Invertible Denoising Network: A Light Solution for Real Noise Removal (CVPR2021).

Uses Google's gTTS module to easily create robo text readin' on command.

Meta-TTS: Meta-Learning for Few-shot SpeakerAdaptive Text-to-Speech

Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models"

Azure Neural Speech Service TTS

A simple Speech Emotion Recognition (SER) API created using Flask and running in a Docker container.

This repository details the steps in creating a Part of Speech tagger using Trigram Hidden Markov Models and the Viterbi Algorithm without using external libraries.

Obsei is a low code AI powered automation tool.

A python script to acquire multiple aws ec2 instances in a forensically sound-ish way

Pre-1.0 door/chest sound injector for Minecraft

Code and models for "Rethinking Deep Image Prior for Denoising" (ICCV 2021)

Demo of using Auto Encoder for Image Denoising

Official implementation of Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

A thing to simplify listening for PG notifications with asyncpg

Cryptocurrency application that displays instant cryptocurrency prices and reads prices with the Google Text-to-Speech library.

Code for the Paper: Alexandra Lindt and Emiel Hoogeboom.

A thing to simplify listening for PG notifications with asyncpg

Ukrainian TTS (text-to-speech) using Coqui TTS

On-device speech-to-index engine powered by deep learning.

Code for sound field predictions in domains with impedance boundaries. Used for generating results from the paper

Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising

Code for paper: An Effective, Robust and Fairness-awareHate Speech Detection Framework

Sign-to-Speech for Sign Language Understanding: A case study of Nigerian Sign Language

Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

A denoising diffusion probabilistic model synthesises galaxies that are qualitatively and physically indistinguishable from the real thing.

Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together

PyTorch implementation of "A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing"

Distort a video using Seam Carving (video) and Vibrato effect (sound)

Installation, test and evaluation of Scribosermo speech-to-text engine

Text to speech converter with GUI made in Python.

Azure Text-to-speech service for Home Assistant

Terminal-based audio-to-text converter

Simple, hackable offline speech to text - using the VOSK-API.

An 8D music player made to enjoy Halloween this year!🤘

A simple python script to play bell sound in your system infinitely, just for fun and experimental purposes

My first Minecraft CPU. Created in collaboration with Peer Carnes as a final project in CS 281: Architecture and Assembly at the University of Puget Sound

Simple Speech to Text, Text to Speech

The PyTorch based implementation of continuous integrate-and-fire (CIF) module.

Cobra is a highly-accurate and lightweight voice activity detection (VAD) engine.

Code Release for the paper "TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation"

SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.

Some toy examples of score matching algorithms written in PyTorch

🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

🎵 A repository for manually annotating files to create labeled acoustic datasets for machine learning.

A flask application to predict the speech emotion of any .wav file.

Open-source Laplacian Eigenmaps for dimensionality reduction of large data in python.

Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"

A simple Blog Using Django Framework and Used IBM Cloud Services for Text Analysis and Text to Speech

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Minimal GUI for accessing the Watson Text to Speech service.

Recognition of 38 speech commands in russian. Based on Yandex Cup 2021 ML Challenge: ASR

A Python/Pytorch app for easily synthesising human voices

The end-to-end platform for building voice products at scale

On-device speech-to-intent engine powered by deep learning

A python script that can play .mp3 URLs upon the ringing or motion detection of a Ring doorbell. The sound plays through Sonos speakers.

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

Production First and Production Ready End-to-End Speech Recognition Toolkit

glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Identify the emotion of multiple speakers in an Audio Segment

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"

Replication Package for AequeVox:Automated Fariness Testing for Speech Recognition Systems

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

A Collection of Papers and Codes for ICCV2021 Low Level Vision and Image Generation

End-to-End Speech Processing Toolkit

CPC-big and k-means clustering for zero-resource speech processing

Coarse implement of the paper "A Simultaneous Denoising and Dereverberation Framework with Target Decoupling", On DNS-2020 dataset, the DNSMOS of first stage is 3.42 and second stage is 3.47.