1138 Python Speech-processing Libraries

This program uses trial auth token of Azure Cognitive Services to do speech synthesis for you.

🗣️ aspeak A simple text-to-speech client using azure TTS API(trial). 😆 TL;DR: This program uses trial auth token of Azure Cognitive Services to do s

359 Jan 5, 2023

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis

Bilateral Denoising Diffusion Models (BDDMs) This is the official PyTorch implementation of the following paper: BDDM: BILATERAL DENOISING DIFFUSION M

172 Dec 23, 2022

Lyrics generation with GPT2-based Transformer

HuggingArtists - Train a model to generate lyrics Create AI-Artist in just 5 minutes! 🚀 Run the demo notebook to train 🚀 Run the GUI demo to test Di

65 Dec 19, 2022

🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation

🐤 Nix-TTS An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji

156 Jan 9, 2023

Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.

Optimum Transformers Accelerated NLP pipelines for fast inference 🚀 on CPU and GPU. Built with 🤗 Transformers, Optimum and ONNX runtime. Installatio

115 Dec 16, 2022

HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement

HiFi++ : a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement This is the unofficial implementation of Vocoder part of

118 Dec 29, 2022

Entity Disambiguation as text extraction (ACL 2022)

ExtEnD: Extractive Entity Disambiguation This repository contains the code of ExtEnD: Extractive Entity Disambiguation, a novel approach to Entity Dis

121 Jan 3, 2023

A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy.

Crosslingual Coreference Coreference is amazing but the data required for training a model is very scarce. In our case, the available training for non

71 Jan 4, 2023

RedDrop is a quick and easy web server for capturing and processing encoded and encrypted payloads and tar archives.

RedDrop Exfil Server Check out the accompanying MaverisLabs Blog Post Here! RedDrop Exfil Server is a Python Flask Web Server for Penetration Testers,

53 Nov 1, 2022

Python package to generate image embeddings with CLIP without PyTorch/TensorFlow

imgbeddings A Python package to generate embedding vectors from images, using OpenAI's robust CLIP model via Hugging Face transformers. These image em

81 Jan 4, 2023

This repository contains the best Data Science free hand-picked resources to equip you with all the industry-driven skills and interview preparation kit.

Best Data Science Resources Hey, Data Enthusiasts out there! Finally, after lots of requests from the community I finally came up with the best free D

415 Dec 31, 2022

A Python package develop for transportation spatio-temporal big data processing, analysis and visualization.

English 中文版 TransBigData Introduction TransBigData is a Python package developed for transportation spatio-temporal big data processing, analysis and

251 Jan 3, 2023

🙌Kart of 210+ projects based on machine learning, deep learning, computer vision, natural language processing and all. Show your support by ✨ this repository.

ML-ProjectKart 📌 Repository This kart showcases the finest collection of all projects based on machine learning, deep learning, computer vision, natu

203 Dec 28, 2022

Curso práctico: NLP de cero a cien 🤗

Curso Práctico: NLP de cero a cien Comprende todos los conceptos y arquitecturas clave del estado del arte del NLP y aplícalos a casos prácticos utili

147 Jan 6, 2023

This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand how to use NLP for text feature engineering.

Python_Natural_Language_Processing This repository contains tutorials on important topics related to Natural Language Processing (NPL). No. Name 01 01

170 Dec 13, 2022

Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:

Multi-Type-TD-TSR Check it out on Source Code of our Paper: Multi-Type-TD-TSR Extracting Tables from Document Images using a Multi-stage Pipeline for

178 Dec 27, 2022

GPU-accelerated Image Processing library using OpenCL

pyclesperanto pyclesperanto is a python package for clEsperanto - a multi-language framework for GPU-accelerated image processing. clEsperanto uses Op

17 Dec 25, 2022

Comprehensive-E2E-TTS - PyTorch Implementation

A Non-Autoregressive End-to-End Text-to-Speech (text-to-wav), supporting a family of SOTA unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate E2E-TTS

114 Nov 13, 2022

A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.

Poisson Image Editing - A Parallel Implementation Jiayi Weng (jiayiwen), Zixu Chen (zixuc) Poisson Image Editing is a technique that can fuse two imag

110 Dec 27, 2022

Finally, some decent sample sentences

tts-dataset-prompts This repository aims to be a decent set of sentences for people looking to clone their own voices (e.g. using Tacotron 2). Each se

19 Dec 13, 2022

Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement

MTFAA-Net Unofficial PyTorch implementation of Baidu's MTFAA-Net: "Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speec

87 Dec 19, 2022

An attempt to map the areas with active conflict in Ukraine using open source twitter data.

Live Action Map (LAM) An attempt to use open source data on Twitter to map areas with active conflict. Right now it is used for the Ukraine-Russia con

171 Nov 21, 2022

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way :chestnut:

Squirrel Core Share, load, and transform data in a collaborative, flexible, and efficient way What is Squirrel? Squirrel is a Python library that enab

249 Dec 7, 2022

Natural Language Processing - Sommer Semester 2022

Natural Language Processing (DIS25a/NLP) This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Progra

Classrooms of IR Group at Technische Hochschule Köln

19 Sep 7, 2022

Arabic Car License Recognition. A solution to the kaggle competition Machathon 3.0.

Transformers Arabic licence plate recognition 🚗 Solution to the kaggle competition Machathon 3.0. Ranked in the top 6️⃣ at the final evaluation phase

17 Dec 4, 2022

HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

HuggingSound HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools. I have no intention of building a very complex tool here.

247 Dec 26, 2022

Awesome Remote Sensing Toolkit based on PaddlePaddle.

基于飞桨框架开发的高性能遥感图像处理开发套件，端到端地完成从训练到部署的全流程遥感深度学习应用。最新动态 PaddleRS 即将发布alpha版本！欢迎大家试用简介 PaddleRS是遥感科研院所、相关高校共同基于飞桨开发的遥感处理平台，支持遥感图像分类，目标检测，图像分割，以及变化检测等常用遥

146 Dec 11, 2022

Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296

Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions This repo contains the dataset and code for the paper Benchmarking Ro

168 Dec 29, 2022

[Arxiv preprint] Causality-inspired Single-source Domain Generalization for Medical Image Segmentation (code&data-processing pipeline)

Causality-inspired Single-source Domain Generalization for Medical Image Segmentation Arxiv preprint Repository under construction. Might still be bug

31 Dec 27, 2022

SentimentArcs: a large ensemble of dozens of sentiment analysis models to analyze emotion in text over time

SentimentArcs - Emotion in Text An end-to-end pipeline based on Jupyter notebooks to detect, extract, process and anlayze emotion over time in text. E

14 Dec 19, 2022

dyld_shared_cache processing / Single-Image loading for BinaryNinja

Dyld Shared Cache Parser Author: cynder (kat) Dyld Shared Cache Support for BinaryNinja Without any of the fuss of requiring manually loading several

76 Dec 28, 2022

DLO8012: Natural Language Processing & CSL804: Computational Lab - II Semester VIII

NATURAL-LANGUAGE-PROCESSING-AND-COMPUTATIONAL-LAB-II DLO8012: NLP & CSL804: CL-II [SEMESTER VIII] Syllabus NLP - Reference Books THE WALL MEGA SATISH

7 Apr 28, 2022

Scalable Optical Flow-based Image Montaging and Alignment

SOFIMA SOFIMA (Scalable Optical Flow-based Image Montaging and Alignment) is a tool for stitching, aligning and warping large 2d, 3d and 4d microscopy

16 Dec 21, 2022

Evaluation and Benchmarking of Speech Super-resolution Methods

Speech Super-resolution Evaluation and Benchmarking What this repo do: A toolbox for the evaluation of speech super-resolution algorithms. Unify the e

84 Dec 20, 2022

PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI

data2vec-pytorch PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI (F

105 Jan 4, 2023

HF's ML for Audio study group

Hugging Face Machine Learning for Audio Study Group Welcome to the ML for Audio Study Group. Through a series of presentations, paper reading and disc

110 Jan 1, 2023

AI Summer's complete catalog of articles

Learn Deep Learning with AI Summer A collection of all articles (almost 100) written for the AI Summer blog organized by topic. Deep Learning Theory M

95 Dec 29, 2022

Get started with Machine Learning with Python - An introduction with Python programming examples

Machine Learning With Python Get started with Machine Learning with Python An engaging introduction to Machine Learning with Python TL;DR Download all

130 Jan 2, 2023

Applied Natural Language Processing in the Enterprise - An O'Reilly Media Publication

Applied Natural Language Processing in the Enterprise This is the companion repo for Applied Natural Language Processing in the Enterprise, an O'Reill

95 Jan 5, 2023

A Traffic Sign Recognition Project which can help the driver recognise the signs via text as well as audio. Can be used at Night also.

Traffic-Sign-Recognition In this report, we propose a Convolutional Neural Network(CNN) for traffic sign classification that achieves outstanding perf

64 Nov 19, 2022

Implements VQGAN+CLIP for image and video generation, and style transfers, based on text and image prompts. Emphasis on ease-of-use, documentation, and smooth video creation.

VQGAN-CLIP-GENERATOR Overview This is a package (with available notebook) for running VQGAN+CLIP locally, with a focus on ease of use, good documentat

98 Dec 30, 2022

APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

APEACH - Korean Hate Speech Evaluation Datasets APEACH is the first crowd-generated Korean evaluation dataset for hate speech detection. Sentences of

70 Dec 6, 2022

This is a repo of basic Machine Learning!

Basic Machine Learning This repository contains a topic-wise curated list of Machine Learning and Deep Learning tutorials, articles and other resource

53 Dec 31, 2022

$Implementation of the famous Image Manipulation\Forgery Detector$

Implementation of the famous Image Manipulation\Forgery Detector "ManTraNet" in Pytorch

Who has never met a forged picture on the web ? No one ! Everyday we are constantly facing fake pictures touched up in Photoshop but it is not always

77 Dec 16, 2022

Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

anlp21 Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley) Syllabus: http://people.ischool.berkeley.edu/~dba

48 Dec 6, 2022

Facestar dataset. High quality audio-visual recordings of human conversational speech.

Facestar Dataset Description Existing audio-visual datasets for human speech are either captured in a clean, controlled environment but contain only a

87 Dec 21, 2022

iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform

iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform This repo try to implement iSTFTNet : Fast

126 Jan 2, 2023

SelfRemaster: SSL Speech Restoration

SelfRemaster: Self-Supervised Speech Restoration Official implementation of SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesi

46 Jan 7, 2023

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

LightHuBERT LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT | Github | Huggingface | SUPER

46 Dec 29, 2022

[ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

AMOS This repository contains the scripts for fine-tuning AMOS pretrained models on GLUE and SQuAD 2.0 benchmarks. Paper: Pretraining Text Encoders wi

22 Sep 15, 2022

The implementation of "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement"

SF-Net for fullband SE This is the repo of the manuscript "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Ban

36 Dec 2, 2022

Library for converting from RGB / GrayScale image to base64 and back.

Library for converting RGB / Grayscale numpy images from to base64 and back. Installation pip install -U image_to_base_64 Conversion RGB to base 64 b

16 Aug 28, 2022

SIGIR'22 paper: Axiomatically Regularized Pre-training for Ad hoc Search

Introduction This codebase contains source-code of the Python-based implementation (ARES) of our SIGIR 2022 paper. Chen, Jia, et al. "Axiomatically Re

17 Nov 9, 2022

Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax.

Seq2Seq Speech in JAX A JAX/Flax repository for combining a pre-trained speech encoder model (e.g. Wav2Vec2, HuBERT, WavLM) with a pre-trained text de

21 Dec 14, 2022

Classification models 1D Zoo - Keras and TF.Keras

Classification models 1D Zoo - Keras and TF.Keras This repository contains 1D variants of popular CNN models for classification like ResNets, DenseNet

12 Jan 6, 2023

Audio-analytics for music-producers! Automate tedious tasks such as musical scale detection, BPM rate classification and audio file conversion.

Click here to be re-directed to the Beat Inspect Streamlit Web-App You are a music producer? Let's get in touch via LinkedIn Fundamental Analytics for

11 Dec 27, 2022

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

29 Oct 16, 2022

3D Avatar Lip Syncronization from speech (JALI based face-rigging)

visemenet-inference Inference Demo of "VisemeNet-tensorflow" VisemeNet is an audio-driven animator centric speech animation driving a JALI or standard

17 Dec 20, 2022

Vad-sli-asr - A Python scripts for a speech processing pipeline with Voice Activity Detection (VAD)

VAD-SLI-ASR Python scripts for a speech processing pipeline with Voice Activity

14 Dec 9, 2022

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

DiffGAN-TTS - PyTorch Implementation PyTorch implementation of DiffGAN-TTS: High

157 Jan 1, 2023

Job-Recommend-Competition - Vectorwise Interpretable Attentions for Multimodal Tabular Data

SiD - Simple Deep Model Vectorwise Interpretable Attentions for Multimodal Tabul

40 Dec 22, 2022

Digitalizing-Prescription-Image - PIRDS - Prescription Image Recognition and Digitalizing System is a OCR make with Tensorflow

Digitalizing-Prescription-Image PIRDS - Prescription Image Recognition and Digit

2 May 11, 2022

Textlesslib - Library for Textless Spoken Language Processing

textlesslib Textless NLP is an active area of research that aims to extend NLP t

379 Dec 27, 2022

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation In this repo you can find the code of the Supervised Hybrid Audio Segmentatio

21 Dec 20, 2022

PyTorch implementation of NATSpeech: A Non-Autoregressive Text-to-Speech Framework

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

760 Jan 3, 2023

Convert PDF to AudioBook and Audio Speech to PDF

In this Python project, we will build a GUI-based PDF to Audio and Audio to PDF converter using the Tkinter, OS, path, pyttsx3, SpeechRecognition, PyPDF4, and Pydub libraries and the messagebox module of the Tkinter library.

1 Feb 13, 2022

Speech Recognition is an important feature in several applications used such as home automation, artificial intelligence

Speech Recognition is an important feature in several applications used such as home automation, artificial intelligence, etc. This article aims to provide an introduction on how to make use of the SpeechRecognition and pyttsx3 library of Python.

1 Feb 13, 2022

Open-Source board for converting RaspberryPI to Brain-computer interface

The easiest way to the neuroscience world with the shield for RaspberryPi - PIEEG (website). Open-source. Crowdsupply This project is the result of se

436 Jan 1, 2023

A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

14 Dec 2, 2022

Image-to-image regression with uncertainty quantification in PyTorch

Image-to-image regression with uncertainty quantification in PyTorch. Take any dataset and train a model to regress images to images with rigorous, distribution-free uncertainty quantification.

25 Dec 26, 2022

HGCN: Harmonic Gated Compensation Network For Speech Enhancement

HGCN The official repo of "HGCN: Harmonic Gated Compensation Network For Speech Enhancement", which was accepted at ICASSP2022. How to use step1: Calc

33 Nov 14, 2022

A framework for GPU based high-performance medical image processing and visualization

FAST is an open-source cross-platform framework with the main goal of making it easier to do high-performance processing and visualization of medical images on heterogeneous systems utilizing both multi-core CPUs and GPUs. To achieve this, FAST use modern C++, OpenCL and OpenGL.

315 Dec 30, 2022

A python scripts that uses 3 different feature extraction methods such as SIFT, SURF and ORB to find a book in a video clip and project trailer of a movie based on that book, on to it.

3 Feb 10, 2022

An async Python library to automate solving ReCAPTCHA v2 by audio using Playwright.

Playwright nonoCAPTCHA An async Python library to automate solving ReCAPTCHA v2 by audio using Playwright. Disclaimer This project is for educational

69 Dec 28, 2022

The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding.

SuperGen The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding. Requirements Before running, you

38 Dec 12, 2022

L3Cube-MahaCorpus a Marathi monolingual data set scraped from different internet sources.

L3Cube-MahaCorpus L3Cube-MahaCorpus a Marathi monolingual data set scraped from different internet sources. We expand the existing Marathi monolingual

21 Dec 17, 2022

Pytorch Implementation for Dilated Continuous Random Field

DilatedCRF Pytorch implementation for fully-learnable DilatedCRF. If you find my work helpful, please consider our paper: @article{Mo2022dilatedcrf,

3 Nov 13, 2022

Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions

APSIPA-SER-with-A-and-T This code is the implementation of Speech Emotion Recognition (SER) with acoustic and linguistic features. The network model i

3 Jan 4, 2023

To create a deep learning model which can explain the content of an image in the form of speech through caption generation with attention mechanism on Flickr8K dataset.

0 Feb 8, 2022

CT Based COVID 19 Diagnose by Image Processing and Deep Learning

This project proposed the deep learning and image processing method to undertake the diagnosis on 2D CT image and 3D CT volume.

1 Feb 8, 2022

Implementation of the method described in the Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations Implementation of the method described in the Speech Resynthesis from Di

4 Mar 11, 2022

Natural language processing summarizer using 3 state of the art Transformer models: BERT, GPT2, and T5

NLP-Summarizer Natural language processing summarizer using 3 state of the art Transformer models: BERT, GPT2, and T5 This project aimed to provide in

1 Feb 7, 2022

Sequence-tagging using deep learning

Classification using Deep Learning Requirements PyTorch version = 1.9.1+cu111 Python version = 3.8.10 PyTorch-Lightning version = 1.4.9 Huggingface

2 Dec 20, 2022

TFPNER: Exploration on the Named Entity Recognition of Token Fused with Part-of-Speech

TFPNER TFPNER: Exploration on the Named Entity Recognition of Token Fused with Part-of-Speech Named entity recognition (NER), which aims at identifyin

1 Feb 7, 2022

Application to help find best train itinerary, uses speech to text, has a spam filter to segregate invalid inputs, NLP and Pathfinding algos.

T-IAI-901-MSC2022 - GROUP 18 Gestion de projet Notre travail a été organisé et réparti dans un Trello. https://trello.com/b/X3s2fpPJ/ia-projet Install

1 Feb 5, 2022

Natural Language Processing at EDHEC, 2022

Natural Language Processing Here you will find the teaching materials for the "Natural Language Processing" course at EDHEC Business School, 2022 What

1 Feb 4, 2022

A solution designed to extract, transform and load Chicago crime data from an RDS instance to other services in AWS.

This project is intended to implement a solution designed to extract, transform and load Chicago crime data from an RDS instance to other services in AWS.

1 Feb 4, 2022

This project proposes a camera vision based cursor control system, using hand moment captured from a webcam through a landmarks of hand by using Mideapipe module

2 Feb 20, 2022

A voice control utility for Spotify

Spotify Voice Control A voice control utility for Spotify · Report Bug · Request

27 Jan 1, 2023

This repository collects together basic linguistic processing data for using dataset dumps from the Common Voice project

Common Voice Utils This repository collects together basic linguistic processing data for using dataset dumps from the Common Voice project. It aims t

40 Dec 20, 2022

Used for data processing in machine learning, and help us to construct ML model more easily from scratch

Used for data processing in machine learning, and help us to construct ML model more easily from scratch. Can be used in linear model, logistic regression model, and decision tree.

0 Jul 5, 2022

Mednlp - Medical natural language parsing and utility library

Medical natural language parsing and utility library A natural language medical

3 Aug 24, 2022

DUCKSPLOIT - Windows Hacking FrameWork using Reverse Shell

Ducksploit Install Ducksploit Hacker setup raspberry pico Download https://githu

2 Jan 31, 2022

Creating an Audiobook (mp3 file) using a Ebook (epub) using BeautifulSoup and Google Text to Speech

epub2audiobook Creating an Audiobook (mp3 file) using a Ebook (epub) using BeautifulSoup and Google Text to Speech Input examples qual a pasta do seu

7 Aug 25, 2022

Multilingual finetuning of Machine Translation model on low-resource languages. Project for Deep Natural Language Processing course.

Low-resource-Machine-Translation This repository contains the code for the project relative to the course Deep Natural Language Processing. The goal o

3 Jun 22, 2022

LSTM model - IMDB review sentiment analysis

NLP - Movie review sentiment analysis The colab notebook contains the code for building a LSTM Recurrent Neural Network that gives 87-88% accuracy on

1 Jan 29, 2022

Image processing is one of the most common term in computer vision

Image processing is one of the most common term in computer vision. Computer vision is the process by which computers can understand images and videos, and how they are stored, manipulated, and retrieve details from them. OpenCV is an open source computer vision image processing library for machine learning, deep leaning and AI application which plays a major role in real-time operation which is very important in today’s systems.

3 Feb 15, 2022

Python Speech-processing Resources

Python speech-processing Libraries

This program uses trial auth token of Azure Cognitive Services to do speech synthesis for you.

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis

Lyrics generation with GPT2-based Transformer

🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation

Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.

HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement

Entity Disambiguation as text extraction (ACL 2022)

A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy.

RedDrop is a quick and easy web server for capturing and processing encoded and encrypted payloads and tar archives.

Python package to generate image embeddings with CLIP without PyTorch/TensorFlow

This repository contains the best Data Science free hand-picked resources to equip you with all the industry-driven skills and interview preparation kit.

A Python package develop for transportation spatio-temporal big data processing, analysis and visualization.

🙌Kart of 210+ projects based on machine learning, deep learning, computer vision, natural language processing and all. Show your support by ✨ this repository.

Curso práctico: NLP de cero a cien 🤗

This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand how to use NLP for text feature engineering.

Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:

GPU-accelerated Image Processing library using OpenCL

Comprehensive-E2E-TTS - PyTorch Implementation

A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.

Finally, some decent sample sentences

Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement

An attempt to map the areas with active conflict in Ukraine using open source twitter data.

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way :chestnut:

Natural Language Processing - Sommer Semester 2022

Arabic Car License Recognition. A solution to the kaggle competition Machathon 3.0.

HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

Awesome Remote Sensing Toolkit based on PaddlePaddle.

Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296

[Arxiv preprint] Causality-inspired Single-source Domain Generalization for Medical Image Segmentation (code&data-processing pipeline)

SentimentArcs: a large ensemble of dozens of sentiment analysis models to analyze emotion in text over time

dyld_shared_cache processing / Single-Image loading for BinaryNinja

DLO8012: Natural Language Processing & CSL804: Computational Lab - II Semester VIII

Scalable Optical Flow-based Image Montaging and Alignment

Evaluation and Benchmarking of Speech Super-resolution Methods

PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI

HF's ML for Audio study group

AI Summer's complete catalog of articles

Get started with Machine Learning with Python - An introduction with Python programming examples

Applied Natural Language Processing in the Enterprise - An O'Reilly Media Publication

A Traffic Sign Recognition Project which can help the driver recognise the signs via text as well as audio. Can be used at Night also.

Implements VQGAN+CLIP for image and video generation, and style transfers, based on text and image prompts. Emphasis on ease-of-use, documentation, and smooth video creation.

APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

This is a repo of basic Machine Learning!

Implementation of the famous Image Manipulation\Forgery Detector "ManTraNet" in Pytorch

Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

Facestar dataset. High quality audio-visual recordings of human conversational speech.

iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform

SelfRemaster: SSL Speech Restoration

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

[ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

The implementation of "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement"

Library for converting from RGB / GrayScale image to base64 and back.

SIGIR'22 paper: Axiomatically Regularized Pre-training for Ad hoc Search

Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax.

Classification models 1D Zoo - Keras and TF.Keras

Audio-analytics for music-producers! Automate tedious tasks such as musical scale detection, BPM rate classification and audio file conversion.

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

3D Avatar Lip Syncronization from speech (JALI based face-rigging)

Vad-sli-asr - A Python scripts for a speech processing pipeline with Voice Activity Detection (VAD)

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

Job-Recommend-Competition - Vectorwise Interpretable Attentions for Multimodal Tabular Data

Digitalizing-Prescription-Image - PIRDS - Prescription Image Recognition and Digitalizing System is a OCR make with Tensorflow

Textlesslib - Library for Textless Spoken Language Processing

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

PyTorch implementation of NATSpeech: A Non-Autoregressive Text-to-Speech Framework

Convert PDF to AudioBook and Audio Speech to PDF

Speech Recognition is an important feature in several applications used such as home automation, artificial intelligence

Open-Source board for converting RaspberryPI to Brain-computer interface

A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

Image-to-image regression with uncertainty quantification in PyTorch

HGCN: Harmonic Gated Compensation Network For Speech Enhancement

A framework for GPU based high-performance medical image processing and visualization

A python scripts that uses 3 different feature extraction methods such as SIFT, SURF and ORB to find a book in a video clip and project trailer of a movie based on that book, on to it.

An async Python library to automate solving ReCAPTCHA v2 by audio using Playwright.

The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding.

L3Cube-MahaCorpus a Marathi monolingual data set scraped from different internet sources.

Pytorch Implementation for Dilated Continuous Random Field

Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions