1027 Python Audio-recognition Libraries

Tensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning And private Server services

Tensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning

4.2k Dec 29, 2022

Speech-Emotion-Analyzer - The neural network model is capable of detecting five different male/female emotions from audio speeches. (Deep Learning, NLP, Python)

Speech Emotion Analyzer The idea behind creating this project was to build a machine learning model that could detect emotions from the speech we have

965 Dec 24, 2022

Prml - Repository of notes, code and notebooks in Python for the book Pattern Recognition and Machine Learning by Christopher Bishop

Pattern Recognition and Machine Learning (PRML) This project contains Jupyter notebooks of many the algorithms presented in Christopher Bishop's Patte

1k Jan 7, 2023

TensorFlow 101: Introduction to Deep Learning for Python Within TensorFlow

TensorFlow 101: Introduction to Deep Learning I have worked all my life in Machine Learning, and I've never seen one algorithm knock over its benchmar

896 Jan 4, 2023

Saliency - Framework-agnostic implementation for state-of-the-art saliency methods (XRAI, BlurIG, SmoothGrad, and more).

Saliency Methods 🔴 Now framework-agnostic! (Example core notebook) 🔴 🔗 For further explanation of the methods and more examples of the resulting ma

849 Dec 27, 2022

Captcha-tensorflow - Image Captcha Solving Using TensorFlow and CNN Model. Accuracy 90%+

Captcha Solving Using TensorFlow Introduction Solve captcha using TensorFlow. Learn CNN and TensorFlow by a practical project. Follow the steps, run t

869 Jan 6, 2023

GestureSSD CBAM - A gesture recognition web system based on SSD and CBAM, using pytorch, flask and node.js

GestureSSD_CBAM A gesture recognition web system based on SSD and CBAM, using pytorch, flask and node.js SSD implementation is based on https://github

2 Jan 6, 2022

Omdena-abuja-anpd - Automatic Number Plate Detection for the security of lives and properties using Computer Vision.

1 Jan 1, 2022

IDCARD-VERIFYING-SYSTEM - The "IDCARD VERIFYING SYSTEM" uses the Google's latest version of Tesseract OCR[Optical Character Recognition]

IDCARD VERIFYING SYSTEM The "IDCARD VERIFYING SYSTEM" uses the Google's latest v

1 Dec 31, 2021

HAR-stacked-residual-bidir-LSTMs - Deep stacked residual bidirectional LSTMs for HAR

HAR-stacked-residual-bidir-LSTM The project is based on this repository which is presented as a tutorial. It consists of Human Activity Recognition (H

287 Dec 27, 2022

SpecAugmentPyTorch - A Pytorch (support batch and channel) implementation of GoogleBrain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment An implementation of SpecAugment for Pytorch How to use Install pytorch, version=1.9.0 (new feature (torch.Tensor.take_along_dim) is used

3 Oct 11, 2022

Face-Recognition-based-Attendance-System - An implementation of Attendance System in python.

Face-Recognition-based-Attendance-System A real time implementation of Attendance System in python. Pre-requisites To understand the implentation of F

1 Dec 31, 2021

Face_mosaic - Mosaic blur processing is applied to multiple faces appearing in the video

動機 face_recognitionを使用して得られる顔座標は長方形であり、この座標をそのまま用いてぼかし処理を行った場合得られる画像は醜い。それに対してモ

6 Feb 3, 2022

“Robust Lightweight Facial Expression Recognition Network with Label Distribution Training”, AAAI 2021.

EfficientFace Zengqun Zhao, Qingshan Liu, Feng Zhou. "Robust Lightweight Facial Expression Recognition Network with Label Distribution Training". AAAI

119 Jan 8, 2023

WaveFake: A Data Set to Facilitate Audio DeepFake Detection

WaveFake: A Data Set to Facilitate Audio DeepFake Detection This is the code repository for our NeurIPS 2021 (Track on Datasets and Benchmarks) paper

27 Dec 22, 2022

A state-of-the-art semi-supervised method for image recognition

Mean teachers are better role models Paper ---- NIPS 2017 poster ---- NIPS 2017 spotlight slides ---- Blog post By Antti Tarvainen, Harri Valpola (The

1.4k Jan 6, 2023

Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition (NeurIPS 2019)

MLCR This is the source code for paper Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition. Xuesong Niu, Hu Han, Shiguang

60 Nov 29, 2022

Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model

Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model Baris Gecer 1, Binod Bhattarai 1

190 Dec 29, 2022

基于百度的语音识别，用python实现，pyaudio+pyqt

Speech-recognition 基于百度的语音识别，python3.8(conda)+pyaudio+pyqt+baidu-aip 百度有面向python

1 Jan 3, 2022

Rocks vc Userbot: A Telegram Bot Project That's Allow You To Play Audio And Video Music On Telegram Voice Chat Group

⭐️ Rocks VC Userbot ⭐️ Telegram Userbot To Play Audio And Video Song On VC Chat

10 Jul 18, 2022

Youtube video downloader and info extractor for python.

tube_dl Tube_dl is a Simple Youtube video downloader for Python. A Modular approach to bypass and download Youtube Videos and Playlist from Youtube us

16 Jul 9, 2022

Library for Python 3 to communicate with the Google Chromecast.

pychromecast Library for Python 3.6+ to communicate with the Google Chromecast. It currently supports: Auto discovering connected Chromecasts on the n

2.4k Jan 2, 2023

Enhanced Audio Player for Discord

Discodo is an enhanced audio player for discord

42 Oct 5, 2022

A timer for bird lovers, plays a random birdcall while displaying its image and info.

Birdcall Timer A timer for bird lovers. Siriema hatchling by Junior Peres Junior Background My partner needed a customizable timer for sitting and sta

1 Jul 8, 2022

This repository for project that can Automate Number Plate Recognition (ANPR) in Morocco Licensed Vehicles. 💻 + 🚙 + 🇲🇦 = 🤖 🕵🏻‍♂️

MoroccoAI Data Challenge (Edition #001) This Reposotory is result of our work in the comepetiton organized by MoroccoAI in the context of the first Mo

14 Oct 31, 2022

A stable and Fast telegram video convertor bot which can compress, convert(video into audio and other video formats), rename with permanent thumbnail and trim.

ᴠɪᴅᴇᴏ ᴄᴏɴᴠᴇʀᴛᴏʀ A stable and Fast telegram video convertor bot which can compress, convert(video into audio and other video formats), rename and trim.

183 Jan 4, 2023

An implementation of the Contrast Predictive Coding (CPC) method to train audio features in an unsupervised fashion.

CPC_audio This code implements the Contrast Predictive Coding algorithm on audio data, as described in the paper Unsupervised Pretraining Transfers we

283 Dec 30, 2022

Video Representation Learning by Recognizing Temporal Transformations. In ECCV, 2020.

Video Representation Learning by Recognizing Temporal Transformations [Project Page] Simon Jenni, Givi Meishvili, and Paolo Favaro. In ECCV, 2020. Thi

46 Nov 14, 2022

Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

[Paper] [Project page] This repository contains code for the paper: Andrew Owens, Alexei A. Efros. Audio-Visual Scene Analysis with Self-Supervised Mu

202 Dec 13, 2022

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

AdaFocusV2 This repo contains the official code and pre-trained models for AdaFo

79 Dec 26, 2022

Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification"

[AAAI22] Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification We point out the overlooked unbiasedness in long-tailed clas

28 Oct 18, 2022

Scene-Text-Detection-and-Recognition (Pytorch)

Scene-Text-Detection-and-Recognition (Pytorch) Competition URL: https://tbrain.t

9 Jan 2, 2023

Indonesian Car License Plate Character Recognition using Tensorflow, Keras and OpenCV.

Monopol Indonesian Car License Plate (Indonesia Mobil Nomor Polisi) Character Recognition using Tensorflow, Keras and OpenCV. Background This applicat

3 Apr 7, 2022

Asad Alexa VC Bot Is A Telegram Bot Project That's Allow You To Play Audio And Video Music On Telegram Voice Chat Group.

6 Jun 20, 2022

Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition (AGRA, ACM 2020, Oral)

Cross Domain Facial Expression Recognition Benchmark Implementation of papers: Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchm

89 Dec 9, 2022

Efficient face emotion recognition in photos and videos

This repository contains code of face emotion recognition that was developed in the RSF (Russian Science Foundation) project no. 20-71-10010 (Efficien

239 Jan 4, 2023

Tkinter based YouTube video downloader works on pytube 11.0.2. Can download YouTube videos in 720p(HD), 144p and even only audio.

YouTube-Downloader Tkinter based YouTube video downloader works on pytube 11.0.2. Can download YouTube videos in 720p(HD), 144p and even only audio. G

2 Dec 27, 2021

Easily download audio described movies and TV shows found on audiovault.net

AudioVault Downloader A convenient downloader for audio described movies and TV shows found on the Audio Vault. get latest binary release for Windows

5 Feb 10, 2022

EmoTag helps you train emotion detection model for Chinese audios

emoTag emoTag helps you train emotion detection model for Chinese audios. Environment pip install -r requirement.txt Data We used Emotional Speech Dat

4 Sep 7, 2022

The ability of computer software to identify words and phrases in spoken language and convert them to human-readable text

speech-recognition-py Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them to huma

1 Apr 3, 2022

Vietnamese Language Detection and Recognition

Table of Content Introduction (Khôi viết) Dataset (đổi link thui thành 3k5 ảnh mình) Getting Started (An Viết) Requirements Usage Example Training & E

6 May 27, 2022

A python library for working with praat, textgrids, time aligned audio transcripts, and audio files.

praatIO Questions? Comments? Feedback? A library for working with praat, time aligned audio transcripts, and audio files that comes with batteries inc

224 Dec 19, 2022

Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application with a focus on embedded systems.

Welcome to Spokestack Python! This library is intended for developing voice interfaces in Python. This can include anything from Raspberry Pi applicat

133 Sep 20, 2022

A repository that finds a person who looks like you by using face recognition technology.

Find Your Twin Hello everyone, I've always wondered how casting agencies do the casting for a scene where a certain actor is young or old for a movie

3 Jan 29, 2022

A collection of python scripts for extracting and analyzing acoustics from audio files.

pyAcoustics A collection of python scripts for extracting and analyzing acoustics from audio files. Contents 1 Common Use Cases 2 Major revisions 3 Fe

74 Dec 26, 2022

Official repository of the AAAI'2022 paper "Contrast and Generation Make BART a Good Dialogue Emotion Recognizer"

CoG-BART Contrast and Generation Make BART a Good Dialogue Emotion Recognizer Quick Start: To run the model on test sets of four datasets, Download th

39 Dec 24, 2022

Group Activity Recognition with Clustered Spatial Temporal Transformer

GroupFormer Group Activity Recognition with Clustered Spatial-TemporalTransformer Backbone Style Action Acc Activity Acc Config Download Inv3+flow+pos

28 Dec 12, 2022

SEOVER: Sentence-level Emotion Orientation Vector based Conversation Emotion Recognition Model

SEOVER-Master This code is the implementation of paper： SEOVER: Sentence-level Emotion Orientation Vector based Conversation Emotion Recognition Model

4 Feb 24, 2022

Utility for Google Text-To-Speech batch audio files generator. Ideal for prompt files creation with Google voices for application in offline IVRs

Google Text-To-Speech Batch Prompt File Maker Are you in the need of IVR prompts, but you have no voice actors? Let Google talk your prompts like a pr

1 Aug 19, 2021

Codes for the paper Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

Contrast and Mix (CoMix) The repository contains the codes for the paper Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Backgroun

Computer Vision and Intelligence Research (CVIR)

13 Dec 10, 2022

Audio pitch-shifting & re-sampling utility, based on the EMU SP-1200

Pitcher.py Free & OS emulation of the SP-12 & SP-1200 signal chain (now with GUI) Pitch shift / bitcrush / resample audio files Written and tested in

13 Oct 3, 2022

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

13.6k Jan 5, 2023

An elaborate and exhaustive paper list for Named Entity Recognition (NER)

Named-Entity-Recognition-NER-Papers by Pengfei Liu, Jinlan Fu and other contributors. An elaborate and exhaustive paper list for Named Entity Recognit

388 Dec 18, 2022

Official Pytorch implementation for "End2End Occluded Face Recognition by Masking Corrupted Features, TPAMI 2021"

End2End Occluded Face Recognition by Masking Corrupted Features This is the Pytorch implementation of our TPAMI 2021 paper End2End Occluded Face Recog

25 Oct 31, 2022

Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition"

Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition" Pre-trained Deep Convo

5 Nov 11, 2022

PyTorch implementation of our paper: Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition

Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition, arxiv This is a PyTorch implementation of our paper. 1. Re

11 Nov 19, 2022

A Comparative Review of Recent Kinect-Based Action Recognition Algorithms (TIP2020, Matlab codes)

A Comparative Review of Recent Kinect-Based Action Recognition Algorithms This repo contains: the HDG implementation (Matlab codes) for 'Analysis and

5 Oct 22, 2022

VIsually-Pivoted Audio and(N) Text

VIP-ANT: VIsually-Pivoted Audio and(N) Text Code for the paper Connecting the Dots between Audio and Text without Parallel Data through Visual Knowled

16 Nov 4, 2022

Streaming over lightweight data transformations

Description Data augmentation libarary for Deep Learning, which supports images, segmentation masks, labels and keypoints. Furthermore, SOLT is fast a

Research Unit of Medical Imaging, Physics and Technology

256 Jan 8, 2023

A Python module made to simplify the usage of Text To Speech and Speech Recognition.

Nav Module The solution for voice related stuff in Python Nav is a Python module which simplifies voice related stuff in Python. Just import the Modul

1 Dec 20, 2021

Using computer vision method to recognize and calcutate the features of the architecture.

building-feature-recognition In this repository, we accomplished building feature recognition using traditional/dl-assisted computer vision method. Th

4 Aug 11, 2022

Complete system for facial identity system

Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

4 May 2, 2022

This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described in the paper.

Data Efficient Language-Supervised Zero-Shot Recognition with Optimal Transport Distillation This repository contains PyTorch evaluation code, trainin

45 Dec 20, 2022

A Telegram bot to transcribe audio, video and image into text.

Transcriber Bot A Telegram bot to transcribe audio, video and image into text. Deploy to Heroku Local Deploying Install the FFmpeg. Make sure you have

10 Dec 19, 2022

Chinese license plate recognition

AgentCLPR 简介一个基于 ONNXRuntime、AgentOCR 和 License-Plate-Detector 项目开发的中国车牌检测识别系统。车牌识别效果支持多种车牌的检测和识别（其中单层车牌识别效果较好）：单层车牌： [[[[373, 282], [69, 284],

26 Dec 25, 2022

J.A.R.V.I.S is an AI virtual assistant made in python.

J.A.R.V.I.S is an AI virtual assistant made in python. Running JARVIS Without Python To run JARVIS without python: 1. Head over to our installation pa

16 Dec 29, 2022

Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN. Classifying the type of movement amongst six activity categories - Guillaume Chevalier

LSTMs for Human Activity Recognition Human Activity Recognition (HAR) using smartphones dataset and an LSTM RNN. Classifying the type of movement amon

3.1k Dec 30, 2022

Wav2Vec for speech recognition, classification, and audio classification

Soxan در زبان پارسی به نام سخن This repository consists of models, scripts, and notebooks that help you to use all the benefits of Wav2Vec 2.0 in your

140 Dec 15, 2022

Arabic speech recognition, classification and text-to-speech.

klaam Arabic speech recognition, classification and text-to-speech using many advanced models like wave2vec and fastspeech2. This repository allows tr

177 Dec 27, 2022

Testing the Facial Emotion Recognition (FER) algorithm on animations

PegHeads-Tutorial-3 Testing the Facial Emotion Recognition (FER) algorithm on animations

2 Jan 3, 2022

Python script for downloading audio from YouTube songs/videos.

Python script for downloading audio from YouTube songs/videos. All you have to do is specify the path to your folder and then type song's/video's name and the sound will be downloaded into your folder.

0 Oct 5, 2022

Make an audio file (really) long-winded

longwind Make an audio file (really) long-winded Daily repetitions are an illusion anyway.

2 Sep 12, 2022

Create light scenes , voice control, ifttt, fuzzywuzzy speech correction and much more with Tuya light bulbs.

LightBox Features: Auto discover tuya lights Set and create moods (aka: light profiles) Change moods via IFTTT List moods via IFTTT FuzzyWuzzy, speech

1 Dec 20, 2021

nofacedb/faceprocessor is a face recognition engine for NoFaceDB program complex.

faceprocessor nofacedb/faceprocessor is a face recognition engine for NoFaceDB program complex. Tech faceprocessor uses a number of open source projec

3 Sep 6, 2021

Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.

Welcome to Healthsea ✨ Create better access to health with spaCy. Healthsea is a pipeline for analyzing user reviews to supplement products by extract

75 Dec 19, 2022

Tensorflow Implementation of ECCV'18 paper: Multimodal Human Motion Synthesis

MT-VAE for Multimodal Human Motion Synthesis This is the code for ECCV 2018 paper MT-VAE: Learning Motion Transformations to Generate Multimodal Human

36 Oct 2, 2022

Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

2 Dec 28, 2021

pyo is a Python module written in C to help digital signal processing script creation.

1.1k Jan 1, 2023

Code of TIP2021 Paper《SFace: Sigmoid-Constrained Hypersphere Loss for Robust Face Recognition》. We provide both MxNet and Pytorch versions.

SFace Code of TIP2021 Paper 《SFace: Sigmoid-Constrained Hypersphere Loss for Robust Face Recognition》. We provide both MxNet, PyTorch and Jittor versi

47 Nov 25, 2022

Code for "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose"

Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose We provide PyTorch implementations for our arxiv paper "Audio-dr

497 Jan 9, 2023

MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]

MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020] by Kaisiyuan Wang, Qianyi Wu, Linsen Song, Zhuoqian Yang, Wa

112 Dec 28, 2022

Code for Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019)

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019) We propose Disentangled Audio-Visual System (DAVS) to ad

750 Dec 23, 2022

AudioDVP:Photorealistic Audio-driven Video Portraits

AudioDVP This is the official implementation of Photorealistic Audio-driven Video Portraits. Major Requirements Ubuntu = 18.04 PyTorch = 1.2 GCC =

232 Jan 3, 2023

AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.

Adelaide Intelligent Machines (AIM) Group

3k Jan 2, 2023

User-friendly Voice Cloning Application

Multi-Language-RTVC stands for Multi-Language Real Time Voice Cloning and is a Voice Cloning Tool capable of transfering speaker-specific audio featur

19 Dec 30, 2022

UniSpeech - Large Scale Self-Supervised Learning for Speech

UniSpeech The family of UniSpeech: WavLM (arXiv): WavLM: Large-Scale Self-Supervised Pre-training for Full Stack Speech Processing UniSpeech (ICML 202

281 Dec 15, 2022

Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition

SEW (Squeezed and Efficient Wav2vec) The repo contains the code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speec

67 Dec 1, 2022

Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downstream tasks like translation and summarisation.

PART 2: CHAIN LINKING AUDIO-TO-TEXT NLP TASKS 2A: TRANSCRIBE-TRANSLATE-SENTIMENT-ANALYSIS In notebook3.0, I demo a simple workflow to: transcribe a lo

30 Jul 13, 2022

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 | 한국어 State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained models

77.1k Dec 31, 2022

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Hiring We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on NLP and large-scale pre-traine

7.8k Jan 9, 2023

Awesome Treasure of Transformers Models Collection

💁 Awesome Treasure of Transformers Models for Natural Language processing contains papers, videos, blogs, official repo along with colab Notebooks. 🛫☑️

577 Jan 7, 2023

A curated list of long-tailed recognition resources.

Awesome Long-tailed Recognition A curated list of long-tailed recognition and related resources. Please feel free to pull requests or open an issue to

542 Jan 1, 2023

Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data

FTLNet_Pytorch Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data 1. Introduction This repo is an unofficial

1 Nov 4, 2020

A Pytorch reproduction of Range Loss, which is proposed in paper 《Range Loss for Deep Face Recognition with Long-Tailed Training Data》

RangeLoss Pytorch This is a Pytorch reproduction of Range Loss, which is proposed in paper 《Range Loss for Deep Face Recognition with Long-Tailed Trai

7 Nov 27, 2021

Python Audio-recognition Resources

Python audio-recognition Libraries

Tensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning And private Server services

Speech-Emotion-Analyzer - The neural network model is capable of detecting five different male/female emotions from audio speeches. (Deep Learning, NLP, Python)

Prml - Repository of notes, code and notebooks in Python for the book Pattern Recognition and Machine Learning by Christopher Bishop

TensorFlow 101: Introduction to Deep Learning for Python Within TensorFlow

Saliency - Framework-agnostic implementation for state-of-the-art saliency methods (XRAI, BlurIG, SmoothGrad, and more).

Captcha-tensorflow - Image Captcha Solving Using TensorFlow and CNN Model. Accuracy 90%+

GestureSSD CBAM - A gesture recognition web system based on SSD and CBAM, using pytorch, flask and node.js

Omdena-abuja-anpd - Automatic Number Plate Detection for the security of lives and properties using Computer Vision.

IDCARD-VERIFYING-SYSTEM - The "IDCARD VERIFYING SYSTEM" uses the Google's latest version of Tesseract OCR[Optical Character Recognition]

HAR-stacked-residual-bidir-LSTMs - Deep stacked residual bidirectional LSTMs for HAR

SpecAugmentPyTorch - A Pytorch (support batch and channel) implementation of GoogleBrain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Face-Recognition-based-Attendance-System - An implementation of Attendance System in python.

Face_mosaic - Mosaic blur processing is applied to multiple faces appearing in the video

“Robust Lightweight Facial Expression Recognition Network with Label Distribution Training”, AAAI 2021.

WaveFake: A Data Set to Facilitate Audio DeepFake Detection

A state-of-the-art semi-supervised method for image recognition

Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition (NeurIPS 2019)

Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model

基于百度的语音识别，用python实现，pyaudio+pyqt

Rocks vc Userbot: A Telegram Bot Project That's Allow You To Play Audio And Video Music On Telegram Voice Chat Group

Youtube video downloader and info extractor for python.

Library for Python 3 to communicate with the Google Chromecast.

Enhanced Audio Player for Discord

A timer for bird lovers, plays a random birdcall while displaying its image and info.

This repository for project that can Automate Number Plate Recognition (ANPR) in Morocco Licensed Vehicles. 💻 + 🚙 + 🇲🇦 = 🤖 🕵🏻‍♂️

A stable and Fast telegram video convertor bot which can compress, convert(video into audio and other video formats), rename with permanent thumbnail and trim.

An implementation of the Contrast Predictive Coding (CPC) method to train audio features in an unsupervised fashion.

Video Representation Learning by Recognizing Temporal Transformations. In ECCV, 2020.

Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification"

Scene-Text-Detection-and-Recognition (Pytorch)

Indonesian Car License Plate Character Recognition using Tensorflow, Keras and OpenCV.

Asad Alexa VC Bot Is A Telegram Bot Project That's Allow You To Play Audio And Video Music On Telegram Voice Chat Group.

Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition (AGRA, ACM 2020, Oral)

Efficient face emotion recognition in photos and videos

Tkinter based YouTube video downloader works on pytube 11.0.2. Can download YouTube videos in 720p(HD), 144p and even only audio.

Easily download audio described movies and TV shows found on audiovault.net

EmoTag helps you train emotion detection model for Chinese audios

The ability of computer software to identify words and phrases in spoken language and convert them to human-readable text

Vietnamese Language Detection and Recognition

A python library for working with praat, textgrids, time aligned audio transcripts, and audio files.

Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application with a focus on embedded systems.

A repository that finds a person who looks like you by using face recognition technology.

A collection of python scripts for extracting and analyzing acoustics from audio files.

Official repository of the AAAI'2022 paper "Contrast and Generation Make BART a Good Dialogue Emotion Recognizer"

Group Activity Recognition with Clustered Spatial Temporal Transformer

SEOVER: Sentence-level Emotion Orientation Vector based Conversation Emotion Recognition Model

Utility for Google Text-To-Speech batch audio files generator. Ideal for prompt files creation with Google voices for application in offline IVRs

Codes for the paper Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

Audio pitch-shifting & re-sampling utility, based on the EMU SP-1200

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

An elaborate and exhaustive paper list for Named Entity Recognition (NER)

Official Pytorch implementation for "End2End Occluded Face Recognition by Masking Corrupted Features, TPAMI 2021"

Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition"

PyTorch implementation of our paper: Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition

A Comparative Review of Recent Kinect-Based Action Recognition Algorithms (TIP2020, Matlab codes)

VIsually-Pivoted Audio and(N) Text

Streaming over lightweight data transformations

A Python module made to simplify the usage of Text To Speech and Speech Recognition.

Using computer vision method to recognize and calcutate the features of the architecture.

Complete system for facial identity system

This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described in the paper.

A Telegram bot to transcribe audio, video and image into text.

Chinese license plate recognition

J.A.R.V.I.S is an AI virtual assistant made in python.

Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN. Classifying the type of movement amongst six activity categories - Guillaume Chevalier

Wav2Vec for speech recognition, classification, and audio classification

Arabic speech recognition, classification and text-to-speech.

Testing the Facial Emotion Recognition (FER) algorithm on animations

Python script for downloading audio from YouTube songs/videos.

Make an audio file (really) long-winded

Create light scenes , voice control, ifttt, fuzzywuzzy speech correction and much more with Tuya light bulbs.

nofacedb/faceprocessor is a face recognition engine for NoFaceDB program complex.

Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.

Tensorflow Implementation of ECCV'18 paper: Multimodal Human Motion Synthesis

Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

pyo is a Python module written in C to help digital signal processing script creation.