1630 Repositories
Python speech-detection Libraries
Source code of article "Towards Toxic and Narcotic Medication Detection with Rotated Object Detector"
Towards Toxic and Narcotic Medication Detection with Rotated Object Detector Introduction This is the source code of article: Towards Toxic and Narcot
NOD: Taking a Closer Look at Detection under Extreme Low-Light Conditions with Night Object Detection Dataset
NOD (Night Object Detection) Dataset NOD: Taking a Closer Look at Detection under Extreme Low-Light Conditions with Night Object Detection Dataset, BM
BMVC 2021: This is the github repository for "Few Shot Temporal Action Localization using Query Adaptive Transformers" accepted in British Machine Vision Conference (BMVC) 2021, Virtual
FS-QAT: Few Shot Temporal Action Localization using Query Adaptive Transformer Accepted as Poster in BMVC 2021 This is an official implementation in P
The Official Repository for "Generalized OOD Detection: A Survey"
Generalized Out-of-Distribution Detection: A Survey 1. Overview This repository is with our survey paper: Title: Generalized Out-of-Distribution Detec
A python library for face detection and features extraction based on mediapipe library
FaceAnalyzer A python library for face detection and features extraction based on mediapipe library Introduction FaceAnalyzer is a library based on me
Recognition of 38 speech commands in russian. Based on Yandex Cup 2021 ML Challenge: ASR
Speech_38_ru_commands Recognition of 38 speech commands in russian. Based on Yandex Cup 2021 ML Challenge: ASR Программа умеет распознавать 38 ключевы
A Python/Pytorch app for easily synthesising human voices
Voice Cloning App A Python/Pytorch app for easily synthesising human voices Documentation Discord Server Video guide Voice Sharing Hub FAQ's System Re
a reimplementation of Holistically-Nested Edge Detection in PyTorch
pytorch-hed This is a personal reimplementation of Holistically-Nested Edge Detection [1] using PyTorch. Should you be making use of this work, please
The end-to-end platform for building voice products at scale
Picovoice Made in Vancouver, Canada by Picovoice Picovoice is the end-to-end platform for building voice products on your terms. Unlike Alexa and Goog
A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.
chitra What is chitra? chitra (चित्र) is a multi-functional library for full-stack Deep Learning. It simplifies Model Building, API development, and M
On-device speech-to-intent engine powered by deep learning
Rhino Made in Vancouver, Canada by Picovoice Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from spoken commands within a giv
Vision Deep-Learning using Tensorflow, Keras.
Welcome! I am a computer vision deep learning developer working in Korea. This is my blog, and you can see everything I've studied here. https://www.n
Train a state-of-the-art yolov3 object detector from scratch!
TrainYourOwnYOLO: Building a Custom Object Detector from Scratch This repo let's you train a custom image detector using the state-of-the-art YOLOv3 c
On-device wake word detection powered by deep learning.
Porcupine Made in Vancouver, Canada by Picovoice Porcupine is a highly-accurate and lightweight wake word engine. It enables building always-listening
🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥
face.evoLVe: High-Performance Face Recognition Library based on PaddlePaddle & PyTorch Evolve to be more comprehensive, effective and efficient for fa
RSNA Intracranial Hemorrhage Detection with python
RSNA Intracranial Hemorrhage Detection This is the source code for the first place solution to the RSNA2019 Intracranial Hemorrhage Detection Challeng
PyTorch implementation of the paper Ultra Fast Structure-aware Deep Lane Detection
PyTorch implementation of the paper Ultra Fast Structure-aware Deep Lane Detection
Project code for weakly supervised 3D object detectors using wide-baseline multi-view traffic camera data: WIBAM.
WIBAM (Work in progress) Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data 3D object dete
Patch SVDD for Image anomaly detection
Patch SVDD Patch SVDD for Image anomaly detection. Paper: https://arxiv.org/abs/2006.16067 (published in ACCV 2020). Original Code : https://github.co
Python Implementation of algorithms in Graph Mining, e.g., Recommendation, Collaborative Filtering, Community Detection, Spectral Clustering, Modularity Maximization, co-authorship networks.
Graph Mining Author: Jiayi Chen Time: April 2021 Implemented Algorithms: Network: Scrabing Data, Network Construbtion and Network Measurement (e.g., P
The Video-based Accident Detection System built in Python
Accident-detection-system About the Project This Repository contains the Video-based Accident Detection System built in Python. Contributors Yukta Gop
A python script that can play .mp3 URLs upon the ringing or motion detection of a Ring doorbell. The sound plays through Sonos speakers.
Ring x Sonos A python script that plays .mp3 files whenever a doorbell is rung or a doorbell detects motion. Features Music! Authors @braden Running T
Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.
CTC Decoding Algorithms Update 2021: installable Python package Python implementation of some common Connectionist Temporal Classification (CTC) decod
Official pytorch implementation of Active Learning for deep object detection via probabilistic modeling (ICCV 2021)
Active Learning for Deep Object Detection via Probabilistic Modeling This repository is the official PyTorch implementation of Active Learning for Dee
Code of ICCV paper: Rethinking Transformer-based Set Prediction for Object Detection
Rethinking Transformer-based Set Prediction for Object Detection Here are the code for the ICCV paper. The code is adapted from Detectron2 and AdelaiD
Production First and Production Ready End-to-End Speech Recognition Toolkit
WeNet 中文版 Discussions | Docs | Papers | Runtime (x86) | Runtime (android) | Pretrained Models We share neural Net together. The main motivation of WeN
MutualGuide is a compact object detector specially designed for embedded devices
Introduction MutualGuide is a compact object detector specially designed for embedded devices. Comparing to existing detectors, this repo contains two
glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.
Glow-Speak glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end. Installation git clone https://g
Identify the emotion of multiple speakers in an Audio Segment
MevonAI - Speech Emotion Recognition Identify the emotion of multiple speakers in a Audio Segment Report Bug · Request Feature Try the Demo Here Table
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge This is an implementation of the paper,
Out-of-distribution detection using the pNML regret. NeurIPS2021
OOD Detection Load conda environment conda env create -f environment.yml or install requirements: while read requirement; do conda install --yes $requ
Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"
LDNet Author: Wen-Chin Huang (Nagoya University) Email: [email protected] This is the official implementation of the paper "LDNet
Automatic Idiomatic Expression Detection
IDentifier of Idiomatic Expressions via Semantic Compatibility (DISC) An Idiomatic identifier that detects the presence and span of idiomatic expressi
Official implementation of "Synthetic Temporal Anomaly Guided End-to-End Video Anomaly Detection" (ICCV Workshops 2021: RSL-CV).
Official PyTorch implementation of "Synthetic Temporal Anomaly Guided End-to-End Video Anomaly Detection" This is the implementation of the paper "Syn
LSTC: Boosting Atomic Action Detection with Long-Short-Term Context
LSTC: Boosting Atomic Action Detection with Long-Short-Term Context This Repository contains the code on AVA of our ACM MM 2021 paper: LSTC: Boosting
Detecting Blurred Ground-based Sky/Cloud Images
Detecting Blurred Ground-based Sky/Cloud Images With the spirit of reproducible research, this repository contains all the codes required to produce t
Replication Package for AequeVox:Automated Fariness Testing for Speech Recognition Systems
AequeVox Replication Package for AequeVox:Automated Fariness Testing for Speech Recognition Systems README under development. Python Packages Required
Offcial implementation of "A Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Prediction, ICCV-2021".
HF2-VAD Offcial implementation of "A Hybrid Video Anomaly Detection Framework via Memory-Augmented Flow Reconstruction and Flow-Guided Frame Predictio
The PyTorch re-implement of a 3D CNN Tracker to extract coronary artery centerlines with state-of-the-art (SOTA) performance. (paper: 'Coronary artery centerline extraction in cardiac CT angiography using a CNN-based orientation classifier')
The PyTorch re-implement of a 3D CNN Tracker to extract coronary artery centerlines with state-of-the-art (SOTA) performance. (paper: 'Coronary artery centerline extraction in cardiac CT angiography using a CNN-based orientation classifier')
TumorInsight is a Brain Tumor Detection and Classification model built using RESNET50 architecture.
A Brain Tumor Detection and Classification Model built using RESNET50 architecture. The model is also deployed as a web application using Flask framework.
NudeNet: Neural Nets for Nudity Classification, Detection and selective censoring
NudeNet: Neural Nets for Nudity Classification, Detection and selective censoring Uncensored version of the following image can be found at https://i.
Wafer Fault Detection using MlOps Integration
Wafer Fault Detection using MlOps Integration This is an end to end machine learning project with MlOps integration for predicting the quality of wafe
1st ranked 'driver careless behavior detection' for AI Online Competition 2021, hosted by MSIT Korea.
2021AICompetition-03 본 repo 는 mAy-I Inc. 팀으로 참가한 2021 인공지능 온라인 경진대회 중 [이미지] 운전 사고 예방을 위한 운전자 부주의 행동 검출 모델] 태스크 수행을 위한 레포지토리입니다. mAy-I 는 과학기술정보통신부가 주최하
Photonix Photo Manager - a photo management application based on web technologies
A modern, web-based photo management server. Run it on your home server and it will let you find the right photo from your collection on any device. Smart filtering is made possible by object recognition, face recognition, location awareness, color analysis and other ML algorithms.
face property detection pytorch
This is the face property train code of project face-detection-project
Acoustic mosquito detection code with Bayesian Neural Networks
HumBugDB Acoustic mosquito detection with Bayesian Neural Networks. Extract audio or features from our large-scale dataset on Zenodo. This repository
Anomaly detection in multi-agent trajectories: Code for training, evaluation and the OpenAI highway simulation.
Anomaly Detection in Multi-Agent Trajectories for Automated Driving This is the official project page including the paper, code, simulation, baseline
ViSD4SA, a Vietnamese Span Detection for Aspect-based sentiment analysis dataset
UIT-ViSD4SA PACLIC 35 General Introduction This repository contains the data of the paper: Span Detection for Vietnamese Aspect-Based Sentiment Analys
End-to-End Speech Processing Toolkit
ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 1.9.0 ubuntu20/python3.9/pip ubuntu20/python3.8/p
CPC-big and k-means clustering for zero-resource speech processing
The CPC-big model and k-means checkpoints used in Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing.
YOLOv5 detection interface - PyQt5 implementation
所有代码已上传,直接clone后,运行yolo_win.py即可开启界面。 2021/9/29:加入置信度选择 界面是在ultralytics的yolov5基础上建立的,界面使用pyqt5实现,内容较简单,娱乐而已。 功能: 模型选择 本地文件选择(视频图片均可) 开关摄像头
SSPNet: Scale Selection Pyramid Network for Tiny Person Detection from UAV Images.
SSPNet: Scale Selection Pyramid Network for Tiny Person Detection from UAV Images (IEEE GRSL 2021) Code (based on mmdetection) for SSPNet: Scale Selec
Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition
Light-SERNet This is the Tensorflow 2.x implementation of our paper "Light-SERNet: A lightweight fully convolutional neural network for speech emotion
Method for facial emotion recognition compitition of Xunfei and Datawhale .
人脸情绪识别挑战赛-第3名-W03KFgNOc-源代码、模型以及说明文档 队名:W03KFgNOc 排名:3 正确率: 0.75564 队员:yyMoming,xkwang,RichardoMu。 比赛链接:人脸情绪识别挑战赛 文章地址:link emotion 该项目分别训练八个模型并生成csv文
Official pytorch implementation of Active Learning for deep object detection via probabilistic modeling (ICCV 2021)
Active Learning for Deep Object Detection via Probabilistic Modeling This repository is the official PyTorch implementation of Active Learning for Dee
Maix Speech AI lib, including ASR, chat, TTS etc.
Maix-Speech 中文 | English Brief Now only support Chinese, See 中文 Build Clone code by: git clone https://github.com/sipeed/Maix-Speech Compile x86x64 c
Propose a principled and practically effective framework for unsupervised accuracy estimation and error detection tasks with theoretical analysis and state-of-the-art performance.
Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles This project is for the paper: Detecting Errors and Estimating
Code for the ICME 2021 paper "Exploring Driving-Aware Salient Object Detection via Knowledge Transfer"
TSOD Code for the ICME 2021 paper "Exploring Driving-Aware Salient Object Detection via Knowledge Transfer" Usage For training, open train_test, run p
translate using your voice
speech-to-text-translator Usage translate using your voice description this project makes translating a word easy, all you have to do is speak and...
Code for the TIP 2021 Paper "Salient Object Detection with Purificatory Mechanism and Structural Similarity Loss"
PurNet Project for the TIP 2021 Paper "Salient Object Detection with Purificatory Mechanism and Structural Similarity Loss" Abstract Image-based salie
To lazy to read your homework ? Get it done with LOL
LOL To lazy to read your homework ? Get it done with LOL Needs python 3.x L:::::::::L OO:::::::::OO L:::::::::L L:::::::
PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop
PocketSphinx 5prealpha This is PocketSphinx, one of Carnegie Mellon University's open source large vocabulary, speaker-independent continuous speech r
An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come
IceVision is the first agnostic computer vision framework to offer a curated collection with hundreds of high-quality pre-trained models from torchvision, MMLabs, and soon Pytorch Image Models. It orchestrates the end-to-end deep learning workflow allowing to train networks with easy-to-use robust high-performance libraries such as Pytorch-Lightning and Fastai
Multiview 3D object detection on MultiviewC dataset through moft3d.
Multiview Orthographic Feature Transformation for 3D Object Detection Multiview 3D object detection on MultiviewC dataset through moft3d. Introduction
Rethinking Transformer-based Set Prediction for Object Detection
Rethinking Transformer-based Set Prediction for Object Detection Here are the code for the ICCV paper. The code is adapted from Detectron2 and AdelaiD
LeBenchmark: a reproducible framework for assessing SSL from speech
LeBenchmark: a reproducible framework for assessing SSL from speech
Pun Detection and Location
Pun Detection and Location “The Boating Store Had Its Best Sail Ever”: Pronunciation-attentive Contextualized Pun Recognition Yichao Zhou, Jyun-yu Jia
Multimodal Descriptions of Social Concepts: Automatic Modeling and Detection of (Highly Abstract) Social Concepts evoked by Art Images
MUSCO - Multimodal Descriptions of Social Concepts Automatic Modeling of (Highly Abstract) Social Concepts evoked by Art Images This project aims to i
Face Library is an open source package for accurate and real-time face detection and recognition
Face Library Face Library is an open source package for accurate and real-time face detection and recognition. The package is built over OpenCV and us
OpenFace – a state-of-the art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.
OpenFace 2.2.0: a facial behavior analysis toolkit Over the past few years, there has been an increased interest in automatic facial behavior analysis
Python and OpenCV-based scene cut/transition detection program & library.
Video Scene Cut Detection and Analysis Tool Latest Release: v0.5.6.1 (October 11, 2021) Main Webpage: py.scenedetect.com Documentation: manual.scenede
AutoSub is a CLI application to generate subtitle files (.srt, .vtt, and .txt transcript) for any video file using Mozilla DeepSpeech.
AutoSub About Motivation Installation Docker How-to example How it works TO-DO Contributing References About AutoSub is a CLI application to generate
Face Mask Detection on Image and Video using tensorflow and keras
Face-Mask-Detection Face Mask Detection on Image and Video using tensorflow and keras Train Neural Network on face-mask dataset using tensorflow and k
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)
Code for Efficient Visual Pretraining with Contrastive Detection
Code for DetCon This repository contains code for the ICCV 2021 paper "Efficient Visual Pretraining with Contrastive Detection" by Olivier J. Hénaff,
Self-Supervised Speech Pre-training and Representation Learning Toolkit.
What's New Sep 2021: We host a challenge in AAAI workshop: The 2nd Self-supervised Learning for Audio and Speech Processing! See SUPERB official site
UniSpeech - Large Scale Self-Supervised Learning for Speech
UniSpeech The family of UniSpeech: UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR UniSpeech-
Exploring the link between uncertainty estimates obtained via "exact" Bayesian inference and out-of-distribution (OOD) detection.
Uncertainty-based OOD detection Exploring the link between uncertainty estimates obtained by "exact" Bayesian inference and out-of-distribution (OOD)
FG-transformer-TTS Fine-grained style control in transformer-based text-to-speech synthesis
LST-TTS Official implementation for the paper Fine-grained style control in transformer-based text-to-speech synthesis. Submitted to ICASSP 2022. Audi
Code for the preprint "Well-classified Examples are Underestimated in Classification with Deep Neural Networks"
This is a repository for the paper of "Well-classified Examples are Underestimated in Classification with Deep Neural Networks" The implementation and
This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022
CPC_DeepCluster This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEEC
ByteTrack: Multi-Object Tracking by Associating Every Detection Box
ByteTrack ByteTrack is a simple, fast and strong multi-object tracker. ByteTrack: Multi-Object Tracking by Associating Every Detection Box Yifu Zhang,
Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection, AAAI 2021.
Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection This repository is an official implementation of the AAAI 2021 paper Co-mi
Morphable Detector for Object Detection on Demand
Morphable Detector for Object Detection on Demand (ICCV 2021) PyTorch implementation of the paper Morphable Detector for Object Detection on Demand. I
Code for the IJCAI 2021 paper "Structure Guided Lane Detection"
SGNet Project for the IJCAI 2021 paper "Structure Guided Lane Detection" Abstract Recently, lane detection has made great progress with the rapid deve
Leon is an open-source personal assistant who can live on your server.
Leon Your open-source personal assistant. Website :: Documentation :: Roadmap :: Contributing :: Story 👋 Introduction Leon is an open-source personal
Python SDK for working with Voicegain Speech-to-Text
Voicegain Speech-to-Text Python SDK Python SDK for the Voicegain Speech-to-Text API. This API allows for large vocabulary speech-to-text transcription
A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021
Chimera: Learning Shared Semantic Space for Speech-to-Text Translation This is a Pytorch implementation for the "Chimera" paper Learning Shared Semant
African language Speech Recognition - Speech-to-Text
Swahili-Speech-To-Text Table of Contents Swahili-Speech-To-Text Overview Scenario Approach Project Structure data: models: notebooks: scripts tests: l
Web service for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation based on OpenFace 2.0
OpenGaze: Web Service for OpenFace Facial Behaviour Analysis Toolkit Overview OpenFace is a fantastic tool intended for computer vision and machine le
A synthetic texture-invariant dataset for object detection of UAVs
A synthetic dataset for object detection of UAVs This repository contains a synthetic datasets accompanying the paper Sim2Air - Synthetic aerial datas
Replication package for the manuscript "Using Personality Detection Tools for Software Engineering Research: How Far Can We Go?" submitted to TOSEM
tosem2021-personality-rep-package Replication package for the manuscript "Using Personality Detection Tools for Software Engineering Research: How Far
A brand new hub for Scene Graph Generation methods based on MMdetection (2021). The pipeline of from detection, scene graph generation to downstream tasks (e.g., image cpationing) is supported. Pytorch version implementation of HetH (ECCV 2020) and TopicSG (ICCV 2021) is included.
MMSceneGraph Introduction MMSceneneGraph is an open source code hub for scene graph generation as well as supporting downstream tasks based on the sce
[ICCV 2021 Oral] Deep Evidential Action Recognition
DEAR (Deep Evidential Action Recognition) Project | Paper & Supp Wentao Bao, Qi Yu, Yu Kong International Conference on Computer Vision (ICCV Oral), 2
GAN-generated image detection based on CNNs
GAN-image-detection This repository contains a GAN-generated image detector developed to distinguish real images from synthetic ones. The detector is
A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.
DeepFilterNet A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering. libDF contains Rust code used for dat
Code for the paper SphereRPN: Learning Spheres for High-Quality Region Proposals on 3D Point Clouds Object Detection, ICIP 2021.
SphereRPN Code for the paper SphereRPN: Learning Spheres for High-Quality Region Proposals on 3D Point Clouds Object Detection, ICIP 2021. Authors: Th
Code for ACM MM2021 paper "Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection"
CTDNet The PyTorch code for ACM MM2021 paper "Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection" Requirements Python 3.6
FAST-RIR: FAST NEURAL DIFFUSE ROOM IMPULSE RESPONSE GENERATOR
This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
Azure Neural Speech Service TTS
Written in Python using the Azure Speech SDK. App.py provides an easy way to create an Text-To-Speech request to Azure Speech and download the wav file.