The codebase for Data-driven general-purpose voice activity detection.

Heinrich Dinkel

Last update: Nov 27, 2022

Overview

Data driven GPVAD

Repository for the work in TASLP 2021 Voice activity detection in the wild: A data-driven approach using teacher-student training.

Sample predictions against other methods

Noise robustness

Results

Our best model trained on the SRE (V3) dataset obtains the following results:

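| Dataset      | Precision | Recall |     F1 |   AUC |   FER | Event-F1 |
|:-------------|----------:|-------:|-------:|------:|------:|---------:|
| aurora_clean |    96.844 | 95.102 |  95.93 | 98.66 |  3.06 |     74.8 |
| aurora_noisy |    90.435 | 92.871 | 91.544 | 97.63 |  6.68 |    54.45 |
| dcase18      |    89.202 | 88.362 | 88.717 |  95.2 | 10.82 |    57.85 |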
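
For readers who want to score their own predictions at the frame level, the following is a minimal sketch of how precision, recall, F1 and a frame error rate could be computed from binary frame labels. It assumes that FER stands for the percentage of misclassified frames and it scores only the speech class; the function name `frame_metrics` is hypothetical, and the averaging used in the paper may differ. AUC and Event-F1 are not covered.

```python
def frame_metrics(reference, prediction):
    """Return precision, recall, F1 and frame error rate, all in percent.

    Assumption: both inputs are equal-length sequences of 0/1 frame labels,
    where 1 marks speech. Only the speech class is scored.
    """
    if len(reference) != len(prediction):
        raise ValueError("reference and prediction must have the same length")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")

    pairs = list(zip(reference, prediction))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # speech found
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false alarm
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # missed speech

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fer = (fp + fn) / len(pairs)  # fraction of wrongly labelled frames

    return {
        "precision": 100 * precision,
        "recall": 100 * recall,
        "f1": 100 * f1,
        "fer": 100 * fer,
    }


# Toy example: eight frames, one false alarm and one missed frame.
reference = [0, 0, 1, 1, 1, 1, 0, 0]
prediction = [0, 1, 1, 1, 1, 0, 0, 0]
scores = frame_metrics(reference, prediction)
print(scores)
```

Frame-level scores of this kind only count per-frame agreement; Event-F1 additionally depends on segment boundaries, which is one reason the two columns can diverge.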

Usage

We provide most of our pretrained models in this repository, including:

Both teachers (T_1, T_2)
Unbalanced audioset pretrained model
Voxceleb 2 pretrained model
Our best submission (SRE V3 trained)

To download and run evaluation just do:

git clone https://github.com/RicherMans/Datadriven-VAD
cd Datadriven-VAD
pip3 install -r requirements.txt
python3 forward.py -w example/example.wav

Running this will print:

|   index | event_label   |   onset |   offset | filename            |
|--------:|:--------------|--------:|---------:|:--------------------|
|       0 | Speech        |    0.28 |     0.94 | example/example.wav |
|       1 | Speech        |    1.04 |     2.22 | example/example.wav |

Predicting voice activity

We support single file and filelist-batching in our script. Obtaining VAD predictions is easy:

python3 forward.py -w example/example.wav

Or, to process files in batches, first prepare a filelist:

find . -type f -name '*.wav' > wavlist.txt

And then run:

python3 forward.py -l wavlist.txt
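
On systems without `find`, a filelist can also be produced from Python. The sketch below mirrors the shell command by writing one `.wav` path per line; the helper name `write_wavlist` is hypothetical, and the assumption that the list needs no header line comes only from the command above.

```python
import tempfile
from pathlib import Path


def write_wavlist(root, output):
    """Recursively collect .wav files under root and write one path per line."""
    root = Path(root)
    wavs = sorted(str(p) for p in root.rglob("*.wav") if p.is_file())
    Path(output).write_text("".join(f"{w}\n" for w in wavs))
    return wavs


# Self-contained demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sub").mkdir()
    (base / "a.wav").touch()
    (base / "sub" / "b.wav").touch()
    (base / "notes.txt").touch()  # ignored, not a .wav file

    listed = write_wavlist(base, base / "wavlist.txt")
    written = (base / "wavlist.txt").read_text().splitlines()
    print(written)
```

In a real project one would call `write_wavlist(".", "wavlist.txt")` from the directory holding the audio and pass the resulting file to `-l`.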

Extra parameters

-model selects the pretrained model. Can be one of t1, t2, v2, a2, a2_v2, sre. Refer to the paper for each respective model. The default is sre.
-soft outputs the raw probabilities instead of human-readable timestamps.
-hard outputs the post-processed 0-1 flags indicating speech instead of human-readable timestamps. Please note this differs from the paper, which thresholded the soft probabilities without post-processing.
-th adjusts the threshold. If a single threshold is passed (e.g., -th 0.5), simple binarization is used. Otherwise the default double threshold, -th 0.5 0.1, is used.
-o outputs the results into a new folder.
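
To illustrate the difference between the two thresholding modes, here is a minimal sketch of single-threshold binarization next to a hysteresis-style double threshold, as commonly used in sound event detection. It is not the repository's implementation: the function names are hypothetical, the reading of `-th 0.5 0.1` as (upper, lower) is an assumption, and details such as strict versus non-strict comparison or extra smoothing may differ.

```python
def binarize(probs, threshold=0.5):
    """Mark every frame whose probability exceeds a single threshold."""
    return [1 if p > threshold else 0 for p in probs]


def double_threshold(probs, high=0.5, low=0.1):
    """Keep each contiguous run above `low` only if it also exceeds `high`.

    Assumption: `high` seeds a speech region and `low` extends it to the
    neighbouring frames, which is the usual hysteresis formulation.
    """
    flags = [0] * len(probs)
    i = 0
    while i < len(probs):
        if probs[i] <= low:
            i += 1
            continue
        # Find the end of the run of frames that stay above the low threshold.
        j = i
        while j < len(probs) and probs[j] > low:
            j += 1
        # Accept the whole run if at least one frame passes the high threshold.
        if any(p > high for p in probs[i:j]):
            for k in range(i, j):
                flags[k] = 1
        i = j
    return flags


probs = [0.05, 0.2, 0.6, 0.3, 0.05, 0.2, 0.3, 0.05]
single = binarize(probs, 0.5)
double = double_threshold(probs, high=0.5, low=0.1)
print(single)
print(double)
```

In this toy input the single threshold keeps only the one frame above 0.5, whereas the double threshold also keeps its neighbours above 0.1 and still rejects the second, weaker run.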

Training from scratch

If you intend to rerun our work, prepare some data and extract log-Mel spectrogram features. Say, you have downloaded the balanced subset of AudioSet and stored all files in a folder data/balanced/. Then:

cd data;
mkdir hdf5 csv_labels;
find balanced -type f > wavs.txt;
python3 extract_feature.py wavs.txt -o hdf5/balanced.h5
h5ls -r hdf5/balanced.h5 | awk -F[/' '] 'BEGIN{print "filename","hdf5path"}NR>1{print $2,"hdf5/balanced.h5"}'> csv_labels/balanced.csv

The input for our label prediction script is a csv file with exactly two columns, filename and hdf5path.

An example csv_labels/balanced.csv would be:

filename hdf5path
--PJHxphWEs_30.000.wav hdf5/balanced.h5
--ZhevVpy1s_50.000.wav hdf5/balanced.h5
--aE2O5G5WE_0.000.wav hdf5/balanced.h5
--aO5cdqSAg_30.000.wav hdf5/balanced.h5
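
The two-column layout can be written and checked with a few lines of Python. This is a sketch under the assumption that the file is whitespace-separated with a `filename hdf5path` header, as in the example above; the helper names are hypothetical, and filenames containing spaces would not survive this format.

```python
import io


def write_label_csv(filenames, hdf5path, stream):
    """Write the header plus one 'filename hdf5path' row per file."""
    stream.write("filename hdf5path\n")
    for name in filenames:
        stream.write(f"{name} {hdf5path}\n")


def read_label_csv(stream):
    """Parse the file back into (filename, hdf5path) tuples, checking the header."""
    header = stream.readline().split()
    if header != ["filename", "hdf5path"]:
        raise ValueError(f"unexpected header: {header}")
    rows = []
    for line in stream:
        fields = line.split()  # tolerates trailing whitespace
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected two columns, got: {line!r}")
        rows.append((fields[0], fields[1]))
    return rows


# Round trip through an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_label_csv(
    ["--PJHxphWEs_30.000.wav", "--ZhevVpy1s_50.000.wav"],
    "hdf5/balanced.h5",
    buffer,
)
buffer.seek(0)
rows = read_label_csv(buffer)
print(rows)
```

Validating the header early is a cheap way to catch the case where a binary HDF5 file is passed where the csv is expected.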

After feature extraction, proceed to predict labels:

mkdir -p softlabels/{hdf5,csv};
python3 prepare_labels.py --pre ../pretrained_models/teacher1/model.pth csv_labels/balanced.csv softlabels/hdf5/balanced.h5 softlabels/csv/balanced.csv

Lastly, just train:

cd ../; #Go to project root
# Change the config according to your input data
python3 run.py train configs/example.yaml

Citation

If you use this work, please cite it in your publications.

@article{Dinkel2021,
author = {Dinkel, Heinrich and Wang, Shuai and Xu, Xuenan and Wu, Mengyue and Yu, Kai},
doi = {10.1109/TASLP.2021.3073596},
issn = {2329-9290},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
pages = {1542--1555},
title = {{Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training}},
url = {https://ieeexplore.ieee.org/document/9405474/},
volume = {29},
year = {2021}
}

and

@inproceedings{Dinkel2020,
  author={Heinrich Dinkel and Yefei Chen and Mengyue Wu and Kai Yu},
  title={{Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3665--3669},
  doi={10.21437/Interspeech.2020-0995},
  url={http://dx.doi.org/10.21437/Interspeech.2020-0995}
}

Comments

Training from scratch [Data format query]

Hi, thank you for your wonderful work with GPVAD. I am looking at training the student model from scratch on my own dataset(s). My dataset consists of audio signals (wav) with tagged regions within each audio sample. For example, [{'type': 'BACKGROUND NOISE', 'time-range': [3.041, 3.169]}, {'type': 'SPEECH', 'time-range': [5.208, 5.544]}, {'type': 'BACKGROUND NOISE', 'time-range': [4.339, 5.069]}] is one tagged audio. Can your data pipeline support training on such data formats? If not, what do you suggest I do to work around this? Thanks a lot!

opened by sanchit-ahuja 7
Could you provide the evaluation set?

Hello! I noticed the evaluate function in run.py, which is shown below. I don't know the format of labels.tsv. Could you provide the evaluation set? If not, could you share a screenshot of labels.tsv? By the way, is data.h5 the same as the training set, which is extracted by extract_feature.py? Thanks!

opened by wcangyu 6
Forwarding “example.wav” does not give the same result as the README

Hello, I have pulled the code with git and installed the requirements with pip. When I run "python forward.py -w ./example/example.wav", the result below is different from the README. Is there any problem? Thank you very much.

opened by wcangyu 4
'filename' also needed in data/softlabels/hdf5/balanced.h5 ?

While trying to train the model, I ran into a UnicodeDecodeError.

  File "run.py", line 97, in train
    data_df = pd.read_csv(config_parameters['data'], sep='\s+')
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 782, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

I converted data/softlabels/hdf5/balanced.h5 to utf8 and it looks like this:

8948 4446 0d0a 1a0a 0000 0000 0008 0800 0400 1000 0000 0000 0000 0000 0000 0000 ffff ffff ffff ffff ccda 4b01 0000 0000 ffff ffff ffff ffff 0000 0000 0000 0000 6000 0000 0000 0000 0100 0000 0000 0000 8800 0000 0000 0000 a802 0000 0000 0000 0100 0100 0100 0000 1800 0000 0000 0000 1100 1000 0000 0000 8800 0000 0000 0000 ......

A new problem relating to 'filename' then occurs. Does this line of code in run.py mean that data_df also needs a 'filename' column? merged = data_df.merge(label_df, on='filename')

opened by AjianIronSide 4
Using the SRE model for other languages

Hi, Thank you for your work on Datadriven-GPVAD. I was able to set it up and do some inferencing for my data quickly. I wanted to know if I can use your model SRE (or any) for languages other than English. I wanted to use your model for Hindi. Or would you suggest training your model from scratch for other languages? Also, I wanted to know if you would recommend mixing the data points for both English and Hindi and trying to train a language-agnostic model using your work. Thanks a lot!

opened by sanchit-ahuja 2
Something wrong when I tried to extract features

Hi,

Something went wrong when I tried to extract features with "python extract_feature.py wavs.txt -o hdf5/balanced.h5"

Traceback (most recent call last):
  File "extract_feature.py", line 86, in <module>
    DF[ARGS.col].unique(),
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'filename'

Is the pandas version wrong or something else? Plz help. Thx

opened by AjianIronSide 2
Add support to `.mp3` files in `forward.py` script.

Add support to .mp3 files in forward.py script. extract_feature can now process ['.mp3', '.wav']. If file extension not supported, error is raised. Removed versions in requirements.txt due to dependencies problems.

opened by Diego-II 1
Bump pyyaml from 5.3.1 to 5.4
Bumps pyyaml from 5.3.1 to 5.4.

Changelog

Sourced from pyyaml's changelog.

5.4 (2021-01-19)

yaml/pyyaml#407 -- Build modernization, remove distutils, fix metadata, build wheels, CI to GHA

yaml/pyyaml#472 -- Fix for CVE-2020-14343, moves arbitrary python tags to UnsafeLoader

yaml/pyyaml#441 -- Fix memory leak in implicit resolver setup

yaml/pyyaml#392 -- Fix py2 copy support for timezone objects

yaml/pyyaml#378 -- Fix compatibility with Jython

Commits

58d0cb7 5.4 release

a60f7a1 Fix compatibility with Jython

ee98abd Run CI on PR base branch changes

ddf2033 constructor.timezone: _copy & deepcopy

fc914d5 Avoid repeatedly appending to yaml_implicit_resolvers

a001f27 Fix for CVE-2020-14343

fe15062 Add 3.9 to appveyor file for completeness sake

1e1c7fb Add a newline character to end of pyproject.toml

0b6b7d6 Start sentences and phrases for capital letters

c976915 Shell code improvements

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0
Bump numpy from 1.16.4 to 1.22.0
Bumps numpy from 1.16.4 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.

A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.

NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.

New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.

A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)

Commits

4adc87d Merge pull request #20685 from charris/prepare-for-1.22.0-release

fd66547 REL: Prepare for the NumPy 1.22.0 release.

125304b wip

c283859 Merge pull request #20682 from charris/backport-20416

5399c03 Merge pull request #20681 from charris/backport-20954

f9c45f8 Merge pull request #20680 from charris/backport-20663

794b36f Update armccompiler.py

d93b14e Update test_public_api.py

7662c07 Update init.py

311ab52 Update armccompiler.py

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
assert len(cv_df) > 0, "Fraction a bit too large?"

Thanks for your code. I'm trying to train from scratch with teacher1, but I hit this error when I run 'run.py'. How can I solve this problem? Thank you in advance!

(env_gpvad)my_account:~/Datadriven-GPVAD$ python run.py train configs/example.yaml
[2022-01-24 20:46:21] Storing files in experiments/CRNN/2022-01-24_20-46-01_400e8c547d0b11ec9397a0423f3aed9a
[2022-01-24 20:46:21] batch_size: 64
[2022-01-24 20:46:21] data: data/csv_labels/balanced.csv
[2022-01-24 20:46:21] data_args:
[2022-01-24 20:46:21] mode: null
[2022-01-24 20:46:21] early_stop: 15
[2022-01-24 20:46:21] epochs: 15
[2022-01-24 20:46:21] itercv: 10000
[2022-01-24 20:46:21] label: data/softlabels/csv/balanced.csv
[2022-01-24 20:46:21] label_type: soft
[2022-01-24 20:46:21] loss: FrameBCELoss
[2022-01-24 20:46:21] model: CRNN
[2022-01-24 20:46:21] model_args: {}
[2022-01-24 20:46:21] num_workers: 8
[2022-01-24 20:46:21] optimizer: AdamW
[2022-01-24 20:46:21] optimizer_args:
[2022-01-24 20:46:21] lr: 0.001
[2022-01-24 20:46:21] outputpath: experiments/
[2022-01-24 20:46:21] postprocessing: double
[2022-01-24 20:46:21] save: best
[2022-01-24 20:46:21] scheduler_args:
[2022-01-24 20:46:21] factor: 0.1
[2022-01-24 20:46:21] patience: 10
[2022-01-24 20:46:21] threshold: null
[2022-01-24 20:46:21] transforms:
[2022-01-24 20:46:21] - timemask
[2022-01-24 20:46:21] - freqmask
[2022-01-24 20:46:21]
[2022-01-24 20:46:21] Running on device cpu
[2022-01-24 20:46:21] train_df
[2022-01-24 20:46:21] cv_df
[2022-01-24 20:46:21] Transforms:
[2022-01-24 20:46:21] Sequential(
[2022-01-24 20:46:21] (0): TimeMask()
[2022-01-24 20:46:21] (1): FreqMask()
[2022-01-24 20:46:21] )
Traceback (most recent call last):
  File "run.py", line 639, in <module>
    fire.Fire(Runner)
  File "/home/t3qadmin/anaconda3/envs/env_gpvad/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/t3qadmin/anaconda3/envs/env_gpvad/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/t3qadmin/anaconda3/envs/env_gpvad/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "run.py", line 118, in train
    assert len(cv_df) > 0, "Fraction a bit too large?"
AssertionError: Fraction a bit too large?

opened by wonyeongdeok 1
How to perform fine-tuning

Hi,

Do you have any ideas about fine-tuning a pretrained model (such as sre) to a more complicated scenario using a small related dataset? I tried using the teacher model to label the new dataset and training for a few epochs with a very small learning rate. However, the performance drops drastically. Quite sad.

opened by AjianIronSide 7
The error about “python3 extract_features.py wavs.txt -o hdf5/balanced.h5”

Hi,

I have some issues with feature extraction.

1. In the file "configs/example.yaml":

data: data/softlabels/hdf5/balanced.h5 label: data/softlabels/csv/balanced.csv -> csv_labels/balanced.csv

2. When I run the "python3 extract_features.py" command, there is an error!

prepare_labels.py can't find "encoders/balanced.pth". Should it be "labelencoders/vad.path"? But what about when using the model 'gpvb'?

Could you give me advice about it?

MODELS = {
    'crnn': {
        'model': crnn,
        'encoder': torch.load('encoders/balanced.pth'),
        'outputdim': 527,
    },
    'gpvb': {
        'model': crnn,
        'encoder': torch.load('../labelencoders/vad.pth'),  # ('encoders/balanced_binary.pth'),
        'outputdim': 2,
    }
}

thanks for your response!

opened by minchaoyue 8
