The codebase for Data-driven general-purpose voice activity detection.

Overview

Data driven GPVAD

Repository for the TASLP 2021 paper "Voice activity detection in the wild: A data-driven approach using teacher-student training".

Framework

[Figure: framework overview]

Sample predictions against other methods

[Figures: Samples_1, Samples_2, Samples_3, Samples_4]

Noise robustness

[Figures: Speech, Background, Speech]

Results

Our best model trained on the SRE (V3) dataset obtains the following results:

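| Dataset      | Precision | Recall |     F1 |   AUC |   FER | Event-F1 |
|:-------------|----------:|-------:|-------:|------:|------:|---------:|
| aurora_clean |    96.844 | 95.102 |  95.93 | 98.66 |  3.06 |     74.8 |
| aurora_noisy |    90.435 | 92.871 | 91.544 | 97.63 |  6.68 |    54.45 |
| dcase18      |    89.202 | 88.362 | 88.717 |  95.2 | 10.82 |    57.85 |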

Usage

We provide most of our pretrained models in this repository, including:

  1. Both teachers (T_1, T_2)
  2. Unbalanced AudioSet pretrained model
  3. VoxCeleb 2 pretrained model
  4. Our best submission (SRE V3 trained)

To download the repository and run it on the example file:

git clone https://github.com/RicherMans/Datadriven-VAD
cd Datadriven-VAD
pip3 install -r requirements.txt
python3 forward.py -w example/example.wav

Running this will print:

|   index | event_label   |   onset |   offset | filename            |
|--------:|:--------------|--------:|---------:|:--------------------|
|       0 | Speech        |    0.28 |     0.94 | example/example.wav |
|       1 | Speech        |    1.04 |     2.22 | example/example.wav |
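
The printed segments can be post-processed further. As a minimal sketch that is not part of the repository, the snippet below sums the speech duration of the two segments shown above. The names `segments` and `total_speech` are hypothetical and chosen only for illustration.

```python
# Segments copied from the example output above: (onset, offset) in seconds.
segments = [(0.28, 0.94), (1.04, 2.22)]


def total_speech(segments):
    """Return the summed duration of all (onset, offset) pairs in seconds."""
    return sum(offset - onset for onset, offset in segments)


print(f"{total_speech(segments):.2f} s of speech")  # prints "1.84 s of speech"
```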

Predicting voice activity

We support single file and filelist-batching in our script. Obtaining VAD predictions is easy:

python3 forward.py -w example/example.wav

Or, to process several files in a batch, first prepare a filelist:

find . -type f -name '*.wav' > wavlist.txt

And then run:

python3 forward.py -l wavlist.txt
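
On systems without `find`, the filelist can also be built in Python. The following is a minimal sketch using only the standard library; the helper name `write_wavlist` is an assumption and does not exist in the repository.

```python
from pathlib import Path


def write_wavlist(root, out_file="wavlist.txt"):
    """Write every .wav path below `root` to `out_file`, one path per line."""
    wavs = sorted(str(p) for p in Path(root).rglob("*.wav") if p.is_file())
    Path(out_file).write_text("\n".join(wavs) + ("\n" if wavs else ""))
    return wavs
```

Calling `write_wavlist(".")` from the directory that holds the audio should produce a list comparable to the `find` command above, although the exact path prefixes may differ.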

Extra parameters

  • -model selects the pretrained model. Can be one of t1, t2, v2, a2, a2_v2, sre. Refer to the paper for each respective model. The default is sre.
  • -soft outputs the raw probabilities instead of human-readable timestamps.
  • -hard outputs the post-processed 0-1 flags indicating speech instead of human-readable timestamps. Please note that this differs from the paper, which thresholded the soft probabilities without post-processing.
  • -th adjusts the threshold. If a single threshold is passed (e.g., -th 0.5), simple binarization is used. Otherwise the default double threshold is used, equivalent to -th 0.5 0.1.
  • -o outputs the results into a new folder.
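
To illustrate the difference between a single and a double threshold, here is a minimal sketch of hysteresis-style double thresholding. It is an assumption about the general technique, not the code used in forward.py, whose post-processing may differ (for example in how ties or smoothing are handled). The function name `double_threshold` is hypothetical; the defaults mirror `-th 0.5 0.1`.

```python
def double_threshold(probs, high=0.5, low=0.1):
    """Return 0/1 flags per frame.

    A run of consecutive frames with probability >= low is marked as
    speech only if at least one frame in that run reaches >= high.
    """
    probs = list(probs)
    flags = [0] * len(probs)
    start = None  # start index of the current run of frames >= low
    # The -inf sentinel closes a run that extends to the last frame.
    for i, p in enumerate(probs + [float("-inf")]):
        if p >= low:
            if start is None:
                start = i
        elif start is not None:
            if max(probs[start:i]) >= high:
                flags[start:i] = [1] * (i - start)
            start = None
    return flags


probs = [0.05, 0.2, 0.6, 0.3, 0.05, 0.2, 0.3, 0.05]
print([int(p >= 0.5) for p in probs])  # single threshold: [0, 0, 1, 0, 0, 0, 0, 0]
print(double_threshold(probs))         # double threshold: [0, 1, 1, 1, 0, 0, 0, 0]
```

In this sketch the high threshold decides whether a segment exists at all, while the low threshold decides how far it extends, which is why the second run (peaking at 0.3) is discarded.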

Training from scratch

If you intend to rerun our work, prepare some data and extract log-Mel spectrogram features. Say, you have downloaded the balanced subset of AudioSet and stored all files in a folder data/balanced/. Then:

cd data;
mkdir hdf5 csv_labels;
find balanced -type f > wavs.txt;
python3 extract_feature.py wavs.txt -o hdf5/balanced.h5
h5ls -r hdf5/balanced.h5 | awk -F[/' '] 'BEGIN{print "filename","hdf5path"}NR>1{print $2,"hdf5/balanced.h5"}'> csv_labels/balanced.csv

The input for our label prediction script is a csv file with exactly two columns, filename and hdf5path.

An example csv_labels/balanced.csv would be:

filename hdf5path
--PJHxphWEs_30.000.wav hdf5/balanced.h5
--ZhevVpy1s_50.000.wav hdf5/balanced.h5
--aE2O5G5WE_0.000.wav hdf5/balanced.h5
--aO5cdqSAg_30.000.wav hdf5/balanced.h5
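
If `h5ls` and `awk` are not available, a file in the same two-column layout can be written from Python. This is a minimal sketch under the assumption that the filenames are already known and contain no whitespace; `write_label_csv` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path


def write_label_csv(filenames, hdf5path, out_csv):
    """Write a whitespace-separated table with columns filename and hdf5path."""
    lines = ["filename hdf5path"]
    lines += [f"{name} {hdf5path}" for name in filenames]
    Path(out_csv).write_text("\n".join(lines) + "\n")
```

For the example above, the call would be `write_label_csv(names, "hdf5/balanced.h5", "csv_labels/balanced.csv")`, where `names` holds the keys stored in the HDF5 file.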

After feature extraction, proceed to predict labels:

mkdir -p softlabels/{hdf5,csv};
python3 prepare_labels.py --pre ../pretrained_models/teacher1/model.pth csv_labels/balanced.csv softlabels/hdf5/balanced.h5 softlabels/csv/balanced.csv

Lastly, just train:

cd ../; # Go to project root
# Change the config according to your input data
python3 run.py train configs/example.yaml

Citation

If you're using this work, please cite it in your publications.

@article{Dinkel2021,
author = {Dinkel, Heinrich and Wang, Shuai and Xu, Xuenan and Wu, Mengyue and Yu, Kai},
doi = {10.1109/TASLP.2021.3073596},
issn = {2329-9290},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
pages = {1542--1555},
title = {{Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training}},
url = {https://ieeexplore.ieee.org/document/9405474/},
volume = {29},
year = {2021}
}

and

@inproceedings{Dinkel2020,
  author={Heinrich Dinkel and Yefei Chen and Mengyue Wu and Kai Yu},
  title={{Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3665--3669},
  doi={10.21437/Interspeech.2020-0995},
  url={http://dx.doi.org/10.21437/Interspeech.2020-0995}
}
Comments
  • Training from scratch [Data format query]

    Hi, Thank you for your wonderful work with GPVAD. I am looking at training the student model from scratch for my dataset(s). My dataset is in the form of audio_signal (wav) and the region has been tagged within the audio sample. For example: [{'type': 'BACKGROUND NOISE', 'time-range': [3.041, 3.169]}, {'type': 'SPEECH', 'time-range': [5.208, 5.544]}, {'type': 'BACKGROUND NOISE', 'time-range': [4.339, 5.069]}] is a tagged audio. Can your data pipeline support training for such data formats? If not, what do you suggest I should do to find a work around this? Thanks a lot!

    opened by sanchit-ahuja 7
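
One possible preprocessing step for annotations of this kind is to convert the tagged regions into per-frame 0/1 speech labels. The sketch below only illustrates that conversion and makes no claim about what the repository's training pipeline accepts. The function `regions_to_frames`, the 0.02 s hop and the 6.0 s duration are all assumptions chosen for illustration.

```python
def regions_to_frames(regions, duration, hop=0.02, speech_type="SPEECH"):
    """Return one 0/1 label per frame of length `hop` seconds.

    A frame is labelled 1 if its start time lies inside a region whose
    'type' equals `speech_type`; all other frames are labelled 0.
    """
    n_frames = int(round(duration / hop))
    labels = [0] * n_frames
    for region in regions:
        if region["type"] != speech_type:
            continue
        onset, offset = region["time-range"]
        for i in range(n_frames):
            if onset <= i * hop < offset:
                labels[i] = 1
    return labels


# Annotation copied from the question above; the clip duration is assumed.
regions = [
    {"type": "BACKGROUND NOISE", "time-range": [3.041, 3.169]},
    {"type": "SPEECH", "time-range": [5.208, 5.544]},
    {"type": "BACKGROUND NOISE", "time-range": [4.339, 5.069]},
]
labels = regions_to_frames(regions, duration=6.0)
print(len(labels), sum(labels))
```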
  • Evaluation set could provide?

    Hello! I noticed the evaluate function in run.py, which is shown below. [screenshot-20211009-144321] I don't know the format of labels.tsv. Could you provide the evaluation set? If not, could you share a screenshot of labels.tsv? By the way, is data.h5 the same as the training set extracted by extract_feature.py? Thanks!

    opened by wcangyu 6
  • When forward “example.wav”, Can not get the same result as Readme

    Hello, I pulled the code and installed the requirements with pip. When I run "python forward.py -w ./example/example.wav", the result below differs from the README. Is there a problem? Thank you very much. [screenshot-20211008-160945]

    opened by wcangyu 4
  • 'filename' also needed in  data/softlabels/hdf5/balanced.h5 ?

    When I was trying to train the model, I ran into a new problem, a UnicodeDecodeError.

    File "run.py", line 97, in train
      data_df = pd.read_csv(config_parameters['data'], sep='\s+')
    File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
      return _read(filepath_or_buffer, kwds)
    File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
      parser = TextFileReader(filepath_or_buffer, **kwds)
    File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
      self._make_engine(self.engine)
    File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
      self._engine = CParserWrapper(self.f, **self.options)
    File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
      self._reader = parsers.TextReader(src, **kwds)
    File "pandas/_libs/parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
    File "pandas/_libs/parsers.pyx", line 782, in pandas._libs.parsers.TextReader._get_header
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

    I converted data/softlabels/hdf5/balanced.h5 to UTF-8, and it looks like this:

    8948 4446 0d0a 1a0a 0000 0000 0008 0800 0400 1000 0000 0000 0000 0000 0000 0000 ffff ffff ffff ffff ccda 4b01 0000 0000 ffff ffff ffff ffff 0000 0000 0000 0000 6000 0000 0000 0000 0100 0000 0000 0000 8800 0000 0000 0000 a802 0000 0000 0000 0100 0100 0100 0000 1800 0000 0000 0000 1100 1000 0000 0000 8800 0000 0000 0000 ......

    A new problem related to 'filename' then occurs. Does this line of code in run.py indicate that data_df also needs a 'filename' column?

    merged = data_df.merge(label_df, on='filename')

    opened by AjianIronSide 4
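
The byte 0x89 at position 0 and the leading bytes of the dump (89 48 44 46 0d 0a 1a 0a) match the HDF5 file signature, which suggests that an HDF5 file was passed where a text csv was expected. As a minimal sketch, a check like the following could catch this before calling `pd.read_csv`; the helper name `looks_like_hdf5` is hypothetical and not part of the repository.

```python
# The 8-byte signature found at the start of an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file starts with the 8-byte HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(8) == HDF5_SIGNATURE
```

If `looks_like_hdf5(config_parameters['data'])` returns True, the config entry most likely points at the feature file rather than at the csv that lists it.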
  • Using the SRE model for other languages

    Hi, Thank you for your work on Datadriven-GPVAD. I was able to set it up and do some inferencing for my data quickly. I wanted to know if I can use your model SRE (or any) for languages other than English. I wanted to use your model for Hindi. Or would you suggest training your model from scratch for other languages? Also, I wanted to know if you would recommend mixing the data points for both English and Hindi and trying to train a language-agnostic model using your work. Thanks a lot!

    opened by sanchit-ahuja 2
  • Something wrong when I tried to extract features

    Hi,

    Something went wrong when I tried to extract features with "python extract_feature.py wavs.txt -o hdf5/balanced.h5":

    Traceback (most recent call last):
      File "extract_feature.py", line 86, in <module>
        DF[ARGS.col].unique(),
      File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/core/frame.py", line 2927, in __getitem__
        indexer = self.columns.get_loc(key)
      File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
        return self._engine.get_loc(self._maybe_cast_indexer(key))
      File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 'filename'

    Is the pandas version wrong, or is it something else? Please help. Thanks.

    opened by AjianIronSide 2
  • Add support to `.mp3` files in `forward.py` script.

    Add support to .mp3 files in forward.py script. extract_feature can now process ['.mp3', '.wav']. If file extension not supported, error is raised. Removed versions in requirements.txt due to dependencies problems.

    opened by Diego-II 1
  • assert len(cv_df) > 0, "Fraction a bit too large?"

    Thanks for your code. I'm trying to train from scratch with teacher1, but I hit this error when I run 'run.py'. How can I solve this problem? Thank you in advance!

    (env_gpvad)my_account:~/Datadriven-GPVAD$ python run.py train configs/example.yaml
    [2022-01-24 20:46:21] Storing files in experiments/CRNN/2022-01-24_20-46-01_400e8c547d0b11ec9397a0423f3aed9a
    [2022-01-24 20:46:21] batch_size: 64
    [2022-01-24 20:46:21] data: data/csv_labels/balanced.csv
    [2022-01-24 20:46:21] data_args:
    [2022-01-24 20:46:21] mode: null
    [2022-01-24 20:46:21] early_stop: 15
    [2022-01-24 20:46:21] epochs: 15
    [2022-01-24 20:46:21] itercv: 10000
    [2022-01-24 20:46:21] label: data/softlabels/csv/balanced.csv
    [2022-01-24 20:46:21] label_type: soft
    [2022-01-24 20:46:21] loss: FrameBCELoss
    [2022-01-24 20:46:21] model: CRNN
    [2022-01-24 20:46:21] model_args: {}
    [2022-01-24 20:46:21] num_workers: 8
    [2022-01-24 20:46:21] optimizer: AdamW
    [2022-01-24 20:46:21] optimizer_args:
    [2022-01-24 20:46:21] lr: 0.001
    [2022-01-24 20:46:21] outputpath: experiments/
    [2022-01-24 20:46:21] postprocessing: double
    [2022-01-24 20:46:21] save: best
    [2022-01-24 20:46:21] scheduler_args:
    [2022-01-24 20:46:21] factor: 0.1
    [2022-01-24 20:46:21] patience: 10
    [2022-01-24 20:46:21] threshold: null
    [2022-01-24 20:46:21] transforms:
    [2022-01-24 20:46:21] - timemask
    [2022-01-24 20:46:21] - freqmask
    [2022-01-24 20:46:21]
    [2022-01-24 20:46:21] Running on device cpu
    [2022-01-24 20:46:21] train_df
    [2022-01-24 20:46:21] cv_df
    [2022-01-24 20:46:21] Transforms:
    [2022-01-24 20:46:21] Sequential(
    [2022-01-24 20:46:21] (0): TimeMask()
    [2022-01-24 20:46:21] (1): FreqMask()
    [2022-01-24 20:46:21] )
    Traceback (most recent call last):
      File "run.py", line 639, in <module>
        fire.Fire(Runner)
      File "/home/t3qadmin/anaconda3/envs/env_gpvad/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
        component_trace = _Fire(component, args, context, name)
      File "/home/t3qadmin/anaconda3/envs/env_gpvad/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
        component, remaining_args)
      File "/home/t3qadmin/anaconda3/envs/env_gpvad/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
        result = fn(*varargs, **kwargs)
      File "run.py", line 118, in train
        assert len(cv_df) > 0, "Fraction a bit too large?"
    AssertionError: Fraction a bit too large?

    opened by wonyeongdeok 1
  • About how to perform fine-tuning

    Hi,

    Do you have any idea about fine-tuning a pretrained model (such as sre) for a more complicated scenario using a small, related dataset? I tried using the teacher model to label the new dataset and then training for a few epochs with a very small learning rate. However, the performance drops drastically. Quite sad.

    opened by AjianIronSide 7
  • The error about “python3 extract_features.py wavs.txt -o hdf5/balanced.h5”

    Hi,

    I have some issues with feature extraction.

    1. In the file "configs/example.yaml":

    data: data/softlabels/hdf5/balanced.h5
    label: data/softlabels/csv/balanced.csv -> csv_labels/balanced.csv

    2. When I run the "python3 extract_features.py" command, there is an error!

    prepare_labels.py can't find "encoders/balanced.pth". Should it be "labelencoders/vad.pth"? But what about when the model 'gpvb' is used?

    Could you give me some advice about it?

    MODELS = {
        'crnn': {
            'model': crnn,
            'encoder': torch.load('encoders/balanced.pth'),
            'outputdim': 527,
        },
        'gpvb': {
            'model': crnn,
            'encoder': torch.load('../labelencoders/vad.pth'),  # ('encoders/balanced_binary.pth'),
            'outputdim': 2,
        }
    }

    thanks for your response!

    opened by minchaoyue 8
Owner
Heinrich Dinkel