This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.

Isen (Songyao Jiang)

Last update: Dec 8, 2022

Overview

Skeleton Aware Multi-modal Sign Language Recognition

By Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li and Yun Fu.

Smile Lab @ Northeastern University

This repo contains the official code of Skeleton Aware Multi-modal Sign Language Recognition (SAM-SLR) that ranked 1st in CVPR 2021 Challenge: Looking at People Large Scale Signer Independent Isolated Sign Language Recognition.

Our paper has been accepted to the CVPR21 Workshop. A preprint version is available on arXiv. Please cite our paper if you find this repo useful in your research.

News

[2021/04/10] Our workshop paper has been accepted. Citation info updated.

[2021/03/24] A preprint version of our paper is released here.

[2021/03/20] Our work has been verified and announced by the organizers as the 1st place winner of the challenge!

[2021/03/15] The code is released to the public on GitHub.

[2021/03/11] Our team (smilelab2021) ranked 1st in both tracks and here are the links to the leaderboards:

RGB / RGBD

Table of Contents

Data Preparation
Requirements and Docker Image
Pretrained Models
Usage
License
Citation
Reference

Data Preparation

Download AUTSL Dataset.

We processed the dataset into six modalities in total: skeleton, skeleton features, rgb frames, flow color, hha and flow depth.

Please put original train, val, test videos in data folder as

    data
    ├── train
    │   ├── signer0_sample1_color.mp4
    │   ├── signer0_sample1_depth.mp4
    │   ├── signer0_sample2_color.mp4
    │   ├── signer0_sample2_depth.mp4
    │   └── ...
    ├── val
    │   └── ...
    └── test
        └── ...
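
Before running the processing scripts, it can help to confirm that every color video has a matching depth video. The sketch below is not part of the repo: the helper name `find_unpaired` is hypothetical, and the `_color.mp4` / `_depth.mp4` suffixes are taken from the tree above.

```python
from pathlib import Path


def find_unpaired(split_dir):
    """Return names of color videos in split_dir that lack a depth video.

    Hypothetical helper, not part of the repo. It assumes the naming shown
    above: <signer>_<sample>_color.mp4 next to <signer>_<sample>_depth.mp4.
    """
    split_dir = Path(split_dir)
    missing = []
    for color in sorted(split_dir.glob("*_color.mp4")):
        depth = color.with_name(color.name.replace("_color.mp4", "_depth.mp4"))
        if not depth.exists():
            missing.append(color.name)
    return missing


if __name__ == "__main__":
    for split in ("train", "val", "test"):
        folder = Path("data") / split
        if folder.is_dir():
            print(split, "color videos without depth:", find_unpaired(folder))
```

An empty list for each split means the color and depth files are paired as expected.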

Follow data_processs/readme.md to process the data.
Use TPose/data_process to extract whole-body pose features.

Requirements and Docker Image

The code is written using Anaconda Python >= 3.6 and PyTorch 1.7 with OpenCV.

Detailed environment requirements can be found in requirement.txt in each code folder.

For convenience, we provide an NVIDIA Docker image to run our code.

Download Docker Image

Pretrained Models

We provide pretrained models for all modalities to reproduce our submitted results. Please download them and put them into the corresponding folders.

Download Pretrained Models

Usage

Reproducing the Results Submitted to CVPR21 Challenge

To test our pretrained models, please put them under the corresponding code folders and run the test code as instructed below. To ensemble the tested results and reproduce our final submission, please copy all the result .pkl files to ensemble/ and follow the instructions to ensemble our final outputs.

For step-by-step instructions, please see reproduce.md.

Skeleton Keypoints

The skeleton modality can be trained, finetuned and tested using the code in the SL-GCN/ folder. Please follow the instructions in SL-GCN/readme.md to prepare the skeleton data in four streams (joint, bone, joint_motion, bone_motion).

Basic usage:

python main.py --config /path/to/config/file

To train, finetune and test our models, please change the config path to the corresponding config file. Detailed instructions can be found in SL-GCN/readme.md.
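
For orientation, bone and motion streams in multi-stream skeleton models are commonly derived from joint coordinates as differences: a bone is a joint minus its parent joint, and motion is the change between consecutive frames. The sketch below illustrates that general idea in plain Python. It is an assumption about the technique, not the repo's implementation, and the `PARENTS` table and function names are hypothetical; the repo's own data generation is described in SL-GCN/readme.md.

```python
# Toy skeleton: joint index -> parent joint index (hypothetical, 3 joints).
# The root joint is its own parent, so its bone vector is zero.
PARENTS = {0: 0, 1: 0, 2: 1}


def to_bone(joints):
    """Bone stream: each joint minus its parent joint, per frame."""
    return [
        [tuple(c - p for c, p in zip(frame[v], frame[PARENTS[v]]))
         for v in range(len(frame))]
        for frame in joints
    ]


def to_motion(stream):
    """Motion stream: difference between consecutive frames.

    The last frame has no successor, so it is filled with zeros.
    """
    motion = []
    for t in range(len(stream) - 1):
        motion.append([
            tuple(b - a for a, b in zip(stream[t][v], stream[t + 1][v]))
            for v in range(len(stream[t]))
        ])
    if stream:
        motion.append([tuple(0 for _ in joint) for joint in stream[-1]])
    return motion


# Two frames, three joints, (x, y) coordinates.
joints = [
    [(0, 0), (1, 0), (1, 2)],
    [(0, 1), (1, 1), (2, 3)],
]
bone = to_bone(joints)
joint_motion = to_motion(joints)
bone_motion = to_motion(bone)
```

Each of the four resulting streams has the same frame and joint layout as the input, which is what lets one network architecture be trained per stream.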

Skeleton Feature

For the skeleton feature, we propose a Separable Spatial-Temporal Convolution Network (SSTCN) to capture spatio-temporal information from those features.

Please follow the instructions in SSTCN/readme.txt to prepare the data, then train and test the model.

RGB Frames

The RGB frames modality can be trained, finetuned and tested, respectively, using the following commands in the Conv3D/ folder.

python Sign_Isolated_Conv3D_clip.py

python Sign_Isolated_Conv3D_clip_finetune.py

python Sign_Isolated_Conv3D_clip_test.py

Detailed instructions can be found in Conv3D/readme.md.

Optical Flow

The RGB optical flow modality can be trained, finetuned and tested, respectively, using the following commands in the Conv3D/ folder.

python Sign_Isolated_Conv3D_flow_clip.py

python Sign_Isolated_Conv3D_flow_clip_funtine.py

python Sign_Isolated_Conv3D_flow_clip_test.py

Detailed instructions can be found in Conv3D/readme.md.

Depth HHA

The Depth HHA modality can be trained, finetuned and tested, respectively, using the following commands in the Conv3D/ folder.

python Sign_Isolated_Conv3D_hha_clip_mask.py

python Sign_Isolated_Conv3D_hha_clip_mask_finetune.py

python Sign_Isolated_Conv3D_hha_clip_mask_test.py

Detailed instructions can be found in Conv3D/readme.md.

Depth Flow

The Depth Flow modality can be trained, finetuned and tested, respectively, using the following commands in the Conv3D/ folder.

python Sign_Isolated_Conv3D_depth_flow_clip.py

python Sign_Isolated_Conv3D_depth_flow_clip_finetune.py

python Sign_Isolated_Conv3D_depth_flow_clip_test.py

Detailed instructions can be found in Conv3D/readme.md.

Model Ensemble

For both the RGB and RGBD tracks, the tested results of all modalities need to be ensembled together to generate the final results.

For RGB track, we use the results from skeleton, skeleton feature, rgb, and flow color modalities to ensemble the final results.

a. Test the model using newly trained weights or provided pretrained weights.

b. Copy all the test results to the ensemble/ folder and rename them after their modality names.

c. Ensemble the SL-GCN results from the joint, bone, joint motion and bone motion streams in gcn/.
```
 python ensemble_wo_val.py; python ensemble_finetune.py
```
d. Copy test_gcn_w_val_finetune.pkl to ensemble/. Copy the RGB, TPose and optical flow results to ensemble/. Ensemble the final prediction.
```
 python ensemble_multimodal_rgb.py
```
Final predictions are saved in predictions.csv.

For RGBD track, we use the results from skeleton, skeleton feature, rgb, flow color, hha and flow depth modalities to ensemble the final results.

a. Copy the hha and flow depth results to the ensemble/ folder, then run:
```
 python ensemble_multimodal_rgb.py
```

To reproduce our results in the CVPR21 Challenge, we provide .pkl files to ensemble and obtain our final submitted predictions. Detailed instructions can be found in ensemble/readme.md.
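
Conceptually, a late-fusion ensemble of this kind sums the per-class scores from each modality, optionally weighted, and takes the arg-max. The sketch below shows that idea only. The weights, the helper name `fuse_scores` and the assumed data layout (a dict from sample name to a list of class scores per modality) are illustrative assumptions, not the contents of the repo's .pkl files or its ensemble scripts.

```python
def fuse_scores(scores_by_modality, weights):
    """Weighted late fusion of per-class scores.

    scores_by_modality: {modality: {sample_name: [score per class]}}
    weights:            {modality: float}  (hypothetical values)
    Returns {sample_name: predicted class index}.
    """
    fused = {}
    for modality, scores in scores_by_modality.items():
        w = weights[modality]
        for name, vec in scores.items():
            acc = fused.setdefault(name, [0.0] * len(vec))
            for k, s in enumerate(vec):
                acc[k] += w * s
    # Arg-max over the fused class scores.
    return {name: max(range(len(vec)), key=vec.__getitem__)
            for name, vec in fused.items()}


# Toy example with two modalities and three classes.
scores = {
    "skeleton": {"signer0_sample1": [0.1, 0.7, 0.2]},
    "rgb":      {"signer0_sample1": [0.5, 0.3, 0.2]},
}
predictions = fuse_scores(scores, {"skeleton": 1.0, "rgb": 0.5})
```

Changing the weights shifts which modality dominates the final decision, which is why the choice of weights matters for the ensemble result.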

License

Licensed under the Creative Commons Zero v1.0 Universal license with the following exceptions:

The code is released for academic research use only. Commercial use is prohibited.
Published versions (changed or unchanged) must include a reference to the origin of the code.

Citation

If you find this project useful in your research, please cite our paper

@inproceedings{jiang2021skeleton,
  title={Skeleton Aware Multi-modal Sign Language Recognition},
  author={Jiang, Songyao and Sun, Bin and Wang, Lichen and Bai, Yue and Li, Kunpeng and Fu, Yun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  year={2021}
}

@article{jiang2021skeleton,
  title={Skeleton Aware Multi-modal Sign Language Recognition},
  author={Jiang, Songyao and Sun, Bin and Wang, Lichen and Bai, Yue and Li, Kunpeng and Fu, Yun},
  journal={arXiv preprint arXiv:2103.08833},
  year={2021}
}

Reference

https://github.com/Sun1992/SSTCN-for-SLR

https://github.com/jin-s13/COCO-WholeBody

https://github.com/open-mmlab/mmpose

https://github.com/0aqz0/SLR

https://github.com/kchengiva/DecoupleGCN-DropGraph

https://github.com/HRNet/HRNet-Human-Pose-Estimation

https://github.com/charlesCXK/Depth2HHA

Comments

Conv3D for WLASL

Hello, thank you so much for your code.

I am trying to train Conv3D on the WLASL dataset, but the accuracy is too low. Do you have any idea why this is happening?

Thank you in advance.

opened by EvgeniaChroni 6

AUTSL dataset corrupted or in unknown format

Hello, thank you so much for your code.

When I downloaded the AUTSL dataset from the official website and unpacked it, it was reported as being in an unknown format or corrupted. How should I download and unpack it correctly? By the way, where should I download the CSL and WLASL datasets?

I would appreciate it if you could reply. Thank you very much.

opened by 98yaoyaoha 4

Wholebody yaml missing

Hi, I get the following error when I try to run demo.py to extract 2D keypoints:

FileNotFoundError: [Errno 2] No such file or directory: 'wholebody_w48_384x288.yaml'

opened by iliasprc 4
FileNotFoundError when trying to test config files in Conv3D

Greetings. Congratulations on your work. I was trying to reproduce your results in Google Colab using processed datasets and pretrained models. I stored 'test_frames' folder inside /Conv3D/data/ and tried to run "python Sign_Isolated_Conv3D_clip_test.py" inside /Conv3D/ folder. I ran into the following error, I'm pasting the error traceback.

---Traceback Starts----

######################Testing Started#######################
Traceback (most recent call last):
  File "Sign_Isolated_Conv3D_clip_test.py", line 138, in <module>
    val_loss = val_epoch(model, criterion, val_loader, device, 0, logger, writer, phase=phase, exp_name=exp_name)
  File "/content/CVPR21Chal-SLR/Conv3D/validation_clip.py", line 13, in val_epoch
    for batch_idx, data in enumerate(dataloader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/CVPR21Chal-SLR/Conv3D/dataset_sign_clip.py", line 111, in __getitem__
    images.append(self.read_images(selected_folder, i))
  File "/content/CVPR21Chal-SLR/Conv3D/dataset_sign_clip.py", line 75, in read_images
    index_list = self.frame_indices_tranform_test(len(os.listdir(folder_path)), self.frames, clip_no)
FileNotFoundError: [Errno 2] No such file or directory: '../data/test_frames/signer34_sample1'

----Traceback Ends----

However, there is a signer34_sample1 inside /Conv3D/data/test_frames/. Any insights will be appreciated. Thanks.
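
One thing worth checking here: the path in the error is relative (`../data/test_frames/...`), so it is resolved against the current working directory rather than against the script's folder. The sketch below uses the paths from the traceback to show where that lands when the command is run from inside Conv3D/. Whether this is the actual cause is an assumption.

```python
import posixpath

# Working directory when running the test script from inside Conv3D/.
cwd = "/content/CVPR21Chal-SLR/Conv3D"
# Relative path reported in the FileNotFoundError above.
relative = "../data/test_frames/signer34_sample1"

resolved = posixpath.normpath(posixpath.join(cwd, relative))
print(resolved)  # /content/CVPR21Chal-SLR/data/test_frames/signer34_sample1
```

Under that assumption the frames would be expected in a data/ folder at the repo root, next to Conv3D/, rather than in Conv3D/data/.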

opened by tauhidtamim 4

Images' Docker

Hello!

I am facing an issue related to the NVIDIA Docker image. When I run it using this command

sudo docker run -it --gpus all -v path_to_your_data:/home/smilelab_slr/cvpr2021_allcode/shared_data cvpr2021cha_code /bin/bash

it gives me this error message:

Unable to find image 'cvpr2021cha_code:latest' locally
docker: Error response from daemon: pull access denied for cvpr2021cha_code, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.

Any help?
opened by HajarAlnamshan 3
Running Docker Command

Hello! I have been struggling to get the docker command to work. I'm working on a Mac running macOS Big Sur and have installed Docker Desktop. I was able to load the image into Docker Desktop, but when I run the command on the command line, I get the following error:

Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

I'm unsure if my issue is nvidia based or if I've changed the code incorrectly on the path_to_your_data part, as I put in the path to where I downloaded the docker file from the main github page. Should this be the path to where the container is saved or the path to where the dataset is saved?

Thanks!

opened by celenaponce 3
Problem in sign_gendata.py

https://github.com/jackyjsy/CVPR21Chal-SLR/blob/892de189f4f8afc69031780a44f308044eff2429/SL-GCN/data_gen/sign_gendata.py#L15

The '27' and '27_2' info of selected_joints is the same.

Could you please let me know what the idea is?

opened by snorlaxse 3

Loss does not decrease, and the model is not trained

Hi, honorable competition winner. I deployed your code locally, but after many rounds of training the model does not seem to have learned anything, and the results are still random. The accuracy only reaches around 1/226.

The only modifications I made to your code were setting the batch size to 2 because of resource constraints, and directly resizing the 512x512 data to 256x256 during data preprocessing.

No other changes were made.

If possible, do you know the reason for my problem?
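
As a sanity check on the numbers: a classifier that outputs a uniform distribution over the 226 classes mentioned above has a cross-entropy loss of ln(226) and an expected accuracy of 1/226. A quick calculation (illustrative only) gives the values to compare a training log against:

```python
import math

num_classes = 226  # class count implied by the 1/226 accuracy quoted above

chance_loss = math.log(num_classes)      # cross-entropy of a uniform guess
chance_accuracy = 1.0 / num_classes

print(f"chance-level loss: {chance_loss:.4f}")          # about 5.42
print(f"chance-level accuracy: {chance_accuracy:.2%}")  # about 0.44%
```

A loss that stays near this value over many iterations suggests the network is still predicting at chance level.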

opened by LiangSiyv 3
How to get correct 'train_val_labels.pkl' ?

Thanks for your project. I'm very interested in the network you designed. But when I get to 'test_joint.yaml', I cannot obtain 'train_val_labels.pkl'. The 'train_val_labels.pkl' I merged from 'train_label.pkl' and 'val_gt.pkl' is wrong. The code is as follows:

    import pickle

    with open('/media/lyh/data/AUTSL/Autsl/skeleton_data/train_label.pkl', 'rb') as fr_1:
        data_1 = pickle.load(fr_1)
    with open('/media/lyh/data/AUTSL/Autsl/skeleton_data/val_gt.pkl', 'rb') as fr_2:
        data_2 = pickle.load(fr_2)
    data = [data_1, data_2]
    with open('/media/lyh/data/AUTSL/Autsl/skeleton_data/train_val_labels.pkl', 'wb') as fr_3:
        pickle.dump(data, fr_3)

Would you mind helping me get the correct 'train_val_labels.pkl'? I am looking forward to your reply. Thanks in advance.
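
A possible reason the merged file is rejected: `data = [data_1, data_2]` stores the two loaded objects side by side instead of concatenating their contents. The sketch below assumes, without confirmation from the source, that each label file holds a `(sample_names, labels)` pair of equal-length sequences; the helper name `merge_label_files` is hypothetical. If the files use a different layout, the merge has to be adapted accordingly.

```python
import pickle


def merge_label_files(path_a, path_b, out_path):
    """Concatenate two label pickles into one.

    Assumption (not confirmed by the repo): each file stores a
    (sample_names, labels) pair of equal-length sequences.
    """
    with open(path_a, "rb") as f:
        names_a, labels_a = pickle.load(f)
    with open(path_b, "rb") as f:
        names_b, labels_b = pickle.load(f)

    merged = (list(names_a) + list(names_b), list(labels_a) + list(labels_b))
    with open(out_path, "wb") as f:
        pickle.dump(merged, f)
    return merged
```

Inspecting one of the existing files first (for example printing `type(data_1)` and its length) would confirm or rule out the assumed layout before merging.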

opened by LiuyhLinda 2
Train for bone

Hello!

I ran this command to train the bone stream:

python main.py --config config/sign/train/train_bone.yaml

and it gives me this error:

Traceback (most recent call last):
  File "main.py", line 588, in <module>
    processor = Processor(arg)
  File "main.py", line 215, in __init__
    self.load_data()
  File "main.py", line 224, in load_data
    dataset=Feeder(**self.arg.train_feeder_args),
  File "/home/smilelab_slr/cvpr2021_allcode/GCN/feeders/feeder.py", line 40, in __init__
    self.load_data()
  File "/home/smilelab_slr/cvpr2021_allcode/GCN/feeders/feeder.py", line 59, in load_data
    self.data = np.load(self.data_path, mmap_mode='r')
  File "/home/smilelab_slr/SLR_pytorch/env/lib/python3.8/site-packages/numpy/lib/npyio.py", line 417, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: './data/sign/27_2/train_data_bone.npy'

Any help? Many thanks

opened by HajarAlnamshan 2
Accuracy of V2 paper

I'm sorry to bother you with the question below. The V1 result cited in the V2 paper is 97.51%. The V2 result is 98.00%, which reaches 98.53% after adding extra data. However, the results reported in the V1 paper had already reached 98.53%. So there are two possibilities: either the improvements described in V2 have no effect at all, or the V1 results already used the techniques described in V2 when the V1 paper came out. /(ㄒoㄒ)/~~ Could you please help me find the all_modilities_cc 98.53% in the V2 paper?

opened by LiangSiyv 2
Help from an admirer

Hello, I am from China. I am a graduate student and I am very interested in your work. Can you share the dataset used in your paper? If I publish a paper, I will cite yours. Thank you!

opened by herochen7372 0

IndexError in forward func of Conv3D/Sign_Isolated_Conv3D_clip.py

I trained the RGB Conv3D model for one epoch without modifying anything in the source code. After the first training epoch finishes and the run reaches val_epoch, the code behaves like this:

$ python Conv3D/Sign_Isolated_Conv3D_clip.py
...
######################Training Started######################
lr:  0.001
epoch   1 | iteration    80 | Loss 5.711482 | Acc 0.00%
epoch   1 | iteration   160 | Loss 5.423379 | Acc 0.00%
epoch   1 | iteration   240 | Loss 5.502132 | Acc 14.29%
epoch   1 | iteration   320 | Loss 5.452106 | Acc 0.00%
epoch   1 | iteration   400 | Loss 5.348779 | Acc 0.00%
epoch   1 | iteration   480 | Loss 5.369306 | Acc 0.00%
epoch   1 | iteration   560 | Loss 5.412856 | Acc 0.00%
epoch   1 | iteration   640 | Loss 5.431209 | Acc 0.00%
epoch   1 | iteration   720 | Loss 5.376038 | Acc 0.00%
epoch   1 | iteration   800 | Loss 5.504383 | Acc 0.00%
epoch   1 | iteration   880 | Loss 5.414754 | Acc 0.00%
epoch   1 | iteration   960 | Loss 5.481614 | Acc 0.00%
epoch   1 | iteration  1040 | Loss 5.402166 | Acc 0.00%
epoch   1 | iteration  1120 | Loss 5.561030 | Acc 0.00%
epoch   1 | iteration  1200 | Loss 5.304134 | Acc 14.29%
epoch   1 | iteration  1280 | Loss 5.452147 | Acc 0.00%
epoch   1 | iteration  1360 | Loss 5.429211 | Acc 0.00%
epoch   1 | iteration  1440 | Loss 5.503419 | Acc 0.00%
epoch   1 | iteration  1520 | Loss 5.407657 | Acc 0.00%
epoch   1 | iteration  1600 | Loss 5.423106 | Acc 0.00%
epoch   1 | iteration  1680 | Loss 5.427852 | Acc 0.00%
epoch   1 | iteration  1760 | Loss 5.387938 | Acc 0.00%
epoch   1 | iteration  1840 | Loss 5.491746 | Acc 0.00%
epoch   1 | iteration  1920 | Loss 5.375609 | Acc 0.00%
epoch   1 | iteration  2000 | Loss 5.529760 | Acc 0.00%
epoch   1 | iteration  2080 | Loss 5.462255 | Acc 0.00%
epoch   1 | iteration  2160 | Loss 5.383886 | Acc 0.00%
epoch   1 | iteration  2240 | Loss 5.354466 | Acc 0.00%
epoch   1 | iteration  2320 | Loss 5.439829 | Acc 0.00%
epoch   1 | iteration  2400 | Loss 5.484483 | Acc 0.00%
epoch   1 | iteration  2480 | Loss 5.388660 | Acc 0.00%
epoch   1 | iteration  2560 | Loss 5.336263 | Acc 0.00%
epoch   1 | iteration  2640 | Loss 5.511293 | Acc 0.00%
epoch   1 | iteration  2720 | Loss 5.430277 | Acc 0.00%
epoch   1 | iteration  2800 | Loss 5.447950 | Acc 0.00%
epoch   1 | iteration  2880 | Loss 5.434804 | Acc 0.00%
epoch   1 | iteration  2960 | Loss 5.414961 | Acc 0.00%
epoch   1 | iteration  3040 | Loss 5.452834 | Acc 0.00%
epoch   1 | iteration  3120 | Loss 5.405386 | Acc 0.00%
epoch   1 | iteration  3200 | Loss 5.377852 | Acc 0.00%
epoch   1 | iteration  3280 | Loss 5.378382 | Acc 0.00%
epoch   1 | iteration  3360 | Loss 5.481858 | Acc 0.00%
epoch   1 | iteration  3440 | Loss 5.544360 | Acc 0.00%
epoch   1 | iteration  3520 | Loss 5.439571 | Acc 0.00%
epoch   1 | iteration  3600 | Loss 5.497654 | Acc 0.00%
epoch   1 | iteration  3680 | Loss 5.374403 | Acc 0.00%
epoch   1 | iteration  3760 | Loss 5.400540 | Acc 0.00%
epoch   1 | iteration  3840 | Loss 5.482468 | Acc 0.00%
epoch   1 | iteration  3920 | Loss 5.428809 | Acc 0.00%
epoch   1 | iteration  4000 | Loss 5.400549 | Acc 0.00%
Average Training Loss of Epoch 1: 5.445218 | Acc: 0.39%
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:490: UserWarning: This DataLoader will create 6 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
Traceback (most recent call last):
  File "/content/codebase/CVPR21Chal-SLR/Conv3D/Sign_Isolated_Conv3D_clip.py", line 165, in <module>
    logger, writer)
  File "/content/codebase/CVPR21Chal-SLR/Conv3D/validation_clip.py", line 27, in val_epoch
    loss = criterion(outputs, labels.squeeze())
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/codebase/CVPR21Chal-SLR/Conv3D/Sign_Isolated_Conv3D_clip.py", line 27, in forward
    nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Do you have any suggestions on what could be wrong here and why does the forward function present such a strange behaviour?
Even more important, what could be the solution to this problem?
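
One plausible cause, offered as an assumption rather than a confirmed diagnosis: if the last validation batch contains a single sample, squeezing a label tensor of shape (1, 1) removes every dimension, and the following `unsqueeze(1)` then has no position 1 to insert at, which matches the reported range [-1, 0]. The sketch below mimics only the shape arithmetic in plain Python; `squeeze_shape` and `can_unsqueeze` are hypothetical helpers, not PyTorch functions, and the batch size of 7 is a made-up value.

```python
def squeeze_shape(shape):
    """Shape after dropping all size-1 dimensions (what squeeze() does)."""
    return tuple(d for d in shape if d != 1)


def can_unsqueeze(shape, dim):
    """unsqueeze accepts dim in [-(n + 1), n] for an n-dimensional input."""
    n = len(shape)
    return -(n + 1) <= dim <= n


full_batch = squeeze_shape((7, 1))   # batch dimension survives
last_batch = squeeze_shape((1, 1))   # every dimension is removed

print(can_unsqueeze(full_batch, 1))  # True
print(can_unsqueeze(last_batch, 1))  # False
```

If that is what happens, squeezing only the trailing dimension (for example `labels.squeeze(-1)`) or dropping the incomplete last batch would keep the batch dimension intact.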

opened by DavidMosoarca 0

gen_flow.py

Hi there :) I am trying to replicate your results and am currently on the data preparation steps. In the "Generate flow data from rgb and depth videos" section you mention that I must run the docker image. Is there a way for me to do it without that image? (I am running in an environment that is really strict about loading foreign images.) Thanks in advance, Amit

opened by Amitharitan25 1

