3D ResNets for Action Recognition (CVPR 2018)

Kensho Hara

Last update: Jan 6, 2023


Overview

3D ResNets for Action Recognition

Update (2020/4/13)

We published a paper on arXiv.

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs",
arXiv preprint, arXiv:2004.04968, 2020.

We uploaded the pretrained models described in this paper, including ResNet-50 pretrained on the combined Kinetics-700 and Moments in Time dataset.

Update (2020/4/10)

We significantly updated our scripts. If you want to use older versions to reproduce our CVPR2018 paper, you should use the scripts in the CVPR2018 branch.

This update includes the following:

Refactoring the whole project
Supporting newer PyTorch versions
Supporting distributed training
Supporting training and testing on the Moments in Time dataset
Adding R(2+1)D models
Uploading 3D ResNet models trained on the Kinetics-700, Moments in Time, and STAIR-Actions datasets

Summary

This is the PyTorch code for the following papers:

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs",
arXiv preprint, arXiv:2004.04968, 2020.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Towards Good Practice for Action Recognition with Spatiotemporal 3D Convolutions",
Proceedings of the International Conference on Pattern Recognition, pp. 2516-2521, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?",
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6546-6555, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition",
Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, 2017.

This code includes training, fine-tuning and testing on Kinetics, Moments in Time, ActivityNet, UCF-101, and HMDB-51.

Citation

If you use this code or pre-trained models, please cite the following:

@inproceedings{hara3dcnns,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={6546--6555},
  year={2018},
}

Pre-trained models

Pre-trained models are available here.
All models are trained on Kinetics-700 (K), Moments in Time (M), STAIR-Actions (S), or merged combinations of these datasets (KM, KS, MS, KMS).
To fine-tune a model on your own dataset, specify the following options.

r3d18_K_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 700
r3d18_KM_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 1039
r3d34_K_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 700
r3d34_KM_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 1039
r3d50_K_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 700
r3d50_KM_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1039
r3d50_KMS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1139
r3d50_KS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 800
r3d50_M_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 339
r3d50_MS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 439
r3d50_S_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 100
r3d101_K_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 700
r3d101_KM_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 1039
r3d152_K_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 700
r3d152_KM_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 1039
r3d200_K_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 700
r3d200_KM_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 1039
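The class counts in the list above are additive: a merged dataset has as many classes as its parts combined (K = 700, M = 339, S = 100, so KM = 1039 and KMS = 1139). The sketch below is a hypothetical helper, not part of the repository, that derives the fine-tuning options from a checkpoint file name, assuming the r3d<depth>_<datasets>_200ep.pth naming used above.

```python
import re
from typing import List

# Classes per source dataset, as listed above:
# K = Kinetics-700, M = Moments in Time, S = STAIR-Actions.
DATASET_CLASSES = {"K": 700, "M": 339, "S": 100}

# Hypothetical pattern for names such as r3d50_KMS_200ep.pth.
CHECKPOINT_NAME = re.compile(r"^r3d(\d+)_([KMS]+)_(\d+)ep\.pth$")


def finetune_options(filename: str) -> List[str]:
    """Derive --model, --model_depth and --n_pretrain_classes from a file name.

    A merged dataset is assumed to have as many classes as its parts combined,
    e.g. KM = 700 + 339 = 1039.
    """
    match = CHECKPOINT_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {filename}")
    depth, datasets, _epochs = match.groups()
    n_pretrain_classes = sum(DATASET_CLASSES[code] for code in datasets)
    return ["--model", "resnet", "--model_depth", depth,
            "--n_pretrain_classes", str(n_pretrain_classes)]


print(" ".join(finetune_options("r3d50_KMS_200ep.pth")))
# --model resnet --model_depth 50 --n_pretrain_classes 1139
```

The regular expression only accepts the 3D ResNet files listed here; other pretrained files would need their own rule.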

Old pretrained models are still available here.
However, some modifications are required to use the old pretrained models in the current scripts.

Requirements

PyTorch (ver. 0.4+ required)

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

FFmpeg, FFprobe
Python 3
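Before preparing any dataset, it can help to confirm that the FFmpeg and FFprobe executables are reachable. This is a small sketch using only the standard library; the tool names `ffmpeg` and `ffprobe` are assumed to be the executable names on your system.

```python
import shutil
from typing import Iterable, List

# Executables listed under Requirements.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("FFmpeg and FFprobe are available.")
```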

Preparation

ActivityNet

Download videos using the official crawler.
Convert from mp4 to jpg files using util_scripts/generate_video_jpgs.py

python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path activitynet
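Per video, this conversion amounts to an ffmpeg call that writes one JPEG per frame. The sketch below only builds such a command; the `scale=-1:240` filter and the `image_%05d.jpg` naming are illustrative assumptions and may not match what generate_video_jpgs.py actually does.

```python
import shlex
from pathlib import Path
from typing import List


def frame_extraction_command(video_path: Path, dst_dir: Path,
                             height: int = 240) -> List[str]:
    """Build (but do not run) an ffmpeg command that saves each frame as a JPEG.

    Hypothetical helper: the scale filter and the image_%05d.jpg pattern are
    illustrative choices, not necessarily those of generate_video_jpgs.py.
    """
    return [
        "ffmpeg", "-i", str(video_path),
        "-vf", f"scale=-1:{height}",        # fixed height, width keeps aspect ratio
        str(dst_dir / "image_%05d.jpg"),    # image_00001.jpg, image_00002.jpg, ...
    ]


cmd = frame_extraction_command(Path("mp4_videos/abc.mp4"), Path("jpg_videos/abc"))
print(shlex.join(cmd))
```

To actually run it, create the destination directory first (`dst_dir.mkdir(parents=True, exist_ok=True)`) and pass the list to `subprocess.run(cmd, check=True)`; ffmpeg does not create missing output directories.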

Add fps information to the json file using util_scripts/add_fps_into_activitynet_json.py

python -m util_scripts.add_fps_into_activitynet_json mp4_video_dir_path json_file_path
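How add_fps_into_activitynet_json.py obtains the frame rate is not shown here. One common approach, sketched below as an assumption, is to ask ffprobe (listed under Requirements) for the stream's `r_frame_rate`, which it prints as a fraction such as `30000/1001`.

```python
from fractions import Fraction
from typing import List


def ffprobe_fps_command(video_path: str) -> List[str]:
    """ffprobe call that prints the frame rate of the first video stream."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "stream=r_frame_rate",
            "-of", "default=noprint_wrappers=1:nokey=1", video_path]


def parse_frame_rate(text: str) -> float:
    """Convert ffprobe's rational frame rate (e.g. '30000/1001') to a float."""
    return float(Fraction(text.strip()))


print(round(parse_frame_rate("30000/1001\n"), 3))  # 29.97
```

In practice the two would be combined as `parse_frame_rate(subprocess.run(ffprobe_fps_command(path), capture_output=True, text=True, check=True).stdout)`.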

Kinetics

Download videos using the official crawler.
- Place the test set in video_directory/test.
Convert from mp4 to jpg files using util_scripts/generate_video_jpgs.py

python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path kinetics

Generate an annotation file in json format, similar to ActivityNet, using util_scripts/kinetics_json.py
- The CSV files (kinetics_{train, val, test}.csv) are included in the crawler.

python -m util_scripts.kinetics_json csv_dir_path 700 jpg_video_dir_path jpg dst_json_path
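The exact schema written by kinetics_json.py is not spelled out here. As a rough illustration only, the sketch below turns rows of a Kinetics CSV into an ActivityNet-style dict with `labels` and `database` keys; the column names, the `<youtube_id>_<start:06d>_<end:06d>` video ids, and the subset names are all assumptions to check against your own files.

```python
import csv
import io
from typing import Dict


def kinetics_csv_to_annotations(csv_text: str) -> Dict:
    """Build an ActivityNet-style annotation dict from one Kinetics CSV file.

    Assumptions (verify against your files): columns label, youtube_id,
    time_start, time_end, split; video ids <youtube_id>_<start:06d>_<end:06d>;
    subset names as in ActivityNet ('training', 'validation', 'testing').
    """
    subset_names = {"train": "training", "val": "validation", "test": "testing"}
    labels = set()
    database = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        video_id = "{}_{:06d}_{:06d}".format(
            row["youtube_id"], int(float(row["time_start"])),
            int(float(row["time_end"])))
        entry = {"subset": subset_names[row["split"]], "annotations": {}}
        if row.get("label"):  # test rows may carry no label
            labels.add(row["label"])
            entry["annotations"]["label"] = row["label"]
        database[video_id] = entry
    return {"labels": sorted(labels), "database": database}
```

The resulting dict can be written with `json.dump`. Merging the train, val, and test CSVs is a matter of updating one `database` dict per file.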

UCF-101

Download videos and train/test splits here.
Convert from avi to jpg files using util_scripts/generate_video_jpgs.py

python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path ucf101

Generate an annotation file in json format, similar to ActivityNet, using util_scripts/ucf101_json.py
- annotation_dir_path includes classInd.txt, trainlist0{1, 2, 3}.txt, testlist0{1, 2, 3}.txt

python -m util_scripts.ucf101_json annotation_dir_path jpg_video_dir_path dst_json_path
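The UCF-101 split files use a simple text format: classInd.txt lines read `1 ApplyEyeMakeup`, trainlist01.txt lines read `ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1`, and testlist01.txt lines omit the trailing index. The sketch below parses them into ActivityNet-style entries; it is an illustration of the format, not the code of ucf101_json.py, and the subset names passed in are the caller's choice.

```python
from typing import Dict, List


def parse_class_ind(lines: List[str]) -> Dict[str, int]:
    """Map class names to 0-based indices from classInd.txt ('1 ApplyEyeMakeup')."""
    mapping = {}
    for line in lines:
        if line.strip():
            index, name = line.split()
            mapping[name] = int(index) - 1
    return mapping


def parse_split_list(lines: List[str], subset: str) -> Dict[str, Dict]:
    """Parse trainlist0N.txt or testlist0N.txt lines into annotation entries.

    The class name is taken from the directory part of each relative path,
    so the optional trailing class index is not needed.
    """
    entries = {}
    for line in lines:
        if not line.strip():
            continue
        rel_path = line.split()[0]
        class_name, file_name = rel_path.split("/")
        video_id = file_name.rsplit(".", 1)[0]  # drop the .avi extension
        entries[video_id] = {"subset": subset, "annotations": {"label": class_name}}
    return entries
```

For split 1, `parse_split_list(train_lines, "training")` and `parse_split_list(test_lines, "validation")` could be merged into one `database` dict.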

HMDB-51

Download videos and train/test splits here.
Convert from avi to jpg files using util_scripts/generate_video_jpgs.py

python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path hmdb51

Generate an annotation file in json format, similar to ActivityNet, using util_scripts/hmdb51_json.py
- annotation_dir_path includes brush_hair_test_split1.txt, ...

python -m util_scripts.hmdb51_json annotation_dir_path jpg_video_dir_path dst_json_path
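Each HMDB-51 split file, such as brush_hair_test_split1.txt, lists one video per line followed by an id: 1 for training, 2 for testing, and 0 for videos not used in that split. The sketch below reads such a file; treating the test videos as a `validation` subset is an assumption made here for consistency with ActivityNet-style annotations.

```python
from typing import Dict, List

# Split ids in HMDB-51 split files: 1 = training, 2 = testing, 0 = unused.
SPLIT_IDS = {"1": "training", "2": "validation"}


def class_from_split_file(file_name: str) -> str:
    """'brush_hair_test_split1.txt' -> 'brush_hair'."""
    return file_name.rsplit("_test_split", 1)[0]


def parse_hmdb51_split(class_name: str, lines: List[str]) -> Dict[str, Dict]:
    """Turn '<video>.avi <id>' lines into annotation entries for one class."""
    entries = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or parts[1] not in SPLIT_IDS:
            continue  # skip blank lines and videos unused in this split
        video_id = parts[0].rsplit(".", 1)[0]
        entries[video_id] = {"subset": SPLIT_IDS[parts[1]],
                             "annotations": {"label": class_name}}
    return entries
```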

Running the code

Assume the data directories are structured as follows:

~/
  data/
    kinetics_videos/
      jpg/
        .../ (directories of class names)
          .../ (directories of video names)
            ... (jpg files)
    results/
      save_100.pth
    kinetics.json
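A quick sanity check of this layout is to count the jpg frames in every video directory. A video with zero frames yields an empty clip, which is one possible cause of errors such as "expected a non-empty list of Tensors" reported in the comments below. The helper is hypothetical and assumes the two-level `<class>/<video>/*.jpg` layout shown above.

```python
from pathlib import Path
from typing import Dict


def count_frames(jpg_root: Path) -> Dict[str, int]:
    """Map '<class>/<video>' to its number of .jpg frames.

    Assumes the layout shown above: jpg_root/<class>/<video>/*.jpg.
    """
    counts = {}
    for video_dir in sorted(jpg_root.glob("*/*")):
        if video_dir.is_dir():
            key = f"{video_dir.parent.name}/{video_dir.name}"
            counts[key] = sum(1 for _ in video_dir.glob("*.jpg"))
    return counts
```

For example, `[k for k, n in count_frames(Path.home() / "data/kinetics_videos/jpg").items() if n == 0]` lists the videos whose conversion produced no frames.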

To list all options:

python main.py -h

Train ResNet-50 on the Kinetics-700 dataset (700 classes) with 4 CPU threads for data loading.
The batch size is 128.
Models are saved every 5 epochs. All GPUs are used for training. To use only some of the GPUs, set CUDA_VISIBLE_DEVICES=....

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --model resnet \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5

Continue training from epoch 101 (~/data/results/save_100.pth is loaded).

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_100.pth \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5
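Resuming only works when the options describe the same architecture as the saved checkpoint. The traceback in the comments below fails at `assert arch == checkpoint['arch']`, and the namespace dump there shows `arch='resnet-18'`, so the architecture string appears to combine --model and --model_depth. The sketch below is a hypothetical reimplementation of that check; the `epoch` key is an assumption.

```python
from typing import Any, Dict


def next_epoch_after_resume(checkpoint: Dict[str, Any], model: str,
                            model_depth: int) -> int:
    """Validate a loaded checkpoint and return the epoch to continue from.

    Hypothetical sketch: assumes the checkpoint dict stores 'arch'
    (e.g. 'resnet-50') and 'epoch' (the last completed epoch).
    """
    arch = f"{model}-{model_depth}"
    if checkpoint.get("arch") != arch:
        raise ValueError(
            f"checkpoint was saved for {checkpoint.get('arch')!r}, but the "
            f"options describe {arch!r}; check --model and --model_depth")
    return checkpoint["epoch"] + 1


print(next_epoch_after_resume({"arch": "resnet-50", "epoch": 100}, "resnet", 50))  # 101
```

A mismatch of this kind is one plausible explanation, not a confirmed one, for the AssertionError reported when running inference with an R(2+1)D checkpoint without changing --model.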

Calculate the top-5 class probabilities of each video using a trained model (~/data/results/save_200.pth).
Note that inference_batch_size should be small, because the actual batch size is inference_batch_size * (n_video_frames / inference_stride).

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_200.pth \
--model_depth 50 --n_classes 700 --n_threads 4 --no_train --no_val --inference --output_topk 5 --inference_batch_size 1
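To see why inference_batch_size should stay small, the arithmetic below applies the formula from the note above. The clip shape defaults (16 frames of 112x112, 3 channels, float32) are taken from the defaults visible in the namespace dump in the comments below and are assumptions for any other configuration; the byte count covers the input tensor only, and activations inside the network need far more memory.

```python
def effective_batch_size(inference_batch_size: int, n_video_frames: int,
                         inference_stride: int) -> float:
    """Clips per forward pass: inference_batch_size * (n_video_frames / inference_stride)."""
    return inference_batch_size * (n_video_frames / inference_stride)


def input_tensor_bytes(n_clips: int, channels: int = 3, duration: int = 16,
                       size: int = 112, bytes_per_value: int = 4) -> int:
    """Size of a float32 input of shape (n_clips, channels, duration, size, size)."""
    return n_clips * channels * duration * size * size * bytes_per_value


# A 300-frame video with inference_stride 16:
print(effective_batch_size(1, 300, 16))  # 18.75
print(effective_batch_size(8, 300, 16))  # 150.0
print(input_tensor_bytes(150))           # 361267200 (about 361 MB, input alone)
```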

Evaluate top-1 video accuracy of a recognition result (~/data/results/val.json).

python -m util_scripts.eval_accuracy ~/data/kinetics.json ~/data/results/val.json --subset val -k 1 --ignore
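Top-k video accuracy counts a video as correct when its ground-truth label is among its k highest-scoring predictions. The sketch below computes this in memory and is not the code of eval_accuracy.py. It assumes an ActivityNet-style ground truth (`database` → video → `subset` and `annotations.label`) and a result file shaped as `results` → video → list of `{label, score}`; both layouts, the `validation` subset name, and the `ignore_missing` parameter are assumptions of this example.

```python
from typing import Dict


def top_k_video_accuracy(ground_truth: Dict, results: Dict,
                         subset: str = "validation", k: int = 1,
                         ignore_missing: bool = True) -> float:
    """Fraction of videos in `subset` whose label is in their top-k predictions."""
    predictions = results["results"]
    correct = total = 0
    for video_id, entry in ground_truth["database"].items():
        if entry["subset"] != subset:
            continue
        if video_id not in predictions:
            if not ignore_missing:
                total += 1  # count a missing prediction as wrong
            continue
        ranked = sorted(predictions[video_id], key=lambda p: p["score"], reverse=True)
        top_labels = [p["label"] for p in ranked[:k]]
        correct += int(entry["annotations"]["label"] in top_labels)
        total += 1
    return correct / total if total else 0.0
```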

Fine-tune fc layers of a pretrained model (~/data/models/resnet-50-kinetics.pth) on UCF-101.

python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 101 --n_pretrain_classes 700 \
--pretrain_path models/resnet-50-kinetics.pth --ft_begin_module fc \
--model resnet --model_depth 50 --batch_size 128 --n_threads 4 --checkpoint 5
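One plausible reading of --ft_begin_module is that parameters are visited in registration order and everything from the first parameter whose name starts with the given module name onward is trained, while earlier layers stay frozen; the namespace dump in the comments shows an empty default (`ft_begin_module=''`). The sketch below implements that reading on a plain list of parameter names and is hypothetical, not the repository's code.

```python
from typing import List


def trainable_parameter_names(names: List[str], ft_begin_module: str) -> List[str]:
    """Return the parameter names that would be updated during fine-tuning.

    Hypothetical sketch: training starts at the first parameter whose name
    begins with ft_begin_module; an empty string trains every parameter.
    """
    if not ft_begin_module:
        return list(names)
    for i, name in enumerate(names):
        if name.startswith(ft_begin_module):
            return names[i:]
    raise ValueError(f"no parameter name starts with {ft_begin_module!r}")


names = ["conv1.weight", "bn1.weight", "layer4.0.conv1.weight", "fc.weight", "fc.bias"]
print(trainable_parameter_names(names, "fc"))  # ['fc.weight', 'fc.bias']
```

With a PyTorch model, `names` would come from `[n for n, _ in model.named_parameters()]`, and frozen parameters would get `requires_grad = False`.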

Comments

question about the 'Temporal duration of inputs'

Hi @kenshohara, in opts.py, can I change the temporal duration of inputs set by parser.add_argument('--sample_duration', default=16, type=int, help='Temporal duration of inputs') to something like 32 or 64 frames? Have you run similar experiments? I would really appreciate your reply. Thanks.

opened by sophiazy 25
Performance of pretrained weights on UCF101

Hi, nice work! I have a question about your results on UCF101 split 1. I evaluated your pretrained weights "resnext-101-kinetics-ucf101_split1.pth" on UCF101 split 1 and got an accuracy of ~85.99. Is this the expected accuracy? Could you please provide the accuracies of the pretrained models?

opened by MohsenFayyaz89 21
Train from scratch on UCF101 using ResNet18 and get 10% gain without doing anything
I trained and evaluated the code and got a 10% gain without changing anything. Here is my process:

Parse the video data using the code from the README.

Train the model using python3 main.py --root_path ./datasets/ --video_path UCF101/jpg --annotation_path ucf101_01.json --result_path results --dataset ucf101 --model resnet --model_depth 18 --n_classes 101 --batch_size 16 and obtain the model datasets/results/save_200.pth.

Test the dataset using python3 main.py --root_path ./datasets/ --video_path UCF101/jpg --annotation_path ucf101_01.json --result_path results --dataset ucf101 --resume_path results/save_200.pth --model resnet --model_depth 18 --n_classes 101 --batch_size 16 --no_train --test and get the result in val.json; the command window reports a clip accuracy of 0.346.

Compute the accuracy using eval_ucf101.py by comparing ucf101_01.json with val.json: the top-1 video accuracy is 52.66%, about 10% above the 42.4% reported in the paper.

I only use UCF101 split 01, so there are no overlapping videos between the training and test data. This is a little strange. Is it because the training in the paper did not last 200 epochs? My platform is PyTorch 0.4, and I only modified one place to avoid an error that is reported in another issue.
opened by BestJuly 16
RuntimeError: invalid argument 1: must be strictly positive at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/TH/generic/THTensorMath.c:2247

Hi, I need help. When running main.py, everything goes well until dataset loading, as shown below:

model generated
dataset loading [0/9537]
dataset loading [1000/9537]
dataset loading [2000/9537]
dataset loading [3000/9537]
dataset loading [4000/9537]
dataset loading [5000/9537]
dataset loading [6000/9537]
dataset loading [7000/9537]
dataset loading [8000/9537]
dataset loading [9000/9537]
dataset loading [0/3783]
dataset loading [1000/3783]
dataset loading [2000/3783]
dataset loading [3000/3783]
run

The error occurred here:

train at epoch 1
Traceback (most recent call last):
  File "/media/psrana/New Volume/chandni/HAR_3D_TU/main.py", line 139, in <module>
    train_logger, train_batch_logger)
  File "/media/psrana/New Volume/chandni/HAR_3D_TU/train.py", line 22, in train_epoch
    for i, (inputs, targets) in enumerate(data_loader):
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 417, in __iter__
    return DataLoaderIter(self)
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 242, in __init__
    self._put_indices()
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 290, in _put_indices
    indices = next(self.sample_iter, None)
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 119, in __iter__
    for idx in self.sampler:
  File "/home/psrana/anaconda3/envs/har_chandni/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 50, in __iter__
    return iter(torch.randperm(len(self.data_source)).long())
RuntimeError: invalid argument 1: must be strictly positive at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/TH/generic/THTensorMath.c:2247

what could be the

opened by chandnikathuria1992 12
Very Slow Training

I am training ResNet with depth 34 on the Kinetics dataset; however, the training is not improving at all. How long does it take until the model starts improving? I have attached a screenshot. I am currently at epoch 34, but the loss is still 5.99 and not decreasing, and the accuracy is very volatile.

opened by cryptedp 11
RuntimeError: expected a non-empty list of Tensors

Traceback (most recent call last):
  File "main.py", line 129, in <module>
    train_logger, train_batch_logger)
  File "/home/hareesh/Downloads/3D-ResNets-PyTorch-master/train.py", line 22, in train_epoch
    for i, (inputs, targets) in enumerate(data_loader):
  File "/home/hareesh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 286, in __next__
    return self._process_next_batch(batch)
  File "/home/hareesh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/home/hareesh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/hareesh/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/hareesh/Downloads/3D-ResNets-PyTorch-master/datasets/ucf101.py", line 193, in __getitem__
    clip = torch.stack(clip, 0).permute(1, 0, 2, 3)
RuntimeError: expected a non-empty list of Tensors

Please let me know the cause of this error.

opened by hareeshdevarakonda 10
Input of Densenet

Thank you for your wonderful work. I just read the paper, which notes that each clip contains 16 frames. I read two other papers in which the authors claim that 32-frame inputs work better. Have you tried 32-frame inputs? If you trained such models, could you please release the pretrained models?

opened by Tord-Zhang 10
Question about using 3D ResNet on video sequences

Hello,

I'm new to this kind of 3D convolution, so I'm trying to understand how it works. My dataset (UNBC McMaster) includes videos that contain sequences of frames, and each frame has one pain intensity level. I want to use 3D ResNet to predict the pain level as a regression problem. Say we have a sequence of 32 frames, which means I have 32 labels for this sequence. Normally, with CNN + LSTM, I would use the CNN to extract features, pass them through the LSTM, and compute the loss from the output and the label of the last frame. For 3D ResNet, should I compute the loss from the model output and the label of the last frame?

opened by glmanhtu 9
I have a problem running a test

I would like to test the network on UCF101 after fine-tuning on UCF101 from the Kinetics-pretrained model you provided.

I use ResNet-18. I want the video-level accuracy, not the clip-level accuracy.

I used the command below:

python main.py --root_path UCF101 --video_path jpg --annotation_path ucf101_01.json --result_path test --dataset ucf101 --model resnet --model_depth 18 --n_classes 101 --batch_size 64 --n_threads 4 --pretrain_path 18result1s/save_200.pth --no_train --no_val --test --test_subset val --n_finetune_classes 101

jpg is the same as your data directory. 18results1s/save_200.pth is the network pretrained on Kinetics and fine-tuned on UCF101.

It raises an error:

run
dataset loading [0/3783]
dataset loading [1000/3783]
dataset loading [2000/3783]
dataset loading [3000/3783]
test
test.py:42: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  inputs = Variable(inputs, volatile=True)
test.py:45: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  outputs = F.softmax(outputs)
Traceback (most recent call last):
  File "main.py", line 162, in <module>
    test.test(test_loader, model, opt, test_data.class_names)
  File "test.py", line 50, in test
    test_results, class_names)
  File "test.py", line 20, in calculate_video_results
    'label': class_names[locs[i]],
KeyError: tensor(12)

The error is KeyError: tensor(12). When I change the command, the number in parentheses seems to change.

Can you help me?

opened by lee2h 9
Performance of fine-tuning on UCF101

I downloaded the ResNet-101 network pretrained on Kinetics and fine-tuned it on UCF101 following the example script. However, I only get 82.5 averaged over the three splits, while the paper reports 88.9. Any suggestions?

opened by zhihuilics 9
Pretrained models cannot be downloaded

Hello, I am making a demo of a 3D convolutional network. After reading your CVPR paper, I would like to use your pretrained models. However, when I try to open your link, I get "The folder has been put in the recycle bin." These pretrained models are extremely important for my program, because I don't have enough GPUs to train the network from scratch. Please give me a chance to use the res3D convolution network...... @kenshohara

opened by KeCh96 8
main.py: error: unrecognized arguments: hmdb51_3.json

I'm trying to train ResNet on HMDB-51, so I have 3 json files. But whichever one I pass with the --annotation_path argument, it keeps giving this error. Please help!

opened by soumyadbanik 2
ABOUT ucf101_json

I used the Python script to generate three json files for UCF101: ucf101_01.json, ucf101_02.json, and ucf101_03.json. I run main.py to train ResNet:

python main.py --root_path ./data --video_path ucf101_jpg/ --annotation_path ucf101_json/ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 101 --model resnet --model_depth 50 --batch_size 64 \
--n_threads 4 --checkpoint 5

I want to know when ucf101_02.json and ucf101_03.json should be used. Thank you very much!

opened by theones-g 0
Image Resolution 112*112

I want to be able to input larger image resolutions. However, when I use an input size of 480*480, it takes almost 10 minutes to process a tiny 10-second clip.

It seems that when I increase the image size, the model's inference run time becomes exponentially greater.

Crucial motion information is lost when I downscale my images to 112*112, and this is affecting the precision of the model on my test sets.

Is there any alternative model or method that will allow me to proceed with larger image resolutions using the 3D-ResNet model?

Is it practical to use 3D-CNN with input sizes of 480*480 images for video classification tasks?

opened by darshvirbelandis 1
Why is opt.n_val_samples 3?

I found that when fine-tuning on UCF101 split 1, the validation set contained 11349 samples instead of 3783. Also, why is the batch size opt.batch_size // opt.n_val_samples?

opened by YTHmamba 0
AssertionError when I inference

I used r2p1d18_K_200ep.pth and fine-tuned it on the HMDB-51 dataset. When I try to use it for inference, I get an AssertionError:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --root_path /home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data --video_path hmdb51-videos/jpg --annotation_path hmdb51_1.json \
--result_path results --dataset hmdb51 --resume_path results/save_200.pth
--model_depth 18 --n_classes 51 --n_threads 4 --no_train --no_val --inference --output_topk 5 --inference_batch_size 1

Namespace(accimage=False, annotation_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/hmdb51_1.json'), arch='resnet-18', batch_size=128, batchnorm_sync=False, begin_epoch=1, checkpoint=10, colorjitter=False, conv1_t_size=7, conv1_t_stride=1, dampening=0.0, dataset='hmdb51', dist_url='tcp://127.0.0.1:23456', distributed=False, file_type='jpg', ft_begin_module='', inference=True, inference_batch_size=1, inference_crop='center', inference_no_average=False, inference_stride=16, inference_subset='val', input_type='rgb', learning_rate=0.1, lr_scheduler='multistep', manual_seed=1, mean=[0.4345, 0.4051, 0.3775], mean_dataset='kinetics', model='resnet', model_depth=18, momentum=0.9, multistep_milestones=[50, 100, 150], n_classes=51, n_epochs=200, n_input_channels=3, n_pretrain_classes=0, n_threads=4, n_val_samples=3, nesterov=False, no_cuda=False, no_hflip=False, no_max_pool=False, no_mean_norm=False, no_std_norm=False, no_train=True, no_val=True, optimizer='sgd', output_topk=5, overwrite_milestones=False, plateau_patience=10, pretrain_path=None, resnet_shortcut='B', resnet_widen_factor=1.0, resnext_cardinality=32, result_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/results'), resume_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/results/save_200.pth'), root_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data'), sample_duration=16, sample_size=112, sample_t_stride=1, std=[0.2768, 0.2713, 0.2737], tensorboard=False, train_crop='random', train_crop_min_ratio=0.75, train_crop_min_scale=0.25, train_t_crop='random', value_scale=1, video_path=PosixPath('/home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/hmdb51-videos/jpg'), weight_decay=0.001, wide_resnet_k=2, world_size=-1)
loading checkpoint /home/pubNAS/jianfei/3D-ResNets-PyTorch-master/data/results/save_200.pth
model
Traceback (most recent call last):
  File "main.py", line 428, in <module>
    main_worker(-1, opt)
  File "main.py", line 345, in main_worker
    model = resume_model(opt.resume_path, opt.arch, model)
  File "main.py", line 89, in resume_model
    assert arch == checkpoint['arch']
AssertionError

opened by z369437558 0

Releases (1.0)

1.0 (Oct 30, 2018)

This version works with PyTorch v0.3.1 or earlier.

We used this version in the following papers:

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh, "Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6546-6555, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh, "Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition", Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, 2017.

Owner

Kensho Hara


The official TensorFlow implementation of the paper Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

Action Transformer A Self-Attention Model for Short-Time Human Action Recognition This repository contains the official TensorFlow implementation of t

20 Jan 3, 2023

PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition, CVPR 2018

PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place

294 Dec 12, 2022

Allows including an action inside another action (by preprocessing the Yaml file). This is how composite actions should have worked.

actions-includes Allows including an action inside another action (by preprocessing the Yaml file). Instead of using uses or run in your action step,

70 Nov 4, 2022

Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization' (ICCV-21 Oral)

Learning-Action-Completeness-from-Points Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal A

67 Jan 3, 2023

Human Action Controller - A human action controller running on different platforms.

Human Action Controller (HAC) Goal A human action controller running on different platforms. Fun Easy-to-use Accurate Anywhere Fun Examples Mouse Cont

27 Jul 20, 2022

StarGAN - Official PyTorch Implementation (CVPR 2018)

StarGAN - Official PyTorch Implementation ***** New: StarGAN v2 is available at https://github.com/clovaai/stargan-v2 ***** This repository provides t

5.1k Jan 4, 2023

Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Learning Pixel-level Semantic Affinity with Image-level Supervision This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead. Int

337 Dec 15, 2022

[CVPR 21] Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.

Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, CVPR 2021. Ayan Kumar Bhunia, Pinaki nath Chowdhury, Yongxin Yan

44 Dec 12, 2022

BABEL: Bodies, Action and Behavior with English Labels [CVPR 2021]

BABEL is a large dataset with language labels describing the actions being performed in mocap sequences. BABEL labels about 43 hours of mocap sequences from AMASS [1] with action labels.

113 Dec 28, 2022

Colar: Effective and Efficient Online Action Detection by Consulting Exemplars, CVPR 2022.

Colar: Effective and Efficient Online Action Detection by Consulting Exemplars This repository is the official implementation of Colar. In this work,

246 Dec 13, 2022

TDN: Temporal Difference Networks for Efficient Action Recognition

TDN: Temporal Difference Networks for Efficient Action Recognition Overview We release the PyTorch code of the TDN(Temporal Difference Networks).

Multimedia Computing Group, Nanjing University

326 Dec 13, 2022

Official PyTorch implementation of "IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos", CVPRW 2021

IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos Introduction This repo is official PyTorch implementatio

29 Sep 24, 2022

Learning Representational Invariances for Data-Efficient Action Recognition

Learning Representational Invariances for Data-Efficient Action Recognition Official PyTorch implementation for Learning Representational Invariances

27 Nov 22, 2022

Synthetic Humans for Action Recognition, IJCV 2021

SURREACT: Synthetic Humans for Action Recognition from Unseen Viewpoints Gül Varol, Ivan Laptev and Cordelia Schmid, Andrew Zisserman, Synthetic Human

59 Dec 14, 2022

A pytorch reproduction of { Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation }.

A PyTorch Reproduction of HCN Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation. Ch

210 Dec 31, 2022

Compressed Video Action Recognition

Compressed Video Action Recognition Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl. In CVPR, 2018. [Proj

479 Dec 26, 2022

PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition"

R2Plus1D-PyTorch PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal

342 Dec 16, 2022

AutoVideo: An Automated Video Action Recognition System

AutoVideo is a system for automated video analysis. It is developed based on D3M infrastructure, which describes machine learning with generic pipeline languages. Currently, it focuses on video action recognition, supporting various state-of-the-art video action recognition algorithms. It also supports automated model selection and hyperparameter tuning. AutoVideo is developed by DATA Lab at Texas A&M University.

Data Analytics Lab at Texas A&M University

267 Dec 17, 2022

[ICCV2021] Official code for "Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition"

CTR-GCN This repo is the official implementation for Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition. The pap

148 Dec 16, 2022
