Compressed Video Action Recognition

Chao-Yuan Wu

Last update: Dec 26, 2022

Overview

Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl.
In CVPR, 2018. [Project Page]

This is a reimplementation of CoViAR in PyTorch (the original paper uses MXNet). This code currently supports UCF-101 and HMDB-51; Charades support is coming soon. (This is a work in progress. Any suggestions are appreciated.)

Results

This code produces results comparable to or better than those in the original paper (accuracy averaged over the 3 splits, without optical flow):
HMDB-51: 52% (I-frame), 40% (motion vector), 43% (residuals), 59.2% (CoViAR).
UCF-101: 87% (I-frame), 70% (motion vector), 80% (residuals), 90.5% (CoViAR).
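The CoViAR column combines the predictions of the three per-representation networks. A minimal late-fusion sketch, assuming each network outputs one score per class and that the streams are combined by a weighted sum. The stream names, scores and weights below are hypothetical placeholders, not the values used to produce the numbers above.

```python
from typing import Dict, List, Sequence


def fuse_scores(scores: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted sum of per-class scores from several streams."""
    num_classes = len(next(iter(scores.values())))
    fused = [0.0] * num_classes
    for name, stream_scores in scores.items():
        if len(stream_scores) != num_classes:
            raise ValueError(f"stream {name!r} has {len(stream_scores)} scores, "
                             f"expected {num_classes}")
        w = weights.get(name, 1.0)  # streams without an explicit weight count once
        for c, s in enumerate(stream_scores):
            fused[c] += w * s
    return fused


def predict(scores: Dict[str, Sequence[float]], weights: Dict[str, float]) -> int:
    """Index of the class with the highest fused score."""
    fused = fuse_scores(scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up scores for 3 classes and hypothetical weights.
scores = {"iframe": [0.1, 0.7, 0.2], "mv": [0.5, 0.3, 0.2], "residual": [0.2, 0.5, 0.3]}
weights = {"iframe": 2.0, "mv": 1.0, "residual": 1.0}
print(predict(scores, weights))  # prints 1
```

Weighting at the score level keeps the three networks fully independent during training, so each stream can be trained and evaluated on its own before fusion.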

Data loader

We provide a Python data loader that takes a compressed video directly and returns its compressed representation (I-frames, motion vectors, and residuals) as a NumPy array. The model can therefore be trained without extracting all representations and storing them as image files.

In our experiments, the loader is fast enough that it does not slow down GPU training. Please see GETTING_STARTED.md for details and instructions.
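A compressed-domain loader addresses a frame by the GOP it belongs to and its position inside that GOP. A minimal sketch of mapping a global frame index to that pair, assuming a fixed GOP length of 12 frames (an assumption here; check the encoder settings you actually used) and assuming position 0 of each GOP is the I-frame, which has no motion vectors or residual. Under those assumptions, a motion-vector or residual request at position 0 falls back to the last frame of the previous GOP. The function and constant names are hypothetical.

```python
from typing import Tuple

GOP_SIZE = 12  # assumption: fixed GOP length produced by re-encoding


def gop_position(frame_idx: int, representation: str,
                 gop_size: int = GOP_SIZE) -> Tuple[int, int]:
    """Map a global frame index to (gop_index, position_in_gop)."""
    if frame_idx < 0:
        raise ValueError("frame_idx must be non-negative")
    gop_index, pos = divmod(frame_idx, gop_size)
    if representation == "iframe":
        return gop_index, 0  # every frame in a GOP shares that GOP's I-frame
    if representation not in ("mv", "residual"):
        raise ValueError(f"unknown representation {representation!r}")
    if pos == 0:
        # Position 0 is the I-frame and carries no motion/residual,
        # so use the last frame of the previous GOP instead.
        if gop_index == 0:
            raise ValueError("frame 0 has no motion vectors or residual")
        return gop_index - 1, gop_size - 1
    return gop_index, pos


print(gop_position(25, "iframe"))   # prints (2, 0)
print(gop_position(24, "residual")) # prints (1, 11)
```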

Using CoViAR

Please see GETTING_STARTED.md for instructions for training and inference.

Citation

If you find this model useful for your research, please use the following BibTeX entry.

@inproceedings{wu2018coviar,
  title={Compressed Video Action Recognition},
  author={Wu, Chao-Yuan and Zaheer, Manzil and Hu, Hexiang and Manmatha, R and Smola, Alexander J and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={CVPR},
  year={2018}
}

Acknowledgment

This implementation borrows largely from tsn-pytorch by yjxiong. Part of the data loader implementation is modified from this tutorial and the FFmpeg extract_mv example.

Comments

ModuleNotFoundError: No module named 'coviar'

I executed sudo ./install.sh and got the following result: Did it succeed? Then, when I tried the training command, I got the following error: Does this mean ./install.sh did not complete successfully? I am a beginner; thank you very much for your guidance.

opened by ZealACMer

raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))

I ran into a problem in transforms.py. Could you please give me some advice? Thanks!

Traceback (most recent call last):
  File "train.py", line 275, in <module>
    main()
  File "train.py", line 104, in main
    train(train_loader, model, criterion, optimizer, epoch, cur_lr)
  File "train.py", line 134, in train
    for i, (input, target) in enumerate(train_loader):
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 322, in __next__
    return self._process_next_batch(batch)
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
ValueError: Traceback (most recent call last):
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data/code/project/pytorch-coviar/dataset.py", line 160, in __getitem__
    frames = self._transform(frames)
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 49, in __call__
    img = t(img)
  File "/data/code/project/pytorch-coviar/transforms.py", line 124, in __call__
    crop_w, crop_h, offset_w, offset_h = self._sample_crop_size(im_size)
  File "/data/code/project/pytorch-coviar/transforms.py", line 153, in _sample_crop_size
    w_offset = random.randint(0, image_w - crop_pair[0])
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/random.py", line 221, in randint
    return self.randrange(a, b+1)
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/random.py", line 199, in randrange
    raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0,-1, -1)

terminate called after throwing an instance of 'at::Error'
  what(): CUDA error (29): driver shutting down (check_status at /pytorch/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
frame #0: at::detail::CUDAStream_free(CUDAStreamInternals*&) + 0x50 (0x7fe59246aa50 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #1: THCStream_free + 0x13 (0x7fe56f4d0953 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #2: std::_Rb_tree<std::shared_ptr, std::shared_ptr, std::_Identity<std::shared_ptr >, std::less<std::shared_ptr >, std::allocator<std::shared_ptr > >::_M_erase(std::_Rb_tree_node<std::shared_ptr >*) + 0x8e (0x7fe56f4c1fbe in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: + 0xd1ca71 (0x7fe56f4c5a71 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #4: + 0xd1caa0 (0x7fe56f4c5aa0 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #5: + 0x38e69 (0x7fe5b237ae69 in /lib64/libc.so.6)
frame #6: + 0x38eb5 (0x7fe5b237aeb5 in /lib64/libc.so.6)
frame #7: __libc_start_main + 0xfc (0x7fe5b2363b1c in /lib64/libc.so.6)

opened by Tylerjoe
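In the randrange traceback above, `random.randint(0, image_w - crop_pair[0])` received a negative upper bound: `randint(a, b)` calls `randrange(a, b+1)`, so `randrange(0, -1)` means `image_w - crop_pair[0]` was -2, i.e. the sampled crop was two pixels wider than the frame. A minimal defensive sketch, assuming the crop should simply be clamped to the frame size when it does not fit. The function name is hypothetical, and this is not the repository's `_sample_crop_size`.

```python
import random
from typing import Optional, Tuple


def sample_crop_offset(image_w: int, image_h: int, crop_w: int, crop_h: int,
                       rng: Optional[random.Random] = None) -> Tuple[int, int, int, int]:
    """Return (crop_w, crop_h, offset_w, offset_h) that always fit inside the image."""
    rng = rng or random.Random()
    # Clamp the crop so random.randint never receives a negative upper bound.
    crop_w = min(crop_w, image_w)
    crop_h = min(crop_h, image_h)
    offset_w = rng.randint(0, image_w - crop_w)
    offset_h = rng.randint(0, image_h - crop_h)
    return crop_w, crop_h, offset_w, offset_h


# A 222-pixel-wide frame with a 224x224 crop no longer raises ValueError.
print(sample_crop_offset(222, 256, 224, 224, random.Random(0)))
```

Clamping slightly changes the crop's aspect ratio for undersized frames; resizing every frame to at least the crop size before cropping is an alternative that keeps the aspect ratio fixed.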

Any help building coviar_data_loader on Windows?

I've tried to build coviar using MinGW64:

D:\bin\MinGW64\bin\gcc.exe -shared -s build\temp.win-amd64-3.6\Release\coviar_data_loader.o build\temp.win-amd64-3.6\Release\coviar.cp36-win_amd64.def -LD:\bin\python\libs -LD:\bin\python\PCbuild\amd64 -lpython36 -lmsvcr140 -o build\lib.win-amd64-3.6\coviar.cp36-win_amd64.pyd -lavutil -lavcodec -lavformat -lswscale -L./ffmpeg/lib/
running install
running build
running build_ext
running install_lib
copying build\lib.win-amd64-3.6\coviar.cp36-win_amd64.pyd -> C:\Users\Jim\AppData\Roaming\Python\Python36\site-packages
running install_egg_info
Writing C:\Users\Jim\AppData\Roaming\Python\Python36\site-packages\coviar-0.1-py3.6.egg-info

Successfully built.

But when I import coviar in Python, it fails.

$ ipython
Python 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.3.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import coviar
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-e416fb4448c7> in <module>
----> 1 import coviar

ImportError: DLL load failed: The specified module could not be found.

Thanks for any ideas.

opened by JimLee1996
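Windows reports "DLL load failed" when an extension module, or a DLL it depends on, cannot be loaded. The link command above pulls in FFmpeg's avutil, avcodec, avformat and swscale, so one possible cause is that those FFmpeg DLLs are not on the DLL search path. A minimal sketch of adding a directory to that search path before importing `coviar`. The helper name and the FFmpeg path are hypothetical. `os.add_dll_directory` exists only on Windows with Python 3.8+, so the sketch also prepends the directory to `PATH`, which is what Python 3.6 (used above) relies on.

```python
import os


def add_dll_search_dir(path: str) -> None:
    """Make DLLs in `path` discoverable for a subsequent extension import."""
    if hasattr(os, "add_dll_directory"):
        # Windows, Python 3.8+: PATH is no longer searched for dependent DLLs.
        os.add_dll_directory(path)
    # Python < 3.8 on Windows resolves dependent DLLs via PATH;
    # prepending also does no harm on other platforms.
    os.environ["PATH"] = path + os.pathsep + os.environ.get("PATH", "")


# Hypothetical directory containing avcodec-*.dll, avformat-*.dll, etc.:
# add_dll_search_dir(r"D:\bin\ffmpeg\bin")
# import coviar
```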

.npz flow score file for combining your method with TSN.

To facilitate reproducing your method, could you please share the .npz optical flow score file you used to combine CoViAR with TSN? It is very important for reproducing the results. Thank you very much; looking forward to your reply.

opened by ZealACMer

NumPy issue during ./install.sh

Thank you for your lovely paper. We are trying to get your code to work. We are using an Amazon AMI, have compiled FFmpeg, and are using Python 3 and PyTorch. When we run ./install.sh with the FFmpeg path, we get the following GCC error. Any idea how to resolve this? It mentions the deprecated NumPy API. Which version of NumPy did you use?

We also followed the steps in other issues (https://github.com/chaoyuaw/pytorch-coviar/issues/6 and https://github.com/chaoyuaw/pytorch-coviar/issues/5), but the error persists.

opened by srikar2097

How to reconstruct the frame?

Hi! How would I reconstruct a frame from the motion vectors, the residual, and the initial I-frame? I tried warping the initial I-frame of the GOP according to the accumulated motion vectors and then adding the residual, but the results look weird.

opened by zhipeng-fan

Encountered "IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number" when training on HMDB51.

More specifically, when I ran the following commands: I encountered the problem shown in the following screenshot: I have not found a solution for this problem. Could you help me? Thank you very much.

opened by ZealACMer

Assertion Error

@manzilzaheer @chaoyuaw

hi,

Could you please check this error and tell me its possible cause and solution? I ran this on a server with 4 GPUs; a screenshot is attached.

opened by manza-ari

Testing with 25 segments does not match the paper

Hi, in the paper you said that you sample 25 frames uniformly. As I understand it, if a video has 100 frames, you sample 25 indices, and those are the frames used; with this method I understand why you get 4.2 GFLOPs. But the code does not seem to work like that: in https://github.com/chaoyuaw/pytorch-coviar/blob/master/dataset.py#L130 you loop over the number of segments and call coviar for each representation, i.e. separately for I-frames, residuals, and motion vectors. So do you decide in advance how much to sample from each type? I'm a bit confused about this testing method and the code. Thanks for your help.
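For reference, the uniform test-time sampling described in that question can be sketched as follows, assuming the TSN-style rule of taking the center frame of each of `num_segments` equal-length chunks. It only computes frame indices; the same index list could then be reused for each representation. The function name is hypothetical, and this illustrates the question rather than describing what `dataset.py` does.

```python
from typing import List


def uniform_test_indices(num_frames: int, num_segments: int = 25) -> List[int]:
    """Pick one frame from the middle of each of `num_segments` equal chunks."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    tick = num_frames / num_segments
    # Center of chunk k is at tick * (k + 0.5); clamp in case num_segments > num_frames.
    return [min(int(tick * (k + 0.5)), num_frames - 1) for k in range(num_segments)]


print(uniform_test_indices(100)[:4])  # prints [2, 6, 10, 14]
```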

opened by Esaada

Decoding a video frame, given the previous frame, current motion vectors, and current residual image.

Assume we have pos_target=t, a reference frame at pos_target=t-1, and the motion vectors and residual image for the given pos_target=t. However, assume we don't have the original video file.

Given these constraints, I would like to reconstruct the frame at pos_target=t, as described in Equation 1 of your paper.

So far, I've tried decoding the frame at pos_target=t by: (1) creating a reference frame, which is just a copy of the t-1 frame; (2) performing motion compensation by copying 16x16 pixel blocks from the t-1 frame to the reference frame, based on the motion vectors; (3) adding the residual image to the motion-compensated reference frame.

This is the reference frame at pos_target=2:

This is the result after step (1), for pos_target=3:

This is the result after step (2), for pos_target=3:

The final result seems to have some compression artifacts, so I guess I'm not reconstructing the frame correctly. Is there a better way to do this (particularly, using ffmpeg)? Thanks!

opened by itoen220
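The three steps described in that question (copy the previous frame, motion-compensate, add the residual) can be sketched per pixel as follows. This is a simplified illustration, not CoViAR's or FFmpeg's decoder. It assumes one integer motion vector (dx, dy) per pixel, giving the displacement from the source pixel in frame t-1 to pixel (x, y) in frame t, a single channel, and clamping at the frame border. Real MPEG-4 streams store vectors per block with half-pel (or finer) precision and interpolate between pixels, so a sketch like this can differ from the decoder's output.

```python
from typing import List, Tuple

Frame = List[List[int]]                     # frame[y][x], single channel
MotionField = List[List[Tuple[int, int]]]   # mv[y][x] = (dx, dy)


def reconstruct(prev: Frame, mv: MotionField, residual: Frame,
                lo: int = 0, hi: int = 255) -> Frame:
    """Motion-compensate `prev` and add `residual` (integer-pel sketch)."""
    h, w = len(prev), len(prev[0])
    out = [[0] * w for _ in range(h)]  # fresh buffer: reads never see written pixels
    for y in range(h):
        for x in range(w):
            dx, dy = mv[y][x]
            # Source pixel in the reference frame, clamped to the frame border.
            sx = min(max(x - dx, 0), w - 1)
            sy = min(max(y - dy, 0), h - 1)
            value = prev[sy][sx] + residual[y][x]
            out[y][x] = min(max(value, lo), hi)  # keep values in 8-bit range
    return out


prev = [[10, 20], [30, 40]]
shift_right = [[(1, 0)] * 2 for _ in range(2)]
print(reconstruct(prev, shift_right, [[1, 1], [1, 1]]))  # prints [[11, 11], [31, 31]]
```

Writing into a separate output buffer keeps every read pointed at unmodified reference pixels, regardless of the order in which pixels or blocks are processed.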

Some questions about coviar data loader

Hi, I have some questions about coviar_data_loader.c. First, you initialize the variable accu_src_old as follows, but I wonder why:

                    for (size_t x = 0; x < w; ++x) {
                        for (size_t y = 0; y < h; ++y) {
                            accu_src_old[x * h * 2 + y * 2    ]  = x;
                            accu_src_old[x * h * 2 + y * 2 + 1]  = y;
                        }
                    }
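One way to read that loop: before any P-frame is processed, every pixel (x, y) is its own source, so the accumulator starts as the identity map. Each later frame's motion vectors are then composed into it, so that it keeps pointing back to a pixel of the GOP's first frame. A minimal sketch of that idea, not the C implementation. It assumes per-pixel integer motion vectors where (dx, dy) is the displacement from a pixel's location in the previous frame to (x, y) in the current frame. The function names are hypothetical, and the nested-list layout `src[y][x]` replaces the C code's flat `x * h * 2 + y * 2` indexing.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def identity_sources(w: int, h: int) -> List[List[Coord]]:
    """Before any motion is applied, each pixel's source is itself."""
    return [[(x, y) for x in range(w)] for y in range(h)]


def accumulate(src_old: List[List[Coord]], mv: List[List[Coord]]) -> List[List[Coord]]:
    """Compose one frame of motion into the accumulated source map."""
    h, w = len(src_old), len(src_old[0])
    src_new = [[(0, 0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = mv[y][x]
            px = min(max(x - dx, 0), w - 1)  # this pixel's location in the previous frame
            py = min(max(y - dy, 0), h - 1)
            src_new[y][x] = src_old[py][px]  # follow it back to the first frame
    return src_new


src = identity_sources(3, 1)
once = accumulate(src, [[(1, 0)] * 3])
print(once)  # prints [[(0, 0), (0, 0), (1, 0)]]
```

With the accumulated map `(sx, sy)`, the total displacement of pixel (x, y) since the first frame is `(x - sx, y - sy)`.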

Second, does the following code mean that every frame in the target GOP up to the target frame is decoded, but only the I-frame and the target frame are converted to BGR format?

            if (cur_gop == gop_target && cur_pos <= pos_target) {
                ret = avcodec_decode_video2(pCodecCtx, pFrame, &got_picture, &packet);  
......
                if (got_picture) {

                    if ((cur_pos == 0              && accumulate  && representation == RESIDUAL) ||
                        (cur_pos == pos_target - 1 && !accumulate && representation == RESIDUAL) ||
                        cur_pos == pos_target) {
                        create_and_load_bgr(
                            pFrame, pFrameBGR, buffer, bgr_arr, cur_pos, pos_target);
                    }
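Restated in Python for readability, the inner `if` in that excerpt converts a decoded frame to BGR in three cases. The outer condition decodes every frame of the target GOP up to `pos_target`, which is needed in general because each P-frame is predicted from earlier frames. A sketch mirroring only the inner condition shown above; `needs_bgr` and `RESIDUAL` are hypothetical stand-ins, and reading the first two cases as "the reference frame a residual is measured against" is one plausible interpretation, not something the excerpt states.

```python
RESIDUAL = "residual"  # hypothetical stand-in for the C enum value


def needs_bgr(cur_pos: int, pos_target: int, accumulate: bool, representation: str) -> bool:
    """Mirror of the quoted C condition: which decoded frames are converted to BGR."""
    # Case 1: the GOP's first frame, when accumulated residuals are requested.
    first_frame = cur_pos == 0 and accumulate and representation == RESIDUAL
    # Case 2: the frame just before the target, for non-accumulated residuals.
    previous_frame = (cur_pos == pos_target - 1 and not accumulate
                      and representation == RESIDUAL)
    # Case 3: the target frame itself, for every representation.
    return first_frame or previous_frame or cur_pos == pos_target


print([p for p in range(6) if needs_bgr(p, 5, True, RESIDUAL)])   # prints [0, 5]
print([p for p in range(6) if needs_bgr(p, 5, False, RESIDUAL)])  # prints [4, 5]
```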

Third, in dataset.py, I wonder why you process the image as follows:

import numpy as np

def clip_and_scale(img, size):
    return (img * (127.5 / size)).astype(np.int32)
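Despite its name, the function shown only scales: values in [-size, size] map to roughly [-127.5, 127.5] and are truncated toward zero by the int32 cast. A scalar sketch of that arithmetic, plus a hypothetical follow-up step that shifts by 128 and clips into [0, 255] so the result fits in 8 bits. The follow-up step and the `size=20` default are assumptions for illustration, not taken from the question.

```python
def clip_and_scale_value(v: float, size: float) -> int:
    """Scalar version of clip_and_scale: scale [-size, size] to about [-127.5, 127.5]."""
    return int(v * (127.5 / size))  # int() truncates toward zero, like astype(np.int32)


def to_uint8_range(v: float, size: float = 20) -> int:
    """Hypothetical follow-up step: shift by 128 and clip into [0, 255]."""
    scaled = clip_and_scale_value(v, size) + 128
    return min(max(scaled, 0), 255)


print([to_uint8_range(v) for v in (-40, -20, 0, 20, 40)])  # prints [0, 1, 128, 255, 255]
```

Values beyond ±size only saturate at the ends of the 8-bit range, which is where an explicit clip becomes necessary.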

Thanks for your excellent work and code. Looking forward to your reply : )

opened by Manolo1988

My load call returns an empty value

I installed FFmpeg following the instructions, modified the path in setup.py, and ran ./install.sh, which finished with "Processing dependencies for coviar==0.1" and "Finished processing dependencies for coviar==0.1". However, when I run from coviar import load and then call load([input], [gop_index], [frame_index], [representation_type], [accumulate]), the returned result is empty.

opened by JunLiangZ

Only 57% accuracy on motion vectors, UCF-101 split 1

Hey, I copied the options and settings exactly when training CoViAR on UCF-101 split 1, but I only got 57% validation accuracy. Is something wrong? Any help? Thanks.

opened by DawnChou

How to execute ./data_loader/install.sh in a Windows environment?

How do I execute ./data_loader/install.sh in a Windows environment? How should these paths be modified on Windows: ./ffmpeg/include/, -L./ffmpeg/lib/? Does anything else need to be modified?

opened by fengfan028

Some videos from HMDB-51 and UCF-101 cannot be opened

After converting the HMDB-51 and UCF-101 datasets with reencode.sh, some MP4 videos cannot be opened during training, while others load fine. The failing videos print "Could not open input stream". It's strange; does anyone know how to solve this?

opened by suki7

Owner

Chao-Yuan Wu

GitHub https://www.cs.utexas.edu/~cywu/projects/coviar/

The official TensorFlow implementation of the paper Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

Action Transformer A Self-Attention Model for Short-Time Human Action Recognition This repository contains the official TensorFlow implementation of t

20 Jan 3, 2023

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training

ActNN : Activation Compressed Training This is the official project repository for ActNN: Reducing Training Memory Footprint via 2-Bit Activation Comp

178 Jan 5, 2023

An implementation of chunked, compressed, N-dimensional arrays for Python.

Zarr Latest Release Package Status License Build Status Coverage Downloads Gitter Citation What is it? Zarr is a Python package providing an implement

1.1k Dec 30, 2022

Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021.

NL-CSNet-Pytorch Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021. Note: this repo only shows the strategy of

7 Nov 7, 2022

AutoVideo: An Automated Video Action Recognition System

AutoVideo is a system for automated video analysis. It is developed based on D3M infrastructure, which describes machine learning with generic pipeline languages. Currently, it focuses on video action recognition, supporting various state-of-the-art video action recognition algorithms. It also supports automated model selection and hyperparameter tuning. AutoVideo is developed by DATA Lab at Texas A&M University.

Data Analytics Lab at Texas A&M University

267 Dec 17, 2022

Allows including an action inside another action (by preprocessing the Yaml file). This is how composite actions should have worked.

actions-includes Allows including an action inside another action (by preprocessing the Yaml file). Instead of using uses or run in your action step,

70 Nov 4, 2022

Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization' (ICCV-21 Oral)

Learning-Action-Completeness-from-Points Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal A

67 Jan 3, 2023

Human Action Controller - A human action controller running on different platforms.

Human Action Controller (HAC) Goal A human action controller running on different platforms. Fun Easy-to-use Accurate Anywhere Fun Examples Mouse Cont

27 Jul 20, 2022

TDN: Temporal Difference Networks for Efficient Action Recognition

TDN: Temporal Difference Networks for Efficient Action Recognition Overview We release the PyTorch code of the TDN(Temporal Difference Networks).

Multimedia Computing Group, Nanjing University

326 Dec 13, 2022

Official PyTorch implementation of "IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos", CVPRW 2021

IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos Introduction This repo is official PyTorch implementatio

29 Sep 24, 2022

Learning Representational Invariances for Data-Efficient Action Recognition

Learning Representational Invariances for Data-Efficient Action Recognition Official PyTorch implementation for Learning Representational Invariances

27 Nov 22, 2022

Synthetic Humans for Action Recognition, IJCV 2021

SURREACT: Synthetic Humans for Action Recognition from Unseen Viewpoints Gül Varol, Ivan Laptev and Cordelia Schmid, Andrew Zisserman, Synthetic Human

59 Dec 14, 2022

A pytorch reproduction of { Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation }.

A PyTorch Reproduction of HCN Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation. Ch

210 Dec 31, 2022

PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition"

R2Plus1D-PyTorch PyTorch implementation of the R2Plus1D convolution based ResNet architecture described in the paper "A Closer Look at Spatiotemporal

342 Dec 16, 2022

Web service for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation based on OpenFace 2.0

OpenGaze: Web Service for OpenFace Facial Behaviour Analysis Toolkit Overview OpenFace is a fantastic tool intended for computer vision and machine le

4 Nov 3, 2022

