Compressed Video Action Recognition

Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl.
In CVPR, 2018. [Project Page]

Overview

This is a reimplementation of CoViAR in PyTorch (the original paper uses MXNet). This code currently supports UCF-101 and HMDB-51; Charades coming soon. (This is a work in progress. Any suggestions are appreciated.)

Results

This code produces comparable or better results than the original paper:
HMDB-51: 52% (I-frame), 40% (motion vector), 43% (residuals), 59.2% (CoViAR).
UCF-101: 87% (I-frame), 70% (motion vector), 80% (residuals), 90.5% (CoViAR).
(Average of 3 splits; without optical flow.)

Data loader

We provide a Python data loader that takes a compressed video directly and returns the compressed representation (I-frames, motion vectors, and residuals) as a NumPy array. We can thus train the model without extracting and storing all representations as image files.

In our experiments, it's fast enough so that it doesn't delay GPU training. Please see GETTING_STARTED.md for details and instructions.
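
A minimal sketch of how the loader might be called, assuming the compiled `coviar` module exposes `load(path, gop_index, frame_index, representation_type, accumulate)`. The representation codes (0, 1, 2) and the array shapes used by the stand-in are assumptions; GETTING_STARTED.md is the authoritative reference. The helper `load_representations`, its `load_fn` parameter, and `fake_load` are hypothetical names introduced so the sketch runs without the compiled extension.

```python
import numpy as np

# Assumed representation codes; verify against GETTING_STARTED.md.
I_FRAME, MOTION_VECTOR, RESIDUAL = 0, 1, 2


def load_representations(load_fn, video_path, gop_index, frame_index, accumulate=True):
    """Fetch all three compressed representations for one frame position.

    `load_fn` is expected to behave like `coviar.load`; it is passed in
    explicitly (a hypothetical design choice) so the helper can be tried
    with a stand-in.
    """
    out = {}
    for name, code in (("iframe", I_FRAME), ("mv", MOTION_VECTOR), ("residual", RESIDUAL)):
        arr = load_fn(video_path, gop_index, frame_index, code, accumulate)
        if arr is None:
            raise RuntimeError(f"loader returned nothing for {name} in {video_path}")
        out[name] = np.asarray(arr)
    return out


def fake_load(path, gop, pos, representation, accumulate):
    """Stand-in for the compiled loader; the shapes here are assumed, not measured."""
    channels = 2 if representation == MOTION_VECTOR else 3
    return np.zeros((4, 6, channels), dtype=np.int32)


reps = load_representations(fake_load, "video.mp4", gop_index=0, frame_index=3)
```

With the real extension installed, `from coviar import load` and pass `load` in place of `fake_load`.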

Using CoViAR

Please see GETTING_STARTED.md for instructions for training and inference.

Citation

If you find this model useful for your research, please use the following BibTeX entry.

@inproceedings{wu2018coviar,
  title={Compressed Video Action Recognition},
  author={Wu, Chao-Yuan and Zaheer, Manzil and Hu, Hexiang and Manmatha, R and Smola, Alexander J and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={CVPR},
  year={2018}
}

Acknowledgment

This implementation largely borrows from tsn-pytorch by yjxiong. Part of the dataloader implementation is modified from this tutorial and FFmpeg extract_mv example.

Comments
  • ModuleNotFoundError: No module named 'coviar'

    I ran sudo ./install.sh and got the following result: [image] Did it succeed? Then, when I ran the training command, I got the following error: [image] Does this mean that ./install.sh did not complete successfully? I am a beginner; thank you very much for your guidance.

    opened by ZealACMer 7
  • raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))

    I ran into a problem in transforms.py. Could you please give me some advice? Thanks!

    Traceback (most recent call last):
      File "train.py", line 275, in <module>
        main()
      File "train.py", line 104, in main
        train(train_loader, model, criterion, optimizer, epoch, cur_lr)
      File "train.py", line 134, in train
        for i, (input, target) in enumerate(train_loader):
      File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 322, in __next__
        return self._process_next_batch(batch)
      File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
        raise batch.exc_type(batch.exc_msg)
    ValueError: Traceback (most recent call last):
      File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/data/code/project/pytorch-coviar/dataset.py", line 160, in __getitem__
        frames = self._transform(frames)
      File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 49, in __call__
        img = t(img)
      File "/data/code/project/pytorch-coviar/transforms.py", line 124, in __call__
        crop_w, crop_h, offset_w, offset_h = self._sample_crop_size(im_size)
      File "/data/code/project/pytorch-coviar/transforms.py", line 153, in _sample_crop_size
        w_offset = random.randint(0, image_w - crop_pair[0])
      File "/root/anaconda3/envs/caffe-tf/lib/python3.6/random.py", line 221, in randint
        return self.randrange(a, b+1)
      File "/root/anaconda3/envs/caffe-tf/lib/python3.6/random.py", line 199, in randrange
        raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
    ValueError: empty range for randrange() (0,-1, -1)

    terminate called after throwing an instance of 'at::Error'
      what(): CUDA error (29): driver shutting down (check_status at /pytorch/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
    frame #0: at::detail::CUDAStream_free(CUDAStreamInternals*&) + 0x50 (0x7fe59246aa50 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
    frame #1: THCStream_free + 0x13 (0x7fe56f4d0953 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
    frame #2: std::_Rb_tree<std::shared_ptr, std::shared_ptr, std::_Identity<std::shared_ptr >, std::less<std::shared_ptr >, std::allocator<std::shared_ptr > >::_M_erase(std::_Rb_tree_node<std::shared_ptr >*) + 0x8e (0x7fe56f4c1fbe in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
    frame #3: + 0xd1ca71 (0x7fe56f4c5a71 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
    frame #4: + 0xd1caa0 (0x7fe56f4c5aa0 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
    frame #5: + 0x38e69 (0x7fe5b237ae69 in /lib64/libc.so.6)
    frame #6: + 0x38eb5 (0x7fe5b237aeb5 in /lib64/libc.so.6)
    frame #7: __libc_start_main + 0xfc (0x7fe5b2363b1c in /lib64/libc.so.6)
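
The final `ValueError` is what `random.randint(0, n)` raises when `n` is negative. Since `randint(a, b)` calls `randrange(a, b+1)`, the reported range `(0,-1, -1)` implies that `image_w - crop_pair[0]` was -2, that is, the sampled crop was wider than the frame. A minimal sketch of the failure and one possible guard follows. `safe_offset` is a hypothetical helper, not part of transforms.py, and clamping the crop is only one option (resizing the input beforehand is another).

```python
import random


def safe_offset(image_size, crop_size):
    """Pick a random crop offset, clamping the crop to the image if needed.

    Returns (offset, effective_crop_size).
    """
    crop_size = min(crop_size, image_size)
    return random.randint(0, image_size - crop_size), crop_size


# Reproduce the error from the traceback: a crop two pixels wider than the frame.
try:
    random.randint(0, 222 - 224)
    failed = False
except ValueError:
    failed = True

# With the guard, the crop shrinks to the frame width and the only offset is 0.
offset, crop = safe_offset(image_size=222, crop_size=224)
```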

    opened by Tylerjoe 7
  • Any help for building coviar_data_loader on windows?

    I've tried to build coviar using mingw64

    D:\bin\MinGW64\bin\gcc.exe -shared -s build\temp.win-amd64-3.6\Release\coviar_data_loader.o build\temp.win-amd64-3.6\Release\coviar.cp36-win_amd64.def -LD:\bin\python\libs -LD:\bin\python\PCbuild\amd64 -lpython36 -lmsvcr140 -o build\lib.win-amd64-3.6\coviar.cp36-win_amd64.pyd -lavutil -lavcodec -lavformat -lswscale -L./ffmpeg/lib/
    running install
    running build
    running build_ext
    running install_lib
    copying build\lib.win-amd64-3.6\coviar.cp36-win_amd64.pyd -> C:\Users\Jim\AppData\Roaming\Python\Python36\site-packages
    running install_egg_info
    Writing C:\Users\Jim\AppData\Roaming\Python\Python36\site-packages\coviar-0.1-py3.6.egg-info
    

    Successfully built.

    But when I import coviar in python, it fails.

    $ ipython
    Python 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.3.0 -- An enhanced Interactive Python. Type '?' for help.
    
    In [1]: import coviar
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-1-e416fb4448c7> in <module>
    ----> 1 import coviar
    
    ImportError: DLL load failed: The specified module could not be found.
    

    Thanks for any ideas.

    opened by JimLee1996 6
  • .npz flow score file for combining your method with TSN.

    To facilitate reproducing your method, could you please share the .npz optical flow score file you used to combine CoViAR with TSN? It is very important for reproducing the results. Thank you very much. Looking forward to your reply.

    opened by ZealACMer 5
  • numpy issue during ./install.sh

    Thank you for your lovely paper. We are trying to get your code to work. We are using an Amazon AMI, compiled FFmpeg, and are using python3 and PyTorch. When we run ./install.sh with the FFmpeg path, we get the following GCC error. Any idea how to resolve this? It mentions a deprecated NumPy API. Which version of NumPy was used?

    [screenshot]

    We also followed the steps in other issues (https://github.com/chaoyuaw/pytorch-coviar/issues/6 and https://github.com/chaoyuaw/pytorch-coviar/issues/5), but the error persists.

    opened by srikar2097 5
  • How to reconstruct the frame?

    Hi! How would I reconstruct the frame using the motion vector, the residual, and the initial I-frame? I tried to warp the initial I-frame of the GOP according to the accumulated motion vector and then add the residual, but the results look weird.

    opened by zhipeng-fan 4
  • Encounter the problem "IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number", when I tried to train on HMDB51.

    To be more specific, when I ran the following commands: [image] I encountered the problem shown in the following picture: [image] I have not found a suitable solution to this problem. Can you help me? Thank you very much.

    opened by ZealACMer 4
  • Assertion Error

    @manzilzaheer @chaoyuaw

    hi,

    Kindly check this error and tell me the possible reason and a solution for it. I ran this on a server with 4 GPUs; a screenshot is attached. [screenshot]

    opened by manza-ari 4
  • Testing with 25 segments is not like paper

    Hi, in the paper you say that you sample 25 frames uniformly. As I understand it, if the video has 100 frames, you sample 25 indices, and those are the frames. With this method I understand why you get 4.2 GFLOPs. But the code doesn't look like that. In https://github.com/chaoyuaw/pytorch-coviar/blob/master/dataset.py#L130 you loop over the number of segments and call coviar for each representation, that is, for I-frames, residuals, and motion vectors. So do you choose in advance how many to sample from each type? I'm a bit confused about this testing method and the code. Thanks for your help.
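
To make the question concrete, here is a small sketch of one way "sampling 25 frames uniformly" could be read: split the video into 25 equal spans and take the center frame of each. This is an illustration only and is not claimed to match what dataset.py does. The function names are hypothetical, and the GOP size of 12 is an assumed value.

```python
import numpy as np


def uniform_frame_indices(num_frames, num_segments=25):
    """Return the center frame index of each of `num_segments` equal spans."""
    seg_len = num_frames / num_segments
    centers = (np.arange(num_segments) + 0.5) * seg_len
    return np.minimum(centers.astype(int), num_frames - 1)


def to_gop_position(frame_index, gop_size=12):
    """Split a frame index into (GOP index, position inside the GOP).

    gop_size=12 is an assumed value; position 0 is the I-frame.
    """
    return divmod(int(frame_index), gop_size)


indices = uniform_frame_indices(100)
positions = [to_gop_position(i) for i in indices]
```

Under this reading, each sampled index maps to exactly one (GOP, position) pair, and the three representations would be loaded for that same pair rather than sampled independently.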

    opened by Esaada 3
  • Decoding a video frame, given the previous frame, current motion vectors, and current residual image.

    Assume we have pos_target=t, a reference frame at pos_target=t-1, and the motion vectors and residual image for the given pos_target=t. However, assume we don't have the original video file.

    Given these constraints, I would like to reconstruct the frame at pos_target=t, as described in Equation 1 of your paper.

    So far, I've tried decoding the frame at pos_target=t by: (1) creating a reference frame, which is just a copy of the t-1 frame; (2) performing motion compensation by copying 16x16 pixel blocks from the t-1 frame to the reference frame, based on the motion vectors; (3) adding the residual image to the motion-compensated reference frame.

    This is the reference frame at pos_target=2: [image]

    This is the result after step (1), for pos_target=3: [image]

    This is the result after step (2), for pos_target=3: [image]

    The final result seems to have some compression artifacts, so I guess I'm not reconstructing the frame correctly. Is there a better way to do this (particularly, using ffmpeg)? Thanks!
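
The three steps described above can be sketched in NumPy as a per-pixel warp followed by adding the residual. This is a simplified illustration, not the decoder's actual procedure: it assumes integer-pixel motion vectors stored per pixel, and it assumes `motion[y, x] = (dx, dy)` points from the source location in the previous frame to `(x, y)`. If the sign or axis order of the stored vectors differs, the result will show exactly the kind of artifacts described. `reconstruct` is a hypothetical name.

```python
import numpy as np


def reconstruct(prev_frame, motion, residual):
    """Motion-compensate `prev_frame` and add the residual.

    prev_frame: (H, W, C) reference frame.
    motion:     (H, W, 2) integer field; motion[y, x] = (dx, dy) is ASSUMED to
                point from the source pixel in prev_frame to (x, y).
    residual:   (H, W, C) signed residual.
    """
    h, w = prev_frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Source coordinates, clamped so lookups stay inside the frame.
    src_x = np.clip(xs - motion[..., 0], 0, w - 1)
    src_y = np.clip(ys - motion[..., 1], 0, h - 1)
    predicted = prev_frame[src_y, src_x].astype(np.int32)
    return np.clip(predicted + residual, 0, 255).astype(np.uint8)


prev = np.arange(12, dtype=np.uint8).reshape(3, 4, 1)
motion = np.zeros((3, 4, 2), dtype=np.int64)
motion[..., 0] = 1  # every pixel moved one step to the right
residual = np.full((3, 4, 1), 5, dtype=np.int32)
frame = reconstruct(prev, motion, residual)
```

Real codecs may also use sub-pixel interpolation and in-loop filtering, which this sketch ignores, so small differences from the decoded frame would be expected even with the right conventions.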

    opened by itoen220 3
  • Some questions about coviar data loader

    Hi, I have some questions after reading coviar_data_loader.c. First, you initialize the variable accu_src_old as follows, but I wonder why:

                        for (size_t x = 0; x < w; ++x) {
                            for (size_t y = 0; y < h; ++y) {
                                accu_src_old[x * h * 2 + y * 2    ]  = x;
                                accu_src_old[x * h * 2 + y * 2 + 1]  = y;
                            }
                        }
    

    Second, does the following code mean that every frame in the target GOP before the target frame is decoded, and only the I-frame and the target frame are converted to BGR format?

                if (cur_gop == gop_target && cur_pos <= pos_target) {
                    ret = avcodec_decode_video2(pCodecCtx, pFrame, &got_picture, &packet);  
    ......
                    if (got_picture) {
    
                        if ((cur_pos == 0              && accumulate  && representation == RESIDUAL) ||
                            (cur_pos == pos_target - 1 && !accumulate && representation == RESIDUAL) ||
                            cur_pos == pos_target) {
                            create_and_load_bgr(
                                pFrame, pFrameBGR, buffer, bgr_arr, cur_pos, pos_target);
                        }
    

    Third, in dataset.py, I wonder why you process the image as follows:

    def clip_and_scale(img, size):
        return (img * (127.5 / size)).astype(np.int32)
    
    Thanks for your excellent work and code. Looking forward to your reply : )
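
For the third question, the arithmetic of the quoted function can be checked directly: it maps values in [-size, size] onto roughly [-127, 127] and does no clipping by itself. The follow-up step `to_uint8` below is hypothetical; it shows one plausible reason for the scaling (shifting to be centered at 128 and clipping so the result fits an 8-bit image) and is not confirmed by the source.

```python
import numpy as np


def clip_and_scale(img, size):
    # Quoted from the question: maps [-size, size] onto roughly [-127, 127].
    return (img * (127.5 / size)).astype(np.int32)


def to_uint8(img, size):
    """Hypothetical follow-up: center at 128 and clip to the 8-bit range."""
    shifted = clip_and_scale(img, size) + 128
    return np.clip(shifted, 0, 255).astype(np.uint8)


mv = np.array([-40, -20, 0, 10, 20, 40])
scaled = clip_and_scale(mv, 20)  # values beyond +/-size leave the [-127, 127] range
pixels = to_uint8(mv, 20)        # those values saturate at 0 or 255 after clipping
```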
    
    opened by Manolo1988 3
  • my load return value is empty

    I installed FFmpeg according to the instructions, modified the path in setup.py, and ran ./install.sh. The installation finished with:

    Processing dependencies for coviar==0.1
    Finished processing dependencies for coviar==0.1

    But when I run:

    from coviar import load
    load([input], [gop_index], [frame_index], [representation_type], [accumulate])

    the returned result is empty.

    opened by JunLiangZ 0
  • only get 57% acc on mv vector, ucf101 split1

    Hey, I just copied the options and settings when I trained coviar on UCF101-split1, but I only got 57% acc (validation) when training. Is there something wrong? Any help? Thanks.

    opened by DawnChou 2
  • how to execute ./data_loader/install.sh in Windows environment?

    How can I execute ./data_loader/install.sh in a Windows environment? And how should I modify these paths for Windows: ./ffmpeg/include/, -L./ffmpeg/lib/? Does anything else need to be modified?

    opened by fengfan028 0
  • some videos from hmdb51 and ucf101 cannot be recognized

    After I converted the hmdb51 and ucf101 datasets with reencode.sh, some mp4 videos could not be read during training, while others worked. The unreadable videos produce the message "Could not open input stream". It's weird. Does anyone know how to solve this problem?

    opened by suki7 3