Code release for Convolutional Two-Stream Network Fusion for Video Action Recognition

Overview

================================================================================

Convolutional Two-Stream Network Fusion for Video Action Recognition

This repository contains the code for our CVPR 2016 paper:

Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman
"Convolutional Two-Stream Network Fusion for Video Action Recognition"
in Proc. CVPR 2016

If you find the code useful for your research, please cite our paper:

    @inproceedings{feichtenhofer2016convolutional,
      title={Convolutional Two-Stream Network Fusion for Video Action Recognition},
      author={Feichtenhofer, Christoph and Pinz, Axel and Zisserman, Andrew},
      booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2016}
    }

Requirements

The code was tested on Ubuntu 14.04 and Windows 10 using MATLAB R2015b and NVIDIA Titan X or Z GPUs.

If you have questions regarding the implementation please contact:

Christoph Feichtenhofer

================================================================================

Setup

  1. Download the code: git clone --recursive https://github.com/feichtenhofer/twostreamfusion

  2. Compile the code by running compile.m.

    • This also compiles a modified (and older) version of the MatConvNet toolbox. If you run into any issues, please follow the installation instructions on the MatConvNet homepage.

  3. Edit the file cnn_setup_environment.m to adjust the model and data paths.

  4. Download the pretrained model files and the datasets linked below, and unpack them into your models/data directory.

    • Optionally, you can pretrain your own two-stream models by running
      1. cnn_ucf101_spatial(); to train the appearance network stream.
      2. cnn_ucf101_temporal(); to train the optical flow network stream.

  5. Run cnn_ucf101_fusion(); this will use the downloaded models and demonstrate training of our final architecture on UCF101/HMDB51.

    • To train on the CPU, clear the variable opts.train.gpus.
    • If you encounter memory issues on your GPU, consider decreasing the cudnnWorkspaceLimit (512MB is default).

Pretrained models

Data

Pre-computed optical flow images and resized RGB frames for the UCF101 and HMDB51 datasets

Use it on your own dataset

Comments
  • not able to open your data

    Would you mind double-checking your pre-computed optical flow images and resized RGB frames for the UCF101 and HMDB51 datasets? I tried downloading and opening your data twice, and I am still not able to open it.

    opened by jiaxue1993 9
  • The dataset links are down!

    The links for downloading the datasets and models are invalid. Could you please renew the links you offered so we can download the datasets and models for study? Thanks a lot!

    opened by CooperLi 4
  • Can I get the 92.5% accuracy by directly run your code?

    Hello. I downloaded your code for the paper 'Two-Stream Convolutional Networks for Action Recognition in Videos' and ran it directly. I tried many times, but I can only get a result of 91.5%. Should I change something in your code to get the 92.5% reported in your paper?

    opened by MubarkLa 3
  • cannot find 'models\ucf101-img-vgg16-split1-dr0.85.mat'

    I followed the setup procedure and downloaded the data and models. But when I run cnn_ucf101_fusion(), I get the error "unable to read file 'models\ucf101-img-vgg16-split1-dr0.85.mat'. No such file or directory". I checked the pretrained models listed below, and there is no such file. Could you please tell me where I can get this file, and also 'models\ucf101-TVL1flow-vgg16-split1-dr0.9.mat', which is needed later? Thanks!

    opened by laura-wang 2
  • the difference between err1_spatial and err1

    When I adapted your code to use colour images only with 3D filters (getting rid of the fusion layer), I noticed something interesting: the values of err1 and err1_spatial are always different. Looking at the code, err1_spatial is extracted from the dagnn.Loss layers:

        function stats = extractStats(net)
        % -------------------------------------------------------------------------
        sel = find(cellfun(@(x) isa(x,'dagnn.Loss'), {net.layers.block})) ;
        stats = struct() ;
        for i = 1:numel(sel)
          stats.(net.layers(sel(i)).name) = net.layers(sel(i)).block.average ;
        end

    and err1 is computed by comparing the labels with the predictions:

        function [err1, err5] = error_multiclass(opts, labels, predictions)
        % -------------------------------------------------------------------------
        [~,predictions] = sort(predictions, 3, 'descend') ;
        error = ~bsxfun(@eq, predictions, reshape(labels, 1, 1, 1, [])) ;
        err1 = sum(sum(sum(error(:,:,1,:)))) ;
        err5 = sum(sum(sum(min(error(:,:,1:5,:),[],3)))) ;

    So what is the relation between them?

    opened by jiaxue1993 2
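
As a rough illustration of what the error_multiclass function quoted in the comment above counts, here is a minimal Python re-expression. It is a sketch, not code from the repository: the name topk_error_counts is hypothetical, each sample is assumed to have a single vector of class scores (a 1x1 spatial output), and labels are 0-based rather than MATLAB's 1-based indices. It does not settle how err1 relates to the averages stored in the dagnn.Loss layers.

```python
def topk_error_counts(scores, labels, k=5):
    """Count top-1 and top-k errors over a batch.

    scores: list of per-sample lists of class scores (assumed layout)
    labels: list of 0-based ground-truth class indices
    Returns (err1, errk) as error counts, not rates.
    """
    err1 = 0
    errk = 0
    for sample_scores, label in zip(scores, labels):
        # Rank class indices by descending score, like sort(..., 'descend').
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c],
                        reverse=True)
        # Top-1 error: the highest-scoring class is not the label.
        if ranked[0] != label:
            err1 += 1
        # Top-k error: the label is absent from the k best classes.
        if label not in ranked[:k]:
            errk += 1
    return err1, errk


# Second sample is wrong at rank 1 but correct within the top 2.
print(topk_error_counts([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]], [1, 1], k=2))  # (1, 0)
```

Like the MATLAB function, the sketch returns counts for one batch; dividing by the number of samples seen gives an error rate.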
  • what are the differences among the 3 split models?

    There are 3 models for each kind of pretrained model, for example ucf101-img-vgg16-split1.mat, ucf101-img-vgg16-split2.mat and ucf101-img-vgg16-split3.mat. I see that you set nSplit=1 in cnn_ucf101_fusion.m. What do the other two models mean, and what are the differences among these three models? Thank you for your help.

    opened by jiandan42 2
  • RGB and optical flow image numbers do not match in some classes

    The numbers of RGB and optical flow images do not match in some classes, although most of them match.

    After unzipping these two

    For example, 50_FIRST_DATES_kick_f_cm_np1_ba_med_19 has one more RGB image than it has optical flow images.

    • jpegs_256
      • 50_FIRST_DATES_kick_f_cm_np1_ba_med_19 (48 images)
        • frame000001.jpg
        • frame000002.jpg
        • ...
      • ...
    • tvl1_flow
      • u
        • 50_FIRST_DATES_kick_f_cm_np1_ba_med_19 (47 images)
          • frame000001.jpg
          • frame000002.jpg
          • ...
        • ...
      • v
        • 50_FIRST_DATES_kick_f_cm_np1_ba_med_19 (47 images)
          • frame000001.jpg
          • frame000002.jpg
          • ...
        • ...
    opened by Hongbo-Miao 1
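
A check like the one described in the comment above can be automated. The following Python sketch assumes the directory layout shown in that comment (jpegs_256/<video>/ for RGB frames, tvl1_flow/u/<video>/ and tvl1_flow/v/<video>/ for flow); the function names are hypothetical and not part of the repository.

```python
import os


def count_jpgs(folder):
    """Number of .jpg files directly inside folder (0 if it does not exist)."""
    if not os.path.isdir(folder):
        return 0
    return sum(1 for name in os.listdir(folder)
               if name.lower().endswith(".jpg"))


def frame_counts(rgb_root, flow_root):
    """Map each video folder under rgb_root to (n_rgb, n_flow_u, n_flow_v)."""
    counts = {}
    for video in sorted(os.listdir(rgb_root)):
        rgb_dir = os.path.join(rgb_root, video)
        if not os.path.isdir(rgb_dir):
            continue  # skip stray files next to the video folders
        counts[video] = (
            count_jpgs(rgb_dir),
            count_jpgs(os.path.join(flow_root, "u", video)),
            count_jpgs(os.path.join(flow_root, "v", video)),
        )
    return counts


def mismatched(counts):
    """Videos whose RGB, u-flow and v-flow counts are not all equal."""
    return {video: c for video, c in counts.items() if len(set(c)) != 1}
```

Calling mismatched(frame_counts("jpegs_256", "tvl1_flow")) lists the affected videos. As a general observation, optical flow is computed between pairs of consecutive frames, so a difference of exactly one image may not indicate missing data; whether that explains the example in the comment is not confirmed here.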
  • forever compiling issue

    Hello everyone, I was running the compile.m file and MATLAB seems to be stuck at the following output line. Has anyone met the same problem, and does anyone have ideas on how to solve it?

    I was using MATLAB 2016a with CUDA 8.0 and g++ 4.7.5.

    vl_compilenn: MEX LINK: -outdir /home/myhome/twostreamfusion/matconvnet/matlab/mex -lmwblas -ljpeg -L/usr/local/cuda-8.0/lib64 -lcudart -lcublas -lmwgpu -L/usr/local/cuda-8.0/lib64 -lcudnn -largeArrayDims LDFLAGS=$LDFLAGS -Wl,-rpath -Wl,"/usr/local/cuda-8.0/lib64" LDFLAGS=$LDFLAGS -Wl,-rpath -Wl,"/usr/local/cuda-8.0/lib64" /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/vl_nnconv.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/data.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/datamex.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnconv.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnfullyconnected.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnsubsample.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnpooling.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnnormalize.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnbnorm.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnbias.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/im2row_cpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/subsample_cpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/copy_cpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/pooling_cpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/normalize_cpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/bnorm_cpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/tinythread.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/imread.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/im2row_gpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/subsample_gpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/copy_gpu.o 
/home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/pooling_gpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/normalize_gpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/bnorm_gpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/datacu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/nnconv_cudnn.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/nnbias_cudnn.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/nnpooling_cudnn.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/imread_libjpeg.o

    opened by ThorinLee 1
  • Using multiple GPUs to train the model error

    I am trying to use multiple GPUs to train the model, so I changed the default value of opts.train.gpus in the cnn_ucf101_fusion file from [1] to [1 2]. But it reports the following error:

    line 503 in cnn_train_dag (this is the line that calls fwrite): the file id is invalid. Please create a valid file id.

    Could you please give some instruction on how to fix it? Thanks!

    opened by kaisenseics 1
  • Epochs for training

    Hi, how many epochs did you use for training the fusion network? Is the number of 2000 epochs mentioned in the code correct? On my system each epoch takes 1 day to complete.

    Thanks

    opened by pkoutras 1
  • May I ask two questions?

    Hello, I am very interested in your work and I am doing some reproduction work based on it. I have two questions that confuse me a little. May I ask about them?

    1. How do you get your final prediction? For example, if I fuse from 'temporal' to 'spatial', should I use only the prediction of the spatial net, or of both nets? And when you got the best result in your paper, was the 'nFramesPerVid' you used also only 1?

    2. Which of these two performs better in your experiments: fusing from 'temporal' to 'spatial' or fusing from 'spatial' to 'temporal'?

    I am sorry for taking your time and thank you a lot for reading my questions. I'd appreciate it a lot if you could kindly answer my questions.

    opened by MubarkLa 1
  • How to load pre-trained resnet-50 weights?

    Hello @abursuc and @feichtenhofer, can you tell me how the dictionary is structured in the saved .mat weight file? I tried loading the weights as usual by simply referencing the keys, but the output doesn't make any sense.

    It would be great if you could give some pointers on how to load the dictionary properly so that I can create a model instance and load them.

    opened by sarosijbose 0
  • Speed is too low

    My speed is "train: epoch 01: 1/994: lr: 1e-03, 1.2 Hz " with a GTX 1060. It's too low and doesn't make sense. Does anyone have the same problem?

    opened by Quxyz 2
  • when I try to unzip the UCF101 RGB file, a problem occurred

    After downloading the dataset and running 'cat ucf101_jpegs_256.zip* > ucf101_jpegs_256.zip', I tried to unzip the file with the command 'unzip ucf101_jpegs_256.zip', but a problem occurred: 'jpegs_256/v_TrampolineJumping_g09_c05/frame000020.jpg bad CRC 5860d5eb (should be b9b7fa4a)'. Is the picture damaged?

    opened by ljmiao 2
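
A "bad CRC" message means the bytes stored for that member do not match the checksum recorded in the archive. One way to scan an archive for such members without extracting everything is sketched below, using Python's standard zipfile module. The function name first_corrupt_member is hypothetical; the sketch only detects the mismatch and cannot tell whether the cause is a damaged download or an incorrectly joined archive.

```python
import zipfile


def first_corrupt_member(zip_path):
    """Return the name of the first member failing its CRC check, or None.

    ZipFile.testzip() reads every member and verifies its checksum.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return archive.testzip()
```

For example, first_corrupt_member("ucf101_jpegs_256.zip") returns None when every file in the archive passes its check.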
  • How Can I get a PYTORCH VERSION?

    Hello, I would like to know whether there is a version based on PyTorch and, if there is, whether it could be shared with me. Thanks a lot!

    opened by Amazingren 0
  • Pre-computed RGB images cannot be merged

    I have downloaded your pre-computed RGB images, but I have some problems in merging the three parts (ucf101_jpegs_256.zip.001, ucf101_jpegs_256.zip.002, ucf101_jpegs_256.zip.003) into one .zip file. I wonder if some files (e.g. ucf101_jpegs_256.zip) are missing?

    opened by wj320 1
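
For the numbered parts named in the comment above (.zip.001, .zip.002, .zip.003), a portable way to join them is sketched below in Python. It assumes the parts are plain byte-wise splits of one archive, as the cat command quoted in another comment suggests; the function name join_parts is hypothetical.

```python
import glob
import shutil


def join_parts(base_path):
    """Concatenate base_path.001, base_path.002, ... into base_path.

    Returns the list of part files that were joined, in order.
    """
    # Zero-padded three-digit suffixes sort correctly as plain strings.
    parts = sorted(glob.glob(glob.escape(base_path) + ".[0-9][0-9][0-9]"))
    if not parts:
        raise FileNotFoundError("no numbered parts found for " + base_path)
    with open(base_path, "wb") as joined:
        for part in parts:
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, joined)
    return parts
```

Calling join_parts("ucf101_jpegs_256.zip") writes the joined archive next to the parts. The output name does not match the numbered-suffix pattern, so rerunning the function never reads its own output as an input.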