Code release for Convolutional Two-Stream Network Fusion for Video Action Recognition

Christoph Feichtenhofer

Last update: Dec 31, 2022


Overview

================================================================================

Convolutional Two-Stream Network Fusion for Video Action Recognition

This repository contains the code for our CVPR 2016 paper:

Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman
"Convolutional Two-Stream Network Fusion for Video Action Recognition"
in Proc. CVPR 2016

If you find the code useful for your research, please cite our paper:

    @inproceedings{feichtenhofer2016convolutional,
      title={Convolutional Two-Stream Network Fusion for Video Action Recognition},
      author={Feichtenhofer, Christoph and Pinz, Axel and Zisserman, Andrew},
      booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2016}
    }

Requirements

The code was tested on Ubuntu 14.04 and Windows 10 using MATLAB R2015b and NVIDIA Titan X or Z GPUs.

If you have questions regarding the implementation please contact:

Christoph Feichtenhofer

================================================================================

Setup

1. Download the code: git clone --recursive https://github.com/feichtenhofer/twostreamfusion
2. Compile the code by running compile.m.
   - This will also compile a modified (and older) version of the MatConvNet toolbox. In case of any issues, please follow the installation instructions on the MatConvNet homepage.
3. Edit the file cnn_setup_environment.m to adjust the models and data paths.
4. Download the pretrained model files and the datasets linked below, and unpack them into your models/data directory.
   - Optionally, you can pretrain your own two-stream models by running
     1. cnn_ucf101_spatial(); to train the appearance network stream.
     2. cnn_ucf101_temporal(); to train the optical flow network stream.
5. Run cnn_ucf101_fusion(); this will use the downloaded models and demonstrate training of our final architecture on UCF101/HMDB51.
   - In case you would like to train on the CPU, clear the variable opts.train.gpus.
   - In case you encounter memory issues on your GPU, consider decreasing the cudnnWorkspaceLimit (512MB is default).

Pretrained models

Download our baseline networks trained on UCF101 here:

Data

Pre-computed optical flow images and resized RGB frames for the UCF101 and HMDB51 datasets:

UCF101 RGB: part1 part2 part3
UCF101 Flow: part1 part2 part3
HMDB51 RGB: part1
HMDB51 Flow: part1

Use it on your own dataset

Our optical flow extraction tool provides OpenCV wrappers for optical flow extraction on a GPU.

Comments

not able to open your data

Do you mind double-checking your pre-computed optical flow images and resized RGB frames for the UCF101 and HMDB51 datasets? I tried downloading and opening your data twice and still could not open it.

opened by jiaxue1993 9
The dataset links are down!

The links for downloading the datasets and models are invalid. Could you please renew the links you offered so that we can download the datasets and models for study? Thanks a lot!

The invalid links ↓

opened by CooperLi 4
Can I get the 92.5% accuracy by directly running your code?

Hello. I downloaded your code for the paper 'Two-Stream Convolutional Networks for Action Recognition in Videos' and ran it directly. I tried many times, but I can only get a result of 91.5%. Should I change something in your code to get the 92.5% reported in your paper?

opened by MubarkLa 3
cannot find 'models\ucf101-img-vgg16-split1-dr0.85.mat'

I followed the setup procedure and downloaded the data and models. But when I run cnn_ucf101_fusion(), I get the error "unable to read file 'models\ucf101-img-vgg16-split1-dr0.85.mat'. No such file or directory". I checked the pretrained models section, and there is no such file. Could you please tell me where I can get this file, and also 'models\ucf101-TVL1flow-vgg16-split1-dr0.9.mat', which is needed later? Thanks!

opened by laura-wang 2
the difference between err1_spatial and err1

When I adapted your code to use colour images only with 3D filters (getting rid of the fusion layer), I noticed that err1 and err1_spatial always differ. Looking at the code, err1_spatial is extracted from the dagnn.Loss layers:

    function stats = extractStats(net)
    % -------------------------------------------------------------------------
    sel = find(cellfun(@(x) isa(x,'dagnn.Loss'), {net.layers.block})) ;
    stats = struct() ;
    for i = 1:numel(sel)
      stats.(net.layers(sel(i)).name) = net.layers(sel(i)).block.average ;
    end

and err1 is computed by comparing the labels with the predictions:

    function [err1, err5] = error_multiclass(opts, labels, predictions)
    % -------------------------------------------------------------------------
    [~,predictions] = sort(predictions, 3, 'descend') ;
    error = ~bsxfun(@eq, predictions, reshape(labels, 1, 1, 1, [])) ;
    err1 = sum(sum(sum(error(:,:,1,:)))) ;
    err5 = sum(sum(sum(min(error(:,:,1:5,:),[],3)))) ;

so what's the relation between them?

opened by jiaxue1993 2
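
As a rough illustration of what error_multiclass in the comment above counts, here is a minimal Python sketch of top-1 and top-k error counting. It assumes one score vector per sample (no spatial dimensions) and 0-based labels; the function name topk_errors and the list-based inputs are hypothetical and not part of this repository. It does not address how the dagnn.Loss average behind err1_spatial is computed.

```python
def topk_errors(scores, labels, k=5):
    """Count top-1 and top-k misclassifications.

    scores: list of per-sample score lists, one score per class.
    labels: list of true class indices (0-based here, unlike MATLAB).
    Returns raw error counts (not rates), like the MATLAB function.
    """
    err1 = 0
    errk = 0
    for sample_scores, label in zip(scores, labels):
        # Rank class indices by descending score, as sort(..., 'descend') does.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        # Top-1 error: the highest-scoring class is not the true label.
        if ranked[0] != label:
            err1 += 1
        # Top-k error: the true label is not among the k best classes.
        if label not in ranked[:k]:
            errk += 1
    return err1, errk
```

Dividing the returned counts by the number of samples gives error rates that can be compared with the percentages quoted in the comments.
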
what are the differences among the 3 split models?

There are 3 models for each kind of pretrained model, for example ucf101-img-vgg16-split1.mat, ucf101-img-vgg16-split2.mat and ucf101-img-vgg16-split3.mat. I see that you set nSplit=1 in cnn_ucf101_fusion.m. What do the other two models mean, and what are the differences among these three models? Thank you for your help.

opened by jiandan42 2
RGB and optical flow image numbers do not match in some classes
RGB and optical flow image counts do not match in some classes, although most do match.

After unzipping these two

HMDB51 RGB: part1

HMDB51 Flow: part1

For example, 50_FIRST_DATES_kick_f_cm_np1_ba_med_19 has one more RGB image than optical flow images.

jpegs_256
    50_FIRST_DATES_kick_f_cm_np1_ba_med_19 (48 images)
        frame000001.jpg
        frame000002.jpg
        ...
tvl1_flow
    u
        50_FIRST_DATES_kick_f_cm_np1_ba_med_19 (47 images)
            frame000001.jpg
            frame000002.jpg
            ...
    v
        50_FIRST_DATES_kick_f_cm_np1_ba_med_19 (47 images)
            frame000001.jpg
            frame000002.jpg
            ...
opened by Hongbo-Miao 1
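
The frame-count comparison in the comment above can be automated. The sketch below assumes the directory layout shown there (jpegs_256/<video> for RGB, tvl1_flow/u/<video> and tvl1_flow/v/<video> for flow); the function names are hypothetical. Optical flow computed between consecutive frames yields one image fewer than the number of frames, so a gap of 0 or 1 is treated as acceptable by default. Whether that explains the reported mismatch for this dataset is an assumption.

```python
import os


def count_jpgs(directory):
    """Return the number of .jpg files directly inside `directory` (0 if missing)."""
    if not os.path.isdir(directory):
        return 0
    return sum(1 for name in os.listdir(directory)
               if name.lower().endswith(".jpg"))


def mismatched_videos(rgb_root, flow_root, allowed_gaps=(0, 1)):
    """Map video name -> (n_rgb, n_flow_u, n_flow_v) for suspicious videos.

    A video is reported when the u and v flow counts differ, or when the
    RGB count minus the flow count is not one of `allowed_gaps`.
    """
    suspicious = {}
    for video in sorted(os.listdir(rgb_root)):
        rgb_dir = os.path.join(rgb_root, video)
        if not os.path.isdir(rgb_dir):
            continue
        n_rgb = count_jpgs(rgb_dir)
        n_u = count_jpgs(os.path.join(flow_root, "u", video))
        n_v = count_jpgs(os.path.join(flow_root, "v", video))
        if n_u != n_v or (n_rgb - n_u) not in allowed_gaps:
            suspicious[video] = (n_rgb, n_u, n_v)
    return suspicious
```

Calling mismatched_videos("jpegs_256", "tvl1_flow") from the data directory would list only the videos whose counts differ by more than the expected single frame.
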
forever compiling issue

Hello everyone, I was running the compile.m file and MATLAB seems to be stuck at the following output line. Has anyone met the same problem, and does anyone have ideas about it?

I was using MATLAB 2016a with CUDA 8.0 and g++ 4.7.5.

vl_compilenn: MEX LINK: -outdir /home/myhome/twostreamfusion/matconvnet/matlab/mex -lmwblas -ljpeg -L/usr/local/cuda-8.0/lib64 -lcudart -lcublas -lmwgpu -L/usr/local/cuda-8.0/lib64 -lcudnn -largeArrayDims LDFLAGS=$LDFLAGS -Wl,-rpath -Wl,"/usr/local/cuda-8.0/lib64" LDFLAGS=$LDFLAGS -Wl,-rpath -Wl,"/usr/local/cuda-8.0/lib64" /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/vl_nnconv.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/data.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/datamex.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnconv.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnfullyconnected.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnsubsample.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnpooling.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnnormalize.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnbnorm.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/nnbias.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/im2row_cpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/subsample_cpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/copy_cpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/pooling_cpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/normalize_cpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/bnorm_cpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/tinythread.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/imread.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/im2row_gpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/subsample_gpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/copy_gpu.o 
/home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/pooling_gpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/normalize_gpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/bnorm_gpu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/datacu.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/nnconv_cudnn.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/nnbias_cudnn.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/nnpooling_cudnn.o /home/myhome/twostreamfusion/matconvnet/matlab/mex/.build/bits/impl/imread_libjpeg.o

opened by ThorinLee 1
Using multiple GPUs to train the model error

I am trying to use multiple GPUs to train the model. So I changed the default value of opts.train.gpus in the cnn_ucf101_fusion file from [1] to [1 2]. But it reports the following error:

line 503 in cnn_train_dag (this is the line that calls fwrite): the file id is invalid. Please create a valid file id.

Could you please give some instruction on how to fix it? Thanks!

opened by kaisenseics 1
Epochs for training

Hi, how many epochs did you use for training the fusion network? Is the number of 2000 epochs mentioned in the code correct? On my system each epoch takes 1 day to complete.

Thanks

opened by pkoutras 1
May I ask two questions?

Hello, I am very interested in your work and I am doing some reproduction work based on your work. Now I have two questions which make me a little confused. May I ask about them?

1. How do you get your final prediction? For example, if I fuse from 'temporal' to 'spatial', should I use only the prediction of the spatial net or those of both nets? And when you got the best result in your paper, was the 'nFramesPerVid' you used also only 1?

2. Which of these two performs better in your experiments: fusing from 'temporal' to 'spatial' or from 'spatial' to 'temporal'?

I am sorry for taking your time and thank you a lot for reading my questions. I'd appreciate it a lot if you could kindly answer my questions.

opened by MubarkLa 1
How to load pre-trained resnet-50 weights?

Hello @abursuc and @feichtenhofer, can you tell me how the dictionary is structured in the saved .mat weight file? I tried loading the weights as usual by simply referencing the keys, but the output doesn't make any sense.

It would be great if you could give some pointers on how to load the dictionary properly so that I can create a model instance and load them.

opened by sarosijbose 0
Speed is too low

My speed is "train: epoch 01: 1/994: lr: 1e-03, 1.2 Hz" with a GTX 1060. It's too low and doesn't make sense. Does anyone have the same problem?

opened by Quxyz 2
A problem occurred when I tried to unzip the UCF101 RGB file

After downloading the dataset and running 'cat ucf101_jpegs_256.zip* > ucf101_jpegs_256.zip', I tried to unzip the file with the command 'unzip ucf101_jpegs_256.zip', but an error occurred: 'jpegs_256/v_TrampolineJumping_g09_c05/frame000020.jpg bad CRC 5860d5eb (should be b9b7fa4a)'. Is the picture damaged?

opened by ljmiao 2
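
A bad-CRC report like the one above can be reproduced without extracting anything. The sketch below uses Python's standard zipfile module to find the first archive member whose stored checksum does not match its data; the function name first_corrupt_member is hypothetical. It only detects corruption and cannot tell whether the cause is a damaged download or an incorrect merge of the parts.

```python
import zipfile


def first_corrupt_member(zip_path):
    """Return the name of the first member failing its CRC check, or None.

    ZipFile.testzip() reads every member and compares its CRC-32 with
    the value stored in the archive, without writing files to disk.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return archive.testzip()
```

A return value of None means every member passed its checksum; otherwise, re-downloading the parts and merging them again is the usual first thing to try.
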
How Can I get a PYTORCH VERSION?

Hello, I would like to know whether there is a version based on PyTorch and, if there is, whether it would be possible to share it with me. Thanks a lot!

opened by Amazingren 0
Pre-computed RGB images cannot be merged

I have downloaded your pre-computed RGB images, but I have some problems in merging the three parts (ucf101_jpegs_256.zip.001, ucf101_jpegs_256.zip.002, ucf101_jpegs_256.zip.003) into one .zip file. I wonder if some files (e.g. ucf101_jpegs_256.zip) are missing?

opened by wj320 1
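
For the merging problem described above, the following sketch concatenates numbered parts such as ucf101_jpegs_256.zip.001, .002 and .003 in sorted order, which is what the cat command quoted in an earlier comment does. It assumes the parts are plain byte-wise splits of a single archive, which the source does not confirm; the function name merge_parts is hypothetical.

```python
import glob
import os
import shutil


def merge_parts(part_pattern, output_path):
    """Concatenate all files matching `part_pattern` into `output_path`.

    Parts are joined in lexicographic order (.001, .002, ...). The output
    file itself is excluded in case it also matches the pattern, for
    example when a previous merge attempt left it behind.
    """
    out_abs = os.path.abspath(output_path)
    parts = sorted(p for p in glob.glob(part_pattern)
                   if os.path.abspath(p) != out_abs)
    if not parts:
        raise FileNotFoundError("no parts match " + part_pattern)
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the copy so large parts are never held in memory.
                shutil.copyfileobj(src, out)
    return parts
```

For example, merge_parts("ucf101_jpegs_256.zip.*", "ucf101_jpegs_256.zip") would write the combined archive next to the parts and return the list of parts used.
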

Owner

Christoph Feichtenhofer

GitHub http://www.robots.ox.ac.uk/~vgg/software/two_stream_action/

