Implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

yuanyao366

Last update: Dec 29, 2022

Related tags

Deep Learning PRP

Overview

PRP

Introduction

This is the implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

Getting started

Install

Our experiments run on Python 3.6.1 and PyTorch 0.4.1. All dependencies can be installed using pip:
```
python -m pip install -r requirements.txt
```

Data preparation

We conduct experiments on UCF101 and HMDB51 (split 1 of UCF101 for pre-training and the rest for fine-tuning). The expected dataset directory hierarchy is as follows:

├── UCF101/HMDB51
│   ├── split
│   │   ├── classInd.txt
│   │   ├── testlist01.txt
│   │   ├── trainlist01.txt
│   │   └── ...
│   └── video
│       ├── ApplyEyeMakeup
│       │   └── *.avi
│       └── ...
└── ...
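
Before launching training, it can help to check that every video named in a split file exists under `video/`. The following is a minimal sketch rather than part of the released code. It assumes the standard UCF101 list format (one relative path per line, optionally followed by a class index), and the helper names `read_split` and `missing_videos` are hypothetical.

```python
from pathlib import Path


def read_split(split_file):
    """Parse a UCF101-style list file into (relative_path, class_name) pairs."""
    entries = []
    for line in Path(split_file).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        # Train lists may carry a trailing class index; keep only the path.
        rel_path = line.split()[0]
        entries.append((rel_path, rel_path.split("/")[0]))
    return entries


def missing_videos(dataset_root, split_name="trainlist01.txt"):
    """Return the listed videos that are not present under <root>/video."""
    root = Path(dataset_root)
    entries = read_split(root / "split" / split_name)
    return [rel for rel, _ in entries if not (root / "video" / rel).is_file()]
```

Calling `missing_videos("UCF101")` on a correctly prepared directory should return an empty list; any returned paths point at files that the data loader would fail to open.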

Train and Test

Pre-training on Pretext Task

python train_predict.py --gpu 0 --epoch 300 --model_name c3d/r21d/r3d

Action Recognition

python ft_classfy.py --gpu 0 --model_name c3d/r21d/r3d --pre_path [your pre-trained model] --split 1/2/3
python test_classify.py

Video Retrieval

Please refer to video_retrieval_samples.py in the VCOP code.

Model zoo

Models

Pre-trained PRP models on split 1 of UCF101: C3D (OneDrive); R3D (OneDrive); R(2+1)D (OneDrive)

Action Recognition Results

Architecture	UCF101(%)	HMDB51(%)
C3D	69.1	34.5
R3D	66.5	29.7
R(2+1)D	72.1	35.0

License

This project is released under the Apache 2.0 license.

Citation

Please cite the following paper if you find PRP useful in your research:

@InProceedings{Yao_2020_CVPR,  
author = {Yao, Yuan and Liu, Chang and Luo, Dezhao and Zhou, Yu and Ye, Qixiang},  
title = {Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning},  
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},  
month = {June},  
year = {2020}  
}

Comments

Can't run train_predict.py

I can't run train_predict.py. After downloading the whole repository and the datasets, I tried to run train_predict.py, but it stopped. Is there anything I did wrong? Thanks 👍

opened by dD0852974 7
Dataloading Error

While training, during loading of batches, in the 3rd batch I'm facing this error. I tried to modify the random seed but it's stuck at 3rd batch specifically.

File "/lustre/fs0/home/akumar/self/prp/datasets/predict_dataset.py", line 77, in __getitem__
    videodata, sample_step_label = self.loadcvvideo_Finsert(index, recon_rate=recon_rate, sample_step=None)
File "/lustre/fs0/home/akumar/self/prp/datasets/predict_dataset.py", line 179, in loadcvvideo_Finsert
    buffer, sample_step_label = self.loadcvvideo_Finsert(index, recon_rate, sample_step)
File "/lustre/fs0/home/akumar/self/prp/datasets/predict_dataset.py", line 132, in loadcvvideo_Finsert
    sample_step_proposal = self.sample_retrieval[recon_rate]
KeyError: 1

recon_rate is 1. I tried a try/except to skip the video causing the error, but then a new error pops up. The video path is correct, but there is a problem with the buffer length: retaining:False buffer_len:106 sample_len:128

If I skip this video, then at this line https://github.com/yuanyao366/PRP/blob/58a301d92a540c915296de0d60a1cbaa304f0819/datasets/predict_dataset.py#L71

it's showing me NoneType object error.
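
The logged values suggest the video is shorter than the sampling window: 106 frames are available, while a 16-frame clip at sample step 8 would span 128 frames. As a rough sketch, and assuming the window is simply `clip_len * sample_step` (an inference from the log above, not the repository's confirmed logic), the usable steps for a given video could be checked like this. The function name is hypothetical.

```python
def usable_sample_steps(frame_count, clip_len=16, sample_steps=(1, 2, 4, 8)):
    """Return the sample steps whose full sampling window fits in the video.

    Assumption: a clip of `clip_len` frames taken every `step` frames needs
    `clip_len * step` source frames, matching sample_len:128 for step 8.
    """
    return [step for step in sample_steps if clip_len * step <= frame_count]


print(usable_sample_steps(106))  # [1, 2, 4]
```

Under that assumption, a 106-frame video can only serve steps 1, 2 and 4, so either the step has to be restricted per video or short videos have to be filtered out before sampling.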

opened by AKASH2907 6
sample step manipulation

If I change the sample steps from 1,2,4,8 to 1,2,4 or 1,2, do I need to modify some lines in pat_region.py? If so, can you point them out? I changed the sample step list, and the NaN loss error again starts appearing in each epoch at some iteration over a batch. I am using the Kinetics dataset for pre-training.

opened by AKASH2907 5
Difference in weights loading
I have no problems with the code, just some doubts about weight loading.

Why is there a difference in weight loading between train_predict.py and ft_classfy.py?

Did you use nn.DataParallel even with 1 GPU to add a "module" prefix at the start of the parameter names, and save it like that to keep every checkpoint consistent?

If we use 2 networks, one following the other, are the weights of the first network saved as module.base_network?

Does this mean that the base network (c3d, r21d, r3d) parameters are modified as a whole by the pretext task, and that only those weights are then used for fine-tuning on the downstream task dataset?

In the load_pretrained_weights function of ft_classfy.py, why did you modify it to module.base_network and +14:? Since strict is False, it will load only those weights that are common to both networks, i.e. only the base network, right? What's the problem with this?

def load_pretrained_weights(ckpt_path):
    """load pretrained weights and adjust params name."""
    adjusted_weights = {}
    pretrained_weights = torch.load(ckpt_path)
    for name, params in pretrained_weights.items():
        if 'base_network' in name:
            name = name[name.find('.')+1:]
            adjusted_weights[name] = params
    return adjusted_weights
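
To make the naming question concrete: wrapping a model in nn.DataParallel stores it under an attribute called `module`, so every saved parameter name gains a `module.` prefix. The sketch below uses a plain dictionary as a stand-in for a checkpoint, so it runs without PyTorch. The layer names and the `strip_prefix` helper are hypothetical and only illustrate the renaming, not the repository's exact behaviour.

```python
def strip_prefix(state_dict, prefix="module.base_network."):
    """Keep only entries under `prefix` and drop the prefix from their names."""
    return {
        name[len(prefix):]: value
        for name, value in state_dict.items()
        if name.startswith(prefix)
    }


# Stand-in for a checkpoint saved from a DataParallel-wrapped pretext model.
# Keys and values are illustrative placeholders, not real parameter tensors.
checkpoint = {
    "module.base_network.conv1.weight": "w1",
    "module.base_network.conv1.bias": "b1",
    "module.decoder.deconv1.weight": "w2",
}

backbone_weights = strip_prefix(checkpoint)
print(sorted(backbone_weights))  # ['conv1.bias', 'conv1.weight']
```

Note that `name[name.find('.')+1:]` in the quoted function removes only the text up to the first dot, so a key such as `module.base_network.conv1.weight` would become `base_network.conv1.weight`. Whether that matches the fine-tuning model depends on how that model names its backbone, which is why the amount of prefix stripped matters even with `strict=False`: keys that do not match are silently skipped rather than loaded.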

Thanks.
opened by AKASH2907 4
Error when running ft_classfy.py
Hi, I am facing an issue trying to run the finetuning task for PRP.

Exact Command used: python ft_classfy.py --gpu 0 --model_name c3d --pre_path 0 --split 1

The error is added as image attachment

Steps before running this command:

The environment contains the more recent versions of packages

Modified dataset locations throughout the project according to my folder structure

With these changes, training on pretext task worked fine. Ran command: python3 train_predict.py --gpu 0 --epoch 300 --model_name c3d (Due to resource restriction, terminated after 147 epochs)

Added the path to the best model in ft_classfy.py as pretrain_path0. From the output text file, it seems that this best-model location is being picked up as the model.

Solutions attempted:

To make sure the issue is not due to train_predict terminating before completing 300 epochs, I ran it for 2 epochs and tried with the best model obtained there too.

Similar error first occurred with the mode attribute when the code hit the part "if self.mode == 'train'" in the class ClassifyDataSet(data.Dataset) in predict_dataset.py. This got resolved once I moved the self.mode=mode above the said condition check.

To me, the attribute error regarding the dataset looked like an issue of the child class not having the attributes of the parent, so I also tried super().__init__() and super().__init__(*args, **kwargs) without any success.

Can you please advise on this, especially if I have missed any step after running train_predict? Is there any other part of the project that needs to be modified to add the pretrained model path, apart from pretrain_path0 in ft_classfy.py? I have also added a snippet with the last part of the output file (separate from the error file).

Please let me know if any other information would be helpful. Thank you in advance.
opened by falguni7 2
Loss NaN error

Hi,

I was running your code and after a few epochs, NaN loss started appearing. I'm sharing the log from epoch 99, but it started appearing around epoch 5.

Epoch:[99][200/278] data_time:0.128,batch time:1.571 loss:nan loss_recon:nan loss_class:nan accuracy:27.125
[TRAIN] loss_cls: nan, acc: 0.266
tensor([2367., 0., 0., 0.])
tensor([2367., 2146., 2168., 2215.])
tensor([1., 0., 0., 0.])
33%|3 | 99/300 [27:19:23<56:29:50, 1011.89s/it][VAL] loss_cls: nan, acc: 0.292
tensor([467., 0., 0., 0.])
tensor([467., 363., 385., 385.])
tensor([1., 0., 0., 0.])
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.

conv_lr:0.001 fc8_lr:0.010000000000000002
Epoch:[100][100/278] data_time:0.126,batch time:1.425 loss:nan loss_recon:nan loss_class:nan accuracy:26.969

I'm unable to figure out where I went wrong. What should I modify? I'm working on the Kinetics dataset.

When I trained on UCF 101 dataset, this didn't happen. I checked for 90 epochs and the pretext task accuracy also increased, here it's stuck at 26%.

opened by AKASH2907 2
About softmax

Hi,
When training in the self-supervised and fine-tuning stages, your work doesn't apply softmax after the linear layer, whereas some related work does. So my question is: in training, what is the difference between applying softmax and not applying it? Best wishes.
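
One common reason for omitting the softmax layer is that the loss already contains it. PyTorch's nn.CrossEntropyLoss, for example, applies log-softmax to raw logits internally. Whether this repository relies on that loss is an assumption here, not something confirmed above. The sketch below, in plain Python with made-up logits, shows that cross-entropy computed directly from logits equals the negative log of the softmax probability, and that softmax does not change the predicted class.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, target):
    """-log softmax(logits)[target], computed via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


logits = [2.0, -1.0, 0.5, 0.1]  # illustrative values for 4 playback-rate classes
probs = softmax(logits)

# The loss on raw logits already includes the softmax.
assert abs(cross_entropy_from_logits(logits, 0) + math.log(probs[0])) < 1e-12
# Softmax is monotonic, so the argmax (the prediction) is unchanged.
assert logits.index(max(logits)) == probs.index(max(probs))
```

If the loss does apply log-softmax internally, adding an explicit softmax layer before it would normalise the scores twice, which flattens the gradients without changing which class is predicted.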

opened by 321hallelujah 1

Error in predict_dataset.py

I think the logic is not robust.

Below are several lines from predict_dataset.py.

count_need=16
frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
start = np.random.randint(0, frame_count - count_need + 1)

There is no guarantee that frame_count is larger than the default setting of 16, so an error is raised here.

File "/raid/home/taoli/exp/prp/datasets/predict_dataset.py", line 222, in loadcvvideo
    start = np.random.randint(0, frame_count - count_need + 1);
  File "mtrand.pyx", line 993, in mtrand.RandomState.randint
ValueError: low >= high

When I add the following codes

if frame_count - count_need + 1 < 1:
    start = 0
else:
    start = np.random.randint(0, frame_count - count_need + 1)

It will print too many reload msg in the command line.
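
Setting `start = 0` avoids the exception, but a video shorter than `count_need` still cannot supply enough frames, which may be what triggers the reloads. One possible workaround, sketched below, is to wrap around to the beginning of a short video so that the requested number of frame indices is always returned. This is an illustrative alternative, not the repository's behaviour; the function name is hypothetical, and the standard-library `random` module stands in for `np.random`.

```python
import random


def sample_clip_indices(frame_count, count_need=16, rng=random):
    """Return `count_need` frame indices, looping the video if it is too short."""
    if frame_count <= 0:
        raise ValueError("video has no frames")
    if frame_count >= count_need:
        # randrange(k) yields 0..k-1, like np.random.randint(0, k).
        start = rng.randrange(frame_count - count_need + 1)
        return list(range(start, start + count_need))
    # Short video: reuse frames from the beginning (an assumption of this sketch).
    return [i % frame_count for i in range(count_need)]


print(sample_clip_indices(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5]
```

Looping changes the apparent motion at the wrap point, so for a playback-rate pretext task it may be preferable to filter out very short videos instead.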

opened by BestJuly 1

Error When Loading Pretrained Models

I am facing an error when loading the state_dict. It states that there are missing keys ("linear.weight", "linear.bias") as well as unexpected keys. See attached image for error details.

Please note that this particular screenshot is from when I try C3D. Similar error happens when I try to load R3D or R(2+1)D. Please advise. Thank you in advance.

opened by shahd-seddik 1
decoder structure of C3D, R3D, R21D

Based on your repo, it seems that you are using the same decoder structure for all of the backbones (c3d, r3d, r(2+1)d). But in your paper, it seems you used different decoder structures based on the C3D-block, R2D-block and R21D-block.

We cannot reproduce the results reported in the paper based on the current code. Could you also provide your decoder implementations for r3d and r(2+1)d?

opened by hanwen0529 3

