Implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".


PRP

Introduction

This is the implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

Getting started

  • Install

    Our experiments run on Python 3.6.1 and PyTorch 0.4.1. All dependencies can be installed using pip:

    python -m pip install -r requirements.txt
  • Data preparation

    We conduct experiments on UCF101 and HMDB51 (split 1 of UCF101 for pre-training and the rest for fine-tuning). The expected dataset directory hierarchy is as follows:

    ├── UCF101/HMDB51
    │   ├── split
    │   │   ├── classInd.txt
    │   │   ├── testlist01.txt
    │   │   ├── trainlist01.txt
    │   │   └── ...
    │   └── video
    │       ├── ApplyEyeMakeup
    │       │   └── *.avi
    │       └── ...
    └── ...
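
The split files are expected in the standard UCF101 list format. As a rough sketch, assuming classInd.txt lines of the form `1 ApplyEyeMakeup` and trainlist01.txt lines of the form `ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1` (for HMDB51 this assumes its splits have been converted to the same layout, as the tree above suggests), the lists can be resolved into (video path, zero-based label) pairs. The function names are hypothetical and not part of the repository.

```python
import os
from typing import Dict, List, Tuple


def parse_class_index(lines: List[str]) -> Dict[str, int]:
    """Map class name -> zero-based label from classInd.txt-style lines ("1 ApplyEyeMakeup")."""
    classes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        idx, name = line.split()
        classes[name] = int(idx) - 1
    return classes


def parse_split(lines: List[str], classes: Dict[str, int], root: str) -> List[Tuple[str, int]]:
    """Resolve split entries to (path under <root>/video, label).

    The label is taken from the class folder name, so this works for trainlist files
    (which carry a trailing class index) and testlist files (which do not).
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rel_path = line.split()[0]
        class_name = rel_path.split("/")[0]
        samples.append((os.path.join(root, "video", rel_path), classes[class_name]))
    return samples


classes = parse_class_index(["1 ApplyEyeMakeup", "2 ApplyLipstick"])
print(parse_split(["ApplyLipstick/v_ApplyLipstick_g01_c01.avi 2"], classes, "UCF101"))
```

In practice the line lists would come from reading split/classInd.txt and split/trainlist01.txt under the dataset root.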
    
  • Train and Test

    Pre-training on Pretext Task

    python train_predict.py --gpu 0 --epoch 300 --model_name c3d/r21d/r3d

    Action Recognition

    python ft_classfy.py --gpu 0 --model_name c3d/r21d/r3d --pre_path [your pre-trained model] --split 1/2/3
    python test_classify.py

    Video Retrieval

    Please refer to video_retrieval_samples.py in the VCOP codebase.
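
The pretext task trains the network to recognise the playback rate at which a clip was sampled. A minimal sketch of drawing one training sample, assuming 16-frame clips and the sample steps 1, 2, 4, 8 quoted in the issues below, with the label taken as the stride's position in that list. The names are hypothetical; this is not the repository's loader, which also handles the reconstruction branch.

```python
import random
from typing import List, Sequence, Tuple

# Assumed settings: 16-frame clips and strides 1/2/4/8.
CLIP_LEN = 16
SAMPLE_STEPS = (1, 2, 4, 8)


def sample_rate_clip(frame_count: int, rng: random.Random,
                     clip_len: int = CLIP_LEN,
                     steps: Sequence[int] = SAMPLE_STEPS) -> Tuple[List[int], int]:
    """Draw a random playback rate and return (frame indices, rate label).

    The label is the stride's index in `steps`, so a classifier over it needs len(steps) outputs.
    """
    label = rng.randrange(len(steps))
    step = steps[label]
    span = (clip_len - 1) * step + 1  # number of frames covered by the strided clip
    if span > frame_count:
        raise ValueError(f"{frame_count} frames cannot hold a stride-{step} clip of {clip_len}")
    start = rng.randrange(frame_count - span + 1)
    return [start + i * step for i in range(clip_len)], label


indices, label = sample_rate_clip(300, random.Random(0))
print(label, indices[:4])
```

Raising on videos that are too short, rather than returning None, keeps such failures visible at the point where they happen.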

Model zoo

  • Models

    PRP models pre-trained on split 1 of UCF101: C3D (OneDrive); R3D (OneDrive); R(2+1)D (OneDrive)

  • Action Recognition Results

    Architecture   UCF101 (%)   HMDB51 (%)
    C3D            69.1         34.5
    R3D            66.5         29.7
    R(2+1)D        72.1         35.0

License

This project is released under the Apache 2.0 license.

Citation

Please cite the following paper if you find PRP useful in your research:

@InProceedings{Yao_2020_CVPR,  
author = {Yao, Yuan and Liu, Chang and Luo, Dezhao and Zhou, Yu and Ye, Qixiang},  
title = {Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning},  
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},  
month = {June},  
year = {2020}  
}
Comments
  • Can't run train_predict.py

    I can't run train_predict.py. After downloading the whole code and the datasets, I tried to run train_predict.py and it stopped (screenshot attached). Is there anything I did wrong? Thanks 👍

    opened by dD0852974 7
  • Dataloading Error

    During training, I get this error while loading the 3rd batch. I tried changing the random seed, but it still fails specifically at the 3rd batch.

    File "/lustre/fs0/home/akumar/self/prp/datasets/predict_dataset.py", line 77, in __getitem__
      videodata, sample_step_label = self.loadcvvideo_Finsert(index, recon_rate=recon_rate, sample_step=None)
    File "/lustre/fs0/home/akumar/self/prp/datasets/predict_dataset.py", line 179, in loadcvvideo_Finsert
      buffer, sample_step_label = self.loadcvvideo_Finsert(index, recon_rate, sample_step)
    File "/lustre/fs0/home/akumar/self/prp/datasets/predict_dataset.py", line 132, in loadcvvideo_Finsert
      sample_step_proposal = self.sample_retrieval[recon_rate]
    KeyError: 1

    recon_rate is 1. I tried using try/except to skip the video causing the error, but then a new error pops up. The video path is correct, but there seems to be a problem with the buffer length: retaining:False buffer_len:106 sample_len:128

    If I skip this video, I get a NoneType object error at this line: https://github.com/yuanyao366/PRP/blob/58a301d92a540c915296de0d60a1cbaa304f0819/datasets/predict_dataset.py#L71

    opened by AKASH2907 6
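
The numbers in this log are consistent with a stride-8 clip being requested from a video that is too short: assuming sample_len is the clip length times the stride, 16 × 8 = 128 frames are needed, but only 106 are buffered. A hypothetical helper (not from the repository) that lists the strides a video of a given length can support:

```python
from typing import List, Sequence


def feasible_steps(buffer_len: int, clip_len: int = 16,
                   steps: Sequence[int] = (1, 2, 4, 8)) -> List[int]:
    """Return the strides whose assumed required length (clip_len * step) fits in buffer_len."""
    return [s for s in steps if clip_len * s <= buffer_len]


print(feasible_steps(106))  # [1, 2, 4]: 16 * 8 = 128 > 106, so stride 8 cannot be sampled
```

Restricting the candidate rates per video in this way, instead of skipping the video and returning None, would avoid both the KeyError and the later NoneType error; whether it fits the intended recon_rate logic would need checking against predict_dataset.py.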
  • sample step manipulation

    If I change the sample steps from 1,2,4,8 to 1,2,4 or 1,2, do I need to modify some lines in pat_region.py? If so, could you point them out? After changing the sample-step list, the NaN loss error appears again in every epoch at some iteration. I am using the Kinetics dataset for pre-training.

    opened by AKASH2907 5
  • Difference in weights loading

    I have no problems with the code; I just have some questions about weight loading.

    1. Why is weight loading done differently in train_predict.py and ft_classfy.py?
    2. Did you use nn.DataParallel even with 1 GPU, so that every parameter name starts with "module" and each checkpoint is saved consistently?
    3. If we chain two networks, one after the other, are the first network's weights saved as module.base_network?
    4. Does this mean the base network (c3d, r21d, r3d) parameters are trained as a whole on the pretext task, and only those weights are then used for fine-tuning on the downstream dataset?
    5. In the load_pretrained_weights function of ft_classfy.py, why did you change it to module.base_network and +14:? Since strict is False, won't it load only the weights common to both networks, i.e., only the base network? What is the problem with that?
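    import torch


    def load_pretrained_weights(ckpt_path):
        """load pretrained weights and adjust params name."""
        adjusted_weights = {}
        pretrained_weights = torch.load(ckpt_path)
        for name, params in pretrained_weights.items():
            # keep only backbone parameters, e.g. "module.base_network.conv1.weight"
            if 'base_network' in name:
                # drop the first name component (for a checkpoint saved through nn.DataParallel, "module.")
                name = name[name.find('.')+1:]
                adjusted_weights[name] = params
        return adjusted_weights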
    

    Thanks.

    opened by AKASH2907 4
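
On question 2: wrapping a model in nn.DataParallel stores it as the attribute `module`, so every key in its state_dict starts with `module.`. The loop in the question keeps only entries containing `base_network` and drops everything up to the first dot. A torch-free sketch of the same renaming applied to plain key names (the example keys and values are made up) shows what the returned dict contains:

```python
from typing import Dict


def adjust_names(state_dict: Dict[str, object]) -> Dict[str, object]:
    """Same renaming as the loop above: keep base_network entries, strip the first name component."""
    adjusted = {}
    for name, params in state_dict.items():
        if "base_network" in name:
            adjusted[name[name.find(".") + 1:]] = params
    return adjusted


ckpt = {
    "module.base_network.conv1.weight": "w1",  # backbone parameter saved through DataParallel
    "module.fc.weight": "w2",                  # pretext-task head, discarded
}
print(adjust_names(ckpt))  # {'base_network.conv1.weight': 'w1'}
```

The resulting names still start with `base_network.`, so they only line up with a fine-tuning model that also keeps its backbone under an attribute of that name. With strict=False, names that do not match are not raised as errors, so a silent mismatch is worth ruling out.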
  • Error when running ft_classfy.py

    Hi, I am facing an issue trying to run the finetuning task for PRP.

    Exact Command used: python ft_classfy.py --gpu 0 --model_name c3d --pre_path 0 --split 1

    The error is attached as a screenshot.

    Steps before running this command:

    1. The environment uses more recent package versions.
    2. Modified dataset locations throughout the project according to my folder structure.
    3. With these changes, training on the pretext task worked fine. Command: python3 train_predict.py --gpu 0 --epoch 300 --model_name c3d (terminated after 147 epochs due to resource restrictions).
    4. Added the path to the best model to pretrain_path0 in ft_classfy.py. The output text file suggests that this best-model location is being picked up.

    Solutions attempted:

    1. To make sure the issue is not due to train_predict terminating before completing 300 epochs, I ran it for 2 epochs and tried with the best model obtained there too.
    2. A similar error first occurred with the mode attribute when the code reached the check "if self.mode == 'train'" in class ClassifyDataSet(data.Dataset) in predict_dataset.py. This was resolved once I moved self.mode=mode above that check.
    3. The attribute error regarding the dataset looked to me like the child class not having the parent's attributes, so I also tried super().__init__() and super().__init__(*args, **kwargs), without success.

    Can you please advise on this, especially if I have missed any step after running train_predict.py? Is there any other place in the project that must be modified to add the pre-trained model path, apart from pretrain_path0 in ft_classfy.py? I have also attached a screenshot of the last part of the output file (separate from the error file).

    Please let me know if any other information would be helpful. Thank you in advance.

    opened by falguni7 2
  • Loss NaN error

    Hi,

    I was running your code, and after a few epochs the loss became NaN. The log below is from epoch 99, but the NaN loss started appearing around epoch 5.


    Epoch:[99][200/278] data_time:0.128,batch time:1.571 loss:nan loss_recon:nan loss_class:nan accuracy:27.125
    [TRAIN] loss_cls: nan, acc: 0.266
    tensor([2367., 0., 0., 0.]) tensor([2367., 2146., 2168., 2215.]) tensor([1., 0., 0., 0.])
    33%|3 | 99/300 [27:19:23<56:29:50, 1011.89s/it]
    [VAL] loss_cls: nan, acc: 0.292
    tensor([467., 0., 0., 0.]) tensor([467., 363., 385., 385.]) tensor([1., 0., 0., 0.])
    WARNING:root:NaN or Inf found in input tensor.  (repeated 12 times)

    conv_lr:0.001 fc8_lr:0.010000000000000002
    Epoch:[100][100/278] data_time:0.126,batch time:1.425 loss:nan loss_recon:nan loss_class:nan accuracy:26.969

    I can't figure out where I went wrong. What should I modify? I'm working on the Kinetics dataset.

    When I trained on UCF101, this didn't happen: I checked for 90 epochs and the pretext-task accuracy kept increasing, whereas here it is stuck at about 26%.

    opened by AKASH2907 2
  • About softmax

    Hi,
    in both the self-supervised and the fine-tuning stages, your work does not apply softmax after the linear layer, whereas some related works do. My question is: during training, what difference does applying softmax make? Best wishes.

    opened by 321hallelujah 1
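
For reference on the softmax question, assuming the training scripts use PyTorch's nn.CrossEntropyLoss: that loss expects raw logits and applies log-softmax internally, so the network normally ends in a plain linear layer and softmax is only needed at inference to read off probabilities. A pure-Python sketch (no PyTorch; the logits are arbitrary) of what changes if softmax is applied before such a loss anyway:

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], target: int) -> float:
    """Per-sample cross-entropy on logits: log-sum-exp minus the target logit (= -log softmax)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


logits = [4.0, 1.0, 0.5, -2.0]
print(cross_entropy(logits, 0))           # loss on raw logits, as intended
print(cross_entropy(softmax(logits), 0))  # "double softmax": inputs squashed into [0, 1]
```

With softmax applied first, the loss inputs are probabilities in [0, 1], so for K classes the loss can never fall below log(e + K - 1) - 1 (about 0.74 for K = 4) and the gradients shrink, which typically slows training rather than helping it.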
  • Error in predict_dataset.py

    I think the logic here is not robust.

    Here are the relevant lines from predict_dataset.py:

    count_need=16
    frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    start = np.random.randint(0, frame_count - count_need + 1)
    

    There is no guarantee that frame_count is at least the default count_need of 16; when it is smaller, this line raises an error:

    File "/raid/home/taoli/exp/prp/datasets/predict_dataset.py", line 222, in loadcvvideo
        start = np.random.randint(0, frame_count - count_need + 1);
      File "mtrand.pyx", line 993, in mtrand.RandomState.randint
    ValueError: low >= high
    

    When I add the following code:

    if frame_count - count_need + 1 < 1:
        start = 0
    else:
        start = np.random.randint(0, frame_count - count_need + 1)
    

    it prints too many reload messages on the command line.

    opened by BestJuly 1
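
Setting start = 0 still leaves fewer than count_need frames for a short video, which is presumably why the reload message keeps repeating. One way to avoid both the ValueError and the reload loop is to keep the random start when the video is long enough and otherwise loop the available frames until count_need indices are collected. Loop padding here is a swapped-in choice, not necessarily what predict_dataset.py intends, and the function name is hypothetical:

```python
import random
from typing import List, Optional


def clip_indices(frame_count: int, count_need: int = 16,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Return exactly count_need frame indices, looping short videos instead of failing."""
    if frame_count <= 0:
        raise ValueError("video has no frames")
    if rng is None:
        rng = random.Random()
    if frame_count >= count_need:
        # Same range as np.random.randint(0, frame_count - count_need + 1).
        start = rng.randrange(frame_count - count_need + 1)
        return list(range(start, start + count_need))
    # Too short: repeat frames 0..frame_count-1 cyclically.
    return [i % frame_count for i in range(count_need)]


print(clip_indices(10, 16))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5]
```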
  • Error When Loading Pretrained Models

    I am getting an error when loading the state_dict. It reports missing keys ("linear.weight", "linear.bias") as well as unexpected keys. See the attached screenshot for details.

    Note that this screenshot is from C3D; a similar error occurs when I try to load R3D or R(2+1)D. Please advise. Thank you in advance.

    opened by shahd-seddik 1
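
A quick way to see which side a state_dict mismatch is on is to compare the two key sets directly. In the sketch below the key names are made up, not the repository's: missing `linear.*` keys would be expected if the checkpoint comes from the pretext model and `linear` is the fine-tuning classifier head, while unexpected keys carrying a `module.` prefix would point to a checkpoint saved through nn.DataParallel.

```python
from typing import Iterable, List, Tuple


def compare_keys(model_keys: Iterable[str], ckpt_keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) as a strict load_state_dict would classify them."""
    model_set, ckpt_set = set(model_keys), set(ckpt_keys)
    missing = sorted(model_set - ckpt_set)     # expected by the model, absent from the checkpoint
    unexpected = sorted(ckpt_set - model_set)  # present in the checkpoint, unknown to the model
    return missing, unexpected


model_keys = ["conv1.weight", "linear.weight", "linear.bias"]
ckpt_keys = ["module.base_network.conv1.weight", "module.fc.weight"]
print(compare_keys(model_keys, ckpt_keys))
```

In practice the two lists would be model.state_dict().keys() and the keys of the dict returned by torch.load.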
  • decoder structure of C3D, R3D, R21D

    Based on your repo, it seems that you use the same decoder structure for all of the backbones (c3d, r3d, r(2+1)d). However, in your paper you appear to use different decoder structures based on the C3D-block, R2D-block and R21D-block.

    We cannot reproduce the results reported in the paper with the current code. Could you also provide your decoder implementations for r3d and r(2+1)d?

    opened by hanwen0529 3
Owner
yuanyao366