Code for our ECCV 2020 paper: Self-supervised Video Representation Learning by Pace Prediction

Overview

Video_Pace

This repository contains the code for the following paper:

Jiangliu Wang, Jianbo Jiao and Yunhui Liu, "Self-Supervised Video Representation Learning by Pace Prediction", In: ECCV (2020).


Main idea:

[Figure: teaser]

Framework:

[Figure: framework]

Requirements

  • pytorch >= 1.3.0
  • tensorboardX
  • cv2
  • scipy

Usage

Data preparation

UCF101 dataset

  • Download the original UCF101 dataset from the official website, then extract RGB images from the videos.
  • Or directly download the pre-processed RGB data of UCF101 provided by feichtenhofer.
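
If you extract the frames yourself, the step can be sketched as follows. This is a minimal sketch, not the authors' script: the helper names `frame_path` and `extract_frames` are hypothetical, and the 1-based `frame{:06}.jpg` naming is assumed from the loader code quoted in the issues further down this page.

```python
import os


def frame_path(out_dir, index):
    """Path of the 1-based frame `index`, e.g. out_dir/frame000001.jpg."""
    return os.path.join(out_dir, "frame{:06}.jpg".format(index))


def extract_frames(video_path, out_dir):
    """Decode `video_path` and write every frame as a JPEG into `out_dir`.

    Returns the number of frames written.
    """
    # Imported here so frame_path() stays usable without OpenCV installed.
    import cv2

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError("cannot open video: " + video_path)

    count = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream or decode failure
            break
        count += 1
        cv2.imwrite(frame_path(out_dir, count), frame)
    cap.release()
    return count
```

Calling `extract_frames` once per video, with one output directory per video, should give a layout that RGB_DIR can point to, provided the directory structure matches what the dataset list files expect.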

Pre-train

Train with the pace prediction task on S3D-G. The default clip length is 64 and the input video size is 224 x 224.

python train.py --rgb_prefix RGB_DIR --gpu 0,1,2,3 --bs 32 --lr 0.001 --height 256 --width 256 --crop_sz 224 --clip_len 64

Train with the pace prediction task on c3d/r3d/r21d. The default clip length is 16 and the input video size is 112 x 112.

python train.py --rgb_prefix RGB_DIR --gpu 0 --bs 30 --lr 0.001 --model c3d/r3d/r21d --height 128 --width 171 --crop_sz 112 --clip_len 16

Evaluation

To be updated...

Citation

If you find this work useful or use our code, please consider citing:

@InProceedings{Wang20,
  author       = "Jiangliu Wang and Jianbo Jiao and Yunhui Liu",
  title        = "Self-Supervised Video Representation Learning by Pace Prediction",
  booktitle    = "European Conference on Computer Vision",
  year         = "2020",
}

Acknowledgement

Part of our code is adapted from S3D-G HowTO100M; we thank the authors for their contributions.

Comments
  • Cannot reproduce the supervised performance on UCF101

    Thank you very much for your inspiring work. However, I encountered a problem when reproducing the performance. I followed your code for self-supervised learning and got about 60-70% accuracy in pace prediction. However, when I froze the conv weights and trained only the final FC layer for supervised learning, I got only 0.10 average accuracy on the training set. When training the final FC layer, I used the same data augmentation as in self-supervised learning, as your paper describes. Could you please tell me more about the fine-tuning details?

    opened by KT27-A 5
  • Code verify

    def loop_load_rgb(self, video_dir, start_frame, sample_rate, clip_len,
                      num_frames):
        video_clip = []
        idx = 0

        for i in range(clip_len):
            # Frames are stored as frame000001.jpg, frame000002.jpg, ...
            cur_img_path = os.path.join(
                video_dir,
                "frame" + "{:06}.jpg".format(start_frame + idx * sample_rate))

            img = cv2.imread(cur_img_path)
            video_clip.append(img)

            # If the next frame would fall past the end of the video,
            # wrap around and continue from the first frame.
            if (start_frame + (idx + 1) * sample_rate) > num_frames:
                start_frame = 1  # <-- line in question
                idx = 0
            else:
                idx += 1
    

    Why are the missing frames filled by starting again from the beginning of the video? Should the start_frame = 1 line be commented out?
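
To make the behaviour in question concrete, the index arithmetic of the quoted loop can be isolated from the image loading. This is only a sketch for inspection; `loop_frame_indices` is a hypothetical helper, not part of the repository.

```python
def loop_frame_indices(start_frame, sample_rate, clip_len, num_frames):
    """Return the 1-based frame numbers that the quoted loop would read."""
    indices = []
    idx = 0
    for _ in range(clip_len):
        indices.append(start_frame + idx * sample_rate)
        if start_frame + (idx + 1) * sample_rate > num_frames:
            # Next frame would fall past the end: wrap to the first frame.
            start_frame = 1
            idx = 0
        else:
            idx += 1
    return indices


print(loop_frame_indices(start_frame=5, sample_rate=4, clip_len=6, num_frames=16))
# [5, 9, 13, 1, 5, 9]
```

With the reset in place, the clip continues from frame 1 once the video runs out. If the `start_frame = 1` line were removed and only `idx` were reset, the same call would instead repeat from the original start frame (5, 9, 13, 5, 9, 13).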

    opened by jaypatravali 5
  • error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize'

    I got an error like this: cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-6sxsq0tp\opencv\modules\imgproc\src\resize.cpp:3929: error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize', and I don't know how to address it. Could you help me?
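
This assertion is commonly raised when the image handed to cv2.resize is empty, which is what happens when cv2.imread cannot read a file and returns None. One likely cause, then, is a missing or misnamed frame file. The sketch below checks a video directory for missing frames before training; `find_missing_frames` is a hypothetical helper, and the `frame{:06}.jpg` naming is assumed from the loader code quoted in the issue above.

```python
import os


def find_missing_frames(video_dir, num_frames):
    """List expected frame files (frame000001.jpg ...) that are not on disk."""
    missing = []
    for n in range(1, num_frames + 1):
        path = os.path.join(video_dir, "frame{:06}.jpg".format(n))
        if not os.path.isfile(path):
            missing.append(path)
    return missing
```

An empty result means every expected frame exists; any returned path points to a file that cv2.imread would fail to load.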

    opened by loovi7 2
  • About the epoch number

    Hi, thank you for your work. I have a question about the epoch number in your paper.

    While when pretraining on UCF101 dataset, as it only contains around 9k videos in the training split, we set epoch size to be around 90k for temporal jittering following [1].

    I found in [1], there are some descriptions which you might refer to:

    For inference on the downstream tasks, we uniformly sample 10 clips per testing example and average their predictions to make a video-level prediction.

    It seems strange to use self.rgb_lines = list(lines) * 10 in ucf101.py and state that the total epoch number is 18. If video clips are sampled randomly along the temporal axis, using 180 epochs would have the same effect.

    Therefore, my question is: why not just train for 180 epochs and apply temporal jittering each time a clip is sampled? Using self.rgb_lines = list(lines) and setting the epoch number to 180 would make the code clearer. There might be some reasons or tricks that I have not noticed. Thank you in advance.
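
The arithmetic behind this question can be checked in a few lines. This is an illustrative sketch, not the authors' explanation: the list of 9000 names is a stand-in for the roughly 9k training videos mentioned above.

```python
# Stand-in for the ~9k entries of the UCF101 training split.
lines = ["video_{:04d}".format(i) for i in range(9000)]

rgb_lines = list(lines) * 10   # the list repeated 10 times back to back
epoch_size = len(rgb_lines)    # samples drawn per epoch
passes_over_data = 18 * 10     # 18 epochs, each covering every video 10 times

print(epoch_size, passes_over_data)  # 90000 180
```

Under random temporal sampling, 18 epochs over the repeated list visit each video as many times as 180 epochs over the plain list; what differs is how often per-epoch events such as checkpointing or learning-rate steps occur.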

    opened by BestJuly 2
  • How to preprocess data?

    I want to try this with a custom video dataset. How can I do it? I have the following questions.

    1. What should be the maximum length of a single training video clip? - Is it similar to the UCF101 dataset or we can use longer videos?
    2. Should each video be saved as image frames?
    3. What is the framerate?
    opened by shamanez 1
  • Request for explanation

    Hi Wang! Thanks for your wonderful work. I looked through your code, but I can't find the contrastive loss part. Could you explain your code? I can only see the classification cross-entropy loss. Thank you.

    opened by youwantsy 0
  • Image preprocessing

    https://github.com/laura-wang/video-pace/blob/master/datasets/ucf101.py#L82

    Why use ClipResize((128,171)) instead of (128, 128) in the preprocessing stage?
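
One plausible reason, not confirmed by the authors, is that 128 x 171 roughly preserves the 4:3 aspect ratio of the native 320 x 240 UCF101 frames, so a square 112 x 112 crop is then taken from an undistorted image. The sketch below compares the ratios and picks a random crop window; `random_crop_box` is a hypothetical helper, not the repository's transform.

```python
import random


def random_crop_box(height, width, crop_sz, rng=random):
    """Pick a crop_sz x crop_sz window; returns (top, left, bottom, right)."""
    top = rng.randint(0, height - crop_sz)
    left = rng.randint(0, width - crop_sz)
    return top, left, top + crop_sz, left + crop_sz


src_ratio = 320 / 240        # native UCF101 frame size (width / height)
resized_ratio = 171 / 128    # --width 171 --height 128
square_ratio = 128 / 128     # what a (128, 128) resize would give

print(round(src_ratio, 3), round(resized_ratio, 3), square_ratio)  # 1.333 1.336 1.0
print(random_crop_box(128, 171, 112, random.Random(0)))
```

Resizing to (128, 128) would squeeze the frame horizontally by about 25%, whereas 128 x 171 changes the aspect ratio by well under 1%.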

    opened by hanwen0529 0
  • Sampling rate

    Hello. Thanks for the great work.

    I noticed that the maximum sampling rate for your implementation is 4 which seems to enable the faster sampling rates.

    Can you provide how you designed the possible pace candidates? (Pace lists) And if possible, with the updated implementations.
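
For readers unfamiliar with the setup, pace prediction can be framed as classification over a small set of sampling rates. The sketch below assumes the candidates are the integers 1 to 4 (the maximum noted above) and that the class label is the rate minus one; both are assumptions for illustration, not the authors' confirmed pace list, and `sample_pace` is a hypothetical helper.

```python
import random


def sample_pace(max_sr=4, rng=random):
    """Draw a sampling rate in [1, max_sr] and the class label paired with it."""
    sample_rate = rng.randint(1, max_sr)  # 1 = normal pace, larger = faster
    label = sample_rate - 1               # zero-based target for cross-entropy
    return sample_rate, label


rng = random.Random(0)
print([sample_pace(rng=rng) for _ in range(5)])
```

A clip is then built by reading every `sample_rate`-th frame, and the network is trained to recover `label` from the clip.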

    Thank you.

    opened by wjun0830 0
  • The Evaluation Code

    Hello Laura

    Could you please post the evaluation code? Some details are missing from the paper. For example, let's say the network takes a 16-frame clip as input; then the ideal testing video length is 160 frames.

    1. What is the exact sampling method when evaluating on the testing videos for action recognition?
    2. If the testing video is too short, say 24 frames (so there is no way to sample 10 unique clips from it), how do you handle this case? For example, do you discard short videos or do you pad them?
    3. If the testing video is longer than the required length, say 300 frames, what are the sampling indices of the 10 clips?
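
Since the evaluation code has not been released, the following is only one possible convention for the three cases above, not the authors' method. It spreads the clip start frames evenly between the first frame and the last start that still fits a full clip; `uniform_clip_starts` is a hypothetical helper.

```python
def uniform_clip_starts(num_frames, clip_len=16, num_clips=10):
    """Return 1-based start frames of num_clips clips spread evenly over a video."""
    last_start = max(num_frames - clip_len + 1, 1)  # latest start that still fits
    if num_clips == 1:
        return [1]
    step = (last_start - 1) / (num_clips - 1)
    return [1 + round(i * step) for i in range(num_clips)]


print(uniform_clip_starts(300))  # long video: starts run from 1 to 285
print(uniform_clip_starts(24))   # short video: clips overlap heavily
```

Under this convention short videos are neither discarded nor padded: their clips simply overlap, and a video shorter than `clip_len` yields ten clips that all start at frame 1, which would then rely on a wrap-around loader to fill the remaining frames.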

    Thanks

    opened by Hussein-A-Hassan 0
  • Request for pretrained model

    Hello Laura, thanks for sharing your wonderful work! I wonder if you could provide me with your pretrained model (ideally one pretrained with the prediction task only)? Please pardon my question if it is not proper from your perspective.

    Best regards, Sean

    opened by GSeanCDAT 2
  • What does self.rgb_lines = list(lines) * 10 mean?

    Hi,

    Thank you for your great work.

    While I want to apply this method on my custom data, I found this line in ucf101.py.

    Why do you multiply by 10 here?

    Thank you

    opened by litingfeng 1
Owner
Jiangliu Wang
Postdoc CUHK /CUHK T Stone Robotics Institute