Video Representation Learning by Recognizing Temporal Transformations. In ECCV, 2020.

Overview

Simon Jenni, Givi Meishvili, and Paolo Favaro.
In ECCV, 2020.

This repository contains code for self-supervised pre-training on UCF101 and supervised transfer learning on the UCF101 and HMDB51 action recognition benchmarks.

Requirements

The code is based on Python 3.7 and TensorFlow 1.15.

How to use it

1. Setup

python init_datasets.py

2. Training and evaluation

  • To train and evaluate a model using the C3D architecture, execute train_test_C3D.py. An example usage could look like this:
python train_test_C3D.py --tag='test' --num_gpus=2
Comments
  • Cannot reproduce linear evaluation performance on UCF-101

    Dear friend, thank you very much for your work; I really learned a lot from it. It is impressive that, after training only on speed prediction for 50 epochs, the model reaches 49.3% accuracy on UCF-101 with linear evaluation. I have been trying to reproduce this result in PyTorch following your code, but I only get 15% accuracy on UCF-101 with linear evaluation. Could you please give me some advice on how to reach the reported performance? I have checked many times that I followed your code, but I may have overlooked something important. Thank you very much.

    opened by KT27-A 7
  • About two softmax outputs

    Hi, for SSL training, your paper describes using two softmax outputs for the pseudo-task. However, in this code it seems you use tf.split to split the prediction (e.g. 8 classes in total: 4 for skip and the other 4 for transforms), which leads to only one softmax. What is the difference between the two, or is this just my misunderstanding? Looking forward to your reply.

    opened by 321hallelujah 2
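To make the question above concrete, here is a minimal sketch in plain Python (not the repository's TensorFlow code) contrasting the two readings: splitting an 8-way logit vector into two groups of 4 and normalizing each group separately, versus applying a single softmax over all 8 values. The 4/4 split comes from the question's example; the helper names and the logit values are assumptions made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def split_softmax(logits, sizes):
    """Split logits into consecutive groups and apply softmax per group."""
    assert sum(sizes) == len(logits)
    out, start = [], 0
    for size in sizes:
        out.append(softmax(logits[start:start + size]))
        start += size
    return out

# Hypothetical 8-way prediction: first 4 entries for "skip", last 4 for "transforms".
logits = [2.0, 0.5, -1.0, 0.0, 1.0, 1.0, 3.0, -2.0]

# Reading 1: split first, then normalize each group (two distributions).
skip_probs, transform_probs = split_softmax(logits, [4, 4])

# Reading 2: one softmax over all 8 entries (a single distribution).
joint_probs = softmax(logits)
```

Under the first reading each group sums to 1 on its own, so a single 8-dimensional prediction still yields two softmax outputs. Which of the two the repository actually computes is for its authors to confirm.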
  • About training logs.

    Hi Jenni, I'm the author of VTDL. I'm preparing my manuscript and aim to compare with your work. It seems your R(2+1)D model can achieve 46.4 with 112 resolution input, which is very surprising. Could you provide the training logs or pretrained weights for your experiments?

    opened by FingerRec 1
  • Sampling Technique

    Hello, and thank you for sharing your work and knowledge. I am sorry for asking these questions, but I am not familiar with TensorFlow at all.

    Could you please clarify the following:

    1- During training on the downstream task (action recognition), did you sample one clip from each training video using a random starting index? If yes, then the number of training clips per epoch would equal the size of the training split.

    Or did you use temporal jittering during training? If yes, how many clips did you sample from each training video, and what is the size of one epoch then?

    2- For downstream task evaluation, you mention in the paper that you used all the subsequences of each test video in the test split to get the video-level prediction. What if the test video length is not divisible by the clip length, so that there are extra frames that are not enough to form one clip? How do you handle this?

    For example, when the test video has 173 frames and the clip length is 16 frames, 10 non-overlapping clips can be sampled and 13 extra frames are left over, which is not enough for another clip.

    Thanks for your help

    opened by Hussein-A-Hassan 1
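The arithmetic in the example from question 2 can be checked with a short sketch. The function names below are hypothetical. The second function shows one common workaround: anchoring a final clip at the end of the video so that it overlaps the previous one. This is an assumption for illustration, not necessarily what the authors do.

```python
def non_overlapping_clips(num_frames, clip_len):
    """Return (number of full clips, leftover frames)."""
    return divmod(num_frames, clip_len)

def clip_starts_covering_all(num_frames, clip_len):
    """Start indices of non-overlapping clips, plus one extra clip aligned
    to the end of the video if frames are left over (assumed workaround)."""
    if num_frames < clip_len:
        return []
    starts = list(range(0, num_frames - clip_len + 1, clip_len))
    last = num_frames - clip_len
    if starts[-1] != last:
        starts.append(last)  # overlaps the previous clip, covers the tail
    return starts

# The case from the question: 173 frames, 16-frame clips.
n_clips, leftover = non_overlapping_clips(173, 16)
starts = clip_starts_covering_all(173, 16)
```

With this workaround every frame contributes to at least one clip, at the cost of counting a few frames twice in the averaged video-level prediction.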
  • Extremely Slow Training

    Thanks for releasing the temporal-ssl codebase. I ran the temporal-ssl training with the command python train_test_C3D.py --tag='test' --num_gpus=2 and found that training is extremely slow on an 8-TitanXp server: one iteration takes several minutes.

    How long does it take to train temporal-ssl on your platform? Would you please share the training log?

    opened by kennymckormick 1
  • R(2+1)D backbone

    We noticed in your paper that you got an amazing result with the R(2+1)D backbone. Could you publish your implementation of the R(2+1)D model as well as the best checkpoint? Thanks!

    opened by hanwen0529 1