[CVPR2021] The source code for our paper "Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning".

Overview

TBE

The source code for our paper "Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning" [arxiv] [code] [Project Website]


Citation

@inproceedings{wang2021removing,
  title={Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning},
  author={Wang, Jinpeng and Gao, Yuting and Li, Ke and Lin, Yiqi and Ma, Andy J and Cheng, Hao and Peng, Pai and Ji, Rongrong and Sun, Xing},
  booktitle={CVPR},
  year={2021}
}

News

[2020.3.7] The first version of TBE is released!

0. Motivation

  • In the camera-fixed situation, the static background in most frames remains similar in pixel distribution.

  • We ask the model to be temporally sensitive rather than statically sensitive.

  • We ask the model to filter out the additive Background Noise, which means erasing the background in each frame of the video.

Activation Map Visualization of BE

GIF

More hard examples

1. Plug BE into any self-supervised learning method in two steps

The implementation of BE is very simple; you can implement it in two lines of Python:

rand_index = random.randint(0, t - 1)
mixed_x[j] = (1 - prob) * x + prob * x[rand_index]

Then, you just need to define a loss function such as MSE:

loss = MSE(F(mixed_x),F(x))
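
For reference, here is a minimal self-contained PyTorch sketch of the two steps above (the names background_erase, clip and prob are ours for illustration, not the repository's; the actual implementation lives in src/Contrastive/augmentation):

import torch
import torch.nn as nn
import torch.nn.functional as F

def background_erase(clip, prob=0.3):
    # BE augmentation: mix one randomly chosen frame into every frame of the clip,
    # acting as additive "background noise". clip has shape [C, T, H, W].
    t = clip.size(1)
    rand_index = torch.randint(0, t, (1,)).item()
    static_frame = clip[:, rand_index:rand_index + 1]   # [C, 1, H, W], broadcast over time
    return (1 - prob) * clip + prob * static_frame

# Toy encoder standing in for C3D/I3D/R3D, only to make the sketch runnable.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 32 * 32, 128))

clip = torch.randn(3, 16, 32, 32)                       # [C, T, H, W]
mixed = background_erase(clip)
loss = F.mse_loss(encoder(mixed.unsqueeze(0)), encoder(clip.unsqueeze(0)))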

2. Installation

Dataset Prepare

Please refer to [dataset.md] for details.

Requirements

  • Python 3
  • PyTorch 1.1+
  • PIL
  • Intel (on-the-fly decode)
  • skvideo.io
  • matplotlib (gradient_check)

As the Kinetics dataset is I/O-intensive, we decode the avi/mpeg videos on the fly. Please refer to data/video_dataset.py for details.
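
As a rough illustration only (not the repository's exact loader), an on-the-fly decoder built on skvideo.io could look like the sketch below; load_clip and its arguments are hypothetical names:

import numpy as np
import skvideo.io

def load_clip(video_path, clip_len=16):
    # Decode the whole video on the fly and sample a random fixed-length clip.
    frames = skvideo.io.vread(video_path)               # ndarray of shape [T, H, W, C]
    t = frames.shape[0]
    start = np.random.randint(0, max(t - clip_len, 1))  # random temporal crop
    clip = frames[start:start + clip_len]
    return clip.transpose(3, 0, 1, 2)                   # -> [C, T, H, W] for 3D CNNs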

3. Structure

  • datasets
    • list
      • hmdb51: the train/val lists of HMDB51/Actor-HMDB51
      • hmdb51_sta: the train/val lists of HMDB51_STA
      • ucf101: the train/val lists of UCF101
      • kinetics-400: the train/val lists of kinetics-400
      • diving48: the train/val lists of diving48
  • experiments
    • logs: detailed experiment records, including logs and trained models
    • gradientes:
    • visualization:
    • pretrained_model:
  • src
    • Contrastive
      • data: load data
      • loss: the losses evaluated in this paper
      • model: network architectures
      • scripts: train/eval scripts
      • augmentation: detailed implementation of the BE augmentation
      • utils
      • feature_extract.py: feature extractor given a pretrained model
      • main.py: the main function of pretrain / finetune
      • trainer.py
      • option.py
      • pt.py: BE pretrain
      • ft.py: BE finetune
    • Pretext
      • main.py: the main function of pretrain / finetune
      • loss: the losses, including the classification loss

4. Run

(1). Download dataset lists and pretrained model

A copy of both dataset lists is provided in anonymous. The Kinetics-pretrained models are provided in anonymous.

cd .. && mkdir datasets
mv [path_to_lists] datasets/
mkdir experiments && cd experiments
mkdir pretrained_model logs
mv [path_to_pretrained_model] ../experiments/pretrained_model

Download and extract frames of Actor-HMDB51.

wget -c  anonymous
unzip
python utils/data_process/gen_hmdb51_dir.py
python utils/data_process/gen_hmdb51_frames.py

(2). Network Architecture

The network architectures are defined in src/model/[].py

Method   #logits_channel
C3D      512
R2P1D    2048
I3D      1024
R3D      2048

All the logits channels are fed into an FC layer with a 128-D output.
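
As a small sketch of what this means (512 corresponds to C3D in the table above; the other backbones only change logits_channel):

import torch
import torch.nn as nn

logits_channel = 512                        # C3D; use 2048 for R2P1D/R3D, 1024 for I3D
projection = nn.Linear(logits_channel, 128) # FC layer producing the 128-D embedding

features = torch.randn(8, logits_channel)   # a batch of backbone logits
embeddings = projection(features)           # [8, 128], used by the contrastive loss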

For simplicity, we divide the source into Contrastive and Pretext; "--method pt_and_ft" means pretrain and finetune in one run.

Action Recognition

Random Initialization

For the random initialization baseline, just comment out --ft_weights (line 11 of ft.sh), as below:

#!/usr/bin/env bash
python main.py \
--method ft --arch i3d \
--ft_train_list ../datasets/lists/diving48/diving48_v2_train_no_front.txt \
--ft_val_list ../datasets/lists/diving48/diving48_v2_test_no_front.txt \
--ft_root /data1/DataSet/Diving48/rgb_frames/ \
--ft_dataset diving48 --ft_mode rgb \
--ft_lr 0.001 --ft_lr_steps 10 20 25 30 35 40 --ft_epochs 45 --ft_batch_size 4 \
--ft_data_length 64 --ft_spatial_size 224 --ft_workers 4 --ft_stride 1 --ft_dropout 0.5 \
--ft_print-freq 100 --ft_fixed 0 # \
# --ft_weights ../experiments/kinetics_contrastive.pth

BE (Contrastive)

Kinetics
bash scripts/kinetics/pt_and_ft.sh
UCF101
bash scripts/ucf101/ucf101.sh
Diving48
bash scripts/Diving48/diving48.sh

For Triplet loss optimization and the MoCo baseline, just modify --pt_method:

BE (Triplet)

--pt_method be_triplet

BE (Pretext)

bash scripts/hmdb51/i3d_pt_and_ft_flip_cls.sh

or

bash scripts/hmdb51/c3d_pt_and_ft_flip.sh

Notice: more training options and ablation studies can be found in scripts.

Video Retrieval and Other Visualizations

(1). Feature Extractor

As STCR can be easily extended to other video representation tasks, we offer scripts to perform feature extraction.

python feature_extractor.py

The features will be saved as a single numpy file with shape [video_nums, features_dim] for further visualization.

(2). Retrieval Evaluation

Modify lines 60-62 in reterival.py.

python reterival.py
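
For reference, recall@k retrieval over such feature files is typically computed as in the sketch below (the feature/label arrays here are random placeholders; the exact protocol and file paths used by reterival.py may differ):

import numpy as np

def recall_at_k(query_feat, query_label, gallery_feat, gallery_label, topk=(1, 5, 10, 20, 50)):
    # A query counts as correct at k if any of its top-k gallery neighbours
    # (by cosine similarity) shares its class label.
    q = query_feat / np.linalg.norm(query_feat, axis=1, keepdims=True)
    g = gallery_feat / np.linalg.norm(gallery_feat, axis=1, keepdims=True)
    order = np.argsort(-(q @ g.T), axis=1)               # gallery indices sorted by similarity
    hits = gallery_label[order] == query_label[:, None]
    return {k: float(hits[:, :k].any(axis=1).mean()) for k in topk}

# Placeholder data standing in for features saved by the extractor ([video_nums, features_dim]).
query_feat, gallery_feat = np.random.randn(20, 128), np.random.randn(100, 128)
query_label, gallery_label = np.random.randint(0, 5, 20), np.random.randint(0, 5, 100)
print(recall_at_k(query_feat, query_label, gallery_feat, gallery_label))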

Results

Action Recognition

Kinetics Pretrained (I3D)

Method                  UCF101  HMDB51  Diving48
Random Initialization   57.9    29.6    17.4
MoCo Baseline           70.4    36.3    47.9
BE                      86.5    56.2    62.6

Video Retrieval (HMDB51-C3D)

Method  @1    @5    @10   @20   @50
BE      10.2  27.6  40.5  56.2  76.6

More Visualization

t-SNE

Please refer to utils/visualization/t_SNE_Visualization.py for details.
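
A minimal sketch of such a t-SNE plot with scikit-learn, assuming the extracted features and labels were saved as numpy arrays (features.npy and labels.npy are placeholder names):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.load("features.npy")   # [video_nums, features_dim], placeholder path
labels = np.load("labels.npy")       # [video_nums] class ids, placeholder path

embedded = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab20", s=5)
plt.savefig("tsne.png", dpi=200)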

Confusion Matrix

Please refer to utils/visualization/confusion_matrix.py for details.

Acknowledgement

This work is partly based on UEL and MoCo.

License

The code is released under the CC-BY-NC 4.0 license.

Comments
  • About the setting

    Hi, the default setting in the code is different from the paper, the lr for example. So which setting should I apply? I tried the default setting in the code and got a fine-tune best_prec1 of around 0.456 on UCF101. I want to know what causes this result: something I did wrong, or just the setting?

    opened by gazelxu 3
  • Question about model assessment

    In section 4.5, "How does Background Erasing Work", the author first trained an I3D model with static videos generated from HMDB51 and drew the classification top-1 accuracy of each class. But in our experiments, we found the validation accuracy of each class is similar due to the small size of the validation set.

    Thus, we wonder whether the classification top-1 accuracy results in Fig. 4 are reported on the train set or the validation set; they are evenly distributed on the horizontal axis. It would be appreciated if more information about the experiment in section 4.5 could be provided.

    • Here is our experiment result, where the average top-1 acc is 15.82.
    class          val acc  train acc
    0	===>	0.00,	97.14
    47	===>	0.00,	93.55
    43	===>	0.00,	100.00
    38	===>	0.00,	91.89
    37	===>	0.00,	76.47
    36	===>	0.00,	92.50
    27	===>	0.00,	69.70
    20	===>	0.00,	94.12
    19	===>	0.00,	97.06
    14	===>	0.00,	81.08
    13	===>	0.00,	96.88
    12	===>	0.00,	65.71
    50	===>	0.00,	84.21
    1	===>	0.00,	83.87
    4	===>	0.00,	94.59
    11	===>	0.00,	84.38
    31	===>	6.67,	87.80
    33	===>	6.67,	91.67
    39	===>	6.67,	85.71
    24	===>	6.67,	94.74
    5	===>	6.67,	93.33
    21	===>	6.67,	93.33
    8	===>	6.67,	85.37
    41	===>	6.67,	96.15
    17	===>	6.67,	78.38
    16	===>	6.67,	83.33
    48	===>	6.67,	84.09
    7	===>	6.67,	94.29
    45	===>	6.67,	90.00
    42	===>	13.33,	81.58
    26	===>	13.33,	100.00
    28	===>	13.33,	91.18
    49	===>	13.33,	88.89
    46	===>	13.33,	81.82
    40	===>	13.33,	85.71
    32	===>	20.00,	83.33
    6	===>	20.00,	93.94
    34	===>	20.00,	94.74
    23	===>	20.00,	90.62
    22	===>	20.00,	80.00
    29	===>	20.00,	100.00
    25	===>	26.67,	94.87
    9	===>	40.00,	93.94
    3	===>	40.00,	93.55
    10	===>	40.00,	85.71
    2	===>	46.67,	93.55
    44	===>	46.67,	83.78
    35	===>	53.33,	94.59
    18	===>	53.33,	93.94
    15	===>	53.33,	100.00
    30	===>	66.67,	93.55
    
    opened by XinyuSun 3
  • Question about the motivation of BE

    Hi jinpeng,

    I noticed that the "pt_spatial_size" (i.e., cropping size) for the anchor and the positive is large (112/128 or 224/256) in the train-transform code, which means that they will overlap greatly after cropping. The motivation of BE is to force the model to focus on dynamic motion information by creating a distracting anchor for the positive, but large overlap will naturally make the distance of their representations close, because the model could easily pull the anchor and the positive together by only focusing on the shared appearance/background information in the overlapped area, without needing to capture the dynamic motion information. What do you think of this?

    Thanks!

    Best, Licai

    opened by youcaiSUN 3
  • About the package 'bk'

    Hi, when I train this project, the error 'No module named bk.inpainting_src.spatial_inpainting' still occurs although I have already run 'pip install bk' in advance. So, could you please tell me how I can find the package bk?
    Thank you very much! I am looking forward to your reply.
    
    opened by zhenweibao 1
  • About computational cost

    Hi, thanks for sharing your code. Could you please tell me how long it takes to train an I3D pre-training model on the K400 dataset for 50 epochs? Which GPUs, and how many, did you train on? Thank you.

    opened by ZMHH-H 1
  • Where can I find package "bk"?

    I tried to run "scripts/ucf101/pt_and_tf.sh", but I got the following error message.

    [Error Message]: "from bk.inpainting_src.spatial_inpainting import PatchDis, PatchPaint ModuleNotFoundError: No module named 'bk.inpainting_src'" I couldn't find the "bk" package in your code. Can you help me figure out this problem?

    opened by idealwei 1
  • About the I3D backbone

    Hi all, I am quite interested in your work. When I dove into the code, I was puzzled by the setting (https://github.com/FingerRec/BE/blob/632c3aa0eaa3acc24a545ec05a9a36f96592cb2c/src/Contrastive/model/i3d.py#L293) in i3d.py. The kernel here is set to (7,1,1), but the dimension of the pretrained feature is already B * C * 1 * 1 * 1. What's the purpose here? Looking forward to your reply.

    Best.

    opened by Mark12Ding 1
  • Where can I find the code of augment.optical_flow?

    https://github.com/FingerRec/BE/blob/632c3aa0eaa3acc24a545ec05a9a36f96592cb2c/src/Contrastive/data/dataset.py#L10 ModuleNotFoundError: No module named 'augment.optical_flow'. Is this code missing?

    opened by 0HaNC 1
  • Could you provide R3D backbone performance in (MoCo)?

    Thank you for your great research, it's so interesting. In Table 1, I can see MoCo's performance with various backbone models, but I can't find the MoCo method's UCF101-pretrained R3D performance. Could you provide that MoCo R3D model's performance when fine-tuning on UCF101 and HMDB51?

    thanks.

    opened by youwantsy 0