# PyTorch Code for VideoLT
[Website](https://videolt.github.io/) | [Paper](https://arxiv.org/abs/2105.02668)
## Updates
- [10/29/2021] Features uploaded to Google Drive. For access, please send us an e-mail: zhangxing18 at fudan.edu.cn
- [09/28/2021] Features uploaded to Aliyun Drive (deprecated). For access, please send us an e-mail: zhangxing18 at fudan.edu.cn
- [08/23/2021] Checkpoint links uploaded. Sorry, we are handling a campus network bandwidth limitation; the dataset will be released this week.
- [08/15/2021] Code released. Dataset download links and checkpoint links will be updated in a week.
- [07/29/2021] Dataset released; visit https://videolt.github.io/ to download.
- [07/23/2021] VideoLT is accepted by ICCV 2021.
## Overview
VideoLT is a large-scale long-tailed video recognition dataset, built as a step toward real-world video recognition. This repo provides the VideoLT dataset and long-tailed baselines, including re-sampling (CBS, SRS), data augmentation (mixup, FrameStack), and re-weighted losses (LDAM, EQL, CBLoss, FocalLoss).
## Data Preparation
Please visit https://videolt.github.io/ to obtain download links. We provide raw videos and extracted features.

To use the extracted features, modify `dataset/dutils.py` and set the correct path to the features.
## Model Zoo
The baseline scripts and checkpoints are provided in MODELZOO.md.
## FrameStack
FrameStack is a simple yet effective approach for long-tailed video recognition: it re-samples training data at the frame level, using a dynamic sampling strategy based on knowledge learned by the network. The rationale behind FrameStack is to dynamically sample more frames from videos in tail classes and fewer frames from videos in head classes.
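To make the idea concrete, here is a minimal sketch of frame-budget assignment. It is not the repository's implementation (FrameStack adapts the budget using knowledge learned by the network during training); this toy version uses static class frequency instead, and all names and numbers are hypothetical:

```python
def frames_for_class(class_count, min_count, max_count,
                     min_frames=8, max_frames=60):
    """Toy frame budget: rarer classes get more frames per video."""
    # Normalize the class size to [0, 1]: 0 = rarest class, 1 = most frequent.
    freq = (class_count - min_count) / max(max_count - min_count, 1)
    # Head classes (freq near 1) get min_frames; tail classes get max_frames.
    return round(max_frames - freq * (max_frames - min_frames))

counts = {'head_class': 10000, 'tail_class': 50}  # hypothetical label counts
lo, hi = min(counts.values()), max(counts.values())
for name, count in counts.items():
    print(name, frames_for_class(count, lo, hi))
# head_class 8
# tail_class 60
```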
## Usage
### Requirement

```
pip install -r requirements.txt
```
### Prepare Data Path
- Modify `FEATURE_NAME`, `PATH_TO_FEATURE` and `FEATURE_DIM` in `dataset/dutils.py` (a sketch of the edited values follows the directory listing below).
- Set `ROOT` in `dataset/dutils.py` to the `labels` folder. The directory structure is:
```
labels
|-- count-labels-train.lst
|-- test.lst
|-- test_videofolder.txt
|-- train.lst
|-- train_videofolder.txt
|-- val_videofolder.txt
`-- validate.lst
```
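For reference, after editing, the relevant constants in `dataset/dutils.py` might look like the sketch below. The exact variable layout in the repo may differ, and all paths here are placeholders:

```python
# dataset/dutils.py (illustrative excerpt; paths are placeholders)
FEATURE_NAME = 'ResNet101'                            # which extracted feature set to use
PATH_TO_FEATURE = '/data/VideoLT/features/ResNet101'  # where the extracted features live
FEATURE_DIM = 2048                                    # per-frame feature dimensionality
ROOT = '/data/VideoLT/labels'                         # the labels folder shown above
```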
### Train
We provide scripts for training. Please refer to MODELZOO.md.
Example training script:
```bash
FEATURE_NAME='ResNet101'

export CUDA_VISIBLE_DEVICES='2'
python base_main.py \
     --augment "mixup" \
     --feature_name $FEATURE_NAME \
     --lr 0.0001 \
     --gd 20 --lr_steps 30 60 --epochs 100 \
     --batch-size 128 -j 16 \
     --eval-freq 5 \
     --print-freq 20 \
     --root_log=$FEATURE_NAME-log \
     --root_model=$FEATURE_NAME'-checkpoints' \
     --store_name=$FEATURE_NAME'_bs128_lr0.0001_lateavg_mixup' \
     --num_class=1004 \
     --model_name=NonlinearClassifier \
     --train_num_frames=60 \
     --val_num_frames=150 \
     --loss_func=BCELoss
```
Note: set `args.resample`, `args.augment` and `args.loss_func` to apply different long-tailed strategies; they can be combined (see the example after the option list below).
Options:

```
args.resample:  ['None', 'CBS', 'SRS']
args.augment:   ['None', 'mixup', 'FrameStack']
args.loss_func: ['BCELoss', 'LDAM', 'EQL', 'CBLoss', 'FocalLoss']
```
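As an example of combining strategies, the command below mirrors the training script above but swaps in square-root sampling, FrameStack augmentation, and the LDAM loss. It assumes the resampling option is exposed on the command line as `--resample`, matching `args.resample`; the `--store_name` value is just an illustrative label:

```bash
FEATURE_NAME='ResNet101'

export CUDA_VISIBLE_DEVICES='2'
python base_main.py \
     --resample "SRS" \
     --augment "FrameStack" \
     --feature_name $FEATURE_NAME \
     --lr 0.0001 \
     --gd 20 --lr_steps 30 60 --epochs 100 \
     --batch-size 128 -j 16 \
     --eval-freq 5 \
     --print-freq 20 \
     --root_log=$FEATURE_NAME-log \
     --root_model=$FEATURE_NAME'-checkpoints' \
     --store_name=$FEATURE_NAME'_bs128_lr0.0001_SRS_framestack_LDAM' \
     --num_class=1004 \
     --model_name=NonlinearClassifier \
     --train_num_frames=60 \
     --val_num_frames=150 \
     --loss_func=LDAM
```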
### Test
We provide scripts for testing in `scripts`. Modify `CKPT` to point to your saved checkpoint.
Example testing script:
```bash
FEATURE_NAME='ResNet101'
CKPT='VideoLT_checkpoints/ResNet-101/ResNet101_bs128_lr0.0001_lateavg_mixup/ckpt.best.pth.tar'

export CUDA_VISIBLE_DEVICES='1'
python base_test.py \
     --resume $CKPT \
     --feature_name $FEATURE_NAME \
     --batch-size 128 -j 16 \
     --print-freq 20 \
     --num_class=1004 \
     --model_name=NonlinearClassifier \
     --train_num_frames=60 \
     --val_num_frames=150 \
     --loss_func=BCELoss
```
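If you want to inspect a saved checkpoint outside of `base_test.py`, it can be loaded like any PyTorch checkpoint. This is a generic sketch; the key names in the comment are an assumption about the checkpoint layout, which may vary:

```python
import torch

# Load the checkpoint on CPU (the path follows the pattern saved by training).
ckpt = torch.load(
    'VideoLT_checkpoints/ResNet-101/ResNet101_bs128_lr0.0001_lateavg_mixup/ckpt.best.pth.tar',
    map_location='cpu')
print(list(ckpt.keys()))  # typically something like ['state_dict', 'epoch', ...]
```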
## Citing
If you find VideoLT helpful for your research, please consider citing:
```
@misc{zhang2021videolt,
      title={VideoLT: Large-scale Long-tailed Video Recognition},
      author={Xing Zhang and Zuxuan Wu and Zejia Weng and Huazhu Fu and Jingjing Chen and Yu-Gang Jiang and Larry Davis},
      year={2021},
      eprint={2105.02668},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```