BaSSL
This is an official PyTorch implementation of Boundary-aware Self-supervised Learning for Video Scene Segmentation (BaSSL) [arxiv].
- BaSSL is a self-supervised learning algorithm that pre-trains a model to capture contextual transitions across scene boundaries. Specifically, the method leverages pseudo-boundaries and proposes three novel boundary-aware pretext tasks that are effective in maximizing intra-scene similarity and minimizing inter-scene similarity, leading to higher performance on the video scene segmentation task.
1. Environmental Setup
We have tested the implementation on the following environment:
- Python 3.7.7 / PyTorch 1.7.1 / torchvision 0.8.2 / CUDA 11.0 / Ubuntu 18.04
Also, the code is based on pytorch-lightning (==1.3.8) and all necessary dependencies can be installed by running the following commands.
$ pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
$ pip install -r requirements.txt
# (optional) installing pillow-simd sometimes brings faster data loading.
$ pip uninstall pillow && CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
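As an optional sanity check, you can confirm that the installed builds match the versions above and that CUDA is visible to PyTorch:
# optional sanity check for the environment
$ python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"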
2. Prepare Data
We provide a data download script for the raw key-frames of the MovieNet-SSeg dataset, as well as our re-formatted annotation files applicable to BaSSL. Note that the script will automatically download and decompress the data---1) key-frames (160G), 2) annotations (200M)---into `<path-to-root>/bassl/data`.
# download movienet data
$ cd <path-to-root>
$ bash script/download_movienet_data.sh
# data directory structure
<path-to-root>/bassl/data
└─ movienet
   ├─ 240P_frames
   │  ├─ tt0120885                # movie id (or video id)
   │  │  ├─ shot_0000_img_0.jpg
   │  │  ├─ shot_0000_img_1.jpg
   │  │  ├─ shot_0000_img_2.jpg   # for each shot, three key-frames are given
   │  │  :
   │  │  └─ shot_1256_img_2.jpg
   │  └─ tt1093906
   │     ├─ shot_0000_img_0.jpg
   │     ├─ shot_0000_img_1.jpg
   │     ├─ shot_0000_img_2.jpg
   │     :
   │     └─ shot_1270_img_2.jpg
   └─ anno
      ├─ anno.pretrain.ndjson
      ├─ anno.trainvaltest.ndjson
      ├─ anno.train.ndjson
      ├─ anno.val.ndjson
      ├─ anno.test.ndjson
      └─ vid2idx.json
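The annotation files are in newline-delimited JSON (ndjson) format, i.e., each line is a standalone JSON record, so you can inspect the fields of the first training record with standard tools (run from `<path-to-root>/bassl` after downloading):
# pretty-print the first annotation record
$ head -n 1 data/movienet/anno/anno.train.ndjson | python -m json.tool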
3. Train (Pre-training and Fine-tuning)
We use Hydra to provide flexible training configurations. The examples below explain how to modify each training parameter for your use case.
We assume that you are in `<path-to-root>` (i.e., the root of this repository).
3.1. Pre-training
(1) Pre-training BaSSL
Our pre-training runs in a distributed environment (multi-GPU training) using the DDP backend supported by pytorch-lightning.
The default setting requires 8 GPUs (V100) with an effective batch size of 256. However, you can set the parameter `config.DISTRIBUTED.NUM_PROC_PER_NODE` to the number of GPUs you can use, or change `config.TRAIN.BATCH_SIZE.effective_batch_size`. You can run the single command `cd bassl; bash ../scripts/run_pretrain_bassl.sh` or the following full command:
cd <path-to-root>/bassl
EXPR_NAME=bassl
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/main.py \
config.EXPR_NAME=${EXPR_NAME} \
config.DISTRIBUTED.NUM_NODES=1 \
config.DISTRIBUTED.NUM_PROC_PER_NODE=8 \
config.TRAIN.BATCH_SIZE.effective_batch_size=256
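For example, a 2-GPU run with a proportionally reduced batch size could look like the sketch below (`EXPR_NAME` here is an arbitrary label; note that changing the effective batch size may affect the final scores):
cd <path-to-root>/bassl
EXPR_NAME=bassl_2gpu
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/main.py \
config.EXPR_NAME=${EXPR_NAME} \
config.DISTRIBUTED.NUM_NODES=1 \
config.DISTRIBUTED.NUM_PROC_PER_NODE=2 \
config.TRAIN.BATCH_SIZE.effective_batch_size=64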
Note that the checkpoints are automatically saved in `bassl/pretrain/ckpt/` and log files (e.g., tensorboard) are saved in `bassl/pretrain/logs/`.
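You can monitor the training curves with tensorboard (assuming it is installed in your environment):
# run from <path-to-root>/bassl
tensorboard --logdir pretrain/logs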
(2) Running with various loss combinations
Each objective can be turned on and off independently.
cd <path-to-root>/bassl
EXPR_NAME=bassl_all_pretext_tasks
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/main.py \
config.EXPR_NAME=${EXPR_NAME} \
config.LOSS.shot_scene_matching.enabled=true \
config.LOSS.contextual_group_matching.enabled=true \
config.LOSS.pseudo_boundary_prediction.enabled=true \
config.LOSS.masked_shot_modeling.enabled=true
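To ablate a single objective, simply flip its flag; for instance, the hypothetical run below keeps the three boundary-aware tasks but disables masked shot modeling:
cd <path-to-root>/bassl
EXPR_NAME=bassl_wo_msm
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/main.py \
config.EXPR_NAME=${EXPR_NAME} \
config.LOSS.shot_scene_matching.enabled=true \
config.LOSS.contextual_group_matching.enabled=true \
config.LOSS.pseudo_boundary_prediction.enabled=true \
config.LOSS.masked_shot_modeling.enabled=false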
(3) Pre-training shot-level baselines
Shot-level pre-training methods can be trained by setting `config.LOSS.sampling_method.name` to one of the following: `instance` (Simclr_instance), `temporal` (Simclr_temporal), or `shotcol` (Simclr_NN).
In addition, you can choose two more options: `bassl` (BaSSL) and `bassl+shotcol` (BaSSL+ShotCoL).
The example below is for `Simclr_NN`, i.e., ShotCoL. Choose your favorite option ;)
cd <path-to-root>/bassl
EXPR_NAME=Simclr_NN
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/main.py \
config.EXPR_NAME=${EXPR_NAME} \
config.LOSS.sampling_method.name=shotcol
3.2. Fine-tuning
(1) Fine-tuning pre-trained models with a single command
First, download the checkpoints provided in the Model Zoo section and move them into `bassl/pretrain/ckpt`.
cd <path-to-root>/bassl
# for fine-tuning BaSSL (10 epoch)
bash ../scripts/finetune_bassl.sh
# for fine-tuning Simclr_NN (i.e., ShotCoL)
bash ../scripts/finetune_shot-level_baseline.sh
The full process (i.e., extraction of shot-level representations followed by fine-tuning) is described below.
(2) Extracting shot-level features from shot key-frames
For computational efficiency, we pre-extract shot-level representations and then fine-tune the pre-trained models.
Set `LOAD_FROM` to the `EXPR_NAME` used in the pre-training stage and change `config.DISTRIBUTED.NUM_PROC_PER_NODE` to the number of GPUs you can use. The extracted shot-level features are then saved to disk and used in the fine-tuning stage.
cd <path-to-root>/bassl
LOAD_FROM=bassl
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/extract_shot_repr.py \
config.DISTRIBUTED.NUM_NODES=1 \
config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
+config.LOAD_FROM=${LOAD_FROM}
(3) Fine-tuning and evaluation
cd <path-to-root>/bassl
WORK_DIR=$(pwd)
# Pre-training methods: bassl and bassl+shotcol
# which learn the CRN network during the pre-training stage
LOAD_FROM=bassl
EXPR_NAME=transfer_finetune_${LOAD_FROM}
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/finetune/main.py \
config.TRAIN.BATCH_SIZE.effective_batch_size=1024 \
config.EXPR_NAME=${EXPR_NAME} \
config.DISTRIBUTED.NUM_NODES=1 \
config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
config.TRAIN.OPTIMIZER.lr.base_lr=0.0000025 \
+config.PRETRAINED_LOAD_FROM=${LOAD_FROM}
# Pre-training methods: instance, temporal, shotcol
# which DO NOT learn CRN network during the pre-training stage
# thus, we use a different base learning rate (determined via hyperparameter search)
LOAD_FROM=shotcol_pretrain
EXPR_NAME=finetune_scratch_${LOAD_FROM}
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/finetune/main.py \
config.TRAIN.BATCH_SIZE.effective_batch_size=1024 \
config.EXPR_NAME=${EXPR_NAME} \
config.DISTRIBUTED.NUM_NODES=1 \
config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
config.TRAIN.OPTIMIZER.lr.base_lr=0.000025 \
+config.PRETRAINED_LOAD_FROM=${LOAD_FROM}
4. Model Zoo
We provide pre-trained checkpoints trained in a self-supervised manner.
After fine-tuning with these checkpoints, the models should give scores similar to those shown below.
Method | AP | Checkpoint (pre-trained) |
---|---|---|
SimCLR (instance) | 51.51 | download |
SimCLR (temporal) | 50.05 | download |
SimCLR (NN) | 51.17 | download |
BaSSL (10 epoch) | 56.26 | download |
BaSSL (40 epoch) | 57.40 | download |
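To take a quick look at what a downloaded checkpoint contains before fine-tuning, you can load it with plain PyTorch. This is a minimal sketch; the file name below is hypothetical, so use the path of the checkpoint you actually downloaded:
# run from <path-to-root>/bassl; the checkpoint file name is hypothetical
python -c "import torch; ckpt = torch.load('pretrain/ckpt/bassl.ckpt', map_location='cpu'); print(sorted(ckpt.keys()))"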
5. Citation
If you find this code helpful for your research, please cite our paper.
@article{mun2022boundary,
title={Boundary-aware Self-supervised Learning for Video Scene Segmentation},
author={Mun, Jonghwan and Shin, Minchul and Han, Gunsu and
Lee, Sangho and Ha, Sungsu and Lee, Joonseok and Kim, Eun-sol},
journal={arXiv preprint arXiv:2201.05277},
year={2022}
}
6. Contact for Issues
Jonghwan Mun, [email protected]
Minchul Shin, [email protected]
7. License
This project is licensed under the terms of the Apache License 2.0. Copyright 2021 Kakao Brain Corp. All Rights Reserved.