Official repository for the paper "Self-Supervised Models are Continual Learners" (CVPR 2022)

Enrico Fini

Last update: Dec 18, 2022

Related tags

Deep Learning cassle

Overview

Self-Supervised Models are Continual Learners

This is the official repository for the paper:

Self-Supervised Models are Continual Learners
Enrico Fini*, Victor Turrisi*, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal
CVPR 2022

Abstract: Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale. However, their efficacy is catastrophically reduced in a Continual Learning (CL) scenario where data is presented to the model sequentially. In this paper, we show that self-supervised loss functions can be seamlessly converted into distillation mechanisms for CL by adding a predictor network that maps the current state of the representations to their past state. This enables us to devise a framework for Continual self-supervised visual representation Learning that (i) significantly improves the quality of the learned representations, (ii) is compatible with several state-of-the-art self-supervised objectives, and (iii) needs little to no hyperparameter tuning. We demonstrate the effectiveness of our approach empirically by training six popular self-supervised models in various CL settings.

Overview of our method and results

NOTE: most of the code in this repository is borrowed from solo-learn

Installation

Use the following commands to create an environment and install the required packages (needs conda):

conda create --name cassle python=3.8
conda activate cassle
conda install pytorch=1.10.2 torchvision cudatoolkit=11.3 -c pytorch
pip install pytorch-lightning==1.5.4 lightning-bolts wandb sklearn einops
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-cuda110

Remember to check your cuda version and modify the install commands accorgingly.

OPTIONAL: consider installing pillow-SIMD for faster data loading:

pip uninstall pillow
CC="cc -mavx2" pip install -U --force-reinstall pillow-simd

Commands

Here below you can find a few example commands for running our code. The bash scripts with full training configurations for our continual and linear evaluation experiments can be found in the bash_files folder. Use our job_launcher.py to launch continual self-supervised learning experiments. We also provide example code for launching jobs with SLURM where you can pass the desired configuration for your job (bash script, data directory, number of GPUs, walltime, etc...).

NOTE: each experiment uses a different number of gpus (1 for CIFAR100, 2 for ImageNet100 and 4 for DomainNet). You can change this setting directly in the bash scripts.

Fine-tuning

CIFAR100

E.g. running Barlow Twins:

DATA_DIR=/path/to/data/dir/ CUDA_VISIBLE_DEVICES=0 python job_launcher.py --script bash_files/continual/cifar/barlow_distill.sh

ImageNet100

Class-incremental

E.g. running BYOL:

DATA_DIR=/path/to/data/dir/ CUDA_VISIBLE_DEVICES=0,1 python job_launcher.py --script bash_files/continual/imagenet-100/class/byol.sh

Data-incremental

E.g. running SimCLR:

DATA_DIR=/path/to/data/dir/ CUDA_VISIBLE_DEVICES=0,1 python job_launcher.py --script bash_files/continual/imagenet-100/data/simclr.sh

DomainNet

E.g. running SwAV:

DATA_DIR=/path/to/data/dir/ CUDA_VISIBLE_DEVICES=0,1,2,3 python job_launcher.py --script bash_files/continual/domainnet/swav.sh

CaSSLe

After running fine-tuning, you can also run CaSSLe by just loading the checkpoint of the first task. You will find all the checkpoints in your experiment directory (defaults to "./experiments"). Check the id of your run on WandB to make sure you are loading the correct checkpoint.

CIFAR100

E.g. running Barlow Twins + CaSSLe:

PRETRAINED_PATH=/path/to/task0/checkpoint/ DATA_DIR=/path/to/data/dir/ CUDA_VISIBLE_DEVICES=0 python job_launcher.py --script bash_files/continual/cifar/barlow_distill.sh

ImageNet100

Class-incremental

E.g. running BYOL + CaSSLe:

PRETRAINED_PATH=/path/to/task0/checkpoint/ DATA_DIR=/path/to/data/dir/ CUDA_VISIBLE_DEVICES=0,1 python job_launcher.py --script bash_files/continual/imagenet-100/class/byol_distill.sh

Data-incremental

E.g. running SimCLR + CaSSLe:

PRETRAINED_PATH=/path/to/task0/checkpoint/ DATA_DIR=/path/to/data/dir/ CUDA_VISIBLE_DEVICES=0,1 python job_launcher.py --script bash_files/continual/imagenet-100/data/simclr_distill.sh

DomainNet

E.g. running SwAV + CaSSLe:

PRETRAINED_PATH=/path/to/task0/checkpoint/ DATA_DIR=/path/to/data/dir/ CUDA_VISIBLE_DEVICES=0,1,2,3 python job_launcher.py --script bash_files/continual/domainnet/swav_distill.sh

Linear Evaluation

For linear evaluation you do not need the job launcher. You can simply run the scripts from bash_files/linear, e.g., for VICReg:

PRETRAINED_PATH=/path/to/last/checkpoint/ DATA_DIR=/path/to/data/dir/ bash bash_files/linear/imagenet-100/class/vicreg_linear.sh

Logging

Logging is performed with WandB. Please create an account and specify your --entity YOUR_ENTITY and --project YOUR_PROJECT in the bash scripts. For debugging, or if you do not want all the perks of WandB, you can disable logging by passing --offline in your bash scripts. After training you can always sync an offline run with the following command: wandb sync your/wandb/run/folder.

Citation

If you like our work, please cite our paper:

@inproceedings{fini2021self,
  title={Self-Supervised Models are Continual Learners},
  author={Fini, Enrico and da Costa, Victor G Turrisi and Alameda-Pineda, Xavier and Ricci, Elisa and Alahari, Karteek and Mairal, Julien},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

Comments

Why did you load the checkpoint of task 0 before training cassle?

Hello, Congratulations on your excellent work. I have a question about the training setting.

Why did you load the checkpoint of task 0 before training cassle? I see the first task of cassle is trained without distillers. Then the setting is the same as the first task of finetuning. I think loading the checkpoint is unnecessary.

Looking forward to your reply, Thanks.

opened by BorisChen1998 5
Need of checkpoints : BT and VicReg

Hi all,

Is there any link where I can access to the checkpoint of model trained using Barlowtwins and VicREG ?

I would like to evaluate this approach using different models and need the trained last checkpoint of these models.

Thanks.

opened by KameniAlexNea 4
¿Bug in online KNN eval?

Hello,

Congrats for your paper! It touches very interesting questions and I'd love to further study the problem of CSSL!

I am trying to execute your script for training barlow twins python job_launcher.py --script bash_files/continual/cifar/barlow_distill.sh, but I might have encountered a bug: if I train with the WeightedKNNClassifier for performance monitor, your code calls its forward here with only the train_features and target_features provided. After that, the compute function breaks down here at line 89 because self.test_features is an empty list.

Am I getting something wrong? I am working in a new conda env with setup as specified in your README file.

Muchas gracias!

opened by juanCarlosWheat 3
How did you train the classifier?

Hello,

I have read your paper. It is very impressive. I got a question for class incremental setting and am wondering to know if you can answer.

Did you train the classifier for each task only in the embedding training process? Or did you re-train all classifiers after all task embedding training processes finish? I see that the embedding of the previous task may change after the next task is trained. How does the old classifier trained by the old embedding format take this changed embedding? Your paper mentioned "a subset, e.g., 10% of the data". Did this mean using 10% of data to retrain the classifier at the very end?

Looking forward to your kind reply.

Thanks.

opened by newhand555 3
About the Forward Transfer

Hi, Thanks for your excellent work! I'm curious about how to calculate the "Forward Transfer" after training. For example, I have successfully re-produced the class-il results for Fine-tuning and CaSSLe (with BYOL) on Cifar-100 but don't know how to directly check the FT results. Does it need a seperate run to obtain the "linear evaluation accuracy of a random network" as the paper stated? BTW, just to be sure, is it right to directly check the "val_acc1" results of wandb board as the final linear evaluation accuracy?

opened by moonlitt 2
problem with reproducibility
Hi,

thanks for your interesting work.

I have problems reproducing the results.

Did you use dali for all your experiments? Can we trust the results of the regular data loader? I'm getting 6~7% accuracy drops on Imagenet while switching from dali to regular dataloader (I needed to run the regular one for fair comparisons with my method).

Also I think there is a problem here https://github.com/DonkeyShot21/cassle/blob/b5b0929c3b468cd41740a529d58e92ee4e6ace61/cassle/args/utils.py#L171 why is 256 hardcoded here?

It would be nice to mention in the readme that the batch size needs to be modified based on the number of gpus.

Thanks
opened by sinaazar 2
Task_id set to zero as first task SSL

https://github.com/DonkeyShot21/cassle/blob/ba739c8376bdc4d16acda3d143088569a56d18f4/bash_files/continual/cifar/barlow_distill.sh#L8

Hi, I want to be sure that setting the first task of distillation process as zero isn't an error, since you have already learned this as SSL in normal process. What can be the goal of this, please ?

Thanks

opened by KameniAlexNea 2
The classifier for Linear Evaluation Accuracy

Hi,

This is an exciting and enlightening work.

I am confused by the number of classifiers for Linear Evaluation Accuracy.

In the paper, you said, "For class-incremental and data-incremental, we use the task-agnostic setting, meaning that at evaluation time we do not assume to know the task ID". As I understand it, this means that you only maintain one classifier and continuously optimize it after learning each task for linear evaluation accuracy.

However, I found in #1 that you said, "as we operate in the class-incremental setting we train one linear classifier per task."

I would appreciate a clearer explanation.

Thanks.

opened by WenjinW 1
The data for Linear Evaluation Accuracy.

Hi,

This is an exciting and enlightening work.

I wonder where the data for training the classifier come from, for linear evaluation accuracy. The training data of the current task?

opened by WenjinW 1
DomainNet dataset version

Hi,

Very interesting work. Did you use the cleaned version of DomainNet or the original one? The cleaned version excludes a lot of duplicate images.

Thanks

opened by sinaazar 1
KNN Classifier Issue

Hi, I found this work very interesting and plan to work on similar topics. However I encounter some issues: (1) For the fine-tuning example with Barlow Twins and CIFAR-100, should it be barlow.sh instead of barlow_distill.sh? Otherwise, we need to provide the pretrained model in order to successfully run the code. (2) If I enable the the KNN online evaluation by setting disable_knn_eval = False, there was an issue showing empty test feature and expect argument in base.py line 432. I saw the previous closed issue saying the similar thing but it still appears even if I set a meaningful online_eval_batch_size = 256. Thanks for your help!

opened by pp1016 3

Owner

Enrico Fini

PhD Student at University of Trento

GitHub

Imposter-detector-2022 - HackED 2022 Team 3IQ - 2022 Imposter Detector

HackED 2022 Team 3IQ - 2022 Imposter Detector By Aneeljyot Alagh, Curtis Kan, Jo

3 Aug 20, 2022

The 7th edition of NTIRE: New Trends in Image Restoration and Enhancement workshop will be held on June 2022 in conjunction with CVPR 2022.

NTIRE 2022 - Image Inpainting Challenge Important dates 2022.02.01: Release of train data (input and output images) and validation data (only input) 2

37 Nov 27, 2022

[CVPR 2022] Official code for the paper: "A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration"

MDCA Calibration This is the official PyTorch implementation for the paper: "A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved

21 Dec 22, 2022

Official Implementation of CVPR 2022 paper: "Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning"

(CVPR 2022) Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning ArXiv This repo contains Official Implementat

24 Nov 1, 2022

Official PyTorch implementation of the paper "Deep Constrained Least Squares for Blind Image Super-Resolution", CVPR 2022.

Deep Constrained Least Squares for Blind Image Super-Resolution [Paper] This is the official implementation of 'Deep Constrained Least Squares for Bli

141 Dec 30, 2022

Official code of the paper "Expanding Low-Density Latent Regions for Open-Set Object Detection" (CVPR 2022)

OpenDet Expanding Low-Density Latent Regions for Open-Set Object Detection (CVPR2022) Jiaming Han, Yuqiang Ren, Jian Ding, Xingjia Pan, Ke Yan, Gui-So

64 Jan 7, 2023

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

nvdiffrec Joint optimization of topology, materials and lighting from multi-view image observations as described in the paper Extracting Triangular 3D

1.4k Jan 1, 2023

Official implementation of the paper 'Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution' in CVPR 2022

LDL Paper | Supplementary Material Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution Jie Liang*, Hu

150 Dec 26, 2022

The official repository for paper ''Domain Generalization for Vision-based Driving Trajectory Generation'' submitted to ICRA 2022

DG-TrajGen The official repository for paper ''Domain Generalization for Vision-based Driving Trajectory Generation'' submitted to ICRA 2022. Our Meth

25 Sep 26, 2022

Official repository of the AAAI'2022 paper "Contrast and Generation Make BART a Good Dialogue Emotion Recognizer"

CoG-BART Contrast and Generation Make BART a Good Dialogue Emotion Recognizer Quick Start: To run the model on test sets of four datasets, Download th

39 Dec 24, 2022

Repository for "Toward Practical Monocular Indoor Depth Estimation" (CVPR 2022)

Toward Practical Monocular Indoor Depth Estimation Cho-Ying Wu, Jialiang Wang, Michael Hall, Ulrich Neumann, Shuochen Su [arXiv] [project site] DistDe

122 Dec 13, 2022

This repository contains a pytorch implementation of "HeadNeRF: A Real-time NeRF-based Parametric Head Model (CVPR 2022)".

HeadNeRF: A Real-time NeRF-based Parametric Head Model This repository contains a pytorch implementation of "HeadNeRF: A Real-time NeRF-based Parametr

294 Jan 1, 2023

CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

selfcontact This repo is part of our project: On Self-Contact and Human Pose. [Project Page] [Paper] [MPI Project Page] It includes the main function

68 Dec 6, 2022

CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

SMPLify-XMC This repo is part of our project: On Self-Contact and Human Pose. [Project Page] [Paper] [MPI Project Page] License Software Copyright Lic

83 Dec 14, 2022

Official repository for the CVPR 2021 paper "Learning Feature Aggregation for Deep 3D Morphable Models"

Deep3DMM Official repository for the CVPR 2021 paper Learning Feature Aggregation for Deep 3D Morphable Models. Requirements This code is tested on Py

38 Dec 27, 2022

CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

TUCH This repo is part of our project: On Self-Contact and Human Pose. [Project Page] [Paper] [MPI Project Page] License Software Copyright License fo

45 Jan 7, 2023

Official implementation of "Can You Spot the Chameleon? Adversarially Camouflaging Images from Co-Salient Object Detection" in CVPR 2022.

Jadena Official implementation of "Can You Spot the Chameleon? Adversarially Camouflaging Images from Co-Salient Object Detection" in CVPR 2022. arXiv

13 Nov 29, 2022

Sound-guided Semantic Image Manipulation - Official Pytorch Code (CVPR 2022)

?? Sound-guided Semantic Image Manipulation (CVPR2022) Official Pytorch Implementation Sound-guided Semantic Image Manipulation IEEE/CVF Conference on

58 Dec 28, 2022

Official Pytorch implementation of "Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes", CVPR 2022

Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes / 3DCrowdNet News ?? 3DCrowdNet achieves the state-of-the-art accuracy on 3D

113 Dec 21, 2022