Scaling and Benchmarking Self-Supervised Visual Representation Learning

Overview

The FAIR Self-Supervision Benchmark is deprecated. Please see VISSL, a ground-up rewrite of the benchmark in PyTorch.

FAIR Self-Supervision Benchmark

This code provides various benchmark (and legacy) tasks for evaluating the quality of visual representations learned by various self-supervision approaches. It corresponds to our work on Scaling and Benchmarking Self-Supervised Visual Representation Learning. The code is written in Python and can be used to evaluate both PyTorch and Caffe2 models (see this). We hope that this benchmark release will provide a consistent evaluation strategy that makes it easy to measure progress in self-supervision.

Introduction

The goal of fair_self_supervision_benchmark is to standardize the methodology for evaluating the quality of visual representations learned by various self-supervision approaches. It provides evaluation on a variety of tasks, as follows:

Benchmark tasks: The benchmark tasks are based on the principle that a good representation (1) transfers to many different tasks and (2) transfers with limited supervision and limited fine-tuning. The tasks are as follows.

• Image Classification
• Object Detection
• Surface Normal Estimation
• Visual Navigation

These benchmark tasks use the AlexNet and ResNet-50 network architectures.

Legacy tasks: We also classify some commonly used evaluation tasks as legacy tasks, for the reasons discussed in Section 7 of the paper.

License

fair_self_supervision_benchmark is CC-NC 4.0 International licensed, as found in the LICENSE file.

Citation

If you use fair_self_supervision_benchmark in your research or wish to refer to the baseline results published in the paper, please use the following BibTeX entry.

@article{goyal2019scaling,
  title={Scaling and Benchmarking Self-Supervised Visual Representation Learning},
  author={Goyal, Priya and Mahajan, Dhruv and Gupta, Abhinav and Misra, Ishan},
  journal={arXiv preprint arXiv:1905.01235},
  year={2019}
}

Installation

Please find installation instructions in INSTALL.md.

Getting Started

After installation, please see GETTING_STARTED.md for how to run various benchmark tasks.

Model Zoo

We provide models used in our paper in the MODEL_ZOO.


Comments
  • [feature] Plans to support other pre-trained models?


    Are there any plans to support custom pre-trained models, instead of AlexNet/ResNet-50? As background, I am working on a few new models and would like to evaluate them across all the benchmarks you've put together. Thanks!

    awaiting_user_response 
    opened by david-codaio 4
  • [bug] GPU is not extensively used in feature extraction


    Hello,

    Thank you for sharing this benchmark for evaluating self-supervised learning approaches.

    I followed the INSTALL.md file to set up the benchmark tool successfully, then followed this README.md file to download the datasets and set up the file hierarchies accordingly, and finally used extra_scripts/README.md to produce image/label lists for each dataset as expected. So far so good.

    I simply want to extract COCO2014 features from a pre-trained model, for instance AlexNet-In1K. To do that, I first change NUM_DEVICES to 1 in the caffenet_bvlc_supervised_extract_features.yaml file, since I have only 1 GPU, and then run the following command, which I took from GETTING_STARTED.md:

    python tools/extract_features.py \
        --config_file configs/benchmark_tasks/image_classification/coco2014/caffenet_bvlc_supervised_extract_features.yaml \
        --data_type train \
        --output_file_prefix trainval \
        --output_dir /tmp/ssl-benchmark-output/extract_features/weights_init \
        TEST.PARAMS_FILE https://dl.fbaipublicfiles.com/fair_self_supervision_benchmark/models/caffenet_bvlc_in1k_supervised.npy \
        TRAIN.DATA_FILE /tmp/ssl-benchmark-output/coco/train_images.npy \
        TRAIN.LABELS_FILE /tmp/ssl-benchmark-output/coco/train_labels.npy
    

    While extracting features, it seems that GPU utilization is quite low, mostly less than 5%, while CPU load stays at 100% almost all the time. I believe something is wrong in the device selection. Could you please help me resolve this issue?
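
    One way to narrow this down is to check whether the Caffe2 build can see the GPU at all. The sketch below is only a diagnostic, assuming a standard Caffe2 install as per INSTALL.md; it is not part of the benchmark:

    ```python
    # Diagnostic sketch: if this reports 0 CUDA devices (or no GPU support),
    # the extraction ops are being created on CPU, which would explain the
    # low GPU utilization.
    from caffe2.python import workspace

    print("Caffe2 built with GPU support:", workspace.has_gpu_support)
    print("Visible CUDA devices:", workspace.NumCudaDevices())
    ```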

    I see the same thing when I try this on machines with:

    • Ubuntu-16.04, CUDA-9, pytorch-1.0
    • CentOs-7, CUDA-10, pytorch-1.0

    Bulent

    awaiting_user_response 
    opened by mbsariyildiz 4
  • [bug] Missing make_SN_labels.py?


    The script make_SN_labels.py referenced in the Surface Normal Estimation README doesn't seem to exist in either the repositories or the data download. I could be mistaken, but I haven't been able to find it after over an hour of searching. Thanks!

    bug 
    opened by BramSW 2
  • [bug] Following few-shot recipe does not seem to work at "testing SVM" stage


    Hello - thanks for providing the benchmarking code =)

    I've been trying to reproduce the Places205 few-shot results. I've followed the various sections of the READMEs and run into a few inconsistencies in file names, which I've tried to resolve sensibly, and eventually got everything working (I think) up until the Step 4: Testing SVM stage of the GETTING_STARTED.md guide.

    Here's everything I've done, with possibly relevant inconsistencies in bold:

    1. Downloaded and renamed places205 in the desired format
    2. Converted places205 to numpy files
    python extra_scripts/create_imagenet_data_files.py --data_source_dir places205 --output_dir places205-processed
    
    3. Generated the 5 samples at different k values
    python extra_scripts/create_places_low_shot_samples.py \
        --images_data_file ssl-benchmark-output/places205/train_images.npy \
        --targets_data_file ssl-benchmark-output/places205/train_labels.npy \
        --output_path ssl-benchmark-output/places205/low_shot/ \
        --k_values "1,2,4,8,16,32,64,96" \
        --num_samples 5
    
    4. Extracted the training and test features with the following two commands
    # TRAIN
    python tools/extract_features.py \
        --config_file configs/benchmark_tasks/low_shot_image_classification/places205/resnet50_supervised_low_shot_extract_features.yaml \
        --data_type train \
        --output_file_prefix trainval \
        --output_dir ssl-benchmark-output/extract_features/weights_init \
        TEST.PARAMS_FILE https://dl.fbaipublicfiles.com/fair_self_supervision_benchmark/models/resnet50_in1k_supervised.pkl \
        TRAIN.DATA_FILE ssl-benchmark-output/places205/train_images_sample4_k1.npy \
        TRAIN.LABELS_FILE ssl-benchmark-output/places205/train_labels_sample4_k1.npy
    # TEST
    python tools/extract_features.py  \
      --config_file configs/benchmark_tasks/low_shot_image_classification/places205/resnet50_supervised_low_shot_extract_features.yaml \
      --data_type test \
      --output_file_prefix test \
      --output_dir ssl-benchmark-output/extract_features/weights_init \
      TEST.PARAMS_FILE https://dl.fbaipublicfiles.com/fair_self_supervision_benchmark/models/resnet50_in1k_supervised.pkl \ 
      TEST.DATA_FILE places205-processed/val_images.npy \
      TEST.LABELS_FILE places205-processed/val_labels.npy
    
    5. Train the SVM. I had to change the --data_file and --targets_data_file args to not include the s0 part in the file path.
    python tools/svm/train_svm_low_shot.py \
      --data_file ssl-benchmark-output/extract_features/weights_init/trainval_res_conv1_bn_resize_features.npy \
      --targets_data_file ssl-benchmark-output/extract_features/weights_init/trainval_res_conv1_bn_resize_targets.npy \ 
      --costs_list "0.0000001,0.000001,0.00001,0.0001,0.001,0.01,0.1,1.0,10.0,100.0"  \
      --output_path ssl-benchmark-output/p205_svm_low_shot/svm_conv1/
    
    6. Test the SVM. I had to change the --data_file and --targets_data_file args to not include the s0 part in the file path, and also set k_values and sample_inds to only work with the first set of features trained in the previous command; there doesn't seem to be an option to test them all at once.
    python tools/svm/test_svm_low_shot.py \
      --data_file ssl-benchmark-output/extract_features/weights_init/test_res_conv1_bn_resize_features.npy \
      --targets_data_file ssl-benchmark-output/extract_features/weights_init/test_res_conv1_bn_resize_targets.npy \
      --costs_list "0.0000001,0.000001,0.00001,0.0001,0.001,0.01,0.1,1.0,10.0,100.0" \
      --output_path ssl-benchmark-output/p205_svm_low_shot/svm_conv1/ \
      --k_values "4" \
      --sample_inds "0"
    

    The first error was:

    [INFO: test_svm_low_shot.py:  188]: Namespace(costs_list='0.0000001,0.000001,0.00001,0.0001,0.001,0.01,0.1,1.0,10.0,100.0', data_file='ssl-benchmark-output/extract_features/weights_init/test_res_conv1_bn_resize_features.npy', dataset='voc', generate_json=0, json_targets=None, k_values='4', output_path='ssl-benchmark-output/p205_svm_low_shot/svm_conv1/', sample_inds='1', targets_data_file='ssl-benchmark-output/extract_features/weights_init/test_res_conv1_bn_resize_targets.npy')
    [INFO: test_svm_low_shot.py:   80]: Testing svm for k-values: [4] and sample_inds: [0]
    [INFO: svm_helper.py:   58]: loading features and targets...
    [INFO: svm_helper.py:   63]: Loaded features: (20500, 9216) and targets: (20500, 1)
    [INFO: test_svm_low_shot.py:   98]: Testing SVM for costs: [1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625, 0.00048828125, 0.000244140625, 0.0001220703125, 6.103515625e-05, 3.0517578125e-05, 1.52587890625e-05, 7.62939453125e-06, 3.814697265625e-06, 1.9073486328125e-06]
    [INFO: svm_helper.py:  144]: Testing SVM for classes: range(0, 1)
    [INFO: svm_helper.py:  145]: Num classes: 1
    [INFO: test_svm_low_shot.py:  125]: Test sample/k_value/cost/cls: 2/4/1e-07/0
    Traceback (most recent call last):
      File "tools/svm/test_svm_low_shot.py", line 193, in <module>
        main()
      File "tools/svm/test_svm_low_shot.py", line 189, in main
        test_svm_low_shot(opts)
      File "tools/svm/test_svm_low_shot.py", line 128, in test_svm_low_shot
        with open(model_file, 'rb') as fopen:
    FileNotFoundError: [Errno 2] No such file or directory: 'ssl-benchmark-output/p205_svm_low_shot/svm_conv1/cls0_cost1e-07_sample1_k4.pickle'
    

    Looking in ssl-benchmark-output/p205_svm_low_shot/svm_conv1/, that file isn't there, but this one is: cls0_cost1e-07_sample1_k4.pickle, so I renamed it to cls0_cost1e-07_sample1_k4.pickle and reran the script. Output this time:

    [INFO: test_svm_low_shot.py:   80]: Testing svm for k-values: [4] and sample_inds: [0]
    [INFO: svm_helper.py:   58]: loading features and targets...
    [INFO: svm_helper.py:   63]: Loaded features: (20500, 9216) and targets: (20500, 1)
    [INFO: test_svm_low_shot.py:   98]: Testing SVM for costs: [1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625, 0.00048828125, 0.000244140625, 0.0001220703125, 6.103515625e-05, 3.0517578125e-05, 1.52587890625e-05, 7.62939453125e-06, 3.814697265625e-06, 1.9073486328125e-06]
    [INFO: svm_helper.py:  144]: Testing SVM for classes: range(0, 1)
    [INFO: svm_helper.py:  145]: Num classes: 1
    [INFO: test_svm_low_shot.py:  125]: Test sample/k_value/cost/cls: 1/4/1e-07/0
    Traceback (most recent call last):
      File "tools/svm/test_svm_low_shot.py", line 193, in <module>
        main()
      File "tools/svm/test_svm_low_shot.py", line 189, in main
        test_svm_low_shot(opts)
      File "tools/svm/test_svm_low_shot.py", line 138, in test_svm_low_shot
        eval_cls_labels, eval_preds
      File "/rscratch/cjrd/dul-project/deepul_proj_2/fair_self_supervision_benchmark/tools/svm/svm_helper.py", line 99, in get_precision_recall
        preds[:, np.newaxis].astype(np.float64)
      File "<__array_function__ internals>", line 6, in hstack
      File "/rscratch/cjrd/anaconda3/envs/ssl/lib/python3.7/site-packages/numpy/core/shape_base.py", line 345, in hstack
        return _nx.concatenate(arrs, 1)
      File "<__array_function__ internals>", line 6, in concatenate
    ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 3 dimension(s)
    

    I tried fiddling with the numpy dimensions for the given problem, but it just seems to create more problems downstream.
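
    For what it's worth, the shape mismatch can be reproduced in isolation with stand-in arrays. This is illustrative only; the array sizes below are taken from the log above, and this is not a fix from the maintainers:

    ```python
    # The ValueError means one array entering np.hstack inside
    # get_precision_recall is 3-D while the other is 2-D. Flattening the
    # trailing axes makes the stack succeed, which helps confirm which
    # array carries the stray dimension.
    import numpy as np

    cls_labels = np.zeros((20500, 1))      # stand-in for eval_cls_labels (2-D)
    preds = np.zeros((20500, 1, 1))        # stand-in for eval_preds (3-D)

    # np.hstack((cls_labels, preds)) would raise the same ValueError.
    stacked = np.hstack((cls_labels.astype(np.float64),
                         preds.reshape(preds.shape[0], -1).astype(np.float64)))
    print(stacked.shape)                   # (20500, 2) once both arrays are 2-D
    ```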

    Any help would be very appreciated. Thank you!

    opened by cjrd 1
  • Can't get low accuracy for randomly extracted features


    I am trying to get <10% accuracy on VOC using randomly extracted features, but I can't seem to do so. Running the following commands on a clean version of the repository gives me a mean AP of 0.514. Am I doing this correctly?

    python extra_scripts/create_voc_data_files.py \
        --data_source_dir /home/brenta/scratch/data/VOC2007/ \
        --output_dir voc07
    
    python tools/extract_features.py \
        --config_file configs/benchmark_tasks/image_classification/voc07/caffenet_bvlc_random_extract_features.yaml \
        --data_type train \
        --output_file_prefix trainval \
        --output_dir extract_features/random \
        TRAIN.DATA_FILE voc07/train_images.npy \
        TRAIN.LABELS_FILE voc07/train_labels.npy
    
    python tools/svm/train_svm_kfold.py \
        --data_file extract_features/random/trainval_conv1_s4k19_resize_features.npy \
        --targets_data_file extract_features/random/trainval_conv1_s4k19_resize_targets.npy \
        --costs_list "0.0000001,0.000001,0.00001,0.0001,0.001,0.01,0.1,1.0,10.0,100.0" \
        --output_path voc07_svm/svm_conv1/
    
    python tools/svm/test_svm.py \
        --data_file extract_features/random/trainval_conv1_s4k19_resize_features.npy \
        --targets_data_file extract_features/random/trainval_conv1_s4k19_resize_targets.npy \
        --costs_list "0.0000001,0.000001,0.00001,0.0001,0.001,0.01,0.1,1.0,10.0,100.0" \
        --output_path voc07_svm/svm_conv1/
    
    opened by jasonwei20 1
  • Size of extracted features


    Could you point me to where the code resizes the extracted features?

    For ResNet-50, I got 9216 extracted features after layers 1, 2, and 4, and 8192 features after layers 3 and 5. Where do these numbers come from?
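
    For intuition only (a hedged sketch with assumed pooling sizes, not the repository's exact resizing code): fixed-size vectors in that range typically come from spatially average-pooling each layer's feature map to a small grid and flattening, e.g. a 1024-channel map pooled to 3x3 gives 1024 x 9 = 9216 values, and a 2048-channel map pooled to 2x2 gives 2048 x 4 = 8192.

    ```python
    # Illustrative pooling sketch (pool sizes are assumptions, not the
    # benchmark's exact configuration).
    import torch
    import torch.nn as nn

    res4 = torch.randn(1, 1024, 14, 14)   # example ResNet-50 res4 output
    res5 = torch.randn(1, 2048, 7, 7)     # example ResNet-50 res5 output

    print(nn.AdaptiveAvgPool2d((3, 3))(res4).flatten(1).shape)  # [1, 9216]
    print(nn.AdaptiveAvgPool2d((2, 2))(res5).flatten(1).shape)  # [1, 8192]
    ```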

    Thank you, and apologies if this is a silly question!

    opened by jasonwei20 1
  • Evaluating benchmark on simple imagenet pretrained model loaded from PyTorch


    I'd like to plug in the default PyTorch ResNet model with ImageNet pretraining to see how it does on the dataset. I'm using the default torchvision.models.resnet.ResNet as my model, but I don't know how to save it in the right format to be read in. In other words, your functions save_model_params(model, params_file, checkpoint_dir, model_iter) and checkpoints.load_model_from_params_file(model) take in a model that seems to be a ModelBuilder class, and I'd like to load a model of the class torchvision.models.resnet.ResNet.

    Any idea on what the best way to do this is?
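
    As a rough starting point (the blob naming below is a guess, not the benchmark's documented format), the torchvision weights can be dumped to a pickle of numpy arrays, which is the general shape of the .pkl files passed to TEST.PARAMS_FILE; the remaining work is renaming the keys to whatever blob names the ModelBuilder expects:

    ```python
    # Hypothetical conversion sketch: serialize a torchvision ResNet-50 as a
    # dict of numpy arrays. Key names still need to be mapped to the
    # benchmark's blob-naming convention before they can be loaded.
    import pickle
    import torchvision

    model = torchvision.models.resnet50(pretrained=True)
    blobs = {name: tensor.detach().cpu().numpy()
             for name, tensor in model.state_dict().items()}

    with open("resnet50_torchvision.pkl", "wb") as f:
        pickle.dump({"blobs": blobs}, f, protocol=2)
    ```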

    Thanks, Jason

    opened by jasonwei20 1
  • Pretraining using a custom subset of Imagenet.


    Thanks for sharing the code and paper!

    I am looking to perform pre-training using the Jigsaw (ResNet-50) pretext method, just like what was described in the paper (Table 2). But instead of using ImageNet-1K, I would like to pretrain with an ImageNet subset of 10 classes. I also have only one GPU available.

    My question:

    1. Is the code for pre-training using Jigsaw (ResNet-50) available? I ask because I only see code and commands for the Benchmark Tasks and Legacy Tasks. If pre-training code is available for Jigsaw (ResNet-50), please indicate where I can find it.
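
    For the 10-class subset part, a minimal sketch (paths and class selection are assumptions; it only presumes the .npy image/label list format produced by extra_scripts) would be to filter the existing lists before training:

    ```python
    # Hedged sketch: keep only images whose label is in the first 10 classes.
    # Add allow_pickle=True to np.load if your numpy version requires it.
    import numpy as np

    images = np.load("ssl-benchmark-output/imagenet1k/train_images.npy")  # adjust path
    labels = np.load("ssl-benchmark-output/imagenet1k/train_labels.npy")  # adjust path

    keep = np.isin(labels, np.arange(10)).reshape(-1)
    np.save("train_images_subset10.npy", images[keep])
    np.save("train_labels_subset10.npy", labels[keep])
    ```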

    Many thanks, and looking forward to your prompt reply!

    enhancement 
    opened by chho-work 1
  • Preprocessing COCO - 'minival', 'valminusminival'


    What are minival and valminusminival for the COCO dataset? My COCO14 annotations folder looks like this:

    captions_train2014.json  instances_train2014.json  person_keypoints_train2014.json
    captions_val2014.json    instances_val2014.json    person_keypoints_val2014.json
    

    Therefore I get an error when I try to preprocess COCO by running extra_scripts/create_coco_data_files.py:

    partitions = ['val', 'train', 'minival', 'valminusminival']
    for partition in partitions:
    

    Do I need minival and valminusminival?
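
    For context, minival and valminusminival are custom re-splits of val2014 (commonly used for detection) and are not part of the standard COCO-2014 annotation download. As a workaround sketch (not an official fix; the annotation path is a placeholder), the script's partition list can be restricted to the files that are actually present:

    ```python
    # Hedged workaround: only process partitions whose annotation files exist.
    import os

    ann_dir = "/path/to/coco/annotations"   # placeholder: your annotations dir
    partitions = ['val', 'train', 'minival', 'valminusminival']
    available = [p for p in partitions
                 if os.path.isfile(os.path.join(ann_dir, 'instances_%s2014.json' % p))]
    print("processing partitions:", available)
    ```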

    opened by jasonwei20 1
  • Where did you apply Jigsaw when pre-training Faster-RCNN?


    Hello,

    I'm trying to understand where you applied the Jigsaw augmentation for object detection:

    • To the original image, before the backbone
    • To the regions proposed by the RPN, i.e., on a feature map

    Thanks in advance! And have a nice day

    opened by DarioRugg 0
  • Is there a simple demo file, containing code to get the visual representation of a single image?


    Is there some way we can get a simple demo file that can be used to get the visual representation of a single image? Something like:

    python3 demo.py --model=model_name --image=/path/to/image --pretrained=/path/to/pretrained/weights
    

    This would return the visual representation of the image according to the given model. It would be very helpful for beginners like me to just play around with representations.
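
    In the meantime, a minimal stand-alone sketch along those lines (it uses a plain torchvision ResNet-50 rather than a checkpoint from this benchmark, whose weight format differs) could look like:

    ```python
    # Demo sketch: extract a single 2048-d feature vector for one image.
    import torch
    import torchvision
    from torchvision import transforms
    from PIL import Image

    model = torchvision.models.resnet50(pretrained=True)
    model.fc = torch.nn.Identity()        # drop the classification head
    model.eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    img = Image.open("/path/to/image.jpg").convert("RGB")
    with torch.no_grad():
        feature = model(preprocess(img).unsqueeze(0))
    print(feature.shape)                  # torch.Size([1, 2048])
    ```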

    Thanks!

    opened by qureshinomaan 0