The PASS dataset: pretrained models and how to get the data

Overview

PASS: Pictures without humAns for Self-Supervised Pretraining

TL;DR: An ImageNet replacement dataset for self-supervised pretraining without humans


Content

PASS is a large-scale image dataset that does not include any humans, human parts, or other personally identifiable information. It can be used for high-quality pretraining while significantly reducing privacy concerns.


Download the dataset

In general, all information can be found on our webpage.

To download the dataset, please visit our zenodo page, where you can download the tar files and find the metadata.

You can also download the images directly from their AWS URLs, linked here.
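If you prefer scripting the download, the snippet below is a minimal sketch in Python. The base URL, the PASS.&lt;part&gt;.tar naming scheme, and the number of parts are placeholders for illustration only; check the zenodo listing for the actual filenames and links.

```python
# Minimal download sketch (not an official script). BASE_URL, the
# PASS.<part>.tar naming scheme, and NUM_PARTS are placeholders --
# check the zenodo listing for the real filenames and links.
import tarfile
import urllib.request
from pathlib import Path

BASE_URL = "https://example.org/PASS"  # placeholder: use the zenodo links
NUM_PARTS = 10                         # placeholder: count the parts on zenodo
out_dir = Path("PASS_dataset")
out_dir.mkdir(exist_ok=True)

for part in range(NUM_PARTS):
    archive = out_dir / f"PASS.{part}.tar"
    urllib.request.urlretrieve(f"{BASE_URL}/PASS.{part}.tar", archive)
    with tarfile.open(archive) as tar:
        tar.extractall(out_dir)  # unpacks the images under out_dir
```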

Pretrained models

| Pretraining | Method      | Epochs | Places205 lin. acc. | Model weights    |
|-------------|-------------|--------|---------------------|------------------|
| IN-1k       | MoCo-v2     | 200    | 50.1                | R50 weights      |
| PASS        | MoCo-v2     | 200    | 52.8                | R50 weights      |
| PASS        | MoCo-v2-CLD | 200    | 53.1                | R50 weights      |
| PASS        | SwAV        | 200    | 55.5                | R50 weights      |
| PASS        | DINO        | 100    | X                   | ViT-S16 weights  |
| PASS        | DINO        | 300    |                     | coming soon      |
| PASS        | MoCo-v2     | 800    |                     | coming soon      |
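The snippet below is a minimal sketch of loading one of the R50 checkpoints above into a torchvision ResNet-50 backbone. The filename is hypothetical, and the key handling assumes the common MoCo-v2 checkpoint layout (backbone weights stored under a "module.encoder_q." prefix); inspect the checkpoint's state_dict keys if your checkpoint differs.

```python
# Sketch: load an R50 checkpoint into a torchvision ResNet-50 backbone.
# Assumes the common MoCo-v2 layout ("module.encoder_q." prefix);
# the filename below is hypothetical.
import torch
from torchvision.models import resnet50

ckpt = torch.load("moco_v2_pass_r50.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

# Keep only the query-encoder backbone weights and strip their prefix,
# dropping the projection head ("fc") that is not part of the backbone.
backbone = {
    k.removeprefix("module.encoder_q."): v
    for k, v in state_dict.items()
    if k.startswith("module.encoder_q.") and "encoder_q.fc" not in k
}

model = resnet50()
missing, unexpected = model.load_state_dict(backbone, strict=False)
print("missing:", missing)        # expect only the re-initialised fc head
print("unexpected:", unexpected)  # expect an empty list
```

The loaded backbone can then be used as a frozen feature extractor or fine-tuned, e.g. for a linear evaluation on Places205 as in the table above.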

Contribute your models

Please let us know if you have a model pretrained on this dataset and we will add it to the list above.

Citation

@Article{asano21pass,
  author  = "Yuki M. Asano and Christian Rupprecht and Andrew Zisserman and Andrea Vedaldi",
  title   = "PASS: An ImageNet replacement for self-supervised pretraining without humans",
  journal = "NeurIPS Track on Datasets and Benchmarks",
  year    = "2021"
}
Comments
• Part 3, 4 are missing

Hello @yukimasano! 👋🏻

I was trying to download the dataset, but found that parts 3 and 4 were missing. I guess I can download the images from the AWS URLs, but I prefer to download them as zip files. Would you mind uploading the missing parts?

    Thanks so much for sharing this amazing dataset!

    opened by junhsss 1
• tags / labels for the images

    Hello guys,

    Thanks so much for releasing a large scale dataset for open-source community, even for commercial use. I was wondering if there are also some tags associated with each image, like say 'indoor', 'outdoor', 'beach', 'sports' etc. These tags would be very helpful in selecting subsets of images based on the task for training.

    Thanks!

    opened by ankuPRK 1
• Sharing Pretrained models with the Hugging Face Hub

    Hi @yukimasano !

    The work you're doing with PASS is very exciting.

    I see you currently have models hosted in your own server. Would you be interested in also sharing your models in the Hugging Face Hub? The Hub offers free hosting of over 25K models, and it would make your work more accessible and visible to the rest of the ML community.

    Some of the benefits of sharing your models through the Hub would be:

    • wider reach of your work to the ecosystem
    • versioning, commit history and diffs
• repos provide useful metadata about their tasks, languages, metrics, etc., that make them discoverable
    • multiple features from TensorBoard visualizations, PapersWithCode integration, and more

    Creating the repos and adding new models should be a relatively straightforward process if you've used Git before. This is a step-by-step guide explaining the process in case you're interested. Please let us know if you would be interested and if you have any questions.

    Happy to hear your thoughts, Omar and the Hugging Face team

    cc @nateraw

    opened by osanseviero 2
Owner
Yuki M. Asano
I'm a Computer Vision researcher at the University of Amsterdam. I did my PhD at the Visual Geometry Group in Oxford.