A new test set for ImageNet

Last update: Dec 18, 2022

Overview

ImageNetV2

The ImageNetV2 dataset contains new test data for the ImageNet benchmark. This repository provides associated code for assembling and working with ImageNetV2. The actual test sets are stored in a separate location.

ImageNetV2 contains three test sets with 10,000 new images each. Importantly, these test sets were sampled after a decade of progress on the original ImageNet dataset. This makes the new test data independent of existing models and guarantees that the accuracy scores are not affected by adaptive overfitting. We designed the data collection process for ImageNetV2 so that the resulting distribution is as similar as possible to the original ImageNet dataset. Our paper "Do ImageNet Classifiers Generalize to ImageNet?" describes ImageNetV2 and associated experiments in detail.

In addition to the three test sets, we also release our pool of candidate images from which the test sets were assembled. Each image comes with rich metadata such as the corresponding Flickr search queries or the annotations from MTurk workers.

The aforementioned paper also describes CIFAR-10.1, a new test set for CIFAR-10. It can be found in the following repository: https://github.com/modestyachts/CIFAR-10.1

Using the Dataset

Before explaining how the code in this repository was used to assemble ImageNetV2, we first describe how to load our new test sets.

Test Set Versions

There are currently three test sets in ImageNetV2:

Threshold0.7 was built by sampling ten images for each class among the candidates with selection frequency at least 0.7.
MatchedFrequency was sampled to match the MTurk selection frequency distribution of the original ImageNet validation set for each class.
TopImages contains the ten images with highest selection frequency in our candidate pool for each class.
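
The Threshold0.7 and TopImages rules above can be illustrated with a small sketch. It assumes a hypothetical in-memory candidate pool (a dict mapping each class to a list of (image id, selection frequency) pairs) and uniform random sampling for Threshold0.7; the function names and the toy pool are illustrative and are not taken from this repository. MatchedFrequency is more involved and is not covered here.

```python
import random


def sample_threshold(candidates, threshold=0.7, per_class=10, seed=0):
    """Sample `per_class` images per class among candidates whose
    selection frequency is at least `threshold` (uniform sampling is an assumption)."""
    rng = random.Random(seed)
    sampled = {}
    for wnid, images in candidates.items():
        eligible = [image_id for image_id, freq in images if freq >= threshold]
        if len(eligible) < per_class:
            raise ValueError(f"class {wnid} has only {len(eligible)} eligible images")
        sampled[wnid] = rng.sample(eligible, per_class)
    return sampled


def sample_top_images(candidates, per_class=10):
    """Keep the `per_class` images with the highest selection frequency per class."""
    sampled = {}
    for wnid, images in candidates.items():
        ranked = sorted(images, key=lambda pair: pair[1], reverse=True)
        sampled[wnid] = [image_id for image_id, _ in ranked[:per_class]]
    return sampled


# Hypothetical toy pool: one class with 41 candidates and frequencies 0.0 .. 1.0.
toy_pool = {"n01440764": [(f"img_{i:02d}", i / 40) for i in range(41)]}

threshold_set = sample_threshold(toy_pool)
top_set = sample_top_images(toy_pool)
print(threshold_set["n01440764"])
print(top_set["n01440764"])
```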

In our code, we adopt the following naming convention: Each test set is identified with a string of the form

imagenetv2-<test-set-letter>-<revision-number>

for instance, imagenetv2-b-31. The Threshold0.7, MatchedFrequency, and TopImages test sets have the letters a, b, and c, respectively. The current revisions of the test sets are imagenetv2-a-44, imagenetv2-b-33, and imagenetv2-c-12. We refer to our paper for a detailed description of these test sets and the review process underlying the different test set revisions.
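
As a minimal sketch of this naming convention, the helper below splits an identifier into its test set name and revision number. The function and constant names are hypothetical and are not part of this repository's code.

```python
import re

# Letter-to-name mapping as described in the naming convention above.
TEST_SET_NAMES = {"a": "Threshold0.7", "b": "MatchedFrequency", "c": "TopImages"}
_IDENTIFIER = re.compile(r"imagenetv2-([abc])-(\d+)")


def parse_test_set_id(identifier):
    """Return (test set name, revision number) for an identifier such as
    'imagenetv2-b-33'. Raises ValueError for anything else."""
    match = _IDENTIFIER.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a valid test set identifier: {identifier!r}")
    letter, revision = match.groups()
    return TEST_SET_NAMES[letter], int(revision)


print(parse_test_set_id("imagenetv2-b-33"))
```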

Loading a Test Set

You can download the test sets from the following URL: http://imagenetv2public.s3-website-us-west-2.amazonaws.com/. There is a link for each individual dataset, and each downloaded dataset must be decompressed before use.

To load the dataset, you can use the ImageFolder class in PyTorch on the extracted folder.

For instance, the following code loads the MatchedFrequency dataset:

from torchvision import datasets
datasets.ImageFolder(root='imagenetv2-matched-frequency')

Dataset Creation Pipeline

The dataset creation process has several stages outlined below. We describe the process here at a high level. If you have questions about any individual steps, please contact Rebecca Roelofs ([email protected]) and Ludwig Schmidt ([email protected]).

1. Downloading images from Flickr

In the first stage, we collected candidate images from the Flickr image hosting service. This requires a Flickr API key.

We ran the following command to search Flickr for images for a fixed list of wnids:

python flickr_search.py "../data/flickr_api_keys.json" \
                        --wnids "{wnid_list.json}" \
                        --max_images 200 \
                        --max_date_taken "2013-07-11"\
                        --max_date_uploaded "2013-07-11"\
                        --min_date_taken "2012-07-11"\
                        --min_date_uploaded "2012-07-11"

We refer to the paper for more details on which Flickr search parameters we used to complete our candidate pool.

The script outputs search result metadata, including the Flickr URLs returned for each query. This search result metadata is written to /data/search_results/.

We then stored the images to an Amazon S3 bucket using

python download_images_from_flickr.py ../data/search_results/{search_result.json} --batch --parallel

2. Create HITs

Similar to the original ImageNet dataset, we used Amazon Mechanical Turk (MTurk) to filter our pool of candidates. The main unit of work on MTurk is a HIT (Human Intelligence Task), which in our case consists of 48 images with a target class. The format of our HITs was derived from the original ImageNet HITs.
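
To make the HIT layout concrete, the sketch below groups the candidate images of one target class into batches of 48. This is only an illustration under simplifying assumptions: the function name and the dict layout are hypothetical, and the actual scripts in this repository may compose HITs differently (for example by mixing in images from the original validation set).

```python
def chunk_into_hits(wnid, image_ids, images_per_hit=48):
    """Split the images of one target class into HIT-sized batches.
    The last batch may be smaller if the count is not a multiple of 48."""
    hits = []
    for start in range(0, len(image_ids), images_per_hit):
        batch = image_ids[start:start + images_per_hit]
        hits.append({"wnid": wnid, "images": batch})
    return hits


# Hypothetical example: 100 candidate images for a single class.
example_hits = chunk_into_hits("n01440764", [f"img_{i}" for i in range(100)])
print([len(hit["images"]) for hit in example_hits])
```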

To submit a HIT, we performed the following steps. They require a configured MTurk account.

Encrypt all image URLs. This is necessary so that MTurk workers cannot identify whether an image is from the original validation set or our candidate pool by the source URL.
    python encrypt_copy_objects.py imagenet2candidates_mturk --strip_string ".jpg" --pywren
Run the image consistency check. This checks that all of the new candidate images have been stored to S3 and have encrypted URLs.
    python image_consistency_check.py
Generate HIT candidates. This outputs a list of candidates to data/hit_candidates.
    python generate_hit_candidates.py --num_wnids 1000
Submit live HITs to MTurk.
    bash make_hits_live.sh sample_args_10.json <username> <latest_hit_candidate_file>
Wait for the prompt, and check whether the HTML file in the code/ directory looks correct.
Type in the word LIVE to confirm submitting the HITs to MTurk (this costs money).

The HIT metadata created by make_hits_live.sh is stored in data/mturk/hit_data_live/.

After a set of HITs has been submitted, you can check their progress using

python3 mturk.py show_hit_progress --live --hit_file ../data/mturk/hit_data_live/{hit.json}

Additionally, we occasionally used the Jupyter notebook inspect_hit.ipynb to visually examine the HITs we created. The code for this notebook is stored in inspect_hit_notebook_code.py.

3. Remove near duplicates

Next, we removed near-duplicates from our candidate pool. We checked for near-duplicates both within our new test set and between our new test set and the original ImageNet dataset.

To find near-duplicates, we computed the 30 nearest neighbors for each candidate image in three different metrics: l2 distance on raw pixels, l2 distance on features extracted from a pre-trained VGG model (fc7), and SSIM (structural similarity).
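
The nearest-neighbor step can be sketched with a brute-force search under the l2 metric. This is a simplified illustration, not the implementation in near_duplicate_checker.py: the function name and the dict of flattened image vectors are assumptions, and the fc7 or SSIM variants would replace the distance function.

```python
import heapq
import math


def nearest_neighbors(query, references, k=30):
    """Return the k (distance, name) pairs from `references` that are
    closest to `query` in l2 distance, sorted from nearest to farthest.
    `references` maps an image name to a flattened vector of equal length."""
    distances = ((math.dist(query, vector), name) for name, vector in references.items())
    return heapq.nsmallest(k, distances)


# Hypothetical toy data: three "images" with two values each.
reference_vectors = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [6.0, 8.0]}
print(nearest_neighbors([0.0, 0.0], reference_vectors, k=2))
```

A brute-force search like this costs one distance computation per pair of images, so a real candidate pool would likely need batching or vectorized computation.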

The fc7 metric requires that each image is featurized using the same pre-trained VGG model. The scripts featurize.py, featurize_test.py and featurize_candidates.py were used to perform the fc7 featurization.

Next, we computed the nearest neighbors for each image. Each metric has a different starting script:

run_near_duplicate_checker_dssim.py
run_near_duplicate_checker_l2.py
run_near_duplicate_checker_fc7.py

All three scripts use near_duplicate_checker.py for the underlying computation.

The script test_near_duplicate_checker.sh was used to run the unit tests for the near duplicate checker contained in test_near_duplicate_checker.py.

Finally, we manually reviewed the nearest neighbor pairs using the notebook review_near_duplicates.ipynb. The file review_near_duplicates_notebook_code.py contains the code for this notebook. The review output is saved in data/metadata/nearest_neighbor_reviews_v2.json. All near duplicates that we found are saved in data/metadata/near_duplicates.json.

4. Sample Dataset

After we created a labeled candidate pool, we sampled the new test sets.

We use a separate bash script to sample each version of the dataset, i.e., sample_dataset_type_{a}.sh. Each script calls sample_dataset.py and initialize_dataset_review.py with the correct arguments. The file dataset_sampling.py contains helper functions for the sampling procedure.

5. Review Final Dataset

For quality control, we added a final reviewing step to our dataset creation pipeline.

initialize_dataset_review.py initializes the metadata needed for each dataset review round.
final_dataset_inspection.ipynb is used to manually review dataset versions.
final_dataset_inspection_notebook_code.py contains the code needed for the final_dataset_inspection.ipynb notebook.
review_server.py is the review server used for additional cleaning of the candidate pool. The review server starts a web UI that allows one to browse all candidate images for a particular class. In addition, a user can easily flag images that are problematic or near duplicates.

The review server can use local, downloaded images if started with the --use_local_images flag, i.e., python3 review_server.py --use_local_images. In that case, you also need to launch a separate static file server for serving the images. The data directory contains a script for starting the static file server: ./start_file_server.sh.

The local images can be downloaded using

download_all_candidate_images_to_cache.py
download_dataset_images.py

Data classes

Our code base contains a set of data classes for working with various aspects of ImageNetV2.

imagenet.py contains the ImageNetData class that provides metadata about ImageNet (a list of classes, etc.) and functionality for loading images in the original ImageNet dataset. The scripts generate_imagenet_metadata_pickle.py and generate_class_info_file.py are used to assemble some of the metadata in the ImageNetData class.
candidate_data.py contains the CandidateData class that provides easy access to all candidate images in ImageNetV2 (both image data and metadata). The metadata file used in this class comes from generate_candidate_metadata_pickle.py.
image_loader.py provides a unified interface to loading image data from either ImageNet or ImageNetV2.
mturk_data.py provides the MTurkData class for accessing the results from our MTurk HITs. The data used by this class is assembled via generate_mturk_data_pickle.
near_duplicate_data.py loads and processes the information about near-duplicates in ImageNetV2. Some of the metadata is prepared with generate_review_thresholds_pickle.py.
dataset_cache.py allows easy loading of our various test set revisions.
prediction_data.py provides functionality for loading the predictions of various classification models on our three test sets.

The functionality provided by each data class is documented via examples in the notebooks folder of this repository.

Evaluation Pipeline

Finally, we describe our evaluation pipeline for the PyTorch models. The main file is eval.py, which can be invoked as follows:

python eval.py --dataset $DATASET --models $MODELS

where $DATASET is one of

imagenet-validation-original (the original validation set)
imagenetv2-b-33 (our new MatchedFrequency test set)
imagenetv2-a-44 (our new Threshold0.7 test set)
imagenetv2-c-12 (our new TopImages test set).

The $MODELS parameter is a comma-separated list of model names in the torchvision or Cadene/pretrained-models.pytorch repositories. Alternatively, $MODELS can also be all, in which case all models are evaluated.
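
The handling of the $MODELS parameter described above can be sketched as follows. This is not the code in eval.py: the function name and the list of available models passed in are assumptions made for illustration.

```python
def parse_models_argument(models_arg, available_models):
    """Turn a comma-separated list of model names, or the word 'all',
    into a list of model names, rejecting names that are not available."""
    if models_arg.strip() == "all":
        return list(available_models)
    requested = [name.strip() for name in models_arg.split(",") if name.strip()]
    unknown = [name for name in requested if name not in available_models]
    if unknown:
        raise ValueError(f"unknown model names: {unknown}")
    return requested


# Hypothetical set of available models, using torchvision model names.
available = ["alexnet", "resnet50", "vgg16"]
print(parse_models_argument("resnet50, vgg16", available))
print(parse_models_argument("all", available))
```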

License

Unless noted otherwise in individual files, the code in this repository is released under the MIT license (see the LICENSE file). The LICENSE file does not apply to the actual image data. The images come from Flickr, which provides corresponding license information. They can be used the same way as the original ImageNet dataset.

Comments

eval.py hanging
I followed the instructions and ran eval.py. Everything seems to work until it tries to download "metadata/candidate_metadata_2018-12-13_04-35-19_UTC.pickle" in candidate_data.py. It hangs there for hours. I also tried to fetch the file manually with wget, but the request returns 403 Forbidden:

wget http://imagenet2datav2.s3.amazonaws.com/metadata/candidate_metadata_2018-12-13_04-35-19_UTC.pickle
--2019-03-10 22:00:38-- http://imagenet2datav2.s3.amazonaws.com/metadata/candidate_metadata_2018-12-13_04-35-19_UTC.pickle
Resolving imagenet2datav2.s3.amazonaws.com (imagenet2datav2.s3.amazonaws.com)... 52.218.209.74
Connecting to imagenet2datav2.s3.amazonaws.com (imagenet2datav2.s3.amazonaws.com)|52.218.209.74|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2019-03-10 22:00:38 ERROR 403: Forbidden.

Could you help me with this issue please?

I am also trying to get the selection frequencies of the manually labeled images that are discussed in the paper. The paper says that the MatchedFrequency test set follows the estimated frequencies of the original test set, but the ImageNet data doesn't include this frequency information. Where can I get the frequency information for the original test set? (All I need is to reproduce Figures 15-17 in the paper.)

Thanks!
opened by keroro824 16
403 Forbidden Error in Download Link

Hello! First of all, thank you for sharing your work :) I tried downloading the dataset from the link specified in the README (http://imagenetv2public.s3-website-us-west-2.amazonaws.com/) but I'm getting a 403 forbidden error. Would you please look into this?

opened by ddoyoon 9
Are the labels the same with ImageNet?
I tried this dataset on some models trained on ImageNet but got extremely low accuracy. Later, I found that the class indices may not match those of ImageNet. Did I do something wrong?

I found roughly the following relationship:

Index 0 in V2 ---> 782 screen, CRT screen in original ImageNet
Index 1 in V2 ---> 263 Pembroke, Pembroke Welsh corgi in original ImageNet
opened by RogerNi 6
generate_class_info.py uses incorrect constructor for ImageNetData
Hello, I'm trying to get the mapping from class labels to IDs, and presumably this is done by running python generate_class_info_file.py from within the code directory (please let me know if this is wrong).

Upon running this, I hit the following error:

Traceback (most recent call last):
  File "generate_class_info_file.py", line 8, in <module>
    imgnet = imagenet.ImageNetData(load_class_info=False)
TypeError: __init__() got an unexpected keyword argument 'load_class_info'

Indeed, imagenet.ImageNetData does not have a kwarg called "load_class_info". Perhaps this code is outdated?

If there is an easier way for me to simply get the map from label IDs to the original class name strings, please let me know. Thank you!
opened by mckinziebrandon 5
Is AWS needed for evaluation?

I am trying to use eval.py to replicate your experiments.

But it hangs while downloading the required pickle file.

When I checked the boto documentation, it said that you must provide an access key and access token to use boto3.

Is AWS needed, or is there another way to download the data?

opened by tzzcl 3
Make original dataset public

Could you please publish the "original" dataset that is not scaled? I need it because Flickr seems to have lost or changed some of the images, and it is now hard to retrieve all of the non-scaled images.

opened by tissue3 1
Wrongly labelled when using dataset.ImageFolder

Hello.

I found that on some operating systems (my environment is Ubuntu 20.04), the class_to_idx property of dataset.ImageFolder is not aligned with the directory names, which leads to wrongly labelled samples.

For instance, the directory 100 (str) is labelled as class 2 (int). The easiest way to resolve this issue is to edit the find_classes function in the dataset.ImageFolder source code (https://pytorch.org/vision/stable/_modules/torchvision/datasets/folder.html#ImageFolder), replacing the line class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)} with class_to_idx = {cls_name: int(cls_name) for cls_name in classes}.
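
A less invasive alternative to editing the torchvision source might be to remap the indices after loading. The sketch below uses plain Python to mimic how ImageFolder assigns indices (it sorts the folder names as strings and enumerates them) and builds a lookup back to the integer in the folder name. It assumes the extracted test set has one folder per class named "0" to "999"; the helper names are hypothetical.

```python
def imagefolder_class_to_idx(folder_names):
    """Mimic ImageFolder: sort folder names as strings, then enumerate them."""
    return {name: index for index, name in enumerate(sorted(folder_names))}


def build_label_fix(class_to_idx):
    """Map each ImageFolder index back to the integer encoded in the folder name."""
    return {index: int(name) for name, index in class_to_idx.items()}


# Assumption: one folder per class, named by its integer class index.
folders = [str(i) for i in range(1000)]
class_to_idx = imagefolder_class_to_idx(folders)
idx_to_label = build_label_fix(class_to_idx)

# Count how many folders receive an index that differs from their name.
mismatched = sum(1 for name, index in class_to_idx.items() if int(name) != index)
print(f"{mismatched} of {len(folders)} folders get an index that differs from their name")
```

With torchvision installed, the same lookup could be built from the dataset's class_to_idx attribute and applied by assigning a function such as lambda i: idx_to_label[i] to the dataset's target_transform attribute after construction.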

opened by chaeunl 0
