MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts (ICLR 2022)

Overview

This repo provides the PyTorch source code of our paper: MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts (ICLR 2022). [PDF] [ICLR 2022 Video] [Slides] [HuggingFace]

Project website: https://MetaShift.readthedocs.io/

@InProceedings{liang2022metashift,
  title={MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts},
  author={Weixin Liang and James Zou},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=MTex8qKavoS}
}

This repo provides the scripts for generating MetaShift, which offers a resource of thousands of distribution shifts.

Abstract

Understanding the performance of machine learning models across diverse data distributions is critically important for reliable applications. Motivated by this, there is a growing focus on curating benchmark datasets that capture distribution shifts. While valuable, the existing benchmarks are limited in that many of them only contain a small number of shifts and they lack systematic annotation about what is different across different shifts. We present MetaShift, a collection of 12,868 sets of natural images across 410 classes, to address this challenge. We leverage the natural heterogeneity of Visual Genome and its annotations to construct MetaShift. The key construction idea is to cluster images using their metadata, which provides context for each image (e.g. cats with cars or cats in bathroom) that represent distinct data distributions. MetaShift has two important benefits: first, it contains orders of magnitude more natural data shifts than previously available. Second, it provides explicit explanations of what is unique about each of its data sets and a distance score that measures the amount of distribution shift between any two of its data sets. We demonstrate the utility of MetaShift in benchmarking several recent proposals for training models to be robust to data shifts. We find that the simple empirical risk minimization performs the best when shifts are moderate and no method had a systematic advantage for large shifts. We also show how MetaShift can help to visualize conflicts between data subsets during model training.

Figure 1: Example Cat vs. Dog Images from MetaShift. For each class, MetaShift provides many subsets of data, each of which corresponds to a different context (the context is stated in parentheses).

Figure 2: Infographics of MetaShift.

Figure 3: Meta-graph: visualizing the diverse data distributions within the “cat” class.

Repo Structure Overview

.
├── README.md
├── dataset/
    ├── meta_data/ 
    ├── generate_full_MetaShift.py
    ├── ...         
├── experiments/
    ├── distribution_shift/
        ├── main_generalization.py
        ├── ...

The dataset folder provides the scripts for generating MetaShift. The experiments folder provides the code for the experiments on MetaShift in the paper.

Dependencies

  • Python 3.6.13 (e.g. conda create -n venv python=3.6.13)
  • PyTorch Version: 1.4.0
  • Torchvision Version: 0.5.0

Download Visual Genome

We leveraged the natural heterogeneity of Visual Genome and its annotations to construct MetaShift. Download the pre-processed and cleaned version of Visual Genome by GQA.

  • Download image files (~20GB) and scene graph annotations:
wget -c https://nlp.stanford.edu/data/gqa/images.zip
unzip images.zip -d allImages
wget -c https://nlp.stanford.edu/data/gqa/sceneGraphs.zip  
unzip sceneGraphs.zip -d sceneGraphs
  • After this step, the base dataset file structure should look like this:
/data/GQA/
    allImages/
        images/
            <ID>.jpg
    sceneGraphs/
        train_sceneGraphs.json
        val_sceneGraphs.json
  • Specify the local path of Visual Genome: extract the files, and then specify the folder path (e.g., IMAGE_DATA_FOLDER=/data/GQA/allImages/images/) in Constants.py.

Generate the Full MetaShift Dataset (subsets defined by contextual objects)

Understanding dataset/meta_data/full-candidate-subsets.pkl

The metadata file dataset/meta_data/full-candidate-subsets.pkl is the most important piece of metadata of MetaShift, which provides the full subset information of MetaShift. To facilitate understanding, we have provided a notebook dataset/understanding_full-candidate-subsets-pkl.ipynb to show how to extract information from it.

Basically, the pickle file stores a collections.defaultdict(set) object, which contains 17,938 keys. Each key is a string of the subset name like dog(frisbee), and the corresponding value is the set of IDs of the images that belong to this subset. The image IDs can be used to retrieve the image files from the Visual Genome dataset that you just downloaded. In our current version, 13,543 out of 17,938 subsets have more than 25 valid images. In addition, dataset/meta_data/full-candidate-subsets.pkl is derived from the scene graph annotations, so consult those annotations if your project needs additional information about each image.
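
As a minimal sketch of working with this file (not the notebook's code), the snippet below assumes only what is described above: subset names of the form subject(context) mapped to collections of image-ID strings, and image files named <ID>.jpg. The helper names load_candidate_subsets, parse_subset_name, subsets_with_min_images and image_path are illustrative and are not part of the repo; the demo uses a toy dictionary so that it runs without the dataset.

```python
import os
import pickle
import re
from collections import defaultdict

# Hypothetical helpers; only the file layout described in the text is assumed.
SUBSET_NAME = re.compile(r"^(?P<subject>[^()]+)\((?P<context>[^()]+)\)$")


def load_candidate_subsets(pkl_path):
    """Load the subset-name -> image-ID mapping from the pickle file."""
    with open(pkl_path, "rb") as f:
        return pickle.load(f)


def parse_subset_name(name):
    """Split a name such as 'dog(frisbee)' into ('dog', 'frisbee')."""
    match = SUBSET_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected subset name: {name!r}")
    return match.group("subject"), match.group("context")


def subsets_with_min_images(subsets, min_images=25):
    """Keep only the subsets that have more than `min_images` images."""
    return {name: ids for name, ids in subsets.items() if len(ids) > min_images}


def image_path(image_folder, image_id):
    """Path of an image file, following the <ID>.jpg layout."""
    return os.path.join(image_folder, f"{image_id}.jpg")


if __name__ == "__main__":
    # Toy stand-in for the real pickle; the IDs are placeholders.
    toy = defaultdict(set)
    toy["dog(frisbee)"].update({"101", "102"})
    toy["cat(sink)"].add("103")
    print(parse_subset_name("dog(frisbee)"))
    print(sorted(subsets_with_min_images(toy, min_images=1)))
```

Because the stored object is a defaultdict, looking up a name that is not present returns an empty set instead of raising KeyError (and adds that key to the in-memory object), so `name in subsets` is the safer way to check whether a subset exists.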

Generate Full MetaShift

Since the total number of subsets is very large, the following scripts generate only part of MetaShift by default. As specified in dataset/Constants.py, MetaShift is generated only for the classes (subjects) listed below. You can add further classes (subjects) to the list. See dataset/meta_data/class_hierarchy.json for the full object vocabulary and its hierarchy.

SELECTED_CLASSES = [
    'cat', 'dog',
    'bus', 'truck',
    'elephant', 'horse',
    'bowl', 'cup',
]

In addition, to save storage, all copied images are symbolic links by default. You can set use_symlink=False in the code to perform actual file copying. If you really want to generate the full MetaShift, then set ONLY_SELECTED_CLASSES = False in dataset/Constants.py.
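
The difference between linking and copying can be sketched as follows. This is an illustration only, not the repo's copying routine: the function name place_image is hypothetical, and the assumption that a true use_symlink means "create a link" is mine.

```python
import os
import shutil


def place_image(src, dst, use_symlink=True):
    """Put the image at `dst`, either as a symbolic link or as a real copy."""
    parent = os.path.dirname(dst)
    if parent:
        os.makedirs(parent, exist_ok=True)
    if os.path.lexists(dst):
        os.remove(dst)  # replace a stale link or file from an earlier run
    if use_symlink:
        # A link takes almost no disk space but breaks if the source moves.
        os.symlink(os.path.abspath(src), dst)
    else:
        # A copy is independent of the source folder but doubles the storage.
        shutil.copyfile(src, dst)
```

Links are convenient while the Visual Genome folder stays in place; real copies are the safer choice if the generated dataset will be moved to another machine or archived.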

cd dataset/
python generate_full_MetaShift.py

The following files will be generated by executing the script. Modify the global variable SUBPOPULATION_SHIFT_DATASET_FOLDER to change the destination folder.

/data/MetaShift/MetaDataset-full
├── cat/
    ├── cat(keyboard)/
    ├── cat(sink)/ 
    ├── ... 
├── dog/
    ├── dog(surfboard)/
    ├── dog(boat)/ 
    ├── ...
├── bus/ 
├── ...

Beyond the generated MetaShift dataset, the script also generates the meta-graphs for each class in dataset/meta-graphs.

.
├── README.md
├── dataset/
    ├── generate_full_MetaShift.py
    ├── meta-graphs/             (generated meta-graph visualization) 
        ├──  cat_graph.jpg
        ├──  dog_graph.jpg
        ├──  ...
    ├── ...         

Bonus: Generate the MetaShift-Attributes Dataset (subsets defined by subject attributes)

Figure: Example subsets based on object attribute contexts (the attribute is stated in parentheses). MetaShift covers attributes including activity (e.g., sitting, jumping), color (e.g., orange, white), material (e.g., wooden, metallic), shape (e.g., round, square), and so on.

Understanding dataset/attributes_MetaShift/attributes-candidate-subsets.pkl

dataset/attributes_MetaShift/attributes-candidate-subsets.pkl stores the metadata for MetaShift-Attributes, where each subset is defined by the attribute of the subject, e.g. cat(orange), cat(white), dog(sitting), dog(jumping).

attributes-candidate-subsets.pkl has the same data format as full-candidate-subsets.pkl. To facilitate understanding, we have provided a notebook dataset/attributes_MetaShift/understanding_attributes-candidate-subsets-pkl.ipynb to show how to extract information from it.

Basically, the pickle file stores a collections.defaultdict(set) object, which contains 4,962 keys. Each key is a string of the subset name like cat(orange), and the corresponding value is the set of IDs of the images that belong to this subset. The image IDs can be used to retrieve the image files from the Visual Genome dataset that you just downloaded.

Understanding dataset/attributes_MetaShift/structured-attributes-candidate-subsets.pkl

dataset/attributes_MetaShift/structured-attributes-candidate-subsets.pkl is very similar to dataset/attributes_MetaShift/attributes-candidate-subsets.pkl, but stores the metadata in a more structured way. The pickle file stores a 3-level nested dictionary, with the following structure:

.
├── key: 'color'
    ├── key: 'cat'              
        ├── key: 'orange'
            ├── value: a list of image IDs
├── key: 'activity'
    ├── key: 'dog'              
        ├── key: 'sitting'
            ├── value: a list of image IDs
        ├── ...
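
To illustrate how the three-level layout relates to the flat subject(attribute) names, here is a small sketch that regroups a flat mapping by attribute type using an ontology dictionary. It is not the script that produced the pickle file; the function name nest_by_attribute_type, the toy ontology excerpt and the placeholder image IDs are assumptions made for this example.

```python
from collections import defaultdict


def nest_by_attribute_type(flat_subsets, ontology):
    """Regroup 'subject(attribute)' subsets as type -> subject -> attribute."""
    # Reverse lookup: attribute value -> every attribute type that lists it.
    value_to_types = defaultdict(list)
    for attr_type, values in ontology.items():
        for value in values:
            value_to_types[value].append(attr_type)

    nested = {}
    for name, image_ids in flat_subsets.items():
        subject, _, rest = name.partition("(")
        attribute = rest.rstrip(")")
        # Attributes that are missing from the ontology are skipped.
        for attr_type in value_to_types.get(attribute, []):
            by_subject = nested.setdefault(attr_type, {})
            by_subject.setdefault(subject, {})[attribute] = sorted(image_ids)
    return nested


if __name__ == "__main__":
    ontology = {
        "color": ["orange", "white", "dark"],
        "darkness": ["dark", "bright"],
        "pose": ["sitting", "jumping"],
    }
    flat = {"cat(orange)": {"1", "2"}, "dog(sitting)": {"3"}, "cat(dark)": {"4"}}
    nested = nest_by_attribute_type(flat, ontology)
    print(nested["color"]["cat"]["orange"])
    print(sorted(nested))
```

The reverse lookup keeps a list of types per value because a value can appear under more than one attribute type, so a subset such as cat(dark) is filed under each matching type.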

See the full attribute ontology in ATTRIBUTE_CONTEXT_ONTOLOGY in dataset/Constants.py:

ATTRIBUTE_CONTEXT_ONTOLOGY = {
 'darkness': ['dark', 'bright'],
 'dryness': ['wet', 'dry'],
 'colorful': ['colorful', 'shiny'],
 'leaf': ['leafy', 'bare'],
 'emotion': ['happy', 'calm'],
 'sports': ['baseball', 'tennis'],
 'flatness': ['flat', 'curved'],
 'lightness': ['light', 'heavy'],
 'gender': ['male', 'female'],
 'width': ['wide', 'narrow'],
 'depth': ['deep', 'shallow'],
 'hardness': ['hard', 'soft'],
 'cleanliness': ['clean', 'dirty'],
 'switch': ['on', 'off'],
 'thickness': ['thin', 'thick'],
 'openness': ['open', 'closed'],
 'height': ['tall', 'short'],
 'length': ['long', 'short'],
 'fullness': ['full', 'empty'],
 'age': ['young', 'old'],
 'size': ['large', 'small'],
 'pattern': ['checkered', 'striped', 'dress', 'dotted'],
 'shape': ['round', 'rectangular', 'triangular', 'square'],
 'activity': ['waiting', 'staring', 'drinking', 'playing', 'eating', 'cooking', 'resting', 
              'sleeping', 'posing', 'talking', 'looking down', 'looking up', 'driving', 
              'reading', 'brushing teeth', 'flying', 'surfing', 'skiing', 'hanging'],
 'pose': ['walking', 'standing', 'lying', 'sitting', 'running', 'jumping', 'crouching', 
            'bending', 'smiling', 'grazing'],
 'material': ['wood', 'plastic', 'metal', 'glass', 'leather', 'leather', 'porcelain', 
            'concrete', 'paper', 'stone', 'brick'],
 'color': ['white', 'red', 'black', 'green', 'silver', 'gold', 'khaki', 'gray', 
            'dark', 'pink', 'dark blue', 'dark brown',
            'blue', 'yellow', 'tan', 'brown', 'orange', 'purple', 'beige', 'blond', 
            'brunette', 'maroon', 'light blue', 'light brown']
}

Section 4.2: Evaluating Subpopulation Shifts

Run the python script dataset/subpopulation_shift_cat_dog_indoor_outdoor.py to reproduce the MetaShift subpopulation shift dataset (based on Visual Genome images) in the paper.

cd dataset/
python subpopulation_shift_cat_dog_indoor_outdoor.py

The python script generates a “Cat vs. Dog” dataset, where the general contexts “indoor/outdoor” have a natural spurious correlation with the class labels.

The following files will be generated by executing the python script dataset/subpopulation_shift_cat_dog_indoor_outdoor.py.

Output files (mixed version: for reproducing experiments)

/data/MetaShift/MetaShift-subpopulation-shift
├── imageID_to_group.pkl
├── train/
    ├── cat/             (more cat(indoor) images than cat(outdoor))
    ├── dog/             (more dog(outdoor) images than dog(indoor))
├── val_out_of_domain/
    ├── cat/             (cat(indoor):cat(outdoor)=1:1)
    ├── dog/             (dog(indoor):dog(outdoor)=1:1) 

where imageID_to_group.pkl is a dictionary with 4 keys: 'cat(indoor)', 'cat(outdoor)', 'dog(indoor)', 'dog(outdoor)'. The corresponding value of each key is the list of the names of the images that belong to that subset. Modify the global variable SUBPOPULATION_SHIFT_DATASET_FOLDER to change the destination folder. You can tune NUM_MINORITY_IMG to control the amount of subpopulation shift.
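
A short sketch of how such a group dictionary can be inspected is given below. It assumes the format described above (group name mapped to a list of image names); the helper names and the toy counts are illustrative and do not come from the repo.

```python
import pickle


def load_groups(pkl_path):
    """Load the group -> image-names dictionary from imageID_to_group.pkl."""
    with open(pkl_path, "rb") as f:
        return pickle.load(f)


def group_sizes(groups):
    """Number of images in each group."""
    return {group: len(names) for group, names in groups.items()}


def minority_share(sizes, majority, minority):
    """Fraction of one class's images that fall in its minority group."""
    return sizes[minority] / (sizes[majority] + sizes[minority])


def invert_groups(groups):
    """Map each image name to the list of groups that contain it."""
    image_to_groups = {}
    for group, names in groups.items():
        for name in names:
            image_to_groups.setdefault(name, []).append(group)
    return image_to_groups


if __name__ == "__main__":
    # Toy stand-in with placeholder image names and made-up group sizes.
    toy = {
        "cat(indoor)": [f"cat_in_{i}" for i in range(800)],
        "cat(outdoor)": [f"cat_out_{i}" for i in range(50)],
    }
    sizes = group_sizes(toy)
    share = minority_share(sizes, "cat(indoor)", "cat(outdoor)")
    print(sizes, f"{share:.1%}")
```

With the toy counts, the minority group holds 50 of 850 cat images, a share of roughly 5.9%; changing the number of minority images moves this share and with it the strength of the spurious correlation.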

Output files (unmixed version, for other potential uses)

To facilitate other potential uses, we also output an unmixed version, where 'cat(indoor)', 'cat(outdoor)', 'dog(indoor)', 'dog(outdoor)' are written into 4 separate folders. Modify the global variable CUSTOM_SPLIT_DATASET_FOLDER to change the destination folder.

/data/MetaShift/MetaShift-Cat-Dog-indoor-outdoor
├── imageID_to_group.pkl
├── train/
    ├── cat/             (all cat(indoor) images)
        ├── cat(indoor)/
    ├── dog/             (all dog(outdoor) images) 
        ├── dog(outdoor)/
├── test/
    ├── cat/             (all cat(outdoor) images)
        ├── cat(outdoor)/
    ├── dog/             (all dog(indoor) images) 
        ├── dog(indoor)/

Appendix D: Constructing MetaShift from COCO Dataset

The notebook dataset/extend_to_COCO/coco_MetaShift.ipynb reproduces the COCO subpopulation shift dataset in paper Appendix D. Executing the notebook constructs a “Cat vs. Dog” task based on COCO images, where the “indoor/outdoor” contexts are spuriously correlated with the class labels.

Install COCO Dependencies

Install pycocotools (for evaluation on COCO):

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

COCO Data preparation

2017 Train/Val annotations [241MB]

2017 Train images [118K/18GB]

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

/home/ubuntu/data/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images

Modify the global variable IMAGE_DATA_FOLDER to change the COCO image folder.

Output files (mixed version: for reproducing experiments)

The following files will be generated by executing the notebook.

/data/MetaShift/COCO-Cat-Dog-indoor-outdoor
├── imageID_to_group.pkl
├── train/
    ├── cat/
    ├── dog/ 
├── val_out_of_domain/
    ├── cat/
    ├── dog/ 

where imageID_to_group.pkl is a dictionary with 4 keys: 'cat(indoor)', 'cat(outdoor)', 'dog(indoor)', 'dog(outdoor)'. The corresponding value of each key is the list of the names of the images that belong to that subset. Modify the global variable CUSTOM_SPLIT_DATASET_FOLDER to change the destination folder.

Section 4.1: Evaluating Domain Generalization

Run the python script dataset/domain_generalization_cat_dog.py to reproduce the MetaShift domain generalization dataset (based on Visual Genome images) in the paper.

cd dataset/
python domain_generalization_cat_dog.py

Output files (cat vs. dog, unmixed version)

The following files will be generated by executing the python script dataset/domain_generalization_cat_dog.py. Modify the global variable CUSTOM_SPLIT_DATASET_FOLDER to change the destination folder.

/data/MetaShift/Domain-Generalization-Cat-Dog
├── train/
    ├── cat/
        ├── cat(sofa)/              (The cat training data is always cat(sofa + bed))
        ├── cat(bed)/               (The cat training data is always cat(sofa + bed))
    ├── dog/
        ├── dog(cabinet)/           (Experiment 1: the dog training data is dog(cabinet + bed))
        ├── dog(bed)/               (Experiment 1: the dog training data is dog(cabinet + bed))

        ├── dog(bag)/               (Experiment 2: the dog training data is dog(bag + box))
        ├── dog(box)/               (Experiment 2: the dog training data is dog(bag + box))

        ├── dog(bench)/             (Experiment 3: the dog training data is dog(bench + bike))
        ├── dog(bike)/              (Experiment 3: the dog training data is dog(bench + bike))

        ├── dog(boat)/              (Experiment 4: the dog training data is dog(boat + surfboard))
        ├── dog(surfboard)/         (Experiment 4: the dog training data is dog(boat + surfboard))

├── test/
    ├── dog/
        ├── dog(shelf)/             (The test set we used in the paper)
        ├── dog(sofa)/             
        ├── dog(grass)/             
        ├── dog(vehicle)/             
        ├── dog(cap)/                         
    ├── cat/
        ├── cat(shelf)/
        ├── cat(grass)/
        ├── cat(sink)/
        ├── cat(computer)/
        ├── cat(box)/
        ├── cat(book)/
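
The folder listing above pairs a fixed cat training set with one of four dog training sets. A minimal sketch of collecting the training images for one experiment is shown here; it only walks the generated folders and is not the repo's data loader. The constants, the function name training_samples and the label convention (cat=0, dog=1) are assumptions made for this example.

```python
import os

# Subset names are taken from the folder listing; the structure of these
# constants and the function below are illustrative, not part of the repo.
CAT_TRAIN_SUBSETS = ["cat(sofa)", "cat(bed)"]
DOG_TRAIN_SUBSETS = {
    1: ["dog(cabinet)", "dog(bed)"],
    2: ["dog(bag)", "dog(box)"],
    3: ["dog(bench)", "dog(bike)"],
    4: ["dog(boat)", "dog(surfboard)"],
}


def training_samples(root, experiment):
    """Return (image_path, label) pairs for one experiment; cat=0, dog=1."""
    per_class = [
        ("cat", CAT_TRAIN_SUBSETS),
        ("dog", DOG_TRAIN_SUBSETS[experiment]),
    ]
    samples = []
    for label, (class_name, subsets) in enumerate(per_class):
        for subset in subsets:
            folder = os.path.join(root, "train", class_name, subset)
            for file_name in sorted(os.listdir(folder)):
                samples.append((os.path.join(folder, file_name), label))
    return samples
```

For example, training_samples("/data/MetaShift/Domain-Generalization-Cat-Dog", 2) would list the cat(sofa), cat(bed), dog(bag) and dog(box) images, provided the dataset has been generated at that location.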

Code for Distribution Shift Experiments

The python script experiments/distribution_shift/main_generalization.py is the entry point for running the distribution shift experiments for Section 4.2 (Evaluating Subpopulation Shifts), Appendix D (Constructing MetaShift from COCO Dataset), and Section 4.1 (Evaluating Domain Generalization). As a running example, the default value for --data in argparse is /data/MetaShift/MetaShift-subpopulation-shift (i.e., for Section 4.2).

clear && CUDA_VISIBLE_DEVICES=3 python main_generalization.py --num-domains 2 --algorithm ERM 
clear && CUDA_VISIBLE_DEVICES=4 python main_generalization.py --num-domains 2 --algorithm GroupDRO 
clear && CUDA_VISIBLE_DEVICES=5 python main_generalization.py --num-domains 2 --algorithm IRM 
clear && CUDA_VISIBLE_DEVICES=6 python main_generalization.py --num-domains 2 --algorithm CORAL 
clear && CUDA_VISIBLE_DEVICES=7 python main_generalization.py --num-domains 2 --algorithm CDANN 

Our code is based on DomainBed, introduced in "In Search of Lost Domain Generalization". The codebase also provides many additional algorithms. Many thanks to the authors and developers!

Comments
  • How to reproduce the results of table 1 (domain generalization experiment)?

    Hello, I am trying to reproduce the results of table 1 in section 4.1, Evaluating domain generalization.

    I have managed to generate the datasets through domain_generalization_cat_dog.py (obtaining the directory structure shown in the readme) but it is not clear to me how I am supposed to adjust the main_generalization.py script to run that domain generalization experiment.

    I assume I have to adjust that file from your instructions:

    Code for Distribution Shift Experiments

    The python script experiments/distribution_shift/main_generalization.py is the entry point for running the distribution shift experiments for Section 4.2 (Evaluating Subpopulation Shifts) and Appendix D (Constructing MetaShift from COCO Dataset), and Section 4.1 (Evaluating Domain Generalization). As a running example, the default value for --data in argparse is /data/MetaShift/MetaShift-subpopulation-shift (i.e., for Section 4.2).

    In main_generalization.py , changing the data argument from .../data/MetaShift/MetaShift-subpopulation-shift to data/MetaShift/Domain-Generalization-Cat-Dog --i.e. the directory containing the data generated by domain_generalization_cat_dog.py-- is not enough, I receive an error about a missing imageID_to_group.pkl. Indeed the data directory is organized differently from the subpopulation-shift experiment (which I successfully reproduced).

    Am I missing something?

    opened by noranta4 3
  • dog(ocean) or ocean(dog)?

    Thank you for this great work!

    I was recently playing with understanding_full-candidate-subsets-pkl.ipynb and I found that full-candidate-subsets.pkl also contains some general contexts subsets like ocean(dog) (as mentioned in the .ipynb). Thus I tried to search for dog(ocean) yet with no outcomes. Similarly I found some other subsets, e.g., dog(cat) and cat(dog), are coupled as a two-way style. Here my questions are

    1. What is the logic behind involving these two-way subsets like dog(cat) and cat(dog)? Is it based on the object saliency such that the images in the subset can be depicted as foreground(background)?
    2. Not all two-way subsets are presented in full-candidate-subsets.pkl, e.g., there is ocean(dog), yet no dog(ocean). Is it simply because the dog(ocean) images are less than 25, or if the answer to Q1 is yes, dogs are overall less salient in images?

    Thanks for your attention!

    opened by muliyangm 2
  • How to select subsets of indoor and outdoor in subpopulation experiment

    Hi,

    Thanks for contributing the great project!

    I have a question regarding how you selected the subsets belonging to indoor and outdoor. While we can find corresponding subset name specified by attributes like "dog(white)" in attributes-candidate-subsets.pkl, it looks like you specified the indoor and outdoor manually in this file.

    I was wondering how you generate train_set_scheme and test_set_scheme. What if we want to select cat & dog with other contexts in GENERAL_CONTEXT_ONTOLOGY, e.g., cat(bedroom)? I also noticed there is a file obj2attribute.json. Could you please provide some instructions on how to utilize it?

    Thanks, Ting

    opened by litingfeng 1
  • Spectral layout dimensions

    Hello, thanks for making your code and dataset available!

    I have a question about the file domain_generalization_cat_dog.py at line https://github.com/Weixin-Liang/MetaShift/blob/916421d572bde555615bfa71b2ff8f1da76ac8a3/dataset/domain_generalization_cat_dog.py#L222-L225 Can you please explain how you pick the dimension for the spectral layout? I ask because I was changing the datasets and selecting fewer subsets, which raised an error on the dimension number.

    Thanks.

    opened by IbtihalFerwana 1
  • Cannot reproduce the results of Table 2

    Thank you for sharing the MetaShift dataset.

    However, I have some trouble reproducing the results of Table 2 (Subpopulation shift).

    For p=6%, ERM shows 83.507% average accuracy and 73.6% worst-case accuracy (for iteration 40) which are much higher than the results reported in Table 2.

    Also, for the other algorithms, none of my results match those in Table 2.

    Is this because of the version of my python packages?

    Environment:
    	Python: 3.8.11
    	PyTorch: 1.7.1
    	Torchvision: 0.8.2
    	CUDA: 11.0
    	CUDNN: 8005
    	NumPy: 1.20.3
    	PIL: 8.3.1
    Args:
    	algorithm: ERM
    	batch_size: 32
    	data: MetaShift/dataset/data/MetaShift/MetaShift-subpopulation-shift
    	hparams: None
    	hparams_seed: 0
    	log_prefix: 
    	num_classes: 2
    	num_domains: 2
    	output_dir: train_output
    	save_model_every_checkpoint: False
    	seed: 0
    	skip_model_save: False
    	workers: 4
    train_dataset.samples reverse: [('cat(indoor)', 800), ('dog(outdoor)', 800), ('cat(outdoor)', 50), ('dog(indoor)', 50)]
    self.domain_to_groups {0: {'cat': ['cat(indoor)'], 'dog': ['dog(indoor)']}, 1: {'cat': ['cat(outdoor)'], 'dog': ['dog(outdoor)']}}
    HParams:
    	batch_size: 32
    	class_balanced: False
    	data_augmentation: True
    	lr: 5e-05
    	nonlinear_classifier: False
    	resnet18: True
    	resnet_dropout: 0.0
    	weight_decay: 0.0
    step_vals {'loss': 0.7582564353942871}
    Iteration: 0
    out-of-domain val
    accuracy 0.543 	 roc_auc_score 0.693
    confusion_matrix
    [[ 31 257]
     [  6 282]]
    classification_report
                  precision    recall  f1-score   support
    
               0       0.84      0.11      0.19       288
               1       0.52      0.98      0.68       288
    
        accuracy                           0.54       576
       macro avg       0.68      0.54      0.44       576
    weighted avg       0.68      0.54      0.44       576
    
    VAL * [email protected] 54.340
     * [email protected] 54.340 [email protected] 0.000
    accuracy 0.993 	 size: 144 	 dog(indoor)
    accuracy 0.965 	 size: 144 	 dog(outdoor)
    accuracy 0.118 	 size: 144 	 cat(indoor)
    accuracy 0.097 	 size: 144 	 cat(outdoor)
    step_vals {'loss': 0.8029559254646301}
    step_vals {'loss': 0.6127275228500366}
    step_vals {'loss': 0.5663400888442993}
    step_vals {'loss': 0.5361894965171814}
    step_vals {'loss': 0.5080360770225525}
    step_vals {'loss': 0.502351701259613}
    step_vals {'loss': 0.45667174458503723}
    step_vals {'loss': 0.4510887563228607}
    step_vals {'loss': 0.4313385784626007}
    step_vals {'loss': 0.41102027893066406}
    step_vals {'loss': 0.4396398067474365}
    step_vals {'loss': 0.35080477595329285}
    step_vals {'loss': 0.3561045825481415}
    step_vals {'loss': 0.4412407875061035}
    step_vals {'loss': 0.3722797632217407}
    step_vals {'loss': 0.40952974557876587}
    step_vals {'loss': 0.3348669111728668}
    step_vals {'loss': 0.28301629424095154}
    step_vals {'loss': 0.2869456112384796}
    step_vals {'loss': 0.2969983220100403}
    Iteration: 20
    out-of-domain val
    accuracy 0.835 	 roc_auc_score 0.913
    confusion_matrix
    [[250  38]
     [ 57 231]]
    classification_report
                  precision    recall  f1-score   support
    
               0       0.81      0.87      0.84       288
               1       0.86      0.80      0.83       288
    
        accuracy                           0.84       576
       macro avg       0.84      0.84      0.83       576
    weighted avg       0.84      0.84      0.83       576
    
    VAL * [email protected] 83.507
     * [email protected] 83.507 [email protected] 0.000
    accuracy 0.931 	 size: 144 	 cat(indoor)
    accuracy 0.896 	 size: 144 	 dog(outdoor)
    accuracy 0.806 	 size: 144 	 cat(outdoor)
    accuracy 0.708 	 size: 144 	 dog(indoor)
    step_vals {'loss': 0.27039286494255066}
    step_vals {'loss': 0.30204248428344727}
    step_vals {'loss': 0.21312294900417328}
    step_vals {'loss': 0.2437962144613266}
    step_vals {'loss': 0.25198787450790405}
    step_vals {'loss': 0.23842911422252655}
    step_vals {'loss': 0.1691025346517563}
    step_vals {'loss': 0.24762478470802307}
    step_vals {'loss': 0.28251034021377563}
    step_vals {'loss': 0.14738459885120392}
    step_vals {'loss': 0.18546883761882782}
    step_vals {'loss': 0.22910372912883759}
    step_vals {'loss': 0.21566125750541687}
    step_vals {'loss': 0.1549062728881836}
    step_vals {'loss': 0.13150382041931152}
    step_vals {'loss': 0.1629757136106491}
    step_vals {'loss': 0.21351677179336548}
    step_vals {'loss': 0.1662304550409317}
    step_vals {'loss': 0.13489097356796265}
    step_vals {'loss': 0.15391628444194794}
    Iteration: 40
    out-of-domain val
    accuracy 0.835 	 roc_auc_score 0.905
    confusion_matrix
    [[239  49]
     [ 46 242]]
    classification_report
                  precision    recall  f1-score   support
    
               0       0.84      0.83      0.83       288
               1       0.83      0.84      0.84       288
    
        accuracy                           0.84       576
       macro avg       0.84      0.84      0.84       576
    weighted avg       0.84      0.84      0.84       576
    
    VAL * [email protected] 83.507
     * [email protected] 83.507 [email protected] 0.000
    accuracy 0.944 	 size: 144 	 dog(outdoor)
    accuracy 0.924 	 size: 144 	 cat(indoor)
    accuracy 0.736 	 size: 144 	 cat(outdoor)
    accuracy 0.736 	 size: 144 	 dog(indoor)
    step_vals {'loss': 0.09984487295150757}
    step_vals {'loss': 0.12095910310745239}
    step_vals {'loss': 0.15282633900642395}
    step_vals {'loss': 0.24499759078025818}
    step_vals {'loss': 0.17233917117118835}
    step_vals {'loss': 0.1805627942085266}
    step_vals {'loss': 0.29616454243659973}
    step_vals {'loss': 0.18791015446186066}
    step_vals {'loss': 0.13349810242652893}
    step_vals {'loss': 0.17763575911521912}
    step_vals {'loss': 0.14355479180812836}
    step_vals {'loss': 0.2498779296875}
    step_vals {'loss': 0.08334649354219437}
    step_vals {'loss': 0.23552735149860382}
    step_vals {'loss': 0.1597958207130432}
    step_vals {'loss': 0.1945687234401703}
    step_vals {'loss': 0.12305081635713577}
    step_vals {'loss': 0.18060629069805145}
    step_vals {'loss': 0.10387007147073746}
    step_vals {'loss': 0.18083493411540985}
    Iteration: 60
    out-of-domain val
    accuracy 0.823 	 roc_auc_score 0.910
    confusion_matrix
    [[232  56]
     [ 46 242]]
    classification_report
                  precision    recall  f1-score   support
    
               0       0.83      0.81      0.82       288
               1       0.81      0.84      0.83       288
    
        accuracy                           0.82       576
       macro avg       0.82      0.82      0.82       576
    weighted avg       0.82      0.82      0.82       576
    
    VAL * [email protected] 82.292
     * [email protected] 82.292 [email protected] 0.000
    accuracy 0.965 	 size: 144 	 dog(outdoor)
    accuracy 0.903 	 size: 144 	 cat(indoor)
    accuracy 0.715 	 size: 144 	 dog(indoor)
    accuracy 0.708 	 size: 144 	 cat(outdoor)
    step_vals {'loss': 0.21658587455749512}
    step_vals {'loss': 0.21305663883686066}
    step_vals {'loss': 0.10273241996765137}
    step_vals {'loss': 0.08029960840940475}
    step_vals {'loss': 0.1491691768169403}
    step_vals {'loss': 0.18964910507202148}
    step_vals {'loss': 0.13241422176361084}
    step_vals {'loss': 0.1288807988166809}
    step_vals {'loss': 0.11977404356002808}
    step_vals {'loss': 0.09882873296737671}
    step_vals {'loss': 0.15328539907932281}
    step_vals {'loss': 0.18410827219486237}
    step_vals {'loss': 0.17586088180541992}
    step_vals {'loss': 0.08180069923400879}
    step_vals {'loss': 0.12042209506034851}
    step_vals {'loss': 0.15169812738895416}
    step_vals {'loss': 0.1118776723742485}
    step_vals {'loss': 0.10913677513599396}
    step_vals {'loss': 0.078999362885952}
    step_vals {'loss': 0.17687441408634186}
    Iteration: 80
    out-of-domain val
    accuracy 0.819 	 roc_auc_score 0.905
    confusion_matrix
    [[236  52]
     [ 52 236]]
    classification_report
                  precision    recall  f1-score   support
    
               0       0.82      0.82      0.82       288
               1       0.82      0.82      0.82       288
    
        accuracy                           0.82       576
       macro avg       0.82      0.82      0.82       576
    weighted avg       0.82      0.82      0.82       576
    
    VAL * [email protected] 81.944
     * [email protected] 81.944 [email protected] 0.000
    accuracy 0.965 	 size: 144 	 dog(outdoor)
    accuracy 0.951 	 size: 144 	 cat(indoor)
    accuracy 0.688 	 size: 144 	 cat(outdoor)
    accuracy 0.674 	 size: 144 	 dog(indoor)
    
    opened by JunhyunB 1
  • What is the dimension of the embeddings?

    Hello, thank you for the nice and instructive work.

    There are two questions I'd like to ask.

    First, what is the dimension (K in the paper) of each embedding? Second, when assigning one embedding for each node, what package did you use, Sklearn or others?

    Thanks !

    opened by hehao13 1
  • Missing metadata in full-candidate-subsets.pkl

    Hello, thank you for the nice work and congratulations for the ICLR publication!

    I'm playing with the understanding_full-candidate-subsets-pkl.ipynb notebook. Exploring the metadata stored in full-candidate-subsets.pkl, some keys seem to be missing. For instance the ones about attribute contexts:

    IN

    print(VG_node_name_to_img_id['dog(sitting)'])
    print(VG_node_name_to_img_id['dog(jumping)'])
    print(VG_node_name_to_img_id['cat(orange)'])
    print(VG_node_name_to_img_id['cat(white)'])
    

    OUT

    set()
    set()
    set()
    set()
    

    Other examples: dog(ocean) and dog(street) are missing, dog(grass) is not.

    IN

    print(VG_node_name_to_img_id['dog(ocean)'])
    print(VG_node_name_to_img_id['dog(grass)'])
    print(VG_node_name_to_img_id['dog(street)'])
    

    OUT

    set()
    {'2403124', '2386620', '2330800', '2377375', '2395806', '2402433', '2338955', ...
    set()
    

    opened by noranta4 1
  • Comment: How do we know that the ERM in DomainBed is performing implicit upsampling?

    Question

    I was playing a lot with the DomainBed library recently, trying to figure out why methods based on regularization fail so loudly and why ERM performs so well. The results presented in MetaShift shed light (and hope!) on this problem, showing that regularization methods outperform ERM when the domain distance is large enough, which makes a lot of sense.

    Indeed, you also make an interesting comment about the unconventional ERM implementation in DomainBed in the appendix, namely that it implicitly up-samples minority domains during training.

    Can you point me to where you spotted this implicit upsampling? I have searched the dataloaders and the training.py script without success. I want to make sure there are no significant differences between the domain upsampling policies of ERM and the other algorithms.

    Answer

    You can find the implicit sampling code here:

    https://github.com/Weixin-Liang/MetaShift/blob/main/experiments/distribution_shift/main_generalization.py#L137-L148

    This function generates one batch of training data. Within each batch, the number of samples from each domain is equal, which implicitly up-samples the minority domains.
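
To make the effect concrete, here is a minimal sketch of domain-balanced batching. It is not the code at the linked lines. The function name `sample_balanced_batch`, the `per_domain` parameter, and the toy domains are assumptions chosen for illustration.

```python
import random
from collections import Counter


def sample_balanced_batch(domain_to_examples, per_domain, rng):
    """Draw `per_domain` examples from every domain, whatever its size."""
    batch = []
    for domain, examples in domain_to_examples.items():
        # Sampling with replacement lets a small domain fill its quota,
        # so its examples are seen more often than under uniform sampling.
        picks = rng.choices(examples, k=per_domain)
        batch.extend((domain, example) for example in picks)
    rng.shuffle(batch)
    return batch


# Toy data (hypothetical): the majority domain is 20x larger than the minority.
domains = {
    "majority": [f"maj_{i}" for i in range(200)],
    "minority": [f"min_{i}" for i in range(10)],
}
batch = sample_balanced_batch(domains, per_domain=8, rng=random.Random(0))
counts = Counter(domain for domain, _ in batch)
print(counts)
```

With uniform sampling over the pooled toy data, the minority domain would supply about 10 of every 210 examples (under 5%). With balanced batching it supplies half of each batch.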

    opened by Weixin-Liang 0
  • Domain generalization experiment

    Hi, thanks for the dataset, it looks very interesting. I am having a lot of trouble reproducing your results. The README says that main_generalization allows reproducing the experiments of Section 4.1, but it looks from issue https://github.com/Weixin-Liang/MetaShift/issues/2 that quite a few changes are in fact needed. I'd suggest updating the README to guide users through reproducing Section 4.1.

    Also, if I understand correctly, the current main_generalization.py file reproduces the experiments in Section 4.2. Where is the test data? Is it the validation data? Where is the test loop in main_generalization.py?

    opened by DianeBouchacourt 0
  • environments choices and sub-contexts

    Hi @Weixin-Liang,

    Where can I find the exact choices of contexts for the domain generalization experiments? For example, for the cat and dog experiments, the code has 'cat(sofa)': {'cat(cup)', 'cat(sofa)', 'cat(chair)'}, but what are the choices for bus(clock) and the rest?

    opened by IbtihalFerwana 0
  • distances of the domain generalization experiment

    Hello,

    How can I reproduce the distances in Table 1 of the paper? When running the script dataset/domain_generalization_cat_dog.py, I get

    Distance from dog(cabinet)+dog(bed) to dog(shelf): 0.025
    

    instead of d=0.44

    Is there specific data to be included or removed?

    opened by IbtihalFerwana 3
Owner
Ph.D. Student in Computer Science at Stanford University
CrossNorm and SelfNorm for Generalization under Distribution Shifts (ICCV 2021)

CrossNorm (CN) and SelfNorm (SN) (Accepted at ICCV 2021) This is the official PyTorch implementation of our CNSN paper, in which we propose CrossNorm

null 98 Oct 11, 2022
PyTorch evaluation code for Delving Deep into the Generalization of Vision Transformers under Distribution Shifts.

Out-of-distribution Generalization Investigation on Vision Transformers This repository contains PyTorch evaluation code for Delving Deep into the Gen

Chongzhi Zhang 73 Nov 9, 2022
Training Confidence-Calibrated Classifier for Detecting Out-of-Distribution Samples / ICLR 2018

Training Confidence-Calibrated Classifier for Detecting Out-of-Distribution Samples This project is for the paper "Training Confidence-Calibrated Clas

null 167 Nov 23, 2022
Tensorflow 2 implementation of the paper: Learning and Evaluating Representations for Deep One-class Classification published at ICLR 2021

Deep Representation One-class Classification (DROC). This is not an officially supported Google product. Tensorflow 2 implementation of the paper: Lea

Google Research 133 Oct 27, 2022
Official repository for the ICLR 2021 paper Evaluating the Disentanglement of Deep Generative Models with Manifold Topology

Official repository for the ICLR 2021 paper Evaluating the Disentanglement of Deep Generative Models with Manifold Topology Sharon Zhou, Eric Zelikman

Stanford Machine Learning Group 34 Nov 16, 2022
[ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

AMOS This repository contains the scripts for fine-tuning AMOS pretrained models on GLUE and SQuAD 2.0 benchmarks. Paper: Pretraining Text Encoders wi

Microsoft 22 Sep 15, 2022
Distributionally robust neural networks for group shifts

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization This code implements the g

null 147 Nov 21, 2022
Imposter-detector-2022 - HackED 2022 Team 3IQ - 2022 Imposter Detector

HackED 2022 Team 3IQ - 2022 Imposter Detector By Aneeljyot Alagh, Curtis Kan, Jo

Joshua Ji 3 Aug 20, 2022
Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics

Dataset Cartography Code for the paper Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics at EMNLP 2020. This repository cont

AI2 119 Nov 17, 2022
Source code, datasets and trained models for the paper Learning Advanced Mathematical Computations from Examples (ICLR 2021), by François Charton, Amaury Hayat (ENPC-Rutgers) and Guillaume Lample

Maths from examples - Learning advanced mathematical computations from examples This is the source code and data sets relevant to the paper Learning a

Facebook Research 171 Nov 23, 2022
High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

TL;DR Ignite is a high-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently. Click on the image to

null 4.1k Nov 28, 2022
An implementation of a discriminant function over a normal distribution to help classify datasets.

CS4044D Machine Learning Assignment 1 By Dev Sony, B180297CS The question, report and source code can be found here. Github Repo Solution 1 Based on t

Dev Sony 6 Nov 9, 2021
A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.

collie_recs Collie is a library for preparing, training, and evaluating implicit deep learning hybrid recommender systems, named after the Border Coll

ShopRunner 94 Sep 2, 2022
A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.

collie Collie is a library for preparing, training, and evaluating implicit deep learning hybrid recommender systems, named after the Border Collie do

ShopRunner 94 Sep 2, 2022
torchlm aims to build a high-level pipeline for face landmark detection. It supports training, evaluating, exporting, inference (Python/C++), and 100+ data augmentations.

A high-level pipeline for face landmark detection that supports training, evaluating, exporting, inference, and 100+ data augmentations. It is compatible with torchvision and albumentations and can be installed easily with pip.

DefTruth 132 Nov 15, 2022
Official PyTorch implementation of "Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets" (ICLR 2021)

Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets This is the official PyTorch implementation for the paper Rapid Neural A

null 46 Sep 24, 2022
The Pytorch code of "Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification", CVPR 2022 (Oral).

DeepBDC for few-shot learning        Introduction In this repo, we provide the implementation of the following paper: "Joint Distribution Matters: Dee

FeiLong 107 Nov 21, 2022
This is the official source code for SLATE. We provide the code for the model, the training code, and a dataset loader for the 3D Shapes dataset. This code is implemented in Pytorch.

SLATE This is the official source code for SLATE. We provide the code for the model, the training code and a dataset loader for the 3D Shapes dataset.

Gautam Singh 65 Nov 22, 2022