Human-annotated noisy labels for CIFAR-10 and CIFAR-100.

Overview

Dataloader for CIFAR-N

CIFAR-10N

import torch

noise_label = torch.load('./data/CIFAR-10_human.pt')
clean_label = noise_label['clean_label']
worst_label = noise_label['worse_label']
aggre_label = noise_label['aggre_label']
random_label1 = noise_label['random_label1']
random_label2 = noise_label['random_label2']
random_label3 = noise_label['random_label3']

CIFAR-100N

import torch

noise_label = torch.load('./data/CIFAR-100_human.pt')
clean_label = noise_label['clean_label']
noisy_label = noise_label['noisy_label']
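
Once a label set is loaded, a quick sanity check is to measure how often it disagrees with the clean labels. The helper below is a minimal sketch rather than part of the repository: noise_rate is a hypothetical name, and the toy arrays stand in for clean_label and any of the noisy label sets loaded above.

```python
import numpy as np


def noise_rate(clean_label, noisy_label):
    """Fraction of examples whose noisy label differs from the clean label."""
    clean = np.asarray(clean_label)
    noisy = np.asarray(noisy_label)
    if clean.shape != noisy.shape:
        raise ValueError("label arrays must have the same shape")
    return float(np.mean(clean != noisy))


# Toy stand-ins (hypothetical values) for the arrays loaded from the .pt file.
clean = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
noisy = np.array([0, 1, 2, 5, 4, 3, 6, 7, 8, 9])
print(noise_rate(clean, noisy))
```

With the real files, the same call could be applied to each label set in turn, for example noise_rate(clean_label, worst_label).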

Training on CIFAR-N with the Cross-Entropy loss

CIFAR-10N

# NOISE_TYPE: [clean, aggre, worst, rand1, rand2, rand3]
# Use human annotations
CUDA_VISIBLE_DEVICES=0 python3 main.py --dataset cifar10 --noise_type NOISE_TYPE --is_human
# Use the synthetic noise that has the same noise transition matrix as human annotations
CUDA_VISIBLE_DEVICES=0 python3 main.py --dataset cifar10 --noise_type NOISE_TYPE

CIFAR-100N

# NOISE_TYPE: [clean100, noisy100]
# Use human annotations
CUDA_VISIBLE_DEVICES=0 python3 main.py --dataset cifar100 --noise_type NOISE_TYPE --is_human
# Use the synthetic noise that has the same noise transition matrix as human annotations
CUDA_VISIBLE_DEVICES=0 python3 main.py --dataset cifar100 --noise_type NOISE_TYPE
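
To illustrate what "synthetic noise with the same noise transition matrix" can mean, the sketch below estimates a class-dependent transition matrix from paired clean and noisy labels and then samples new labels from it. This is an assumption-laden illustration, not the code used by main.py: both function names are hypothetical, and the toy labels are made up.

```python
import numpy as np


def estimate_transition_matrix(clean, noisy, num_classes):
    """Return T where T[i, j] is the empirical P(noisy = j | clean = i)."""
    clean = np.asarray(clean)
    noisy = np.asarray(noisy)
    counts = np.zeros((num_classes, num_classes), dtype=np.float64)
    # Accumulate one count per (clean, noisy) pair.
    np.add.at(counts, (clean, noisy), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Classes that never appear in `clean` get an identity row (no flipping).
    return np.where(row_sums > 0,
                    counts / np.maximum(row_sums, 1.0),
                    np.eye(num_classes))


def sample_synthetic_labels(clean, T, seed=0):
    """Draw one noisy label per example from the row of T for its clean class."""
    rng = np.random.default_rng(seed)
    clean = np.asarray(clean)
    return np.array([rng.choice(T.shape[1], p=T[c]) for c in clean])


# Toy two-class example (hypothetical values).
clean = np.array([0, 0, 0, 0, 1, 1, 1, 1])
noisy = np.array([0, 0, 0, 1, 1, 1, 1, 1])
T = estimate_transition_matrix(clean, noisy, num_classes=2)
synthetic = sample_synthetic_labels(clean, T, seed=0)
print(T)
```

Noise sampled this way depends only on the clean class, whereas human annotation errors can also depend on the individual image, which is one reason to compare the two settings.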

Additional dataset information

We provide side information recorded during noisy-label collection in side_info_cifar10N.csv and side_info_cifar100N.csv. The fields in these two files are:

  • Image-batch: a subset of indexes of the CIFAR training images.
  • Worker-id: the encrypted worker id on Amazon Mechanical Turk.
  • Work-time-in-seconds: the time (in seconds) a worker spent on annotating the corresponding image batch.
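
Because the side information is recorded per image batch rather than per image, it usually has to be spread back onto individual images before it can be joined with the labels. The following is a minimal sketch under stated assumptions: spread_batch_values is a hypothetical helper, each batch is assumed to have already been parsed into a list of integer image indexes, and the toy values are made up.

```python
import numpy as np


def spread_batch_values(batches, values, num_images):
    """Assign each batch-level numeric value to every image index in that batch.

    Images not covered by any batch are left as NaN.
    """
    per_image = np.full(num_images, np.nan)
    for indexes, value in zip(batches, values):
        per_image[np.asarray(indexes, dtype=int)] = value
    return per_image


# Toy example (hypothetical values): two batches covering images 0-4 of 6.
batches = [[0, 1, 2], [3, 4]]
work_time_seconds = [30.0, 12.0]
per_image_time = spread_batch_values(batches, work_time_seconds, num_images=6)
print(per_image_time)
```

The float array suits Work-time-in-seconds; for the string-valued Worker-id field an object array would be needed instead.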
Comments
  • Noise files for Tensorflow are 0 bytes

    Hi,

    I see the files below are 0 bytes.

    CIFAR-100_human_ordered.npy CIFAR-10_human_ordered.npy

    I noticed this after running the snippet below from the instructions in the README:

    import numpy as np
    import tensorflow_datasets as tfds

    noise_file = np.load('./data/CIFAR-100_human_ordered.npy', allow_pickle=True)
    clean_label = noise_file.item().get('clean_label')
    noise_label = noise_file.item().get('noise_label')
    # The noisy labels match the order of the following tensorflow dataloader
    train_ds, test_ds = tfds.load('cifar100', split=['train','test'], as_supervised=True, batch_size=-1)
    train_images, train_labels = tfds.as_numpy(train_ds)
    # You may want to replace train_labels with the CIFAR-N noisy label sets

    UnpicklingError: Failed to interpret file './data/CIFAR-100_human_ordered.npy' as a pickle

    Thanks

    opened by JohnsonKuan 2
  • Where is the individual labeling information for CIFAR-100?

    Hello,

    Can you share the individual labeling information of each labeler for CIFAR-100? (e.g., noise_file['random_label2'])

    It would be great if they were provided as they are for CIFAR-10-N.

    Thank you.

    opened by yeachan-kr 1
  • Reproduction of the performance on "worst" label of "CE".

    Hello,

    I'm trying to reproduce the performance of "CE" in the paper.

    With the "worst" label noise, the paper reports a test accuracy of 77.69 (on CIFAR-10-N).

    However, when I run the provided code on my machine, the test accuracy at the last epoch is only 67.89, and the model appears to overfit the noisy training labels.

    Did you use a validation set for evaluation? Or could you point out if I'm missing something?

    Also, there is a discrepancy in the learning rate schedule between the paper and the code.

    Learning rate decay is applied at the 60th epoch in the code, but the paper says that it is applied at the 50th epoch.

    Could you check this?

    Thank you.

    opened by Seojin-Kim 0
  • How to load side information in Tensorflow?

    Hi,

    I would like to load the side information and associate it with the correct sample in Tensorflow. Which order do side_info_cifar10N.csv and side_info_cifar100N.csv follow: that of the PyTorch files or that of the Tensorflow files?

    And if they don't come in the Tensorflow order, should I load them like this?

    import numpy as np
    import pandas as pd
    import tensorflow_datasets as tfds

    noise_file = np.load('./data/CIFAR-10_human_ordered.npy', allow_pickle=True)
    random_label1 = noise_file.item().get('random_label1')

    train_ds, test_ds = tfds.load('cifar10', split=['train','test'], as_supervised=True, batch_size=-1)
    train_images, train_labels = tfds.as_numpy(train_ds)

    side_info_df = pd.read_csv('side_info_cifar10N.csv')
    worker1_id = side_info_df['Worker1-id'].to_numpy()

    # Reorder side information with correct order
    image_order = np.load('image_order_c10.npy')
    worker1_id_ordered = worker1_id[image_order // 10]

    # Now, the indexing of all arrays matches correctly
    first_example = (train_images[0], train_labels[0], worker1_id_ordered[0])

    Thank you very much!

    opened by gortizji 0
  • Do we have to use validation sets?

    Dear organizers, I noticed that the official code splits off a validation set. Do we also need to hold out a validation set? Or can we train on the complete training set and then test on the test set?

    opened by choresefree 2
Owner
REAL@UCSC
REsponsible & Accountable Learning (REAL) @ University of California, Santa Cruz