[ICLR2021] Unlearnable Examples: Making Personal Data Unexploitable

Overview

Unlearnable Examples

Code for the ICLR2021 Spotlight Paper "Unlearnable Examples: Making Personal Data Unexploitable" by Hanxun Huang, Xingjun Ma, Sarah Monazam Erfani, James Bailey, and Yisen Wang.

Quick Start

Use the QuickStart.ipynb notebook for a quick start.

In the notebook, you can find a minimal implementation for generating sample-wise unlearnable examples on CIFAR-10. If you are only using the notebook, remove mlconfig from models/__init__.py and copy-paste the model definition into the notebook.
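
If you prefer a standalone reference outside the notebook, below is a minimal sketch (not the exact notebook code) of how sample-wise error-minimizing noise is generated with the min-min objective: alternate a few training steps on the model with PGD steps on the per-sample noise that decrease the training loss, and stop once the error on the perturbed training set is low enough. The hyper-parameter values mirror the sample-wise CIFAR-10 script below (--epsilon 8 and --step_size 0.8 are on the 0-255 scale); the helper name perturb_batch is illustrative, not the repo's API.

import torch
import torch.nn.functional as F

# Illustrative hyper-parameters (8/255 and 0.8/255 on the [0, 1] image scale).
epsilon, step_size, num_steps = 8 / 255, 0.8 / 255, 20

def perturb_batch(model, images, labels, noise):
    # PGD that *minimizes* the classification loss w.r.t. the noise
    # (error-minimizing noise: opposite sign to a standard adversarial attack).
    delta = noise.clone().detach().requires_grad_(True)
    for _ in range(num_steps):
        loss = F.cross_entropy(model(torch.clamp(images + delta, 0, 1)), labels)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta - step_size * grad.sign()).clamp(-epsilon, epsilon)
        delta = delta.detach().requires_grad_(True)
    return delta.detach()

# Outer loop (sketch): train the model for a small number of steps, then refresh
# the per-sample noise with perturb_batch, and repeat until the training error
# on the perturbed images drops below the stop threshold (e.g. 0.01).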

Experiments in the paper

Check the scripts folder for the *.sh script corresponding to each experiment.

Sample-wise noise for unlearnable example on CIFAR-10

Generate noise for unlearnable examples
python3 perturbation.py --config_path             configs/cifar10                \
                        --exp_name                path/to/your/experiment/folder \
                        --version                 resnet18                       \
                        --train_data_type         CIFAR-10                       \
                        --noise_shape             50000 3 32 32                  \
                        --epsilon                 8                              \
                        --num_steps               20                             \
                        --step_size               0.8                            \
                        --attack_type             min-min                        \
                        --perturb_type            samplewise                      \
                        --universal_stop_error    0.01
Train on unlearnable examples and eval on clean test
python3 -u main.py    --version                 resnet18                       \
                      --exp_name                path/to/your/experiment/folder \
                      --config_path             configs/cifar10                \
                      --train_data_type         PoisonCIFAR10                  \
                      --poison_rate             1.0                            \
                      --perturb_type            samplewise                      \
                      --perturb_tensor_filepath path/to/your/experiment/folder/perturbation.pt \
                      --train
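
If you want to apply a saved perturbation to CIFAR-10 yourself rather than going through main.py (which does this via the PoisonCIFAR10 class in dataset.py), the sketch below illustrates the sample-wise case. It assumes perturbation.pt stores a float tensor of shape [50000, 3, 32, 32] with values in [-epsilon, epsilon] on the [0, 1] image scale; check the actual file before relying on this. The i-th noise is added to the i-th training image in pixel space and the result is clipped to [0, 255].

import numpy as np
import torch
from torchvision import datasets

# Assumption: one noise tensor per training image (CHW, [0, 1] scale).
noise = torch.load('path/to/your/experiment/folder/perturbation.pt')
noise = noise.mul(255).permute(0, 2, 3, 1).cpu().numpy()  # CHW -> HWC, pixel scale

train_set = datasets.CIFAR10(root='../datasets', train=True, download=True)
data = train_set.data.astype(np.float32)
for i in range(len(train_set)):
    data[i] = np.clip(data[i] + noise[i], 0, 255)
train_set.data = data.astype(np.uint8)  # images now carry sample-wise error-minimizing noise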

Class-wise noise for unlearnable example on CIFAR-10

Generate noise for unlearnable examples
python3 perturbation.py --config_path             configs/cifar10                \
                        --exp_name                path/to/your/experiment/folder \
                        --version                 resnet18                       \
                        --train_data_type         CIFAR-10                       \
                        --noise_shape             10 3 32 32                     \
                        --epsilon                 8                              \
                        --num_steps               1                              \
                        --step_size               0.8                            \
                        --attack_type             min-min                        \
                        --perturb_type            classwise                      \
                        --universal_train_target  'train_subset'                 \
                        --universal_stop_error    0.1                            \
                        --use_subset
Train on unlearnable examples and eval on clean test
python3 -u main.py    --version                 resnet18                       \
                      --exp_name                path/to/your/experiment/folder \
                      --config_path             configs/cifar10                \
                      --train_data_type         PoisonCIFAR10                  \
                      --poison_rate             1.0                            \
                      --perturb_type            classwise                      \
                      --perturb_tensor_filepath path/to/your/experiment/folder/perturbation.pt \
                      --train
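
The class-wise case differs only in how the noise is indexed: here perturbation.pt is assumed to hold one noise tensor per class (shape [10, 3, 32, 32]), and every image receives the noise of its label. A minimal sketch under the same assumptions as the sample-wise example above:

import numpy as np
import torch
from torchvision import datasets

class_noise = torch.load('path/to/your/experiment/folder/perturbation.pt')  # assumed shape [10, 3, 32, 32]
class_noise = class_noise.mul(255).permute(0, 2, 3, 1).cpu().numpy()  # CHW -> HWC, pixel scale

train_set = datasets.CIFAR10(root='../datasets', train=True, download=True)
data = train_set.data.astype(np.float32)
for i, label in enumerate(train_set.targets):
    data[i] = np.clip(data[i] + class_noise[label], 0, 255)  # same noise for every image of a class
train_set.data = data.astype(np.uint8)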

Cite Our Work

@inproceedings{huang2021unlearnable,
    title={Unlearnable Examples: Making Personal Data Unexploitable},
    author={Hanxun Huang and Xingjun Ma and Sarah Monazam Erfani and James Bailey and Yisen Wang},
    booktitle={ICLR},
    year={2021}
}
Comments
  • A problem with bi-level optimization in the article

    In the article, the authors say: "In order to find effective noise δ and unlearnable examples, the optimization steps for θ should be limited, compared to standard or adversarial training."

    1. What is the point of using bi-level optimization? In traditional adversarial attacks, most works focus on optimizing over the data rather than over the model.
    2. Why does limiting the number of optimization steps on the model help find effective noise, as described in the article?

    I'm new to studying adversarial examples and very interested in your brilliant work, so my questions may sound 'silly', but I'm really looking forward to your reply! Thanks!

    Here's my email [email protected]

    opened by butybone 6
  • Several questions about this article

    Hi, I'm new to studying adversarial examples, and I'd like to ask you a few questions. Q1: Is your scheme based on data poisoning?

    Q2: About formula (2), it is said: "Note that the above bi-level optimization has two components that optimize the same objective. In order to find effective noise δ and unlearnable examples, the optimization steps for θ should be limited, compared to standard or adversarial training. Specifically, we optimize δ over Dc after every M steps of optimization of θ." Why does optimizing δ over Dc after every M steps of optimization of θ help to find effective noise δ? Does this strategy only work when the two minimizations share the same objective?

    Q3: In Section 4.1, it is said: "However, in the sample-wise case, every sample has a different noise, and there is no explicit correlation between the noise and the label. In this case, only low-error samples can be ignored by the model, and normal and high-error examples have more positive impact on model learning than low-error examples. This makes error-minimizing noise more generic and effective in making data unlearnable." I understand that there is no explicit correlation between the noise and the label in the sample-wise case, but why does this make error-minimizing noise more generic and effective in making data unlearnable? What does it mean?

    Looking forward to your reply! Thanks!

    opened by yyyliaQ 4
  • KeyError: 'train_subset'

    Hi, thanks for sharing your codes! I was able to run perturbation.py and main.py in the Sample-wise noise for unlearnable example on CIFAR-10 section. However, when I try to run perturbation.py in the Class-wise noise for unlearnable example on CIFAR-10 section, it raises the following error:

    Traceback (most recent call last):
      File "perturbation.py", line 483, in <module>
        main()
      File "perturbation.py", line 469, in main
        noise = universal_perturbation(noise_generator, trainer, evaluator, model, criterion, optimizer, scheduler, random_noise, ENV)
      File "perturbation.py", line 191, in universal_perturbation
        for i, (images, labels) in tqdm(enumerate(data_loader[args.universal_train_target]), total=len(data_loader[args.universal_train_target])):
    KeyError: 'train_subset'
    

    Thus, I added the option --universal_train_target train_dataset to fix this error. Is this the right way to get the class-wise perturbation? BTW, there are two typos (--perturb_type samplewse => --perturb_type samplewise) in README.md.

    opened by hkunzhe 4
  • Why use custom models? Cannot reproduce with torchvision model

    Hi, thank you for sharing your code!

    Is there any specific reason you chose to build your own models rather than using the models provided by torchvision?

    I am trying to reproduce the results in the quickstart notebook with a clean, default ResNet-18 (torchvision.models.resnet18()), leaving all other code unchanged. It generates the error-minimizing noise normally, but at the training stage it produces accuracies far higher (50%) than reported in the paper and in the notebook (screenshot below). Also, when using your code to visualize the noise (the cell "Visualize Clean Images, Error-Minimizing Noise, Unlearnable Images"), it produces a black image in place of the noise.

    However, when using your provided ResNet-18 model, I can reproduce your notebook's results, but generating the noise is far slower than with the torchvision ResNet-18 (almost 2 hours on yours vs. 20 minutes on the torchvision model).

    Inspecting your ResNet code, I don't see any specific component that would purposefully slow it down. I did have to remove import mlconfig in your model's __init__ and the associated references to it, because it does not seem to be part of your package and I was getting an import error otherwise.

    Here is the training accuracy (on the unlearnable train dataset) and test accuracy (on the clean test dataset) for a torchvision ResNet-18.

    opened by ajsanjoaquin 3
  • Questions about training casia-webface dataset

    I'm using your InceptionResnet.yaml to train on clean CASIA-WebFace, but got 0% accuracy, and 50%~55% accuracy when using only 50 classes of the same CASIA-WebFace dataset. Are these results reasonable?

    opened by butybone 1
  • Two problems in training code of ImageNetMini

    1. In line 173 of main.py, 20% of PoisonImageNetMini is selected as the training data, but with "shuffle=True". How can you make sure these are exactly the same 20% of the data that were used for generating the noise?

    2. I also found the same problem with the sample-wise noise as the one mentioned in https://github.com/HanxunH/Unlearnable-Examples/issues/5. In line 634 of dataset.py:

    if self.poison_samples[index]:
        noise = self.perturb_tensor[target]
        sample = sample + noise
        sample = np.clip(sample, 0, 255)
    

    So by using 'target' as the index, perturb_tensor[target] only ever selects one of the first 0~99 entries of perturb_tensor and adds the same noise to all samples from the same [target] class. In this way, it is doing the wrong thing but can still lead to good results, because it is effectively applying class-wise noise.

    opened by ZhengyuZhao 1
  • Mismatch of the training data augmentation between QuickStart.ipynb and main.py

    Thanks for the interesting work and the detailed code.

    1. I may have noticed a mismatch in the training data augmentation between QuickStart.ipynb and main.py. Let's denote the clean image as x, the perturbation as noise, and the poisoned image as x'.

    In QuickStart.ipynb,

    unlearnable_train_dataset = datasets.CIFAR10(root='../datasets', train=True, download=True, transform=train_transform)
    perturb_noise = noise.mul(255).clamp(0, 255).permute(0, 2, 3, 1).to('cpu').numpy()
    unlearnable_train_dataset.data = unlearnable_train_dataset.data.astype(np.float32)
    for i in range(len(unlearnable_train_dataset)):
        unlearnable_train_dataset.data[i] += perturb_noise[i]
        unlearnable_train_dataset.data[i] = np.clip(unlearnable_train_dataset.data[i], a_min=0, a_max=255)

    This means that x' = train_transform(x) + noise.

    However, in main.py, the input to the training process is PoisonCIFAR10, which is defined in line 376 of dataset.py. There, if I understand correctly, PoisonCIFAR10 is constructed by adding the perturbation.pt noise to CIFAR10 before the transform, which instead gives x' = train_transform(x + noise).

    Could you please confirm if my understanding is correct? If so, which version have you used for generating the results in your paper?

    2. By the way, I also notice that no train_transform is used at all when generating the perturbations via perturbation.py. Could you please explain why that is the case?
    opened by ZhengyuZhao 1
  • A problem when training model on ImageNetMini

    Thanks for releasing the code. I found that you are using cls_id to choose the sample-wise noise from perturb_tensor. Should I change cls_id to the data index of the dataloader to choose the sample-wise noise from perturb_tensor?

    opened by czb2133 1
  • Some questions about training Inception-ResNet

    Hi, thanks for open-sourcing the code. I am very interested in your paper and have some questions. When you use the WebFace dataset to train the Inception-ResNet network, how are the training set and test set divided?

    opened by jimo17 11