RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

Overview

RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

Website: https://robust.art

Paper: https://openreview.net/forum?id=wu1qmnC32fB

Document: https://robust.art/api

Leaderboard: http://robust.art/results

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial noises, which motivates the benchmark of model robustness. Existing benchmarks mainly focus on evaluating the defenses, but there are no comprehensive studies on how architecture design and general training techniques affect robustness. Comprehensively benchmarking their relationships will be highly beneficial for better understanding and developing robust DNNs. Therefore, we propose RobustART, the first comprehensive Robustness investigation benchmark on ImageNet (including open-source toolkit, pre-trained model zoo, datasets, and analyses) regarding ARchitecture design (44 human-designed off-the-shelf architectures and 1200+ networks from neural architecture search) and Training techniques (10+ general techniques, e.g., data augmentation) towards diverse noises (adversarial, natural, and system noises). Extensive experiments revealed and substantiated several insights for the first time, for example: (1) adversarial training largely improves the clean accuracy and all types of robustness for Transformers and MLP-Mixers; (2) with comparable sizes, CNNs > Transformers > MLP-Mixers on robustness against natural and system noises; Transformers > MLP-Mixers > CNNs on adversarial robustness; for some light-weight architectures (e.g., EfficientNet, MobileNetV2, and Mo- bileNetV3), increasing model sizes or using extra training data reduces robustness. Our benchmark http://robust.art/: (1) presents an open-source platform for conducting comprehensive evaluation on different robustness types; (2) provides a variety of pre-trained models that can be utilized for downstream applications; (3) proposes a new perspective to better understand the mechanism of DNNs towards designing robust architectures, backed up by comprehensive analysis. We will continuously contribute to build this open eco-system for the community.

Installation

You use conda to create a virtual environment to run this project.

git clone --recurse-submodules https://github.com/DIG-Beihang/RobustART.git
cd robustART
conda create --name RobustART python=3.6.9
conda activate RobustART
pip install -r requirements.txt

After this, you should installl pytorch and torchvision package which meet your GPU and CUDA version according to https://pytorch.org

Quick Start

Common Setting

If you want to use this project to train or evaluate model(s), you can choose to create a work directory for saving config, checkpoints, scripts etc.

We have put some example for trainging or evlaluate. You can use it as follows

cd exprs/exp/imagenet-a_o-loop
bash run.sh

Add Noise

You can use the AddNoise's add_noise function to add multiple noise for one image or a batch of images The supported noise list is: ['imagenet-s', 'imagenet-c', 'pgd_linf', 'pgd_l2', 'fgsm', 'autoattack_linf', 'mim_linf', 'pgd_l1']

Example of adding ImageNet-C noise for image

from RobustART.noise import AddNoise
NoiseClass = AddNoise(noise_type='imagenet-c')
# set the config of one kind of noise
NoiseClass.set_config(corruption_name='gaussian_noise')
image_addnoise = NoiseClass.add_noise(image='test_input.jpeg')

Training Pipeline

We provided cls_solver solver to train a model with a specific config

Example of using base config to train a resnet50

cd exprs/robust_baseline_exp/resnet/resnet50
#Change the python path to the root path
PYTHONPATH=$PYTHONPATH:../../../../
srun -n8 --gpu "python -u -m RobustART.training.cls_solver --config config.yaml"

Evaluation Pipeline

We evaluate model(s) of different dataset, we provides several solver to evaluate the model on one or some specific dataset(s)

Example of evaluation on ImageNet-A and ImageNet-O dataset

cd exprs/exp/imagenet-a_0-loop
#Change the python path to the root path
PYTHONPATH=$PYTHONPATH:../../../
srun -n8 --gpu "python -u -m RobustART.training.cls_solver --config config.yaml"

Metrics

We provided metrics APIs, so that you can use these APIs to evaluate results for ImageNet-A,O,P,C,S and Adv noise.

from RobustART.metrics import ImageNetAEvaluator
metric = ImageNetAEvaluator()
metric.eval(res_file)

Citation

@article{tang2021robustart,
title={RobustART: Benchmarking Robustness on Architecture Design and Training Techniques},
author={Shiyu Tang and Ruihao Gong and Yan Wang and Aishan Liu and Jiakai Wang and Xinyun Chen and Fengwei Yu and Xianglong Liu and Dawn Song and Alan Yuille and Philip H.S. Torr and Dacheng Tao},
journal={https://openreview.net/forum?id=wu1qmnC32fB},
year={2021}}
You might also like...
 Molecular Sets (MOSES): A benchmarking platform for molecular generation models
Molecular Sets (MOSES): A benchmarking platform for molecular generation models

Molecular Sets (MOSES): A benchmarking platform for molecular generation models Deep generative models are rapidly becoming popular for the discovery

Official codebase for "B-Pref: Benchmarking Preference-BasedReinforcement Learning" contains scripts to reproduce experiments.

B-Pref Official codebase for B-Pref: Benchmarking Preference-BasedReinforcement Learning contains scripts to reproduce experiments. Install conda env

Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

Molecular Sets (MOSES): A benchmarking platform for molecular generation models Deep generative models are rapidly becoming popular for the discovery

ColossalAI-Benchmark - Performance benchmarking with ColossalAI

Benchmark for Tuning Accuracy and Efficiency Overview The benchmark includes our

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech Keon Lee, Ky

This is the official repository for evaluation on the NoW Benchmark Dataset. The goal of the NoW benchmark is to introduce a standard evaluation metric to measure the accuracy and robustness of 3D face reconstruction methods from a single image under variations in viewing angle, lighting, and common occlusions.
A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading

A tour through tensorflow with financial data I present several models ranging in complexity from simple regression to LSTM and policy networks. The s

Image Classification - A research on image classification and auto insurance claim prediction, a systematic experiments on modeling techniques and approaches

A research on image classification and auto insurance claim prediction, a systematic experiments on modeling techniques and approaches

Official repository for Jia, Raghunathan, Göksel, and Liang, "Certified Robustness to Adversarial Word Substitutions" (EMNLP 2019)

Certified Robustness to Adversarial Word Substitutions This is the official GitHub repository for the following paper: Certified Robustness to Adversa

Comments
  • imagenet_s数据集生成问题

    imagenet_s数据集生成问题

    您好,请问关于imagenet_s噪声数据集怎么生成的呢,在imagenet_s_gen文件中以下两行应该如何配置呢? server_list_config_file = "/mnt/lustre/share/memcached_client/server_list.conf" client_config_file = "/mnt/lustre/share/memcached_client/client.conf"

    opened by Jialiang14 1
  • Questions about quick start

    Questions about quick start

    1、无Slurm的Single/Distributed GPU Train/Test如何设置?Debug提示prototype/prototype/utils/dist.py中KeyError:'SLURM_PROCID'? 2、训练和测试数据集路径示例代码config.yaml配置到/mnt/lustre/share下,数据如何准备,下载方式?有没有国内网盘下载链接?

    opened by magic-liu2021 0
  • PGD对抗训练读取数据使用的ImageNetTrainPipeV2问题

    PGD对抗训练读取数据使用的ImageNetTrainPipeV2问题

       在进行PGD对抗训练时,需要读取数据,而这用到了ImageNetTrainPipeV2,我无法成功运行prototype/data/pipelines/imagenet_pipeline_v2.py,
       在以下代码中:
       import nvidia.dali.ops as ops
       self.mc_input = ops.McReader(file_root=data_root,
                                                           file_list=data_list,
                                                           sampler_index=list(sampler))        
       在nvidia.dali.ops中没有McReader方法,这部分代码是如何实现的?        
       感谢您的帮助。
    
    opened by magic-liu2021 2
Owner
null
Code repository accompanying the paper "On Adversarial Robustness: A Neural Architecture Search perspective"

On Adversarial Robustness: A Neural Architecture Search perspective Preparation: Clone the repository: https://github.com/tdchaitanya/nas-robustness.g

Chaitanya Devaguptapu 4 Nov 10, 2022
code for paper "Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?"

Does Unsupervised Architecture Representation Learning Help Neural Architecture Search? Code for paper: Does Unsupervised Architecture Representation

null 39 Dec 17, 2022
Revisiting, benchmarking, and refining Heterogeneous Graph Neural Networks.

Heterogeneous Graph Benchmark Revisiting, benchmarking, and refining Heterogeneous Graph Neural Networks. Roadmap We organize our repo by task, and on

THUDM 176 Dec 17, 2022
FedScale: Benchmarking Model and System Performance of Federated Learning

FedScale: Benchmarking Model and System Performance of Federated Learning (Paper) This repository contains scripts and instructions of building FedSca

null 268 Jan 1, 2023
Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking We revisit and address issues with Oxford 5k and Paris 6k image retrieval benchm

Filip Radenovic 188 Dec 17, 2022
Code for the paper "Benchmarking and Analyzing Point Cloud Classification under Corruptions"

ModelNet-C Code for the paper "Benchmarking and Analyzing Point Cloud Classification under Corruptions". For the latest updates, see: sites.google.com

Jiawei Ren 45 Dec 28, 2022
Evaluation and Benchmarking of Speech Super-resolution Methods

Speech Super-resolution Evaluation and Benchmarking What this repo do: A toolbox for the evaluation of speech super-resolution algorithms. Unify the e

Haohe Liu (刘濠赫) 84 Dec 20, 2022
Pip-package for trajectory benchmarking from "Be your own Benchmark: No-Reference Trajectory Metric on Registered Point Clouds", ECMR'21

Map Metrics for Trajectory Quality Map metrics toolkit provides a set of metrics to quantitatively evaluate trajectory quality via estimating consiste

Mobile Robotics Lab. at Skoltech 31 Oct 28, 2022
PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

Don’t be Contradicted with Anything!CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System This repository contains the PyTorch im

Libo Qin 25 Sep 6, 2022
PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

Libo Qin 12 Sep 26, 2021