Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective [PDF]

Wuyang Chen, Xinyu Gong, Zhangyang Wang

In ICLR 2021.

Overview

We present TE-NAS, the first published training-free neural architecture search method with extremely fast search speed (no gradient descent at all!) and high-quality performance.

Highlights:

  • Training-free and label-free NAS: we achieve extremely fast neural architecture search without a single step of gradient descent or any labels.
  • Bridging the theory-application gap: we identify two training-free indicators to rank the quality of deep networks: the condition number of their NTKs and the number of linear regions in their input space (a minimal sketch follows this list).
  • SOTA: TE-NAS achieves an extremely fast search (on one 1080Ti: 20 minutes on the NAS-Bench-201 space, four GPU hours on the DARTS space for ImageNet) while maintaining competitive accuracy.
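
A minimal, self-contained sketch of the two indicators on a toy network (this is not the repository's implementation; the network, batch sizes, and variable names are illustrative only):

import torch

# Toy stand-in for a candidate architecture; TE-NAS scores real cells from the
# NAS-Bench-201 / DARTS spaces, so the sizes here are illustrative only.
net = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))

# Indicator 1: condition number of the empirical NTK at initialization.
x = torch.randn(8, 16)                                # small random batch, no labels needed
rows = []
for i in range(x.shape[0]):
    net.zero_grad()
    net(x[i:i + 1]).sum().backward()
    rows.append(torch.cat([p.grad.flatten() for p in net.parameters()]))
J = torch.stack(rows)                                 # (batch, num_params) Jacobian
eigvals = torch.linalg.eigvalsh(J @ J.t())            # eigenvalues of the empirical NTK
cond_ntk = (eigvals[-1] / eigvals[0]).item()          # lower suggests better trainability

# Indicator 2: number of linear regions, counted as distinct ReLU activation patterns.
samples = torch.randn(1000, 16)
patterns = (net[0](samples) > 0).int()                # sign pattern at the ReLU layer
num_regions = torch.unique(patterns, dim=0).shape[0]  # higher suggests better expressivity

print(cond_ntk, num_regions)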

Prerequisites

  • Ubuntu 16.04
  • Python 3.6.9
  • CUDA 10.1 (lower versions may work but were not tested)
  • NVIDIA GPU + CuDNN v7.3

This repository has been tested on GTX 1080Ti. Configurations may need to be changed on different platforms.

Installation

  • Clone this repo:
git clone https://github.com/chenwydj/TENAS.git
cd TENAS
  • Install dependencies:
pip install -r requirements.txt

Usage

0. Prepare the dataset

  • Please follow the guideline here to prepare the CIFAR-10/100 and ImageNet datasets, as well as the NAS-Bench-201 database.
  • Remember to properly set TORCH_HOME and data_paths in prune_launch.py (a sketch of this step follows below).
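
What this step looks like in practice, as a hedged illustration only (the exact variable names and structure in prune_launch.py may differ; the paths are placeholders):

import os

# Placeholder paths; point these at your own copies of the datasets and at the
# directory holding the NAS-Bench-201 benchmark file (looked up via TORCH_HOME).
os.environ['TORCH_HOME'] = '/path/to/torch_home'

data_paths = {
    'cifar10':        '/path/to/cifar10',
    'cifar100':       '/path/to/cifar100',
    'ImageNet16-120': '/path/to/ImageNet16-120',
    'imagenet-1k':    '/path/to/imagenet-1k',
}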

1. Search

NAS-Bench-201 Space

python prune_launch.py --space nas-bench-201 --dataset cifar10 --gpu 0
python prune_launch.py --space nas-bench-201 --dataset cifar100 --gpu 0
python prune_launch.py --space nas-bench-201 --dataset ImageNet16-120 --gpu 0

DARTS Space (NASNET)

python prune_launch.py --space darts --dataset cifar10 --gpu 0
python prune_launch.py --space darts --dataset imagenet-1k --gpu 0

2. Evaluation

  • For architectures searched on nas-bench-201, the accuracies are available immediately at the end of the search (from the console output).
  • For architectures searched on darts, please use DARTS_evaluation to train the searched architecture from scratch and evaluate it.

Citation

@inproceedings{chen2020tenas,
  title={Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective},
  author={Chen, Wuyang and Gong, Xinyu and Wang, Zhangyang},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

Acknowledgement

Comments
  • Calculating number of linear regions

    Dear authors,

    I have a question about calculating the number of linear regions. It seems that in TE-NAS, the inputs are set to size (1000, 1, 3, 3):

    lrc_model = Linear_Region_Collector(input_size=(1000, 1, 3, 3), sample_batch=3, dataset=xargs.dataset, data_path=xargs.data_path, seed=xargs.rand_seed)

    Could you explain what the reason is behind this?

    opened by YiteWang 6
  • NTK calculation incorrect for networks with multiple outputs?

    Howdy!

    In: https://github.com/VITA-Group/TENAS/blob/main/lib/procedures/ntk.py

    on line 45:

    logit[_idx:_idx+1].backward(torch.ones_like(logit[_idx:_idx+1]), retain_graph=True)
    

    I am confused about your calculation of the NTK, and believe that you may be misusing the first argument of the torch.Tensor.backward() function.

    E.g.: when playing with the codebase using a very small 8-parameter network with 2 outputs:

    class small(torch.nn.Module):
        def __init__(self,):
            super(small, self).__init__() 
            self.d1 = torch.nn.Linear(2,2,bias=False)
            self.d2 = torch.nn.Linear(2,2,bias=False)
        def forward(self, x):
            x = self.d1(x)
            x = self.d2(x)
            return x
    

    For this explanation I have modified that line to:

    gradient = torch.ones_like(logit[_idx:_idx+1])
    gradient[0,0] = a
    gradient[0,1] = b
    logit[_idx:_idx+1].backward(gradient, retain_graph=True)
    

    where by J I mean your 'grads' list for a single network:

    e.g.: lines 45 & 46:

    grads = [torch.stack(_grads, 0) for _grads in grads]
    ntks = [torch.einsum('nc,mc->nm', [_grads, _grads]) for _grads in grads]
    print('J: ',grads)
    

    for

    gradient[0,0] = 0
    gradient[0,1] = 1
    

    J: [tensor([[-0.6255, -0.5019, 0.1758, 0.1411, 0.0000, 0.0000, -0.0727, -0.4643], [ 0.9368, -0.0947, -0.2633, 0.0266, 0.0000, 0.0000, 0.0955, -0.0812]])]

    =======

    for

    gradient[0,0] = 1
    gradient[0,1] = 0
    

    J: [tensor([[ 0.1540, 0.1236, -0.6473, -0.5194, -0.0727, -0.4643, 0.0000, 0.0000], [-0.2307, 0.0233, 0.9694, -0.0980, 0.0955, -0.0812, 0.0000, 0.0000]])]

    =======

    for

    gradient[0,0] = 1
    gradient[0,1] = 1
    

    J: [tensor([[-0.4715, -0.3783, -0.4715, -0.3783, -0.0727, -0.4643, -0.0727, -0.4643], [ 0.7061, -0.0714, 0.7062, -0.0714, 0.0955, -0.0812, 0.0955, -0.0812]])]

    """

    And so you can verify that your code is adding the two components together to get the last result.

    The problem is that your Jacobian should have size number_samples x (number_outputs x number_weights); see your own paper, page 2, where you show that the Jacobian's components are defined over the subscript i, the i-th output of the model.

    If I am right, then any network with multiple outputs would have its NTK values incorrectly calculated, and would also have its time and memory footprint systematically reduced, because these gradients are being pooled together.
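
    For concreteness, here is a minimal sketch (independent of the repository's ntk.py; the tiny network and sizes are illustrative) of the per-output computation described above, with one backward pass per (sample, output) pair so the Jacobian has shape (num_samples * num_outputs) x num_weights:

    import torch

    # Tiny 2-output network, mirroring the example above.
    net = torch.nn.Sequential(torch.nn.Linear(2, 2, bias=False), torch.nn.Linear(2, 2, bias=False))
    x = torch.randn(4, 2)                    # 4 samples, 2 input features
    logits = net(x)                          # shape (4, 2): 4 samples, 2 outputs

    rows = []
    for n in range(logits.shape[0]):         # one backward pass per (sample, output) pair
        for c in range(logits.shape[1]):
            net.zero_grad()
            logits[n, c].backward(retain_graph=True)
            rows.append(torch.cat([p.grad.flatten() for p in net.parameters()]))

    J = torch.stack(rows)                    # (4 * 2, 8) = (num_samples * num_outputs, num_weights)
    ntk = torch.einsum('nc,mc->nm', J, J)    # empirical NTK over sample-output pairs
    print(J.shape, ntk.shape)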

    opened by awe2 5
  • Question about training time?

    I'm training on the CIFAR-10 dataset on a machine with one GPU (2080Ti).

    According to the paper,

    TE-NAS achieves a test error of 2.63%, ranking among the top of recent NAS results, but meanwhile largely reduces the search cost to only 30 minutes.

    However, it takes me about 1 hour to execute 80% of the process.

    The command I use, following the instructions, is python prune_launch.py --space darts --dataset cifar10 --gpu 1. Am I missing anything during the training?

    opened by mru4913 3
  • Running NTK/LRC multiple times gives very inconsistent results

    Hi there,

    I am trying to reproduce the NTK and LRC functions in your code. When I run the NTK computation for 3 (or 5) repeated runs with the same settings and input model, I get vastly different results, e.g.:

    ntk_original, ntk
    896.1322631835938 828.4542236328125
    1274.0692138671875 1108.636962890625
    890.8836059570312 1008.2345581054688
    

    I would love to get a better sense of what the NTK actually does and how we can get consistent results.

    Also, do we need to initialize with Kaiming? What is the point of this initialization, and is there an alternative (e.g. Xavier, zero, none)?
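
    For reference, a generic sketch of how Kaiming (or, alternatively, Xavier) initialization is usually applied module by module in PyTorch; whether TE-NAS strictly requires it is exactly the question above, and the helper name here is made up:

    import torch.nn as nn

    def init_weights(m, scheme='kaiming'):
        # Re-initialize conv/linear layers; 'scheme' picks Kaiming or Xavier.
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            if scheme == 'kaiming':
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            else:
                nn.init.xavier_normal_(m.weight)
            if m.bias is not None:
                nn.init.zeros_(m.bias)

    # usage: model.apply(init_weights)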

    opened by Priahi 2
  • For the DARTS search space with the ImageNet-1k dataset, the time reported in the paper (4 hours) is not even close when replicated.

    For the DARTS search space with the ImageNet-1k dataset, the time reported in the paper (4 hours) is not even close when replicated on a Tesla V100 (it should be lower than 4 hours, as the V100 is much faster than a 1080 Ti).

    It takes around 10 hours with a batch_size of 24 (as set in the code).

    What do you think might be the issue?

    opened by oshindutta 1
  • Linear_Region_Collector

    Dear authors,

    We're using the Linear_Region_Collector class to count the number of linear regions in our network. However, it always returns the same result (batch_size * sample_batch). Could you please give us some information on the proper way to use it?

    opened by Yasaman-Haghighi 1
  • Where to find architectures searched for ImageNet

    Nice work! I really enjoy it! I am wondering whether you have any plans to release the architectures searched on ImageNet in your paper. Thanks in advance!

    opened by MingLin-home 1
  • How to eval the Darts?

    I followed the instructions and successfully obtained "arch_parameter.npy". It took me around 2 hours to search for a network on the CIFAR-10 dataset. How do I reuse your code to evaluate the performance of the network?

    opened by mru4913 1