Overview

AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling

This repository contains our PyTorch training code, evaluation code and pretrained models for AttentiveNAS.

[Update 06/21] Recently, we have improved AttentiveNAS with an adaptive knowledge distillation training strategy; see our AlphaNet repo for more details. AlphaNet has been accepted to ICML'21.

[Update 07/21] We provide example code for searching for the best models under FLOPs vs. accuracy trade-offs here.

For more details, please see AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling by Dilin Wang, Meng Li, Chengyue Gong and Vikas Chandra.

If you find this repo useful in your research, please consider citing our work:

@article{wang2020attentivenas,
  title={AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling},
  author={Wang, Dilin and Li, Meng and Gong, Chengyue and Chandra, Vikas},
  journal={arXiv preprint arXiv:2011.09011},
  year={2020}
}

Evaluation

To reproduce our results:

  • Please first download our pretrained AttentiveNAS models from Google Drive and put them under your local folder ./attentive_nas_data.

  • To evaluate our pretrained AttentiveNAS models (AttentiveNAS-A0 through A6) on ImageNet with a single GPU, please run the command below; a concrete single-model example follows the results table:

    python test_attentive_nas.py --config-file ./configs/eval_attentive_nas_models.yml --model a[0-6]

    Expected results:

    Name             MFLOPs   Top-1 (%)
    AttentiveNAS-A0     203        77.3
    AttentiveNAS-A1     279        78.4
    AttentiveNAS-A2     317        78.8
    AttentiveNAS-A3     357        79.1
    AttentiveNAS-A4     444        79.8
    AttentiveNAS-A5     491        80.1
    AttentiveNAS-A6     709        80.7
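
    For example, to evaluate AttentiveNAS-A0 specifically, the model flag resolves to:

    python test_attentive_nas.py --config-file ./configs/eval_attentive_nas_models.yml --model a0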

Training

To train our AttentiveNAS models from scratch, please run:

python train_attentive_nas.py --config-file configs/train_attentive_nas_models.yml --machine-rank ${machine_rank} --num-machines ${num_machines} --dist-url ${dist_url}

We adopt SGD training on 64 GPUs. The mini-batch size is 32 per GPU; all training hyper-parameters are specified in train_attentive_nas_models.yml.
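
For a quick single-node smoke test, the distributed-launch placeholders might be filled in as follows. These are example values only; the dist_url port and rank values depend on your environment, and the hyper-parameters in the config assume the 64-GPU setup described above:

python train_attentive_nas.py --config-file configs/train_attentive_nas_models.yml --machine-rank 0 --num-machines 1 --dist-url tcp://127.0.0.1:10001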

Additional data

  • A (sub-network config, FLOPs) lookup table, which can be used to construct the architecture sampling distribution under FLOPs constraints.
  • An accuracy predictor trained via scikit-learn, which takes a sub-network configuration as input and outputs its predicted accuracy on ImageNet; a loading/query sketch follows this list.
    • To convert a sub-network configuration into inputs compatible with our accuracy predictor:
        import numpy as np

        # Flatten the configuration into a single feature vector:
        # [resolution, widths..., depths..., kernel sizes..., expand ratios...]
        res = [cfg['resolution']]
        for k in ['width', 'depth', 'kernel_size', 'expand_ratio']:
            res += cfg[k]
        inputs = np.asarray(res).reshape((1, -1))
    
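As a usage illustration, below is a minimal sketch of loading the accuracy predictor and querying it for one sub-network. It assumes the predictor was pickled as a standard scikit-learn regressor; the file name acc_predictor.pkl and the configuration values are hypothetical placeholders, not values shipped with this repo.

    import pickle

    import numpy as np

    # Hypothetical path; point this at the downloaded predictor file.
    with open('./attentive_nas_data/acc_predictor.pkl', 'rb') as f:
        predictor = pickle.load(f)

    # Illustrative sub-network configuration; in practice, use a configuration
    # drawn from the AttentiveNAS search space (e.g., one listed in the FLOPs lookup table).
    cfg = {
        'resolution': 224,
        'width': [16, 16, 24, 32, 64, 112, 192, 216, 1792],
        'depth': [1, 3, 3, 4, 5, 5, 1],
        'kernel_size': [3, 5, 5, 5, 3, 3, 3],
        'expand_ratio': [1, 4, 4, 4, 4, 6, 6],
    }

    # Same flattening as above: resolution, then width/depth/kernel_size/expand_ratio.
    res = [cfg['resolution']]
    for k in ['width', 'depth', 'kernel_size', 'expand_ratio']:
        res += cfg[k]
    inputs = np.asarray(res).reshape((1, -1))

    predicted_top1 = predictor.predict(inputs)
    print('predicted ImageNet top-1: %.2f' % predicted_top1[0])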

License

The majority of AttentiveNAS is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: Once For All is licensed under the Apache 2.0 license.

Contributing

We actively welcome your pull requests! Please see CONTRIBUTING and CODE_OF_CONDUCT for more info.

Comments
  • Acc predictor

    The README shows how to convert a sub-network configuration into inputs compatible with the accuracy predictor, as you provide:

        res = [cfg['resolution']]
        for k in ['width', 'depth', 'kernel_size', 'expand_ratio']:
            res += cfg[k]
        input = np.asarray(res).reshape((1, -1))

    Does the order ['resolution', 'width', 'depth', 'kernel_size', 'expand_ratio'] matter?

    opened by Tongzhou0101 3
  • Accuracy Predictor

    Hi, thanks for the great work! I have a question about the usage of the accuracy predictor.

    Specifically, a predictor is used to get the accuracy of sub-networks and rank them during training, as described in your paper. But in the code, I couldn't find where the predictor is used; for example here (https://github.com/facebookresearch/AttentiveNAS/blob/88ad92f82dc343a0e7d681f1fb9a8deeb45be928/train_attentive_nas.py#L291), criterion(model(input)) is used to get the predicted accuracy instead.

    I am a little confused about this part, is there any important code I missed or any statement I misunderstood? Looking forward to your reply : )

    opened by minghaoBD 2
  • The actual way to get best/worst pareto models

    It seems that the accuracy predictor is not fed into the sampler. Instead of using the accuracy predictor, the code obtains the Pareto models by ranking the computed losses of k models after a forward pass. I am confused because the way the best/worst Pareto models are obtained during training differs from the details in the paper. Am I misunderstanding the paper or missing some details of the code?

    opened by RachelXu7 1
  • The supernet appears to be reinitialized during the training process

    The supernet appears to be reinitialized during the training process. I met this issue when running AlphaNet. The log is as follows:

    Example 1:
    [10/09 16:00:53]: Epoch: [4][ 50/312] Time 2.075 ( 2.485) Data 0.000 ( 0.273) Loss 4.9844e+00 (4.9407e+00) Acc@1 17.43 ( 16.29) Acc@5 37.01 ( 35.80)
    [10/09 16:01:15]: Epoch: [4][ 60/312] Time 2.258 ( 2.431) Data 0.000 ( 0.228) Loss 4.9118e+00 (4.9424e+00) Acc@1 15.50 ( 16.19) Acc@5 34.94 ( 35.68)
    [10/09 16:01:37]: Epoch: [4][ 70/312] Time 2.368 ( 2.400) Data 0.000 ( 0.196) Loss 6.8941e+00 (5.1301e+00) Acc@1 0.10 ( 14.50) Acc@5 0.81 ( 32.05)
    [10/09 16:01:59]: Epoch: [4][ 80/312] Time 1.940 ( 2.374) Data 0.000 ( 0.172) Loss 6.8695e+00 (5.3466e+00) Acc@1 0.10 ( 12.73) Acc@5 0.76 ( 28.20)

    Example 2:
    [10/11 08:46:30]: Epoch: [169][170/312] Time 2.279 ( 2.272) Data 0.000 ( 0.082) Loss 3.7633e+00 (3.6145e+00) Acc@1 41.94 ( 43.52) Acc@5 64.28 ( 67.07)
    [10/11 08:46:53]: Epoch: [169][180/312] Time 2.159 ( 2.270) Data 0.000 ( 0.077) Loss 3.7879e+00 (3.6247e+00) Acc@1 39.58 ( 43.30) Acc@5 63.65 ( 66.86)
    [10/11 08:47:15]: Epoch: [169][190/312] Time 2.206 ( 2.266) Data 0.000 ( 0.073) Loss 6.7652e+00 (3.6773e+00) Acc@1 0.22 ( 42.50) Acc@5 0.68 ( 65.76)
    [10/11 08:47:37]: Epoch: [169][200/312] Time 2.339 ( 2.262) Data 0.000 ( 0.069) Loss 6.8340e+00 (3.8188e+00) Acc@1 0.07 ( 40.39) Acc@5 0.44 ( 62.51)

    After re-initialization, the supernet gradually fits again if training continues. Is it because of the sandwich rule?

    opened by liwei9719 1
  • Code for evolutionary search on the ImageNet validation set

    It seems that the code for evolutionary search is not included in this repo; will you open-source it?

    The code for generating the look-up table is also not included; it would be useful to have that as well. :)

    opened by chenbohua3 1
  • Question about ImageNet dataset

    Could you please clarify which ImageNet dataset you used for training and testing? Is it the original ILSVRC2012 from https://image-net.org/challenges/LSVRC/2012/2012-downloads.php or from somewhere else? Did you use only the train and validation splits, or the test split as well? How many images were in the train and validation sets?

    Also, did you apply any preprocessing, e.g. resizing, or did you use the raw images?

    opened by marianpetruk 1
  • Pretrained AttentiveNAS models I downloaded are corrupt

    The pretrained AttentiveNAS models I downloaded from https://drive.google.com/file/d/1cCla-OQNIAn-rjsY2b832DuP59ZKr8uh/view?usp=sharing appear to be corrupt. I don't know if I'm doing something wrong. Thanks.

    opened by yangyang90 2