Overview

SMASH: One-Shot Model Architecture Search through HyperNetworks

An experimental technique for efficiently exploring neural architectures.

This repository contains code for the SMASH paper and video.

SMASH bypasses the need for fully training candidate models by learning an auxiliary HyperNet to approximate model weights, allowing for rapid comparison of a wide range of network architectures at the cost of a single training run.
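As a rough illustration of the idea (a standalone sketch with simplified, made-up names, not the actual model code in this repository), a HyperNet maps an architecture encoding to the weights of a candidate layer, so different architectures can be scored without training each one:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyHyperNet(nn.Module):
    """Maps a binary architecture encoding to the weights of a 3x3 conv (simplified)."""
    def __init__(self, enc_dim, out_ch, in_ch):
        super().__init__()
        self.out_ch, self.in_ch = out_ch, in_ch
        # A small MLP generates the flattened conv kernel from the encoding.
        self.generator = nn.Sequential(
            nn.Linear(enc_dim, 128), nn.ReLU(),
            nn.Linear(128, out_ch * in_ch * 3 * 3),
        )

    def forward(self, arch_encoding):
        return self.generator(arch_encoding).view(self.out_ch, self.in_ch, 3, 3)

hypernet = ToyHyperNet(enc_dim=16, out_ch=8, in_ch=3)
arch = torch.bernoulli(torch.full((16,), 0.5))  # random binary architecture descriptor
weights = hypernet(arch)                        # HyperNet-generated conv weights
x = torch.randn(4, 3, 32, 32)
y = F.conv2d(x, weights, padding=1)             # evaluate the candidate with generated weights
print(y.shape)                                  # torch.Size([4, 8, 32, 32])

In SMASH proper, the ranking of architectures under these generated weights is used as a cheap proxy for how they would rank if each were trained normally.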

Installation

To run this script, you will need PyTorch and a CUDA-capable GPU. If you wish to run it on CPU, just remove all the .cuda() calls.
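Alternatively, on recent PyTorch versions a device-agnostic pattern like the following sketch (not code from this repository) avoids hand-editing the .cuda() calls:

import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(10, 2).to(device)         # stand-in for the SMASH network
inputs = torch.randn(4, 10, device=device)
outputs = model(inputs)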

Note that this code was written in PyTorch 0.12, and is not guaranteed to work on 0.2 until next week when I get a chance to update my own version. Please also be aware that, while thoroughly commented, this is research code for a heckishly complex project. I'll be doing more cleanup work to improve legibility soon.

Running

To run with default parameters, simply call

python train.py

This will by default train a SMASH net with nominally the same parametric budget as a WRN-40-4. Note that validation scores during training are computed with a different randomly sampled architecture for each batch, and are therefore effectively an "average" measure across architectures rather than the score of any single model.
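To make that concrete, here is a self-contained toy sketch (hypothetical names and numbers, not the repository's validation loop) of scoring each batch with a freshly sampled architecture:

import random

def sample_architecture(n_ops=16):
    # Toy stand-in: an architecture is just a bit vector of active ops.
    return [random.randint(0, 1) for _ in range(n_ops)]

def batch_error(arch):
    # Pretend architectures with more active ops do slightly better, plus batch noise.
    return max(0.0, 0.5 - 0.02 * sum(arch) + random.gauss(0, 0.02))

errors = [batch_error(sample_architecture()) for _ in range(100)]  # 100 validation batches
print(sum(errors) / len(errors))  # an "average" over sampled architectures, not one model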

After training, to sample and evaluate SMASH scores, call

python eval.py --SMASH=YOUR_MODEL_NAME_HERE_.pth

This will by default sample 500 random architectures, then perturb the best-found architecture 100 times, and then run a simple Markov-chain-style search to perturb the best-found architecture further.
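The overall strategy is random search followed by local perturbation. A minimal standalone sketch (with a toy scoring function standing in for evaluation under HyperNet-generated weights; the real logic lives in eval.py) looks like this:

import random

def score(arch):
    # Toy stand-in for evaluating an architecture with HyperNet-generated weights.
    return sum(arch) + random.gauss(0, 0.1)

def perturb(arch):
    # Flip one randomly chosen entry of the architecture encoding.
    new = list(arch)
    i = random.randrange(len(new))
    new[i] = 1 - new[i]
    return new

# Stage 1: sample random architectures and keep the best-scoring one.
candidates = [[random.randint(0, 1) for _ in range(16)] for _ in range(500)]
best = max(candidates, key=score)
best_score = score(best)

# Stage 2: hill-climb by perturbing the incumbent and keeping improvements
# (a simple Markov-chain-style local search).
for _ in range(100):
    cand = perturb(best)
    s = score(cand)
    if s > best_score:
        best, best_score = cand, s

print(best, best_score)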

To select the best architecture and train a resulting net, then call

python train.py --SMASH=YOUR_MODEL_NAME_HERE_archs.npz

This will by default take the best architecture found during evaluation and train a full network with that configuration. There are lots of different options, including a number of experimental settings such as architectural gradient descent by proxy, in-op multiplicative gating, variable nonlinearities, and setting specific op configuration types. Take a look at the train_parser in utils.py for details, though note that some of these weirder options may be deprecated.

This code has boilerplate for loading Imagenet32x32 and ModelNet, but doesn't download or preprocess them on its own. It supports model parallelism on a single node and half-precision training, though simple weightnorm is unstable in FP16, so you probably can't train a SMASH network in half precision.
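As a rough illustration of why simple weightnorm is fragile in FP16 (a standalone sketch, not the repository's implementation): the squared weight entries that feed the per-filter norm can underflow to zero in half precision, so the subsequent division blows up.

import torch

# Simple weightnorm rescales by 1/||v||; in FP16 the squared entries can underflow.
v = torch.full((27,), 1e-4, dtype=torch.float16)  # one 3x3x3 filter's (small) weights
squares = v * v                                    # 1e-8 underflows to 0.0 in FP16
norm = squares.float().sum().sqrt()                # 0.0, because the squares are already 0
print(squares.float().max().item(), norm.item())   # 0.0 0.0
print(v.float() / norm)                            # division by zero -> inf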

Notes

This README doc is in very early stages, and will be updated soon.

Acknowledgments

Comments
  • Typo? ESL = EML?

    Is this line a typo?

    https://github.com/ajbrock/SMASH/blob/72e26ac761fb3b9c815cd5c0137fd54f7b310029/layers.py#L54

    It seems to me that the EML should multiply right?

    opened by jesseengel 2
  • Search space quantization

    Thanks for your great work and kind sharing.

    I am interested in the size of the search space in your work, i.e. the number of network options available during search. Have you calculated it? You use a brand-new search-space representation, and I'm afraid I may be misunderstanding it.

    Thanks in advance.

    opened by rrryan2016 0
  • Accuracy only about 27% on CIFAR-100

    Hi,

    I like the idea very much, but I am wondering how this work is going since there have been no updates for 4 months.

    I ran the 3 instructions from the readme twice, and achieved only 27.2% and 26.5% val error on CIFAR-100, substantially higher than in the paper.

    Is this because the default setting is incorrect, or the variance is huge due to randomness?

    Thank you!

    opened by songzhaozhe 1
  • output = net(input,w,*arch)

    Hi, I changed the input data to search for a face recognition network; the input size is (3, 112, 96). When I run output = net(input,w,*arch) in the train_fn() function of train.py, it reports:

    RuntimeError: Given input size: (256x28x24). Calculated output size: (256x1x0). Output size is too small at /opt/conda/conda-bld/pytorch_1502001039157/work/torch/lib/THCUNN/generic/SpatialAveragePooling.cu:63

    I know I should modify the input size of the SMASH network, but I can't find where. How should I change it? Thank you.

    opened by Devy001 1
  • Output format

    I ran the code, and just want to be clear that I'm understanding the output format.

    $ python train.py --which-dataset C10
    $ python evaluate.py --SMASH=SMASH_D12_K4_N8_Nmax64_maxbneck2_SMASH_C10_seed0_100epochs --which-dataset C10
    $ python train.py --SMASH SMASH_D12_K4_N8_Nmax64_maxbneck2_SMASH_C10_seed0_100epochs --which-dataset C10
    $ tail -n 4 logs/SMASH_Main_SMASH_D12_K4_N8_Nmax64_maxbneck2_SMASH_C10_seed0_100epochs_Rank0_C10_seed0_100epochs_log.jsonl
    {"epoch": 98, "train_loss": 0.001096960324814265, "_stamp": 1504033169.525098, "train_err": 0.015555555555555555}
    {"epoch": 98, "val_loss": 0.2705254354281351, "_stamp": 1504033174.813815, "val_err": 5.84}
    {"epoch": 99, "train_loss": 0.0011473518449727433, "_stamp": 1504033324.084391, "train_err": 0.011111111111111112}
    {"epoch": 99, "val_loss": 0.2725760878948495, "_stamp": 1504033329.318958, "val_err": 5.8}
    

    I figure the 5.8 in the last line indicates that I've wound up with a trained model that gets 5.8% error on CIFAR-10 -- is that right? Which number does the 5.8 correspond to in Table 1 of the paper -- SmashV1=5.53, SmashV2=4.03, or something else? I'm in the process of working through the code, but I want to double-check that I understand the inputs/outputs.

    Thanks Ben

    opened by bkj 1
Owner

Andy Brock