
SMASH: One-Shot Model Architecture Search through HyperNetworks

An experimental technique for efficiently exploring neural architectures.

[SMASH demo GIF]

This repository contains code for the SMASH paper and video.

SMASH bypasses the need for fully training candidate models by learning an auxiliary HyperNet to approximate model weights, allowing for rapid comparison of a wide range of network architectures at the cost of a single training run.
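
A rough sketch of that idea (hypothetical names and shapes, not this repo's classes): a HyperNet takes an encoding of a candidate architecture and emits the weights for that candidate, so a new architecture can be scored with a single forward pass rather than a training run.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Toy HyperNet: maps a binary architecture encoding to a flat weight vector.
    class ToyHyperNet(nn.Module):
        def __init__(self, encoding_dim, num_weights):
            super().__init__()
            self.generate = nn.Sequential(
                nn.Linear(encoding_dim, 256), nn.ReLU(),
                nn.Linear(256, num_weights),
            )

        def forward(self, arch_encoding):
            return self.generate(arch_encoding)

    hypernet = ToyHyperNet(encoding_dim=32, num_weights=16 * 16 * 3 * 3)
    arch = torch.bernoulli(torch.full((32,), 0.5))  # random architecture encoding
    w = hypernet(arch).view(16, 16, 3, 3)           # generated weights for one 3x3 conv
    x = torch.randn(1, 16, 8, 8)
    y = F.conv2d(x, w, padding=1)                   # evaluate the candidate layer with generated weights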

Installation

To run this script, you will need PyTorch and a CUDA-capable GPU. If you wish to run it on CPU, just remove all the .cuda() calls.

Note that this code was written in PyTorch 0.12, and is not guaranteed to work on 0.2 until next week when I get a chance to update my own version. Please also be aware that, while thoroughly commented, this is research code for a heckishly complex project. I'll be doing more cleanup work to improve legibility soon.

Running

To run with default parameters, simply call

python train.py

This will by default train a SMASH net with nominally the same parametric budget as a WRN-40-4. Note that validation scores during training are calculated using a random architecture for each batch, so they represent an average over sampled architectures rather than the score of any single fixed network.
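
Concretely, the in-training validation loop behaves roughly like the sketch below (sample_architecture, hypernet, and smash_net are placeholders, not this repo's API), which is why the reported number is an average over architectures:

    # Hypothetical sketch of per-batch random-architecture validation.
    def validate(smash_net, hypernet, val_loader, sample_architecture):
        errors = []
        for x, y in val_loader:
            arch = sample_architecture()             # a new random architecture for this batch
            w = hypernet(arch)                       # HyperNet-generated weights for that architecture
            preds = smash_net(x, w, arch).argmax(1)  # forward pass through the sampled architecture
            errors.append((preds != y).float().mean().item())
        return 100.0 * sum(errors) / len(errors)     # error averaged over many sampled architectures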

After training, to sample and evaluate SMASH scores, call

python eval.py --SMASH=YOUR_MODEL_NAME_HERE_.pth

This will by default sample 500 random architectures, then perturb the best-found architecture 100 times, then run a Markov-chain-style search that continues perturbing the best architecture found so far.
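
In pseudocode, that search strategy looks roughly like the following (sample_arch, perturb_arch, and score_arch are hypothetical stand-ins for the sampling, perturbation, and HyperNet-scoring routines in eval.py):

    def smash_search(sample_arch, perturb_arch, score_arch, n_random=500, n_perturb=100):
        # Stage 1: score a batch of randomly sampled architectures.
        best = min((sample_arch() for _ in range(n_random)), key=score_arch)
        best_score = score_arch(best)
        # Stage 2: try random perturbations of the best architecture found so far.
        for _ in range(n_perturb):
            cand = perturb_arch(best)
            s = score_arch(cand)
            if s < best_score:
                best, best_score = cand, s
        # Stage 3: a Markov-chain-style walk that keeps perturbing its current point.
        current = best
        for _ in range(n_perturb):
            current = perturb_arch(current)
            s = score_arch(current)
            if s < best_score:
                best, best_score = current, s
        return best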

To select the best architecture and train the resulting net, call

python train.py --SMASH=YOUR_MODEL_NAME_HERE_archs.npz

This will by default take the best architecture found during evaluation and train the resulting network. There are lots of different options, including a number of experimental settings such as architectural gradient descent by proxy, in-op multiplicative gating, variable nonlinearities, and setting specific op configuration types. Take a look at the train_parser in utils.py for details, though note that some of the weirder ones may be deprecated.
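
Assuming train.py builds its options with a standard argparse parser (as train_parser suggests), the quickest way to list the flags supported by your checkout is

python train.py --help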

This code has boilerplate for loading Imagenet32x32 and ModelNet, but doesn't download or preprocess them on its own. It supports model parallelism on a single node and half-precision training, though simple weightnorm is unstable in FP16, so you probably can't train a SMASH network in half precision.
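
As a minimal illustration of the FP16 issue (written against a recent PyTorch API and assuming a CUDA GPU, not this repo's 0.12-era code): simple weightnorm computes w = g * v / ||v||, and the sum of squares behind that norm easily exceeds half precision's maximum of roughly 65504.

    import torch

    v = torch.full((512, 512), 2.0, device='cuda')
    print((v * v).sum())                # 1048576.0 in FP32
    print((v.half() * v.half()).sum())  # inf in FP16: the result exceeds half precision's max (~65504)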

Notes

This README doc is in very early stages, and will be updated soon.

Acknowledgments

Comments
  • Typo? ESL = EML?

    Is this line a typo?

    https://github.com/ajbrock/SMASH/blob/72e26ac761fb3b9c815cd5c0137fd54f7b310029/layers.py#L54

    It seems to me that the EML should multiply, right?

    opened by jesseengel 2
  • Search space quantization

    Thanks for your great work and kind sharing.

    I am interested in the scale of the search space in your work, i.e., the number of network options considered during the search. Have you calculated it? Since you use a brand-new search-space representation, I am afraid I may be misunderstanding it.

    Thanks in advance.

    opened by rrryan2016 0
  • Accuracy only about 27% on CIFAR-100

    Hi,

    I like the idea very much, but I am wondering how this work is going since there have been no updates for 4 months.

    I ran the three instructions mentioned in the readme twice, and got only 27.2% and 26.5% val error on CIFAR-100, substantially higher than the paper's results.

    Is this because the default settings are incorrect, or is the variance huge due to randomness?

    Thank you!

    opened by songzhaozhe 1
  • output = net(input,w,*arch)

    Hi, I changed the input data to search for a face recognition network; the input size is (3,112,96). When I run output = net(input,w,*arch) in the train_fn() function in train.py, it reports:

    RuntimeError: Given input size: (256x28x24). Calculated output size: (256x1x0). Output size is too small at /opt/conda/conda-bld/pytorch_1502001039157/work/torch/lib/THCUNN/generic/SpatialAveragePooling.cu:63

    I know I should modify the input size of the SMASH network, but I cannot find where to do it. How should I change it? Thank you.

    opened by Devy001 1
  • Output format

    I ran the code, and just want to be clear that I'm understanding the output format.

    $ python train.py --which-dataset C10
    $ python evaluate.py --SMASH=SMASH_D12_K4_N8_Nmax64_maxbneck2_SMASH_C10_seed0_100epochs --which-dataset C10
    $ python train.py --SMASH SMASH_D12_K4_N8_Nmax64_maxbneck2_SMASH_C10_seed0_100epochs --which-dataset C10
    $ tail -n 4 logs/SMASH_Main_SMASH_D12_K4_N8_Nmax64_maxbneck2_SMASH_C10_seed0_100epochs_Rank0_C10_seed0_100epochs_log.jsonl
    {"epoch": 98, "train_loss": 0.001096960324814265, "_stamp": 1504033169.525098, "train_err": 0.015555555555555555}
    {"epoch": 98, "val_loss": 0.2705254354281351, "_stamp": 1504033174.813815, "val_err": 5.84}
    {"epoch": 99, "train_loss": 0.0011473518449727433, "_stamp": 1504033324.084391, "train_err": 0.011111111111111112}
    {"epoch": 99, "val_loss": 0.2725760878948495, "_stamp": 1504033329.318958, "val_err": 5.8}
    

    I figure the 5.8 in the last line indicates that I've wound up with a trained model that gets 5.8% error on CIFAR-10 -- is that right? Which number does the 5.8 correspond to in Table 1 of the paper -- SmashV1 = 5.53, SmashV2 = 4.03, or something else? I'm working through the code, but I want to double-check that I understand the inputs/outputs.

    Thanks Ben

    opened by bkj 1
Owner
Andy Brock
Dimensionality Diabolist
Pytorch implementation of AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks

AngularGrad Optimizer This repository contains the oficial implementation for AngularGrad: A New Optimization Technique for Angular Convergence of Con

mario 124 Sep 16, 2022
Open source implementation of AceNAS: Learning to Rank Ace Neural Architectures with Weak Supervision of Weight Sharing

AceNAS This repo is the experiment code of AceNAS, and is not considered as an official release. We are working on integrating AceNAS as a built-in st

Yuge Zhang 6 Sep 7, 2022
Learning Versatile Neural Architectures by Propagating Network Codes

Learning Versatile Neural Architectures by Propagating Network Codes Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang,

Mingyu Ding 36 Dec 6, 2022
Official Code for AdvRush: Searching for Adversarially Robust Neural Architectures (ICCV '21)

AdvRush Official Code for AdvRush: Searching for Adversarially Robust Neural Architectures (ICCV '21) Environmental Set-up Python == 3.6.12, PyTorch =

null 11 Dec 10, 2022
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

BossNAS This repository contains PyTorch evaluation code, retraining code and pretrained models of our paper: BossNAS: Exploring Hybrid CNN-transforme

Changlin Li 127 Dec 26, 2022
Implementation of: "Exploring Randomly Wired Neural Networks for Image Recognition"

RandWireNN Unofficial PyTorch Implementation of: Exploring Randomly Wired Neural Networks for Image Recognition. Results Validation result on Imagenet

Seung-won Park 684 Nov 2, 2022
[NeurIPS2021] Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks

Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks Code for NeurIPS 2021 Paper "Exploring Architectural Ingredients of A

Hanxun Huang 26 Dec 1, 2022
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano <https:

null 9.6k Dec 31, 2022
A PyTorch implementation of Sharpness-Aware Minimization for Efficiently Improving Generalization

sam.pytorch A PyTorch implementation of Sharpness-Aware Minimization for Efficiently Improving Generalization ( Foret+2020) Paper, Official implementa

Ryuichiro Hataya 102 Dec 28, 2022
BitPack is a practical tool to efficiently save ultra-low precision/mixed-precision quantized models.

BitPack is a practical tool that can efficiently save quantized neural network models with mixed bitwidth.

Zhen Dong 36 Dec 2, 2022
Efficiently computes derivatives of numpy code.

Note: Autograd is still being maintained but is no longer actively developed. The main developers (Dougal Maclaurin, David Duvenaud, Matt Johnson, and

Formerly: Harvard Intelligent Probabilistic Systems Group -- Now at Princeton 6.1k Jan 8, 2023
CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation

CoTr: Efficient 3D Medical Image Segmentation by bridging CNN and Transformer This is the official pytorch implementation of the CoTr: Paper: CoTr: Ef

null 218 Dec 25, 2022
Official implementation of "Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets" (CVPR2021)

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets This is the official implementation of "Towards Good Pract

Sanja Fidler's Lab 52 Nov 22, 2022
Sharpness-Aware Minimization for Efficiently Improving Generalization

Sharpness-Aware-Minimization-TensorFlow This repository provides a minimal implementation of sharpness-aware minimization (SAM) (Sharpness-Aware Minim

Sayak Paul 54 Dec 8, 2022
A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently develop and compare their own methods.

Knodle (Knowledge-supervised Deep Learning Framework) - a new framework for weak supervision with neural networks. It provides a modularization for se

null 93 Nov 6, 2022
Trash Sorter Extraordinaire is a software which efficiently detects the different types of waste in a pile of random trash through feeding it pictures or videos.

Trash-Sorter-Extraordinaire Trash Sorter Extraordinaire is a software which efficiently detects the different types of waste in a pile of random trash

Rameen Mahmood 1 Nov 7, 2021
An open source machine learning library for performing regression tasks using RVM technique.

Introduction neonrvm is an open source machine learning library for performing regression tasks using RVM technique. It is written in C programming la

Siavash Eliasi 33 May 31, 2022