BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

Changlin Li

Last update: Dec 26, 2022


Overview

BossNAS

This repository contains PyTorch evaluation code, retraining code and pretrained models of our paper: BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search. [pdf link]

Illustration of the Siamese supernets training with ensemble bootstrapping.

Illustration of the fabric-like Hybrid CNN-transformer Search Space with flexible down-sampling positions.

Our Results and Trained Models

Here is a summary of our searched models:

Model	MAdds	Steptime	Top-1 (%)	Top-5 (%)	Url
BossNet-T0 w/o SE	3.4B	101ms	80.5	95.0	checkpoint
BossNet-T0	3.4B	115ms	80.8	95.2	checkpoint
BossNet-T0^	5.7B	147ms	81.6	95.6	same as above
BossNet-T1	7.9B	156ms	81.9	95.6	checkpoint
BossNet-T1^	10.5B	165ms	82.2	95.7	same as above

Here is a summary of architecture rating accuracy of our method:

Search space	Dataset	Kendall tau	Spearman rho	Pearson R
MBConv	ImageNet	0.65	0.78	0.85
NATS-Bench Ss	Cifar10	0.53	0.73	0.72
NATS-Bench Ss	Cifar100	0.59	0.76	0.79

Usage

1. Requirements

Install PyTorch 1.7.0+ and torchvision 0.8.1+, for example:

conda install -c pytorch pytorch torchvision

Install pytorch-image-models 0.3.2, for example:

pip install timm==0.3.2

Download the ImageNet dataset from http://image-net.org/, and move the validation images to labeled subfolders.
- To do this, you can use the following script: https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
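
The linked script does the sorting with shell commands. As an alternative, here is a minimal Python sketch of the same idea. It assumes you have a plain-text mapping file with one `<image filename> <class id>` pair per line; that file format and the `sort_val_images` helper are hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path


def sort_val_images(val_dir, mapping_file):
    """Move each validation image into a subfolder named after its class ID.

    `mapping_file` uses an assumed format: one "<image filename> <class id>"
    pair per line, separated by whitespace.
    Returns the number of images that were moved.
    """
    val_dir = Path(val_dir)
    moved = 0
    for line in Path(mapping_file).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        filename, class_id = line.split()
        src = val_dir / filename
        if not src.is_file():
            continue  # already moved or missing; leave it alone
        class_dir = val_dir / class_id
        class_dir.mkdir(exist_ok=True)
        shutil.move(str(src), str(class_dir / filename))
        moved += 1
    return moved
```

Running the function a second time on the same folder moves nothing, because images that are already inside a class subfolder are skipped.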

2. Retrain or Evaluate our BossNet-T models

First, move to the retraining code directory to perform retraining or evaluation.
```
cd HyTra_retraining
```
Our retraining code for BossNet-T is based on the DeiT repository.

You can evaluate our BossNet-T models with the following commands:

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model bossnet_T0 --input-size 224 --batch-size 128 --data-path /PATH/TO/ImageNet --num_workers 8 --eval --resume PATH/TO/BossNet-T0-80_8.pth

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model bossnet_T1 --input-size 224 --batch-size 128 --data-path /PATH/TO/ImageNet --num_workers 8 --eval --resume PATH/TO/BossNet-T1-81_9.pth

Please download our checkpoint files from the results table above. Change the --nproc_per_node option to match the number of GPUs you have, and set --data-path, --resume and --input-size accordingly.

You can retrain our BossNet-T models with the commands below.

Please change --nproc_per_node and --data-path accordingly. Note that the learning rate is scaled automatically according to the number of GPUs and the batch size. We recommend training with a batch size of 128 and 8 GPUs.

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model bossnet_T0 --input-size 224 --batch-size 128 --data-path /PATH/TO/ImageNet --num_workers 8

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model bossnet_T1 --input-size 224 --batch-size 128 --data-path /PATH/TO/ImageNet --num_workers 8
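
To make the automatic learning rate scaling concrete, here is a small sketch of the linear scaling rule. DeiT's training script scales the base rate by `batch_size * world_size / 512`; that this repository keeps the same reference value of 512 is an assumption, and `scaled_learning_rate` is a hypothetical helper, not a function from the code base. The base rate of 0.0005 is the default `lr` visible in the argument dump quoted in the comments further down.

```python
def scaled_learning_rate(base_lr, batch_size_per_gpu, num_gpus, reference_batch=512):
    """Linear scaling rule: the rate grows in proportion to the total batch size.

    reference_batch=512 is assumed from DeiT, not confirmed for this repository.
    """
    total_batch = batch_size_per_gpu * num_gpus
    return base_lr * total_batch / reference_batch


recommended = scaled_learning_rate(5e-4, 128, 8)  # 8 GPUs x 128 images each
four_gpus = scaled_learning_rate(5e-4, 128, 4)    # 4 GPUs x 128 images each
print(recommended, four_gpus)
```

Under these assumptions, halving the number of GPUs halves the effective learning rate, which is why results may differ from the recommended 8-GPU setup unless the batch size is adjusted.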

Architecture of our BossNet-T0

3. Evaluate architecture rating accuracy of BossNAS

You can get the ranking correlations of BossNAS on MBConv search space with the following commands:
```
cd MBConv_ranking
python get_model_score_mbconv.py
```

You can get the ranking correlations of BossNAS on NATS-Bench Ss with the following commands:
```
cd NATS_SS_ranking
python get_model_score_nats.py
```
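
For readers unfamiliar with the three ranking metrics reported above, the following standard-library sketch shows how each can be computed from a list of predicted scores and a list of ground-truth accuracies. It is an illustration only and not the repository's scripts: the function names and the sample numbers are hypothetical, and the rank-based functions assume there are no ties.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson R: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank positions (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rho: Pearson R computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs; assumes no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical numbers, for illustration only.
predicted_scores = [0.1, 0.4, 0.3, 0.8, 0.9]
true_accuracy = [70.1, 72.5, 73.0, 75.2, 76.8]
print(kendall_tau(predicted_scores, true_accuracy))
print(spearman(predicted_scores, true_accuracy))
print(pearson(predicted_scores, true_accuracy))
```

In this made-up example only one of the ten architecture pairs is ordered differently by the two lists, so Kendall tau is 0.8 and Spearman rho is 0.9.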

Citation

@article{li2021bossnas,
  author = {Li, Changlin and
            Tang, Tao and
            Wang, Guangrun and
            Peng, Jiefeng and
            Wang, Bing and
            Liang, Xiaodan and
            Chang, Xiaojun},
  title = {BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search},
  journal = {arXiv:2103.12424},
  year = 2021,
}

TODO

Searching code will be released later.

Comments

How to select architectures from the trained supernet?

Hi, thanks for your great work!

I tried using your searching code to train the supernet, but I could not figure out how to search for candidate architectures from such a supernet.

I guess the validation hook serves this purpose, but I did not find any saved path information after training for one epoch. Are there other files I need to look at, or do I just need to wait for more epochs to be trained?

Could you advise me about that, thanks in advance for your time and help!

Best, Haoran

opened by ranery 7
ImageNet Acc@1 is low (49.6%) when evaluating BossNet-T0-80_8.pth

Hello! I am trying to reproduce your model, but when I evaluate the pretrained model (BossNet-T0-80_8.pth), the Acc@1 is too low. Did I miss something? Can you help me?

The run command and its output are as follows:

root@v-dev-11135821-66b7bdd9f5-l9rlv:/data/juicefs_hz_cv_v3/11135821/bak/BossNAS/retraining_hytra# python main.py --model bossnet_T0 --input-size 224 --batch-size 128 --eval --resume /data/juicefs_hz_cv_v3/11135821/bak/model/BossNet-T0-80_8.pth
Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=128, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/data/glusterfs_cv_04/public_data/imagenet/CLS-LOC/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_url='env://', distributed=False, drop=0.0, drop_block=None, drop_path=0.1, epochs=300, eval=True, inat_category='name', input_size=224, local_rank=0, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='bossnet_T0', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='output/bossnet_T0-20210804-163815', patience_epochs=10, pin_mem=True, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='/data/juicefs_hz_cv_v3/11135821/bak/model/BossNet-T0-80_8.pth', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=1)
Creating model: bossnet_T0
number of params: 38415960
Test: [ 0/261] eta: 0:39:36 loss: 1.9650 (1.9650) acc1: 68.2292 (68.2292) acc5: 90.1042 (90.1042) time: 9.1044 data: 4.8654 max mem: 5605
Test: [ 50/261] eta: 0:01:38 loss: 3.2916 (3.0472) acc1: 41.6667 (47.3039) acc5: 65.1042 (70.5372) time: 0.2928 data: 0.0004 max mem: 5605
Test: [100/261] eta: 0:01:01 loss: 2.9675 (3.1048) acc1: 46.8750 (45.7921) acc5: 69.2708 (69.9722) time: 0.2953 data: 0.0003 max mem: 5605
Test: [150/261] eta: 0:00:39 loss: 2.4230 (2.9457) acc1: 55.2083 (47.8960) acc5: 75.5208 (71.5370) time: 0.2989 data: 0.0003 max mem: 5605
Test: [200/261] eta: 0:00:20 loss: 2.6540 (2.9105) acc1: 48.4375 (48.1913) acc5: 68.7500 (71.3490) time: 0.3023 data: 0.0002 max mem: 5605
Test: [250/261] eta: 0:00:03 loss: 1.6506 (2.8344) acc1: 61.9792 (49.0310) acc5: 83.8542 (72.0431) time: 0.3036 data: 0.0003 max mem: 5605
Test: [260/261] eta: 0:00:00 loss: 1.6135 (2.8001) acc1: 65.6250 (49.5740) acc5: 86.4583 (72.5500) time: 0.3888 data: 0.0001 max mem: 5605
Test: Total time: 0:01:28 (0.3397 s / it)
Acc@1 49.574 Acc@5 72.550 loss 2.800
Accuracy of the network on the 50000 test images: 49.6%
opened by fanliaveline 4
Some questions about the code and paper

Hi, great work!

I have some questions about the code and the paper:

In Section 3.3 of the paper, which covers the searching phase, the probability ensemble of the architecture population used to calculate the evaluation loss in Equations (5) and (6) comes from the online network, but in the code it comes from the target network. This confuses me.

Still in Section 3.3, the paper mentions that searching is done with an evolutionary algorithm. I read references [12] and [54] but still have no clue how the evolutionary algorithm is implemented in the code. Specifically, how is the architecture population evolved?

In hytra_supernet.py, the stage depths are set to [4,3,2,2]. Is there a particular reason for this? Why not use [4,4,4,4] so that all possible paths can be chosen?

Thanks a lot for your time and I'm looking forward to your reply!
opened by zhy0860 2
About formulation (1) and (6)

Hi, many thanks for sharing your nice work. Equations (1) and (6) in the paper both contain λ_k, but there seems to be no explanation of it. Could you please explain it here?

opened by NickChang97 2
Some questions about BossNAS

Hi, thanks for your excellent work!

It is inspiring and practical for improving sub-net ranking correlations. But I have a few questions.

Although progressive searching helps to improve the ranking correlation within each small stage, will it lead to an accumulation of errors? The best path of the previous stage may not be suitable for the following ones. How do you explain this?

Why is the ResAttn operator only searched at depth=1/2?

In the hybrid search space, ensembling outputs of different resolutions seems odd, since adaptive pooling discards the structural information, so I don't understand why it is suitable.

As shown in Table 4, Unsupv. EB is better than Supv. class. Do you have a theoretical explanation for this?
opened by huizhang0110 2

Question about Ranking nats

After running the code:

cd ranking_nats
python get_model_score_nats.py

I got:

kendall tau begin
BossNAS: KendalltauResult(correlation=-0.534180602248828, pvalue=0.0)
(-0.7180607093955225, 0.0)
SpearmanrResult(correlation=-0.7341493538551311, pvalue=0.0)

opened by pprp 2

How to obtain the searched model

I ran the searching code for a small number of epochs. Can you share where exactly the best model architecture is stored when a custom NAS run is performed? The .pth files are saved in work_dir, but I am not sure where the corresponding architecture is stored, so that I can use a custom generated model together with these weights.

opened by hamdjalil 3
Question about the code of searching second Hytra block.

Hi, I appreciate your time.

I found an issue in the code of the HyTra search phase. When searching for the second block, after the first evaluation the chosen best path of the second block is appended after the best path of the first block, and the training process is then conducted with a three-block structure.

The relevant code is as follows.

(val_hook.py)

if self.every_n_epochs(runner, block_inteval):
    best_path = results[0][0]
    best_path = [int(i) for i in list(best_path)]

    if len(model.best_paths) == model.start_block + 1:
        model.best_paths.pop()
    model.best_paths.append(best_path)

(siamese_supernets_hytra.py)

if self.start_block > 0:
    for i, best_path in enumerate(self.best_paths):
        img_v1 = self.online_backbone(img_v1, start_block=i, forward_op=best_path, block_op=True)[0]
        img_v2 = self.online_backbone(img_v2, start_block=i, forward_op=best_path, block_op=True)[0]

In other words, searching does not continue after a single frozen best path from the previous block, but after two: the best path of the current block chosen at each evaluation stage is also frozen and appended, which means the path of the second block appears twice during searching. I can't understand why this is done.

This leads to a problem if downsampling is used in the frozen best path of the second block. For instance, suppose the spatial resolution has already reached the smallest scale, 1/32, in the previously frozen best path of the second block. If downsampling occurs again in the current path of the second block as searching continues, there will be a shape mismatch error. This confuses me, and we did encounter this problem in our implementation.
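
The list update described in this report can be traced on a plain Python list. This is a sketch, not the repository's hook: `record_best_path` is a hypothetical helper, the path values are made up, and it assumes that the `append` runs on every evaluation while only the `pop` is conditional.

```python
def record_best_path(best_paths, start_block, best_path):
    """Mirror the pop/append logic quoted from val_hook.py on a plain list."""
    if len(best_paths) == start_block + 1:
        best_paths.pop()  # drop the entry stored by an earlier evaluation
    best_paths.append(best_path)
    return best_paths


paths = [[0, 1, 0, 1]]                 # frozen best path of block 0 (made-up values)
record_best_path(paths, 1, [1, 1, 0])  # first evaluation while searching block 1
after_first = [list(p) for p in paths]
record_best_path(paths, 1, [0, 0, 1])  # second evaluation replaces that entry
print(after_first)
print(paths)
```

Under that assumption, the list already holds an entry for the block being searched after the first evaluation, so a loop over every entry of `best_paths` would also run that block's stored path, which matches the behaviour the report describes.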

I'm sorry if I haven't described the issue clearly. Thanks a lot for your time again and I'm looking forward to your reply.
bug
opened by zhy0860 2
Running HyTra search on CIFAR10 ( or potentially on other datasets...)

I have some doubts about how to search on HyTra with datasets other than ImageNet. Is it possible? I tried to run the search on CIFAR10 with a HyTracifar10 config file, but it gives me this error:

RuntimeError: Given groups=1, weight of size [256, 1024, 1, 1], expected input[256, 512, 4, 4] to have 1024 channels, but got 512 channels instead.

Below is the config file I created for this purpose. Since the config files were not entirely clear to me (my fault), I simply mixed the NATScifar10 config file and the HytraImagenet config file to obtain a HyTracifar10 version. The model and the dataset seem to be created and loaded correctly, so I think there is some kind of size mismatch. For now I am only trying to run on CIFAR10, but my intention is to generalize the process to different datasets (not only the most famous ones). I would like to know whether this generalization is already possible with your code (perhaps with slight modifications), or whether the work was meant to run only on the main datasets.

HytraCifar10 config

import copy
_base_ = 'base.py'

model = dict(
    type='SiameseSupernetsHyTra',
    pretrained=None,
    base_momentum=0.99,
    pre_conv=True,
    backbone=dict(
        type='SupernetHyTra',
    ),
    start_block=0,
    num_block=4,
    neck=dict(
        type='NonLinearNeckSimCLRProject',
        in_channels=2048,
        hid_channels=4096,
        out_channels=256,
        num_layers=2,
        sync_bn=False,
        with_bias=True,
        with_last_bn=False,
        with_avg_pool=True),
    head=dict(
        type='LatentPredictHead',
        size_average=True,
        predictor=dict(
            type='NonLinearNeckSimCLR',
            in_channels=256,
            hid_channels=4096,
            out_channels=256,
            num_layers=2,
            sync_bn=False,
            with_bias=True,
            with_last_bn=False,
            with_avg_pool=False)))

# dataset settings
data_source_cfg = dict(type='NATSCifar10', root='../data/cifar/', return_label=False)
train_dataset_type = 'BYOLDataset'
test_dataset_type = 'StoragedBYOLDataset'
img_norm_cfg = dict(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.201])
train_pipeline = [
    dict(type='RandomCrop', size=32, padding=4),
    dict(type='RandomHorizontalFlip'),
]

# prefetch
prefetch = False
if not prefetch:
    train_pipeline.extend([dict(type='ToTensor'), dict(type='Normalize', **img_norm_cfg)])
train_pipeline1 = copy.deepcopy(train_pipeline)
train_pipeline2 = copy.deepcopy(train_pipeline)

test_pipeline1 = copy.deepcopy(train_pipeline1)
test_pipeline2 = copy.deepcopy(train_pipeline2)
data = dict(
    imgs_per_gpu=256,  # total 256*4(gpu)*4(interval)=4096
    workers_per_gpu=2,
    train=dict(
        type=train_dataset_type,
        data_source=dict(split='train', **data_source_cfg),
        pipeline1=train_pipeline1,
        pipeline2=train_pipeline2),
    val=dict(
        type=test_dataset_type,
        data_source=dict(split='test', **data_source_cfg),
        pipeline1=test_pipeline1,
        pipeline2=test_pipeline2,),
    test=dict(
        type=test_dataset_type,
        data_source=dict(split='test', **data_source_cfg),
        pipeline1=test_pipeline1,
        pipeline2=test_pipeline2,))

# optimizer
optimizer = dict(type='LARS', lr=4.8, weight_decay=0.000001, momentum=0.9,
                 paramwise_options={
                     '(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay=0., lars_exclude=True),
                     'bias': dict(weight_decay=0., lars_exclude=True)})

# apex
use_fp16 = True

# interval for accumulate gradient
update_interval = 8
optimizer_config = dict(update_interval=update_interval, use_fp16=use_fp16)

# learning policy
lr_config = dict(
    policy='CosineAnnealing',
    min_lr=0.,
    warmup='linear',
    warmup_iters=1,
    warmup_ratio=0.0001,  # cannot be 0
    warmup_by_epoch=True)
checkpoint_config = dict(interval=1)

# runtime settings
total_epochs = 24

# additional hooks
custom_hooks = [
    dict(type='BYOLHook', end_momentum=1., update_interval=update_interval),
    dict(type='RandomPathHook'),
    dict(
        type='ValBestPathHook',
        dataset=data['val'],
        bn_dataset=data['train'],
        initial=True,
        interval=2,
        optimizer_cfg=optimizer,
        lr_cfg=lr_config,
        imgs_per_gpu=256,
        workers_per_gpu=4,
        epoch_per_stage=6,
        resume_best_path='')  # e.g. 'path_rank/bestpath_2.yml'
]

# resume_from = 'checkpoints/stage3_epoch3.pth'

# resume_optimizer = False

cudnn_benchmark = True
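
One detail in this config is worth checking with simple arithmetic: the inline comment next to `imgs_per_gpu` assumes an accumulation interval of 4, while `update_interval` is set to 8. The sketch below computes the number of samples per optimizer step under gradient accumulation. The helper name is hypothetical, and the GPU count of 4 is taken from the config's inline comment rather than from a known setup.

```python
def effective_batch_size(imgs_per_gpu, num_gpus, update_interval):
    """Samples contributing to one optimizer step under gradient accumulation."""
    return imgs_per_gpu * num_gpus * update_interval


# imgs_per_gpu and update_interval come from the config; 4 GPUs is taken from
# its inline comment and is an assumption about the poster's setup.
as_commented = effective_batch_size(256, 4, 4)
as_configured = effective_batch_size(256, 4, 8)
print(as_commented, as_configured)
```

With these values the configured interval gives twice the total batch size that the comment states.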

opened by matteogambella 0
Pre-trained supernet weights release

Hi,

Your approach is very impressive; I was wondering if you're planning to release the weights of the supernets you trained? (I'm specifically interested in the HyTra supernet)

opened by AwesomeLemon 2

Releases(v0.1)

v0.1(Mar 23, 2021)

Release for BossNet-T0 w/o SE, BossNet-T0 and BossNet-T1 checkpoint files.
Source code(tar.gz)
Source code(zip)
BossNet-M1-76_2.pth(75.93 MB)
BossNet-M2-77_4.pth(71.36 MB)
BossNet-T0-80_8.pth(587.19 MB)
BossNet-T0-nose-80_5.pth(570.42 MB)
BossNet-T1-81_9.pth(587.20 MB)

