Block-wisely Supervised Neural Architecture Search with Knowledge Distillation (CVPR 2020)

Overview

DNA

This repository provides the code of our paper: Blockwisely Supervised Neural Architecture Search with Knowledge Distillation.

Illustration of DNA. Each cell of the supernet is trained independently to mimic the behavior of the corresponding teacher block.

Comparison of model ranking for DNA vs. DARTS, SPOS and MnasNet under two different hyper-parameters.

Our Trained Models

Usage

1. Requirements

2. Searching

The code for supernet training, evaluation, and searching is under the searching directory.

  • cd searching

i) Train & evaluate the block-wise supernet with knowledge distillation

  • Modify datadir in initialize/data.yaml to your ImageNet path.
  • Modify nproc_per_node in dist_train.sh to match your number of GPUs. The default batch size is 64 for 8 GPUs; you can change the batch size and learning rate in initialize/train_pipeline.yaml.
  • By default, the supernet is trained sequentially from stage 1 to stage 6 and evaluated after each stage. This takes about 2 days on 8 GPUs with EfficientNet-B7 as the teacher. Resuming from checkpoints is supported. You can also change start_stage in initialize/train_pipeline.yaml to force training to start from an intermediate stage without loading a checkpoint. (A minimal sketch of the block-wise distillation step follows this list.)
  • sh dist_train.sh
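
For orientation, the core of one block-wise distillation step can be sketched as below. This is only a hedged illustration of the idea described above, not the repository's actual training loop (see searching/dna/distill_train.py), which additionally handles further loss terms, checkpointing and distributed training; names such as teacher.forward_until and train_stage are assumptions.

    # Minimal sketch of one block-wise KD step (hypothetical interfaces).
    # The student stage takes the teacher's previous-stage feature map as
    # input and is supervised with an MSE loss against the teacher's
    # current-stage output, as described above.
    import torch
    import torch.nn as nn

    def train_stage(student_stage, teacher, stage_idx, loader, optimizer, device):
        mse = nn.MSELoss()
        student_stage.train()
        teacher.eval()
        for images, _ in loader:
            images = images.to(device)
            with torch.no_grad():
                guide_in = teacher.forward_until(images, stage_idx - 1)   # input feature
                guide_out = teacher.forward_until(images, stage_idx)      # target feature
            out = student_stage(guide_in)   # one candidate path of this supernet stage
            loss = mse(out, guide_out)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()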

ii) Search for the best architecture under constraint.

Our traversal search can handle a search space with 6 ops in each layer, 6 layers in each stage, and 6 stages in total. A search of this size should finish in about half an hour on a single CPU. To search over a larger space, you can manually divide the search space or use other search algorithms, such as evolutionary algorithms, to process our evaluated architecture potential files. (A toy sketch of a constraint-aware traversal follows the steps below.)

  • Copy the paths of the architecture potential files generated in step i) to potential_yaml in process_potential.py, and modify the constraint in process_potential.py.
  • python process_potential.py
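
To make the traversal idea concrete, here is a toy, brute-force variant under an assumed data layout: each stage is a list of (encoding, evaluated_loss, param_count) tuples distilled from the potential files. The real process_potential.py reads its own YAML format and keeps the search tractable with per-stage constraints, so treat this only as a sketch.

    # Toy traversal: pick the lowest-total-loss architecture within a parameter budget.
    from itertools import product

    def traverse(stage_candidates, max_params):
        best = None  # (total_loss, per-stage encodings)
        for combo in product(*stage_candidates):
            params = sum(c[2] for c in combo)
            if params > max_params:
                continue
            loss = sum(c[1] for c in combo)
            if best is None or loss < best[0]:
                best = (loss, [c[0] for c in combo])
        return best

    # Tiny example with two stages and two candidates each:
    stages = [
        [("op0", 0.75, 1e5), ("op1", 0.5, 2e5)],
        [("op0-op0", 1.25, 3e5), ("op1-op1", 0.875, 5e5)],
    ]
    print(traverse(stages, max_params=6e5))  # -> (1.625, ['op0', 'op1-op1'])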

3. Retraining

The retraining code is simplified from the pytorch-image-models repo and is under the retraining directory.

  • cd retraining

  • Retrain our models or your searched models

    • Modify run_example.sh: change the data path and hyper-parameters according to your requirements.
    • Add your searched model architecture to model.py. You can also use our searched and predefined DNA models.
    • sh run_example.sh
  • You can evaluate our models with the following command (a Python sketch of loading a checkpoint follows this list):
    python validate.py PATH/TO/ImageNet/validation --model DNA_a --checkpoint PATH/TO/model.pth.tar

    • PATH/TO/ImageNet/validation should be replaced by your validation data path.
    • --model : DNA_a can be replaced by DNA_b, DNA_c, DNA_d for our different models.
    • --checkpoint : Specify the path of your downloaded checkpoint here.
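
If you would rather evaluate from Python than through validate.py, a rough sketch is given below. It assumes the searched architectures are importable from the retraining model.py and that the checkpoint follows the usual pytorch-image-models layout (an optional 'state_dict' key and a possible 'module.' prefix); both are assumptions, so adapt them to the actual files.

    # Hedged sketch: load a retrained DNA checkpoint for evaluation.
    import torch
    from model import DNA_a  # hypothetical import from the retraining directory

    model = DNA_a(num_classes=1000)
    ckpt = torch.load("PATH/TO/model.pth.tar", map_location="cpu")
    state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
    # Strip a possible 'module.' prefix left by DistributedDataParallel.
    state_dict = {k[len("module."):] if k.startswith("module.") else k: v
                  for k, v in state_dict.items()}
    model.load_state_dict(state_dict)
    model.eval()  # then run your own validation loop over the ImageNet val set
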
Comments
  • The model after searching for the best architecture under constraint

    Hi @changlin31 , thank you for your great work! I really enjoyed your paper. I want to ask you a question regarding the checkpoint of models under constraint.

    • After running ii) Search for the best architecture under constraint, I only get the best model architecture, without a saved checkpoint of this single best model under constraint. If I want to test this best architecture, should I load the student supernet from the checkpoint in step (i) and use the architecture encoding to run this subnetwork within the supernet? Since these blocks were already trained with KD in step (i), I would expect to be able to get the trained model directly in step (ii), as a single model checkpoint like DNA_a, DNA_b, etc.?

    • Can you point out how to set up the constraint to get DNA_a ~ DNA_d?

    Thank you in advance!

    opened by KelvinYang0320 6
  • ImageNet top-1 accuracy cannot reach the accuracy reported in the paper

    Training setup:
    --lr=0.05
    --n_gpu=4
    --batch_size=256
    --n_worker=32
    --lr_type=cos
    --n_epoch=150
    --wd=4e-5
    --seed=2018
    --optim=SGD

        # preprocessing
        from torchvision import transforms
        input_size = 224
        imagenet_tran_train = [
            transforms.RandomResizedCrop(input_size, scale=(0.2, 1.0)),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            #transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
        ]
        imagenet_tran_test = [
            transforms.Resize(int(input_size / 0.875)),
            #transforms.Resize([256,256]),
            transforms.CenterCrop(input_size),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            #transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),         
        ]
    

    Result: acc/train_top1: 80, acc/test_top1: 74. Could you take a look at what might be causing this? Or could the training code or details be open-sourced? Thanks.

    opened by betterhalfwzm 6
  • Doubts about the retraining accuracy

    Thanks for your nice work and released code. We have tried the retraining part on ImageNet and here are some questions.

    1. When retraining the searched models under your default training settings (8 GPUs on one machine), we get the accuracies below:
       DNA_a: 76.31000003662109 (epoch 496)
       DNA_b: 76.63600003417969 (epoch 483)
       DNA_c: 77.20800003662109 (epoch 474)
       DNA_d: 77.7220000366211 (epoch 433)

    2. We read the issue in #10 and changed the training settings. Specifically, we use 32 GPUs (--nproc_per_node=8 --nnodes=4), batch size 128 and lr 0.256, and optimize the network at each step:
       --model ${model_name} --epochs 500 --warmup-epochs 5 --batch-size 128 --lr 0.256 --opt rmsproptf --opt-eps 0.001 --sched step --decay-epochs 3 --decay-rate 0.963 --color-jitter 0.06 --drop 0.2 -j 8 --num-classes 1000 --model-ema
       We only retrained DNA_a to check the accuracy. However, we only got a worse result: DNA_a best metric = 76.08965138479905 (epoch 486).

    Could you please help me figure out the reason for this difference? Many thanks and best wishes.

    good first issue 
    opened by ShunLu91 5
  • How to deal with dimension mismatch?

    Firstly, thanks for your nice work! However, a dimension mismatch occurs because the student model blocks' output dimensions vary while the teacher model's output dimensions are fixed, and I haven't found more details about this issue in your paper. Especially for the channels: did you just take the first N channels directly? Could you please tell me how you deal with it?

    opened by ShunLu91 4
  • Errors in both single-GPU and multi-GPU searching

    Hi,

    I followed the steps in the README but saw errors during searching on both single-GPU and multi-GPU boxes.

    Have you encountered these issues before or have any idea how to fix them? TIA.

    • single-GPU: I set --nproc_per_node=1. The search started as expected but couldn't finish stage 0. The error message is as follows:
    
    12/28 07:27:02 AM WORLD_SIZE in os.environ is 1
    12/28 07:27:02 AM Namespace(amp=False, batch_size=64, color_jitter=0.4, cooldown_epochs=0, data_config=None, datadir='/home/ubuntu/workspace/datasets/ILSVRC2012/', dataset='imagenet', decay_epochs=1, decay_rate=0.9, distill_last_stage=True, distributed=False, eval_intervals=2, eval_metric='prec1', eval_mode=False, exp_dir='', feature_train=True, guide_input=True, guide_loss_fn='mse', hyperparam_config=None, img_size=224, index='', init_classifier=False, interpolation='', label_train=False, local_rank=0, log_interval=50, loss_weight=[0.5, 0.5], lr=[0.002, 0.005, 0.005, 0.005, 0.005, 0.002], mean=None, min_lr=1e-08, mixup=0.0, mixup_off_epoch=0, model_ema=False, model_ema_decay=0.9998, model_ema_force_cpu=False, model_pool='', momentum=0.9, num_classes=1000, num_gpu=1, opt='adam', opt_eps=1e-08, output='', potential_eval_times=20, prefetcher=True, pretrain=False, print_detail=True, recovery_interval=0, remode='pixel', reprob=0.5, reset_after_stage=False, reset_bn_eval=True, resume='', reverse_train=False, save_images=False, save_last_feature=True, sched='step', seed=42, separate_train=False, smoothing=0.1, stage_num=6, start_epoch=None, start_stage=None, std=None, step_epochs=20, sync_bn=False, test_dispatch='', top_model_num=3, train_mode=False, update_frequency=1, warmup_epochs=0, warmup_lr=0.001, weight_decay=0.0001, workers=4)
    12/28 07:27:02 AM Training with a single process on 1 GPUs.
    12/28 07:27:04 AM Data processing configuration for current model + dataset:
    12/28 07:27:04 AM       input_size: (3, 224, 224)
    12/28 07:27:04 AM       interpolation: bicubic
    12/28 07:27:04 AM       mean: (0.485, 0.456, 0.406)
    12/28 07:27:04 AM       std: (0.229, 0.224, 0.225)
    12/28 07:27:04 AM       crop_pct: 0.875
    12/28 07:27:06 AM NVIDIA APEX installed. AMP off.
    12/28 07:27:32 AM
    Train: stage 0, epoch 1, step [   0/20018]  Loss: 109.597771 (109.5978)  Time: 2.011s,   31.82/s  LR: 1.800e-03  Data & Guide Time: 1.644
    GuideMean: -0.64644  GuideStd: 10.40032  OutMean: 0.00000 (0.00000)  OutStd: 0.99985 (0.99985)  Dist_Mean: 0.64644 (0.64644)
    GRLoss: 1.00459 (1.00459)  CLLoss: 0.79709 (0.79709)  KLCosLoss: 0.57991 (0.57991)
    FeatureLoss: 0.00000 (0.00000)  Top1Acc: 0.00000(0.00000)
    Relative MSE loss: 1.01323(1.01323)
    
    .....
    12/29 06:58:47 AM Random Test: stage 0, epoch 20  Loss: 20.4754  Prec@1: 0.0000  Time: 0.216s,   74.05/s
    12/29 06:58:48 AM Current checkpoints:
     ('./output/test/adam-step-ep20-lr0.002-bs64-20201228-072702/checkpoint-0-6.pth.tar', 19.889211503295897)
     ('./output/test/adam-step-ep20-lr0.002-bs64-20201228-072702/checkpoint-0-14.pth.tar', 19.960276111450195)
     ('./output/test/adam-step-ep20-lr0.002-bs64-20201228-072702/checkpoint-0-4.pth.tar', 19.97588088684082)
     ('./output/test/adam-step-ep20-lr0.002-bs64-20201228-072702/checkpoint-0-16.pth.tar', 20.030977337646483)
     ('./output/test/adam-step-ep20-lr0.002-bs64-20201228-072702/checkpoint-0-8.pth.tar', 20.106792897033692)
     ('./output/test/adam-step-ep20-lr0.002-bs64-20201228-072702/checkpoint-0-10.pth.tar', 20.107453624572752)
     ('./output/test/adam-step-ep20-lr0.002-bs64-20201228-072702/checkpoint-0-12.pth.tar', 20.242049604492188)
     ('./output/test/adam-step-ep20-lr0.002-bs64-20201228-072702/checkpoint-0-18.pth.tar', 20.277006747436523)
     ('./output/test/adam-step-ep20-lr0.002-bs64-20201228-072702/checkpoint-0-2.pth.tar', 20.39269996520996)
     ('./output/test/adam-step-ep20-lr0.002-bs64-20201228-072702/checkpoint-0-20.pth.tar', 20.47537907836914)
    
    Traceback (most recent call last):
      File "train.py", line 273, in <module>
        main()
      File "train.py", line 268, in main
        writer=writer)
      File "/home/ubuntu/workspace/repos/DNA/searching/dna/distill_train.py", line 100, in distill_train
        reset_data=reset_data)
      File "/home/ubuntu/workspace/repos/DNA/searching/dna/distill_train.py", line 695, in _potential
        for layer in supernet.module.modules():
      File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 576, in __getattr__
        type(self).__name__, name))
    AttributeError: 'StudentSuperNet' object has no attribute 'module'
    Traceback (most recent call last):
      File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
        main()
      File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
        cmd=cmd)
    subprocess.CalledProcessError: Command '['/home/ubuntu/anaconda3/envs/pytorch_p36/bin/python', '-u', 'train.py', '--local_rank=0']' returned non-zero exit status 1.
    
    
    • multi-GPU: I set --nproc_per_node=4, but it resulted in segmentation faults.
    12/30 05:01:12 AM WORLD_SIZE in os.environ is 4
    12/30 05:01:12 AM Namespace(amp=False, batch_size=64, color_jitter=0.4, cooldown_epochs=0, data_config=None, datadir='/home/ubuntu/workspace/datasets/ILSVRC2012/', dataset='imagenet', decay_epochs=1, decay_rate=0.9, distill_last_stage=True, distributed=False, eval_intervals=2, eval_metric='prec1', eval_mode=False, exp_dir='', feature_train=True, guide_input=True, guide_loss_fn='mse', hyperparam_config=None, img_size=224, index='', init_classifier=False, interpolation='', label_train=False, local_rank=0, log_interval=50, loss_weight=[0.5, 0.5], lr=[0.002, 0.005, 0.005, 0.005, 0.005, 0.002], mean=None, min_lr=1e-08, mixup=0.0, mixup_off_epoch=0, model_ema=False, model_ema_decay=0.9998, model_ema_force_cpu=False, model_pool='', momentum=0.9, num_classes=1000, num_gpu=1, opt='adam', opt_eps=1e-08, output='', potential_eval_times=20, prefetcher=True, pretrain=False, print_detail=True, recovery_interval=0, remode='pixel', reprob=0.5, reset_after_stage=False, reset_bn_eval=True, resume='', reverse_train=False, save_images=False, save_last_feature=True, sched='step', seed=42, separate_train=False, smoothing=0.1, stage_num=6, start_epoch=None, start_stage=None, std=None, step_epochs=20, sync_bn=False, test_dispatch='', top_model_num=3, train_mode=False, update_frequency=1, warmup_epochs=0, warmup_lr=0.001, weight_decay=0.0001, workers=4)
    12/30 05:01:12 AM Training in distributed mode with multiple processes, 1 GPU per process. CUDA 2, Process 2, total 4.
    12/30 05:01:12 AM Training in distributed mode with multiple processes, 1 GPU per process. CUDA 3, Process 3, total 4.
    12/30 05:01:13 AM Training in distributed mode with multiple processes, 1 GPU per process. CUDA 1, Process 1, total 4.
    12/30 05:01:13 AM Training in distributed mode with multiple processes, 1 GPU per process. CUDA 0, Process 0, total 4.
    12/30 05:01:15 AM Data processing configuration for current model + dataset:
    12/30 05:01:15 AM       input_size: (3, 224, 224)
    12/30 05:01:15 AM       interpolation: bicubic
    12/30 05:01:15 AM       mean: (0.485, 0.456, 0.406)
    12/30 05:01:15 AM       std: (0.229, 0.224, 0.225)
    12/30 05:01:15 AM       crop_pct: 0.875
    12/30 05:01:18 AM NVIDIA APEX installed. AMP off.
    ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@ERROR: Unexpected segmentation fault encountered in worker.
    ^@Traceback (most recent call last):
      File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
        main()
      File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/distributed/launch.py", line 256, in main
        process.wait()
      File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/subprocess.py", line 1477, in wait
        (pid, sts) = self._try_wait(0)
      File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/subprocess.py", line 1424, in _try_wait
        (pid, sts) = os.waitpid(self.pid, wait_flags)
    
    good first issue 
    opened by lcmeng 3
  • Mismatch Results of DNA_c

    Hi,

    Thanks for sharing the training code. I tried to retrain DNA_c with this config: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 ~/imagenet --model DNA_c --epochs 500 --warmup-epochs 5 --batch-size 128 --lr 0.064 --opt rmsproptf --opt-eps 0.001 --sched step --decay-epochs 3 --decay-rate 0.963 --color-jitter 0.06 --drop 0.2 -j 8 --num-classes 1000 --model-ema. After 500 epochs of training, the best top-1 accuracy is 77.2%, which is 0.6% lower than in the paper: *** Best metric: 77.19799990478515 (epoch 458)

    opened by hongyuanyu 3
  • What is the transfer setting for CIFAR?

    In the paper, I noticed there are CIFAR transfer results for EfficientNet/MixNet and also DNA. Could you kindly share the details? E.g., what network modifications are needed for CIFAR? Did you upscale the images to 224*224? What are the hyper-parameters (lr, optimizer, epochs, batch size, etc.) for transfer training? Any hint is deeply appreciated.

    opened by serser 3
  • Hello, I got an error when running searching/train.py

    The error is as follows:

        File "D:/test/DNA-master-new/DNA-master/searching/train.py", line 277, in <module>
          main()
        File "D:/test/DNA-master-new/DNA-master/searching/train.py", line 272, in main
          writer=writer)
        File "D:\test\DNA-master-new\DNA-master\searching\dna\distill_train.py", line 104, in distill_train
          reset_data=reset_data)
        File "D:\test\DNA-master-new\DNA-master\searching\dna\distill_train.py", line 699, in _potential
          for layer in supernet.module.modules():
        File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 594, in __getattr__
          type(self).__name__, name))
        AttributeError: 'StudentSuperNet' object has no attribute 'module'

    I hope you can help with this!

    opened by doer-hjh 2
  • The models are different in paper, code and trained models

    I mainly focus on the DNA_c model, but I found that the model structure in the paper, the architecture defined in the code, and the released trained model are all different. Which model is the best? Could you provide a definitive network structure? Thank you!

    opened by 5663015 2
  • Code hangs when training in stage 3, epoch 1, step 6-7

    Dear authors: Thank you for your great work. Recently I tried to reproduce your paper and perform a complete train & search. By modifying the dist_train.sh file and changing nproc_per_node to 4 to suit my machine (4x3090), I managed to finish training stages 0-2, but upon entering stage 3, the code hangs after printing info for stage 3, epoch 1, step 0.

    After doing some research, I found several interesting & strange details:

    1. The code doesn't hang immediately after entering stage 3; it successfully performs several complete steps of forward, backward, reduce, and step, and hangs around step 6-7 (7 in most cases).
    2. The (apparent) reason the program hangs is that one of the processes is stuck in optimizer.step() after successfully calling loss.backward(). This is strange, as I can't imagine how optimizer.step() could fail if gradients are propagated backward successfully, but that's exactly the case. This process never prints any debug log I set after optimizer.step(). The other processes just wait for it and the whole program hangs.
    3. A process of any rank (including rank 0) can get stuck. Only one process gets stuck each time; the other three processes run just fine until the next all_reduce.

    The above points, although seemingly strange and random, can be reproduced stably on our machine. We have tried several different versions of the PyTorch Docker images from NVIDIA (https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) (1.7.0, 1.8.0 and newer ones) and the problem persists. As both the code and the Docker images are vanilla, I can't tell which side the bug is coming from.

    Update: Just as I was typing these lines, a classmate of mine told me that, after changing to another Docker image with PyTorch 1.6.1 and CUDA 11.0 (tag 20.06, the oldest image from NVIDIA that supports CUDA 11.0), this problem disappears mysteriously. I'm still posting this issue to tell future researchers: don't run this code on PyTorch > 1.6.1.

    good first issue 
    opened by Peter-1213 1
  • Supernet and process_potential.py Workflow Clarifications

    Hi,

    I just need to verify that I understand your workflow correctly, which may also make things clearer for future readers.

    1. In the first distillation part (sh dist_train.sh), the student supernet is defined as having a single cell per block/stage, because the default block_cfg in the supernet class has a fixed number of layers and the get_all_models function only enumerates the combinations of a block's fixed number of layers with the candidate operations. Thus, in order to search over multiple cells per stage, I need to do multiple separate runs of this search script, each time modifying the number of layers in block_cfg according to the desired width/depth, leading to multiple .yaml files generated per stage.

    2. In process_potential.py, the traversal search can work with multiple cells per block by extending the entries of the potential_yaml and layer_cfgs lists to account for the other .yaml files generated for different cells in each stage, as in this example: potential_yaml=[['$1st_cell$/potential-0.yaml', '$2nd_cell$/potential-0.yaml'], ['$1st_cell$/potential-1.yaml'], ..] and layer_cfgs=[[2,4,4,4,4,1],[3,4,4,4,4,1]]

    Thanks in advance.

    good first issue 
    opened by MohanadOdema 1
  • What do the numbers in process_potential.py mean?

    First of all, thank you for your code! I'm trying to apply it to my own work. However, I don't understand the meaning of the subtracted numbers in the following, such as 284976, 358960, 73368, 55412, 51540, etc. Can you explain them to me?

            stage3_max_param = stage4_max_param - 284976 * 4 + 358960
            stage2_max_param = stage3_max_param - 73368 * 3 + 55412
            stage1_max_param = stage2_max_param - 51540 * 3 + 18650
            stage0_max_param = stage1_max_param - 13770 * 3 + 6566
    

    These are lines 108~111 of process_potential.py. Thank you.

    opened by K-jihyeon 0
  • How to set mse_weight?

    Hi, thanks for your excellent work. As the title says, how should mse_weight be set? Are there any rules to follow? mse_weight = [0.0684, 0.171, 0.3422, 0.2395, 0.5474, 0.3422] (https://github.com/changlin31/DNA/blob/master/searching/dna/distill_train.py#L329)

    opened by sunnyxiaohu 0
Owner
Changlin Li