AlphaNet: Improved Training of Supernet with Alpha-Divergence

Overview

This repository contains our PyTorch training code, evaluation code and pretrained models for AlphaNet.

Our implementation is largely based on AttentiveNAS. To reproduce our results, please first download the AttentiveNAS repo, then use our train_alphanet.py for training and test_alphanet.py for testing.

For more details, please see AlphaNet: Improved Training of Supernet with Alpha-Divergence by Dilin Wang, Chengyue Gong, Meng Li, Qiang Liu, Vikas Chandra.

If you find this repo useful in your research, please consider citing our work and AttentiveNAS:

@article{wang2021alphanet,
  title={AlphaNet: Improved Training of Supernet with Alpha-Divergence},
  author={Wang, Dilin and Gong, Chengyue and Li, Meng and Liu, Qiang and Chandra, Vikas},
  journal={arXiv preprint arXiv:2102.07954},
  year={2021}
}

@article{wang2020attentivenas,
  title={AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling},
  author={Wang, Dilin and Li, Meng and Gong, Chengyue and Chandra, Vikas},
  journal={arXiv preprint arXiv:2011.09011},
  year={2020}
}

Evaluation

To reproduce our results:

  • Please first download our pretrained AlphaNet models from the provided Google Drive path and put them under your local folder ./alphanet_data

  • To evaluate our pre-trained AlphaNet models, from AlphaNet-A0 to A6, on ImageNet with a single GPU, please run:

    python test_alphanet.py --config-file ./configs/eval_alphanet_models.yml --model a[0-6]

    Expected results:

    Name                 MFLOPs   Top-1 (%)
    AlphaNet-A0          203      77.87
    AlphaNet-A1          279      78.94
    AlphaNet-A2          317      79.20
    AlphaNet-A3          357      79.41
    AlphaNet-A4          444      80.01
    AlphaNet-A5 (small)  491      80.29
    AlphaNet-A5 (base)   596      80.62
    AlphaNet-A6          709      80.78
  • Additionally, here is our pretrained supernet trained with KL-based inplace-KD, and here is our pretrained supernet trained without inplace-KD. A minimal sketch of the alpha-divergence objective that replaces the KL term follows this list.
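
For reference, the sketch below illustrates the alpha-divergence objective used for inplace-KD. It is a minimal, unclipped illustration only: the alpha values shown are illustrative, and the repo's actual AdaptiveLossSoft loss additionally clamps the importance ratios and special-cases the KL limits (alpha near 0 and 1).

import torch
import torch.nn.functional as F

def alpha_divergence(p_logits, q_logits, alpha, eps=1e-6):
    # Per-sample D_alpha(p || q) between teacher p and student q:
    #   D_alpha = (sum_i p_i^alpha * q_i^(1 - alpha) - 1) / (alpha * (alpha - 1))
    # Valid for alpha outside {0, 1}; those values recover the KL
    # divergences as limits and need special-casing.
    assert alpha not in (0.0, 1.0)
    p = F.softmax(p_logits.detach(), dim=-1)  # teacher soft targets (no grad)
    q = F.softmax(q_logits, dim=-1)           # student probabilities
    inner = (p.clamp_min(eps) ** alpha * q.clamp_min(eps) ** (1.0 - alpha)).sum(dim=-1)
    return (inner - 1.0) / (alpha * (alpha - 1.0))

def adaptive_alpha_loss(p_logits, q_logits, alpha_min=-1.0, alpha_max=0.5):
    # Adaptive variant: evaluate the divergence at both ends of the alpha
    # range and keep the larger one, so the student is penalized for both
    # over- and under-estimating the teacher's uncertainty.
    d = torch.stack([alpha_divergence(p_logits, q_logits, alpha_min),
                     alpha_divergence(p_logits, q_logits, alpha_max)])
    return d.max(dim=0).values.mean()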

Training

To train our AlphaNet models from scratch, please run:

python train_alphanet.py --config-file configs/train_alphanet_models.yml --machine-rank ${machine_rank} --num-machines ${num_machines} --dist-url ${dist_url}

We adopt SGD training on 64 GPUs; the mini-batch size is 32 per GPU (2048 in total). All training hyper-parameters are specified in train_alphanet_models.yml. One training iteration is sketched below.
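
For orientation, one training iteration follows the sandwich-rule scheme inherited from AttentiveNAS: the largest subnetwork is trained on the ground-truth labels and serves as an inplace teacher for the smallest and a few randomly sampled subnetworks. Below is a minimal single-GPU sketch using the adaptive_alpha_loss sketch above; the subnet-sampling helper names follow the AttentiveNAS supernet API, but treat the exact signatures as assumptions and see train_alphanet.py for the real loop (distributed training, gradient clipping, etc.).

import torch.nn.functional as F

def train_step(supernet, images, labels, optimizer, num_random_subnets=2):
    # One sandwich-rule step with alpha-divergence inplace-KD (sketch).
    optimizer.zero_grad()

    # 1) Largest subnetwork: supervised by the labels, acts as the teacher.
    supernet.sample_max_subnet()
    teacher_logits = supernet(images)
    F.cross_entropy(teacher_logits, labels).backward()
    soft_targets = teacher_logits.detach()

    # 2) Smallest and randomly sampled subnetworks: distilled from the teacher.
    for i in range(1 + num_random_subnets):
        if i == 0:
            supernet.sample_min_subnet()
        else:
            supernet.sample_active_subnet()  # random architecture
        student_logits = supernet(images)
        adaptive_alpha_loss(soft_targets, student_logits).backward()

    optimizer.step()  # apply gradients accumulated over all sampled subnets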

Evolutionary search

In case you want to search for models of your own interest, we provide an example showing how to search for Pareto models with the best FLOPs vs. accuracy trade-offs in parallel_supernet_evo_search.py. To run this example:

python parallel_supernet_evo_search.py --config-file configs/parallel_supernet_evo_search.yml 
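
Internally, the search maintains a population of subnetwork architectures, scores each one for FLOPs and accuracy, and repeatedly mutates the current Pareto-best candidates. The toy sketch below shows only the selection logic; sample_architecture, mutate, and evaluate are hypothetical helpers standing in for the supernet-specific code (the real script additionally parallelizes evaluation across GPUs and calibrates BN statistics before scoring).

import random

def evo_search(sample_architecture, mutate, evaluate,
               population_size=64, generations=20):
    # Toy evolutionary search for FLOPs/accuracy Pareto candidates (sketch).
    # evaluate(arch) is assumed to return (mflops, top1) for a subnetwork.
    population = [sample_architecture() for _ in range(population_size)]
    scored = [(arch, *evaluate(arch)) for arch in population]

    for _ in range(generations):
        # Keep the Pareto front: a candidate survives if no other candidate
        # is at most as expensive and strictly more accurate.
        front = [c for c in scored
                 if not any(o[1] <= c[1] and o[2] > c[2] for o in scored)]
        # Refill the population by mutating Pareto parents.
        children = [mutate(random.choice(front)[0])
                    for _ in range(population_size - len(front))]
        scored = front + [(arch, *evaluate(arch)) for arch in children]

    return sorted(scored, key=lambda c: c[1])  # sorted by MFLOPs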

License

AlphaNet is licensed under CC-BY-NC.

Contributing

We actively welcome your pull requests! Please see CONTRIBUTING and CODE_OF_CONDUCT for more info.

Comments
  • How to modify the loss function to apply to multi-label classification tasks

    Hello, I applied it to multi-label classification tasks, but as the following log shows, the loss does not converge. How should I modify the loss function?

    Epoch: [0][4000/7176] Time 2.197 ( 2.047) Data 0.000 ( 0.010) Loss -1.7713e+00 (-6.9243e-01) Acc@1 0.13 ( 0.13) Acc@5 0.13 ( 0.13)
    Epoch: [0][4050/7176] Time 1.978 ( 2.047) Data 0.000 ( 0.010) Loss -2.5454e+00 (-7.1199e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
    Epoch: [0][4100/7176] Time 2.234 ( 2.048) Data 0.000 ( 0.010) Loss -3.2569e+00 (-7.2793e-01) Acc@1 0.17 ( 0.13) Acc@5 0.17 ( 0.13)
    Epoch: [0][4150/7176] Time 1.940 ( 2.048) Data 0.000 ( 0.010) Loss -1.6426e+00 (-7.5071e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
    Epoch: [0][4200/7176] Time 2.101 ( 2.047) Data 0.000 ( 0.010) Loss -2.9430e+00 (-7.7234e-01) Acc@1 0.13 ( 0.13) Acc@5 0.13 ( 0.13)
    Epoch: [0][4250/7176] Time 2.056 ( 2.047) Data 0.000 ( 0.009) Loss -2.6333e+00 (-7.9679e-01) Acc@1 0.20 ( 0.13) Acc@5 0.20 ( 0.13)
    Epoch: [0][4300/7176] Time 2.002 ( 2.047) Data 0.000 ( 0.009) Loss -2.0716e+00 (-8.1554e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
    Epoch: [0][4350/7176] Time 1.919 ( 2.048) Data 0.000 ( 0.009) Loss -1.1843e+00 (-8.3195e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
    Epoch: [0][4400/7176] Time 1.972 ( 2.047) Data 0.001 ( 0.009) Loss -2.2746e+00 (-8.4905e-01) Acc@1 0.20 ( 0.13) Acc@5 0.20 ( 0.13)
    Epoch: [0][4450/7176] Time 2.151 ( 2.047) Data 0.000 ( 0.009) Loss -2.3914e+00 (-8.6940e-01) Acc@1 0.19 ( 0.13) Acc@5 0.19 ( 0.13)
    Epoch: [0][4500/7176] Time 1.873 ( 2.047) Data 0.000 ( 0.009) Loss -2.9244e+00 (-8.8694e-01) Acc@1 0.12 ( 0.13) Acc@5 0.12 ( 0.13)
    Epoch: [0][4550/7176] Time 2.112 ( 2.047) Data 0.000 ( 0.009) Loss -2.9324e+00 (-9.0775e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
    Epoch: [0][4600/7176] Time 1.902 ( 2.047) Data 0.000 ( 0.009) Loss -2.0961e+00 (-9.2422e-01) Acc@1 0.13 ( 0.13) Acc@5 0.13 ( 0.13)
    Epoch: [0][4650/7176] Time 1.752 ( 2.047) Data 0.000 ( 0.009) Loss -1.9184e+00 (-9.3473e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
    Epoch: [0][4700/7176] Time 1.967 ( 2.047) Data 0.000 ( 0.009) Loss -6.9974e-01 (-9.4724e-01) Acc@1 0.15 ( 0.13) Acc@5 0.15 ( 0.13)
    Epoch: [0][4750/7176] Time 1.994 ( 2.046) Data 0.000 ( 0.009) Loss -3.3847e+00 (-9.6442e-01) Acc@1 0.15 ( 0.13) Acc@5 0.15 ( 0.13)
    Epoch: [0][4800/7176] Time 2.085 ( 2.047) Data 0.000 ( 0.008) Loss -2.5070e+00 (-9.8096e-01) Acc@1 0.15 ( 0.13) Acc@5 0.15 ( 0.13)
    Epoch: [0][4850/7176] Time 1.899 ( 2.047) Data 0.000 ( 0.008) Loss -2.2142e+00 (-9.9467e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
    Epoch: [0][4900/7176] Time 2.096 ( 2.046) Data 0.000 ( 0.008) Loss -3.2781e+00 (-1.0132e+00) Acc@1 0.19 ( 0.13) Acc@5 0.19 ( 0.13)
    Epoch: [0][4950/7176] Time 1.782 ( 2.046) Data 0.000 ( 0.008) Loss -2.8195e+00 (-1.0289e+00) Acc@1 0.17 ( 0.13) Acc@5 0.17 ( 0.13)
    Epoch: [0][5000/7176] Time 1.996 ( 2.046) Data 0.000 ( 0.008) Loss -3.1988e+00 (-1.0456e+00) Acc@1 0.18 ( 0.13) Acc@5 0.18 ( 0.13)
    Epoch: [0][5050/7176] Time 2.248 ( 2.046) Data 0.000 ( 0.008) Loss -1.9582e+00 (-1.0614e+00) Acc@1 0.13 ( 0.13) Acc@5 0.13 ( 0.13)
    Epoch: [0][5100/7176] Time 2.029 ( 2.046) Data 0.000 ( 0.008) Loss -2.2378e+00 (-1.0686e+00) Acc@1 0.18 ( 0.13) Acc@5 0.18 ( 0.13)
    Epoch: [0][5150/7176] Time 1.943 ( 2.046) Data 0.000 ( 0.008) Loss -2.6617e+00 (-1.0803e+00) Acc@1 0.15 ( 0.13) Acc@5 0.15 ( 0.13)
    Epoch: [0][5200/7176] Time 2.050 ( 2.046) Data 0.000 ( 0.008) Loss -2.4670e+00 (-1.0952e+00) Acc@1 0.17 ( 0.13) Acc@5 0.17 ( 0.13)
    Epoch: [0][5250/7176] Time 2.051 ( 2.046) Data 0.000 ( 0.008) Loss -2.1453e+00 (-1.1098e+00) Acc@1 0.20 ( 0.13) Acc@5 0.20 ( 0.13)
    Epoch: [0][5300/7176] Time 2.064 ( 2.045) Data 0.000 ( 0.008) Loss -2.2843e+00 (-1.1214e+00) Acc@1 0.18 ( 0.13) Acc@5 0.18 ( 0.13)
    Epoch: [0][5350/7176] Time 2.232 ( 2.045) Data 0.000 ( 0.008) Loss -2.4472e+00 (-1.1356e+00) Acc@1 0.16 ( 0.13) Acc@5 0.16 ( 0.13)
    Epoch: [0][5400/7176] Time 1.980 ( 2.045) Data 0.000 ( 0.008) Loss -2.9187e+00 (-1.1485e+00) Acc@1 0.18 ( 0.14) Acc@5 0.18 ( 0.14)
    Epoch: [0][5450/7176] Time 2.144 ( 2.046) Data 0.000 ( 0.007) Loss -2.7685e+00 (-1.1622e+00) Acc@1 0.16 ( 0.14) Acc@5 0.16 ( 0.14)
    Epoch: [0][5500/7176] Time 1.839 ( 2.045) Data 0.000 ( 0.007) Loss -2.7240e+00 (-1.1766e+00) Acc@1 0.12 ( 0.14) Acc@5 0.12 ( 0.14)
    Epoch: [0][5550/7176] Time 1.953 ( 2.046) Data 0.000 ( 0.007) Loss -2.1483e+00 (-1.1920e+00) Acc@1 0.15 ( 0.14) Acc@5 0.15 ( 0.14)
    Epoch: [0][5600/7176] Time 1.842 ( 2.045) Data 0.000 ( 0.007) Loss -1.6526e+00 (-1.1999e+00) Acc@1 0.17 ( 0.14) Acc@5 0.17 ( 0.14)

    opened by Xiangyu-Han 6
  • Training accuracy suddenly approaches zero

    Hi, I ran into a problem during training: the accuracy suddenly approaches zero (as if the network parameters had been re-initialized). The log is as follows:

    Epoch: [48][ 70/625] Time 1.383 ( 1.428) Data 0.000 ( 0.152) Loss -1.8897e+00 (-2.0853e+00) Acc@1 56.54 ( 60.11) Acc@5 79.59 ( 81.28)
    Epoch: [48][ 80/625] Time 1.346 ( 1.401) Data 0.000 ( 0.133) Loss -2.3125e+00 (-2.0805e+00) Acc@1 57.71 ( 60.05) Acc@5 79.44 ( 81.26)
    Epoch: [48][ 90/625] Time 1.224 ( 1.386) Data 0.000 ( 0.119) Loss -2.1656e+00 (-2.0629e+00) Acc@1 58.15 ( 59.98) Acc@5 80.13 ( 81.25)
    Epoch: [48][100/625] Time 1.474 ( 1.375) Data 0.000 ( 0.107) Loss -2.3581e+00 (-2.0779e+00) Acc@1 58.06 ( 59.92) Acc@5 78.66 ( 81.22)
    Epoch: [48][110/625] Time 1.215 ( 1.367) Data 0.000 ( 0.097) Loss -4.3465e+00 (-2.1967e+00) Acc@1 0.10 ( 56.28) Acc@5 0.59 ( 76.59)
    Epoch: [48][120/625] Time 1.111 ( 1.357) Data 0.000 ( 0.089) Loss -4.5741e+00 (-2.3879e+00) Acc@1 0.15 ( 51.64) Acc@5 0.39 ( 70.31)
    Epoch: [48][130/625] Time 1.152 ( 1.350) Data 0.000 ( 0.083) Loss -4.5668e+00 (-2.5548e+00) Acc@1 0.05 ( 47.71) Acc@5 0.44 ( 64.99)
    Epoch: [48][140/625] Time 1.232 ( 1.346) Data 0.000 ( 0.077) Loss -4.5077e+00 (-2.6952e+00) Acc@1 0.29 ( 44.34) Acc@5 0.88 ( 60.44)
    Epoch: [48][150/625] Time 1.219 ( 1.341) Data 0.000 ( 0.072) Loss -4.4784e+00 (-2.8117e+00) Acc@1 0.34 ( 41.41) Acc@5 1.12 ( 56.49)
    Epoch: [48][160/625] Time 1.274 ( 1.338) Data 0.000 ( 0.067) Loss -4.2812e+00 (-2.9086e+00) Acc@1 0.34 ( 38.86) Acc@5 0.93 ( 53.04)

    What's more, the training loss is always negative; is that expected?

    opened by liwei9719 6
  • How can I preserve the search architecture?

    Hi, when testing, I found that it uses the default architectures (a0~a6) you provided in eval_alphanet_models.yml. How can I preserve my own architecture from the search stage?

    opened by howardgriffin 6
  • The AdaptiveLossSoft becomes NaN

    Hi, when I use my own dataset and train with knowledge distillation using AdaptiveLossSoft, the loss gradually becomes NaN, and Acc@1 first increases and then decreases (Acc@5 is set the same as Acc@1 in my code, so just ignore it). Any suggestions?

    Epoch: [0][ 0/7176] Time 71.193 (71.193) Data 66.870 (66.870) Loss 2.8784e-01 (2.8784e-01) Acc@1 0.09 ( 0.09) Acc@5 0.09 ( 0.09)
    Epoch: [0][ 100/7176] Time 1.870 ( 2.728) Data 0.000 ( 0.662) Loss 4.6225e-02 (-7.6008e-02) Acc@1 0.08 ( 0.11) Acc@5 0.08 ( 0.11)
    Epoch: [0][ 200/7176] Time 1.931 ( 2.363) Data 0.000 ( 0.333) Loss 4.9116e-01 (7.5377e-02) Acc@1 0.09 ( 0.11) Acc@5 0.09 ( 0.11)
    Epoch: [0][ 300/7176] Time 1.860 ( 2.250) Data 0.000 ( 0.222) Loss 2.0824e-01 (2.0263e-01) Acc@1 0.10 ( 0.11) Acc@5 0.10 ( 0.11)
    Epoch: [0][ 400/7176] Time 2.271 ( 2.200) Data 0.000 ( 0.167) Loss 8.5119e-01 (3.0746e-01) Acc@1 0.12 ( 0.11) Acc@5 0.12 ( 0.11)
    Epoch: [0][ 500/7176] Time 2.325 ( 2.173) Data 0.000 ( 0.134) Loss 1.6488e+00 (4.4695e-01) Acc@1 0.10 ( 0.11) Acc@5 0.10 ( 0.11)
    Epoch: [0][ 600/7176] Time 2.029 ( 2.150) Data 0.000 ( 0.111) Loss 1.2261e+00 (6.1032e-01) Acc@1 0.13 ( 0.11) Acc@5 0.13 ( 0.11)
    Epoch: [0][ 700/7176] Time 1.945 ( 2.134) Data 0.000 ( 0.096) Loss -3.8604e-01 (6.4971e-01) Acc@1 0.12 ( 0.11) Acc@5 0.12 ( 0.11)
    Epoch: [0][ 800/7176] Time 1.981 ( 2.121) Data 0.000 ( 0.084) Loss -1.3327e-02 (5.4336e-01) Acc@1 0.15 ( 0.12) Acc@5 0.15 ( 0.12)
    Epoch: [0][ 900/7176] Time 2.034 ( 2.108) Data 0.000 ( 0.074) Loss 1.0979e+00 (5.1461e-01) Acc@1 0.13 ( 0.12) Acc@5 0.13 ( 0.12)
    Epoch: [0][1000/7176] Time 1.975 ( 2.101) Data 0.000 ( 0.067) Loss 1.6591e-02 (4.1266e-01) Acc@1 0.18 ( 0.12) Acc@5 0.18 ( 0.12)
    Epoch: [0][1100/7176] Time 1.715 ( 2.095) Data 0.000 ( 0.061) Loss -1.0021e+00 (3.3724e-01) Acc@1 0.13 ( 0.12) Acc@5 0.13 ( 0.12)
    Epoch: [0][1200/7176] Time 1.852 ( 2.088) Data 0.000 ( 0.056) Loss -3.7594e-01 (2.9739e-01) Acc@1 0.13 ( 0.12) Acc@5 0.13 ( 0.12)
    Epoch: [0][1300/7176] Time 1.892 ( 2.082) Data 0.000 ( 0.052) Loss -7.8806e-02 (2.5089e-01) Acc@1 0.12 ( 0.12) Acc@5 0.12 ( 0.12)
    Epoch: [0][1400/7176] Time 1.956 ( 2.078) Data 0.000 ( 0.048) Loss 8.6050e-02 (2.3144e-01) Acc@1 0.19 ( 0.13) Acc@5 0.19 ( 0.13)
    Epoch: [0][1500/7176] Time 2.031 ( 2.074) Data 0.000 ( 0.045) Loss -1.8159e-01 (2.2123e-01) Acc@1 0.16 ( 0.13) Acc@5 0.16 ( 0.13)
    Epoch: [0][1600/7176] Time 2.118 ( 2.072) Data 0.000 ( 0.042) Loss 3.8409e-01 (2.1557e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
    Epoch: [0][1700/7176] Time 2.163 ( 2.069) Data 0.000 ( 0.039) Loss 3.2751e-01 (2.1508e-01) Acc@1 0.15 ( 0.13) Acc@5 0.15 ( 0.13)
    Epoch: [0][1800/7176] Time 2.166 ( 2.068) Data 0.000 ( 0.037) Loss -3.0104e-01 (2.1683e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
    Epoch: [0][1900/7176] Time 1.822 ( 2.066) Data 0.000 ( 0.035) Loss 3.6041e-01 (2.1936e-01) Acc@1 0.14 ( 0.13) Acc@5 0.14 ( 0.13)
    Epoch: [0][2000/7176] Time 1.888 ( 2.065) Data 0.000 ( 0.034) Loss 6.0852e-02 (2.2056e-01) Acc@1 0.17 ( 0.14) Acc@5 0.17 ( 0.14)
    Epoch: [0][2100/7176] Time 1.928 ( 2.064) Data 0.000 ( 0.032) Loss 7.0139e-01 (2.2213e-01) Acc@1 0.20 ( 0.14) Acc@5 0.20 ( 0.14)
    Epoch: [0][2200/7176] Time 2.212 ( 2.061) Data 0.000 ( 0.031) Loss 2.7252e-01 (2.1953e-01) Acc@1 0.21 ( 0.14) Acc@5 0.21 ( 0.14)
    Epoch: [0][2300/7176] Time 1.816 ( 2.060) Data 0.000 ( 0.029) Loss -1.5090e-01 (2.2140e-01) Acc@1 0.15 ( 0.14) Acc@5 0.15 ( 0.14)
    Epoch: [0][2400/7176] Time 1.929 ( 2.059) Data 0.000 ( 0.028) Loss 4.2306e-01 (2.1328e-01) Acc@1 0.15 ( 0.15) Acc@5 0.15 ( 0.15)
    Epoch: [0][2500/7176] Time 1.886 ( 2.057) Data 0.000 ( 0.027) Loss 2.7449e-01 (1.9290e-01) Acc@1 0.17 ( 0.15) Acc@5 0.17 ( 0.15)
    Epoch: [0][2600/7176] Time 1.813 ( 2.056) Data 0.000 ( 0.026) Loss 5.1589e-02 (2.1373e-01) Acc@1 0.16 ( 0.15) Acc@5 0.16 ( 0.15)
    Epoch: [0][2700/7176] Time 2.145 ( 2.055) Data 0.000 ( 0.025) Loss -6.0235e-01 (1.9399e-01) Acc@1 0.19 ( 0.15) Acc@5 0.19 ( 0.15)
    Epoch: [0][2800/7176] Time 1.944 ( 2.054) Data 0.000 ( 0.024) Loss 7.8085e-02 (1.7437e-01) Acc@1 0.15 ( 0.15) Acc@5 0.15 ( 0.15)
    Epoch: [0][2900/7176] Time 1.778 ( 2.053) Data 0.000 ( 0.023) Loss -1.6850e-03 (1.6211e-01) Acc@1 0.13 ( 0.15) Acc@5 0.13 ( 0.15)
    Epoch: [0][3000/7176] Time 1.767 ( 2.052) Data 0.000 ( 0.022) Loss nan (nan) Acc@1 0.00 ( 0.15) Acc@5 0.00 ( 0.15)
    Epoch: [0][3100/7176] Time 2.064 ( 2.050) Data 0.000 ( 0.022) Loss nan (nan) Acc@1 0.00 ( 0.14) Acc@5 0.00 ( 0.14)
    Epoch: [0][3200/7176] Time 2.222 ( 2.048) Data 0.000 ( 0.021) Loss nan (nan) Acc@1 0.00 ( 0.14) Acc@5 0.00 ( 0.14)
    Epoch: [0][3300/7176] Time 2.206 ( 2.046) Data 0.000 ( 0.020) Loss nan (nan) Acc@1 0.00 ( 0.13) Acc@5 0.00 ( 0.13)
    Epoch: [0][3400/7176] Time 1.906 ( 2.044) Data 0.000 ( 0.020) Loss nan (nan) Acc@1 0.00 ( 0.13) Acc@5 0.00 ( 0.13)
    Epoch: [0][3500/7176] Time 2.058 ( 2.042) Data 0.000 ( 0.019) Loss nan (nan) Acc@1 0.00 ( 0.13) Acc@5 0.00 ( 0.13)
    Epoch: [0][3600/7176] Time 1.912 ( 2.040) Data 0.000 ( 0.019) Loss nan (nan) Acc@1 0.00 ( 0.12) Acc@5 0.00 ( 0.12)
    Epoch: [0][3700/7176] Time 2.006 ( 2.038) Data 0.000 ( 0.018) Loss nan (nan) Acc@1 0.00 ( 0.12) Acc@5 0.00 ( 0.12)
    Epoch: [0][3800/7176] Time 1.990 ( 2.036) Data 0.000 ( 0.018) Loss nan (nan) Acc@1 0.00 ( 0.12) Acc@5 0.00 ( 0.12)
    Epoch: [0][3900/7176] Time 2.073 ( 2.035) Data 0.000 ( 0.017) Loss nan (nan) Acc@1 0.00 ( 0.11) Acc@5 0.00 ( 0.11)
    Epoch: [0][4000/7176] Time 2.152 ( 2.033) Data 0.000 ( 0.017) Loss nan (nan) Acc@1 0.00 ( 0.11) Acc@5 0.00 ( 0.11)
    Epoch: [0][4100/7176] Time 2.183 ( 2.033) Data 0.000 ( 0.016) Loss nan (nan) Acc@1 0.00 ( 0.11) Acc@5 0.00 ( 0.11)
    Epoch: [0][4200/7176] Time 2.054 ( 2.031) Data 0.000 ( 0.016) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
    Epoch: [0][4300/7176] Time 1.870 ( 2.030) Data 0.000 ( 0.016) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
    Epoch: [0][4400/7176] Time 1.923 ( 2.029) Data 0.000 ( 0.015) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
    Epoch: [0][4500/7176] Time 1.891 ( 2.028) Data 0.000 ( 0.015) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
    Epoch: [0][4600/7176] Time 1.866 ( 2.027) Data 0.000 ( 0.015) Loss nan (nan) Acc@1 0.00 ( 0.10) Acc@5 0.00 ( 0.10)
    Epoch: [0][4700/7176] Time 1.887 ( 2.026) Data 0.000 ( 0.014) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
    Epoch: [0][4800/7176] Time 2.037 ( 2.025) Data 0.000 ( 0.014) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
    Epoch: [0][4900/7176] Time 2.019 ( 2.024) Data 0.000 ( 0.014) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
    Epoch: [0][5000/7176] Time 1.936 ( 2.023) Data 0.000 ( 0.014) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)
    Epoch: [0][5100/7176] Time 1.972 ( 2.022) Data 0.000 ( 0.013) Loss nan (nan) Acc@1 0.00 ( 0.09) Acc@5 0.00 ( 0.09)

    opened by howardgriffin 4
  • How were the final architectures selected?

    Hello,

    I like your work, but I'm a bit confused about how final models a0-a6 were selected. In the paper, in section 4.2, subsection "Evaluation" you describe an evolutionary search procedure. However, in the subsection "Improvements on SOTA" you write that you choose a0-a6 architectures to be the same as in the AttentiveNAS paper. Do I understand correctly that the results of the evolutionary search were not used when selecting the final models?

    Thanks in advance!

    opened by AwesomeLemon 4
  • The problem of increasing memory usage and learning rate

    Hello! Thank you for your excellent work! I had a problem with increasing memory usage (not GPU memory) while training the supernet. I checked and found that it was caused by lines 62 through 68 of https://github.com/facebookresearch/AttentiveNAS/blob/main/evaluate/attentive_nas_eval.py. If I delete this code, the memory usage stays flat. Have you encountered such problems?

    opened by liujiawei2333 3
  • Why use the training dataset in the test stage?

    Hi, I checked test_alphanet.py and found that it uses the training dataset when testing. From your comment 'bn running stats calibration following Slimmable', does that mean calculating the training data's mean and variance for the BN layers? Thank you in advance! (For context, a sketch of BN calibration follows this item.)

    opened by howardgriffin 3
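
    For context, BN calibration re-estimates each BatchNorm layer's running mean and variance for the currently active subnetwork by forwarding a few training batches without updating any weights; subnetworks share weights, so the stats recorded during supernet training do not match any single subnetwork. A minimal sketch of the technique (not the repo's exact code):

    import torch

    @torch.no_grad()
    def calibrate_bn(model, loader, num_batches=64):
        # Reset BN running stats, then re-estimate them for the active subnet.
        for m in model.modules():
            if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
                m.reset_running_stats()
                m.momentum = None  # None = cumulative moving average
        model.train()  # BN updates running stats only in train mode
        for i, (images, _) in enumerate(loader):
            if i >= num_batches:
                break
            model(images)
        model.eval()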
  • There are some files missing, or I can't find them

    I ran test_alphanet.py with python test_alphanet.py --config-file ./configs/eval_alphanet_models.yml --model a[0-6], and it reports an error: ModuleNotFoundError: No module named 'models'.

    I looked through the files but can't find models or some of the other modules shown in the screenshot.

    Thanks. PS: I'm a rookie, so I may be making low-level mistakes.

    opened by yangyang90 1
  • Is AlphaNet a0 ~ a6 exactly the same as the a0 ~ a6 in AttentiveNAS?

    Hello there,

    I found a very interesting setup in the AlphaNet/AttentiveNAS codebases:

    From these configuration files: https://github.com/facebookresearch/AttentiveNAS/blob/master/configs/eval_attentive_nas_models.yml https://github.com/facebookresearch/AlphaNet/blob/master/configs/eval_alphanet_models.yml

    These networks look exactly the same, am I right? Thanks.

    opened by linnanwang 1
Owner
Facebook Research