S2-BNN (Self-supervised Binary Neural Networks Using Distillation Loss)
This is the official PyTorch implementation of our paper:
S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-Bit Neural Networks via Guided Distribution Calibration (CVPR 2021),
by Zhiqiang Shen, Zechun Liu, Jie Qin, Lei Huang, Kwang-Ting Cheng and Marios Savvides.
In this paper, we introduce a simple yet effective self-supervised approach that uses a distillation loss to learn efficient binary neural networks. Our proposed method outperforms the contrastive learning baseline (MoCo V2) by an absolute gain of 5.5∼15% top-1 accuracy on ImageNet.
The student model is not restricted to binary neural networks; you can replace it with any efficient/compact model.
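At a high level, the distillation loss guides the binary student to match the real-valued teacher's similarity distribution over a shared set of keys (e.g., the MoCo memory queue). The snippet below is a minimal PyTorch sketch of that idea under this assumption; the function name and shapes are illustrative and not the exact code in main_moco.py.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_q, teacher_q, keys, temperature=0.2):
    """Soft-target distillation between similarity distributions (sketch).

    student_q, teacher_q: L2-normalized query embeddings, shape (N, C)
    keys: L2-normalized key embeddings (e.g., the memory queue), shape (K, C)
    """
    # Similarity logits of each query against all keys: shape (N, K).
    student_logits = student_q @ keys.t() / temperature
    teacher_logits = teacher_q @ keys.t() / temperature

    # The (detached) teacher distribution serves as the soft target.
    teacher_prob = F.softmax(teacher_logits.detach(), dim=1)
    student_log_prob = F.log_softmax(student_logits, dim=1)

    # Cross-entropy with soft targets (equivalent to KL up to a constant).
    return -(teacher_prob * student_log_prob).sum(dim=1).mean()
```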
Citation
If you find our code is helpful for your research, please cite:
@InProceedings{Shen_2021_CVPR,
author = {Shen, Zhiqiang and Liu, Zechun and Qin, Jie and Huang, Lei and Cheng, Kwang-Ting and Savvides, Marios},
title = {S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-Bit Neural Networks via Guided Distribution Calibration},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2021}
}
Preparation
1. Requirements:
- Python
- PyTorch
- Torchvision
2. Data:
- Download ImageNet dataset following https://github.com/pytorch/examples/tree/master/imagenet#requirements.
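The training scripts take an [imagenet-folder with train and val folders] argument, which assumes the standard torchvision ImageFolder layout used by the PyTorch ImageNet example. A quick sanity check of the layout (paths are placeholders):

```python
# Expected layout (standard torchvision ImageFolder format):
#   /path/to/imagenet/train/n01440764/*.JPEG
#   /path/to/imagenet/val/n01440764/*.JPEG
import torchvision.datasets as datasets
import torchvision.transforms as transforms

train_set = datasets.ImageFolder(
    "/path/to/imagenet/train",            # replace with your own path
    transform=transforms.ToTensor())
print(len(train_set), "training images in", len(train_set.classes), "classes")
```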
Training & Testing
To train a model, run the following scripts. All our models are trained with 8 GPUs.
1. Standard Two-Step Training:
Our enhanced MoCo V2:
Step 1:
cd Contrastive_only/step1
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48
Step 2:
cd Contrastive_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --model-path ../step1/checkpoint_0199.pth.tar
Our MoCo V2 + Distillation Loss:
Download the real-valued teacher network here. We use the MoCo V2 800-epoch pre-trained model, but you can choose other, stronger self-supervised models as the teacher.
Step 1:
cd Contrastive+Distillation/step1
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar
Step 2:
cd Contrastive+Distillation/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar --model-path ../step1/checkpoint_0199.pth.tar
Our Distillation Loss Only:
Step 1:
cd Distillation_only/step1
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar
Step 2:
cd Distillation_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar --model-path ../step1/checkpoint_0199.pth.tar
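In all two-step recipes above, --model-path points step 2 at the checkpoint produced by step 1 (checkpoint_0199.pth.tar after 200 epochs). The snippet below is only a sketch of how such a MoCo-style checkpoint can be restored; the actual loading logic lives in main_moco.py, and the "state_dict" key and the placeholder model are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Tiny placeholder standing in for the step-2 student built in main_moco.py,
# used here only so the snippet runs end to end.
model = nn.Linear(8, 8)

ckpt = torch.load("Contrastive_only/step1/checkpoint_0199.pth.tar",
                  map_location="cpu")
state_dict = ckpt["state_dict"]  # assumed key, following MoCo-style checkpoints
result = model.load_state_dict(state_dict, strict=False)
print(len(result.missing_keys), "missing /", len(result.unexpected_keys), "unexpected keys")
```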
2. Simple One-Step Training (Conventional):
Our enhanced MoCo V2:
cd Contrastive_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48
Our MoCo V2 + Distillation Loss:
cd Contrastive+Distillation/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar
Our Distillation Loss Only:
cd Distillation_only/step2
python main_moco.py --lr 0.0003 --batch-size 256 --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders] --mlp --moco-t 0.2 --aug-plus --cos -j 48 --wd 0 --teacher-path ../../moco_v2_800ep_pretrain.pth.tar
For one-step training, you can replace the binary neural network with any kind of efficient/compact model, as sketched below.
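Assuming the codebase keeps the original MoCo builder interface (moco.builder.MoCo(base_encoder, dim, K, m, T, mlp)), swapping in another compact student could look like the following; the exact constructor in this repo may differ, so treat this as a hypothetical sketch.

```python
import torchvision.models as models
import moco.builder  # from the MoCo-style codebase this repo builds on

# ResNet-18 as an example compact student; any torchvision-style constructor
# exposing a .fc head should work with the MLP projection head.
model = moco.builder.MoCo(
    models.resnet18,
    dim=128, K=65536, m=0.999, T=0.2, mlp=True)
```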
3. Testing:
- To linearly evaluate a model, run the following script:
python main_lincls.py --lr 0.1 -j 24 --batch-size 256 --pretrained /home/szq/projects/s2bnn/checkpoint_0199.pth.tar --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 [imagenet-folder with train and val folders]
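Linear evaluation follows the standard MoCo protocol: the pre-trained encoder is frozen and only a fresh linear classifier is trained on top. A minimal sketch of that idea (ResNet-18 stands in for the binary student; main_lincls.py handles this internally):

```python
import torch.nn as nn
import torchvision.models as models

encoder = models.resnet18(num_classes=1000)  # placeholder for the pretrained student
for name, param in encoder.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False          # freeze everything except the head
encoder.fc = nn.Linear(encoder.fc.in_features, 1000)  # fresh, trainable linear head
```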
Results & Models
We provide pre-trained models for the different training strategies. The table reports the number of training epochs, FLOPs, OPs, and top-1 accuracy on the ImageNet validation set:
Models | #Epochs | FLOPs (×10^8) | OPs (×10^8) | Top-1 (%) | Trained models |
---|---|---|---|---|---|
MoCo V2 baseline | 200 | 0.12 | 0.87 | 46.9 | Download |
Our enhanced MoCo V2 | 200 | 0.12 | 0.87 | 52.5 | Download |
Our MoCo V2 + Distillation Loss | 200 | 0.12 | 0.87 | 56.0 | Download |
Our Distillation Loss Only | 200 | 0.12 | 0.87 | 61.5 | Download |
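The OPs column appears to follow the common binary-network convention OPs = BOPs/64 + FLOPs (as used by ReActNet): roughly 0.75×10^8 binary-op equivalents plus the 0.12×10^8 real-valued FLOPs give the 0.87×10^8 OPs reported above.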
Training Logs
Our linear evaluation logs are available here.
Acknowledgement
MoCo V2 (Improved Baselines with Momentum Contrastive Learning)
ReActNet (ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions)
MEAL V2 (MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks)
Contact
Zhiqiang Shen, CMU (zhiqiangshen0214 at gmail.com)