How Do Adam and Training Strategies Help BNNs Optimization? (ICML 2021)

Overview

AdamBNN

This is the PyTorch implementation of our paper "How Do Adam and Training Strategies Help BNNs Optimization?", published in ICML 2021.

In this work, we explore the intrinsic reasons why Adam is superior to other optimizers such as SGD for BNN optimization, and we provide analytical explanations that support specific training strategies. By visualizing the optimization trajectory, we show that BNN optimization takes place in an extremely rugged loss landscape, and that the second-order momentum in Adam is crucial for revitalizing weights that become "dead" due to activation saturation. Based on this analysis, we derive a specific training scheme that reaches 70.5% top-1 accuracy on ImageNet with the same architecture as ReActNet, a 1.1% improvement.
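
To make the "second-order momentum" point concrete, below is a minimal sketch (illustrative only, not code from this repository) contrasting an SGD step with an Adam step on a latent weight whose gradient is tiny because its binary activation is saturated. Adam divides by the square root of the second-order momentum v_t, so even a tiny but consistent gradient still produces an update on the order of the learning rate, which is what revitalizes "dead" weights; plain SGD barely moves them.

import torch

def sgd_step(w, grad, lr=0.1):
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # First- and second-order momentum with bias correction, as in standard Adam.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (v_hat.sqrt() + eps), m, v

w_sgd = torch.tensor(1.0)
w_adam = torch.tensor(1.0)
m = v = torch.tensor(0.0)
tiny_grad = torch.tensor(1e-4)  # gradient suppressed by activation saturation

for t in range(1, 101):
    w_sgd = sgd_step(w_sgd, tiny_grad)
    w_adam, m, v = adam_step(w_adam, tiny_grad, m, v, t)

print(f"SGD after 100 steps:  {w_sgd.item():.4f}")   # ~0.999, essentially stuck
print(f"Adam after 100 steps: {w_adam.item():.4f}")  # moves by roughly lr per step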

Citation

If you find our code useful for your research, please consider citing:

@inproceedings{liu2021how,
  title = {How Do {Adam} and Training Strategies Help {BNNs} Optimization?},
  author = {Liu, Zechun and Shen, Zhiqiang and Li, Shichao and Helwegen, Koen and Huang, Dong and Cheng, Kwang-Ting},
  booktitle = {International Conference on Machine Learning},
  year = {2021},
  organization = {PMLR}
}

Run

1. Requirements:

  • Python 3, PyTorch 1.7.1, torchvision 0.8.2

2. Data:

  • Download the ImageNet dataset (a loading sketch follows below)
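
The training scripts typically expect ImageNet in the standard torchvision ImageFolder layout (train/ and val/ directories with one subfolder per class). A minimal loading sketch; the path and hyperparameters are placeholders, not values prescribed by this repo:

import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Placeholder path; point this at your local ImageNet train split.
train_set = datasets.ImageFolder(
    "/path/to/imagenet/train",
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ]),
)
train_loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=8)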

3. Steps to run:

(1) Step 1: binarizing activations

  • Change directory to ./step1/
  • Run bash run.sh

(2) Step 2: binarizing weights + activations

  • Change directory to ./step2/
  • Run bash run.sh (a conceptual sketch of what the two binarization steps do follows below)
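
For orientation, here is a hedged, self-contained sketch of sign binarization with a straight-through estimator (STE), illustrating conceptually what Step 1 (binarizing activations) and Step 2 (binarizing weights + activations) mean. The actual layers in ./step1 and ./step2 follow ReActNet and are more involved (e.g. learnable shifts and per-layer scaling); the class names below are illustrative, not taken from this repo.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryActivation(nn.Module):
    def forward(self, x):
        # Forward pass uses sign(x); backward passes the gradient only where |x| <= 1 (clipped STE).
        binary = torch.sign(x)
        clipped = torch.clamp(x, -1.0, 1.0)
        return binary.detach() - clipped.detach() + clipped

class BinaryConv2d(nn.Conv2d):
    def forward(self, x):
        # Binarize weights to {-1, +1}, scaled by the per-filter mean absolute value;
        # the STE trick keeps gradients flowing to the latent real-valued weights.
        scale = self.weight.abs().mean(dim=(1, 2, 3), keepdim=True)
        binary_w = scale * torch.sign(self.weight)
        w = binary_w.detach() - self.weight.detach() + self.weight
        return F.conv2d(x, w, self.bias, self.stride, self.padding, self.dilation, self.groups)

# Tiny smoke test: gradients reach the latent weights despite the binarized forward pass.
layer = BinaryConv2d(3, 8, kernel_size=3, padding=1, bias=False)
out = BinaryActivation()(layer(torch.randn(1, 3, 32, 32)))
out.sum().backward()
print(layer.weight.grad.shape)  # torch.Size([8, 3, 3, 3])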

Models

Methods    Backbone     Top-1 Acc   FLOPs         Trained Model
ReActNet   ReActNet-A   69.4%       0.87 x 10^8   Model-ReAct
AdamBNN    ReActNet-A   70.5%       0.87 x 10^8   Model-ReAct-AdamBNN-Training

Contact

Zechun Liu, HKUST and CMU (zliubq at connect.ust.hk / zechunl at andrew.cmu.edu)

Zhiqiang Shen, CMU (zhiqians at andrew.cmu.edu)

