CVPR 2021 | Activate or Not: Learning Customized Activation
This repository contains the official PyTorch implementation of the paper Activate or Not: Learning Customized Activation, CVPR 2021.
ACON
We propose a novel activation function, which we term ACON, that explicitly learns to activate the neurons or not. Below we show the ACON activation function and its first derivatives. β controls how fast the first derivative asymptotes to the upper/lower bounds, which are determined by p1 and p2.
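For concreteness, here is a minimal PyTorch sketch of ACON-C, the variant defined as (p1 − p2) · x · σ(β(p1 − p2)x) + p2 · x with learnable per-channel p1, p2, and β; refer to the official implementation in this repository for the exact version:

```python
import torch
import torch.nn as nn

class AconC(nn.Module):
    """ACON-C: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x,
    with learnable per-channel parameters p1, p2, and beta."""
    def __init__(self, channels):
        super().__init__()
        self.p1 = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.p2 = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        dpx = (self.p1 - self.p2) * x
        # sigmoid(beta * dpx) smoothly switches between the p1*x and p2*x branches
        return dpx * torch.sigmoid(self.beta * dpx) + self.p2 * x
```

Note that ACON-C recovers Swish when p1 = 1 and p2 = 0, and approaches ReLU in the same setting as β → ∞.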
Training curves
We show the training curves of different activations here.
TFNet
To show the effectiveness of the proposed ACON family, we also provide an extremely simple toy funnel network (TFNet) built only from pointwise convolutions and ACON-FReLU operators.
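As a rough sketch of the idea (not necessarily the exact official formulation), ACON-FReLU can be viewed as the smooth ACON counterpart of FReLU's max(x, T(x)), where T is a depthwise convolution followed by batch normalization:

```python
import torch
import torch.nn as nn

class AconFReLU(nn.Module):
    """Sketch of ACON-FReLU: the smooth (ACON) version of FReLU's
    max(x, T(x)), where T is a depthwise conv + BN spatial condition.
    Details (e.g., how beta is handled) may differ from the official code."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.dwconv = nn.Conv2d(channels, channels, kernel_size,
                                padding=kernel_size // 2,
                                groups=channels, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.beta = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        t = self.bn(self.dwconv(x))  # spatial condition branch
        d = x - t
        # smooth maximum of x and t: (x - t) * sigmoid(beta * (x - t)) + t
        return d * torch.sigmoid(self.beta * d) + t
```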
Main results
The following results are ImageNet top-1 accuracy improvements relative to the ReLU baselines. The relative improvements of meta-ACON are roughly twice those of SENet.
First, we compare ReLU, Swish, and ACON-C, showing improvements without any additional FLOPs or parameters:
Model | FLOPs | #Params. | top-1 err. (ReLU) | top-1 err. (Swish) | top-1 err. (ACON) |
---|---|---|---|---|---|
ShuffleNetV2 0.5x | 41M | 1.4M | 39.4 | 38.3 (+1.1) | 37.0 (+2.4) |
ShuffleNetV2 1.5x | 299M | 3.5M | 27.4 | 26.8 (+0.6) | 26.5 (+0.9) |
ResNet 50 | 3.9G | 25.5M | 24.0 | 23.5 (+0.5) | 23.2 (+0.8) |
ResNet 101 | 7.6G | 44.4M | 22.8 | 22.7 (+0.1) | 21.8 (+1.0) |
ResNet 152 | 11.3G | 60.0M | 22.3 | 22.2 (+0.1) | 21.2 (+1.1) |
Next, by adding a negligible amount of FLOPs and parameters, meta-ACON shows significant improvements (a sketch of the meta-ACON module follows the table):
Model | FLOPs | #Params. | top-1 err. |
---|---|---|---|
ShuffleNetV2 0.5x (meta-acon) | 41M | 1.7M | 34.8 (+4.6) |
ShuffleNetV2 1.5x (meta-acon) | 299M | 3.9M | 24.7 (+2.7) |
ResNet 50 (meta-acon) | 3.9G | 25.7M | 22.0 (+2.0) |
ResNet 101 (meta-acon) | 7.6G | 44.8M | 21.0 (+1.8) |
ResNet 152 (meta-acon) | 11.3G | 60.5M | 20.5 (+1.8) |
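meta-ACON learns the switching factor β from the input with a small per-channel module (similar in spirit to SE) instead of treating it as a free parameter. Below is a minimal sketch, assuming the β-generator is a bottleneck of two 1×1 convolutions over globally pooled features; the official implementation may differ in details:

```python
import torch
import torch.nn as nn

class MetaAconC(nn.Module):
    """meta-ACON sketch: beta is predicted per channel from the input
    by a small SE-like bottleneck, then plugged into the ACON-C formula."""
    def __init__(self, channels, r=16):
        super().__init__()
        hidden = max(r, channels // r)
        self.fc1 = nn.Conv2d(channels, hidden, 1)
        self.fc2 = nn.Conv2d(hidden, channels, 1)
        self.p1 = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.p2 = nn.Parameter(torch.randn(1, channels, 1, 1))

    def forward(self, x):
        # global average pool -> bottleneck -> per-channel beta in (0, 1)
        y = x.mean(dim=(2, 3), keepdim=True)
        beta = torch.sigmoid(self.fc2(self.fc1(y)))
        dpx = (self.p1 - self.p2) * x
        return dpx * torch.sigmoid(beta * dpx) + self.p2 * x
```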
Even without SE modules, the simple TFNet outperforms state-of-the-art lightweight networks that also omit SE modules.
Model | FLOPs | #Params. | top-1 err. |
---|---|---|---|
MobileNetV2 0.17 | 42M | 1.4M | 52.6 |
ShuffleNetV2 0.5x | 41M | 1.4M | 39.4 |
TFNet 0.5 | 43M | 1.3M | 36.6 (+2.8) |
MobileNetV2 0.6 | 141M | 2.2M | 33.3 |
ShuffleNetV2 1.0x | 146M | 2.3M | 30.6 |
TFNet 1.0 | 135M | 1.9M | 29.7 (+0.9) |
MobileNetV2 1.0 | 300M | 3.4M | 28.0 |
ShuffleNetV2 1.5x | 299M | 3.5M | 27.4 |
TFNet 1.5 | 279M | 2.7M | 26.0 (+1.4) |
MobileNetV2 1.4 | 585M | 5.5M | 25.3 |
ShuffleNetV2 2.0x | 591M | 7.4M | 25.0 |
TFNet 2.0 | 474M | 3.8M | 24.3 (+0.7) |
Trained Models
Usage
Requirements
Download the ImageNet dataset and move validation images to labeled subfolders. To do this, you can use the following script: https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
Train:
python train.py --train-dir YOUR_TRAINDATASET_PATH --val-dir YOUR_VALDATASET_PATH
Eval:
python train.py --eval --eval-resume YOUR_WEIGHT_PATH --train-dir YOUR_TRAINDATASET_PATH --val-dir YOUR_VALDATASET_PATH
Citation
If you use these models in your research, please cite:
@inproceedings{ma2021activate,
  title={Activate or Not: Learning Customized Activation},
  author={Ma, Ningning and Zhang, Xiangyu and Liu, Ming and Sun, Jian},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}