ELimNet
ELimNet: Eliminating Layers in a Neural Network Pretrained with Large Dataset for Downstream Task
- Removed top layers from pretrained EfficientNet B0 and ResNet18 to construct lightweight CNN models with fewer than 1M parameters (see the sketch below).
- Assessed on the Trash Annotations in Context (TACO) dataset, sampled to 6 classes with 20,851 images.
- Compared performance with lightweight models generated by Optuna's Neural Architecture Search (NAS) built from the same convolutional blocks.
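Below is a minimal sketch of the elimination idea, assuming a torchvision backbone. It is not the repository's YAML-driven model builder; the block indexing, 224x224 input, and 6-class head are illustrative assumptions.

```python
# Sketch: keep the bottom blocks of a pretrained backbone, drop the top ones,
# and attach a small classification head (illustrative, not the repo's builder).
import torch
import torch.nn as nn
from torchvision import models

def build_elim_efficientnet(num_removed_blocks: int = 3, num_classes: int = 6) -> nn.Module:
    backbone = models.efficientnet_b0(weights="IMAGENET1K_V1")
    # efficientnet_b0.features is a Sequential: stem conv, 7 MBConv stages, final 1x1 conv.
    # Drop the final 1x1 conv plus the top `num_removed_blocks` MBConv stages.
    kept = nn.Sequential(*list(backbone.features.children())[:-(num_removed_blocks + 1)])

    # Infer the channel width of the truncated feature extractor with a dummy forward pass.
    with torch.no_grad():
        out_channels = kept(torch.zeros(1, 3, 224, 224)).shape[1]

    return nn.Sequential(
        kept,
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(out_channels, num_classes),
    )

model = build_elim_efficientnet(num_removed_blocks=3)
print(sum(p.numel() for p in model.parameters()))  # parameter count of the truncated model
```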
Quickstart
Installation
# clone the repository
git clone https://github.com/snoop2head/elimnet
# fetch the image dataset and unzip
wget -cq https://aistages-prod-server-public.s3.amazonaws.com/app/Competitions/000081/data/data.zip
unzip ./data.zip -d ./
Train
# finetune on the dataset with the pretrained model
python train.py --model ./model/efficientnet_b0.yaml
# finetune on the dataset with ELimNet
python train.py --model ./model/efficientnet_b0_elim_3.yaml
Inference
# run inference with the most recently trained model
python inference.py --model_dir ./exp/latest/
Performance
Performance is compared with (1) the original pretrained models and (2) models constructed by Optuna NAS with no pretrained weights.
- Indicates that pretrained CNN models with their top convolutional layers eliminated outperform Optuna NAS models built from the same convolutional blocks without pretrained weights.
- Suggests that eliminating top convolutional layers yields lightweight models with classification performance similar to (or better than) the original pretrained models.
- Reduces parameters to roughly 7% of the original count while maintaining (or improving) performance, and saves 20% or more of inference time by eliminating top convolutional layers (see the check below).
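As a quick sanity check, the headline ratios can be recomputed from the figures in the tables below:

```python
# Quick check of the headline claims using numbers copied from the tables below.
print(0.30 / 4.0)        # EfficientNet B0 Elim 3 vs. pretrained B0 parameters -> 0.075 (~7%)
print(0.68 / 11.17)      # Resnet18 Elim 2 vs. pretrained Resnet18 parameters  -> ~0.06 (~6%)
print(1 - 83.4 / 105.7)  # Elim 2 test inference time reduction                -> ~0.21 (21%)
print(1 - 73.5 / 105.7)  # Elim 3 test inference time reduction                -> ~0.30 (30%)
```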
ELimNet vs Pretrained Models (Train)
[100 epochs] | # of Parameters | # of Layers | Train | Validation | Test F1 |
---|---|---|---|---|---|
Pretrained EfficientNet B0 | 4.0M | 352 | Loss: 0.43, Acc: 81.23%, F1: 0.84 | Loss: 0.469, Acc: 82.17%, F1: 0.76 | 0.7493 |
EfficientNet B0 Elim 2 | 0.9M | 245 | Loss: 0.652, Acc: 87.22%, F1: 0.84 | Loss: 0.622, Acc: 87.22%, F1: 0.77 | 0.7603 |
EfficientNet B0 Elim 3 | 0.30M | 181 | Loss: 0.602, Acc: 78.17%, F1: 0.74 | Loss: 0.661, Acc: 77.41%, F1: 0.74 | 0.7349 |
Resnet18 | 11.17M | 69 | Loss: 0.578, Acc: 78.90%, F1: 0.76 | Loss: 0.700, Acc: 76.17%, F1: 0.719 | - |
Resnet18 Elim 2 | 0.68M | 37 | Loss: 0.447, Acc: 83.73%, F1: 0.71 | Loss: 0.712, Acc: 75.42%, F1: 0.71 | - |
ELimNet vs Pretrained Models (Inference)
Model | # of Parameters | # of Layers | CPU time (sec) | CUDA time (sec) | Test Inference Time (sec) |
---|---|---|---|---|---|
Pretrained EfficientNet B0 | 4.0M | 352 | 3.9s | 4.0s | 105.7s |
EfficientNet B0 Elim 2 | 0.9M | 245 | 4.1s | 13.0s | 83.4s |
EfficientNet B0 Elim 3 | 0.30M | 181 | 3.0s | 9.0s | 73.5s |
Resnet18 | 11.17M | 69 | - | - | - |
Resnet18 Elim 2 | 0.68M | 37 | - | - | - |
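The CPU and CUDA timings above can be reproduced in spirit with PyTorch's profiler; the sketch below is not the repository's inference.py, and the batch size, input resolution, and iteration count are assumptions.

```python
# Rough CPU/CUDA timing of a forward pass with torch.profiler (requires a CUDA device).
import torch
from torchvision import models
from torch.profiler import profile, ProfilerActivity

model = models.efficientnet_b0(weights="IMAGENET1K_V1").eval().cuda()
inputs = torch.randn(16, 3, 224, 224, device="cuda")  # assumed batch size and resolution

with torch.no_grad(), profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(100):
        model(inputs)

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```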
ELimNet vs Empty Optuna NAS Models (Train)
[100 epochs] | # of Parameters | # of Layers | Train | Validation | Test F1 |
---|---|---|---|---|---|
Empty MobileNet V3 | 4.2M | 227 | Loss: 0.925, Acc: 65.18%, F1: 0.58 | Loss: 0.993, Acc: 62.83%, F1: 0.56 | - |
Empty EfficientNet B0 | 1.3M | 352 | Loss: 0.867, Acc: 67.28%, F1: 0.61 | Loss: 0.898, Acc: 66.80%, F1: 0.61 | 0.6337 |
Empty DWConv & InvertedResidualv3 NAS | 0.08M | 66 | - | Loss: 0.766, Acc: 71.71%, F1: 0.68 | 0.6740 |
Empty MBConv NAS | 0.33M | 141 | Loss: 0.786, Acc: 70.72%, F1: 0.66 | Loss: 0.866, Acc: 68.09%, F1: 0.62 | 0.6245 |
Resnet18 Elim 2 | 0.68M | 37 | Loss: 0.447, Acc: 83.73%, F1: 0.71 | Loss: 0.712, Acc: 75.42%, F1: 0.71 | - |
EfficientNet B0 Elim 3 | 0.30M | 181 | Loss: 0.602, Acc: 78.17%, F1: 0.74 | Loss: 0.661, Acc: 77.41%, F1: 0.74 | 0.7603 |
ELimNet vs Empty Optuna NAS Models (Inference)
Model | # of Parameters | # of Layers | CPU time (sec) | CUDA time (sec) | Test Inference Time (sec) |
---|---|---|---|---|---|
Empty MobileNet V3 | 4.2M | 227 | 4 | 13 | - |
Empty EfficientNet B0 | 1.3M | 352 | 3.780 | 3.782 | 68.4s |
Empty DWConv & InvertedResidualv3 NAS | 0.08M | 66 | 1 | 3.5 | 61.1s
Empty MBConv NAS | 0.33M | 141 | 2.14 | 7.201 | 67.1s |
Resnet18 Elim 2 | 0.68M | 37 | - | - | - |
EfficientNet B0 Elim 3 | 0.30M | 181 | 3.0s | 9s | 73.5s |
Background & WiP
Background
- NLP tasks are usually downstream tasks of finetuning large pretrained transformer models (e.g. BERT, RoBERTa, XLNet).
- Removing the top transformer layers can yield a 40% reduction in model size while preserving up to 98.2% of the original performance.
- Likewise, this project tests removing top convolutional layers from pretrained models for the image classification task.
Work in Progress
- Will test replacing pretrained convolutional blocks with a single convolutional layer trained from scratch.
- Will add ResNet18's inference time data and compare it against Optuna's NAS-constructed lightweight models.
- Will test elimination-based lightweight architecture search on torchvision's pretrained MobileNetV3 and MnasNet.
- Will be applied to other small datasets such as Fashion-MNIST and the PlantVillage dataset.
Others
- "Empty" stands for model with no pretrained weights.
- "EfficientNet B0 Elim 2" means 2 convolutional blocks have been eliminated from pretrained EfficientNet B0. Number next to "Elim" annotates how many convolutional blocks have been removed.
- Table's performance illustrates best performance out of 100 epochs of finetuning on TACO Dataset.
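For illustration, one possible "Resnet18 Elim 2" construction, assuming the two eliminated blocks are the top residual stages (layer3, layer4) of torchvision's pretrained resnet18; this sketch is consistent with the ~0.68M parameter count above but is not necessarily the exact architecture used.

```python
# Sketch of "Resnet18 Elim 2": drop the top residual stages and attach a 6-class head.
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")
resnet18_elim2 = nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    backbone.layer1, backbone.layer2,   # keep the bottom two residual stages
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(128, 6),                  # layer2 outputs 128 channels; 6 TACO classes
)
print(sum(p.numel() for p in resnet18_elim2.parameters()))  # roughly 0.68M parameters
```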