Learning Features with Parameter-Free Layers (ICLR 2022)

Overview

Learning Features with Parameter-Free Layers (ICLR 2022)

Dongyoon Han, YoungJoon Yoo, Beomyoung Kim, Byeongho Heo | Paper

NAVER AI Lab, NAVER CLOVA

Updates

  • 02.11.2022 Code has been uploaded
  • 02.06.2022 Initial update

Abstract

Trainable layers such as convolutional building blocks are the standard network design choices by learning parameters to capture the global context through successive spatial operations. When designing an efficient network, trainable layers such as the depthwise convolution is the source of efficiency in the number of parameters and FLOPs, but there was little improvement to the model speed in practice. This paper argues that simple built-in parameter-free operations can be a favorable alternative to the efficient trainable layers replacing spatial operations in a network architecture. We aim to break the stereotype of organizing the spatial operations of building blocks into trainable layers. Extensive experimental analyses based on layer-level studies with fully-trained models and neural architecture searches are provided to investigate whether parameter-free operations such as the max-pool are functional. The studies eventually give us a simple yet effective idea for redesigning network architectures, where the parameter-free operations are heavily used as the main building block without sacrificing the model accuracy as much. Experimental results on the ImageNet dataset demonstrate that the network architectures with parameter-free operations could enjoy the advantages of further efficiency in terms of model speed, the number of the parameters, and FLOPs.

Some Analyses in The Paper

1. Depthwise convolution is replaceble with a parameter-free operation:

2. Parameter-free operations are frequently searched in normal building blocks by NAS:

3. R50-hybrid (with the eff-bottlenecks) yields a localizable features (see the Grad-CAM visualizations):

Our Proposed Models

1. Schematic illustration of our models

  • Here, we provide example models where the parameter-free operations (i.e., eff-layer) are mainly used;

  • Parameter-free operations such as the max-pool2d and avg-pool2d can replace the spatial operations (conv and SA).

2. Brief model descriptions

resnet_pf.py: resnet50_max(), resnet50_hybrid(): R50-max, R50-hybrid - model with the efficient bottlenecks

vit_pf.py: vit_s_max() - ViT with the efficient transformers

pit_pf.py: pit_s_max() - PiT with the efficient transformers

Usage

Requirements

pytorch >= 1.6.0
torchvision >= 0.7.0
timm >= 0.3.4
apex == 0.1.0

Pretrained models

Network Img size Params. (M) FLOPs (G) GPU (ms) Top-1 (%) Top-5 (%)
R50 224x224 25.6 4.1 8.7 76.2 93.8
R50-max 224x224 14.2 2.2 6.8 74.3 92.0
R50-hybrid 224x224 17.3 2.6 7.3 77.1 93.1
Network Img size Throughputs Vanilla +CutMix +DeiT
R50 224x224 962 / 112 76.2 77.6 78.8
ViT-S-max 224x224 763 / 96 74.2 77.3 79.8
PiT-S-max 224x224 1000 / 92 75.7 78.1 80.1

Model load & evaluation

Example code of loading resnet50_hybrid without timm:

import torch
from resnet_pf import resnet50_hybrid

model = resnet50_hybrid() 
model.load_state_dict(torch.load('./weight/checkpoint.pth'))
print(model(torch.randn(1, 3, 224, 224)))

Example code of loading pit_s_max with timm:

import torch
import timm
import pit_pf
   
model = timm.create_model('pit_s_max', pretrained=False)
model.load_state_dict(torch.load('./weight/checkpoint.pth'))
print(model(torch.randn(1, 3, 224, 224)))

Directly run each model can verify a single iteration of forward and backward of the mode.

Training

Our ResNet-based models can be trained with any PyTorch training codes; we recommend timm. We provide a sample script for training R50_hybrid with the standard 90-epochs training setup:

  python3 -m torch.distributed.launch --nproc_per_node=4 train.py ./ImageNet_dataset/ --model resnet50_hybrid --opt sgd --amp \
  --lr 0.2 --weight-decay 1e-4 --batch-size 256 --sched step --epochs 90 --decay-epochs 30 --warmup-epochs 3 --smoothing 0\

Vision transformers (ViT and PiT) models are also able to be trained with timm, but we recommend the code DeiT to train with. We provide a sample training script with the default training setup in the package:

  python3 -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model vit_s_max --batch-size 256 --data-path ./ImageNet_dataset/

License

Copyright 2022-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

How to cite

@inproceedings{han2022learning,
    title={Learning Features with Parameter-Free Layers},
    author={Dongyoon Han and YoungJoon Yoo and Beomyoung Kim and Byeongho Heo},
    year={2022},
    journal={International Conference on Learning Representations (ICLR)},
}
You might also like...
[ICLR 2022] Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics
[ICLR 2022] Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics

CPDeform Code and data for paper Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics at ICLR 2022 (Spotlight). @InProceed

[ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization
[ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization

F8Net Fixed-Point 8-bit Only Multiplication for Network Quantization (ICLR 2022 Oral) OpenReview | arXiv | PDF | Model Zoo | BibTex PyTorch implementa

Implementation of GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation (ICLR 2022).
Implementation of GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation (ICLR 2022).

GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation [OpenReview] [arXiv] [Code] The official implementation of GeoDiff: A Geome

[ICLR 2022] DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
[ICLR 2022] DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

DAB-DETR This is the official pytorch implementation of our ICLR 2022 paper DAB-DETR. Authors: Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi

Python Library for learning (Structure and Parameter) and inference (Statistical and Causal) in Bayesian Networks.

pgmpy pgmpy is a python library for working with Probabilistic Graphical Models. Documentation and list of algorithms supported is at our official sit

An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise Weight Sharing) by Sensetime Research.

An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise Weight Sharing) by Sensetime Research.

Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

T-Few This repository contains the official code for the paper: "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learni

Spectral Tensor Train Parameterization of Deep Learning Layers
Spectral Tensor Train Parameterization of Deep Learning Layers

Spectral Tensor Train Parameterization of Deep Learning Layers This repository is the official implementation of our AISTATS 2021 paper titled "Spectr

improvement of CLIP features over the traditional resnet features on the visual question answering, image captioning, navigation and visual entailment tasks.

CLIP-ViL In our paper "How Much Can CLIP Benefit Vision-and-Language Tasks?", we show the improvement of CLIP features over the traditional resnet fea

Owner
NAVER AI
Official account of NAVER AI, Korea No.1 Industrial AI Research Group
NAVER AI
Unofficial PyTorch implementation of Attention Free Transformer (AFT) layers by Apple Inc.

aft-pytorch Unofficial PyTorch implementation of Attention Free Transformer's layers by Zhai, et al. [abs, pdf] from Apple Inc. Installation You can i

Rishabh Anand 184 Dec 12, 2022
Imposter-detector-2022 - HackED 2022 Team 3IQ - 2022 Imposter Detector

HackED 2022 Team 3IQ - 2022 Imposter Detector By Aneeljyot Alagh, Curtis Kan, Jo

Joshua Ji 3 Aug 20, 2022
[ICLR'21] FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

FedBN: Federated Learning on Non-IID Features via Local Batch Normalization This is the PyTorch implemention of our paper FedBN: Federated Learning on

Med-AIR@CUHK 156 Dec 15, 2022
ReLoss - Official implementation for paper "Relational Surrogate Loss Learning" ICLR 2022

Relational Surrogate Loss Learning (ReLoss) Official implementation for paper "R

Tao Huang 31 Nov 22, 2022
Official Pytorch implementation of Online Continual Learning on Class Incremental Blurry Task Configuration with Anytime Inference (ICLR 2022)

The Official Implementation of CLIB (Continual Learning for i-Blurry) Online Continual Learning on Class Incremental Blurry Task Configuration with An

NAVER AI 34 Oct 26, 2022
Code for "MetaMorph: Learning Universal Controllers with Transformers", Gupta et al, ICLR 2022

MetaMorph: Learning Universal Controllers with Transformers This is the code for the paper MetaMorph: Learning Universal Controllers with Transformers

Agrim Gupta 50 Jan 3, 2023
A PyTorch implementation of ICLR 2022 Oral paper PiCO: Contrastive Label Disambiguation for Partial Label Learning

PiCO: Contrastive Label Disambiguation for Partial Label Learning This is a PyTorch implementation of ICLR 2022 Oral paper PiCO; also see our Project

王皓波 83 May 11, 2022
The official implementation of NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]. https://arxiv.org/pdf/2101.12378.pdf

NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021] Release Notes The offical PyTorch implementation of NeMo, p

Angtian Wang 76 Nov 23, 2022
Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy" (ICLR 2022 Spotlight)

About Code release for Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy (ICLR 2022 Spotlight)

THUML @ Tsinghua University 221 Dec 31, 2022
[ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

AMOS This repository contains the scripts for fine-tuning AMOS pretrained models on GLUE and SQuAD 2.0 benchmarks. Paper: Pretraining Text Encoders wi

Microsoft 22 Sep 15, 2022