Aggregating Nested Transformers: Official JAX Implementation

Google Research

Last update: Dec 20, 2022

NesT is a simple method that aggregates nested local transformers on image blocks. This design lets vision transformers achieve better accuracy, data efficiency, and convergence on the ImageNet benchmark. NesT can also be scaled down to small datasets, where it matches convnet accuracy.
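The description above is high-level, so here is a minimal sketch of the core idea under simplifying assumptions. A feature map is split into non-overlapping blocks, self-attention runs independently inside each block, and neighbouring blocks are then merged so that each level of the hierarchy has 4x fewer blocks. This is not the official code: all function names are hypothetical, attention is single-head with identity Q/K/V projections, and aggregation is a plain 2x2 max pool, whereas the paper describes a richer aggregation step. It uses NumPy; jax.numpy accepts the same calls.

```python
import numpy as np

def blockify(x, block_size):
    """(H, W, C) -> (num_blocks, block_size**2, C): one token sequence per block."""
    h, w, c = x.shape
    if h % block_size or w % block_size:
        raise ValueError("H and W must be divisible by block_size")
    gh, gw = h // block_size, w // block_size
    x = x.reshape(gh, block_size, gw, block_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, block_size, block_size, C)
    return x.reshape(gh * gw, block_size * block_size, c)

def unblockify(blocks, grid, block_size):
    """Inverse of blockify: (num_blocks, block_size**2, C) -> (H, W, C)."""
    gh, gw = grid
    c = blocks.shape[-1]
    x = blocks.reshape(gh, gw, block_size, block_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, block_size, gw, block_size, C)
    return x.reshape(gh * block_size, gw * block_size, c)

def local_self_attention(blocks):
    """Single-head softmax attention with identity Q/K/V projections (simplification).
    The leading axis acts as a batch axis, so tokens only attend within their own block."""
    scores = blocks @ blocks.transpose(0, 2, 1) / np.sqrt(blocks.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ blocks

def aggregate(x):
    """Stand-in block aggregation: 2x2 max pool with stride 2 on the spatial map.
    With a fixed block size, each 2x2 group of blocks becomes one block."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def nest_hierarchy_sketch(x, block_size, num_levels):
    """Alternate local attention and aggregation; returns the final feature map."""
    for level in range(num_levels):
        h, w, _ = x.shape
        grid = (h // block_size, w // block_size)
        blocks = blockify(x, block_size)
        blocks = blocks + local_self_attention(blocks)  # residual connection
        x = unblockify(blocks, grid, block_size)
        if level < num_levels - 1:
            x = aggregate(x)
    return x

# Toy 8x8 map with 2x2-token blocks: 16 blocks -> 4 blocks -> 1 block.
x = np.random.default_rng(0).normal(size=(8, 8, 3))
print(blockify(x, 2).shape)                   # (16, 4, 3)
print(nest_hierarchy_sketch(x, 2, 3).shape)   # (2, 2, 3)
```

Because attention is confined to blocks, its cost grows with the number of blocks rather than quadratically with the full image. Information crosses block boundaries only through the aggregation step between levels.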

This is not an officially supported Google product.

Pretrained Models and Results

Model	Top-1 accuracy (%)	Checkpoint path
NesT-B	83.8	gs://gresearch/nest-checkpoints/nest-b_imagenet
NesT-S	83.3	gs://gresearch/nest-checkpoints/nest-s_imagenet
NesT-T	81.5	gs://gresearch/nest-checkpoints/nest-t_imagenet

Note: Accuracy is evaluated on the ImageNet2012 validation set.

TensorBoard.dev

See the ImageNet training logs on TensorBoard.dev.

Colab

A Colab notebook is available for testing: https://colab.sandbox.google.com/github/google-research/nested-transformer/blob/main/colab.ipynb

Instructions for Image Classification

Environment setup

virtualenv -p python3 --system-site-packages nestenv
source nestenv/bin/activate

pip install -r requirements.txt

Evaluate on ImageNet

Before the first run, download ImageNet by following the tensorflow_datasets command-line instructions. Optionally, download all pretrained checkpoints:

bash ./checkpoints/download_checkpoints.sh

Run the evaluation script to evaluate NesT-B.

python main.py --config configs/imagenet_nest.py --config.eval_only=True \
  --config.init_checkpoint="./checkpoints/nest-b_imagenet/ckpt.39" \
  --workdir="./checkpoints/nest-b_imagenet_eval"
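The evaluation command points --config.init_checkpoint at one specific step, ckpt.39. If a checkpoint directory holds several steps, the small helper below picks the latest one. It is a hypothetical convenience, not part of the repository, and it assumes entries are named ckpt.<step> as in the command above.

```python
import os
import re
from typing import Optional

# Assumed naming scheme, taken from the example path "ckpt.39".
_CKPT_RE = re.compile(r"^ckpt\.(\d+)$")

def latest_checkpoint(ckpt_dir: str) -> Optional[str]:
    """Return the path of the highest-numbered 'ckpt.<step>' entry, or None if absent."""
    best_step, best_name = -1, None
    for name in os.listdir(ckpt_dir):
        match = _CKPT_RE.match(name)
        # Compare steps numerically: a plain string sort would rank "ckpt.39" above "ckpt.100".
        if match and int(match.group(1)) > best_step:
            best_step, best_name = int(match.group(1)), name
    return os.path.join(ckpt_dir, best_name) if best_name else None
```

For example, latest_checkpoint("./checkpoints/nest-b_imagenet") returns a path you can pass to --config.init_checkpoint.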

Train on ImageNet

The default configuration trains NesT-B on a TPUv2 8x8 slice with a per-device batch size of 16.

python main.py --config configs/imagenet_nest.py --jax_backend_target=<TPU_IP_ADDRESS> --jax_xla_backend="tpu_driver" --workdir="./checkpoints/nest-b_imagenet"

Note: See jax/cloud_tpu_colab for info about TPU_IP_ADDRESS.

Train NesT-T on 8 GPUs.

python main.py --config configs/imagenet_nest_tiny.py --workdir="./checkpoints/nest-t_imagenet_8gpu"

The codebase does not support multi-node GPU training (more than 8 GPUs). The models reported in our paper are trained on TPUs with a total batch size of 1024.
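In data-parallel training, the global batch is the per-device batch size times the number of devices. Below is a minimal sketch of that arithmetic and of splitting a global batch across devices. It assumes a pmap-style data-parallel loop, which is common in Flax training code but not confirmed for this repository. The helper names and the device count of 64 are hypothetical; check the real count on your setup with jax.device_count().

```python
import numpy as np

def global_batch_size(num_devices: int, per_device_batch_size: int) -> int:
    # Each device processes its own slice of the batch in parallel.
    return num_devices * per_device_batch_size

def shard_batch(batch: np.ndarray, num_devices: int) -> np.ndarray:
    """Reshape (global_batch, ...) -> (num_devices, per_device_batch, ...),
    the leading-axis layout that jax.pmap maps over."""
    if batch.shape[0] % num_devices != 0:
        raise ValueError(f"batch of {batch.shape[0]} is not divisible by {num_devices} devices")
    return batch.reshape((num_devices, -1) + batch.shape[1:])

# Hypothetical: 64 devices x 16 examples per device.
print(global_batch_size(64, 16))  # 1024
print(shard_batch(np.zeros((1024, 8, 8, 3), dtype=np.float32), 64).shape)  # (64, 16, 8, 8, 3)
```

On 8 GPUs, the same global batch of 1024 would need 128 examples per device. Whether configs/imagenet_nest_tiny.py uses that setting is not stated here.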

Train on CIFAR

# Training on 2 GPUs is recommended; NesT-T can be trained on 1 GPU.
CUDA_VISIBLE_DEVICES=0,1 python main.py --config configs/cifar_nest.py --workdir="./checkpoints/nest_cifar"

Cite

@inproceedings{zhang2021aggregating,
  title={Aggregating Nested Transformers},
  author={Zizhao Zhang and Han Zhang and Long Zhao and Ting Chen and Tomas Pfister},
  booktitle={arXiv preprint arXiv:2105.12723},
  year={2021}
}

You might also like...

Implementation of FitVid video prediction model in JAX/Flax.

FitVid Video Prediction Model Implementation of FitVid video prediction model in JAX/Flax. If you find this code useful, please cite it in your paper:

62 Nov 25, 2022

Implementation and replication of ProGen, Language Modeling for Protein Generation, in Jax

ProGen - (wip) Implementation and replication of ProGen, Language Modeling for Protein Generation, in Pytorch and Jax (the weights will be made easily

71 Dec 1, 2022

Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax

Clockwork VAEs in JAX/Flax Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax, ported

26 Oct 5, 2022

A JAX implementation of Broaden Your Views for Self-Supervised Video Learning, or BraVe for short.

BraVe This is a JAX implementation of Broaden Your Views for Self-Supervised Video Learning, or BraVe for short. The model provided in this package wa

44 Nov 20, 2022

This is a JAX implementation of Neural Radiance Fields for learning purposes.

learn-nerf This is a JAX implementation of Neural Radiance Fields for learning purposes. I've been curious about NeRF and its follow-up work for a whi

62 Dec 20, 2022

A minimal TPU compatible Jax implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

NeRF Minimal Jax implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Result of Tiny-NeRF RGB Depth

11 Jul 24, 2022

Advantage Actor Critic (A2C): jax + flax implementation

Advantage Actor Critic (A2C): jax + flax implementation Current version supports only environments with continuous action spaces and was tested on muj

3 Jan 23, 2022

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

272 Dec 23, 2022

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

12.6k Jan 9, 2023

Comments

Discrepancies vs Table A1 in paper
I noticed some possible discrepancies between the architecture parameters here and those in Table A1 of the paper.

For ImageNet models, is it correct that:

The table should say h=[3,3,4]?

The order of the scale_hidden_dims in the table is inverted. That is, hierarchies 1, 2 and 3 should say [4d, 4h] × 2, 1 [2d, 2h] × 2, 4 [d, h] × k, 16?
opened by alexander-soare
Training hours & ImageNet accuracy

Hello, thanks for sharing your interesting work.

I was trying to reproduce the NesT-T ImageNet result in this link using TPUs.

Here are my results on a TPU v3 with 8 cores (link), using exactly the same hyperparameters as in imagenet_nest_tiny.py.

As you can see, my training took 63 hours, while yours took 21 hours. How can I reduce the training time to match yours? If the difference comes from data loading, could you tell me what type of data storage you used? Right now, I'm using a Google Cloud Storage bucket for data storage.

I also see an accuracy gap of around 0.5% (81.0 vs. 81.5). Could you explain this difference?

opened by arunos728
Regarding GradCAT implementation

Hi, I'm interested in working with the NesT model, but I'm having difficulty implementing GradCAT. Could you please share the implementation?

opened by rush2406
Model convergence problem

I am training on a medium-scale dataset of 100,000 images. The learning rate and weight decay are the same as in your config, but the model still does not converge. Any suggestions?

Regards, Khawar Islam

opened by khawar-islam

Owner: Google Research

GAN JAX - A toy project to generate images from GANs with JAX

GAN JAX - A toy project to generate images from GANs with JAX This project aims to bring the power of JAX, a Python framework developed by Google and

14 Nov 29, 2022

CLOOB training (JAX) and inference (JAX and PyTorch)

cloob-training Pretrained models There are two pretrained CLOOB models in this repo at the moment, a 16 epoch and a 32 epoch ViT-B/16 checkpoint train

64 Nov 27, 2022

A PyTorch Implementation of the Luna: Linear Unified Nested Attention

Unofficial PyTorch implementation of Luna: Linear Unified Nested Attention The quadratic computational and memory complexities of the Transformer’s at

32 Nov 7, 2022

Official Pytorch and JAX implementation of "Efficient-VDVAE: Less is more"

The Official Pytorch and JAX implementation of "Efficient-VDVAE: Less is more" Arxiv preprint Louay Hazami · Rayhane Mama · Ragavan Thurairatn

144 Dec 23, 2022

U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."

6.5k Jan 9, 2023

Code for Two-stage Identifier: "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition"

Code for Two-stage Identifier: "Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition", accepted at ACL 2021. For details of the model and experiments, please see our paper.

87 Dec 16, 2022

a reimplementation of Holistically-Nested Edge Detection in PyTorch

pytorch-hed This is a personal reimplementation of Holistically-Nested Edge Detection [1] using PyTorch. Should you be making use of this work, please

375 Dec 6, 2022

Compute execution plan: A DAG representation of work that you want to get done. Individual nodes of the DAG could be simple python or shell tasks or complex deeply nested parallel branches or embedded DAGs themselves.

Hello from magnus Magnus provides four capabilities for data teams: Compute execution plan: A DAG representation of work that you want to get done. In

12 Feb 8, 2022

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

1.4k Dec 30, 2022

VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VSR-Transformer By Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool This paper proposes a new Transformer for video super-resolution (called VSR-Transf

225 Nov 13, 2022
