Code release for "BoxeR: Box-Attention for 2D and 3D Transformers"

Nguyen Duy Kien

Last update: Dec 7, 2022

Overview

BoxeR

By Duy-Kien Nguyen, Jihong Ju, Olaf Booij, Martin R. Oswald, Cees Snoek.

This repository is an official implementation of the paper BoxeR: Box-Attention for 2D and 3D Transformers.

Introduction

TL;DR. BoxeR is a Transformer-based network for end-to-end 2D object detection and instance segmentation, along with 3D object detection. The core of the network is Box-Attention, which predicts regions of interest to attend to by learning the transformation (translation, scaling, and rotation) from reference windows, yielding competitive performance on several vision tasks.

Abstract. In this paper, we propose a simple attention mechanism, which we call box-attention. It enables spatial interaction between grid features, as sampled from boxes of interest, and improves the learning capability of transformers for several vision tasks. Specifically, we present BoxeR, short for Box Transformer, which attends to a set of boxes by predicting their transformation from a reference window on an input feature map. BoxeR computes attention weights on these boxes by considering their grid structure. Notably, BoxeR-2D naturally reasons about box information within its attention module, making it suitable for end-to-end instance detection and segmentation tasks. By learning invariance to rotation in the box-attention module, BoxeR-3D is capable of generating discriminative information from a bird's-eye view plane for 3D end-to-end object detection. Our experiments demonstrate that the proposed BoxeR-2D achieves state-of-the-art results on COCO detection and instance segmentation. Besides, BoxeR-3D improves over the end-to-end 3D object detection baseline and already obtains a compelling performance for the vehicle category of Waymo Open, without any class-specific optimization.

License

This project is released under the MIT License.

Citing BoxeR

If you find BoxeR useful in your research, please consider citing:

@article{nguyen2021boxer,
  title={BoxeR: Box-Attention for 2D and 3D Transformers},
  author={Duy{-}Kien Nguyen and Jihong Ju and Olaf Booij and Martin R. Oswald and Cees G. M. Snoek},
  journal={arXiv preprint arXiv:2111.13087},
  year={2021}
}

Main Results

COCO Instance Segmentation Baselines with BoxeR-2D

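| Name | params (M) | inference speed (fps) | box AP | box AP-S | box AP-M | box AP-L | segm AP | segm AP-S | segm AP-M | segm AP-L |
|---|---|---|---|---|---|---|---|---|---|---|
| BoxeR-R50-3x | 40.1 | 12.5 | 50.3 | 33.4 | 53.3 | 64.4 | 42.9 | 22.8 | 46.1 | 61.7 |
| BoxeR-R101-3x | 59.0 | 10.0 | 50.7 | 33.4 | 53.8 | 65.7 | 43.3 | 23.5 | 46.4 | 62.5 |
| BoxeR-R101-5x | 59.0 | 10.0 | 51.9 | 34.2 | 55.8 | 67.1 | 44.3 | 24.7 | 48.0 | 63.8 |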

Installation

Requirements

Linux, CUDA>=11, GCC>=5.4
Python>=3.8

We recommend using Anaconda to create a conda environment:
```
conda create -n boxer python=3.8
```
Then, activate the environment:
```
conda activate boxer
```
PyTorch>=1.10.1, torchvision>=0.11.2 (follow the official PyTorch installation instructions)

For example, you could install pytorch and torchvision as follows:
```
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
```
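Other requirements & Compilation (run this from the directory that contains the cloned BoxeR folder, since BoxeR here is a path to the local project; from inside the repository, use python -m pip install -e . instead)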
```
python -m pip install -e BoxeR
```
You can test the CUDA operators (box and instance attention) by running:
```
python tests/box_attn_test.py
python tests/instance_attn_test.py
```
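
If the tests fail at import time, checking the interpreter and PyTorch versions against the requirements above can help narrow things down. The following is a minimal sketch, not part of the repository: `parse_version` and `check_environment` are hypothetical helper names, and the minimum versions are simply the ones listed under Requirements.

```python
import importlib.util
import re
import sys

MIN_PYTHON = (3, 8)
MIN_TORCH = (1, 10, 1)


def parse_version(text):
    """Turn a version string such as '1.10.1+cu113' into a tuple of ints."""
    numbers = re.findall(r"\d+", text.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def check_environment():
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python >= 3.8 is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch

        if parse_version(torch.__version__) < MIN_TORCH:
            problems.append("PyTorch >= 1.10.1 is required, found " + torch.__version__)
        if not torch.cuda.is_available():
            problems.append("CUDA is not available to PyTorch")
    return problems
```

Versions are compared as integer tuples because a plain string comparison would rank "1.8.2" above "1.10.1". Calling `check_environment()` and printing the returned list is enough for a quick check before compiling the operators.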

Usage

Dataset preparation

The datasets are assumed to exist in a directory specified by the environment variable $E2E_DATASETS. If the environment variable is not set, the directory defaults to .data. Under this directory, detectron2 will look for datasets in the structure described below.

$E2E_DATASETS/
├── coco/
└── waymo/
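
The lookup described above can be sketched as follows. This is an illustration under stated assumptions rather than the repository's own code: `resolve_dataset_root` and `missing_datasets` are hypothetical names, and the fallback value mirrors the .data default mentioned in the text.

```python
import os
from pathlib import Path


def resolve_dataset_root(environ=None, default=".data"):
    """Return the dataset root from E2E_DATASETS, or the default if it is unset."""
    environ = os.environ if environ is None else environ
    return Path(environ.get("E2E_DATASETS") or default)


def missing_datasets(root, names=("coco", "waymo")):
    """List the expected dataset folders that are absent under root."""
    return [name for name in names if not (Path(root) / name).is_dir()]


root = resolve_dataset_root()
print(f"dataset root: {root}, missing: {missing_datasets(root)}")
```

Running this before training gives an early hint when the environment variable points at the wrong place.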

For COCO Detection and Instance Segmentation, please download the COCO 2017 dataset and organize it as follows:

$E2E_DATASETS/
└── coco/
    ├── annotation/
    │   ├── instances_train2017.json
    │   ├── instances_val2017.json
    │   └── image_info_test-dev2017.json
    ├── image/
    │   ├── train2017/
    │   ├── val2017/
    │   └── test2017/
    └── vocabs/
        └── coco_categories.txt - the mapping from coco categories to indices.

The coco_categories.txt can be downloaded here.
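
After arranging the files, it can be worth confirming that an annotation file is readable and non-empty. The sketch below relies only on the standard COCO JSON keys (images, annotations, categories); `summarize_coco` and `load_and_summarize` are hypothetical helpers and not part of e2edet.

```python
import json
from collections import Counter


def summarize_coco(data):
    """Count images, annotations and per-category instances in COCO-style data."""
    names = {cat["id"]: cat["name"] for cat in data.get("categories", [])}
    per_category = Counter(
        names.get(ann["category_id"], "unknown") for ann in data.get("annotations", [])
    )
    return {
        "images": len(data.get("images", [])),
        "annotations": len(data.get("annotations", [])),
        "per_category": dict(per_category),
    }


def load_and_summarize(annotation_path):
    """Load a COCO annotation file from disk and summarize it."""
    with open(annotation_path, "r", encoding="utf-8") as handle:
        return summarize_coco(json.load(handle))
```

Missing keys are treated as empty lists, so a file that carries image information only reports zero annotations instead of raising an error.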

For Waymo Detection, please download the Waymo Open dataset and organize it as follows:

$E2E_DATASETS/
└── waymo/
    ├── infos/
    │   ├── dbinfos_train_1sweeps_withvelo.pkl
    │   ├── infos_train_01sweeps_filter_zero_gt.pkl
    │   └── infos_val_01sweeps_filter_zero_gt.pkl
    └── lidars/
        ├── gt_database_1sweeps_withvelo/
        │   ├── CYCLIST/
        │   ├── VEHICLE/
        │   └── PEDESTRIAN/
        ├── train/
        │   ├── annos/
        │   └── lidars/
        └── val/
            ├── annos/
            └── lidars/

You can generate data files for our training and evaluation from raw data by running create_gt_database.py and create_imdb in tools/preprocess.

Training

Our script automatically detects the number of available GPUs on a single node. It works best with a Slurm system, where it can auto-detect the number of available GPUs and nodes. The command for training BoxeR is as follows:

python tools/run.py --config ${CONFIG_PATH} --model ${MODEL_TYPE} --task ${TASK_TYPE}

For example,

COCO Detection

python tools/run.py --config e2edet/config/COCO-Detection/boxer2d_R_50_3x.yaml --model boxer2d --task detection

COCO Instance Segmentation

python tools/run.py --config e2edet/config/COCO-InstanceSegmentation/boxer2d_R_50_3x.yaml --model boxer2d --task detection

Waymo Detection

python tools/run.py --config e2edet/config/Waymo-Detection/boxer3d_pointpillar.yaml --model boxer3d --task detection3d
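
When launching several runs from a script, the command lines above can be assembled programmatically. This is only a convenience sketch: `build_run_command` is a hypothetical helper, and the trailing `key=value` override is assumed to be passed through to tools/run.py unchanged.

```python
import shlex
import sys


def build_run_command(config, model, task, overrides=None):
    """Assemble the argument list for tools/run.py."""
    command = [
        sys.executable, "tools/run.py",
        "--config", config,
        "--model", model,
        "--task", task,
    ]
    # Config overrides are appended as key=value pairs after the named flags.
    for key, value in (overrides or {}).items():
        command.append(f"{key}={value}")
    return command


command = build_run_command(
    "e2edet/config/COCO-Detection/boxer2d_R_50_3x.yaml",
    "boxer2d",
    "detection",
    overrides={"training.iter_per_update": 2},
)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` from the repository root.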

Some tips to speed up training

If your file system is slow at reading images but you have plenty of memory, consider enabling the 'cache_mode' option to load the whole dataset into memory at the beginning of training:

python tools/run.py --config ${CONFIG_PATH} --model ${MODEL_TYPE} --task ${TASK_TYPE} dataset_config.${TASK_TYPE}.cache_mode=True

If the batch does not fit in your GPU memory, consider using 'iter_per_update' to perform gradient accumulation:

python tools/run.py --config ${CONFIG_PATH} --model ${MODEL_TYPE} --task ${TASK_TYPE} training.iter_per_update=2
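
For readers new to gradient accumulation, the toy example below shows the general idea on a one-parameter model in plain Python. It illustrates the generic technique only; how e2edet scales and accumulates gradients internally is not shown here, and the function names are made up for the example.

```python
def mse_gradient(w, batch):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, batch, iter_per_update):
    """Accumulate gradients over equally sized micro-batches."""
    size = len(batch) // iter_per_update
    total = 0.0
    for step in range(iter_per_update):
        micro_batch = batch[step * size:(step + 1) * size]
        # Scale each micro-batch so that the sum matches the full-batch mean.
        total += mse_gradient(w, micro_batch) / iter_per_update
    return total


data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.0)]
full = mse_gradient(0.5, data)
accumulated = accumulated_gradient(0.5, data, iter_per_update=2)
print(abs(full - accumulated) < 1e-12)
```

With micro-batches of equal size, two half-size steps yield the same gradient as one full-size step, which is why accumulation trades extra iterations for lower peak memory. The sketch assumes the batch size is divisible by `iter_per_update`.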

Our code also supports mixed precision training. It is recommended when your GPU architecture can perform fast FP16 operations:

python tools/run.py --config ${CONFIG_PATH} --model ${MODEL_TYPE} --task ${TASK_TYPE} training.use_fp16=(float16 or bfloat16)
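
The two 16-bit formats trade range against precision differently, which can matter when choosing between them. The sketch below emulates both with the standard library only. It is an illustration, not how PyTorch converts values: bfloat16 is approximated by truncating a float32 to its top 16 bits, whereas real conversions typically round.

```python
import struct


def to_float16(value):
    """Round-trip a Python float through IEEE half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_bfloat16(value):
    """Emulate bfloat16 by keeping the top 16 bits of a float32 (truncation)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def fits_float16(value):
    """Report whether a value is within the representable range of float16."""
    try:
        struct.pack("<e", value)
    except OverflowError:
        return False
    return True


for sample in (0.1, 1000.0, 70000.0):
    half = to_float16(sample) if fits_float16(sample) else "overflow"
    print(sample, half, to_bfloat16(sample))
```

float16 keeps more mantissa bits but overflows above roughly 65504, while bfloat16 covers the float32 range with coarser precision.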

Evaluation

You can get the config file and pretrained model of BoxeR, then run the following command to evaluate it on the COCO 2017 validation/test set:

python tools/run.py --config ${CONFIG_PATH} --model ${MODEL_TYPE} --task ${TASK_TYPE} training.run_type=(val or test or val_test)

For Waymo evaluation, you need to additionally run the script e2edet/evaluate/waymo_eval.py from the root folder to get the final result.

Analysis and Visualization

You can get the statistics of BoxeR (fps, flops, # parameters) by running tools/analyze.py from the root folder.

python tools/analyze.py --config-path save/COCO-InstanceSegmentation/boxer2d_R_101_3x.yaml --model-path save/COCO-InstanceSegmentation/boxer2d_final.pth --tasks speed flop parameter

The notebook for BoxeR-2D visualization is provided in tools/visualization/BoxeR_2d_segmentation.ipynb.

Comments

How to use the retrained model to evaluate on my own test dataset

When I evaluate on my own test dataset, the evaluation does not seem to work. I only changed the data path in the config file.
2022-07-26T10:17:57 INFO: Total Parameters: 39696075. Trained Parameters: 39696075
2022-07-26T10:17:57 INFO: Starting inference on test set
2022-07-26T10:17:57 INFO: Evaluation time. Running on full test set...
2022-07-26T10:19:01 INFO: progress on test: 0/0, : --------------------, update: 0, epoch: 0, max_update: 0, num_image: 873
2022-07-26T10:19:02 INFO: The inference finished!

opened by imhumanbean 2
installation error

Hi, I installed pytorch1.10 with pip. When I install e2edet as described in the readme, I get the following error.
#######
[root@vm-0-3-tlinux /nvmelv/cwh8szh/BoxeR]# python -m pip install -e BoxeR
ERROR: BoxeR is not a valid editable requirement. It should either be a path to a local project or a VCS URL (beginning with bzr+http, bzr+https, bzr+ssh, bzr+sftp, bzr+ftp, bzr+lp, bzr+file, git+http, git+https, git+ssh, git+git, git+file, hg+file, hg+http, hg+https, hg+ssh, hg+static-http, svn+ssh, svn+http, svn+https, svn+svn, svn+file).
#######

When I install e2edet with "python setup.py develop" or "pip install -v -e .", the installation succeeds.
#######
Using /root/anaconda3/envs/cwhpy38/lib/python3.8/site-packages
Finished processing dependencies for e2edet==0.1
#######

But when I run test.py or run.py, the following errors occur, maybe a PyTorch error with the e2edet compilation.
#######
Traceback (most recent call last):
  File "tests/box_attn_test.py", line 5, in <module>
    from e2edet.module.ops import BoxAttnFunction
  File "/nvmelv/cwh8szh/BoxeR/e2edet/__init__.py", line 2, in <module>
    import e2edet.model
  File "/nvmelv/cwh8szh/BoxeR/e2edet/model/__init__.py", line 4, in <module>
    from e2edet.model.base_model import BaseDetectionModel
  File "/nvmelv/cwh8szh/BoxeR/e2edet/model/base_model.py", line 14, in <module>
    from e2edet.criterion import build_loss, build_metric
  File "/nvmelv/cwh8szh/BoxeR/e2edet/criterion/__init__.py", line 2, in <module>
    import e2edet.criterion.losses
  File "/nvmelv/cwh8szh/BoxeR/e2edet/criterion/losses.py", line 10, in <module>
    from e2edet.module.matcher import build_matcher
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/__init__.py", line 4, in <module>
    from e2edet.module.transformer import build_transformer
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/transformer.py", line 14, in <module>
    from .box_transformer import BoxTransformer
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/box_transformer.py", line 6, in <module>
    from .box_attention import BoxAttention, InstanceAttention
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/box_attention.py", line 7, in <module>
    from e2edet.module.ops import BoxAttnFunction, InstanceAttnFunction
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/ops/__init__.py", line 1, in <module>
    from e2edet.module.ops.box_attention_func import BoxAttnFunction, InstanceAttnFunction
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/ops/box_attention_func.py", line 6, in <module>
    from e2edet import ops
ImportError: /nvmelv/cwh8szh/BoxeR/e2edet/ops.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor8data_ptrIhEEPT_v
#######

#######
(cwhpy38) [root@vm-0-3-tlinux /nvmelv/cwh8szh/BoxeR]# CUDA_VISIBLE_DEVICES=0,1,2,3 python tools/run.py --config e2edet/config/COCO-Detection/boxer2d_R_50_3x.yaml --model boxer2d --task detection training.use_fp16=bfloat16
Traceback (most recent call last):
  File "tools/run.py", line 8, in <module>
    import e2edet
  File "/nvmelv/cwh8szh/BoxeR/e2edet/__init__.py", line 2, in <module>
    import e2edet.model
  File "/nvmelv/cwh8szh/BoxeR/e2edet/model/__init__.py", line 4, in <module>
    from e2edet.model.base_model import BaseDetectionModel
  File "/nvmelv/cwh8szh/BoxeR/e2edet/model/base_model.py", line 14, in <module>
    from e2edet.criterion import build_loss, build_metric
  File "/nvmelv/cwh8szh/BoxeR/e2edet/criterion/__init__.py", line 2, in <module>
    import e2edet.criterion.losses
  File "/nvmelv/cwh8szh/BoxeR/e2edet/criterion/losses.py", line 10, in <module>
    from e2edet.module.matcher import build_matcher
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/__init__.py", line 4, in <module>
    from e2edet.module.transformer import build_transformer
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/transformer.py", line 14, in <module>
    from .box_transformer import BoxTransformer
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/box_transformer.py", line 6, in <module>
    from .box_attention import BoxAttention, InstanceAttention
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/box_attention.py", line 7, in <module>
    from e2edet.module.ops import BoxAttnFunction, InstanceAttnFunction
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/ops/__init__.py", line 1, in <module>
    from e2edet.module.ops.box_attention_func import BoxAttnFunction, InstanceAttnFunction
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/ops/box_attention_func.py", line 6, in <module>
    from e2edet import ops
ImportError: /nvmelv/cwh8szh/BoxeR/e2edet/ops.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor8data_ptrIhEEPT_v
#######

opened by Splendon 2
TypeError: meshgrid() got an unexpected keyword argument 'indexing'

Loaded pretrained resnet:
Traceback (most recent call last):
  File "tools/run.py", line 82, in <module>
    run()
  File "tools/run.py", line 71, in run
    torch.multiprocessing.spawn(
  File "/root/anaconda3/envs/cwhpy38/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/anaconda3/envs/cwhpy38/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/root/anaconda3/envs/cwhpy38/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/root/anaconda3/envs/cwhpy38/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/nvmelv/cwh8szh/BoxeR/tools/run.py", line 40, in distributed_main
    main(configuration, init_distributed=True)
  File "/nvmelv/cwh8szh/BoxeR/tools/run.py", line 29, in main
    trainer.load()
  File "/nvmelv/cwh8szh/BoxeR/e2edet/trainer/base_trainer.py", line 57, in load
    self.load_model_and_optimizer()
  File "/nvmelv/cwh8szh/BoxeR/e2edet/trainer/base_trainer.py", line 103, in load_model_and_optimizer
    self.model = build_model(self.config, num_classes)
  File "/nvmelv/cwh8szh/BoxeR/e2edet/model/__init__.py", line 22, in build_model
    model.build()
  File "/nvmelv/cwh8szh/BoxeR/e2edet/model/base_model.py", line 46, in build
    self._build()
  File "/nvmelv/cwh8szh/BoxeR/e2edet/model/boxer2d.py", line 54, in _build
    self.transformer = build_transformer(self.config.transformer)
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/transformer.py", line 392, in build_transformer
    return BoxTransformer(
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/box_transformer.py", line 33, in __init__
    encoder_layer = BoxTransformerEncoderLayer(
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/box_transformer.py", line 312, in __init__
    self.self_attn = BoxAttention(d_model, nlevel, nhead)
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/box_attention.py", line 168, in __init__
    self._create_kernel_indices(kernel_size, "kernel_indices")
  File "/nvmelv/cwh8szh/BoxeR/e2edet/module/box_attention.py", line 182, in _create_kernel_indices
    i, j = torch.meshgrid(indices, indices, indexing="ij")
TypeError: meshgrid() got an unexpected keyword argument 'indexing'

opened by Splendon 2
Scripts of running Nuscenes dataset

Hi, thank you very much for this SOTA work. I have run BoxeR on coco2017 successfully with pytorch1.8.2. Since the Waymo dataset is too large, I would like to run BoxeR on the nuScenes dataset. Are there any scripts for the nuScenes dataset and 2D/3D config.yaml files?

opened by Splendon 1
AttributeError: 'NoneType' object has no attribute 'task'

python tools/run.py --config e2edet/config/COCO-Detection/boxer2d_R_50_3x.yaml --model boxer2d --task detection
include path: /nvmelv/cwh8szh/BoxeR/e2edet/config/base_boxer2d_detection.yaml
Traceback (most recent call last):
  File "tools/run.py", line 82, in <module>
    run()
  File "tools/run.py", line 47, in run
    configuration = Configuration(args)
  File "/nvmelv/cwh8szh/BoxeR/e2edet/utils/configuration.py", line 72, in __init__
    self._update_specific(self.config, args)
  File "/nvmelv/cwh8szh/BoxeR/e2edet/utils/configuration.py", line 244, in _update_specific
    config.task = args.task
AttributeError: 'NoneType' object has no attribute 'task'

opened by Splendon 1
Bug

I appreciate your work, but I found some bugs when using your code. I wonder if you will fix and release a new version in the future?

my env

It seems that you missed return config here in your released version.

opened by Tegala 1
CUDA out of memory.

I have built this project successfully. But when I run the test programs, they report running out of memory. Does that mean the project cannot run on my laptop?

opened by lymdove 1
Instance Attention

Hello, I would like to know whether instance attention could be added to standard transformers such as ViT. Also, could you please upload the saved models if possible (to [email protected])? Thanks a lot!

opened by Git-Hoon 3
Backbone of Swin-Transformer

Hi, the BoxeR paper shows results for ResNet-50 and ResNet-101, but not for other backbones such as ResNeXt and Swin Transformer. Are there any plans to open-source the results, code, and configs of BoxeR based on ResNeXt and Swin Transformer? Thank you very much for this nice BoxeR, I really like it.

opened by Splendon 1
matrix contains invalid numeric entries

When the training process reaches the 33rd epoch, the following error is reported (xxxxxxx denotes a folder):

-- Process 4 terminated with the following error:
Traceback (most recent call last):
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/ma-user/xxxxxxx/user-job-dir/BoxeR/tools/run.py", line 41, in distributed_main
    main(configuration, init_distributed=True)
  File "/home/ma-user/xxxxxxx/user-job-dir/BoxeR/tools/run.py", line 31, in main
    trainer.train()
  File "/home/ma-user/xxxxxxx/user-job-dir/BoxeR/e2edet/trainer/base_trainer.py", line 218, in train
    train_epoch(0, self)
  File "/home/ma-user/xxxxxxx/user-job-dir/BoxeR/e2edet/trainer/engine.py", line 171, in train_epoch
    output, _ = _forward("train", batch, model, trainer)
  File "/home/ma-user/xxxxxxx/user-job-dir/BoxeR/e2edet/trainer/engine.py", line 208, in _forward
    output = model(sample, target)
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/ma-user/xxxxxxx/user-job-dir/BoxeR/e2edet/model/base_model.py", line 140, in __call__
    loss_dict = self.losses(model_output, target)
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ma-user/xxxxxxx/user-job-dir/BoxeR/e2edet/criterion/losses.py", line 496, in forward
    indices = self.matcher(enc_outputs, bin_targets)
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/ma-user/xxxxxxx/user-job-dir/BoxeR/e2edet/module/matcher.py", line 136, in forward
    linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))
  File "/home/ma-user/xxxxxxx/user-job-dir/BoxeR/e2edet/module/matcher.py", line 136, in <listcomp>
    linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/scipy/optimize/_lsap.py", line 93, in linear_sum_assignment
    raise ValueError("matrix contains invalid numeric entries")
ValueError: matrix contains invalid numeric entries
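
SciPy raises this ValueError when the cost matrix passed to linear_sum_assignment contains NaN or infinite values, so a first debugging step could be to locate those entries before the matcher is called. The helper below is a hypothetical sketch on plain nested lists. In e2edet the cost matrix is a tensor, so it would have to be converted first (for example with `.tolist()`), which is an assumption about how one might hook this in.

```python
import math


def non_finite_entries(matrix):
    """Return (row, column, value) for every NaN or infinite entry."""
    return [
        (r, c, value)
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if not math.isfinite(value)
    ]


cost = [[0.2, float("nan")], [float("inf"), 1.5]]
for row, column, value in non_finite_entries(cost):
    print(f"cost[{row}][{column}] = {value}")
```

Knowing which predictions or targets produce the invalid costs can help tell a diverged loss apart from a malformed ground-truth box.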

opened by mountain111 3


