[ACM MM 2021] Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)


Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation) [arXiv] [paper]

@inproceedings{hou2021multiview,
  title={Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)},
  author={Hou, Yunzhong and Zheng, Liang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia (MM ’21)},
  year={2021}
}

Overview

We release the PyTorch code for MVDeTr, a state-of-the-art multiview pedestrian detector. Its performance gains come from three ingredients: the shadow transformer architecture, updated loss terms, and view-coherent data augmentations. MVDeTr is also efficient, training on a single RTX 2080 Ti. This repo additionally includes a simplified version of MVDet, which likewise runs on a single RTX 2080 Ti.

Content

MVDeTr Code

This repo is dedicated to the code for MVDeTr.

Dependencies

This code uses the following libraries (an install sketch follows the list):

  • python
  • pytorch & torchvision
  • numpy
  • matplotlib
  • pillow
  • opencv-python
  • kornia
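
If you are starting from a fresh environment, something like the following should cover these dependencies (a sketch: versions are not pinned here, and the CUDA build in Code Preparation below may be sensitive to the installed PyTorch version):

pip install torch torchvision numpy matplotlib pillow opencv-python kornia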

Data Preparation

By default, all datasets are in ~/Data/. We use MultiviewX and Wildtrack in this project.

Your ~/Data/ folder should look like this:

Data
├── MultiviewX/
│   └── ...
└── Wildtrack/ 
    └── ...
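
If the datasets already live elsewhere on disk, a minimal sketch like the following places them under ~/Data/ via symlinks (the source paths below are placeholders for wherever your copies actually are):

import os

# placeholder paths: point these at your existing dataset copies
datasets = {'Wildtrack': '/path/to/Wildtrack', 'MultiviewX': '/path/to/MultiviewX'}
data_root = os.path.expanduser('~/Data')
os.makedirs(data_root, exist_ok=True)
for name, src in datasets.items():
    dst = os.path.join(data_root, name)
    if not os.path.exists(dst):
        os.symlink(src, dst)  # makes ~/Data/<name> point at the existing copy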

Code Preparation

Before running the code, go to multiview_detector/models/ops and run bash make.sh to build the CUDA ops for the deformable transformer (forked from Deformable DETR).
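
For example (assuming a CUDA-capable environment; test.py is the sanity-check script that ships with Deformable DETR's ops folder, so it should be present if the fork keeps it):

cd multiview_detector/models/ops
bash make.sh
python test.py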

Training

To train the detector, run one of the following:

python main.py -d wildtrack
python main.py -d multiviewx

Training should conclude with automatic evaluation, producing results similar to the reported 91.5% MODA (Multiple Object Detection Accuracy) on the Wildtrack dataset and 93.7% MODA on the MultiviewX dataset.

Architectures

This repo supports multiple architecture variants. For MVDeTr, specify --world_feat deform_trans; for a fully convolutional architecture similar to MVDet, specify --world_feat conv.
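
For example:

python main.py -d wildtrack --world_feat deform_trans
python main.py -d wildtrack --world_feat conv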

Loss terms

This repo supports multiple loss terms. For the focal loss variant used in MVDeTr, specify --use_mse 0; for the MSE loss used in MVDet, specify --use_mse 1.
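
For instance, on MultiviewX:

python main.py -d multiviewx --use_mse 0
python main.py -d multiviewx --use_mse 1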

Augmentations

This repo includes support for view-coherent data augmentation, which applies affine transformations to the per-view inputs and then applies the inverse transformations to the per-view feature maps, maintaining multiview coherency.
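
As a minimal sketch of this idea (not the repo's exact implementation: feat_extractor is a placeholder backbone, and the matrix bookkeeping assumes the feature map is a uniformly strided version of the input), using kornia's affine warps:

import kornia.geometry.transform as KT

def view_coherent_augment(img, feat_extractor, aug_mat):
    # img: (B, 3, H, W) per-view inputs; aug_mat: (B, 2, 3) sampled affine matrices
    _, _, H, W = img.shape
    # 1. augment each view with its own affine transformation
    img_aug = KT.warp_affine(img, aug_mat, dsize=(H, W))
    # 2. extract per-view features from the augmented views
    feat = feat_extractor(img_aug)  # (B, C, Hf, Wf)
    _, _, Hf, Wf = feat.shape
    # 3. rescale the translation terms to feature-map resolution
    feat_mat = aug_mat.clone()
    feat_mat[:, 0, 2] *= Wf / W
    feat_mat[:, 1, 2] *= Hf / H
    # 4. undo the augmentation on the features so all views stay aligned
    #    for the later multiview (BEV) aggregation
    inv_mat = KT.invert_affine_transform(feat_mat)
    return KT.warp_affine(feat, inv_mat, dsize=(Hf, Wf))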

Pre-trained models

You can download the checkpoints at this link.

Comments
  • Confused about the use of reference points

    https://github.com/hou-yz/MVDeTr/blob/87783a7f7dcf77ea96760e4c49478c151121cd0e/multiview_detector/models/mvdetr.py#L130

    Do the reference points correspond to the 2D pixels of the BEV grid? When the world feature is obtained from the camera features (B * N * C * H * W), the deformable transformer is applied to that feature, so the final features have nothing to do with the image features. What do the reference points mean here?

    opened by ucasqcz 3
  • Regarding img_reduce

    Hello, I just wanted to ask about the significance of img_reduce being 12.

    Assuming we use Wildtrack, the images will be 1920 * 1080. The data loading part of the code resizes this to 1280 * 720 (2/3 of the original size), coded as multiplying by 8 and dividing by img_reduce (= 12).

    The only time the image is actually shrunk by a factor of 12 seems to be in the visualization code (I may be wrong).

    In addition, for line 157 in mvdetr.py, I am having a bit of trouble understanding how this counteracts the effects of the augmentation we performed on the original image.

    Thank you and I look forward to hearing from you soon.

    opened by luorix1 3
  • Issue when training MVDeTr

    Error while training MVDeTr: a size mismatch inside the transform module.

    Traceback (most recent call last):
      File "main.py", line 186, in <module>
        main(args)
      File "main.py", line 132, in main
        train_loss = trainer.train(epoch, train_loader, optimizer, scaler, scheduler)
      File "/workspace/MVDeTr/multiview_detector/trainer.py", line 51, in train
        (world_heatmap, world_offset), (imgs_heatmap, imgs_offset, imgs_wh) = self.model(data, affine_mats)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/MVDeTr/multiview_detector/models/mvdetr.py", line 202, in forward
        world_feat = self.world_feat(world_feat, visualize=visualize)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/MVDeTr/multiview_detector/models/trans_world_feat.py", line 98, in forward
        memory = self.encoder(src_flatten, spatial_shapes, level_start_index, valid_ratios, lvl_pos_embed_flatten)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/MVDeTr/multiview_detector/models/deformable_transformer.py", line 50, in forward
        output = layer(output, pos, reference_points, spatial_shapes, level_start_index, padding_mask)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/MVDeTr/multiview_detector/models/deformable_transformer.py", line 77, in forward
        src2 = self.self_attn(self.with_pos_embed(src, pos), reference_points, src, spatial_shapes, level_start_index,
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/MVDeTr/multiview_detector/models/ops/modules/ms_deform_attn.py", line 104, in forward
        sampling_locations = reference_points[:, :, None, :, None, :] \
    RuntimeError: The size of tensor a (7) must match the size of tensor b (8) at non-singleton dimension 3
    
    opened by luorix1 2
Owner

Yunzhong Hou, a PhD student at ANU.