End-to-End Object Detection with Fully Convolutional Network

Last update: Dec 22, 2022

This project provides a PyTorch implementation of "End-to-End Object Detection with Fully Convolutional Network".

The experiments in the paper were conducted on an internal framework, so we reimplemented them on cvpods and report the details below.

Requirements

cvpods
scipy >= 1.5.4

Get Started

Install cvpods locally (requires CUDA to compile)

python3 -m pip install 'git+https://github.com/Megvii-BaseDetection/cvpods.git'
# (add --user if you don't have permission)

# Or, to install it from a local clone:
git clone https://github.com/Megvii-BaseDetection/cvpods.git
python3 -m pip install -e cvpods

# Or, from inside the cloned cvpods directory:
pip install -r requirements.txt
python3 setup.py build develop

Prepare datasets

cd /path/to/cvpods
cd datasets
ln -s /path/to/your/coco/dataset coco
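
Before training, it can help to confirm that the symlink resolves and that the expected sub-folders are present. The sketch below is not part of cvpods: the helper name `check_coco_link` is hypothetical, and the directory names (`annotations`, `train2017`, `val2017`) assume the usual COCO2017 layout, so adjust them to match your copy of the dataset.

```python
from pathlib import Path

# Assumed COCO2017 layout; not taken from the cvpods documentation.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017")


def check_coco_link(datasets_dir, name="coco", expected=EXPECTED_SUBDIRS):
    """Return a list of problems found under datasets_dir/name (empty if none)."""
    root = Path(datasets_dir) / name
    if not root.exists():
        # exists() follows symlinks, so this covers a missing link and a dangling one.
        return ["{} does not exist or is a broken symlink".format(root)]
    return [
        "missing directory: {}".format(root / sub)
        for sub in expected
        if not (root / sub).is_dir()
    ]


if __name__ == "__main__":
    # Placeholder path, as in the commands above.
    for problem in check_coco_link("/path/to/cvpods/datasets"):
        print(problem)
```

An empty return value means every assumed sub-folder was found through the link.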

Train & Test

git clone https://github.com/Megvii-BaseDetection/DeFCN.git
cd DeFCN/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms  # for example

# Train
pods_train --num-gpus 8

# Test (the MODEL.WEIGHTS and OUTPUT_DIR overrides are optional)
pods_test --num-gpus 8 \
    MODEL.WEIGHTS /path/to/your/save_dir/ckpt.pth \
    OUTPUT_DIR /path/to/your/save_dir

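# Multi-node training
# (ifconfig, provided by the net-tools package, can help look up MASTER_IP: sudo apt install net-tools)
# Run on every machine, setting --machine-rank to that machine's index (0 to N-1)
pods_train --num-gpus 8 --num-machines N --machine-rank 0/1/.../N-1 --dist-url "tcp://MASTER_IP:port"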
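
For multi-node runs, each machine needs the same command with a different `--machine-rank`. The following is a minimal sketch that prints one command per machine; the helper name `multi_node_commands`, the IP address, and the port are placeholders chosen for illustration, and only the flags shown in the command above are used.

```python
import shlex


def multi_node_commands(num_machines, master_ip, port, gpus_per_machine=8):
    """Build one pods_train command line per machine rank (0 .. N-1)."""
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")
    dist_url = "tcp://{}:{}".format(master_ip, port)
    commands = []
    for rank in range(num_machines):
        args = [
            "pods_train",
            "--num-gpus", str(gpus_per_machine),
            "--num-machines", str(num_machines),
            "--machine-rank", str(rank),
            "--dist-url", dist_url,
        ]
        # Quote each argument so the line can be pasted into a shell safely.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


# Placeholder address and port; replace with the master node's real values.
for cmd in multi_node_commands(2, "10.0.0.1", 12345):
    print(cmd)
```

Every machine uses the same `--dist-url`, which points at the master node; only the rank differs.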

Results on COCO2017 val set

model	assignment	with NMS	lr sched.	mAP	mAR	download
FCOS	one-to-many	Yes	3x + ms	41.4	59.1	weight | log
FCOS baseline	one-to-many	Yes	3x + ms	40.9	58.4	weight | log
Anchor	one-to-one	No	3x + ms	37.1	60.5	weight | log
Center	one-to-one	No	3x + ms	35.2	61.0	weight | log
Foreground Loss	one-to-one	No	3x + ms	38.7	62.2	weight | log
POTO	one-to-one	No	3x + ms	39.2	61.7	weight | log
POTO + 3DMF	one-to-one	No	3x + ms	40.6	61.6	weight | log
POTO + 3DMF + Aux	mixture*	No	3x + ms	41.4	61.5	weight | log

* We adopt a one-to-one assignment in POTO and a one-to-many assignment in the auxiliary loss, respectively.

The paper adopts a 2x + ms schedule, but we adopt a 3x + ms schedule here to achieve higher performance.
It is normal to observe ~0.3 AP of noise in POTO.
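
When comparing rows of the table, the quoted noise level gives a rough threshold for whether a gap is meaningful. The snippet below is only an illustrative sketch: the function name `exceeds_noise` is hypothetical, the mAP values are copied from the COCO table above, and treating 0.3 AP as a hard cut-off is a simplifying assumption rather than a statistical test.

```python
NOISE_AP = 0.3  # run-to-run noise quoted for POTO

# mAP values copied from the COCO2017 val table.
COCO_MAP = {
    "FCOS": 41.4,
    "POTO": 39.2,
    "POTO + 3DMF": 40.6,
    "POTO + 3DMF + Aux": 41.4,
}


def exceeds_noise(ap_a, ap_b, noise=NOISE_AP):
    """True if the gap between two AP values is larger than the noise level."""
    return abs(ap_a - ap_b) > noise


print(exceeds_noise(COCO_MAP["POTO + 3DMF"], COCO_MAP["POTO"]))        # True
print(exceeds_noise(COCO_MAP["POTO + 3DMF + Aux"], COCO_MAP["FCOS"]))  # False
```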

Results on CrowdHuman val set

model	assignment	with NMS	lr sched.	AP50	mMR	recall	download
FCOS	one-to-many	Yes	30k iters	86.1	54.9	94.2	weight | log
ATSS	one-to-many	Yes	30k iters	87.2	49.7	94.0	weight | log
POTO	one-to-one	No	30k iters	88.5	52.2	96.3	weight | log
POTO + 3DMF	one-to-one	No	30k iters	88.8	51.0	96.6	weight | log
POTO + 3DMF + Aux	mixture*	No	30k iters	89.1	48.9	96.5	weight | log

* We adopt a one-to-one assignment in POTO and a one-to-many assignment in the auxiliary loss, respectively.

It is normal to observe ~0.3 AP of noise in POTO, and ~1.0 mMR of noise in all methods.

Ablations on COCO2017 val set

model	assignment	with NMS	lr sched.	mAP	mAR	note
POTO	one-to-one	No	6x + ms	40.0	61.9
POTO	one-to-one	No	9x + ms	40.2	62.3
POTO	one-to-one	No	3x + ms	39.2	61.1	replace Hungarian algorithm by `argmax`
POTO + 3DMF	one-to-one	No	3x + ms	40.9	62.0	remove GN in 3DMF
POTO + 3DMF + Aux	mixture*	No	3x + ms	41.5	61.5	remove GN in 3DMF

* We adopt a one-to-one assignment in POTO and a one-to-many assignment in the auxiliary loss, respectively.

For one-to-one assignment, more training iterations lead to higher performance.
The argmax (also known as top-1) operation is indeed an approximate solution to bipartite matching in dense prediction methods.
Removing GN from 3DMF appears harmless, and it also leads to higher inference speed.
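
To make the argmax remark concrete, here is a small, self-contained sketch that compares an exact one-to-one matching with the top-1 shortcut. It is not the repository's implementation: the quality scores are made up, the function names are hypothetical, and the exact matching is found by brute force over permutations (the Hungarian algorithm solves the same problem in polynomial time), which is only practical for tiny examples.

```python
from itertools import permutations


def optimal_one_to_one(quality):
    """Brute-force bipartite matching: every ground truth (row) gets a distinct
    prediction (column) so that the summed quality is maximal."""
    num_gt, num_pred = len(quality), len(quality[0])
    best = max(
        permutations(range(num_pred), num_gt),
        key=lambda cols: sum(quality[g][c] for g, c in enumerate(cols)),
    )
    return list(best)


def argmax_assign(quality):
    """Top-1 shortcut: each ground truth independently takes its best prediction."""
    return [max(range(len(row)), key=row.__getitem__) for row in quality]


# Rows are ground-truth boxes, columns are predictions (made-up scores).
separated = [[0.9, 0.2, 0.1, 0.0],
             [0.1, 0.3, 0.8, 0.2]]
crowded = [[0.9, 0.6, 0.1, 0.0],
           [0.8, 0.7, 0.2, 0.1]]

print(optimal_one_to_one(separated), argmax_assign(separated))  # [0, 2] [0, 2]
print(optimal_one_to_one(crowded), argmax_assign(crowded))      # [0, 1] [0, 0]
```

When the ground truths prefer different predictions, the two strategies agree. They differ only when several ground truths compete for the same prediction, as in the `crowded` case, which is why argmax is an approximation rather than an exact replacement.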

Acknowledgement

This repo is developed based on cvpods. Please check cvpods for more details and features.

License

This repo is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Citing

If you use this work in your research or wish to refer to the baseline results published here, please use the following BibTeX entry:

@article{wang2020end,
  title   =  {End-to-End Object Detection with Fully Convolutional Network},
  author  =  {Wang, Jianfeng and Song, Lin and Li, Zeming and Sun, Hongbin and Sun, Jian and Zheng, Nanning},
  journal =  {arXiv preprint arXiv:2012.03544},
  year    =  {2020}
}

Contributing to the project

Pull requests and issues about the implementation are welcome. If you have an issue with the library itself (e.g. installation, environments), please refer to cvpods.

Comments

Does it support a limited set of GPU indices?

There are only 2 GPUs free on my machine right now, so I tried to limit the GPU indices and count as follows, and it raised this error:

(open-mmlab) liangtian@node001:~/project/DeFCN/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms.3dmf$ CUDA_VISIBLE_DEVICES=4,5 pods_train --num-gpus 2
Traceback (most recent call last):
  File "/mnt/xfs1/home/liangtian/project/cvpods/tools/train_net.py", line 27, in <module>
    from cvpods.checkpoint import DetectionCheckpointer
  File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/__init__.py", line 3, in <module>
    from .utils import setup_environment
  File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/utils/__init__.py", line 27, in <module>
    from .visualizer import ColorMode, VideoVisualizer, VisImage, Visualizer, colormap, random_color
  File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/utils/visualizer/__init__.py", line 5, in <module>
    from .video_visualizer import *
  File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/utils/visualizer/video_visualizer.py", line 6, in <module>
    from .visualizer import ColorMode, Visualizer, _create_text_labels, _PanopticPrediction
  File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/utils/visualizer/visualizer.py", line 17, in <module>
    from cvpods.structures import BitMasks, Boxes, BoxMode, Keypoints, PolygonMasks, RotatedBoxes
  File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/structures/__init__.py", line 2, in <module>
    from .boxes import Boxes, BoxMode, pairwise_ioa, pairwise_iou
  File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/structures/boxes.py", line 11, in <module>
    from cvpods.layers import cat
  File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/layers/__init__.py", line 4, in <module>
    from .deform_conv import DeformConv, ModulatedDeformConv
  File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/layers/deform_conv.py", line 11, in <module>
    from cvpods import _C
ImportError: /mnt/xfs1/home/liangtian/project/cvpods/cvpods/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN6cvpods26psroi_pooling_forward_cudaERN2at6TensorES2_S2_iifii

opened by rainylt 5
ImportError: cannot import name 'config' from 'config'

Hi, when I try to run the command 'pods_test' or 'pods_train', I get the following error:

Traceback (most recent call last):
  File "xxx/cvpods/tools/test_net.py", line 192, in <module>
    from config import config  # isort:skip  # noqa: E402

Any suggestions on how to fix this? Many thanks.

opened by daniel-qian 4
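About the CrowdHuman dataset

Hello! Because cvpods is highly encapsulated, there are a few things I could not work out. Does DeFCN train on the visible box or the full box from CrowdHuman? The config file sets num_classes=1, so is the mask class removed or converted to person? Is the box format the coordinates of two points (xmin, ymin, xmax, ymax), or the center coordinates plus width and height?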

opened by Asthestarsfalll 4
Trouble exporting the DeFCN PyTorch model to ONNX

I want to export the DeFCN model to ONNX to get a better look at its details, so I tried to export it by adding some code, but I get an error. The file is DeFCN/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms.3dmf_wo_gn.aux/net.py.
I renamed poto.res50.fpn.coco.800size.3x_ms.3dmf_wo_gn.aux to poto_res50_fpn_coco_800size_3x_ms_3dmf_wo_gn_aux so that it can be imported, added the following code, and ran it:

from cvpods.engine import default_argument_parser, default_setup
from DeFCN.playground.detection.coco.poto_res50_fpn_coco_800size_3x_ms_3dmf_wo_gn_aux.config import config

args = default_argument_parser().parse_args()
print("Command Line Args:", args)
config.link_log()
print("soft link to {}".format(config.OUTPUT_DIR))
cfg, logger = default_setup(config, args)
model = build_model(cfg)

model.eval()
input = torch.randn((1, 3, 48, 48))
input = torch.rand(1)
output = model(input)
print(output.shape)
input_names = ['input']
output_names = ['output']
torch.onnx.export(model, input, '1.onnx', input_names=input_names, output_names=output_names, verbose=True, opset_version=11)

but I get the following error:

[09/22 11:36:18 c2.utils.env.env]: Using a generated random seed 18403068
Traceback (most recent call last):
  File "/media/liang/linux_data/github_download/cvpods-master/DeFCN/playground/detection/coco/poto_res50_fpn_coco_800size_3x_ms_3dmf_wo_gn_aux/net.py", line 66, in <module>
    output=model(input)
  File "/home/liang/miniconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/liang/linux_data/github_download/cvpods-master/DeFCN/playground/detection/coco/poto_res50_fpn_coco_800size_3x_ms_3dmf_wo_gn_aux/fcos.py", line 117, in forward
    images = self.preprocess_image(batched_inputs)
  File "/media/liang/linux_data/github_download/cvpods-master/DeFCN/playground/detection/coco/poto_res50_fpn_coco_800size_3x_ms_3dmf_wo_gn_aux/fcos.py", line 546, in preprocess_image
    images = [x["image"].to(self.device) for x in batched_inputs]
  File "/media/liang/linux_data/github_download/cvpods-master/DeFCN/playground/detection/coco/poto_res50_fpn_coco_800size_3x_ms_3dmf_wo_gn_aux/fcos.py", line 546, in <listcomp>
    images = [x["image"].to(self.device) for x in batched_inputs]
IndexError: too many indices for tensor of dimension 0

Process finished with exit code 1
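
The last frame of the traceback iterates over `batched_inputs` and reads `x["image"]` from each element, which suggests the model expects a list of dicts rather than a bare tensor. The sketch below is a hypothetical helper (not part of cvpods or DeFCN) that checks for that structure before the model is called. It only addresses the IndexError, not the ONNX export itself, and the model may require further keys that are not visible in the traceback.

```python
def check_batched_inputs(batched_inputs):
    """Raise a descriptive error unless the input is a list of dicts with an "image" entry."""
    if not isinstance(batched_inputs, (list, tuple)):
        raise TypeError(
            "expected a list of dicts, got {}".format(type(batched_inputs).__name__)
        )
    for index, item in enumerate(batched_inputs):
        if not isinstance(item, dict) or "image" not in item:
            raise TypeError(
                'item {} must be a dict with an "image" entry'.format(index)
            )
    return batched_inputs


# Hypothetical usage inside the script above (requires torch; the shape is illustrative):
#   batched_inputs = [{"image": torch.randn(3, 48, 48)}]
#   output = model(check_batched_inputs(batched_inputs))
```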

opened by liang532 4
Problem when not using center sampling (# no center sampling, it will use all the locations within a ground-truth box)

"nan" values appear quite easily in box_delta when center sampling is not used. Is this normal, or am I missing something? Thanks for your great work.

opened by b03505036 2
Confusions about the paper

Aux_loss contradicts the core insight of the paper. Concretely, the insight of the paper is to design a one-to-one label assignment method, whereas aux_loss is a one-to-many assignment method. So the proposed method could also be realized by using ATSS together with a proper one-to-one label assignment (e.g., POTO in this paper). No corresponding results verify which loss is more important to the performance.

opened by SunSet0864 2
mAP is the same after more than 500,000 steps

Hi there,

I am facing a strange situation. I ran inference at step 1,000,000 out of 2,000,000 on a model trained with the same config as poto.res50.fpn.coco.800size.3x_ms.3dmf_wo_gn.aux. The problem is that I cannot see any difference in the inference results when I run it again at step 1,500,000. The results of the inference are the same.

Do you know why this might happen?

Thank you.

opened by attilab97 1
request

I am very interested in your research direction. Could you please send me a copy of your training code? I promise it will never be used for commercial purposes.

opened by yxx-byte 1
Issue during inference using POTO

NMS post-processing is not used in POTO. The code at https://github.com/Megvii-BaseDetection/DeFCN/blob/a82393e290455fd11a1a088723a8050791b44c15/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms.3dmf/fcos.py#L407 says that topk_candidates and score_threshold are useless in POTO. How should the function "inference_single_image" be modified when using POTO? Could you share some ideas? Thank you.

opened by SunSet0864 1
Issue about 3DMF

The filtered feature after 3DMF is added to the original one to form the final output:

https://github.com/Megvii-BaseDetection/DeFCN/blob/a82393e290455fd11a1a088723a8050791b44c15/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms.3dmf/fcos.py#L607

This seems somewhat contradictory to the purpose of 3DMF, which is to filter out the "unnecessary" points in the feature. If the original feature is used to form the final output, the output will not be a sparse feature map, which may adversely affect the results when no NMS is applied.

opened by Hwang64 1
Assertion issue

In the function apply_deltas (/cvpods/modeling/box_regression.py), there is an assertion at the beginning of the function. I assume there is a reason for this assertion, and that the deltas may sometimes be infinite. Could you share your experience debugging the code here?

opened by Hwang64 1
Multilabel classification

@zengarden Hi, thanks for open-sourcing the code base. I have one query: can the current architecture be modified to perform single bounding box detection with multi-label classification? If so, what modifications would be needed? e.g.:

Thanks in advance

opened by abhigoku10 0

Owner

BaseDetection Team of Megvii

