End-to-End Object Detection with Fully Convolutional Network

This project provides an implementation of "End-to-End Object Detection with Fully Convolutional Network" in PyTorch.

The experiments in the paper were run on an internal framework, so we reimplemented them on cvpods and report the details below.

Requirements

Get Started

  • Install cvpods locally (requires CUDA to compile)
python3 -m pip install 'git+https://github.com/Megvii-BaseDetection/cvpods.git'
# (add --user if you don't have permission)

# Or, to install it from a local clone:
git clone https://github.com/Megvii-BaseDetection/cvpods.git
python3 -m pip install -e cvpods

# Or,
pip install -r requirements.txt
python3 setup.py build develop
  • prepare datasets
cd /path/to/cvpods
cd datasets
ln -s /path/to/your/coco/dataset coco
  • Train & Test
git clone https://github.com/Megvii-BaseDetection/DeFCN.git
cd DeFCN/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms  # for example

# Train
pods_train --num-gpus 8

# Test (the MODEL.WEIGHTS and OUTPUT_DIR overrides are both optional)
pods_test --num-gpus 8 \
    MODEL.WEIGHTS /path/to/your/save_dir/ckpt.pth \
    OUTPUT_DIR /path/to/your/save_dir

# Multi-node training (run on each of the N machines; --machine-rank is that machine's index)
## sudo apt install net-tools ifconfig
pods_train --num-gpus 8 --num-machines N --machine-rank 0/1/.../N-1 --dist-url "tcp://MASTER_IP:port"
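
The multi-node command above has to be launched once per machine, with only the rank changing. As a minimal sketch, the helper below prints one command line per rank using the flags shown above. The function name, the IP address, and the port are hypothetical placeholders, not part of cvpods.

```python
def multi_node_commands(num_machines, master_ip, port, gpus_per_machine=8):
    """Build one pods_train command line per machine rank (0 .. num_machines-1)."""
    dist_url = f"tcp://{master_ip}:{port}"
    return [
        f"pods_train --num-gpus {gpus_per_machine} "
        f"--num-machines {num_machines} --machine-rank {rank} "
        f'--dist-url "{dist_url}"'
        for rank in range(num_machines)
    ]


# Hypothetical two-machine setup; replace the IP and port with your own.
for cmd in multi_node_commands(num_machines=2, master_ip="10.0.0.1", port=29500):
    print(cmd)
```

Every machine uses the same --dist-url, which points at the machine with rank 0.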

Results on COCO2017 val set

model               assignment    with NMS   lr sched.   mAP    mAR    download
FCOS                one-to-many   Yes        3x + ms     41.4   59.1   weight | log
FCOS baseline       one-to-many   Yes        3x + ms     40.9   58.4   weight | log
Anchor              one-to-one    No         3x + ms     37.1   60.5   weight | log
Center              one-to-one    No         3x + ms     35.2   61.0   weight | log
Foreground Loss     one-to-one    No         3x + ms     38.7   62.2   weight | log
POTO                one-to-one    No         3x + ms     39.2   61.7   weight | log
POTO + 3DMF         one-to-one    No         3x + ms     40.6   61.6   weight | log
POTO + 3DMF + Aux   mixture*      No         3x + ms     41.4   61.5   weight | log

* We adopt a one-to-one assignment in POTO and a one-to-many assignment in the auxiliary loss, respectively.

  • The paper uses a 2x + ms schedule; we use a 3x + ms schedule here to achieve higher performance.
  • Run-to-run noise of about 0.3 AP is normal for POTO.
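
The "with NMS" column refers to non-maximum suppression, the post-processing step that one-to-many detectors rely on to drop duplicate boxes. For readers unfamiliar with it, here is a minimal pure-Python sketch of standard greedy NMS. It is not the cvpods implementation, and the boxes, scores, and the 0.6 threshold are illustrative assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.6):
    """Return indices of kept boxes, visiting boxes from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes on one object plus one separate box (made-up values).
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The rows marked "No" skip this step entirely, which is what makes them end-to-end.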

Results on CrowdHuman val set

model               assignment    with NMS   lr sched.   AP50   mMR    recall   download
FCOS                one-to-many   Yes        30k iters   86.1   54.9   94.2     weight | log
ATSS                one-to-many   Yes        30k iters   87.2   49.7   94.0     weight | log
POTO                one-to-one    No         30k iters   88.5   52.2   96.3     weight | log
POTO + 3DMF         one-to-one    No         30k iters   88.8   51.0   96.6     weight | log
POTO + 3DMF + Aux   mixture*      No         30k iters   89.1   48.9   96.5     weight | log

* We adopt a one-to-one assignment in POTO and a one-to-many assignment in the auxiliary loss, respectively.

  • Run-to-run noise of about 0.3 AP is normal for POTO, and about 1.0 mMR for all methods.
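
One way to read the noise note together with the table is to check whether the gap between two rows is larger than the stated noise. The sketch below does this for mMR using the values from the CrowdHuman table above. Treating the noise figure as a hard threshold is a simplifying assumption for illustration, not a statistical test.

```python
# mMR values copied from the CrowdHuman table above (lower is better).
MMR = {
    "FCOS": 54.9,
    "ATSS": 49.7,
    "POTO": 52.2,
    "POTO + 3DMF": 51.0,
    "POTO + 3DMF + Aux": 48.9,
}
MMR_NOISE = 1.0  # approximate run-to-run noise stated in the note above


def gap_exceeds_noise(a, b, noise=MMR_NOISE):
    """True if the mMR gap between models a and b is larger than the noise level."""
    return abs(MMR[a] - MMR[b]) > noise


print(gap_exceeds_noise("POTO + 3DMF + Aux", "ATSS"))  # False: gap is about 0.8
print(gap_exceeds_noise("POTO + 3DMF + Aux", "FCOS"))  # True: gap is about 6.0
```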

Ablations on COCO2017 val set

model               assignment   with NMS   lr sched.   mAP    mAR    note
POTO                one-to-one   No         6x + ms     40.0   61.9
POTO                one-to-one   No         9x + ms     40.2   62.3
POTO                one-to-one   No         3x + ms     39.2   61.1   replace Hungarian algorithm by argmax
POTO + 3DMF         one-to-one   No         3x + ms     40.9   62.0   remove GN in 3DMF
POTO + 3DMF + Aux   mixture*     No         3x + ms     41.5   61.5   remove GN in 3DMF

* We adopt a one-to-one assignment in POTO and a one-to-many assignment in the auxiliary loss, respectively.

  • For one-to-one assignment, more training iterations lead to higher performance.
  • In dense prediction methods, the argmax (also known as top-1) operation is in effect an approximate solution to bipartite matching.
  • Removing GN from 3DMF appears to be harmless, and it also speeds up inference.
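
To illustrate the second note, the sketch below compares an exact one-to-one matching with per-ground-truth argmax on small quality matrices (rows are ground-truth objects, columns are predictions). It is a toy example under stated assumptions: the matrices are made up, and the exact matching is found by brute force over permutations rather than by the Hungarian algorithm used in the repo, which solves the same problem efficiently. It requires at least as many predictions as ground truths.

```python
from itertools import permutations


def optimal_one_to_one(quality):
    """Exact bipartite matching by brute force: one distinct prediction per
    ground truth, maximizing the summed quality."""
    num_gt, num_pred = len(quality), len(quality[0])
    best = max(
        permutations(range(num_pred), num_gt),
        key=lambda cols: sum(quality[g][p] for g, p in enumerate(cols)),
    )
    return list(best)


def argmax_assignment(quality):
    """Top-1: each ground truth independently takes its best prediction."""
    return [max(range(len(row)), key=row.__getitem__) for row in quality]


# Typical dense-prediction case: each ground truth has a distinct best location.
quality = [
    [0.9, 0.2, 0.1, 0.0],
    [0.1, 0.3, 0.8, 0.2],
]
print(optimal_one_to_one(quality))  # [0, 2]
print(argmax_assignment(quality))   # [0, 2]

# Conflict case: both ground truths prefer prediction 0.
conflict = [
    [0.9, 0.8, 0.1],
    [0.7, 0.1, 0.0],
]
print(optimal_one_to_one(conflict))  # [1, 0]
print(argmax_assignment(conflict))   # [0, 0]
```

The two methods agree whenever no two ground truths compete for the same prediction. They differ only in conflict cases like the second matrix, where argmax assigns one prediction twice.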

Acknowledgement

This repo is developed based on cvpods. Please check cvpods for more details and features.

License

This repo is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Citing

If you use this work in your research or wish to refer to the baseline results published here, please use the following BibTeX entry:

@article{wang2020end,
  title   =  {End-to-End Object Detection with Fully Convolutional Network},
  author  =  {Wang, Jianfeng and Song, Lin and Li, Zeming and Sun, Hongbin and Sun, Jian and Zheng, Nanning},
  journal =  {arXiv preprint arXiv:2012.03544},
  year    =  {2020}
}

Contributing to the project

Pull requests and issues about the implementation are welcome. If you have an issue with the library itself (e.g. installation, environments), please refer to cvpods.

Comments
  • Does it support restricting the GPU indices?

    There are only 2 GPUs free on my machine right now, so I tried to limit the GPU indices and count as follows, and it raised this error:

    (open-mmlab) liangtian@node001:~/project/DeFCN/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms.3dmf$ CUDA_VISIBLE_DEVICES=4,5 pods_train --num-gpus 2
    Traceback (most recent call last):
      File "/mnt/xfs1/home/liangtian/project/cvpods/tools/train_net.py", line 27, in <module>
        from cvpods.checkpoint import DetectionCheckpointer
      File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/__init__.py", line 3, in <module>
        from .utils import setup_environment
      File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/utils/__init__.py", line 27, in <module>
        from .visualizer import ColorMode, VideoVisualizer, VisImage, Visualizer, colormap, random_color
      File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/utils/visualizer/__init__.py", line 5, in <module>
        from .video_visualizer import *
      File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/utils/visualizer/video_visualizer.py", line 6, in <module>
        from .visualizer import ColorMode, Visualizer, _create_text_labels, _PanopticPrediction
      File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/utils/visualizer/visualizer.py", line 17, in <module>
        from cvpods.structures import BitMasks, Boxes, BoxMode, Keypoints, PolygonMasks, RotatedBoxes
      File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/structures/__init__.py", line 2, in <module>
        from .boxes import Boxes, BoxMode, pairwise_ioa, pairwise_iou
      File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/structures/boxes.py", line 11, in <module>
        from cvpods.layers import cat
      File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/layers/__init__.py", line 4, in <module>
        from .deform_conv import DeformConv, ModulatedDeformConv
      File "/mnt/xfs1/home/liangtian/project/cvpods/cvpods/layers/deform_conv.py", line 11, in <module>
        from cvpods import _C
    ImportError: /mnt/xfs1/home/liangtian/project/cvpods/cvpods/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN6cvpods26psroi_pooling_forward_cudaERN2at6TensorES2_S2_iifii

    opened by rainylt 5
  • ImportError: cannot import name 'config' from 'config'

    Hi, when I try to run 'pods_test' or 'pods_train', I get the following error:

    Traceback (most recent call last):
      File "xxx/cvpods/tools/test_net.py", line 192, in <module>
        from config import config # isort:skip # noqa: E402

    Any suggestions for fixing this? Many thanks.

    opened by daniel-qian 4
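  • About the CrowdHuman dataset

    Hello! Because cvpods is highly encapsulated, there are a few things I do not understand. Does DeFCN train on the visible boxes or the full boxes in CrowdHuman? The config file sets num_classes=1, so are annotations of the mask class removed or converted to person? Does the box format give the coordinates of two corner points (xmin, ymin, xmax, ymax), or the center coordinates plus width and height?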

    opened by Asthestarsfalll 4
  • Trouble exporting the DeFCN PyTorch model to ONNX

    I want to export the DeFCN model to ONNX so that I can inspect its details, so I added some export code, but I get an error. The script is DeFCN/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms.3dmf_wo_gn.aux/net.py.
    I renamed poto.res50.fpn.coco.800size.3x_ms.3dmf_wo_gn.aux to poto_res50_fpn_coco_800size_3x_ms_3dmf_wo_gn_aux so that it can be imported, then added the following code and ran it:

        from cvpods.engine import default_argument_parser, default_setup
        from DeFCN.playground.detection.coco.poto_res50_fpn_coco_800size_3x_ms_3dmf_wo_gn_aux.config import config
        args = default_argument_parser().parse_args()
        print("Command Line Args:", args)
        config.link_log()
        print("soft link to {}".format(config.OUTPUT_DIR))
        cfg,logger=default_setup(config,args)
        model=build_model(cfg)

        model.eval()
        input=torch.randn((1,3,48,48))
        input=torch.rand(1)
        output=model(input)
        print(output.shape)
        input_names=['input']
        output_names=['output']
        torch.onnx.export(model,input,'1.onnx',input_names=input_names,output_names=output_names,verbose=True,opset_version=11)

    but I get the following error:

    [09/22 11:36:18 c2.utils.env.env]: Using a generated random seed 18403068
    Traceback (most recent call last):
      File "/media/liang/linux_data/github_download/cvpods-master/DeFCN/playground/detection/coco/poto_res50_fpn_coco_800size_3x_ms_3dmf_wo_gn_aux/net.py", line 66, in <module>
        output=model(input)
      File "/home/liang/miniconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/media/liang/linux_data/github_download/cvpods-master/DeFCN/playground/detection/coco/poto_res50_fpn_coco_800size_3x_ms_3dmf_wo_gn_aux/fcos.py", line 117, in forward
        images = self.preprocess_image(batched_inputs)
      File "/media/liang/linux_data/github_download/cvpods-master/DeFCN/playground/detection/coco/poto_res50_fpn_coco_800size_3x_ms_3dmf_wo_gn_aux/fcos.py", line 546, in preprocess_image
        images = [x["image"].to(self.device) for x in batched_inputs]
      File "/media/liang/linux_data/github_download/cvpods-master/DeFCN/playground/detection/coco/poto_res50_fpn_coco_800size_3x_ms_3dmf_wo_gn_aux/fcos.py", line 546, in <listcomp>
        images = [x["image"].to(self.device) for x in batched_inputs]
    IndexError: too many indices for tensor of dimension 0

    Process finished with exit code 1

    opened by liang532 4
  • Problem when not using center sampling (# no center sampling, it will use all the locations within a ground-truth box)

    NaN values appear quite easily in box_delta when center sampling is not used. Is this normal, or am I missing something? Thanks for your great work.

    opened by b03505036 2
  • Confusion about the paper

    Aux_loss contradicts the core insight of the paper. Concretely, the insight of the paper is to design a one-to-one label assignment method, whereas aux_loss is a one-to-many assignment method. So the proposed method could also be realized by using ATSS with a proper one-to-one label assignment (e.g., POTO in this paper). There are no corresponding results to verify which loss matters more for performance.

    opened by SunSet0864 2
  • mAP is the same after more than 500,000 steps

    Hi there,

    I am facing a strange situation. I ran inference at step 1,000,000 of 2,000,000 on a model trained with the same config as poto.res50.fpn.coco.800size.3x_ms.3dmf_wo_gn.aux. The problem is that I cannot see any difference in the inference results when I run it again at step 1,500,000. The results are the same.

    Do you know why this might happen?

    Thank you.

    opened by attilab97 1
  • Request

    I am very interested in your research direction. Could you please send me a copy of your training code? I promise it will never be used for commercial purposes.

    opened by yxx-byte 1
  • Issue during inference using POTO

    NMS post-processing is not used in POTO. The code at https://github.com/Megvii-BaseDetection/DeFCN/blob/a82393e290455fd11a1a088723a8050791b44c15/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms.3dmf/fcos.py#L407 says that topk_candidates and score_threshold are useless in POTO. How should the code in the function "inference_single_image" be modified when using POTO? Could you share some ideas? Thank you.

    opened by SunSet0864 1
  • Issue about 3DMF

    The filtered feature produced by 3DMF is added to the original feature to form the final output:

    https://github.com/Megvii-BaseDetection/DeFCN/blob/a82393e290455fd11a1a088723a8050791b44c15/playground/detection/coco/poto.res50.fpn.coco.800size.3x_ms.3dmf/fcos.py#L607

    This seems to contradict the purpose of 3DMF, which is to filter out the "unnecessary" points in the feature. If the original feature is used to form the final output, the output will not be a sparse feature map, and this may adversely affect the results when NMS is not applied.

    opened by Hwang64 1
  • Assertion issue

    The function apply_deltas (/cvpods/modeling/box_regression.py) starts with an assertion. I assume there is a reason for this assertion, and that the deltas can sometimes be infinite. Could you share your experience from debugging this part of the code?

    opened by Hwang64 1
  • Multilabel classification

    @zengarden Hi, thanks for open-sourcing the code base. I have one question: can the current architecture be modified to perform single bounding box detection with multi-label classification? If so, what modifications are needed?

    Thanks in advance

    opened by abhigoku10 0
Owner
BaseDetection Team of Megvii