QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection



This repository is the official implementation of our paper: QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection


a. Install Pytorch 1.4 following here

b. Install APEX following here

c. Install our Pytorch based sparse convolution operation following here

d. Install the detectron2 toolkit following here. Note that we build our approach on version 0.2.1; you may follow the instructions to set COCO configs

e. Clone our repository and have fun with it!
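
Because the steps above pin specific versions (PyTorch 1.4, detectron2 0.2.1), it can help to confirm what is actually installed before training. The following is a minimal sketch and not part of the repository: the helper name `installed_versions` is hypothetical, and the distribution names `torch`, `detectron2`, `apex`, and `spconv` are assumptions about how the packages were registered with pip in your environment.

```python
from importlib import metadata


def installed_versions(names=("torch", "detectron2", "apex", "spconv")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not registered in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

The lookup reads package metadata only, so it does not import the packages and will not trigger a CUDA extension build.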


1. Data preparation

a. To prepare MS-COCO, you may follow the instructions of Detectron2

b. We provide the data preprocessing code for VisDrone2018. You need to first download dataset from here

c. Check visdrone/data_prepare.py to process the dataset
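
To give a sense of what this preprocessing involves, here is a minimal sketch that converts one VisDrone detection annotation line into a COCO-style annotation dict. It is not taken from visdrone/data_prepare.py. It assumes the VisDrone-DET text layout `bbox_left,bbox_top,bbox_width,bbox_height,score,object_category,truncation,occlusion`, with category 0 meaning an ignored region and 11 meaning "others". The function name `parse_visdrone_line` and the filtering policy are hypothetical.

```python
def parse_visdrone_line(line, image_id, ann_id):
    """Convert one VisDrone annotation line into a COCO-style dict, or None."""
    # Some annotation files end each line with a trailing comma.
    fields = [int(v) for v in line.strip().rstrip(",").split(",")]
    left, top, width, height, _score, category = fields[:6]
    # Assumption: 0 = ignored region, 11 = "others"; keep the ten classes 1..10.
    if category < 1 or category > 10 or width <= 0 or height <= 0:
        return None
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category,
        "bbox": [left, top, width, height],  # COCO boxes are [x, y, w, h]
        "area": width * height,
        "iscrowd": 0,
    }


if __name__ == "__main__":
    print(parse_visdrone_line("10,20,30,40,1,4,0,0", image_id=7, ann_id=3))
```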

2. Training

% train coco RetinaNet baseline
python train_coco.py --config-file models/retinanet/configs/coco/train.yaml --num-gpu 8 OUTPUT_DIR /path/to/workdir

% train coco QueryDet 
python train_coco.py --config-file models/querydet/configs/coco/train.yaml --num-gpu 8 OUTPUT_DIR /path/to/workdir

% train VisDrone RetinaNet baseline
python train_visdrone.py --config-file models/retinanet/configs/visdrone/train.yaml --num-gpu 8 OUTPUT_DIR /path/to/workdir

% train VisDrone QueryDet
python train_visdrone.py --config-file models/querydet/configs/visdrone/train.yaml --num-gpu 8 OUTPUT_DIR /path/to/workdir

3. Test

% test coco RetinaNet baseline
python infer_coco.py --config-file models/retinanet/configs/coco/test.yaml --num-gpu 8 --eval-only MODEL.WEIGHTS /path/to/workdir/model_final.pth

% test coco QueryDet with Dense Inference
python infer_coco.py --config-file models/querydet/configs/coco/test.yaml --num-gpu 8 --eval-only MODEL.WEIGHTS /path/to/workdir/model_final.pth

% test coco QueryDet with CSQ
python infer_coco.py --config-file models/querydet/configs/coco/test.yaml --num-gpu 8 --eval-only MODEL.WEIGHTS /path/to/workdir/model_final.pth MODEL.QUERY.QUERY_INFER True

        self.act_type)
      File "/root/anaconda3/envs/querydet/lib/python3.7/site-packages/torch/cuda/amp/autocast_mode.py", line 209, in decorate_fwd
        return fwd(*args, **kwargs)
      File "/root/anaconda3/envs/querydet/lib/python3.7/site-packages/spconv/pytorch/functional.py", line 224, in forward
        raise e
      File "/root/anaconda3/envs/querydet/lib/python3.7/site-packages/spconv/pytorch/functional.py", line 214, in forward
        act_type)
      File "/root/anaconda3/envs/querydet/lib/python3.7/site-packages/spconv/pytorch/ops.py", line 1467, in implicit_gemm
        assert filters.is_contiguous()
    AssertionError


    python infer_visdrone.py --config-file models/querydet/configs/visdrone/test.yaml --num-gpu 1 --eval-only MODEL.WEIGHTS / /out/model_final.pth MODEL.QUERY.QUERY_INFER True

    opened by XuKer 6
    I have just trained QueryDet with the following results

    The weights are uploaded here (Google Drive), and the training log is here. If you think my training is correct, feel free to share them in your repository. Retraining QueryDet was frustrating, so these pre-trained weights may be helpful to others.

    [05/19 09:29:28] d2.evaluation.evaluator INFO: Total inference time: 0:04:38.781605 (0.111736 s / img per device, on 2 devices)
    [05/19 09:29:28] d2.evaluation.evaluator INFO: Total inference pure compute time: 0:04:28 (0.107591 s / img per device, on 2 devices)
    [05/19 09:29:36] d2.evaluation.coco_evaluation INFO: Preparing results for COCO format ...
    [05/19 09:29:36] d2.evaluation.coco_evaluation INFO: Saving results to ./default_dir/inference/coco_instances_results.json
    [05/19 09:29:40] d2.evaluation.coco_evaluation INFO: Evaluating predictions with unofficial COCO API...
    [05/19 09:30:01] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox: 
    |   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
    | 35.458 | 54.958 | 38.396 | 21.139 | 38.342 | 45.597 |
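
The timing line in the log above can be turned into throughput figures with a few lines of arithmetic. This is a small sketch for reading the log, not a Detectron2 utility: the function name `parse_inference_line` is hypothetical, and the overall rate assumes the two devices process images in parallel.

```python
import re

LOG_LINE = (
    "[05/19 09:29:28] d2.evaluation.evaluator INFO: Total inference time: "
    "0:04:38.781605 (0.111736 s / img per device, on 2 devices)"
)


def parse_inference_line(line):
    """Extract total seconds, seconds per image, and device count."""
    m = re.search(
        r"Total inference time: (\d+):(\d+):([\d.]+) "
        r"\(([\d.]+) s / img per device, on (\d+) devices\)",
        line,
    )
    if m is None:
        raise ValueError("unrecognized log line")
    hours, minutes, seconds = int(m.group(1)), int(m.group(2)), float(m.group(3))
    total_s = hours * 3600 + minutes * 60 + seconds
    return total_s, float(m.group(4)), int(m.group(5))


total_s, s_per_img, devices = parse_inference_line(LOG_LINE)
fps_per_device = 1.0 / s_per_img          # images per second on one device
images_per_device = total_s / s_per_img   # images covered by the timing
print(f"{fps_per_device:.2f} img/s per device, "
      f"{fps_per_device * devices:.2f} img/s overall")
print(f"about {images_per_device:.0f} timed images per device")
```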

    I set up a conda virtual environment with PyTorch 1.7 and Detectron2 v0.3. All packages are built from source with CUDA 11.2 on Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-72-generic x86_64).

    export CUDA_HOME=/usr/local/cuda-11.2/
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2/lib64:/usr/local/cuda-11.2/extras/CUPTI/lib64
    export PATH=$PATH:$CUDA_HOME/bin
    mkdir querydet_torch17
    cd querydet_torch17
    # Create a new environment
    conda create --name qtorch17 python=3.7  -y
    conda activate qtorch17
    # a. Install PyTorch (the README specifies 1.4; 1.7 is built from source here)
    # Install Dependencies, Pytorch
    conda install -y astunparse numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses
    # Clone Pytorch 1.7 source
    git clone --recursive https://github.com/pytorch/pytorch.git --branch release/1.7
    cd pytorch
    # Change CUDA version installed in your computer
    # CUDA only: Add LAPACK support for the GPU if needed
    conda install -c pytorch magma-cuda112 -y # or the magma-cuda* that matches your CUDA version from https://anaconda.org/pytorch/repo
    export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
    python setup.py install
    cd ..
    # Clone Pytorch VISION 0.8.0 source
    git clone --recursive https://github.com/pytorch/vision.git --branch release/0.8.0
    cd vision
    conda install -y jpeg libpng 
    conda install -y -c conda-forge accimage
    mkdir build
    cd build
    # Add -DWITH_CUDA=on support for the CUDA if needed
    # Use `pip show torch | grep Location` to find install location of Pytorch
    # -DTorch_DIR=../anaconda3/envs/qtorch17/lib/python3.7/site-packages
    cmake -DWITH_CUDA=on -DCMAKE_PREFIX_PATH=../anaconda3/envs/qtorch17/lib/python3.7/site-packages ..
    #make install
    cd ..
    python setup.py install
    cd ../..
    # b. Install APEX for mixed precision training
    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
    cd ..
    # c. Install our Pytorch based sparse convolution toolkit
    # uninstall spconv and cumm installed by pip
    # install build-essential, install CUDA
    git clone https://github.com/FindDefinition/cumm
    cd ./cumm
    pip install -e .
    cd ..
    git clone https://github.com/traveller59/spconv
    cd ./spconv
    pip install -e .
    cd ..
    # in python, import spconv and wait for build finish.
    # git reset --hard bc8714d if facing an error ImportError: cannot import name 'TensorOpParams' from 'cumm.gemm.algospec.core'
    # Check this Issue: https://github.com/traveller59/spconv/issues/438
    # d. Install the detectron2 toolkit. The README builds on version 0.2.1; v0.3 is installed here
    wget https://github.com/facebookresearch/detectron2/archive/refs/tags/v0.3.zip
    unzip v0.3.zip
    cd detectron2-0.3
    python -m pip install -e .
    cd ..
    # e. Install the Detectron2_Backbone for usage of MobileNet and ShuffleNet
    git clone https://github.com/sxhxliang/detectron2_backbone.git
    cd detectron2_backbone
    python setup.py build develop
    # f. Clone our repository and have fun with it!
    CUDA_VISIBLE_DEVICES=6,7 python train_coco.py
    opened by JohnPekl 5
    Thank you for publishing this source code.

    I am trying to retrain QueryDet-PyTorch on MS-COCO but face the following error.

    RuntimeError: Invoked 'with amp.scale_loss`, but internal Amp state has not been initialized. model, optimizer = amp.initialize(model, optimizer, opt_level=...) must be called before `with amp.scale_loss`.

    Amp state is only initialized when comm.get_world_size() > 1. I guess the case comm.get_world_size() == 1 is not handled.
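
The control flow suspected in this report can be sketched as follows. This is not the repository's trainer code: `setup_amp`, `amp_initialize`, and `wrap_distributed` are hypothetical names, and the two callables stand in for `amp.initialize` and a distributed wrapper so that the logic can be shown without APEX installed. The only point is that Amp initialization runs for every world size, while the distributed wrapping alone depends on it.

```python
def setup_amp(model, optimizer, world_size, amp_initialize, wrap_distributed):
    """Initialize Amp for any world size; wrap for distributed only if needed."""
    # Initialization must not be gated on world_size > 1, otherwise a
    # single-process run reaches `with amp.scale_loss(...)` uninitialized.
    model, optimizer = amp_initialize(model, optimizer)
    if world_size > 1:
        model = wrap_distributed(model)
    return model, optimizer


if __name__ == "__main__":
    calls = []

    def fake_initialize(model, optimizer):
        # Stand-in for amp.initialize; records that it was called.
        calls.append("initialize")
        return model, optimizer

    def fake_wrap(model):
        # Stand-in for a distributed data-parallel wrapper.
        calls.append("wrap")
        return model

    setup_amp("model", "optimizer", 1, fake_initialize, fake_wrap)
    print(calls)
```

With APEX installed, the stand-in would be a call such as `amp.initialize(model, optimizer, opt_level="O1")`; the opt level shown here is an assumption, not the repository's setting.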

    opened by JohnPekl 5
    My questions are:

    1. When QueryDet is applied to RPN, is the query head parallel to the ROI operator?
    2. Section 3.4 of the paper says: "Thirdly, two-stage methods rely on operations like RoIAlign [15] or RoIPooling [12] to align the features with the first stage proposal. Nevertheless, they are not used in our approach since we don’t have boxes output in the coarse prediction." So do P3 and P4 have no regression, and does the network discard RoIAlign and RoIPooling?

    When QueryDet is applied to RPN, is its network structure like this? [image]

    opened by Oswells 4
    Thanks for your wonderful work, but I got an error when trying to train a RetinaNet baseline:

    $ python train_coco.py --config-file models/retinanet/configs/coco/train.yaml --num-gpu 4 OUTPUT_DIR workdir
    Traceback (most recent call last):
      File "train_coco.py", line 9, in <module>
        from train_tools.coco_train import default_argument_parser, start_train
      File "/disk1/qdworkspace/QueryDet-PyTorch/train_tools/coco_train.py", line 53, in <module>
        from models.backbone import build
    ModuleNotFoundError: No module named 'models.backbone'

    It seems that the program is looking for a file 'models.backbone' but there is no backbone.py in QueryDet-PyTorch/models/ ?

    opened by jy-Hamlet 4
    When I use QueryDet to train on VisDrone, everything is OK, but when I run inference on VisDrone, there are some issues with spconv. [image] The spconv version is 2.x.

    opened by jigongbao 3
    251, in run_step
        self._detect_anomaly(losses, loss_dict)
    AttributeError: 'Trainer' object has no attribute '_detect_anomaly'

    Could this problem be caused by an incorrect torch version? Thanks!

    opened by BEVISjyy 3
    Following the project instructions and my machine's configuration, I built PyTorch 1.4 + CUDA 10.1 + detectron2 0.2.1 on Windows, but PyTorch 1.4.0 does not support distributed training on Windows (the error is AttributeError: module 'torch.distributed' has no attribute 'deprecated'). Is there a known working configuration with a higher version, or should I rebuild this environment on Linux?

    opened by xiyanbupapang 2
    Hi, I was running the code but this problem occurs, here is the complete error message:

    Traceback (most recent call last):
      File "/media/SSD/user1/QueryDet-PyTorch-main/train_coco.py", line 15, in <module>
        launch(
      File "/media/SSD/user1/QueryDet-PyTorch-main/detectron2/engine/launch.py", line 82, in launch
        main_func(*args)
      File "/media/SSD/user1/QueryDet-PyTorch-main/train_tools/coco_train.py", line 158, in start_train
        return trainer.train()
      File "/media/SSD/user1/QueryDet-PyTorch-main/apex_tools/apex_trainer.py", line 227, in train
        self.run_step()
      File "/media/SSD/user1/QueryDet-PyTorch-main/apex_tools/apex_trainer.py", line 249, in run_step
        loss_dict = self.model(data)
      File "/home/user1/.conda/envs/sr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/media/SSD/user1/QueryDet-PyTorch-main/models/querydet/detector.py", line 165, in forward
        return self.train_forward(batched_inputs, just_forward)
      File "/media/SSD/user1/QueryDet-PyTorch-main/models/querydet/detector.py", line 187, in train_forward
        all_anchors, all_centers = self.anchor_generator(all_features)
      File "/home/user1/.conda/envs/sr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/media/SSD/user1/QueryDet-PyTorch-main/utils/anchor_gen.py", line 30, in forward
        anchors_over_all_feature_maps, centers_over_all_feature_maps = self._grid_anchors(grid_sizes)
      File "/media/SSD/user1/QueryDet-PyTorch-main/utils/anchor_gen.py", line 20, in _grid_anchors
        shift_x, shift_y = _create_grid_offsets(size, stride, self.offset, base_anchors.device)
      File "/media/SSD/user1/QueryDet-PyTorch-main/detectron2/modeling/anchor_generator.py", line 43, in _create_grid_offsets
        shifts_x = move_device_like(
      File "/home/user1/.conda/envs/sr/lib/python3.9/site-packages/torch/jit/_trace.py", line 1118, in wrapper
        return fn(*args, **kwargs)
      File "/media/SSD/user1/QueryDet-PyTorch-main/detectron2/layers/wrappers.py", line 146, in move_device_like
        return src.to(dst.device)
    AttributeError: 'torch.device' object has no attribute 'device'

    How can I resolve this? Many thanks. I have already tried different environments (CUDA 10.1 + torch 1.8.1 + Python 3.7) and the problem still exists.

    opened by YuxuanWen-Code 2
    def _make_sparse_tensor(self, query_logits, last_ys, last_xs, anchors, feature_value):
            if last_ys is None:
                N, _, qh, qw = query_logits.size()
                assert N == 1
                prob  = torch.sigmoid_(query_logits).view(-1)
                pidxs = torch.where(prob > self.score_th)[0]# .float()
                y = torch.floor_divide(pidxs, qw).int()
                x = torch.remainder(pidxs, qw).int()
            else:
                # Later levels: filter the positions carried over from the previous level
                prob  = torch.sigmoid_(query_logits).view(-1)
                pidxs = prob > self.score_th
                y = last_ys[pidxs]
                x = last_xs[pidxs]
            if y.size(0) == 0:
                return None, None, None, None, None, None 
            _, fc, fh, fw = feature_value.shape
            ys, xs = [], []
            for i in range(2):
                for j in range(2):
                    ys.append(y * 2 + i)
                    xs.append(x * 2 + j)
            ys = torch.cat(ys, dim=0)
            xs = torch.cat(xs, dim=0)
            inds = (ys * fw + xs).long()
            sparse_ys = []
            sparse_xs = []
            for i in range(-1*self.context, self.context+1):
                for j in range(-1*self.context, self.context+1):
                    sparse_ys.append(ys+i)
                    sparse_xs.append(xs+j)
            sparse_ys = torch.cat(sparse_ys, dim=0)
            sparse_xs = torch.cat(sparse_xs, dim=0)
            good_idx = (sparse_ys >= 0) & (sparse_ys < fh) & (sparse_xs >= 0)  & (sparse_xs < fw)
            sparse_ys = sparse_ys[good_idx]
            sparse_xs = sparse_xs[good_idx]
            sparse_yx = torch.stack((sparse_ys, sparse_xs), dim=0).t()
            sparse_yx = torch.unique(sparse_yx, sorted=False, dim=0)
            sparse_ys = sparse_yx[:, 0]
            sparse_xs = sparse_yx[:, 1]
            sparse_inds = (sparse_ys * fw + sparse_xs).long()
            sparse_features = feature_value.view(fc, -1).transpose(0, 1)[sparse_inds].view(-1, fc)
            sparse_indices  = torch.stack((torch.zeros_like(sparse_ys), sparse_ys, sparse_xs), dim=0).t().contiguous()
            sparse_tensor = spconv.SparseConvTensor(sparse_features, sparse_indices, [fh, fw], 1)
            anchors = anchors.tensor.view(-1, self.anchor_num, 4)
            selected_anchors = anchors[inds].view(1, -1, 4)
            return sparse_tensor, ys, xs, inds, selected_anchors, sparse_indices.size(0)

    Many thanks to the authors for providing a good solution for small object detection. I have a small question about run_qinfer in the source code. During inference, the authors choose the locations whose predicted scores are larger than a threshold σ as queries. Then $q_{l}^{0}$ is mapped to its four nearest neighbors on $P_{l-1}$ as key positions. This part corresponds to the following operations in _make_sparse_tensor():

    ys, xs = [], []
    for i in range(2):
        for j in range(2):
            ys.append(y * 2 + i)
            xs.append(x * 2 + j)
    ys = torch.cat(ys, dim=0)
    xs = torch.cat(xs, dim=0)
    inds = (ys * fw + xs).long()

    But I don't understand why the following operations are required when constructing sparse indices. Why not directly use ys and xs to construct the sparse indices, and what is the point of self.context?

    for i in range(-1*self.context, self.context+1):
        for j in range(-1*self.context, self.context+1):
            sparse_ys.append(ys+i)
            sparse_xs.append(xs+j)
    sparse_ys = torch.cat(sparse_ys, dim=0)
    sparse_xs = torch.cat(sparse_xs, dim=0)
    good_idx = (sparse_ys >= 0) & (sparse_ys < fh) & (sparse_xs >= 0)  & (sparse_xs < fw)
    sparse_ys = sparse_ys[good_idx]
    sparse_xs = sparse_xs[good_idx]

    I noticed that the author set cfg.MODEL.QUERY.CONTEXT = 2 in model/config.py, then according to the above code, 25 points are expanded around each point as query_key.

    Could you please explain the reason for using self.context to construct the sparse indices?
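
The two steps discussed in this question can be reproduced in plain Python to make the counts concrete. This is a sketch of one reading of the quoted code, not the repository's implementation: it uses Python sets instead of tensors, the function names are hypothetical, and it assumes the inner loop appends `ys + i` and `xs + j`. It describes what the expansion does, not why the authors chose it.

```python
def key_positions(queries):
    """Map each (y, x) query on level l to its 4 children on level l-1."""
    return [(2 * y + i, 2 * x + j)
            for (y, x) in queries
            for i in range(2)
            for j in range(2)]


def expand_with_context(keys, context, fh, fw):
    """Add every offset in [-context, context]^2, clip to the map, deduplicate."""
    expanded = set()
    for (y, x) in keys:
        for i in range(-context, context + 1):
            for j in range(-context, context + 1):
                yy, xx = y + i, x + j
                # Mirrors the good_idx bounds check in the quoted code.
                if 0 <= yy < fh and 0 <= xx < fw:
                    expanded.add((yy, xx))
    return sorted(expanded)


keys = key_positions([(5, 5)])  # one query, far from the border
sparse = expand_with_context(keys, context=2, fh=32, fw=32)
print(len(keys), len(sparse))
```

With context = 2, a single key position expands to a 5 x 5 window of 25 positions. The four keys of one query are adjacent, so their windows overlap and the deduplicated result is a 6 x 6 block of 36 positions rather than 100.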

    opened by furh95 2
    When I run the code, I always hit this problem: QueryDet-PyTorch does not have this file or directory. I don't know how to solve it. Could you help me?

    opened by Frank-jinchuan 2
    Traceback (most recent call last):
      File "train_visdrone.py", line 15, in <module>
        launch(
      File "c:\users\lsy\desktop\querydet-pytorch-main\detectron2-windows\detectron2\engine\launch.py", line 82, in launch
        main_func(*args)
      File "C:\Users\lsy\Desktop\QueryDet-PyTorch-main\train_tools\visdrone_train.py", line 179, in start_train
        return trainer.train()
      File "C:\Users\lsy\Desktop\QueryDet-PyTorch-main\apex_tools\apex_trainer.py", line 234, in train
        self.run_step()
      File "C:\Users\lsy\Desktop\QueryDet-PyTorch-main\apex_tools\apex_trainer.py", line 256, in run_step
        loss_dict = self.model(data)
      File "D:\Anaconda\envs\QueryDet\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\lsy\Desktop\QueryDet-PyTorch-main\models\retinanet\retinanet.py", line 197, in forward
        losses = self.det_loss(gt_classes, gt_deltas, box_cls, box_delta, self.focal_loss_alpha, self.focal_loss_gamma, self.cls_weights, self.reg_weights)
      File "C:\Users\lsy\Desktop\QueryDet-PyTorch-main\models\retinanet\retinanet.py", line 278, in det_loss
        assert len(cls_weights) == len(pred_logits)

    Hello, how can I solve this problem?

    opened by syddpy666 0
    def make_json(images, annotations, new_label_json):
        ann_dict = {}
        ann_dict['categories'] = [
            {'supercategory': 'things', 'id': 1, 'name': 'pedestrian'},
            {'supercategory': 'things', 'id': 2, 'name': 'people'},
            {'supercategory': 'things', 'id': 3, 'name': 'bicycle'},
            {'supercategory': 'things', 'id': 4, 'name': 'car'},
            {'supercategory': 'things', 'id': 5, 'name': 'van'},
            {'supercategory': 'things', 'id': 6, 'name': 'truck'},
            {'supercategory': 'things', 'id': 7, 'name': 'tricycle'},
            {'supercategory': 'things', 'id': 8, 'name': 'awning-tricycle'},
            {'supercategory': 'things', 'id': 9, 'name': 'bus'},
            {'supercategory': 'things', 'id': 10, 'name': 'motor'}
        ]

    There is no id=0 entry. When training, won't you find that category id 0 is missing from the JSON file?
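
As background for this question, COCO-style JSON files commonly use 1-based category ids, and data loaders often remap them to a contiguous 0-based range before training. Whether this repository's VisDrone loader does so is an assumption here, not something the source confirms. The sketch below shows such a remapping for the ten ids listed above; `contiguous_id_map` is a hypothetical helper name.

```python
CATEGORY_IDS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # ids from the categories list


def contiguous_id_map(category_ids):
    """Map dataset category ids to 0-based contiguous training labels."""
    return {dataset_id: idx
            for idx, dataset_id in enumerate(sorted(category_ids))}


id_map = contiguous_id_map(CATEGORY_IDS)
print(id_map[1], id_map[10])
```

Under that assumption, label 0 during training corresponds to dataset id 1 (pedestrian), so a JSON file without an id=0 category would not by itself be an error.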

    opened by myc1998 2
    I only have a 3070 on my computer. When I started training, it kept showing "CUDA out of Memory". I would like to ask whether a single GPU can complete the training, thanks

    opened by kourlephy 3
    Hello author, your work is very good. Could you explain the meaning of the following code?

        for i in range(-1*self.context, self.context+1):
            for j in range(-1*self.context, self.context+1):
                sparse_ys.append(ys+i)
                sparse_xs.append(xs+j)
        sparse_ys = torch.cat(sparse_ys, dim=0)
        sparse_xs = torch.cat(sparse_xs, dim=0)
        good_idx = (sparse_ys >= 0) & (sparse_ys < fh) & (sparse_xs >= 0)  & (sparse_xs < fw)
    opened by wx24598 0
