The official code for the ICCV2021 Oral presentation "Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework"

Overview

P2PNet (ICCV2021 Oral Presentation)

This repository contains the official PyTorch implementation of P2PNet, as described in Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework.

A brief introduction to P2PNet can be found at 机器之心 (almosthuman).

The code is tested with PyTorch 1.5.0. It may not run with other versions.

Visualized demos for P2PNet

The network

The overall architecture of P2PNet. Built upon VGG16, it first introduces an upsampling path to obtain a fine-grained feature map. It then uses two branches to simultaneously predict a set of point proposals and their confidence scores.

Comparison with state-of-the-art methods

P2PNet achieves state-of-the-art performance on several challenging datasets with various densities.

Methods Venue SHTechPartA(MAE/MSE) SHTechPartB(MAE/MSE) UCF_CC_50(MAE/MSE) UCF_QNRF(MAE/MSE)
CAN CVPR'19 62.3/100.0 7.8/12.2 212.2/243.7 107.0/183.0
Bayesian+ ICCV'19 62.8/101.8 7.7/12.7 229.3/308.2 88.7/154.8
S-DCNet ICCV'19 58.3/95.0 6.7/10.7 204.2/301.3 104.4/176.1
SANet+SPANet ICCV'19 59.4/92.5 6.5/9.9 232.6/311.7 -/-
DUBNet AAAI'20 64.6/106.8 7.7/12.5 243.8/329.3 105.6/180.5
SDANet AAAI'20 63.6/101.8 7.8/10.2 227.6/316.4 -/-
ADSCNet CVPR'20 55.4/97.7 6.4/11.3 198.4/267.3 71.3/132.5
ASNet CVPR'20 57.78/90.13 -/- 174.84/251.63 91.59/159.71
AMRNet ECCV'20 61.59/98.36 7.02/11.00 184.0/265.8 86.6/152.2
AMSNet ECCV'20 56.7/93.4 6.7/10.2 208.4/297.3 101.8/163.2
DM-Count NeurIPS'20 59.7/95.7 7.4/11.8 211.0/291.5 85.6/148.3
Ours - 52.74/85.06 6.25/9.9 172.72/256.18 85.32/154.5
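
The MAE/MSE columns above can be read with the following minimal sketch. It assumes the convention common in crowd counting, where "MSE" denotes the root of the mean squared counting error; the function name and the sample counts are hypothetical and are not taken from this repository.

```python
import math


def counting_errors(predicted_counts, true_counts):
    """Return (MAE, MSE) over per-image crowd counts.

    Assumption: MSE is the root of the mean squared error, as is
    customary in crowd counting papers.
    """
    if len(predicted_counts) != len(true_counts) or not true_counts:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [p - t for p, t in zip(predicted_counts, true_counts)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, mse


# Made-up predicted and ground-truth counts for three images.
mae, mse = counting_errors([105, 48, 310], [100, 50, 300])
print(f"MAE={mae:.2f} MSE={mse:.2f}")
```

Because the squared term weights large misses more heavily, a method can rank differently on the two metrics, as several rows in the table do.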

Comparison on the NWPU-Crowd dataset.

Methods MAE[O] MSE[O] MAE[L] MAE[S]
MCNN 232.5 714.6 220.9 1171.9
SANet 190.6 491.4 153.8 716.3
CSRNet 121.3 387.8 112.0 522.7
PCC-Net 112.3 457.0 111.0 777.6
CANNet 110.0 495.3 102.3 718.3
Bayesian+ 105.4 454.2 115.8 750.5
S-DCNet 90.2 370.5 82.9 567.8
DM-Count 88.4 388.6 88.0 498.0
Ours 77.44 362 83.28 553.92

The overall performance for both counting and localization.

nAP$_{\delta}$ SHTechPartA SHTechPartB UCF_CC_50 UCF_QNRF NWPU_Crowd
$\delta=0.05$ 10.9% 23.8% 5.0% 5.9% 12.9%
$\delta=0.25$ 70.3% 84.2% 54.5% 55.4% 71.3%
$\delta=0.50$ 90.1% 94.1% 88.1% 83.2% 89.1%
$\delta=\{0.05:0.05:0.50\}$ 64.4% 76.3% 54.3% 53.1% 65.0%

Comparison for the localization performance in terms of F1-Measure on NWPU.

Method F1-Measure Precision Recall
FasterRCNN 0.068 0.958 0.035
TinyFaces 0.567 0.529 0.611
RAZ 0.599 0.666 0.543
Crowd-SDNet 0.637 0.651 0.624
PDRNet 0.653 0.675 0.633
TopoCount 0.692 0.683 0.701
D2CNet 0.700 0.741 0.662
Ours 0.712 0.729 0.695
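
The F1-Measure column is the harmonic mean of the Precision and Recall columns. A minimal sketch that re-derives it from the table values (the function name is hypothetical):

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the "Ours" row in the table above.
print(round(f1_measure(0.729, 0.695), 3))  # 0.712
```

This also explains the FasterRCNN row: its very high precision cannot compensate for a recall of 0.035.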

Installation

  • Clone this repo into a directory named P2PNET_ROOT
  • Organize your datasets as required
  • Install Python dependencies. We use Python 3.6.5 and PyTorch 1.5.0:
pip install -r requirements.txt

Organize the counting dataset

We use a list file to collect all the images and their ground-truth annotations in a counting dataset. When your dataset is organized as recommended below, each line of the list file pairs an image path with its annotation path:

train/scene01/img01.jpg train/scene01/img01.txt
train/scene01/img02.jpg train/scene01/img02.txt
...
train/scene02/img01.jpg train/scene02/img01.txt

Dataset structures:

DATA_ROOT/
        |->train/
        |    |->scene01/
        |    |->scene02/
        |    |->...
        |->test/
        |    |->scene01/
        |    |->scene02/
        |    |->...
        |->train.list
        |->test.list

DATA_ROOT is your path containing the counting datasets.
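
A list file in this format could be generated with a short script. The following is a minimal sketch, not part of the repository: it assumes that images use the .jpg extension, that each annotation file sits next to its image with the same stem and a .txt extension, and that paths are written relative to DATA_ROOT, as in the example lines above. The function name is hypothetical.

```python
from pathlib import Path


def write_list_file(data_root, split):
    """Write DATA_ROOT/<split>.list with one 'image annotation' pair per line."""
    root = Path(data_root)
    lines = []
    for image_path in sorted((root / split).rglob("*.jpg")):
        # Assumption: the annotation shares the image's stem.
        annotation_path = image_path.with_suffix(".txt")
        if not annotation_path.exists():
            continue  # skip images without a ground-truth file
        lines.append(
            f"{image_path.relative_to(root).as_posix()} "
            f"{annotation_path.relative_to(root).as_posix()}"
        )
    list_path = root / f"{split}.list"
    list_path.write_text("\n".join(lines) + ("\n" if lines else ""))
    return list_path
```

Calling write_list_file(DATA_ROOT, "train") and write_list_file(DATA_ROOT, "test") would produce the two list files shown in the directory tree.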

Annotations format

For the annotations of each image, we use a single txt file which contains one annotation per line. Note that indexing for pixel values starts at 0. The expected format of each line is:

x1 y1
x2 y2
...
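
Reading such a file back is straightforward. A minimal sketch, assuming whitespace-separated coordinates and tolerating blank lines (the function name is hypothetical and this is not the repository's loader):

```python
def load_points(annotation_path):
    """Read one 'x y' pair per line and return a list of (x, y) floats."""
    points = []
    with open(annotation_path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # tolerate blank lines
            if len(fields) != 2:
                raise ValueError(f"expected 'x y', got: {line!r}")
            points.append((float(fields[0]), float(fields[1])))
    return points
```

The length of the returned list is the ground-truth count for that image.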

Training

The network can be trained using the train.py script. For training on SHTechPartA, use

CUDA_VISIBLE_DEVICES=0 python train.py --data_root $DATA_ROOT \
    --dataset_file SHHA \
    --epochs 3500 \
    --lr_drop 3500 \
    --output_dir ./logs \
    --checkpoints_dir ./weights \
    --tensorboard_dir ./logs \
    --lr 0.0001 \
    --lr_backbone 0.00001 \
    --batch_size 8 \
    --eval_freq 1 \
    --gpu_id 0

By default, a periodic evaluation will be conducted on the validation set.

Testing

A trained model on SHTechPartA (with an MAE of 51.96) is available at "./weights". Run the following command to launch a visualization demo:

CUDA_VISIBLE_DEVICES=0 python run_test.py --weight_path ./weights/SHTechA.pth --output_dir ./logs/

Acknowledgements

  • Part of the code is borrowed from the C^3 Framework.
  • We refer to DETR to implement our matching strategy.

Citing P2PNet

If you find P2PNet useful in your project, please consider citing us:

@inproceedings{song2021rethinking,
  title={Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework},
  author={Song, Qingyu and Wang, Changan and Jiang, Zhengkai and Wang, Yabiao and Tai, Ying and Wang, Chengjie and Li, Jilin and Huang, Feiyue and Wu, Yang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}

Related works from Tencent Youtu Lab

  • [AAAI2021] To Choose or to Fuse? Scale Selection for Crowd Counting. (paper link & codes)
  • [ICCV2021] Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting. (paper link & codes)
Comments
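  • About the Eval results

    Training directly, the best result I get is 54. With the provided weights (51.9, threshold kept at the default of 0.5) I get 53.13, but changing the threshold to 0.45 brings it to 52. Was 51.9 obtained with the current code under its default settings, or were there adjustments? If it was obtained with the current code, could you share your SHTechA dataset and how it is organized? I only changed the loader.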

    opened by donggoing 11
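
The issue above turns on the confidence threshold (0.5 versus 0.45). A minimal sketch of why it shifts the result, assuming the predicted count is the number of point proposals whose confidence exceeds the threshold; the scores and the function name are made up, and this is not the repository's evaluation code:

```python
def count_from_scores(scores, threshold=0.5):
    """Count the point proposals whose confidence exceeds the threshold."""
    return sum(1 for score in scores if score > threshold)


scores = [0.91, 0.62, 0.48, 0.46, 0.30]  # made-up confidences
print(count_from_scores(scores, 0.5), count_from_scores(scores, 0.45))  # 2 4
```

Lowering the threshold can only keep or raise each per-image count, so it helps the MAE only when the model tends to undercount.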
  • Configuration for ShanghaiTech PartB

    Hi,

    I could reproduce the MAE results on ShanghaiTech Part A using the supplied arguments for the train script. However, with the same parameters I cannot reproduce the results for ShanghaiTech Part B: I get an MAE of 11.18 instead of the 6.25 reported in the article.

    I have created a script to build the train files from the ShanghaiTech datasets, and I can confirm that this part is OK because I can reproduce the MAE with the generated files (for Part A).

    For Part B, I have tried the same supplied parameters and also changed the learning rate, but the MAE does not go below 11.

    Any more tweaks to be done in dataset loading? Any different parameter values?

    Many thanks for your great article and code,

    Ricard

    opened by ricardborras 9
  • Convert to ONNX

    Hi all!

    I'd love to use this model in our ONNX flows but wasn't able to convert it to ONNX. Is there any known way of converting this model to ONNX?

    Code I am using:

    import os
    import sys
    import torch
    
    sys.path.append(os.path.abspath(f"{os.getcwd()}/model"))
    
    # Available after the above append
    # it's in the model folder
    from model.models.p2pnet import P2PNet
    from model.models.backbone import Backbone_VGG
    
    def main():
        onnx_model_name = sys.argv[1] if len(sys.argv) > 1 else "model"
        onnx_model_name = f"{onnx_model_name}.onnx"
    
        print("Loading Model")
        # Create the model
        model_backbone = Backbone_VGG("vgg16_bn", True)
        model = P2PNet(model_backbone, 2, 2)
    
        # Load Weights
        checkpoint = torch.load("./model/weights/SHTechA.pth", map_location=torch.device('cpu'))
        model.load_state_dict(checkpoint["model"])
        model.eval() # Put in inference mode
        
        # Create dummy input
        dummy_input = torch.randn(1, 3, 640, 640)
        # dummy_input1 = torch.randn(1, 3, 1024, 1024)
        # dummy_input = (dummy_input0, dummy_input1)
    
        # Export as ONNX
        print(f"Exporting as ONNX: {onnx_model_name}")
        torch.onnx.export(
            model,
            dummy_input,
            onnx_model_name, # Output name
            opset_version=13, # ONNX Opset Version
            export_params=True, # Store the trained parameters in the model file
            do_constant_folding=True, # Execute constant folding for optimization
            input_names = ['input'],   # the model's input names 
            output_names = ['pred_logits', 'pred_points'], # the model's output names (see forward in the architecture)
            dynamic_axes={
                # Input is an image [batch_size, channels, height, width] (PyTorch NCHW layout)
                # all of it can be variable so we need to add it in dynamic_axes
                'input': {
                    0: 'batch_size',
                    1: 'channels',
                    2: 'height',
                    3: 'width'
                }, 
                'pred_logits': [0, 1, 2],
                'pred_points': [0, 1, 2],
            } 
        )
    
    
    if __name__ == "__main__":
        main()
    
    

    Error I receive:

    [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Add node. Name:'Add_88' Status Message: Add_88: right operand cannot broadcast on dim 1 LeftShape: {1,40960,2}, RightShape: {1,25600,2}
    

    opened by XavierGeerinck 2
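
The two shapes in the ONNX error above are consistent with a fixed grid of reference points that was baked in at export time. A minimal sketch of the arithmetic, assuming one grid cell per 8x8 pixel block (a stride inferred from the numbers, not confirmed by the source) and assuming the two 2s passed to P2PNet in the export script are the row and line counts per cell; the function name is hypothetical:

```python
def num_point_proposals(height, width, stride=8, row=2, line=2):
    """Number of point proposals for an input of the given size."""
    return (height // stride) * (width // stride) * row * line


print(num_point_proposals(640, 640))   # 25600, size of the dummy export input
print(num_point_proposals(1024, 640))  # 40960, a hypothetical larger runtime input
```

Under these assumptions, 25600 matches the 640x640 dummy input and 40960 matches a 1024x640 image. That would suggest the exported graph stores the 640x640 points as a constant, so declaring dynamic axes alone would not resize them.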
  • How to resolve "out of memory" during evaluation on a high-resolution single image?

    https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet/blob/5c91a81ca062b1c7fd3db3ad1c55b1c21f0a7455/train.py#L187 When I run the code linked above, an "out of memory" error occurs while running the script on a single RTX6000 GPU with 24G of memory.

    opened by Tracyummy 1
  • Pretrained VGG weights

    Hello,

    thanks for uploading the code!

    When trying to run a test using supplied pretrained Shanghai Dataset Part A, an error is raised about VGG_BN pretrained weights:

    No such file or directory: '/apdcephfs/private_changanwang/checkpoints/vgg16_bn-6c64b313.pth'

    Could you share it?

    Thanks

    opened by ricardborras 1
  • Hello, I am in the same situation: after Namespace(backbone='vgg16_bn', gpu_id=0, line=2, output_dir='./logs/', row=2, weight_path='./weights/SHTechA.pth') is printed, nothing happens. Has anyone managed to solve this problem?

        Hello, I am in the same situation: after Namespace(backbone='vgg16_bn', gpu_id=0, line=2, output_dir='./logs/', row=2, weight_path='./weights/SHTechA.pth') is printed, nothing happens. Has anyone managed to solve this problem?
    

    Originally posted by @fatakWang in https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet/issues/16#issuecomment-1364650983

    opened by fatakWang 1
  • [Bug]: key error in weight_dict & losses

    In models/p2pnet.py:

    Line 278: losses['loss_point'] = loss_bbox.sum() / num_points
    Line 335: weight_dict = {'loss_ce': 1, 'loss_points': args.point_loss_coef}

    One key is "loss_points" and the other is "loss_point" (they differ by an "s"). This would mean the point loss is not actually used, which is completely wrong!

    opened by zhiyuanyou 0
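
Why a one-letter key mismatch would matter can be sketched as follows. The sketch assumes the total loss is a DETR-style weighted sum that skips any loss whose key is missing from weight_dict; this is an assumption about the training loop, which is not shown here, and all numeric values are made up.

```python
def total_loss(loss_dict, weight_dict):
    """Weighted sum over the losses that have a matching weight key."""
    return sum(loss_dict[k] * weight_dict[k] for k in loss_dict if k in weight_dict)


loss_dict = {"loss_ce": 0.7, "loss_point": 3.0}  # made-up loss values

mismatched = {"loss_ce": 1, "loss_points": 0.0002}  # key differs by an "s"
matched = {"loss_ce": 1, "loss_point": 0.0002}

print(total_loss(loss_dict, mismatched))  # point loss silently dropped
print(total_loss(loss_dict, matched))     # point loss contributes
```

No exception is raised in the mismatched case, which is why such a bug can go unnoticed during training.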
  • Is there any specific reason as to why VGG-16 network is chosen for feature extraction?

    While reading the paper, I learned that the authors used the 13 conv layers of the VGG16 network to extract deep features. Since VGG nets have been around for quite a long time, why didn't you choose a more efficient and accurate network? Or is there something I don't know of? Thank you.

    opened by bit-scientist 0
  • RuntimeError when doing image resize in run_test.py. please help ~

    Hello, thanks for this great work. I ran into trouble when resizing images in run_test.py. I modified new_width and new_height like this:

    # load the images
    img_raw = Image.open(data_path+img_path).convert('RGB')
    # round the size
    width, height = img_raw.size
    img_raw = img_raw.resize((int(width*0.5), int(height*0.5)), Image.ANTIALIAS)
    

    Here is the error message:

    Namespace(backbone='vgg16_bn', gpu_id=0, line=2, output_dir='./logs/', row=2, weight_path='./weights/SHTechA.pth')
    File "run_test.py", line 140, in main(args)
    File "run_test.py", line 107, in main result = self.forward(*input, **kwargs)
    File "D:\working_space_HJ\CrowdCounting-P2PNet-main\models\p2pnet.py", line 215, in forward features_fpn = self.fpn([features[1], features[2], features[3]])
    File "C:\ProgramData\Anaconda3\envs\crowd\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__ result = self.forward(*input, **kwargs)
    File "D:\working_space_HJ\CrowdCounting-P2PNet-main\models\p2pnet.py", line 183, in forward P4_x = P5_upsampled_x + P4_x
    RuntimeError: The size of tensor a (134) must match the size of tensor b (135) at non-singleton dimension 2

    Please help; I would really appreciate it if anyone could answer this question. Thanks very much. ^ ^

    opened by slent310 0
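
The 134-versus-135 mismatch in the traceback above is what one would expect when a halved image side is no longer divisible by the network's downsampling factor, so that a 2x upsampled map ends up one pixel short of the map it is added to. A minimal sketch of a resize helper, assuming that rounding each side down to a multiple of 128 is sufficient (the value 128 and the helper name are assumptions, not taken from the source):

```python
def scaled_size(width, height, scale=0.5, multiple=128):
    """Scale an image size, then round each side down to a multiple.

    Each side is kept at least one multiple long so that very small
    images do not collapse to zero.
    """
    new_width = max(multiple, int(width * scale) // multiple * multiple)
    new_height = max(multiple, int(height * scale) // multiple * multiple)
    return new_width, new_height


print(scaled_size(2160, 1080))  # (1024, 512)
```

The returned pair could then be passed to img_raw.resize in place of (int(width*0.5), int(height*0.5)).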
  • Accuracy of UCF-QNRF

    Can anyone reproduce the accuracy on UCF-QNRF? I have conducted about ten experiments on UCF-QNRF; however, the MAE and RMSE are around 100 and 180. I cannot reach the reported MAE of 85 and RMSE of 154.5, even when I use the settings in the paper.

    opened by 1286710929 2