The official code for the ICCV 2021 Oral presentation "Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework"


P2PNet (ICCV2021 Oral Presentation)

This repository contains the official PyTorch implementation of P2PNet, as described in Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework.

A brief introduction to P2PNet can be found at 机器之心 (almosthuman).

The code has been tested with PyTorch 1.5.0. It may not run with other versions.

Visualized demos for P2PNet

The network

The overall architecture of P2PNet. Built upon VGG16, it first introduces an upsampling path to obtain a fine-grained feature map. It then uses two branches to simultaneously predict a set of point proposals and their confidence scores.
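As a rough illustration of how the two branch outputs could be turned into a count, the sketch below keeps the point proposals whose confidence exceeds a threshold. The function name `select_points` and the 0.5 threshold are assumptions made for this example, not the repository's API.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def select_points(proposals: List[Point], scores: List[float],
                  threshold: float = 0.5) -> List[Point]:
    """Keep the proposals whose confidence is above the threshold."""
    if len(proposals) != len(scores):
        raise ValueError("proposals and scores must have the same length")
    return [p for p, s in zip(proposals, scores) if s > threshold]


# Toy input: three proposals with their (hypothetical) confidence scores.
proposals = [(10.0, 12.0), (40.5, 33.0), (80.0, 15.5)]
scores = [0.91, 0.20, 0.67]

kept = select_points(proposals, scores)
print(len(kept), kept)  # the predicted count is the number of kept points
```

In this formulation the kept points serve as the localization result and their number as the count, so no separate density map is needed.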

Comparison with state-of-the-art methods

P2PNet achieves state-of-the-art performance on several challenging datasets with various densities.

Methods Venue SHTechPartA (MAE/MSE) SHTechPartB (MAE/MSE) UCF_CC_50 (MAE/MSE) UCF_QNRF (MAE/MSE)
CAN CVPR'19 62.3/100.0 7.8/12.2 212.2/243.7 107.0/183.0
Bayesian+ ICCV'19 62.8/101.8 7.7/12.7 229.3/308.2 88.7/154.8
S-DCNet ICCV'19 58.3/95.0 6.7/10.7 204.2/301.3 104.4/176.1
SANet+SPANet ICCV'19 59.4/92.5 6.5/9.9 232.6/311.7 -/-
DUBNet AAAI'20 64.6/106.8 7.7/12.5 243.8/329.3 105.6/180.5
SDANet AAAI'20 63.6/101.8 7.8/10.2 227.6/316.4 -/-
ADSCNet CVPR'20 55.4/97.7 6.4/11.3 198.4/267.3 71.3/132.5
ASNet CVPR'20 57.78/90.13 -/- 174.84/251.63 91.59/159.71
AMRNet ECCV'20 61.59/98.36 7.02/11.00 184.0/265.8 86.6/152.2
AMSNet ECCV'20 56.7/93.4 6.7/10.2 208.4/297.3 101.8/163.2
DM-Count NeurIPS'20 59.7/95.7 7.4/11.8 211.0/291.5 85.6/148.3
Ours - 52.74/85.06 6.25/9.9 172.72/256.18 85.32/154.5
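For readers unfamiliar with the two numbers in each cell, the following is a minimal sketch of how they are typically computed from per-image counts. It assumes the common crowd-counting convention that the value labelled MSE is the square root of the mean squared error; the function name and the toy counts are made up for illustration.

```python
import math
from typing import Sequence, Tuple


def counting_errors(pred_counts: Sequence[float],
                    gt_counts: Sequence[float]) -> Tuple[float, float]:
    """Return (MAE, root of the mean squared error) over a set of images."""
    if len(pred_counts) != len(gt_counts) or not pred_counts:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - g for p, g in zip(pred_counts, gt_counts)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Toy example: predicted vs. annotated head counts for three images.
mae, mse = counting_errors([105, 48, 310], [100, 50, 300])
print(f"MAE={mae:.2f} MSE={mse:.2f}")
```

Lower is better for both values.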

Comparison on the NWPU-Crowd dataset.

Methods MAE[O] MSE[O] MAE[L] MAE[S]
MCNN 232.5 714.6 220.9 1171.9
SANet 190.6 491.4 153.8 716.3
CSRNet 121.3 387.8 112.0 522.7
PCC-Net 112.3 457.0 111.0 777.6
CANNet 110.0 495.3 102.3 718.3
Bayesian+ 105.4 454.2 115.8 750.5
S-DCNet 90.2 370.5 82.9 567.8
DM-Count 88.4 388.6 88.0 498.0
Ours 77.44 362 83.28 553.92

The overall performance for both counting and localization.

nAP$_{\delta}$ SHTechPartA SHTechPartB UCF_CC_50 UCF_QNRF NWPU_Crowd
$\delta=0.05$ 10.9% 23.8% 5.0% 5.9% 12.9%
$\delta=0.25$ 70.3% 84.2% 54.5% 55.4% 71.3%
$\delta=0.50$ 90.1% 94.1% 88.1% 83.2% 89.1%
$\delta={{0.05:0.05:0.50}}$ 64.4% 76.3% 54.3% 53.1% 65.0%

Comparison for the localization performance in terms of F1-Measure on NWPU.

Method F1-Measure Precision Recall
FasterRCNN 0.068 0.958 0.035
TinyFaces 0.567 0.529 0.611
RAZ 0.599 0.666 0.543
Crowd-SDNet 0.637 0.651 0.624
PDRNet 0.653 0.675 0.633
TopoCount 0.692 0.683 0.701
D2CNet 0.700 0.741 0.662
Ours 0.712 0.729 0.695
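The F1-Measure column is the harmonic mean of precision and recall. The short sketch below (the helper name `f1_measure` is chosen for this example) reproduces two rows of the table from their precision and recall values.

```python
def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs taken from the table above.
print(round(f1_measure(0.729, 0.695), 3))  # row "Ours"
print(round(f1_measure(0.958, 0.035), 3))  # row "FasterRCNN"
```

The FasterRCNN row shows why F1 is reported: very high precision cannot compensate for very low recall.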


  • Clone this repo into a directory named P2PNET_ROOT
  • Organize your datasets as required
  • Install Python dependencies. We use Python 3.6.5 and PyTorch 1.5.0
pip install -r requirements.txt

Organize the counting dataset

We use a list file to collect all the images and their ground truth annotations in a counting dataset. When your dataset is organized as recommended below, each line of the list file pairs an image path with its annotation file:

train/scene01/img01.jpg train/scene01/img01.txt
train/scene01/img02.jpg train/scene01/img02.txt
train/scene02/img01.jpg train/scene02/img01.txt

Dataset structures:

DATA_ROOT/
        |->train/
        |    |->scene01/
        |    |->scene02/
        |    |->...
        |->test/
        |    |->scene01/
        |    |->scene02/
        |    |->...

DATA_ROOT is your path containing the counting datasets.
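A list file in the format above can be read with a few lines of Python. This is a minimal sketch, not the repository's loader: the function name `parse_list_file` is hypothetical, and it assumes paths contain no spaces and are relative to DATA_ROOT.

```python
import os
from typing import List, Tuple


def parse_list_file(text: str, data_root: str) -> List[Tuple[str, str]]:
    """Turn list-file lines into (image_path, annotation_path) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        img_rel, ann_rel = line.split()  # assumes exactly two fields per line
        pairs.append((os.path.join(data_root, img_rel),
                      os.path.join(data_root, ann_rel)))
    return pairs


example = (
    "train/scene01/img01.jpg train/scene01/img01.txt\n"
    "train/scene01/img02.jpg train/scene01/img02.txt\n"
)
for img_path, ann_path in parse_list_file(example, "/path/to/DATA_ROOT"):
    print(img_path, ann_path)
```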

Annotations format

For the annotations of each image, we use a single txt file which contains one annotation per line. Note that indexing for pixel values starts at 0. The expected format of each line is:

x1 y1
x2 y2
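As an illustration of this format, the sketch below parses annotation text into a list of points; the number of points is the ground truth count for the image. The function name `parse_annotations` is hypothetical, and the sketch assumes whitespace-separated coordinates with one point per line.

```python
from typing import List, Tuple


def parse_annotations(text: str) -> List[Tuple[float, float]]:
    """Parse 'x y' lines into a list of (x, y) points."""
    points = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        x, y = float(parts[0]), float(parts[1])
        points.append((x, y))
    return points


points = parse_annotations("12 34\n56.5 78\n")
print(len(points), points)  # ground truth count and the point coordinates
```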


The network can be trained using the training script. For training on SHTechPartA, use

CUDA_VISIBLE_DEVICES=0 python --data_root $DATA_ROOT \
    --dataset_file SHHA \
    --epochs 3500 \
    --lr_drop 3500 \
    --output_dir ./logs \
    --checkpoints_dir ./weights \
    --tensorboard_dir ./logs \
    --lr 0.0001 \
    --lr_backbone 0.00001 \
    --batch_size 8 \
    --eval_freq 1 \
    --gpu_id 0

By default, a periodic evaluation will be conducted on the validation set.


A trained model (with an MAE of 51.96) on SHTechPartA is available in "./weights". Run the following command to launch a visualization demo:

CUDA_VISIBLE_DEVICES=0 python --weight_path ./weights/SHTechA.pth --output_dir ./logs/


  Parts of the code are borrowed from the C^3 Framework.
  We refer to DETR for the implementation of our matching strategy.

Citing P2PNet

If you find P2PNet useful in your project, please consider citing us:

  title={Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework},
  author={Song, Qingyu and Wang, Changan and Jiang, Zhengkai and Wang, Yabiao and Tai, Ying and Wang, Chengjie and Li, Jilin and Huang, Feiyue and Wu, Yang},
  journal={Proceedings of the IEEE/CVF International Conference on Computer Vision},

Related works from Tencent Youtu Lab

  [AAAI2021] To Choose or to Fuse? Scale Selection for Crowd Counting. (paper link & codes)
  [ICCV2021] Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting. (paper link & codes)
    I could reproduce the MAE results on ShanghaiTech Part A using the supplied arguments for the training script. However, with the same parameters I cannot reproduce the results for ShanghaiTech Part B: I get an MAE of 11.18 instead of the 6.25 reported in the paper.

    I have created a script to build the training list files from the ShanghaiTech datasets, and I can confirm that this part is correct because I can reproduce the MAE with the generated files (for Part A).

    For Part B, I have tried the same supplied parameters and also changed the learning rate, but the MAE does not go below 11.

    Are there any more tweaks to be made in dataset loading? Any different parameter values?

    Many thanks for your great article and code,


  • Convert to ONNX

    Hi all!

    I'd love to use this model in our ONNX flows but wasn't able to convert it to ONNX. Is there any known way of converting this model to ONNX?

    Code I am using:

    import os
    import sys
    import torch
    # Assumption: the repo is cloned into ./model next to this script
    # (the original path append was lost from the post)
    sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), "model"))
    # Available after the above append
    # it's in the model folder
    from model.models.p2pnet import P2PNet
    from model.models.backbone import Backbone_VGG

    def main():
        # Fall back to "model" when no name is given on the command line
        onnx_model_name = sys.argv[1] if len(sys.argv) > 1 else "model"
        onnx_model_name = f"{onnx_model_name}.onnx"
        print("Loading Model")
        # Create the model
        model_backbone = Backbone_VGG("vgg16_bn", True)
        model = P2PNet(model_backbone, 2, 2)
        # Load Weights (assumes the state dict is stored under the 'model' key)
        checkpoint = torch.load("./model/weights/SHTechA.pth", map_location=torch.device('cpu'))
        model.load_state_dict(checkpoint['model'])
        model.eval() # Put in inference mode
        # Create dummy input
        dummy_input = torch.randn(1, 3, 640, 640)
        # Export as ONNX
        print(f"Exporting as ONNX: {onnx_model_name}")
        torch.onnx.export(
            model, # Model to export
            dummy_input, # Example input used for tracing
            onnx_model_name, # Output name
            opset_version=13, # ONNX Opset Version
            export_params=True, # Store the trained parameters in the model file
            do_constant_folding=True, # Execute constant folding for optimization
            input_names = ['input'],   # the model's input names
            output_names = ['pred_logits', 'pred_points'], # the model's output names (see forward in the architecture)
            dynamic_axes={
                # Input is an image [batch_size, channels, height, width]
                # all of it can be variable so we need to add it in dynamic_axes
                'input': {
                    0: 'batch_size',
                    1: 'channels',
                    2: 'height',
                    3: 'width'
                },
                'pred_logits': [0, 1, 2],
                'pred_points': [0, 1, 2],
            },
        )

    if __name__ == "__main__":
        main()

    Error I receive:

    [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Add node. Name:'Add_88' Status Message: Add_88: right operand cannot broadcast on dim 1 LeftShape: {1,40960,2}, RightShape: {1,25600,2}
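The two shapes in the error message are consistent with a proposal count that depends on the input resolution. The sketch below assumes one grid cell per 8×8 pixels and `row × line` proposals per cell (the export code passes 2 and 2); the stride of 8 and the helper name `num_proposals` are assumptions, not confirmed by the repository. Under those assumptions a 640×640 input gives 25600 proposals, matching the right operand, while 40960 would correspond to a larger input such as 1024×640. That would fit a point grid that was fixed at export time for the dummy input size.

```python
def num_proposals(height: int, width: int,
                  row: int = 2, line: int = 2, stride: int = 8) -> int:
    """Number of point proposals for an input of the given size (assumed layout)."""
    return (height // stride) * (width // stride) * row * line


print(num_proposals(640, 640))   # size of the dummy input used for export
print(num_proposals(1024, 640))  # one input size that would give the other shape
```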


  • How to resolve

    How can I resolve "out of memory" errors when evaluating a single high-resolution image? When I run the code above, an "out of memory" error is raised on a single RTX6000 GPU with 24 GB of memory.

  • Pretrained VGG weights

    Thanks for uploading the code!

    When trying to run a test using the supplied pretrained Shanghai Dataset Part A weights, an error is raised about the VGG_BN pretrained weights:

    No such file or directory: '/apdcephfs/private_changanwang/checkpoints/vgg16_bn-6c64b313.pth'

    Could you share it?


  • [Bug]: key error in weight_dict & losses

    In models/

    Line 278: losses['loss_point'] = loss_bbox.sum() / num_points
    Line 335: weight_dict = {'loss_ce': 1, 'loss_points': args.point_loss_coef}

    One key is "loss_points" and the other is "loss_point" (they differ by an "s"). As a result, the point loss would not actually be used, which is clearly wrong.
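To illustrate the reported effect, here is a minimal sketch that assumes the total loss is reduced the way DETR does it, by summing only the loss terms whose keys appear in the weight dictionary. The loss values and the coefficient below are made up for the example, and `unmatched_keys` is a hypothetical helper for catching such typos.

```python
def total_loss(losses: dict, weight_dict: dict) -> float:
    """DETR-style reduction: terms without a weight are silently skipped."""
    return sum(losses[k] * weight_dict[k] for k in losses if k in weight_dict)


def unmatched_keys(losses: dict, weight_dict: dict) -> list:
    """Keys that appear in only one of the two dictionaries."""
    return sorted(set(losses) ^ set(weight_dict))


losses = {"loss_ce": 0.7, "loss_point": 2.0}              # made-up loss values
mismatched = {"loss_ce": 1, "loss_points": 0.0002}        # key with the extra "s"
matched = {"loss_ce": 1, "loss_point": 0.0002}

print(total_loss(losses, mismatched))   # the point term is dropped
print(total_loss(losses, matched))      # the point term contributes
print(unmatched_keys(losses, mismatched))
```

A check like `unmatched_keys` run once at startup would surface the mismatch instead of letting the term drop out silently.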

  • Is there any specific reason why the VGG-16 network was chosen for feature extraction?

    While reading the paper, I learned that the authors used the 13 conv layers from the VGG16 network to extract deep features. Since VGG nets have been around for quite a long time, why didn't you choose a more efficient and accurate network? Or is there something I'm missing? Thank you.

    Hello, thanks for this great work. I ran into some trouble when resizing images. I modified new_width and new_height like this:

    from PIL import Image
    # load the images (img_path: path to the input image)
    img_raw = Image.open(img_path).convert('RGB')
    # round the size
    width, height = img_raw.size
    img_raw = img_raw.resize((int(width*0.5), int(height*0.5)), Image.ANTIALIAS)

    there is the error message:

    Namespace(backbone='vgg16_bn', gpu_id=0, line=2, output_dir='./logs/', row=2, weight_path='./weights/SHTechA.pth')
    File "", line 140, in main(args)
    File "", line 107, in main
      result = self.forward(*input, **kwargs)
    File "D:\working_space_HJ\CrowdCounting-P2PNet-main\models\", line 215, in forward
      features_fpn = self.fpn([features[1], features[2], features[3]])
    File "C:\ProgramData\Anaconda3\envs\crowd\lib\site-packages\torch\nn\modules\", line 550, in __call__
      result = self.forward(*input, **kwargs)
    File "D:\working_space_HJ\CrowdCounting-P2PNet-main\models\", line 183, in forward
      P4_x = P5_upsampled_x + P4_x
    RuntimeError: The size of tensor a (134) must match the size of tensor b (135) at non-singleton dimension 2

    Please help; I would really appreciate it if anyone could answer this question. Thanks very much. ^ ^
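The mismatch in the traceback (134 vs. 135) is what one would expect when a feature map with an odd side length is halved and then upsampled by 2. One possible workaround, sketched below, is to round the scaled size down to a multiple of the network's total downsampling factor before resizing. The helper name `scaled_size` and the multiple of 128 are assumptions for illustration; a smaller multiple may be sufficient.

```python
from typing import Tuple


def scaled_size(width: int, height: int,
                scale: float = 0.5, multiple: int = 128) -> Tuple[int, int]:
    """Scale an image size, then round each side down to a multiple."""
    new_width = max(multiple, int(width * scale) // multiple * multiple)
    new_height = max(multiple, int(height * scale) // multiple * multiple)
    return new_width, new_height


# Example: a 1920x1080 image scaled by 0.5 and rounded down.
new_width, new_height = scaled_size(1920, 1080)
print(new_width, new_height)
```

The resulting pair can then be passed to `img_raw.resize((new_width, new_height), ...)` in place of the unrounded values.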

  • Accuracy of UCF-QNRF

    Can anyone reproduce the accuracy on UCF-QNRF? I have conducted about ten experiments on UCF-QNRF; however, the MAE and RMSE are 100 and 180. I cannot achieve the reported MAE of 85 and RMSE of 154.5, even when I use the settings from the paper.

