Official Implementation of the CVPR 2021 paper “Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting”

Overview

RGBT Crowd Counting

Lingbo Liu, Jiaqi Chen, Hefeng Wu, Guanbin Li, Chenglong Li, Liang Lin. "Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021. [PDF]

Download RGBT-CC Dataset & Models: [Dropbox][BaiduYun (PW: RGBT)]

Our framework can be implemented with various backbone networks. You can refer to this page for implementing BL+IADM. Moreover, the proposed framework can also be applied to RGBD crowd counting, and the implementation of CSRNet+IADM is available.

If you use this code and benchmark for your research, please cite our work:

@inproceedings{liu2021cross,
  title={Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting},
  author={Liu, Lingbo and Chen, Jiaqi and Wu, Hefeng and Li, Guanbin and Li, Chenglong and Lin, Liang},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Introduction

Crowd counting is a fundamental yet challenging task, which requires rich information to generate pixel-wise crowd density maps. In this work, we find that incorporating optical and thermal information can greatly help to recognize pedestrians. To promote future research in this field, we introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people. Furthermore, to facilitate multimodal crowd counting, we propose a cross-modal collaborative representation learning framework, which consists of multiple modality-specific branches, a modality-shared branch, and an Information Aggregation-Distribution Module (IADM) to fully capture the complementary information of different modalities. Specifically, our IADM incorporates two collaborative information transfers to dynamically enhance the modality-shared and modality-specific representations with a dual information propagation mechanism. Extensive experiments conducted on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting.

RGBT-CC Benchmark

To promote future research on this task, we propose a large-scale RGBT Crowd Counting (RGBT-CC) benchmark. Specifically, this benchmark consists of 2,030 pairs of 640×480 RGB-thermal images captured in various scenarios (e.g., malls, streets, playgrounds, train stations, and metro stations). Among these samples, 1,013 pairs are captured in light and 1,017 pairs in darkness. A total of 138,389 pedestrians are marked with point annotations, an average of 68 people per image. Finally, the proposed RGBT-CC benchmark is randomly divided into three parts: 1,030 pairs for training, 200 pairs for validation, and 800 pairs for testing. Compared with Internet-based datasets that suffer from serious bias, our RGBT-CC dataset has a crowd density distribution closer to that of realistic cities, since our images are captured in urban scenes with various densities. Therefore, our dataset has wider applications for urban crowd analysis.
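As a quick illustration of how such paired samples can be consumed in PyTorch, here is a minimal dataset sketch. The file layout and naming below (`*_rgb.jpg`, `*_t.jpg`, `*_gt.npy`) are assumptions for illustration only; the released benchmark may be organized differently.

    import os
    from glob import glob

    import numpy as np
    import torch
    from PIL import Image
    from torch.utils.data import Dataset

    class RGBTCrowdDataset(Dataset):
        """Minimal sketch of a paired RGB-thermal crowd counting dataset.

        Assumes each sample is stored as <id>_rgb.jpg, <id>_t.jpg, and
        <id>_gt.npy (an (N, 2) array of head-point annotations) under
        `root`; the actual RGBT-CC release may use a different layout.
        """

        def __init__(self, root):
            self.rgb_paths = sorted(glob(os.path.join(root, '*_rgb.jpg')))

        def __len__(self):
            return len(self.rgb_paths)

        def __getitem__(self, idx):
            rgb_path = self.rgb_paths[idx]
            t_path = rgb_path.replace('_rgb.jpg', '_t.jpg')
            gt_path = rgb_path.replace('_rgb.jpg', '_gt.npy')

            def to_tensor(img):
                # (H, W, 3) uint8 -> (3, H, W) float in [0, 1]
                arr = np.asarray(img, dtype=np.float32) / 255.0
                return torch.from_numpy(arr).permute(2, 0, 1)

            rgb = to_tensor(Image.open(rgb_path).convert('RGB'))    # (3, 480, 640)
            thermal = to_tensor(Image.open(t_path).convert('RGB'))  # (3, 480, 640)
            points = torch.from_numpy(np.load(gt_path).astype(np.float32))  # (N, 2)
            return rgb, thermal, points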

Method

The proposed RGBT crowd counting framework is composed of three parallel backbones and an Information Aggregation-Distribution Module (IADM). Specifically, the top and bottom backbones are developed for modality-specific (i.e., RGB images and thermal images) representation learning, while the middle backbone is designed for modality-shared representation learning. To fully exploit the multimodal complementarities, our IADM dynamically transfers information between the modality-specific and modality-shared branches, collaboratively enhancing both kinds of representations. Consequently, the final modality-shared feature contains comprehensive information and facilitates the generation of high-quality crowd density maps.
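To make the aggregation-distribution idea concrete, below is a highly simplified, single-scale PyTorch sketch. The 1×1 convolution gates and residual transfers are our own illustrative assumptions; the actual IADM operates on multiple scales with a more elaborate design, so treat this as a conceptual sketch rather than the official module.

    import torch
    import torch.nn as nn

    class IADMBlockSketch(nn.Module):
        """Conceptual single-scale Information Aggregation-Distribution block.

        Aggregation: the shared feature absorbs gated residual information
        from both modality-specific features. Distribution: each specific
        feature is then enhanced by the updated shared feature.
        """

        def __init__(self, channels):
            super().__init__()
            self.gate_rgb = nn.Conv2d(channels, channels, kernel_size=1)
            self.gate_t = nn.Conv2d(channels, channels, kernel_size=1)
            self.dist_rgb = nn.Conv2d(channels, channels, kernel_size=1)
            self.dist_t = nn.Conv2d(channels, channels, kernel_size=1)

        def forward(self, f_rgb, f_shared, f_t):
            # Aggregation: gather complementary (residual) information
            # from the RGB and thermal branches into the shared branch.
            res_rgb, res_t = f_rgb - f_shared, f_t - f_shared
            f_shared = (f_shared
                        + torch.sigmoid(self.gate_rgb(res_rgb)) * res_rgb
                        + torch.sigmoid(self.gate_t(res_t)) * res_t)
            # Distribution: propagate the enhanced shared representation
            # back to refine each modality-specific branch.
            f_rgb = f_rgb + torch.tanh(self.dist_rgb(f_shared - f_rgb))
            f_t = f_t + torch.tanh(self.dist_t(f_shared - f_t))
            return f_rgb, f_shared, f_t

    # Usage: enhance one stage of features from the three parallel backbones.
    block = IADMBlockSketch(channels=64)
    f_rgb, f_shared, f_t = block(*[torch.randn(2, 64, 60, 80) for _ in range(3)])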

Experiments

More References

Crowd Counting with Deep Structured Scale Integration Network, ICCV 2019 [PDF]

Crowd Counting using Deep Recurrent Spatial-Aware Network, IJCAI 2018 [PDF]

Efficient Crowd Counting via Structured Knowledge Transfer, ACM MM 2020 [PDF]

Comments
  • About the test and val stages

    Hello, I have been following your RGBT-CC work and have learned a lot from it; recently I have been planning an extension built on it. In the test and val parts of your code, the whole 480×640 image is forward-propagated and then evaluated. However, if a model constrains the input size, e.g., to 256×256, the image must be resized, or cropped and the predictions stitched back together, before evaluation (see the crop-and-stitch sketch below). I tried both approaches with the pretrained model you provide, but the metrics dropped severely. Do you have any suggestions? Thank you very much.
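    For reference, a crop-and-stitch evaluation could look like the sketch below. The tile size, the zero padding, and the model(rgb, thermal) → density-map interface are illustrative assumptions, and, as this issue reports, tiled inference may still underperform whole-image inference because each tile sees less context.

        import torch
        import torch.nn.functional as F

        @torch.no_grad()
        def tiled_count(model, rgb, thermal, tile=256):
            """Estimate the total count of a full 480x640 pair by tiling.

            Zero-pads H and W up to multiples of `tile`, runs the model on
            each tile, and sums the predicted density maps. Since a density
            map integrates to a count, summing the per-tile predictions over
            a partition of the image yields the image-level count.
            """
            _, _, h, w = rgb.shape
            pad_h, pad_w = (tile - h % tile) % tile, (tile - w % tile) % tile
            rgb = F.pad(rgb, (0, pad_w, 0, pad_h))
            thermal = F.pad(thermal, (0, pad_w, 0, pad_h))

            total = 0.0
            for y in range(0, rgb.shape[2], tile):
                for x in range(0, rgb.shape[3], tile):
                    density = model(rgb[:, :, y:y + tile, x:x + tile],
                                    thermal[:, :, y:y + tile, x:x + tile])
                    total += density.sum().item()
            return total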

    opened by gaiyi7788 0
  • Is the Bayesian loss code possibly buggy?

    I previously asked whether multi-batch training is supported, and you replied that you had not tried it. After reading the Bayesian loss code, I found the following issue:

    class Bay_Loss(Module):
        def __init__(self, use_background, device):
            super(Bay_Loss, self).__init__()
            self.device = device
            self.use_bg = use_background
    
        def forward(self, prob_list, pre_density):
            loss = 0
            for idx, prob in enumerate(prob_list):  # iterative through each sample
                if prob is None:  # image contains no annotation points
                    pre_count = torch.sum(pre_density[idx])
                    target = torch.zeros((1,), dtype=torch.float32, device=self.device)
                else:
                    N = len(prob)
                    if self.use_bg:
                        target = torch.ones((N,), dtype=torch.float32, device=self.device)
                        target[-1] = 0.0  # the expectation count of background should be zero
                    else:
                        target = torch.ones((N,), dtype=torch.float32, device=self.device)
                    pre_count = torch.sum(pre_density[idx].view((1, -1)) * prob, dim=1)  # flatten into vector
    
            loss += torch.sum(torch.abs(target - pre_count))
            loss = loss / len(prob_list)
            return loss
    

    In the last few lines, I believe the `+=` statement is misplaced: it is missing one level of indentation and should read

                loss += torch.sum(torch.abs(target - pre_count))
            loss = loss / len(prob_list)
            return loss
    

    The loss accumulation is written outside the loop, so multi-batch training cannot work correctly. The original Bayesian loss source code does not seem to have this problem.

    After this fix I can train with multiple batches. The above is my personal understanding; corrections are welcome. Thanks!
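    Assembling the quoted class and the proposed fix, the corrected forward pass would read as follows (pieced together from the snippets above, not taken verbatim from the repository):

        def forward(self, prob_list, pre_density):
            loss = 0
            for idx, prob in enumerate(prob_list):  # iterate over each sample
                if prob is None:  # image contains no annotation points
                    pre_count = torch.sum(pre_density[idx])
                    target = torch.zeros((1,), dtype=torch.float32, device=self.device)
                else:
                    N = len(prob)
                    target = torch.ones((N,), dtype=torch.float32, device=self.device)
                    if self.use_bg:
                        target[-1] = 0.0  # the expected count of background is zero
                    pre_count = torch.sum(pre_density[idx].view((1, -1)) * prob, dim=1)
                # accumulate inside the loop so every sample contributes
                loss += torch.sum(torch.abs(target - pre_count))
            loss = loss / len(prob_list)
            return loss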

    opened by gaiyi7788 0
  • Why does my BL+IADM for RGBT Crowd Counting training not reach the reported accuracy?

    Thanks for providing the code. I ran into some problems while reproducing the results.

    I trained directly on a single RTX 3090 with all other settings at their default values. The training log is:

    ...................
    08-27 08:31:48 -----Epoch 197/199-----
    08-27 08:33:28 Epoch 197 Train, Loss: 6.54, GAME0: 3.61 MSE: 7.36, Cost 99.3 sec
    08-27 08:33:43 Epoch 197 Val200, GAME0 10.53 GAME1 15.08 GAME2 19.54 GAME3 27.40 MSE 17.58 Re 0.1624,
    08-27 08:33:43 *** Best Val GAME0 10.531 GAME3 27.315 model epoch 197
    08-27 08:34:38 Epoch 197 Test800, GAME0 17.60 GAME1 21.88 GAME2 26.41 GAME3 34.43 MSE 30.26 Re 0.2425,
    08-27 08:34:38 -----Epoch 198/199-----
    08-27 08:36:18 Epoch 198 Train, Loss: 6.21, GAME0: 3.40 MSE: 6.97, Cost 99.9 sec
    08-27 08:36:34 Epoch 198 Val200, GAME0 12.27 GAME1 16.63 GAME2 20.74 GAME3 29.18 MSE 24.31 Re 0.1924,
    08-27 08:36:34 -----Epoch 199/199-----
    08-27 08:38:15 Epoch 199 Train, Loss: 6.49, GAME0: 3.68 MSE: 6.53, Cost 101.2 sec
    08-27 08:38:30 Epoch 199 Val200, GAME0 10.66 GAME1 15.33 GAME2 19.65 GAME3 27.63 MSE 15.74 Re 0.1970,

    ./test.sh

    testing...
    /root/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:3672: UserWarning: nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.
      warnings.warn("nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.")
    Test800, GAME0 17.60 GAME1 21.88 GAME2 26.41 GAME3 34.43 MSE 30.26 Re 0.2425,

    The test GAME0 is only 17.60. Could this be an issue with my experimental settings?

    opened by lyccol 8