Official implementation of the CVPR 2021 paper “Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting”

Last update: Dec 8, 2022

Overview

RGBT Crowd Counting

Lingbo Liu, Jiaqi Chen, Hefeng Wu, Guanbin Li, Chenglong Li, Liang Lin. "Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021. [PDF]

Download RGBT-CC Dataset & Models: [Dropbox][BaiduYun (PW: RGBT)]

Our framework can be implemented with various backbone networks. You can refer to this page for implementing BL+IADM. Moreover, the proposed framework can also be applied to RGBD crowd counting and the implementation of CSRNet+IADM is available.

If you use this code and benchmark for your research, please cite our work:

@inproceedings{liu2021cross,
  title={Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting},
  author={Liu, Lingbo and Chen, Jiaqi and Wu, Hefeng and Li, Guanbin and Li, Chenglong and Lin, Liang},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Introduction

Crowd counting is a fundamental yet challenging task that requires rich information to generate pixel-wise crowd density maps. In this work, we find that incorporating optical and thermal information can greatly help to recognize pedestrians. To promote future research in this field, we introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people. Furthermore, to facilitate multimodal crowd counting, we propose a cross-modal collaborative representation learning framework, which consists of multiple modality-specific branches, a modality-shared branch, and an Information Aggregation-Distribution Module (IADM) to fully capture the complementary information of different modalities. Specifically, our IADM incorporates two collaborative information transfers to dynamically enhance the modality-shared and modality-specific representations with a dual information propagation mechanism. Extensive experiments conducted on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting.

RGBT-CC Benchmark

To promote future research on this task, we propose a large-scale RGBT Crowd Counting (RGBT-CC) benchmark. Specifically, this benchmark consists of 2,030 pairs of 640x480 RGB-thermal images captured in various scenarios (e.g., malls, streets, playgrounds, train stations, and metro stations). Among these samples, 1,013 pairs were captured in the light and 1,017 pairs in darkness. A total of 138,389 pedestrians are marked with point annotations, an average of 68 people per image. Finally, the RGBT-CC benchmark is randomly divided into three parts: 1,030 pairs for training, 200 pairs for validation, and 800 pairs for testing. Compared with Internet-based datasets, which suffer from serious bias, our RGBT-CC dataset has a crowd density distribution closer to that of realistic cities, since our images are captured in urban scenes with various densities. Therefore, our dataset has wider applications for urban crowd analysis.
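The figures above can be cross-checked with a few lines of Python. The sketch below also shows one way a random 1,030/200/800 split could be drawn. The function name `split_indices` and the seed are assumptions made for illustration only; the official split is the one distributed with the dataset, and this code does not reproduce it.

```python
import random


def split_indices(n_total, n_train, n_val, n_test, seed=0):
    """Shuffle sample indices and cut them into three disjoint parts.

    Hypothetical helper: the seed and the shuffling scheme are assumptions,
    not the procedure used to build the released RGBT-CC split.
    """
    if n_train + n_val + n_test != n_total:
        raise ValueError("split sizes must add up to the dataset size")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Numbers quoted in the benchmark description.
n_pairs = 2030
n_light, n_dark = 1013, 1017
n_people = 138389

train, val, test = split_indices(n_pairs, 1030, 200, 800)
mean_people_per_image = n_people / n_pairs  # roughly 68, as stated above
```

Here `n_light + n_dark` equals `n_pairs`, and the three parts are disjoint and together cover every image pair.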

Method

The proposed RGBT crowd counting framework is composed of three parallel backbones and an Information Aggregation-Distribution Module (IADM). Specifically, the top and bottom backbones are developed for modality-specific (i.e., RGB and thermal) representation learning, while the middle backbone is designed for modality-shared representation learning. To fully exploit the multimodal complementarities, our IADM dynamically transfers the specific-shared information to collaboratively enhance the modality-specific and modality-shared representations. Consequently, the final modality-shared feature contains comprehensive information and facilitates the generation of high-quality crowd density maps.

Experiments

More References

Crowd Counting with Deep Structured Scale Integration Network, ICCV 2019 [PDF]

Crowd Counting using Deep Recurrent Spatial-Aware Network, IJCAI 2018 [PDF]

Efficient Crowd Counting via Structured Knowledge Transfer, ACM MM 2020 [PDF]

Comments

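About the test and val stages

Hello, I have been following your RGBT-CC work and have learned a lot from it. I would like to build an extension on top of it. In your code, the test and val stages run a forward pass on the whole 480×640 image and then evaluate. However, if a model restricts the input size, for example to 256×256, evaluation has to either resize the image first or crop it into patches and stitch the results. I tried both approaches with the pretrained model you provided, but the metrics dropped severely. Do you have any suggestions? Thank you very much.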
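As a point of reference for the crop-and-stitch evaluation described in the question, here is a minimal sketch of non-overlapping tiling. It assumes the image is zero-padded on the bottom and right up to a multiple of the patch size, so no person is counted twice. `tile_boxes`, `tiled_count`, and `count_fn` are hypothetical names and are not part of the repository. In a real evaluation `count_fn` would run the model on one padded crop and sum its predicted density map; here it simply counts toy points. A model trained on full frames may still lose accuracy on crops, so this only illustrates the bookkeeping.

```python
import math


def tile_boxes(height, width, patch):
    """Return (top, left, bottom, right) boxes covering the padded image.

    Boxes in the last row or column may extend past the original image;
    that region is assumed to be zero padding.
    """
    rows = math.ceil(height / patch)
    cols = math.ceil(width / patch)
    return [
        (r * patch, c * patch, (r + 1) * patch, (c + 1) * patch)
        for r in range(rows)
        for c in range(cols)
    ]


def tiled_count(height, width, patch, count_fn):
    """Sum the per-tile counts returned by count_fn(box)."""
    return sum(count_fn(box) for box in tile_boxes(height, width, patch))


# Toy stand-in for the model: count annotated points inside each tile.
points = [(10, 10), (300, 600), (479, 639)]  # (row, col) head positions


def count_points(box):
    top, left, bottom, right = box
    return sum(top <= r < bottom and left <= c < right for r, c in points)


boxes = tile_boxes(480, 640, 256)
total = tiled_count(480, 640, 256, count_points)
```

For a 480×640 frame and 256×256 patches this gives a 2×3 grid of tiles, and the total equals the number of points because the tiles do not overlap.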

opened by gaiyi7788 0

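Is there an error in the Bayesian loss code?

I previously asked whether training with a batch size greater than 1 was possible, and you replied that you had not tried it. I read through the Bayesian loss code and found the following problem: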

class Bay_Loss(Module):
    def __init__(self, use_background, device):
        super(Bay_Loss, self).__init__()
        self.device = device
        self.use_bg = use_background

    def forward(self, prob_list, pre_density):
        loss = 0
        for idx, prob in enumerate(prob_list):  # iterative through each sample
            if prob is None:  # image contains no annotation points
                pre_count = torch.sum(pre_density[idx])
                target = torch.zeros((1,), dtype=torch.float32, device=self.device)
            else:
                N = len(prob)
                if self.use_bg:
                    target = torch.ones((N,), dtype=torch.float32, device=self.device)
                    target[-1] = 0.0  # the expectation count of background should be zero
                else:
                    target = torch.ones((N,), dtype=torch.float32, device=self.device)
                pre_count = torch.sum(pre_density[idx].view((1, -1)) * prob, dim=1)  # flatten into vector

        loss += torch.sum(torch.abs(target - pre_count))
        loss = loss / len(prob_list)
        return loss

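In the last few lines, I believe the += is in the wrong place: it is missing one level of indentation and should be changed to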

            loss += torch.sum(torch.abs(target - pre_count))
        loss = loss / len(prob_list)
        return loss

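The loss accumulation is written outside the loop, so only the last sample in the batch contributes and training with a batch size greater than 1 does not work properly. The original Bayesian loss source code does not seem to have this problem.

With this change I can now train with a batch size greater than 1. This is only my own understanding, so please correct me if I am wrong. Thanks!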
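To make the effect of the indentation concrete, the following is a framework-free sketch of the same per-sample loop using plain Python lists instead of tensors. It is an illustration written for this thread, not the repository's `Bay_Loss`; the function name `bayesian_l1_loss` and the toy inputs are assumptions.

```python
def bayesian_l1_loss(prob_list, densities, use_background=False):
    """Mean over samples of the L1 gap between expected and target counts.

    prob_list[i] is None (no annotations) or a list of N rows, each holding
    P posterior probabilities; densities[i] is a flat list of P predicted
    density values for the same sample.
    """
    loss = 0.0
    for prob, density in zip(prob_list, densities):
        if prob is None:  # image contains no annotation points
            pre_count = [sum(density)]
            target = [0.0]
        else:
            target = [1.0] * len(prob)
            if use_background:
                target[-1] = 0.0  # background row should count zero people
            pre_count = [
                sum(p * d for p, d in zip(row, density)) for row in prob
            ]
        # Accumulate inside the loop so that every sample contributes.
        loss += sum(abs(t - c) for t, c in zip(target, pre_count))
    return loss / len(prob_list)


# Two toy samples: one with two annotated points, one with none.
prob_list = [[[1.0, 0.0], [0.0, 1.0]], None]
densities = [[0.5, 1.0], [0.25, 0.25]]
batch_loss = bayesian_l1_loss(prob_list, densities)
```

Each toy sample contributes an error of 0.5, so the batch mean is 0.5. If the accumulation line were dedented out of the loop, only the second sample would be counted and the result would be 0.25.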

opened by gaiyi7788 0

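Why can't my BL+IADM training for RGBT Crowd Counting reach the reported accuracy?

Thank you for providing the code. I ran into some problems while reproducing the results.

I used a single 3090 and trained directly with all other parameters left at their defaults. The training log is: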

...................
08-27 08:31:48 -----Epoch 197/199-----
08-27 08:33:28 Epoch 197 Train, Loss: 6.54, GAME0: 3.61 MSE: 7.36, Cost 99.3 sec
08-27 08:33:43 Epoch 197 Val200, GAME0 10.53 GAME1 15.08 GAME2 19.54 GAME3 27.40 MSE 17.58 Re 0.1624,
08-27 08:33:43 *** Best Val GAME0 10.531 GAME3 27.315 model epoch 197
08-27 08:34:38 Epoch 197 Test800, GAME0 17.60 GAME1 21.88 GAME2 26.41 GAME3 34.43 MSE 30.26 Re 0.2425,
08-27 08:34:38 -----Epoch 198/199-----
08-27 08:36:18 Epoch 198 Train, Loss: 6.21, GAME0: 3.40 MSE: 6.97, Cost 99.9 sec
08-27 08:36:34 Epoch 198 Val200, GAME0 12.27 GAME1 16.63 GAME2 20.74 GAME3 29.18 MSE 24.31 Re 0.1924,
08-27 08:36:34 -----Epoch 199/199-----
08-27 08:38:15 Epoch 199 Train, Loss: 6.49, GAME0: 3.68 MSE: 6.53, Cost 101.2 sec
08-27 08:38:30 Epoch 199 Val200, GAME0 10.66 GAME1 15.33 GAME2 19.65 GAME3 27.63 MSE 15.74 Re 0.1970,

./test.sh

testing...
/root/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:3672: UserWarning: nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.")
Test800, GAME0 17.60 GAME1 21.88 GAME2 26.41 GAME3 34.43 MSE 30.26 Re 0.2425,

The result is only 17.6 (GAME0 on Test800). Is this caused by my experiment parameters?
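For readers comparing the GAME0 to GAME3 columns in the log: the Grid Average Mean absolute Error at level L, as commonly defined, splits each image into 4^L non-overlapping cells and sums the absolute count error over the cells, so GAME0 is the plain absolute count error per image. The sketch below computes it for one image on nested lists. The function name `game` is an assumption, the inputs are toy data, and the repository's own evaluation code may differ in details such as how it handles sizes that do not divide evenly.

```python
def game(pred, gt, level):
    """Grid Average Mean absolute Error for a single image.

    pred and gt are 2D lists (density maps) of equal shape whose height and
    width are assumed to be divisible by 2**level.
    """
    cells = 2 ** level
    cell_h = len(gt) // cells
    cell_w = len(gt[0]) // cells
    error = 0.0
    for i in range(cells):
        for j in range(cells):
            rows = range(i * cell_h, (i + 1) * cell_h)
            cols = range(j * cell_w, (j + 1) * cell_w)
            pred_count = sum(pred[r][c] for r in rows for c in cols)
            gt_count = sum(gt[r][c] for r in rows for c in cols)
            error += abs(pred_count - gt_count)
    return error


# Same total count, but every person is placed in the wrong cell.
pred = [[1, 0], [0, 1]]
gt = [[0, 1], [1, 0]]
game0 = game(pred, gt, 0)
game1 = game(pred, gt, 1)
```

In this toy case the global count is correct, so the level-0 error is zero, while the level-1 error is 4 because each of the four cells is off by one. This is why GAME values in the log grow with the level: higher levels also penalize localization errors.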

opened by lyccol 8
