Drone-based Joint Density Map Estimation, Localization and Tracking with Space-Time Multi-Scale Attention Network

Overview

DroneCrowd

Paper: Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark.

Introduction

This paper proposes a space-time multi-scale attention network (STANet) for density map estimation, localization, and tracking in dense crowds in video clips captured by drones with arbitrary crowd density, perspective, and flight altitude. STANet aggregates multi-scale feature maps from sequential frames to exploit temporal coherency, and then simultaneously predicts the density maps, localizes the targets, and associates them in crowds. A coarse-to-fine process gradually applies the attention module to the aggregated multi-scale feature maps, which forces the network to exploit discriminative space-time features for better performance. The whole network is trained end-to-end with a multi-task loss formed by three terms: the density map loss, the localization loss, and the association loss. Non-maximal suppression followed by a min-cost flow framework is used to generate the trajectories of targets in each scene. Since existing crowd counting datasets focus only on crowd counting with static cameras rather than on density map estimation, counting, and tracking in crowds on drones, we have collected a new large-scale drone-based dataset, DroneCrowd, formed by 112 video clips with 33,600 high-resolution frames (i.e., 1920x1080) captured in 70 different scenarios. Through an intensive annotation effort, our dataset provides 20,800 people trajectories with 4.8 million head annotations and several video-level attributes in sequences. Extensive experiments are conducted on two challenging public datasets, i.e., ShanghaiTech and UCF-QNRF, and on our DroneCrowd, to demonstrate that STANet achieves favorable performance against state-of-the-art methods.
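To make the counting and localization steps concrete, here is a minimal pure-Python sketch. It is not the STANet implementation: it only assumes that a crowd count can be read off a density map as its sum, and that head positions can be picked from a localization score map by keeping local maxima above a threshold (a simple form of non-maximal suppression). The function names, the `threshold` and `radius` parameters, and the toy maps are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def count_from_density(density: Grid) -> float:
    """Estimate the crowd count as the sum of all density map cells."""
    return sum(sum(row) for row in density)


def local_maxima(score: Grid, threshold: float = 0.5, radius: int = 1) -> List[Tuple[int, int]]:
    """Return (row, col) cells that reach `threshold` and are not exceeded by
    any neighbour inside a (2 * radius + 1) square window.

    Ties are kept: two equal neighbouring maxima are both reported.
    """
    height = len(score)
    width = len(score[0]) if height else 0
    peaks = []
    for y in range(height):
        for x in range(width):
            value = score[y][x]
            if value < threshold:
                continue
            is_peak = True
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    if (ny, nx) != (y, x) and score[ny][nx] > value:
                        is_peak = False
            if is_peak:
                peaks.append((y, x))
    return peaks


if __name__ == "__main__":
    toy_score = [
        [0.1, 0.2, 0.1, 0.0],
        [0.2, 0.9, 0.3, 0.0],
        [0.1, 0.3, 0.2, 0.7],
        [0.0, 0.0, 0.6, 0.4],
    ]
    print(local_maxima(toy_score))
    print(count_from_density([[0.25, 0.25], [0.5, 1.0]]))
```

In the toy score map, the cell with value 0.6 is suppressed because its neighbour 0.7 is stronger, so only two points survive.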

Dataset

ECCV2020 Challenge

The VisDrone 2020 Crowd Counting Challenge requires participating algorithms to count persons in each frame. The challenge will provide 112 challenging sequences, including 82 video sequences for training (2,420 frames in total), and 30 sequences for testing (900 frames in total), which are available on the download page. We manually annotate persons with points in each video frame.
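As an illustration of per-frame counting evaluation, the sketch below computes the mean absolute error and root mean squared error between predicted and ground-truth counts. These two metrics are common in crowd counting work, but their use here is an assumption: the text above does not state which metric the challenge applies. The function names and sample counts are hypothetical; with point annotations, the ground-truth count of a frame would simply be the number of annotated points.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth frame counts."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def rmse(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Root mean squared error between predicted and ground-truth frame counts."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred))


if __name__ == "__main__":
    predicted = [10.0, 20.0, 30.0]   # hypothetical per-frame predictions
    annotated = [12, 18, 30]         # hypothetical per-frame point counts
    print(mae(predicted, annotated), rmse(predicted, annotated))
```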

DroneCrowd (1.03 GB): BaiduYun (code: h0j8) | GoogleDrive

DroneCrowd (Full Version)

This full version consists of 112 video clips with 33,600 high-resolution frames (i.e., 1920x1080) captured in 70 different scenarios. Through an intensive annotation effort, our dataset provides 20,800 people trajectories with 4.8 million head annotations and several video-level attributes in sequences.
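One possible in-memory layout for trajectories is a mapping from a track id to a list of (frame, x, y) head points. The sketch below builds such a mapping from per-frame points using greedy nearest-neighbour linking. This is a deliberately simple stand-in, not the min-cost flow association described in the paper, and it does not reflect the dataset's actual annotation file format; the names `link_frames` and `max_dist` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Tracklets = Dict[int, List[Tuple[int, float, float]]]


def link_frames(frames: List[List[Point]], max_dist: float = 20.0) -> Tracklets:
    """Greedily link head points across consecutive frames.

    Each track is extended by its closest unclaimed point in the next frame,
    provided the distance is at most `max_dist`. A track with no match in a
    frame is terminated; unmatched points start new tracks.
    """
    tracks: Tracklets = {}
    active: Dict[int, Point] = {}  # track id -> last known position
    next_id = 0
    for frame_idx, points in enumerate(frames):
        # All (distance, track id, point index) candidates, closest first.
        pairs = sorted(
            (math.dist(last, points[j]), tid, j)
            for tid, last in active.items()
            for j in range(len(points))
        )
        unmatched = set(range(len(points)))
        used_tracks = set()
        new_active: Dict[int, Point] = {}
        for dist, tid, j in pairs:
            if dist > max_dist:
                break
            if tid in used_tracks or j not in unmatched:
                continue
            used_tracks.add(tid)
            unmatched.discard(j)
            tracks[tid].append((frame_idx, points[j][0], points[j][1]))
            new_active[tid] = points[j]
        for j in sorted(unmatched):
            tracks[next_id] = [(frame_idx, points[j][0], points[j][1])]
            new_active[next_id] = points[j]
            next_id += 1
        active = new_active
    return tracks


if __name__ == "__main__":
    toy_frames = [[(0, 0), (100, 100)], [(2, 1), (101, 99)], [(4, 2)]]
    for track_id, path in link_frames(toy_frames).items():
        print(track_id, path)
```

Keeping tracks keyed by id makes it straightforward to draw or export the full path of one person, which is the main reason for choosing this layout over a per-frame list.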

DroneCrowd: BaiduYun (code: ml1u) | GoogleDrive

Code

Space-Time Neighbor-Aware Network (STNNet-pytorch)

Space-Time Multi-Scale Attention Network (STANet-pytorch)

Citation

Please cite this paper if you want to use it in your work.

@inproceedings{dronecrowd_cvpr2021,
  author    = {Longyin Wen and
               Dawei Du and
               Pengfei Zhu and
               Qinghua Hu and
               Qilong Wang and
               Liefeng Bo and
               Siwei Lyu},
  title     = {Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark},
  booktitle = {CVPR},
  year      = {2021}
}
Comments
  • Where do you store the tracklets of each pedestrian?

    In mytest.py, calc_trkpt appears to find the next detection point using tracking. Are you storing tracklets somewhere so that the complete track of each pedestrian can be rebuilt? How can we reproduce the tracking illustrated in the DroneCrowd README.md?

    opened by MounirB 1
  • [STNNet] spatial-correlation-sampler lib issue with RTX 3000 series GPUs

    Hello, it seems that spatial-correlation-sampler doesn't work with an RTX 3000 series GPU when it is installed through pip in this conda environment:

    conda create -n STTNet python=3.6 pytorch=1.6 torchvision -c pytorch

    I made it work by replacing Python and PyTorch with newer versions, 3.9 and 1.11 respectively.

    Do we need older versions of PyTorch to make STNNet work?

    opened by MounirB 1
  • Can't read GT_img015203.mat and GT_img015205.mat in test_data/ground_truth

    Thanks for providing such a good dataset! I can't read the files GT_img015203.mat and GT_img015204.mat in test_data/ground_truth with either MATLAB or Python. I downloaded these files from BaiduYun. Could you update these two files? Thank you so much.

    opened by Amelie01 0
  • The same images in validation and test sets

    Hi, Why does val_data contain the same images as test_data?

    val_data

    bartosz@bartosz-pro:~/Downloads/val_data/images$ ll | head -n 20
    total 111000
    drwxrwxr-x 2 bartosz bartosz  20480 lis  6  2020 ./
    drwxrwxr-x 4 bartosz bartosz   4096 lis  6  2020 ../
    -rw-rw-r-- 1 bartosz bartosz 296163 mar 13  2018 img011001.jpg
    -rw-rw-r-- 1 bartosz bartosz 295048 mar 13  2018 img011002.jpg
    -rw-rw-r-- 1 bartosz bartosz 291832 mar 13  2018 img011003.jpg
    -rw-rw-r-- 1 bartosz bartosz 292177 mar 13  2018 img011004.jpg
    -rw-rw-r-- 1 bartosz bartosz 292308 mar 13  2018 img011005.jpg
    -rw-rw-r-- 1 bartosz bartosz 292594 mar 13  2018 img011006.jpg
    -rw-rw-r-- 1 bartosz bartosz 291670 mar 13  2018 img011007.jpg
    -rw-rw-r-- 1 bartosz bartosz 292245 mar 13  2018 img011008.jpg
    -rw-rw-r-- 1 bartosz bartosz 296073 mar 13  2018 img011009.jpg
    -rw-rw-r-- 1 bartosz bartosz 293852 mar 13  2018 img011010.jpg
    -rw-rw-r-- 1 bartosz bartosz 294563 mar 13  2018 img011011.jpg
    -rw-rw-r-- 1 bartosz bartosz 294614 mar 13  2018 img011012.jpg
    -rw-rw-r-- 1 bartosz bartosz 294053 lut  9  2019 img015001.jpg
    -rw-rw-r-- 1 bartosz bartosz 294591 lut  9  2019 img015002.jpg
    -rw-rw-r-- 1 bartosz bartosz 294779 lut  9  2019 img015003.jpg
    -rw-rw-r-- 1 bartosz bartosz 294860 mar 13  2018 img015004.jpg
    -rw-rw-r-- 1 bartosz bartosz 295132 mar 13  2018 img015005.jpg
    

    test_data

    bartosz@bartosz-pro:~/Downloads/test_data/images$ ll | head -n 20
    total 2769944
    drwxrwxr-x 2 bartosz bartosz 299008 mar  8  2021 ./
    drwxrwxr-x 4 bartosz bartosz   4096 mar  8  2021 ../
    -rw-rw-r-- 1 bartosz bartosz 296163 mar 13  2018 img011001.jpg
    -rw-rw-r-- 1 bartosz bartosz 295048 mar 13  2018 img011002.jpg
    -rw-rw-r-- 1 bartosz bartosz 291832 mar 13  2018 img011003.jpg
    -rw-rw-r-- 1 bartosz bartosz 292177 mar 13  2018 img011004.jpg
    -rw-rw-r-- 1 bartosz bartosz 292308 mar 13  2018 img011005.jpg
    -rw-rw-r-- 1 bartosz bartosz 292594 mar 13  2018 img011006.jpg
    -rw-rw-r-- 1 bartosz bartosz 291670 mar 13  2018 img011007.jpg
    -rw-rw-r-- 1 bartosz bartosz 292245 mar 13  2018 img011008.jpg
    -rw-rw-r-- 1 bartosz bartosz 296073 mar 13  2018 img011009.jpg
    -rw-rw-r-- 1 bartosz bartosz 293852 mar 13  2018 img011010.jpg
    -rw-rw-r-- 1 bartosz bartosz 294563 mar 13  2018 img011011.jpg
    -rw-rw-r-- 1 bartosz bartosz 294614 mar 13  2018 img011012.jpg
    -rw-rw-r-- 1 bartosz bartosz 294075 mar 13  2018 img011013.jpg
    -rw-rw-r-- 1 bartosz bartosz 293981 mar 13  2018 img011014.jpg
    -rw-rw-r-- 1 bartosz bartosz 293826 mar 13  2018 img011015.jpg
    -rw-rw-r-- 1 bartosz bartosz 294447 mar 13  2018 img011016.jpg
    -rw-rw-r-- 1 bartosz bartosz 295887 mar 13  2018 img011017.jpg
    
    opened by bartoszptak 0
  • Questions of training and testing Process

    Hello, thanks for the great work!!

    I tried to test the results. The 'Setup environment', 'Download the DroneCrowd data', and 'Ground-Truth Generation' steps are finished, and the pre-trained models are downloaded. I get an error when I run python mytest.py. The error is: ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 2

    How might I be able to solve it?

    I am looking forward to your reply.

    opened by jackydinosaur 8
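For the question above about val_data and test_data containing the same images, a small standard-library script can confirm whether files with the same name are also byte-identical. This is only a sketch: the function names are hypothetical, and the directory paths in the usage line are taken from the listings in that comment.

```python
import hashlib
from pathlib import Path
from typing import Dict, Set


def file_digests(folder: str) -> Dict[str, str]:
    """Map each regular file name in `folder` to the SHA-256 of its contents."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(folder).iterdir())
        if path.is_file()
    }


def overlapping_images(dir_a: str, dir_b: str) -> Set[str]:
    """Return file names present in both folders with identical contents."""
    digests_a = file_digests(dir_a)
    digests_b = file_digests(dir_b)
    shared_names = digests_a.keys() & digests_b.keys()
    return {name for name in shared_names if digests_a[name] == digests_b[name]}


if __name__ == "__main__":
    import sys

    # Example: python check_overlap.py val_data/images test_data/images
    if len(sys.argv) == 3:
        shared = overlapping_images(sys.argv[1], sys.argv[2])
        print(f"{len(shared)} identical files found in both folders")
```

Comparing content hashes rather than names alone distinguishes true duplicates from files that merely share a numbering scheme.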