Code for AAAI 2021 paper: Sequential End-to-end Network for Efficient Person Search

Zj Li

Last update: Dec 31, 2022

Related tags

Computer Vision SeqNet

Overview

This repository hosts the source code of our paper: [AAAI 2021] Sequential End-to-end Network for Efficient Person Search. SeqNet achieves state-of-the-art performance on two widely used benchmarks and runs at 11.5 FPS on a single GPU. A brief introduction in Chinese is available on Zhihu.

Performance profile:

Dataset	mAP	Top-1	Model
CUHK-SYSU	94.8	95.7	model
PRW	47.6	87.6	model

The network structure is simple and suitable as a baseline.

Installation

Run pip install -r requirements.txt in the root directory of the project.

Quick Start

Let's say $ROOT is the root directory.

Download CUHK-SYSU and PRW datasets, and unzip them to $ROOT/data

$ROOT/data
├── CUHK-SYSU
└── PRW

Following the link in the table above, download our pretrained model to any location you like, e.g., $ROOT/exp_cuhk
Evaluate its performance by specifying the paths of the checkpoint and the corresponding configuration file.

python train.py --cfg $ROOT/exp_cuhk/config.yaml --eval --ckpt $ROOT/exp_cuhk/epoch_19.pth
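Before running the evaluation, it can help to confirm that the datasets were unzipped where the directory tree above expects them. The following is a minimal sketch, not part of the repository: the helper name missing_datasets is hypothetical, and it only checks that the two top-level folders exist, not their contents.

```python
from pathlib import Path

# Dataset folder names taken from the directory tree in Quick Start.
EXPECTED_DATASETS = ("CUHK-SYSU", "PRW")


def missing_datasets(root):
    """Return the expected dataset folders that are absent under <root>/data.

    Hypothetical helper for illustration; it does not inspect folder contents.
    """
    data_dir = Path(root) / "data"
    return [name for name in EXPECTED_DATASETS if not (data_dir / name).is_dir()]


# Example: check the current directory as if it were $ROOT.
print(missing_datasets("."))
```

An empty list means both folders are present under the given root.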

Training

Pick one configuration file you like in $ROOT/configs, and run with it.

python train.py --cfg configs/cuhk_sysu.yaml

Note: At present, our script only supports single-GPU training; distributed training will also be supported in the future. By default, the batch size and the learning rate during training are set to 5 and 0.003 respectively, which requires about 28GB of GPU memory. If your GPU cannot provide the required memory, try a smaller batch size and learning rate (performance may degrade). Specifically, your setting should follow the Linear Scaling Rule: when the minibatch size is multiplied by k, multiply the learning rate by k. For example:

python train.py --cfg configs/cuhk_sysu.yaml INPUT.BATCH_SIZE_TRAIN 2 SOLVER.BASE_LR 0.0012
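The arithmetic behind the command above can be sketched in a few lines. This is only an illustration of the Linear Scaling Rule using the defaults quoted in the note (batch size 5, learning rate 0.003); the function name scaled_lr is hypothetical and not part of the repository.

```python
# Defaults quoted in the training note.
DEFAULT_BATCH_SIZE = 5
DEFAULT_LR = 0.003


def scaled_lr(batch_size, base_batch_size=DEFAULT_BATCH_SIZE, base_lr=DEFAULT_LR):
    """Scale the learning rate by the same factor k as the batch size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    k = batch_size / base_batch_size  # k = 2 / 5 = 0.4 for batch size 2
    return base_lr * k


# Print the override arguments for a few candidate batch sizes.
for bs in (1, 2, 5):
    print(f"INPUT.BATCH_SIZE_TRAIN {bs} SOLVER.BASE_LR {scaled_lr(bs):.4f}")
```

For a batch size of 2 this gives k = 0.4 and a learning rate of 0.0012, matching the command above.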

Tip: If the training process stops unexpectedly, you can resume from the specified checkpoint.

python train.py --cfg configs/cuhk_sysu.yaml --resume --ckpt /path/to/your/checkpoint

Test

Suppose the output directory is $ROOT/exp_cuhk. Test the trained model:

python train.py --cfg $ROOT/exp_cuhk/config.yaml --eval --ckpt $ROOT/exp_cuhk/epoch_19.pth

Test with Context Bipartite Graph Matching algorithm:

python train.py --cfg $ROOT/exp_cuhk/config.yaml --eval --ckpt $ROOT/exp_cuhk/epoch_19.pth EVAL_USE_CBGM True

Test the upper bound of the person search performance by using GT boxes:

python train.py --cfg $ROOT/exp_cuhk/config.yaml --eval --ckpt $ROOT/exp_cuhk/epoch_19.pth EVAL_USE_GT True
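The commands above append options such as EVAL_USE_CBGM True as alternating KEY VALUE tokens after the regular flags. The sketch below only illustrates how such pairs can be read into a dictionary; it is not a description of how train.py actually parses them, and the names convert_literal and parse_overrides are hypothetical.

```python
def convert_literal(raw):
    """Convert 'True'/'False', integers and floats; leave other strings as-is."""
    if raw in ("True", "False"):
        return raw == "True"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_overrides(tokens):
    """Turn ['KEY', 'VALUE', 'KEY', 'VALUE', ...] into a dict of overrides."""
    if len(tokens) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    return {key: convert_literal(raw) for key, raw in zip(tokens[0::2], tokens[1::2])}


print(parse_overrides(["EVAL_USE_CBGM", "True"]))
print(parse_overrides(["INPUT.BATCH_SIZE_TRAIN", "2", "SOLVER.BASE_LR", "0.0012"]))
```

An odd number of tokens raises an error in this sketch, which makes a forgotten value easy to spot.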

Pull Request

Pull requests are welcome! Before submitting a PR, DO NOT forget to run ./dev/linter.sh, which provides syntax checking and code style optimization.

Citation

@inproceedings{li2021sequential,
  title={Sequential End-to-end Network for Efficient Person Search},
  author={Li, Zhengjia and Miao, Duoqian},
  booktitle={Proceedings of the AAAI conference on artificial intelligence},
  year={2021}
}

Comments

Running python train.py does not produce the search results

When I run python train.py --cfg $ROOT/exp_cuhk/config.yaml --ckpt $ROOT/exp_cuhk/epoch_19.pth, I can't get the search results in demo_imgs. So I want to ask if it should be python demo.py --cfg cfg $ROOT/exp_cuhk/config.yaml --ckpt $ROOT/exp_cuhk/epoch_19.pth. Thanks!

opened by circlety 6
Problems running demo.py

Hi, I was running the demo code of seqnet: python demo.py --cfg ~/checkpoints/seqnet/config.yaml --ckpt ~/checkpoints/seqnet/epoch_17.pth

but I got the following error:

Traceback (most recent call last):
  File "demo.py", line 89, in <module>
    main(args)
  File "demo.py", line 63, in main
    query_feat = model(query_img, query_target)[0]
  File "/home/mmmm/miniconda3/envs/seqnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mmmm/repo/SeqNet/models/seqnet.py", line 134, in forward
    return self.inference(images, targets, query_img_as_gallery)
  File "/home/mmmm/repo/SeqNet/models/seqnet.py", line 117, in inference
    box_features = self.roi_heads.box_roi_pool(features, boxes, images.image_sizes)
  File "/home/mmmm/miniconda3/envs/seqnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mmmm/miniconda3/envs/seqnet/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 141, in forward
    sampling_ratio=self.sampling_ratio
  File "/home/mmmm/miniconda3/envs/seqnet/lib/python3.7/site-packages/torchvision/ops/roi_align.py", line 70, in roi_align
    return _RoIAlignFunction.apply(input, rois, output_size, spatial_scale, sampling_ratio)
  File "/home/mmmm/miniconda3/envs/seqnet/lib/python3.7/site-packages/torchvision/ops/roi_align.py", line 25, in forward
    output_size[0], output_size[1], sampling_ratio)
RuntimeError: Expected tensor for argument #1 'input' to have the same type as tensor for argument #2 'rois'; but type Variable[CUDAFloatType] does not equal Variable[CUDALongType] (while checking arguments for ROIAlign_forward_cuda) (checkSameType at /pytorch/aten/src/ATen/TensorUtils.cpp:140)

So far I have no idea what causes this error. Could you help me with this problem? Thanks a lot.
PyTorch version mismatch

opened by Svithjod 5
"Loss is nan, stopping train" appears regularly

I followed the steps in the README, configured the directory structure, and trained the model. However, strange problems keep occurring, as in the log excerpt below.

----OUTPUT----
Epoch: [5] [1660/2241] eta: 0:08:56 lr: 0.003000 loss: 2.2882 (2.4257) loss_proposal_cls: 0.0818 (0.0915) loss_proposal_reg: 1.2728 (1.4000) loss_box_cls: 0.1167 (0.1311) loss_box_reg: 0.1667 (0.1707) loss_box_reid: 0.4618 (0.5611) loss_rpn_reg: 0.0283 (0.0344) loss_rpn_cls: 0.0317 (0.0369) time: 0.9248 data: 0.0005 max mem: 24005
Loss is nan, stopping training
{'loss_proposal_cls': tensor(0.0837, device='cuda:0', grad_fn=), 'loss_proposal_reg': tensor(1.3923, device='cuda:0', grad_fn=), 'loss_box_cls': tensor(0.1187, device='cuda:0', grad_fn=), 'loss_box_reg': tensor(0.1719, device='cuda:0', grad_fn=), 'loss_box_reid': tensor(nan, device='cuda:0', grad_fn=), 'loss_rpn_reg': tensor(0.0457, device='cuda:0', grad_fn=), 'loss_rpn_cls': tensor(0.0226, device='cuda:0', grad_fn=)}

This phenomenon occurs after a fixed number of epochs, and the error "Loss is nan, stopping training" appears very regularly. For example, after 5 epochs, it appears after the 1160th batch of the 6th epoch, whether training starts from epoch 0 or uses the --resume option.

The error occurs and stops the training whether the model is trained on an RTX A6000, RTX A5000 or Tesla V100 32G, and whether or not the batch size and learning rate are adjusted proportionally.

I used the --resume option to train for 20 epochs, and observed that the problem appeared in loss_box_reid every time.

This is probably a bug in the code, but I'm not sure how it arises or how to fix it.
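To pinpoint which loss term goes non-finite without reading the printed dictionary by eye, a check like the following can be used. It is a minimal sketch, assuming the loss values have already been converted to plain Python floats (for example with .item() on each tensor); the function name non_finite_losses is hypothetical, and the numbers are copied from the log above.

```python
import math


def non_finite_losses(loss_dict):
    """Return the names of loss terms whose value is NaN or infinite."""
    return [name for name, value in loss_dict.items() if not math.isfinite(value)]


# Values copied from the log excerpt; tensors are assumed converted to floats.
losses = {
    "loss_proposal_cls": 0.0837,
    "loss_proposal_reg": 1.3923,
    "loss_box_cls": 0.1187,
    "loss_box_reg": 0.1719,
    "loss_box_reid": float("nan"),
    "loss_rpn_reg": 0.0457,
    "loss_rpn_cls": 0.0226,
}
print(non_finite_losses(losses))  # prints ['loss_box_reid']
```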
PyTorch version mismatch

opened by YizJia 4
How do I know that the searched person is the same as the person in the gallery?

A person may appear in several pictures. Does the dataset have a unique person ID for everyone? Because the gallery and query sets contain duplicate pictures, and one picture may contain several persons, do you compute the IoU between the searched person box and the ground truth to determine positive cases? Thank you!
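For reference, the IoU mentioned in the question can be computed as sketched below. This is a generic illustration, not the repository's evaluation code: the (x1, y1, x2, y2) box format, the helper names box_iou and is_match, and the 0.5 threshold are all assumptions made for the example.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(detected_box, gt_box, threshold=0.5):
    """Treat a detection as matching a ground-truth box above an assumed threshold."""
    return box_iou(detected_box, gt_box) >= threshold


# Two 10x10 boxes shifted by 5 pixels: overlap area 50, union area 150.
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
print(is_match((0, 0, 10, 10), (5, 0, 15, 10)))
```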

opened by Yue-Rain 4
Does the OIM loss not contain the CQ (circular queue) softmax loss?

Line 74 in oim.py:

loss_oim = F.cross_entropy(projected, label, ignore_index=5554)

I have a question: if you set ignore_index to 5554, you only calculate the classification loss for labelled pids. Where is the softmax cross-entropy loss for unlabelled pids?
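The effect of ignore_index in the quoted call can be illustrated without PyTorch. In F.cross_entropy with the default mean reduction, samples whose target equals ignore_index are skipped and the mean is taken over the remaining samples. The sketch below mimics that behaviour in pure Python; the logits are made up, and the stand-in ignore value 3 replaces 5554 to keep the example small. It does not show how the repository's OIM loss treats unlabelled identities.

```python
import math


def cross_entropy(logits, labels, ignore_index):
    """Mean softmax cross-entropy over samples whose label != ignore_index."""
    losses = []
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # ignored samples contribute nothing to the loss
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(log_sum_exp - row[label])
    # This sketch returns NaN when every sample is ignored.
    return sum(losses) / len(losses) if losses else float("nan")


IGNORE = 3  # stand-in for ignore_index=5554
logits = [[2.0, 0.5, 0.1], [0.3, 1.2, 0.4], [1.0, 1.0, 1.0]]
labels = [0, 1, IGNORE]  # the third sample is "unlabelled"

with_ignored = cross_entropy(logits, labels, IGNORE)
labelled_only = cross_entropy(logits[:2], labels[:2], IGNORE)
print(with_ignored, labelled_only)
```

Both printed values are identical, since the third sample is dropped before averaging.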

opened by HenryZhangJianhe 3
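Engineering question

Hello author, I have recently been working on a pedestrian detection + person recognition project on an Nvidia NX. I use a yolov5 + reid pipeline, but the tracking speed does not meet my needs, so I would like to replace it with this algorithm. I have three questions. 1: If I shrink this network to bring the speed up to real time, will accuracy suffer significantly? 2: Can this algorithm be applied to a scenario where three pedestrians are identified at the same time? 3: Can I take it that person search can fully replace, and is better than, the detection + reid approach? 4: I hope you can offer some ideas and suggestions. Finally, I wish you all the best.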

opened by Jiangsiping 2
Code for the "NAE+ SeqNet" version for comparison

I have tried your code; it is really concise and easy to understand. To study it more comprehensively, could you please also release the NAE+ SeqNet version for comparison?

Many thanks!

opened by Smalltarget108 2

For demo.py : "RuntimeError: CUDA out of memory."

Hello, I used the trained model to run demo.py. It processes one gallery image and then throws this error.

Processing demo_imgs/gallery-2.jpg
Traceback (most recent call last):
  File "demo.py", line 89, in <module>
    main(args)
  File "demo.py", line 69, in main
    gallery_output = model(gallery_img)[0]
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/SeqNet/models/seqnet.py", line 132, in forward
    return self.inference(images, targets, query_img_as_gallery)
  File "/home/ubuntu/workspace/SeqNet/models/seqnet.py", line 107, in inference
    features = self.backbone(images.tensors)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/workspace/SeqNet/models/resnet.py", line 27, in forward
    feat = super(Backbone, self).forward(x)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torchvision/models/resnet.py", line 113, in forward
    out = self.bn3(out)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/batchnorm.py", line 131, in forward
    return F.batch_norm(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2014, in batch_norm
    return torch.batch_norm(
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.80 GiB total capacity; 6.81 GiB already allocated; 1.38 MiB free; 6.89 GiB reserved in total by PyTorch)

Can you give some suggestions to avoid "OOM"?

opened by developer-young 2

question about freezing layers in ResNet backbone
Hi. Thank you for great research and an awesome implementation.

It's not really an issue, but I have a minor question about your model implementation in models/resnet.py. Here, you freeze conv1/batchnorm layers within ResNet, as follows:

resnet.conv1.weight.requires_grad_(False)
resnet.bn1.weight.requires_grad_(False)
resnet.bn1.bias.requires_grad_(False)

Is there a particular reason behind this choice?
opened by sanghoooon 2
Hi, have you tried training SeqNet on both CUHK-SYSU and PRW?

Thank you for your strong codebase. I recently tried to train SeqNet on both CUHK-SYSU and PRW, but the results got even worse. I'm wondering whether the author has tried such a setting?

opened by caposerenity 1
Hi, your models on Google Drive cannot be found

https://drive.google.com/file/d/1wKhCHy7uTHx8zxNS62Y1236GNv5TzFzq/view?usp=sharing
https://drive.google.com/file/d/1I9OI6-sfVyop_aLDIWaYwd7Z4hD34hwZ/view?usp=sharing
These two files cannot be downloaded because they are in the Google Drive trash.

opened by mikeswf 1
how to get the reported mAP?

Hi, I trained the model with bs 1 and lr 0.0006, but only got mAP = 85.79%, so I wonder how to approach the performance you mentioned in your paper? Do I need to fix any other configs?

opened by Wuzimeng 1

Owner

Zj Li

ECNU.bachelor.CS | TONGJI.master.CV

GitHub

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"

SEE: Towards Semi-Supervised End-to-End Scene Text Recognition Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text

572 Jan 5, 2023

The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes"

Mask TextSpotter A Pytorch implementation of Mask TextSpotter along with its extension can be find here Introduction This is the official implementati

261 Nov 21, 2022

Code related to "Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity" paper

DataTuner You have just found the DataTuner. This repository provides tools for fine-tuning language models for a task. See LICENSE.txt for license de

81 Jan 1, 2023

An official PyTorch implementation of the paper "Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences", ICCV 2021.

PyTorch implementation of Learning by Aligning (ICCV 2021) This is an official PyTorch implementation of the paper "Learning by Aligning: Visible-Infr

30 Nov 5, 2022

Demo for the paper "Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation"

Streaming speaker diarization Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation by Juan Manuel Coria, Hervé

185 Jan 1, 2023

This repository lets you train neural networks models for performing end-to-end full-page handwriting recognition using the Apache MXNet deep learning frameworks on the IAM Dataset.

Handwritten Text Recognition (OCR) with MXNet Gluon These notebooks have been created by Jonathan Chung, as part of his internship as Applied Scientis

422 Jan 3, 2023

Unofficial implementation of "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images"

TableNet Unofficial implementation of ICDAR 2019 paper : TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from

243 Dec 30, 2022

End-to-end pipeline for real-time scene text detection and recognition.

Real-time-Scene-Text-Detection-and-Recognition-System End-to-end pipeline for real-time scene text detection and recognition. The detection model use

89 Aug 4, 2022

textspotter - An End-to-End TextSpotter with Explicit Alignment and Attention

An End-to-End TextSpotter with Explicit Alignment and Attention This is initially described in our CVPR 2018 paper. Getting Started Installation Clone

323 Nov 10, 2022

CTPN + DenseNet + CTC based end-to-end Chinese OCR implemented using tensorflow and keras

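Introduction: end-to-end detection and recognition of variable-length Chinese text, implemented with Tensorflow and Keras. Text detection: CTPN. Text recognition: DenseNet + CTC. Environment setup: sh setup.sh. Note: in a CPU environment, comment out the "for gpu" part and uncomment the "for cpu" part before running. Demo: put test images into test_images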

2.6k Dec 29, 2022

A Joint Video and Image Encoder for End-to-End Retrieval

Frozen in Time ❄️⏳ A Joint Video and Image Encoder for End-to-End Retrieval (arXiv) Repository to contain the code, models, data for end-to-end

225 Dec 25, 2022

code for our ICCV 2021 paper "DeepCAD: A Deep Generative Network for Computer-Aided Design Models"

DeepCAD This repository provides source code for our paper: DeepCAD: A Deep Generative Network for Computer-Aided Design Models Rundi Wu, Chang Xiao,

85 Dec 31, 2022

A facial recognition program that plays a alarm (mp3 file) when a person i seen in the room. A basic theif using Python and OpenCV

Home-Security-Demo A facial recognition program that plays a alarm (mp3 file) when a person is seen in the room. A basic theif using Python and OpenCV

4 Nov 2, 2021

A simple python program to record security cam footage by detecting a face and body of a person in the frame.

SecurityCam A simple python program to record security cam footage by detecting a face and body of a person in the frame. This code was created by me,

1 Nov 8, 2021

Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

496 Jan 5, 2023

Code for paper "Role-based network embedding via structural features reconstruction with degree-regularized constraint"

Role-based network embedding via structural features reconstruction with degree-regularized constraint Train python main.py --dataset brazil-flights

1 Jun 28, 2022

SceneCollisionNet This repo contains the code for "Object Rearrangement Using Learned Implicit Collision Functions", an ICRA 2021 paper. For more info

31 Nov 22, 2022

Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd.

Head Detector Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd. The head_detection mod

76 Dec 6, 2022

Dataset and Code for ICCV 2021 paper "Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme"

Dataset and Code for RealVSR Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme Xi Yang, Wangmeng Xiang,

91 Nov 22, 2022
