TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios

cv516Buaa

Last update: Dec 22, 2022


Overview

TPH-YOLOv5

This repo is the implementation of "TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios".
In the VisDrone Challenge 2021, TPH-YOLOv5 won 4th place, with results closely matching those of the 1st-place model.
See "VisDrone-DET2021: The Vision Meets Drone Object Detection Challenge Results" for more information.

Install

$ git clone https://github.com/cv516Buaa/tph-yolov5
$ cd tph-yolov5
$ pip install -r requirements.txt

Convert labels

VisDrone2YOLO_lable.py converts VisDrone annotations to YOLO labels.
Set the path of the VisDrone dataset in VisDrone2YOLO_lable.py first.

$ python VisDrone2YOLO_lable.py
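For reference, here is a minimal sketch of what the conversion involves. It assumes the standard VisDrone-DET annotation layout (`bbox_left,bbox_top,bbox_width,bbox_height,score,object_category,truncation,occlusion`) and the common mapping of VisDrone categories 1 to 10 onto YOLO classes 0 to 9. VisDrone2YOLO_lable.py itself may differ, for example in how it treats ignored regions. The function name `visdrone_to_yolo` is hypothetical.

```python
from typing import Optional, Tuple


def visdrone_to_yolo(line: str, img_w: int, img_h: int) -> Optional[Tuple[int, float, float, float, float]]:
    """Convert one VisDrone-DET annotation line to (class, x_center, y_center, width, height).

    The returned coordinates are normalized to [0, 1] by the image size, as YOLO labels expect.
    """
    fields = line.strip().rstrip(",").split(",")  # some annotation lines end with a trailing comma
    left, top, w, h = (int(v) for v in fields[:4])
    score, category = int(fields[4]), int(fields[5])
    # Assumption: drop ignored regions (score 0 or category 0) and "others" (category 11).
    if score == 0 or not 1 <= category <= 10:
        return None
    cls = category - 1  # VisDrone categories 1..10 -> YOLO classes 0..9
    return (cls, (left + w / 2) / img_w, (top + h / 2) / img_h, w / img_w, h / img_h)
```

Each non-`None` result would be written as one line, `cls xc yc w h`, in a `.txt` file named after its image.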

Inference

Dataset: VisDrone
Weights:
- yolov5l-xs-1.pt: Baidu Drive (pw: vibe) | Google Drive
- yolov5l-xs-2.pt: Baidu Drive (pw: vffz) | Google Drive

val.py runs inference on VisDrone2019-DET-val using weights trained with TPH-YOLOv5.
(We provide two sets of weights, trained with two different models based on YOLOv5l.)

$ python val.py --weights ./weights/yolov5l-xs-1.pt --img 1996 --data ./data/VisDrone.yaml --augment --save-txt --save-conf --task val --batch-size 8 --verbose --name v5l-xs

To evaluate the second model, replace yolov5l-xs-1.pt with yolov5l-xs-2.pt.
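With --save-txt --save-conf, YOLOv5's val.py writes one text file per image, under runs/val/<name>/labels/ by default. Each line has the form `class x_center y_center width height confidence`, normalized to [0, 1]. This sketch assumes TPH-YOLOv5 keeps that format. It parses such lines into corner-format boxes, for example as input to box fusion. The name `read_yolo_predictions` is hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized


def read_yolo_predictions(text: str) -> List[Tuple[int, Box, float]]:
    """Parse 'cls xc yc w h conf' lines into (cls, (x1, y1, x2, y2), conf) tuples."""
    preds = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank lines and lines saved without --save-conf
        cls = int(float(parts[0]))  # the class id is written with %g, e.g. "3"
        xc, yc, w, h, conf = map(float, parts[1:])
        # Convert center/size to corner coordinates.
        box = (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)
        preds.append((cls, box, conf))
    return preds
```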

Ensemble

If you run inference on a dataset with several models, you can ensemble their results by weighted boxes fusion (WBF) using wbf.py.
Set the image path and the txt path in wbf.py first.

$ python wbf.py
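For orientation, here is a simplified sketch of weighted boxes fusion. It is not the code in wbf.py, which this README does not show. Predictions from all models are pooled and visited in descending score order. Each prediction joins the same-class cluster whose fused box overlaps it most (IoU above a threshold). The cluster's box is the score-weighted mean of its members. The fused score is the mean member score, scaled down when a cluster has fewer boxes than there are models. Boxes are assumed to be normalized (x1, y1, x2, y2). The function name and the default `iou_thr=0.55` are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed normalized to [0, 1]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_boxes_fusion(
    boxes: Sequence[Sequence[Box]],
    scores: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    iou_thr: float = 0.55,
) -> List[Tuple[Box, float, int]]:
    """Fuse per-model predictions for one image; returns (box, score, label) triples."""
    n_models = len(boxes)
    # Pool every prediction from every model and visit them from most to least confident.
    preds = [
        (s, lab, b)
        for model_boxes, model_scores, model_labels in zip(boxes, scores, labels)
        for b, s, lab in zip(model_boxes, model_scores, model_labels)
    ]
    preds.sort(key=lambda p: p[0], reverse=True)

    clusters = []  # each cluster: {"label": int, "members": [(score, box)], "fused": Box}
    for score, label, box in preds:
        # Find the same-class fused box that overlaps this prediction the most.
        best, best_iou = None, iou_thr
        for c in clusters:
            if c["label"] == label:
                overlap = iou(c["fused"], box)
                if overlap > best_iou:
                    best, best_iou = c, overlap
        if best is None:
            clusters.append({"label": label, "members": [(score, box)], "fused": box})
            continue
        best["members"].append((score, box))
        total = sum(s for s, _ in best["members"])
        # Fused coordinates are the confidence-weighted mean of the member boxes.
        best["fused"] = tuple(
            sum(s * b[k] for s, b in best["members"]) / total for k in range(4)
        )

    fused = []
    for c in clusters:
        member_scores = [s for s, _ in c["members"]]
        conf = sum(member_scores) / len(member_scores)
        # Scale down clusters that contain fewer boxes than there are models.
        conf *= min(len(member_scores), n_models) / n_models
        fused.append((c["fused"], conf, c["label"]))
    return fused
```

Averaging coordinates, instead of keeping only the top box as NMS does, lets boxes that several models agree on refine each other.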

Train

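train.py lets you train a new model. The command below builds the architecture in models/yolov5l-xs-tr-cbam-spp-bifpn.yaml and initializes the matching layers from the yolov5l.pt checkpoint.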

$ python train.py --img 1536 --batch 2 --epochs 80 --data ./data/VisDrone.yaml --weights yolov5l.pt --hy data/hyps/hyp.VisDrone.yaml --cfg models/yolov5l-xs-tr-cbam-spp-bifpn.yaml --name v5l-xs

Description of TPH-yolov5 and citation

If you have any questions, please email me at [email protected]
If you find this code useful, please cite:

@inproceedings{zhu2021tph,
  title={TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios},
  author={Zhu, Xingkui and Lyu, Shuchang and Wang, Xu and Zhao, Qi},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={2778--2788},
  year={2021}
}

References

Thanks to their great works

Comments

Converted model gives error on TensorRT
I want to run inference with the yolov5l-xs-1.pt model and optimize it using TensorRT. I understand you are not using TensorRT, but I thought you might understand the issue.

I exported the .pt file to an ONNX file using export.py (without the --dynamic flag). It gave this warning, but I don't understand what it means:

WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.

Later, when loading the ONNX file with TensorRT, I get this error:

[TensorRT] ERROR: [graphShapeAnalyzer.cpp::throwIfError::1306] Error Code 9: Internal Error (Reshape_218: reshape changes volume)

Apparently there is a node which reshapes to a different volume, which is not allowed according to TensorRT. Do you know what I can do about this issue? Please let me know if you could use more information!

EDIT: I have also tried it with the --dynamic flag. It looks like it is going through more of the ONNX model when loading, but eventually it gives this error:

[TRT] 4: [network.cpp::validate::2713] Error Code 4: Internal Error (images: dynamic input is missing dimensions in profile 0.)

EDIT: This is what the node (from the non-dynamic model) looks like:
opened by maarten0912 6
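TensorRT's message means that, for the shapes it resolved, the element count of Reshape_218's input differs from that of its target shape. This can happen when a shape computed at export time is stored as a constant. The following pure-Python sketch applies the ONNX Reshape shape rules: a 0 copies the corresponding input dimension (the default `allowzero=0` behaviour), and a single -1 is inferred. It shows how such a mismatch is detected. It is not TensorRT's code, and the function name is hypothetical.

```python
from math import prod
from typing import Sequence, Tuple


def onnx_reshape_shape(in_shape: Sequence[int], target: Sequence[int]) -> Tuple[int, ...]:
    """Resolve an ONNX Reshape target shape, raising if the element count would change."""
    volume = prod(in_shape)
    out, infer_at = [], None
    for i, d in enumerate(target):
        if d == 0:  # copy the input dimension at the same index
            if i >= len(in_shape):
                raise ValueError("0 refers to a missing input dimension")
            out.append(in_shape[i])
        elif d == -1:  # inferred below; at most one is allowed
            if infer_at is not None:
                raise ValueError("at most one -1 is allowed")
            infer_at = i
            out.append(1)  # placeholder so prod() ignores it
        elif d > 0:
            out.append(d)
        else:
            raise ValueError(f"invalid target dimension {d}")
    known = prod(out)
    if infer_at is not None:
        if known == 0 or volume % known != 0:
            raise ValueError("reshape changes volume")
        out[infer_at] = volume // known
    elif known != volume:
        raise ValueError("reshape changes volume")
    return tuple(out)
```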
ret = input.softmax(dim) RuntimeError: CUDA out of memory

Hey,

When I try to run the inference command, I get the following error:
ret = input.softmax(dim)
RuntimeError: CUDA out of memory. Tried to allocate 962.00 MiB (GPU 0; 3.81 GiB total capacity; 1.79 GiB already allocated; 725.00 MiB free; 1.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I tried reducing the batch size to 4 and even to 2, but that doesn't solve the problem.

What can I do to solve this?

opened by RoyCopter 5
Replacing SPPF with ASPP raises an error

    self.m = nn.ModuleList([nn.Conv2d(c_, c_, kernel_size=3, stride=1, padding= x //2, dilation= x //2, bias=False) for x in k])
TypeError: 'int' object is not iterable
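The TypeError means `k` reached this list comprehension as a single int rather than a sequence of kernel sizes. That would happen if SPPF's arguments (YOLOv5's SPPF takes one kernel size, e.g. `[1024, 5]` in the model YAML) were reused unchanged for an ASPP layer. The likely fix is to pass a list such as `[5, 9, 13]` in the YAML. The sketch below assumes the branch rule visible in the traceback (3x3 convolutions with `padding = dilation = x // 2`). It normalizes `k` and lists the per-branch parameters. The helper name is hypothetical.

```python
from typing import Iterable, List, Tuple, Union


def aspp_branch_params(k: Union[int, Iterable[int]]) -> List[Tuple[int, int]]:
    """Return (padding, dilation) for each 3x3 ASPP branch, mirroring padding=x//2, dilation=x//2."""
    ks = (k,) if isinstance(k, int) else tuple(k)  # accept a bare int as well as a list
    # For a 3x3 kernel, padding equal to dilation keeps the spatial size unchanged.
    return [(x // 2, x // 2) for x in ks]
```

With a bare int you only get one branch, so passing the full list from the YAML is still preferable.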

opened by ChinaRush 3
Swin module error when running training

Could the author explain why this happens? Without modifying any code, I ran the training command: python train.py --img 1536 --adam --batch 4 --epochs 80 --data ./data/VisDrone.yaml --weights yolov5l.pt --hy data/hyps/hyp.VisDrone.yaml --cfg models/yolov5l-xs-tph.yaml --name v5l-xs-tph
Error message:

Traceback (most recent call last):
  File "train.py", line 630, in <module>
    main(opt)
  File "train.py", line 527, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 119, in train
    model = Model(cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device) # create
  File "/root/autodl-tmp/tph-yolov5-main/models/yolo.py", line 104, in __init__
    self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch]) # model, savelist
  File "/root/autodl-tmp/tph-yolov5-main/models/yolo.py", line 291, in parse_model
    m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args) # module
  File "/root/autodl-tmp/tph-yolov5-main/models/common.py", line 493, in __init__
    self.m = SwinTransformerBlock(c_, c_, c_//32, n)
  File "/root/autodl-tmp/tph-yolov5-main/models/common.py", line 426, in __init__
    self.tr = nn.Sequential(*(SwinTransformerLayer(c2, num_heads=num_heads, window_size=window_size, shift_size=0 if (i % 2 == 0) else self.shift_size ) for i in range(num_layers)))
  File "/root/autodl-tmp/tph-yolov5-main/models/common.py", line 426, in <genexpr>
    self.tr = nn.Sequential(*(SwinTransformerLayer(c2, num_heads=num_heads, window_size=window_size, shift_size=0 if (i % 2 == 0) else self.shift_size ) for i in range(num_layers)))
  File "/root/autodl-tmp/tph-yolov5-main/models/common.py", line 338, in __init__
    self.attn = WindowAttention(
  File "/root/autodl-tmp/tph-yolov5-main/models/common.py", line 259, in __init__
    coords = torch.stack(torch.meshgrid([coords_h, coords_w], indexing="ij")) # [2, Mh, Mw]
TypeError: meshgrid() got an unexpected keyword argument 'indexing'
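The `indexing` keyword of `torch.meshgrid` was only added in PyTorch 1.10, so this error points to an older PyTorch install. Upgrading PyTorch is the simplest fix. If that is not an option, the compatibility sketch below could replace the call in common.py. The helper names `parse_major_minor` and `meshgrid_ij` are hypothetical and not part of the repository.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.9.0+cu111'."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def meshgrid_ij(*tensors):
    """torch.meshgrid with 'ij' indexing on both old and new PyTorch versions."""
    import torch  # imported lazily so parse_major_minor can be used without PyTorch

    if parse_major_minor(torch.__version__) >= (1, 10):
        return torch.meshgrid(*tensors, indexing="ij")
    # Before 1.10 there is no `indexing` argument and the behaviour is already "ij".
    return torch.meshgrid(*tensors)
```

Line 259 of common.py would then read `coords = torch.stack(meshgrid_ij(coords_h, coords_w))  # [2, Mh, Mw]`.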

opened by iscyy 3
TPH

I want to know where the TPH module is. I would also like to know whether the network structure described in "yolov5l-xs-tr-cbam-spp-bifpn.yaml" is consistent with the one described in the paper.

opened by sandy-0326 2
Unable to val

Hello, the following error occurred during inference. Please advise, thank you.
AttributeError: Can't get attribute 'NonDynamicallyQuantizableLinear' on <module 'torch.nn.modules.linear' from '/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py'>

opened by iscyy 2
No Loggers in utils.loggers

Traceback (most recent call last):
  File "train.py", line 48, in <module>
    from utils.loggers import Loggers
ImportError: cannot import name 'Loggers' from 'utils.loggers' (unknown location)

opened by sandy-0326 2
help

  File "/home/test/anaconda3/envs/whn_PT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/test/anaconda3/envs/whn_PT/lib/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/test/anaconda3/envs/whn_PT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/4T/whn/thesis/7-add swintransformer/models/common.py", line 318, in forward
    attn_windows = self.attn(x_windows, mask=attn_mask) # [nW*B, Mh*Mw, C]
  File "/home/test/anaconda3/envs/whn_PT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/4T/whn/thesis/7-add swintransformer/models/common.py", line 239, in forward
    x = (attn @ v).transpose(1, 2).reshape(B, N, C)
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2' in

opened by yxwhn 1
Training issue: 0 mAP

Thanks for your work. I used this repository to train on the VisDrone dataset with just 10 images as a quick test, but training on VisDrone does not seem to work: the mAP is 0. The same code works normally on the coco128 dataset. This is very strange, and I would appreciate your help.

opened by Shaosifan 1
PyTorch 1.11.0 compatibility updates (see yolov5)

Hi, we ran into an AttributeError running inference with detect.py due to a change of the PyTorch API in version 1.11 (no more "recompute_scale_factor"). So we added this fix according to the official yolov5 "PyTorch 1.11.0 compatibility update".

opened by martinbaerwolff 1
RuntimeError: result type Float can't be cast to the desired output type long int ?

  File "/content/drive/MyDrive/Yolo v5/utils/loss.py", line 240, in build_targets
    indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1))) # image, anchor, grid indices
RuntimeError: result type Float can't be cast to the desired output type long int

opened by TerrafYassin 0
WBF, mAP

Dear author, I'm confused about how to use WBF. Could you please give me some guidance?

Q: After I get the wbf_labels by ensembling the val results of two models, how do I calculate the new mAP from these wbf_labels? Is there a command or code for this calculation?
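The README gives no command for this. One option, sketched here under assumptions and not taken from this repository, is to score fused boxes directly against the ground-truth labels. For each class, sort the predictions by confidence. Greedily match each prediction to the highest-IoU unmatched ground-truth box in the same image. Then integrate the precision/recall curve. This uses all-point interpolation at a single IoU threshold. YOLOv5's val.py reports mAP@0.5 and mAP@0.5:0.95 with its own interpolation, so the values will not match it exactly. `average_precision` is a hypothetical helper; mAP would be its mean over classes, with predictions and ground truth in the same box convention.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    preds: List[Tuple[str, float, Box]],
    gts: Dict[str, List[Box]],
    iou_thr: float = 0.5,
) -> float:
    """AP for one class: preds are (image_id, score, box), gts map image_id -> boxes."""
    n_gt = sum(len(v) for v in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(v) for img, v in gts.items()}
    hits = []  # 1 for a true positive, 0 for a false positive, in descending-score order
    for img, _, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, g in enumerate(gts.get(img, [])):
            overlap = iou(box, g)
            if not matched[img][j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[img][best_j] = True
        hits.append(1 if best_j >= 0 else 0)

    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # All-point interpolation: at each recall step use the best precision reached at
    # that recall or higher, and sum precision times the recall increment.
    ap, prev_recall = 0.0, 0.0
    for i, r in enumerate(recalls):
        ap += (r - prev_recall) * max(precisions[i:])
        prev_recall = r
    return ap
```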

opened by Agustinwgq 0
TypeError: meshgrid() got an unexpected keyword argument 'indexing'

python train.py --img 1536 --adam --batch 4 --epochs 80 --data ./data/VisDrone.yaml --weights yolov5l.pt --hy data/hyps/hyp.VisDrone.yaml --cfg models/yolov5l-xs-tph.yaml --name v5l-xs-tph

I get this error:

  File "/data/zhangshilin/wangjun/529_zsl_6.0/tph-yolov5/tph-yolov5/models/common.py", line 493, in __init__
    self.m = SwinTransformerBlock(c_, c_, c_//32, n)
  File "/data/zhangshilin/wangjun/529_zsl_6.0/tph-yolov5/tph-yolov5/models/common.py", line 426, in __init__
    self.tr = nn.Sequential(*(SwinTransformerLayer(c2, num_heads=num_heads, window_size=window_size, shift_size=0 if (i % 2 == 0) else self.shift_size ) for i in range(num_layers)))
  File "/data/zhangshilin/wangjun/529_zsl_6.0/tph-yolov5/tph-yolov5/models/common.py", line 426, in <genexpr>
    self.tr = nn.Sequential(*(SwinTransformerLayer(c2, num_heads=num_heads, window_size=window_size, shift_size=0 if (i % 2 == 0) else self.shift_size ) for i in range(num_layers)))
  File "/data/zhangshilin/wangjun/529_zsl_6.0/tph-yolov5/tph-yolov5/models/common.py", line 340, in __init__
    attn_drop=attn_drop, proj_drop=drop)
  File "/data/zhangshilin/wangjun/529_zsl_6.0/tph-yolov5/tph-yolov5/models/common.py", line 259, in __init__
    coords = torch.stack(torch.meshgrid([coords_h, coords_w], indexing="ij")) # [2, Mh, Mw]
TypeError: meshgrid() got an unexpected keyword argument 'indexing'

opened by xiaomujiang 1

Owner

cv516Buaa

Pattern Recognition and Artificial Intelligence Group: Prof. Qi Zhao & Lijiang Chen, Dr. Shuchang Lyu & Binghao Liu & Xingkui Zhu
