TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios

Overview

TPH-YOLOv5

This repo is the implementation of "TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios".
In the VisDrone Challenge 2021, TPH-YOLOv5 won 4th place and achieved results close to those of the 1st-place model.
See "VisDrone-DET2021: The Vision Meets Drone Object Detection Challenge Results" for more information.

Install

$ git clone https://github.com/cv516Buaa/tph-yolov5
$ cd tph-yolov5
$ pip install -r requirements.txt

Convert labels

VisDrone2YOLO_lable.py converts VisDrone annotations to YOLO labels.
Set the path of the VisDrone dataset in VisDrone2YOLO_lable.py first.

$ python VisDrone2YOLO_lable.py
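
To make the conversion concrete, here is a minimal sketch of what such a step typically does for a single annotation row. It is not the code in VisDrone2YOLO_lable.py. It assumes the usual VisDrone-DET row layout (bbox_left, bbox_top, bbox_width, bbox_height, score, object_category, truncation, occlusion), assumes rows with score 0 are dropped, and assumes categories 1..10 map to YOLO class ids 0..9. The function name is hypothetical.

```python
def visdrone_line_to_yolo(line, img_w, img_h):
    """Convert one VisDrone-DET annotation row to a YOLO label row.

    Returns None when the row is skipped.
    """
    # Some annotation files end each row with a trailing comma.
    fields = line.strip().rstrip(",").split(",")
    left, top, width, height = (float(v) for v in fields[:4])
    score, category = int(fields[4]), int(fields[5])

    # Assumption: score 0 marks boxes that are ignored in evaluation.
    if score == 0:
        return None
    # Assumption: only categories 1..10 are real object classes.
    if not 1 <= category <= 10:
        return None

    # YOLO labels store the box centre and size, normalized by image size.
    x_center = (left + width / 2) / img_w
    y_center = (top + height / 2) / img_h
    return (f"{category - 1} {x_center:.6f} {y_center:.6f} "
            f"{width / img_w:.6f} {height / img_h:.6f}")


print(visdrone_line_to_yolo("100,200,50,40,1,4,0,0", 1000, 800))
```

The image width and height have to be read from the image file itself, since the annotation rows store pixel coordinates only.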

Inference

val.py runs inference on VisDrone2019-DET-val, using weights trained with TPH-YOLOv5.
(We provide two weights trained by two different models based on YOLOv5l.)

$ python val.py --weights ./weights/yolov5l-xs-1.pt --img 1996 --data ./data/VisDrone.yaml --augment --save-txt --save-conf --task val --batch-size 8 --verbose --name v5l-xs

To evaluate the second set of weights, pass ./weights/yolov5l-xs-2.pt to --weights instead.
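
With --save-txt and --save-conf, predictions are written to one text file per image. The sketch below parses one such row, assuming the upstream YOLOv5 layout "class x_center y_center width height confidence" with normalized coordinates. That layout is an assumption about this fork, and the function name is hypothetical.

```python
def parse_prediction_line(line):
    """Parse one saved prediction row into (class_id, confidence, box).

    The box is returned as normalized corners (x1, y1, x2, y2),
    clamped to the [0, 1] range.
    """
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h, conf = (float(v) for v in parts[1:6])

    def clamp(v):
        return min(1.0, max(0.0, v))

    box = (clamp(xc - w / 2), clamp(yc - h / 2),
           clamp(xc + w / 2), clamp(yc + h / 2))
    return class_id, conf, box


print(parse_prediction_line("3 0.5 0.5 0.2 0.4 0.9"))
```

Corner-format boxes are the form most box-fusion and IoU routines expect, which is why the conversion is done at parse time.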


Ensemble

If you run inference on the dataset with several models, you can ensemble the results with weighted boxes fusion using wbf.py.
Set the image path and the txt path in wbf.py first.

$ python wbf.py
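
For readers unfamiliar with weighted boxes fusion, the following is a simplified sketch of the idea, not the code in wbf.py. Boxes of the same class that overlap strongly are merged into one box whose coordinates are the confidence-weighted average, and the fused confidence is scaled down when fewer models agree. Matching to the first overlapping cluster (rather than the best one) and the 0.55 IoU threshold are simplifying assumptions; all names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def fuse_boxes(detections, num_models, iou_thr=0.55):
    """Fuse (label, score, box) detections pooled from several models."""
    clusters = []
    # Visit detections from most to least confident.
    for label, score, box in sorted(detections, key=lambda d: d[1],
                                    reverse=True):
        match = None
        for cluster in clusters:
            if cluster["label"] == label and iou(cluster["box"], box) > iou_thr:
                match = cluster
                break
        if match is None:
            match = {"label": label, "members": [], "box": box}
            clusters.append(match)
        match["members"].append((score, box))
        # Recompute the fused box as a confidence-weighted average.
        total = sum(s for s, _ in match["members"])
        match["box"] = tuple(
            sum(s * b[i] for s, b in match["members"]) / total
            for i in range(4)
        )

    fused = []
    for cluster in clusters:
        n = len(cluster["members"])
        mean_score = sum(s for s, _ in cluster["members"]) / n
        # Penalize boxes that only some of the models predicted.
        conf = mean_score * min(n, num_models) / num_models
        fused.append((cluster["label"], conf, cluster["box"]))
    return fused


pooled = [
    (0, 0.8, (0.10, 0.10, 0.50, 0.50)),  # model A
    (0, 0.4, (0.12, 0.10, 0.52, 0.50)),  # model B, same object
    (1, 0.6, (0.60, 0.60, 0.90, 0.90)),  # model A only
]
for label, conf, box in fuse_boxes(pooled, num_models=2):
    print(label, round(conf, 3), [round(v, 3) for v in box])
```

Unlike non-maximum suppression, which discards all but the top box in each group, this approach uses every overlapping prediction to refine the final coordinates.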

Train

train.py allows you to train a new model from scratch.

$ python train.py --img 1536 --batch 2 --epochs 80 --data ./data/VisDrone.yaml --weights yolov5l.pt --hy data/hyps/hyp.VisDrone.yaml --cfg models/yolov5l-xs-tr-cbam-spp-bifpn.yaml --name v5l-xs


Description of TPH-yolov5 and citation

If you have any questions, please email me at [email protected]
If you find this code useful, please cite:

@inproceedings{zhu2021tph,
  title={TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios},
  author={Zhu, Xingkui and Lyu, Shuchang and Wang, Xu and Zhao, Qi},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={2778--2788},
  year={2021}
}

References

Thanks to their great works

Comments
  • Converted model gives error on TensorRT

    I want to use the yolov5l-xs-1.pt model to perform inference and optimize it using TensorRT. I understand you are not using TensorRT, but I thought you might understand the issue

    I have exported the .pt file to an onnx file using the export.py program (without --dynamic flag). It gave this warning, but I don't understand what it means:

    WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
    

    Later, when loading the ONNX file with TensorRT, I get this error:

    [TensorRT] ERROR: [graphShapeAnalyzer.cpp::throwIfError::1306] Error Code 9: Internal Error (Reshape_218: reshape changes volume)
    

    Apparently there is a node which reshapes to a different volume, which is not allowed according to TensorRT. Do you know what I can do about this issue? Please let me know if you could use more information!

    EDIT: I have also tried it with the --dynamic flag. It looks like it is going through more of the ONNX model when loading, but eventually it gives this error:

    [TRT]    4: [network.cpp::validate::2713] Error Code 4: Internal Error (images: dynamic input is missing dimensions in profile 0.)
    

    EDIT: This is what the node (from the non-dynamic model) looks like: image

    opened by maarten0912 6
  •  ret = input.softmax(dim) RuntimeError: CUDA out of memory

    Hey,

    When I try to run the inference command, I get the following error:

    ret = input.softmax(dim) RuntimeError: CUDA out of memory. Tried to allocate 962.00 MiB (GPU 0; 3.81 GiB total capacity; 1.79 GiB already allocated; 725.00 MiB free; 1.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

    I tried reducing the batch size to 4 and even to 2, but that doesn't solve the problem.

    What can I do to solve this?

    opened by RoyCopter 5
  • Replacing SPPF with ASPP raises an error

    self.m = nn.ModuleList([nn.Conv2d(c_, c_, kernel_size=3, stride=1, padding= x //2, dilation= x //2, bias=False) for x in k]) TypeError: 'int' object is not iterable

    opened by ChinaRush 3
  • Training run swin module error

    May I ask the author why this happens? Without modifying any code, I ran the training command:

    python train.py --img 1536 --adam --batch 4 --epochs 80 --data ./data/VisDrone.yaml --weights yolov5l.pt --hy data/hyps/hyp.VisDrone.yaml --cfg models/yolov5l-xs-tph.yaml --name v5l-xs-tph

    Error message:

    Traceback (most recent call last):
      File "train.py", line 630, in <module>
        main(opt)
      File "train.py", line 527, in main
        train(opt.hyp, opt, device, callbacks)
      File "train.py", line 119, in train
        model = Model(cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device) # create
      File "/root/autodl-tmp/tph-yolov5-main/models/yolo.py", line 104, in __init__
        self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch]) # model, savelist
      File "/root/autodl-tmp/tph-yolov5-main/models/yolo.py", line 291, in parse_model
        m_ = nn.Sequential(*(m(args) for _ in range(n))) if n > 1 else m(args) # module
      File "/root/autodl-tmp/tph-yolov5-main/models/common.py", line 493, in __init__
        self.m = SwinTransformerBlock(c_, c_, c_//32, n)
      File "/root/autodl-tmp/tph-yolov5-main/models/common.py", line 426, in __init__
        self.tr = nn.Sequential((SwinTransformerLayer(c2, num_heads=num_heads, window_size=window_size, shift_size=0 if (i % 2 == 0) else self.shift_size ) for i in range(num_layers)))
      File "/root/autodl-tmp/tph-yolov5-main/models/common.py", line 426, in <genexpr>
        self.tr = nn.Sequential((SwinTransformerLayer(c2, num_heads=num_heads, window_size=window_size, shift_size=0 if (i % 2 == 0) else self.shift_size ) for i in range(num_layers)))
      File "/root/autodl-tmp/tph-yolov5-main/models/common.py", line 338, in __init__
        self.attn = WindowAttention(
      File "/root/autodl-tmp/tph-yolov5-main/models/common.py", line 259, in __init__
        coords = torch.stack(torch.meshgrid([coords_h, coords_w], indexing="ij")) # [2, Mh, Mw]
    TypeError: meshgrid() got an unexpected keyword argument 'indexing'

    opened by iscyy 3
  • TPH

    Where is the TPH module? Also, is the network structure described in the file "yolov5l-xs-tr-cbam-spp-bifpn.yaml" consistent with the network structure described in the paper?

    opened by sandy-0326 2
  • Unable to val

    Hello author, the following error occurred during inference. Please advise, thank you.

    AttributeError: Can't get attribute 'NonDynamicallyQuantizableLinear' on <module 'torch.nn.modules.linear' from '/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py'>

    opened by iscyy 2
  • No Loggers in utils.loggers

    Traceback (most recent call last):
      File "train.py", line 48, in <module>
        from utils.loggers import Loggers
    ImportError: cannot import name 'Loggers' from 'utils.loggers' (unknown location)

    opened by sandy-0326 2
  • help

      File "/home/test/anaconda3/envs/whn_PT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/test/anaconda3/envs/whn_PT/lib/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
        input = module(input)
      File "/home/test/anaconda3/envs/whn_PT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/mnt/4T/whn/thesis/7-add swintransformer/models/common.py", line 318, in forward
        attn_windows = self.attn(x_windows, mask=attn_mask) # [nWB, MhMw, C]
      File "/home/test/anaconda3/envs/whn_PT/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/mnt/4T/whn/thesis/7-add swintransformer/models/common.py", line 239, in forward
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
    RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2' in

    opened by yxwhn 1
  • The training issue about 0 MAP

    Thanks for your work. I used this repository to train on the VisDrone dataset with just 10 images as a quick test, but the training does not work properly on VisDrone and yields 0 mAP. However, the same code works normally on the coco128 dataset. This is very strange, and I would appreciate your help.

    opened by Shaosifan 1
  • PyTorch 1.11.0 compatibility updates (see  yolov5)

    Hi, we ran into an AttributeError running inference with detect.py due to a change of the PyTorch API in version 1.11 (no more "recompute_scale_factor"). So we added this fix according to the official yolov5 "PyTorch 1.11.0 compatibility update".

    opened by martinbaerwolff 1
  • RuntimeError: result type Float can't be cast to the desired output type long int ?

      File "/content/drive/MyDrive/Yolo v5/utils/loss.py", line 240, in build_targets
        indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1))) # image, anchor, grid indices
    RuntimeError: result type Float can't be cast to the desired output type long int

    opened by TerrafYassin 0
  • WBF, mAP

    Dear author, I'm confused about how to use WBF. Could you please give me some guidance?

    Q: Once I have the wbf_labels from ensembling the val results of two models, how do I calculate the new mAP from these wbf_labels? Is there a command or code for this calculation?

    opened by Agustinwgq 0
  • TypeError: meshgrid() got an unexpected keyword argument 'indexing'

    python train.py --img 1536 --adam --batch 4 --epochs 80 --data ./data/VisDrone.yaml --weights yolov5l.pt --hy data/hyps/hyp.VisDrone.yaml --cfg models/yolov5l-xs-tph.yaml --name v5l-xs-tph

    This command fails with the following error:

      File "/data/zhangshilin/wangjun/529_zsl_6.0/tph-yolov5/tph-yolov5/models/common.py", line 493, in __init__
        self.m = SwinTransformerBlock(c_, c_, c_//32, n)
      File "/data/zhangshilin/wangjun/529_zsl_6.0/tph-yolov5/tph-yolov5/models/common.py", line 426, in __init__
        self.tr = nn.Sequential((SwinTransformerLayer(c2, num_heads=num_heads, window_size=window_size, shift_size=0 if (i % 2 == 0) else self.shift_size ) for i in range(num_layers)))
      File "/data/zhangshilin/wangjun/529_zsl_6.0/tph-yolov5/tph-yolov5/models/common.py", line 426, in <genexpr>
        self.tr = nn.Sequential((SwinTransformerLayer(c2, num_heads=num_heads, window_size=window_size, shift_size=0 if (i % 2 == 0) else self.shift_size ) for i in range(num_layers)))
      File "/data/zhangshilin/wangjun/529_zsl_6.0/tph-yolov5/tph-yolov5/models/common.py", line 340, in __init__
        attn_drop=attn_drop, proj_drop=drop)
      File "/data/zhangshilin/wangjun/529_zsl_6.0/tph-yolov5/tph-yolov5/models/common.py", line 259, in __init__
        coords = torch.stack(torch.meshgrid([coords_h, coords_w], indexing="ij")) # [2, Mh, Mw]
    TypeError: meshgrid() got an unexpected keyword argument 'indexing'

    opened by xiaomujiang 1
Owner
cv516Buaa
Pattern Recognition and Artificial Intelligence Group Prof.Qi Zhao & Lijiang Chen Dr. Shuchang Lyu & Binghao Liu & Xingkui Zhu
Drone detection using YOLOv5

This drone detection system uses YOLOv5 which is a family of object detection architectures and we have trained the model on Drone Dataset. Overview I

Tushar Sarkar 27 Dec 20, 2022
Multi-task yolov5 with detection and segmentation based on yolov5

YOLOv5DS Multi-task yolov5 with detection and segmentation based on yolov5(branch v6.0) decoupled head anchor free segmentation head README中文 Ablation

null 150 Dec 30, 2022
Detection of drones using their thermal signatures from thermal camera through YOLO-V3 based CNN with modifications to encapsulate drone motion

Drone Detection using Thermal Signature This repository highlights the work for night-time drone detection using an Optris PI Lightweight ther

Chong Yu Quan 6 Dec 31, 2022
FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

FuseDream This repo contains code for our paper (paper link): FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimizat

XCL 191 Dec 31, 2022
[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)

Feel free to visit my homepage Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DIMP) [ECCVW2020 paper] Presentation

Seokeon Choi 35 Oct 26, 2022
An open-source, low-cost, image-based weed detection device for fallow scenarios.

Welcome to the OpenWeedLocator (OWL) project, an opensource hardware and software green-on-brown weed detector that uses entirely off-the-shelf compon

Guy Coleman 145 Jan 5, 2023
Rethinking Transformer-based Set Prediction for Object Detection

Rethinking Transformer-based Set Prediction for Object Detection Here are the code for the ICCV paper. The code is adapted from Detectron2 and AdelaiD

Zhiqing Sun 62 Dec 3, 2022
Drone-based Joint Density Map Estimation, Localization and Tracking with Space-Time Multi-Scale Attention Network

DroneCrowd Paper Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark. Introduction This paper proposes a space-time multi-scale atte

VisDrone 98 Nov 16, 2022
🍅🍅🍅YOLOv5-Lite: lighter, faster and easier to deploy. Evolved from yolov5 and the size of model is only 1.7M (int8) and 3.3M (fp16). It can reach 10+ FPS on the Raspberry Pi 4B when the input size is 320×320~

YOLOv5-Lite:lighter, faster and easier to deploy Perform a series of ablation experiments on yolov5 to make it lighter (smaller Flops, lower memory, a

pogg 1.5k Jan 5, 2023
Yolov5-lite - Minimal PyTorch implementation of YOLOv5

Yolov5-Lite: Minimal YOLOv5 + Deep Sort Overview This repo is a shortened versio

Kadir Nar 57 Nov 28, 2022
Web service for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation based on OpenFace 2.0

OpenGaze: Web Service for OpenFace Facial Behaviour Analysis Toolkit Overview OpenFace is a fantastic tool intended for computer vision and machine le

Sayom Shakib 4 Nov 3, 2022
The code uses SegFormer for Semantic Segmentation on Drone Dataset.

SegFormer_Segmentation The code uses SegFormer for Semantic Segmentation on Drone Dataset. The details for the SegFormer can be obtained from the foll

Dr. Sander Ali Khowaja 1 May 8, 2022
Tello Drone Trajectory Tracking

With this library you can track the trajectory of your tello drone or swarm of drones in real time.

Kamran Asgarov 2 Oct 12, 2022
This repository is based on Ultralytics/yolov5, with adjustments to enable polygon prediction boxes.

Polygon-Yolov5 This repository is based on Ultralytics/yolov5, with adjustments to enable polygon prediction boxes. Section I. Description The codes a

xinzelee 226 Jan 5, 2023
This repository is based on Ultralytics/yolov5, with adjustments to enable rotate prediction boxes.

Rotate-Yolov5 This repository is based on Ultralytics/yolov5, with adjustments to enable rotate prediction boxes. Section I. Description The codes are

xinzelee 90 Dec 13, 2022
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

Swin Transformer 1.4k Dec 30, 2022
YOLOv5 🚀 is a family of object detection architectures and models pretrained on the COCO dataset

YOLOv5 🚀 is a family of object detection architectures and models pretrained on the COCO dataset, and represents Ultralytics open-source research int

阿才 73 Dec 16, 2022
YOLOv5 + ROS2 object detection package

YOLOv5-ROS YOLOv5 + ROS2 object detection package This program changes the input of detect.py (ultralytics/yolov5) to sensor_msgs/Image of ROS2. Requi

Ar-Ray 23 Dec 19, 2022
OpenFace – a state-of-the art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

OpenFace 2.2.0: a facial behavior analysis toolkit Over the past few years, there has been an increased interest in automatic facial behavior analysis

Tadas Baltrusaitis 5.8k Dec 31, 2022