YOLO5Face: Why Reinventing a Face Detector (https://arxiv.org/abs/2105.12931)

DeepCam Shenzhen

Last update: Jan 7, 2023

Overview

Introduction

Yolov5-face is a real-time, high-accuracy face detector.

Performance

Single-scale inference at VGA resolution (the image is resized so that its longer side equals 640).
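The resize rule above can be illustrated with a small helper. This is a minimal sketch, assuming plain aspect-preserving scaling of the longer side to 640. The name `vga_resize_shape` is hypothetical, and the repository's own preprocessing (for example padding or rounding to a stride multiple) may differ.

```python
from typing import Tuple


def vga_resize_shape(height: int, width: int, max_side: int = 640) -> Tuple[int, int]:
    """Return (new_h, new_w) so that the longer side equals max_side.

    Hypothetical helper: keeps the aspect ratio and rounds to the nearest
    pixel; the repository may round or pad differently.
    """
    scale = max_side / max(height, width)
    return round(height * scale), round(width * scale)
```

For example, a 768x1024 image becomes 480x640, and a 1000x500 portrait image becomes 640x320.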

Large family

Method	Backbone	Easy	Medium	Hard	#Params(M)	#Flops(G)
DSFD (CVPR19)	ResNet152	94.29	91.47	71.39	120.06	259.55
RetinaFace (CVPR20)	ResNet50	94.92	91.90	64.17	29.50	37.59
HAMBox (CVPR20)	ResNet50	95.27	93.76	76.75	30.24	43.28
TinaFace (Arxiv20)	ResNet50	95.61	94.25	81.43	37.98	172.95
SCRFD-34GF(Arxiv21)	Bottleneck Res	96.06	94.92	85.29	9.80	34.13
SCRFD-10GF(Arxiv21)	Basic Res	95.16	93.87	83.05	3.86	9.98
-	-	-	-	-	-	-
YOLOv5s	CSPNet	94.67	92.75	83.03	7.075	5.751
YOLOv5s6	CSPNet	95.48	93.66	82.8	12.386	6.280
YOLOv5m	CSPNet	95.30	93.76	85.28	21.063	18.146
YOLOv5m6	CSPNet	95.66	94.1	85.2	35.485	19.773
YOLOv5l	CSPNet	95.78	94.30	86.13	46.627	41.607
YOLOv5l6	CSPNet	96.38	94.90	85.88	76.674	45.279

Small family

Method	Backbone	Easy	Medium	Hard	#Params(M)	#Flops(G)
RetinaFace (CVPR20)	MobileNet0.25	87.78	81.16	47.32	0.44	0.802
FaceBoxes (IJCB17)		76.17	57.17	24.18	1.01	0.275
SCRFD-0.5GF(Arxiv21)	Depth-wise Conv	90.57	88.12	68.51	0.57	0.508
SCRFD-2.5GF(Arxiv21)	Basic Res	93.78	92.16	77.87	0.67	2.53
-	-	-	-	-	-	-
YOLOv5n	ShuffleNetv2	93.74	91.54	80.32	1.726	2.111
YOLOv5n-0.5	ShuffleNetv2	90.76	88.12	73.82	0.447	0.571

Pretrained-Models

Name	Easy	Medium	Hard	FLOPs(G)	Params(M)	Link
yolov5n-0.5	90.76	88.12	73.82	0.571	0.447	Link: https://pan.baidu.com/s/1UgiKwzFq5NXI2y-Zui1kiA pwd: s5ow, https://drive.google.com/file/d/1XJ8w55Y9Po7Y5WP4X1Kg1a77ok2tL_KY/view?usp=sharing
yolov5n	93.61	91.52	80.53	2.111	1.726	Link: https://pan.baidu.com/s/1xsYns6cyB84aPDgXB7sNDQ pwd: lw9j, https://drive.google.com/file/d/18oenL6tjFkdR1f5IgpYeQfDFqU4w3jEr/view?usp=sharing
yolov5s	94.33	92.61	83.15	5.751	7.075	Link: https://pan.baidu.com/s/1fyzLxZYx7Ja1_PCIWRhxbw pwd: eq0q, https://drive.google.com/file/d/1zxaHeLDyID9YU4-hqK7KNepXIwbTkRIO/view?usp=sharing
yolov5m	95.30	93.76	85.28	18.146	21.063	Link: https://pan.baidu.com/s/1oePvd2K6R4-gT0g7EERmdQ pwd: jmtk
yolov5l	95.78	94.30	86.13	41.607	46.627	Link: https://pan.baidu.com/s/11l4qSEgA2-c7e8lpRt8iFw pwd: 0mq7

Data preparation

Download the WIDERFace dataset.
Download the annotation files from Google Drive.

python3 train2yolo.py
python3 val2yolo.py
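The conversion scripts are not reproduced here. As a rough sketch of what one converted label line might look like, we assume a 15-column layout: the class id, the normalized box center and size (cx, cy, w, h), then five normalized landmark (x, y) pairs, with -1 marking missing landmarks. This layout is an assumption, so please check it against the files train2yolo.py actually writes. The helper `to_yolo_face_line` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def to_yolo_face_line(
    box: Tuple[float, float, float, float],               # (x1, y1, x2, y2) in pixels
    landmarks: Optional[Sequence[Tuple[float, float]]],   # five (x, y) points, or None
    img_w: int,
    img_h: int,
    cls: int = 0,
) -> str:
    """Build one label line under the ASSUMED 15-column yolov5-face layout."""
    x1, y1, x2, y2 = box
    values: List[float] = [
        (x1 + x2) / 2 / img_w,  # normalized box center x
        (y1 + y2) / 2 / img_h,  # normalized box center y
        (x2 - x1) / img_w,      # normalized box width
        (y2 - y1) / img_h,      # normalized box height
    ]
    if landmarks is None:
        values += [-1.0] * 10   # assumed marker for "no landmark annotation"
    else:
        if len(landmarks) != 5:
            raise ValueError("expected exactly five landmark points")
        for px, py in landmarks:
            values += [px / img_w, py / img_h]
    return " ".join([str(cls)] + [f"{v:.6f}" for v in values])
```

For a 1000x1000 image, the box (100, 100, 200, 300) without landmarks yields `0 0.150000 0.200000 0.100000 0.200000` followed by ten `-1.000000` fields.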

Training

CUDA_VISIBLE_DEVICES="0,1,2,3" python3 train.py --data data/widerface.yaml --cfg models/yolov5s.yaml --weights 'pretrained models'

WIDERFace Evaluation

python3 test_widerface.py --weights 'your test model' --img-size 640

cd widerface_evaluate
python3 evaluation.py
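WIDER FACE evaluation counts a detection as correct when it overlaps a ground-truth face with an intersection-over-union (IoU) of at least 0.5. The sketch below only illustrates that overlap test. It omits the score-ordered one-to-one matching, the "ignore" faces, and any pixel-convention details that evaluation.py may apply, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(det: Box, gts: Sequence[Box], thresh: float = 0.5) -> bool:
    """Simplified check: does the detection overlap any ground truth enough?"""
    return any(iou(det, g) >= thresh for g in gts)
```

Two 10x10 boxes shifted by half their width have an IoU of 1/3 and would not count as a match.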

Test

Android demo

https://github.com/FeiGeChuanShu/ncnn_Android_face/tree/main/ncnn-android-yolov5_face

References

https://github.com/ultralytics/yolov5

https://github.com/DayBreak-u/yolo-face-with-landmark

https://github.com/xialuxi/yolov5_face_landmark

https://github.com/biubug6/Pytorch_Retinaface

https://github.com/deepinsight/insightface

Citation

If you find this work useful, please cite:

@article{YOLO5Face,
title = {YOLO5Face: Why Reinventing a Face Detector},
author = {Delong Qi and Weijun Tan and Qi Yao and Jingfeng Liu},
journal = {arXiv preprint arXiv:2105.12931},
year = {2021}
}

Comments

Some questions about TensorRT for Yolov5-face
@bobo0810 Many thanks for your TensorRT inference implementation! I have some questions after successfully running the TensorRT version of Yolov5-face:

The results in the table look very impressive. However, I measured the runtime (RT) on a 2080 Ti GPU with the following two snippets:

    from time import time

    start = time()
    for i in range(1000):
        pred = yolo_trt_model(img.cpu().numpy())  # TensorRT inference
    ends = time()
    # total seconds for 1000 runs is numerically equal to mean ms per run
    print('RT for one image: {} ms'.format(ends - start))

This code gives an RT of 6 ms per image.

    start = time()
    for i in range(1000):
        pred = yolo_trt_model(img.cpu().numpy())  # TensorRT inference
        pred = yolo_trt_model.after_process(pred, device)
    ends = time()
    # total seconds for 1000 runs is numerically equal to mean ms per run
    print('RT for one image: {} ms'.format(ends - start))

This code gives an RT of 11 ms per image. Is this the right way to measure the runtime?

It seems that yolo_trt_model.after_process takes a lot of time. Why not move this step into TensorRT by uncommenting this line? I found that in the original yolov5 repo the whole model can be exported with this file. Is it possible to put the entire Yolov5-face pipeline into TensorRT?

Do the results in the table consider only the yolo_trt_model.__call__ running time, or do they also include yolo_trt_model.after_process and non_max_suppression_face?
opened by vtddggg 13
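In the snippets above, dividing the total elapsed seconds by 1000 iterations and multiplying by 1000 ms/s cancel out. Printing `ends - start` as milliseconds is therefore correct only because exactly 1000 iterations run. The sketch below makes that arithmetic explicit and adds a few warm-up calls. The name `mean_latency_ms` is hypothetical. If the timed callable launches asynchronous GPU work (for example PyTorch CUDA kernels), it should synchronize before returning, e.g. with `torch.cuda.synchronize()`. Otherwise the timer may capture only the launch overhead.

```python
import time
from typing import Callable


def mean_latency_ms(fn: Callable[[], object], iters: int = 1000, warmup: int = 10) -> float:
    """Mean wall-clock time of fn() in milliseconds (hypothetical helper)."""
    for _ in range(warmup):   # untimed runs: allocation, caching, lazy init
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start  # seconds for all timed iterations
    return elapsed / iters * 1000.0        # milliseconds per call
```

Comparing `mean_latency_ms(lambda: model(x))` with `mean_latency_ms(lambda: post(model(x)))` separates inference time from post-processing time. Here `model`, `post`, and `x` stand in for your own objects.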
Question about running multiple inferences with TensorRT

I ran inference with the TensorRT engine file successfully, and the performance on a Jetson TX2 board is amazing compared with the default setup. However, when I run inference again using the same loaded engine, it gives wrong results and the inference time increases. Could you please advise how to use a TensorRT engine file for multiple inferences? Thanks.

opened by ghimireadarsh 12
test_widerface.py: where is the file wider_val.txt?

    parser.add_argument('--folder_pict', default='/yolov5-face/data/widerface/val/wider_val.txt', type=str, help='folder_pict')

How do I get the file wider_val.txt? Thanks.

opened by aa12356jm 8
Align Landmarks

Hi,

By the way, do you have a 112x112 aligned version or alignment utilities somewhere?

Or can we use RetinaFace's alignment? Are the 5 landmark points in the same order as in RetinaFace?

opened by MyraBaba 6
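This sketch is not an answer from the maintainers. It shows a common way to align five landmarks to a 112x112 crop: fit a least-squares similarity transform (rotation, uniform scale, translation) onto the ArcFace reference points used in insightface. The sketch assumes the detector's points come in the order left eye, right eye, nose, left mouth corner, right mouth corner, so please verify that order against your model's output. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

# ArcFace 112x112 reference points (left eye, right eye, nose, left mouth, right mouth).
ARCFACE_112: List[Tuple[float, float]] = [
    (38.2946, 51.6963), (73.5318, 51.5014), (56.0252, 71.7366),
    (41.5493, 92.3655), (70.7299, 92.2041),
]


def similarity_transform(
    src: Sequence[Tuple[float, float]],
    dst: Sequence[Tuple[float, float]] = ARCFACE_112,
) -> List[List[float]]:
    """Least-squares 2x3 matrix [[a, -b, tx], [b, a, ty]] mapping src onto dst."""
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same number of points")
    n = len(src)
    mx, my = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    mu, mv = sum(p[0] for p in dst) / n, sum(p[1] for p in dst) / n
    num_a = num_b = den = 0.0
    for (x, y), (u, v) in zip(src, dst):
        dx, dy, du, dv = x - mx, y - my, u - mu, v - mv  # centered coordinates
        num_a += du * dx + dv * dy
        num_b += dv * dx - du * dy
        den += dx * dx + dy * dy
    a, b = num_a / den, num_b / den          # a = s*cos(theta), b = s*sin(theta)
    tx = mu - (a * mx - b * my)
    ty = mv - (b * mx + a * my)
    return [[a, -b, tx], [b, a, ty]]
```

The resulting matrix can be converted to a float NumPy array and passed to `cv2.warpAffine(img, M, (112, 112))` to produce the aligned crop.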
Multiple face classes: only the first class can be detected?

Hello, I ran into a problem using this code for 3-class face detection. After training with multiple classes and running the detection script (detect_face.py), only the first class is predicted (both the face boxes and the landmarks are fine); faces of the other two classes are never predicted. When I printed the raw pred tensor before NMS, the last two columns were always 0 (0.00000e+00). I then checked the training code and found that the classification scores and objectness scores of the other two classes looked normal, but the precision and recall during training were very low. I don't know whether this is related to NMS exceeding its 10 s time limit. I have debugged this for a long time without finding the cause; could you help me analyze it? Thanks.

opened by changhy666 6
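When debugging a multi-class model, it helps to decode one raw prediction row by hand. The sketch assumes a row layout of 4 box values, 1 objectness score, 10 landmark values, and then one score per class (15 + nc columns). That assumption comes from the `det[j, 5:15]` landmark slicing in torch2trt/main.py and the `output (1, 25200, 16)` binding for a single-class model in a TensorRT log on this page. The final score is taken as objectness times the best class score, as in YOLOv5-style heads. The helper name is hypothetical.

```python
from typing import List, Sequence, Tuple


def decode_row(row: Sequence[float], num_classes: int) -> Tuple[int, float, List[float], List[float]]:
    """Split one ASSUMED (15 + nc)-column prediction row into its parts."""
    if len(row) != 15 + num_classes:
        raise ValueError(f"expected {15 + num_classes} columns, got {len(row)}")
    box = list(row[0:4])                     # box values (assumed cx, cy, w, h before NMS)
    obj = row[4]                             # objectness score
    landmarks = list(row[5:15])              # five (x, y) landmark pairs
    cls_scores = row[15:15 + num_classes]    # one score per class
    cls_id = max(range(num_classes), key=lambda i: cls_scores[i])
    score = obj * cls_scores[cls_id]         # combined confidence
    return cls_id, score, box, landmarks
```

If every row shows zeros in the class columns after index 15, no amount of NMS tuning will recover those classes. The scores would then be lost earlier, in the model head or in the export.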
No face detected with TensorRT

Hi @bobo0810, I really appreciate your work! I've successfully run all the commands in your README, but the resulting image shows no detected faces. What am I doing wrong? Could it be caused by a different TensorRT version? My TensorRT version is 8.0.0.3. Thank you!

    python torch2trt/main.py --trt_path pretrained/yolov5s-face.trt
    [TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +136, GPU +0, now: CPU 259, GPU 407 (MiB)
    [TensorRT] INFO: Loaded engine size: 50 MB
    [TensorRT] INFO: [MemUsageSnapshot] deserializeCudaEngine begin: CPU 259 MiB, GPU 407 MiB
    [TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +184, GPU +76, now: CPU 455, GPU 527 (MiB)
    [TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +110, GPU +46, now: CPU 565, GPU 573 (MiB)
    [TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 565, GPU 555 (MiB)
    [TensorRT] INFO: [MemUsageSnapshot] deserializeCudaEngine end: CPU 565 MiB, GPU 555 MiB
    [TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.4.2 but loaded cuBLAS/cuBLAS LT 11.4.1
    [TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 565, GPU 565 (MiB)
    [TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 565, GPU 573 (MiB)
    bingding: input (1, 3, 640, 640)
    bingding: output (1, 25200, 16)
    img.shape: torch.Size([1, 3, 640, 640])
    orgimg.shape: (768, 1024, 3)
    result save in {...}yolov5-face/torch2trt/result.jpg
    [TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 688, GPU 785 (MiB)

opened by quocnhat 5
Why am I getting this error? AssertionError: No results.txt files found in /content/yolov5-face/runs/train/exp, nothing to plot.

When I run this command:

    !python3 train.py --data data/widerface.yaml --cfg models/yolov5s.yaml --weights yolovs-face.pt --epochs 1

I run into this problem:

    Traceback (most recent call last):
      File "train.py", line 513, in
        train(hyp, opt, device, tb_writer, wandb)
      File "train.py", line 400, in train
        plot_results(save_dir=save_dir)  # save as results.png
      File "/content/yolov5-face/utils/plots.py", line 393, in plot_results
        assert len(files), 'No results.txt files found in %s, nothing to plot.' % os.path.abspath(save_dir)
    AssertionError: No results.txt files found in /content/yolov5-face/runs/train/exp, nothing to plot.

opened by m-hajiabadi 5

[C++]: 🍅add YOLO5Face MNN/TNN/NCNN/ONNXRuntime C++ demo

Add YOLO5Face MNN/TNN/NCNN/ONNXRuntime C++ demo, all tests passed ~

Features:

Usage

build lite.ai.toolkit

git clone --depth=1 https://github.com/DefTruth/lite.ai.toolkit.git  # latest
cd lite.ai.toolkit && sh ./build.sh  # On MacOS, you can use the built OpenCV, ONNXRuntime, MNN, NCNN and TNN libs in this repo.

download onnx/mnn/tnn/ncnn files:
- YOLO5Face ONNX Files
- YOLO5Face MNN Files
- YOLO5Face TNN Files
- YOLO5Face NCNN Files
use YOLO5Face in C++

auto *yolov5face = new lite::cv::face::detect::YOLO5Face(onnx_path);
auto *yolov5face = new lite::mnn::cv::face::detect::YOLO5Face(mnn_path);
auto *yolov5face = new lite::tnn::cv::face::detect::YOLO5Face(tnn_path);
auto *yolov5face = new lite::ncnn::cv::face::detect::YOLO5Face(ncnn_path);

Demo

test_lite_yolo5face

#include <iostream>
#include <string>
#include <vector>

#include "lite/lite.h"

static void test_default()
{
  std::string onnx_path = "../../../hub/onnx/cv/yolov5face-s-640x640.onnx"; // yolov5s-face
  std::string test_img_path = "../../../examples/lite/resources/test_lite_face_detector.jpg";
  std::string save_img_path = "../../../logs/test_lite_yolov5face.jpg";

  auto *yolov5face = new lite::cv::face::detect::YOLO5Face(onnx_path);

  std::vector<lite::types::BoxfWithLandmarks> detected_boxes;
  cv::Mat img_bgr = cv::imread(test_img_path);
  yolov5face->detect(img_bgr, detected_boxes);

  // draw the boxes and the 5 landmarks onto the input image
  lite::utils::draw_boxes_with_landmarks_inplace(img_bgr, detected_boxes);

  cv::imwrite(save_img_path, img_bgr);

  std::cout << "Default Version Done! Detected Face Num: " << detected_boxes.size() << std::endl;

  delete yolov5face;
}

int main()
{
  test_default();
  return 0;
}

The output image: test_lite_yolov5face_onnx_2

opened by DefTruth 5

Support exporting ONNX/PB with concatenated output

Verified OK! Assigning values to a sliced variable (for example with '...' or ':' indexing) is not supported when exporting to ONNX or PB. If you want the concatenated output, edit models/export.py like this:

    model.model[-1].export = False  # set the Detect() layer's export flag to False
    model.model[-1].export_cat = True  # export the concatenated output

opened by changhy666 4
Inference Speed

Thanks for sharing this great code.

I have a question about Yolov5-face inference speed: Yolov5-face is more accurate than SCRFD, but its inference is slower.

Is that true?

opened by LeeKyungwook 4
The FLOPs counts differ

Hi, it seems the FLOPs reported in your README differ from what I get with YOLOv5's counter:

    yolov5-n:    Model Summary: 308 layers, 1705462 parameters, 1705462 gradients, 5.0 GFLOPS
    yolov5-0.5n: Model Summary: 308 layers, 439734 parameters, 439734 gradients, 1.4 GFLOPS

opened by ppogg 3
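The thread does not say which counting convention or input size the README table used, so this is only a sketch of one common source of mismatches between FLOPs counters. Some tools count a multiply-accumulate (MAC) as one operation, and others count it as two (FLOPs = 2 x MACs). The helpers below are hypothetical, ignore bias, activation, and normalization costs, and show both conventions for a single convolution.

```python
def conv2d_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int, groups: int = 1) -> int:
    """Multiply-accumulates of a k x k convolution (bias and activation ignored)."""
    return (c_in // groups) * k * k * c_out * h_out * w_out


def conv2d_flops(c_in: int, c_out: int, k: int, h_out: int, w_out: int, groups: int = 1) -> int:
    """Same layer counted with the FLOPs = 2 x MACs convention."""
    return 2 * conv2d_macs(c_in, c_out, k, h_out, w_out, groups)
```

A 3x3 convolution from 3 to 16 channels with a 320x320 output costs 44,236,800 MACs, or 88,473,600 FLOPs under the doubled convention. The input resolution used for profiling changes the totals as well.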
Can't run validation after training

I'm training the yolov5n-0.5 model on the WIDERFACE dataset. Whenever the training loop reaches the validation step, it crashes. htop shows that at the validation step the process consumes all of my RAM and swap (16 GB of RAM plus 16 GB of swap, 32 GB in total) and runs out of memory. Has anyone encountered this problem, and what is the suggested fix?

opened by MS1908 0

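Several lines in torch2trt/main.py are wrong

Lines 69-71 of torch2trt/main.py: cv2.rectangle takes the top-left and bottom-right corner coordinates directly, so there is no need to apply xyxy2xywh here or to divide by gn for normalization. The same applies to the landmarks.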

                xywh = (xyxy2xywh(det[j, :4].view(1, 4)) / gn).view(-1).tolist()
                conf = det[j, 4].cpu().numpy()
                landmarks = (det[j, 5:15].view(1, 10) / gn_lks).view(-1).tolist()

That is, these lines should match lines 174-176 of detect_face.py; I'm not sure why the extra steps were added. After the fix:

                xywh = det[j, :4].view(1, 4).view(-1).tolist()
                conf = det[j, 4].cpu().numpy()
                landmarks = (det[j, 5:15].view(1, 10)).view(-1).tolist()

opened by Monkey-D-Luffy-star 0
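To make the difference concrete: `cv2.rectangle(img, pt1, pt2, color, thickness)` draws between two integer pixel corners. The removed lines instead produced a normalized center-and-size box with values in [0, 1]. The pure-Python sketch below computes both forms for the same box. The helper names are hypothetical, and `gn` is assumed to be (w, h, w, h) of the original image, as in YOLOv5-style scripts.

```python
from typing import List, Sequence, Tuple


def xyxy_to_norm_xywh(xyxy: Sequence[float], img_w: int, img_h: int) -> List[float]:
    """What xyxy2xywh(...) / gn yields: normalized (cx, cy, w, h)."""
    x1, y1, x2, y2 = xyxy
    return [(x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h,
            (x2 - x1) / img_w, (y2 - y1) / img_h]


def rectangle_corners(xyxy: Sequence[float]) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Integer pixel corners (pt1, pt2) in the form cv2.rectangle expects."""
    x1, y1, x2, y2 = (int(round(v)) for v in xyxy)
    return (x1, y1), (x2, y2)
```

For the box (100, 200, 300, 400) in a 1000x800 image, the normalized form is (0.2, 0.375, 0.2, 0.25). Passed as corners, those values would draw a degenerate rectangle near the origin, while `rectangle_corners` gives ((100, 200), (300, 400)).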

Fine-tuning: label.txt keypoint format

I am fine-tuning. The default value of kpt_label is 5, but if I write the keypoint coordinates into label.txt in the same form as the bbox, they don't seem to be read. For example: bbox_id x y whatpt_id x y *5. What format should I use for label.txt?

opened by qlqqqk 0
