TensorRT Acceleration of AlphaPose

Overview

1. Requirements

  • CUDA 11.1
  • TensorRT 7.2.2
  • Python 3.8.5
  • Cython
  • PyTorch 1.8.1
  • torchvision 0.9.1
  • numpy 1.17.4 (newer numpy versions raise an error, see this issue)
  • setuptools >= 40.0 (see this issue)

2. Results

AlphaPose supports several combinations of object detector and pose estimator; this repository (forked from AlphaPose) only accelerates the YOLOv3_SPP + FastPose combination.


[Figure: AlphaPose_trt inference result]

In AlphaPose's pipeline, the YOLOv3-SPP model first detects the people in an image, and the resulting person crops are fed to the FastPose model for pose estimation. We accelerated both the YOLOv3_SPP and FastPose models and recorded the mAP before and after acceleration on the MSCOCO val2017 validation set. In the table below, "ground truth box" is the accuracy of the FastPose model alone (poses estimated from ground-truth boxes), while "detection boxes" is the accuracy of the full YOLOv3_SPP + FastPose pipeline.

Method        | ground truth box mAP | detection boxes mAP
------------- | -------------------- | -------------------
AlphaPose     | 0.743                | 0.718
AlphaPose_trt | 0.743                | 0.718

For all tests, the GPU and memory clocks were locked:

GPU frequency = 1509 MHz, memory frequency = 5001 MHz. The commands used were:

nvidia-smi -pm 1
nvidia-smi -q -d clock              # check the current memory and GPU clocks
nvidia-smi -ac memFreq,gpuFreq      # set application clocks, e.g. nvidia-smi -ac 5001,1509
nvidia-smi -lgc gpuFreq,gpuFreq     # lock the GPU clock, e.g. nvidia-smi -lgc 1509,1509

2.1 YOLOv3-SPP speed up

The table below lists the inference latency and throughput of the YOLOv3_SPP model at different batch sizes, together with the speed-up ratios (the two speed-up columns).

Test environment: Tesla T4

Throughput (img/s) = 1000 / latency (ms) × batch size

Latency speed-up = original latency / TensorRT latency
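
For concreteness, here is a small Python snippet (not part of the repository) that applies these two formulas to the batch-size-1 YOLOv3-SPP numbers from the table below:

# Worked example of the throughput / speed-up formulas above, using the
# batch-size-1 YOLOv3-SPP latencies from the table below.
def throughput(latency_ms, batch_size):
    return 1000.0 / latency_ms * batch_size        # images per second

orig_latency_ms, trt_latency_ms = 54.1, 20.1       # Tesla T4, batch size 1
print(throughput(orig_latency_ms, 1))              # ~18.48 img/s
print(throughput(trt_latency_ms, 1))               # ~49.75 img/s
print(orig_latency_ms / trt_latency_ms)            # latency speed-up of ~2.7x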

model          | Batch size | Latency (ms) | Throughput (img/s) | Latency speed-up | Throughput speed-up | Volatile GPU-Util
-------------- | ---------- | ------------ | ------------------ | ---------------- | ------------------- | -----------------
YOLOv3-SPP     | 1          | 54.1         | 18.48              | 1x               | 1x                  | 87%
               | 2          | 93.9         | 21.30              |                  |                     | 93%
               | 4          | 172.6        | 23.17              |                  |                     | 98%
               | 8          | 322.8        | 24.78              |                  |                     | 100%
YOLOv3-SPP_trt | 1          | 20.1         | 49.75              | 2.7x             | 2.7x                | 100%
               | 2          | 33.7         | 59.35              | 2.8x             | 2.8x                | 100%
               | 4          | 60.5         | 66.12              | 2.9x             | 2.9x                | 100%
               | 8          | 115.5        | 69.26              | 2.8x             | 2.8x                | 100%
See Section 8.2 for the corresponding benchmark code.

2.2 Fast Pose speed up

The table below lists the inference latency and throughput of the FastPose model at different batch sizes, together with the speed-up ratios (the two speed-up columns).

Test environment: Tesla T4

model        | Batch size | Latency (ms) | Throughput (img/s) | Latency speed-up | Throughput speed-up | Volatile GPU-Util
------------ | ---------- | ------------ | ------------------ | ---------------- | ------------------- | -----------------
FastPose     | 1          | 23.9         | 41.84              | 1x               | 1x                  | 30%
             | 2          | 24.6         | 81.30              |                  |                     | 39%
             | 4          | 27.9         | 143.37             |                  |                     | 64%
             | 8          | 33.2         | 240.96             |                  |                     | 99%
             | 16         | 56.6         | 282.68             |                  |                     | 99%
             | 32         | 105.8        | 302.46             |                  |                     | 99%
             | 64         | 206.2        | 310.38             |                  |                     | 100%
FastPose_trt | 1          | 1.49         | 671.14             | 16.0x            | 16.0x               | 3%
             | 2          | 2.32         | 862.07             | 10.6x            | 10.6x               | 3%
             | 4          | 4.06         | 985.22             | 6.9x             | 6.9x                | 38%
             | 8          | 7.69         | 1040.31            | 4.3x             | 4.3x                | 100%
             | 16         | 15.16        | 1055.41            | 3.7x             | 3.7x                | 100%
             | 32         | 29.98        | 1067.38            | 3.5x             | 3.5x                | 100%
             | 64         | 59.67        | 1072.57            | 3.5x             | 3.5x                | 100%
See Section 8.1 for the corresponding benchmark code.

2.3 YOLOv3-SPP + FastPose speed up

The table below lists the inference latency and throughput of the YOLOv3_SPP + FastPose pipeline at different batch sizes, together with the speed-up ratios (the two speed-up columns).

Test environment: Tesla T4

model         | Batch size | Latency (ms) | Throughput (img/s) | Latency speed-up | Throughput speed-up | Volatile GPU-Util
------------- | ---------- | ------------ | ------------------ | ---------------- | ------------------- | -----------------
AlphaPose     | 1          | 78.0         | 12.82              | 1x               | 1x                  | 87%
              | 2          | 118.5        | 16.87              |                  |                     | 94%
              | 4          | 200.5        | 19.95              |                  |                     | 97%
              | 8          | 356          | 22.47              |                  |                     | 100%
AlphaPose_trt | 1          | 21.59        | 46.32              | 3.6x             | 3.6x                | 100%
              | 2          | 36.02        | 55.52              | 3.3x             | 3.3x                | 100%
              | 4          | 64.56        | 61.96              | 3.1x             | 3.1x                | 100%
              | 8          | 123.19       | 64.94              | 3.5x             | 3.5x                | 100%
See Section 8.3 for the corresponding benchmark code.

3. Code installation

The AlphaPose installation below follows the official AlphaPose instructions; there are two ways to install it.

3.1 Install with conda

Install conda from here

# 1. Create a conda virtual environment.
conda create -n alphapose python=3.6 -y
conda activate alphapose

# 2. Install PyTorch
conda install pytorch==1.1.0 torchvision==0.3.0

# 3. Get AlphaPose
git clone https://github.com/MVIG-SJTU/AlphaPose.git
# git pull origin pull/592/head if you use PyTorch>=1.5
cd AlphaPose


# 4. install
export PATH=/usr/local/cuda/bin/:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:$LD_LIBRARY_PATH
python -m pip install cython
sudo apt-get install libyaml-dev
################Only For Ubuntu 18.04#################
locale-gen C.UTF-8
# if locale-gen not found
sudo apt-get install locales
export LANG=C.UTF-8
######################################################
python setup.py build develop

3.2 Install with pip

# 1. Install PyTorch
pip3 install torch==1.1.0 torchvision==0.3.0

# Check torch environment by:  python3 -m torch.utils.collect_env

# 2. Get AlphaPose
git clone https://github.com/MVIG-SJTU/AlphaPose.git
# git pull origin pull/592/head if you use PyTorch>=1.5
cd AlphaPose

# 3. install
export PATH=/usr/local/cuda/bin/:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:$LD_LIBRARY_PATH
pip install cython
sudo apt-get install libyaml-dev
python3 setup.py build develop --user

4. YOLOv3-SPP(PyTorch) to engine

YOLOv3-SPP (PyTorch) can be converted into either a static-shape engine or a dynamic-shape engine. A static-shape engine only accepts inputs of one fixed size, while a dynamic-shape engine accepts inputs whose shape can vary, as long as it stays within the range specified when the engine was built.

4.1 Building a static-shape engine

(1) Convert YOLOv3_SPP to ONNX

Download the YOLOv3_SPP cfg and weights files and place them under ./detector/yolo/cfg/ and ./detector/yolo/data/, respectively. The default input size of YOLOv3_SPP is 1x3x608x608.

python ./darknet2onnx.py \
--cfg ./detector/yolo/cfg/yolov3-spp.cfg \
--weight ./detector/yolo/data/yolov3-spp.weights

Running this command produces a yolov3_spp_static.onnx model in the current directory.

(2) Sanitize the ONNX model

The YOLOv3-SPP model contains a Padding operation that TensorRT cannot parse directly, so the ONNX model has to be sanitized first (see this issue). You may additionally need to install tensorflow-gpu == 2.4.1 and polygraphy == 0.22.0.

polygraphy surgeon sanitize yolov3_spp_static.onnx \
--fold-constants \
--output yolov3_spp_static_folded.onnx

Running this command produces a yolov3_spp_static_folded.onnx model in the current directory.

(3) Build the engine from the ONNX model

The ScatterND plugin must be registered first. Copy the plugins folder and the Makefile from this repository into the current directory, then run make to compile; this produces a ScatterND.so shared library in the build folder.

trtexec --onnx=yolov3_spp_static_folded.onnx \
--explicitBatch \
--saveEngine=yolov3_spp_static_folded.engine \
--workspace=10240 --fp16 --verbose \
--plugins=build/ScatterND.so

Running this command produces a yolov3_spp_static_folded.engine model in the current directory.
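
The resulting engine can then be used from Python. The sketch below is not part of this repository; it assumes the standard tensorrt and pycuda Python bindings (as shipped with TensorRT 7.2) and shows how the ScatterND.so plugin is loaded before deserializing the static-shape engine:

import ctypes
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context

# Load the ScatterND plugin first, mirroring the --plugins flag used above.
ctypes.CDLL('./build/ScatterND.so')
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')

with open('yolov3_spp_static_folded.engine', 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate page-locked host buffers and device buffers for every binding
# (binding 0 is the fixed 1x3x608x608 input).
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host); dev_bufs.append(dev); bindings.append(int(dev))

# host_bufs[0][:] = preprocessed_image.ravel()   # fill the input buffer here
stream = cuda.Stream()
cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for host, dev in zip(host_bufs[1:], dev_bufs[1:]):
    cuda.memcpy_dtoh_async(host, dev, stream)
stream.synchronize()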

4.2 Building a dynamic-shape engine

(1) Convert YOLOv3_SPP to ONNX

The default input shape is -1x3x608x608 (-1 means the batch size is variable).

python darknet2onnx_dynamic.py \
--cfg ./detector/yolo/cfg/yolov3-spp.cfg \
--weight ./detector/yolo/data/yolov3-spp.weights

Running this command produces a yolov3_spp_-1_608_608_dynamic.onnx model in the current directory.

(2) Sanitize the ONNX model

polygraphy surgeon sanitize yolov3_spp_-1_608_608_dynamic.onnx \
--fold-constants \
--output yolov3_spp_-1_608_608_dynamic_folded.onnx

(3) Build the engine from the ONNX model

minShapes sets the smallest allowed input shape, optShapes can be kept the same as minShapes, and maxShapes sets the largest allowed input shape; all three options are required (see trtexec -h for details). The ScatterND.so library must be loaded when converting the model, otherwise TensorRT may report that the plugin cannot be found.

trtexec --onnx=yolov3_spp_-1_608_608_dynamic_folded.onnx \
--explicitBatch \
--saveEngine=yolov3_spp_-1_608_608_dynamic_folded.engine \
--workspace=10240 --fp16 --verbose \
--plugins=build/ScatterND.so \
--minShapes=input:1x3x608x608 \
--optShapes=input:1x3x608x608 \
--maxShapes=input:64x3x608x608 \
--shapes=input:1x3x608x608

Running this command produces a yolov3_spp_-1_608_608_dynamic_folded.engine model in the current directory (inputs with different batch sizes can then be passed in for inference).
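
At inference time a dynamic-shape engine additionally needs an explicit input shape set on the execution context. A minimal sketch (again not part of the repository, using the standard tensorrt Python API):

import ctypes
import tensorrt as trt

ctypes.CDLL('./build/ScatterND.so')            # the plugin must also be loaded at runtime
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')

with open('yolov3_spp_-1_608_608_dynamic_folded.engine', 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

batch_size = 4                                 # any value in [1, 64], the range set at build time
context.set_binding_shape(0, (batch_size, 3, 608, 608))
assert context.all_binding_shapes_specified
# Buffer allocation and execute_async_v2 then proceed as in the static case,
# using context.get_binding_shape(i) to obtain the now-concrete output shapes.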

5. FastPose(PyTorch) to engine

5.1 Building a static-shape engine

(1) Convert FastPose to ONNX

The default input shape of the model is 1x3x256x192.

python pytorch2onnx.py --cfg ./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml \
--checkpoint ./pretrained_models/fast_res50_256x192.pth

Running this command produces a fastPose.onnx model in the current directory.
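
For reference, the export roughly amounts to a plain torch.onnx.export call. The sketch below is only an approximation of what pytorch2onnx.py does and assumes AlphaPose's update_config and builder.build_sppe utilities; the dynamic variant in Section 5.2 would additionally pass dynamic_axes={'input': {0: 'batch'}, 'output': {0: 'batch'}}:

import torch
from alphapose.models import builder
from alphapose.utils.config import update_config

# Build FastPose from its config and load the pretrained checkpoint.
cfg = update_config('./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml')
model = builder.build_sppe(cfg.MODEL, preset_cfg=cfg.DATA_PRESET)
model.load_state_dict(torch.load('./pretrained_models/fast_res50_256x192.pth', map_location='cpu'))
model.eval()

dummy = torch.randn(1, 3, 256, 192)            # fixed 1x3x256x192 input
torch.onnx.export(model, dummy, 'fastPose.onnx',
                  input_names=['input'], output_names=['output'],
                  opset_version=11)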

(2) Build the engine from the ONNX model

trtexec --onnx=fastPose.onnx \
--saveEngine=fastPose.engine --workspace=10240 \
--fp16 \
--verbose

Running this command produces a fastPose.engine model in the current directory.

5.2 Building a dynamic-shape engine

(1) Convert FastPose to ONNX

The default input shape of the model is -1x3x256x192 (-1 means the batch size is variable).

python pytorch2onnx_dynamic.py \
--cfg ./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml \
--checkpoint ./pretrained_models/fast_res50_256x192.pth

Running this command produces an alphaPose_-1_3_256_192_dynamic.onnx model in the current directory.

(2) Build the engine from the ONNX model

trtexec --onnx=alphaPose_-1_3_256_192_dynamic.onnx \
--saveEngine=alphaPose_-1_3_256_192_dynamic.engine \
--workspace=10240 --fp16 --verbose \
--minShapes=input:1x3x256x192 \
--optShapes=input:1x3x256x192 \
--maxShapes=input:128x3x256x192 \
--shapes=input:1x3x256x192 \
--explicitBatch

Running this command produces an alphaPose_-1_3_256_192_dynamic.engine model in the current directory.

All of the models above can also be downloaded from Baidu Pan (extraction code: cumt).

6. Inference

This section uses the original and accelerated models to run inference on images and videos.

6.1 Image inference

Place the images in the examples/demo folder and run the commands below; the results are saved in the examples/res/vis folder.

(1) Image inference with the original model

python inference.py --cfg ./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml \
--checkpoint ./pretrained_models/fast_res50_256x192.pth \
--save_img --showbox \
--indir ./examples/demo

(2) Image inference with the TensorRT-accelerated model

python trt_inference.py \
--yolo_engine ./yolov3_spp_static_folded.engine \
--pose_engine ./fastPose.engine \
--cfg ./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml \
--save_img \
--indir ./examples/demo \
--dll_file ./build/ScatterND.so

If you also want the detected human bounding boxes drawn in the results, add --showbox.
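
For context, FastPose outputs one heatmap per joint (64x48 for a 256x192 input), and the keypoints drawn in the results are decoded from those heatmaps. The following simplified decoder is not the repository's post-processing (which also does sub-pixel refinement and optional flip testing); it is only an illustrative sketch:

import numpy as np

def heatmaps_to_keypoints(heatmaps, box_xywh):
    """heatmaps: (num_joints, 64, 48) array from FastPose;
    box_xywh: the person box the crop was taken from, as (x, y, w, h)."""
    x0, y0, w, h = box_xywh
    keypoints = []
    for hm in heatmaps:
        py, px = np.unravel_index(np.argmax(hm), hm.shape)   # heatmap peak location
        # map the peak back into original-image coordinates
        keypoints.append((x0 + px / hm.shape[1] * w,
                          y0 + py / hm.shape[0] * h,
                          float(hm[py, px])))                 # (x, y, confidence)
    return keypoints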

6.2 Video inference

Place the video in the videos folder; the results are saved in the examples/res folder.

(1) Video inference with the original model

python inference.py --cfg ./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml \
--checkpoint ./pretrained_models/fast_res50_256x192.pth \
--save_video \
--video ./videos/demo.avi

(2) Video inference with the TensorRT-accelerated model

python trt_inference.py --yolo_engine ./yolov3_spp_static_folded.engine \
--cfg ./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml \
--save_video \
--video ./videos/demo_short.avi \
--dll_file ./build/ScatterND.so \
--pose_engine ./fastPose.engine \
--detector yolo

Note: during video inference, the accelerated YOLOv3_SPP model currently triggers a bug, so the original YOLOv3_SPP model is used here; this will be fixed in future work. --detector yolo selects the original YOLOv3_SPP model, and --detector yolo_trt selects the accelerated one.

7. Validation

This section evaluates the original and accelerated models on the MSCOCO 2017 validation set (val2017). Place the annotations and val2017 folders under data/coco.

(1) Validate with the original model

python validate.py --cfg ./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml \
--checkpoint ./pretrained_models/fast_res50_256x192.pth \
--flip-test \
--detector yolo

(2) Validate with the accelerated model

python validate_trt.py --cfg ./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml \
--pose_engine ./fastPose.engine \
--yolo_engine ./yolov3_spp_static_folded.engine \
--dll_file ./build/ScatterND.so \
--flip-test \
--detector yolo_trt
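
Both scripts ultimately report the COCO keypoint mAP shown in Section 2. If you want to recompute it yourself from a saved predictions file, a minimal sketch with pycocotools looks like this (the predictions file name below is only a placeholder, not necessarily what the scripts write):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('./data/coco/annotations/person_keypoints_val2017.json')
coco_dt = coco_gt.loadRes('./predictions_keypoints.json')    # COCO-format keypoint results
coco_eval = COCOeval(coco_gt, coco_dt, iouType='keypoints')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()                                        # prints AP, AP@.5, AP@.75, ...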

8. Speed Up Validation

8.1 FastPose speed-up validation

The following command verifies the speed-up of the accelerated FastPose human pose estimation model; the dynamic-shape engine is used for inference.

python demo_trt_fastpose.py \
--cfg ./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml \
--checkpoint ./pretrained_models/fast_res50_256x192.pth \
--engine_path ./alphaPose_-1_3_256_192_dynamic.engine --batch 1

8.2 YOLOv3_SPP speed-up validation

The following command verifies the speed-up of the accelerated YOLOv3_SPP human detector.

python demo_trt_yolov3_spp.py --cfg ./detector/yolo/cfg/yolov3-spp.cfg \
--weight ./detector/yolo/data/yolov3-spp.weights \
--engine_path ./yolov3_spp_-1_608_608_dynamic_folded.engine \
--batch 1

8.3 AlphaPose (YOLOv3_SPP + FastPose) speed-up validation

The following command verifies the speed-up of the full accelerated AlphaPose model.

python demo_trt_alphapose.py \
--fastpose_cfg ./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml \
--yolo_cfg ./detector/yolo/cfg/yolov3-spp.cfg \
--weight ./detector/yolo/data/yolov3-spp.weights \
--checkpoint ./pretrained_models/fast_res50_256x192.pth \
--fastpose_engine ./alphaPose_-1_3_256_192_dynamic.engine \
--yolo_engine ./yolov3_spp_-1_608_608_dynamic_folded.engine \
--batch 1
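
The latency and throughput numbers in Section 2 come from timing loops of this general shape (the demo_trt_*.py scripts may differ in details such as warm-up length and iteration count); this generic helper is only an illustration:

import time
import torch

def benchmark(run_once, batch_size, n_warmup=20, n_iters=100):
    """run_once() should perform one inference call for the given batch size."""
    for _ in range(n_warmup):                  # warm up (CUDA context, cuDNN autotune, ...)
        run_once()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_iters):
        run_once()
    torch.cuda.synchronize()                   # make sure all GPU work has finished
    latency_ms = (time.time() - start) / n_iters * 1000
    throughput = 1000.0 / latency_ms * batch_size
    return latency_ms, throughput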

9. TODO

  • Use a lightweight detector (e.g. YOLOv3-tiny, YOLOv4-tiny)
  • Speed up inference with numpy + pycuda
  • Model distillation
  • Model pruning
  • Implement the TensorRT acceleration with the C++ API

10. Citation

Please cite these papers in your publications if it helps your research:

@inproceedings{fang2017rmpe,
  title={{RMPE}: Regional Multi-person Pose Estimation},
  author={Fang, Hao-Shu and Xie, Shuqin and Tai, Yu-Wing and Lu, Cewu},
  booktitle={ICCV},
  year={2017}
}

@article{li2018crowdpose,
  title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
  author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
  journal={arXiv preprint arXiv:1812.00324},
  year={2018}
}

@inproceedings{xiu2018poseflow,
  author = {Xiu, Yuliang and Li, Jiefeng and Wang, Haoyu and Fang, Yinghong and Lu, Cewu},
  title = {{Pose Flow}: Efficient Online Pose Tracking},
  booktitle={BMVC},
  year = {2018}
}

11. Reference

(1) AlphaPose

(2) trt-samples-for-hackathon-cn

(3) pytorch-YOLOv4

(4) darknet
