A PyTorch implementation of MoveNet from Google, including training code and a pre-trained model.

Overview

Movenet.Pytorch

Intro

MoveNet is an ultra-fast and accurate model that detects 17 keypoints of a body. This is a PyTorch implementation of MoveNet from Google, including training code and a pre-trained model.

Google has only released pre-trained models (tfjs or tflite), which cannot be converted to CPU inference frameworks such as NCNN, Tengine, MNN, or TNN, and cannot be fine-tuned on custom data, hence this repo.

How To Run

1. Download the COCO 2017 dataset from https://cocodataset.org/. (You need train2017.zip, val2017.zip and annotations.) Unzip to movenet.pytorch/data/ like this:

├── data
    ├── annotations (person_keypoints_train2017.json, person_keypoints_val2017.json, ...)
    ├── train2017   (xx.jpg, xx.jpg,...)
    └── val2017     (xx.jpg, xx.jpg,...)

2. Convert the data to our data format.

python scripts/make_coco_data_17keypooints.py
Our data format: JSON file
Keypoints order:['nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear', 
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow', 'left_wrist', 
    'right_wrist', 'left_hip', 'right_hip', 'left_knee', 'right_knee', 'left_ankle', 
    'right_ankle']

One item:
[{"img_name": "0.jpg",
  "keypoints": [x0,y0,z0,x1,y1,z1,...],
  #z: 0 for no label, 1 for labeled but invisible, 2 for labeled and visible
  "center": [x,y],
  "bbox":[x0,y0,x1,y1],
  "other_centers": [[x0,y0],[x1,y1],...],
  "other_keypoints": [[[x0,y0],[x1,y1],...],[[x0,y0],[x1,y1],...],...], #lenth = num_keypoints
 },
 ...
]

3. You can add your own data in the same format (a minimal sketch follows).
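For reference, here is a minimal sketch of writing a custom annotation file in this format. The file name and all values are placeholders, and whether coordinates are stored as absolute pixels or normalized should match whatever make_coco_data_17keypooints.py produces:

import json

# Hypothetical example of one annotation item; values are placeholders.
items = [
    {
        "img_name": "0.jpg",
        # 17 keypoints as flat (x, y, z) triples;
        # z: 0 = no label, 1 = labeled but invisible, 2 = labeled and visible
        "keypoints": [250, 100, 2, 260, 95, 2] + [0, 0, 0] * 15,
        "center": [255, 200],
        "bbox": [180, 60, 330, 400],
        "other_centers": [],                          # centers of other people
        "other_keypoints": [[] for _ in range(17)],   # one list per keypoint type
    },
]

with open("data/my_train.json", "w") as f:            # placeholder path
    json.dump(items, f)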

4. After putting the data in the right place, you can start training:

python train.py

5. After training finishes, change the test model path before testing, e.g. in predict.py:

run_task.modelLoad("output/xxx.pth")

6. Run predict.py to visualize predictions, or run evaluate.py to compute accuracy on the test dataset:

python predict.py

7. Convert to ONNX:

python pth2onnx.py
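If you need to tweak the export (for example the opset version, as one of the comments below does), the conversion boils down to a standard torch.onnx.export call. A rough sketch only: the input size (1, 3, 192, 192), the tensor names, and the stand-in model are assumptions, so check pth2onnx.py for the values the repo actually uses.

import torch

# Stand-in model so the snippet runs on its own; in practice this would be
# the MoveNet model loaded via run_task.modelLoad(...).
model = torch.nn.Conv2d(3, 17, 3, padding=1)

dummy_input = torch.randn(1, 3, 192, 192)   # assumed input size
torch.onnx.export(
    model,
    dummy_input,
    "movenet.onnx",
    input_names=["input"],
    output_names=["output"],
    do_constant_folding=True,
    opset_version=11,
)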

Training Results

Some good samples

good

Some bad cases

bad

Tips to improve

1. Focus on data

  • Add COCO 2014. (But as far as I know it shares some data with COCO 2017, and I don't know whether Google used it.)
  • Clean the cropped COCO 2017 data. (Some images contain only a few keypoints, e.g. a big face or a big body; MoveNet is a small network, and COCO data is a little hard for it.)
  • Add frames from yoga, fitness, and dance videos on YouTube. (Highly recommended! Google did this for MoveNet and said 'Evaluations on the Active validation dataset show a significant performance boost relative to identical architectures trained using only COCO.')

2. Change backbone

Switching the backbone from MobileNetV2 (used in the original MoveNet) to MobileNetV3 or ShuffleNetV2 may bring a little improvement. If you just want to reproduce the original MoveNet, you can ignore this.

3. More fancy loss

This is clearly a multi-task learning problem, so adding extra losses to learn jointly may improve performance (such as the BoneLoss I have added). We can never know exactly how Google trained the model, since that cannot be seen from the pre-trained tflite file, so you can try any loss function you like.
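To illustrate the kind of auxiliary loss meant here, a bone-length consistency term can be written in a few lines. This is only a sketch of the idea, assuming (batch, 17, 2) keypoint tensors; it is not the exact BoneLoss implemented in this repo.

import torch

# Keypoint index pairs that form limbs in the 17-keypoint order above
# (shoulder-elbow, elbow-wrist, hip-knee, knee-ankle); chosen for illustration.
BONES = [(5, 7), (7, 9), (6, 8), (8, 10), (11, 13), (13, 15), (12, 14), (14, 16)]

def bone_loss(pred, gt):
    """pred, gt: (batch, 17, 2) keypoint coordinates."""
    loss = 0.0
    for a, b in BONES:
        pred_len = torch.norm(pred[:, a] - pred[:, b], dim=-1)
        gt_len = torch.norm(gt[:, a] - gt[:, b], dim=-1)
        loss = loss + torch.abs(pred_len - gt_len).mean()
    return loss / len(BONES)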

4. Data Again

I just want you to understand the importance of the data. The more time you spend cleaning data and adding new data, the better your model will perform! (Tips 2 and 3 may not help as much.)

Resource

  1. Blog: Next-Generation Pose Detection with MoveNet and TensorFlow.js
  2. Model card
  3. TFHub: movenet/singlepose/lightning
  4. My article (in Chinese): 2021轻量级人体姿态估计模型修炼之路(附谷歌MoveNet复现经验)
Comments
  • Output rendering inconsistency with API


    Hi,

    While using the TensorFlow API:

        input_details = interpreter.get_input_details()
        output_details = interpreter.get_output_details()
        interpreter.set_tensor(input_details[0]['index'], np.array(input_image))
        interpreter.invoke()
        print(interpreter.get_tensor(output_details[0]['index']).shape)

    The output is (1, 1, 17, 3), i.e. 17 keypoints with their respective y-x coordinates and confidence scores.

    I used the pre-trained model from this repository's output folder, e91_valacc0.79763.pth, converted it to ONNX, then to TF, and finally to tflite. The only change I made in pth2onnx.py is opset_version=10:

        torch.onnx.export(run_task.model, dummy_input1, "output/pose.onnx", verbose=True, input_names=input_names, output_names=output_names, do_constant_folding=True, opset_version=10)

    Now I did the same for the tflite model (the first 5 lines of code above). The output shape is (1, 34, 48, 48).

    How can I obtain the output in the same format as the one returned by the API?

    opened by leftbackn3 8
  • How is the reg operation reflected in training?

    https://github.com/fire717/movenet.pytorch/blob/95ec8535245228aa4335243e68722810e50bcaf8/lib/task/task_tools.py#L124-L144

    Hello, in the prediction stage the reg output goes through the following processing (which is also how TF does it): on the 48x48 grid, coordinates 0..47 are generated along the x and y axes, then x' = (range_x - reg_x)^2 and y' = (range_y - reg_y)^2, tmp_reg = (x' + y')^0.5 + 1.8, and keypoint_heatmap / tmp_reg is used to pick the max point's reg_x, reg_y.

    Which part of training does this operation correspond to? Also, what is the difference between the reg_x, reg_y obtained this way and the coordinates read directly from the reg heatmap? Thanks.

    edit: Would it be convenient to discuss some other details over WeChat?
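    As an illustration of the decode step described in this question, here is a rough numpy sketch that down-weights the keypoint heatmap by its distance to the regressed location and then takes the arg-max; the grid size and the constant 1.8 follow the formula quoted above, everything else is illustrative:

    import numpy as np

    def decode_keypoint(kpt_heatmap, reg_x, reg_y, size=48):
        # kpt_heatmap: (size, size) heatmap for one keypoint
        # reg_x, reg_y: regressed keypoint location (in heatmap cells), read at the person center
        xs = np.arange(size, dtype=np.float32)
        ys = np.arange(size, dtype=np.float32)
        grid_x, grid_y = np.meshgrid(xs, ys)   # grid_x[i, j] = j, grid_y[i, j] = i

        # distance of every cell to the regressed location, plus the 1.8 constant
        tmp_reg = np.sqrt((grid_x - reg_x) ** 2 + (grid_y - reg_y) ** 2) + 1.8

        # suppress heatmap responses far from the regressed location, then arg-max
        weighted = kpt_heatmap / tmp_reg
        y, x = np.unravel_index(np.argmax(weighted), (size, size))
        return x, y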

    opened by cassie101 4
  • Found two problems, please take a look


    Hello, to save time I will write in Chinese. First of all, thank you for your work; it has saved us a lot of time. I only discovered today that you open-sourced the code; before that I had also tried to reproduce it based on your article, but I have not yet reached a good result. I was quite excited to find the release today, immediately reproduced it with your code, and read the core implementation. However, I found two problems:

    1. The annotation generation for other_keypoints has a problem, which makes all the other_keypoints Gaussians pile up in the 17th channel when the heatmaps are generated. It is caused by the following line (it should be kid2 rather than kid): https://github.com/fire717/movenet.pytorch/blob/5806c723f1a41e57f8816c286be8888bcac5099f/scripts/make_coco_data_17keypooints.py#L204
    2. When generating the keypoint regression target, you assign values to the ring of cells around the center, which matches your article. But when computing the loss, it seems that only the value at the center cell is used. Also, when computing the loss you use the center and keypoint positions actually predicted by the network for the regression loss and offset loss, which differs from CenterNet, where the ground-truth positions are used. Are these choices intentional?
    opened by zwfcrazy 4
  • RuntimeError: stack expects each tensor to be equal size, but got [161, 48, 48] at entry 0 and [86, 48, 48] at entry 1


    I generated my own new dataset (32 keypoints) and converted it following movenet.pytorch/scripts/make_coco_data_17keypoints.py. Then this exception was raised: RuntimeError: stack expects each tensor to be equal size, but got [161, 48, 48] at entry 0 and [86, 48, 48] at entry 1. I haven't been able to solve it yet; any ideas?

    opened by bobby20180331 3
  • About the mirror method for keypoints

    Hello author, thank you very much for open-sourcing the code. Background: I want to change the total number of detected keypoints, and while making that change I did not fully understand the mirror part.

    Question: while modifying data_augment I found the Mirror method hard to follow. The code is in lib/data/data_augment.py:

    def Mirror(src, label=None):
        """
        item = {
            "img_name": save_name,
            "keypoints": save_keypoints,    # relative positions
            "center": save_center,
            "other_centers": other_centers,
            "other_keypoints": other_keypoints,
        }
        # After mirroring, the left/right keypoint order is swapped!
        """
        img = cv2.flip(src, 1)
        if label is None:
            return img, label

        keypoints = label['keypoints']
        center = label['center']
        other_centers = label['other_centers']
        other_keypoints = label['other_keypoints']

        for i in range(len(keypoints)):
            if i % 3 == 0:
                keypoints[i] = 1 - keypoints[i]
        keypoints = [
            keypoints[0], keypoints[1], keypoints[2],
            keypoints[6], keypoints[7], keypoints[8],
            keypoints[3], keypoints[4], keypoints[5],
            keypoints[12], keypoints[13], keypoints[14],
            keypoints[9], keypoints[10], keypoints[11],
            keypoints[18], keypoints[19], keypoints[20],
            keypoints[15], keypoints[16], keypoints[17],
            keypoints[24], keypoints[25], keypoints[26],
            keypoints[21], keypoints[22], keypoints[23],
            keypoints[30], keypoints[31], keypoints[32],
            keypoints[27], keypoints[28], keypoints[29],
            keypoints[36], keypoints[37], keypoints[38],
            keypoints[33], keypoints[34], keypoints[35],
            keypoints[42], keypoints[43], keypoints[44],
            keypoints[39], keypoints[40], keypoints[41],
            keypoints[48], keypoints[49], keypoints[50],
            keypoints[45], keypoints[46], keypoints[47]]

    As you can see, the keypoint indices in the code are hard-coded, and I could not find the pattern. What if I have 4 keypoints, or 60 keypoints, all arranged left/right? How should this be modified, and why was it written this way in the first place?
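    A possible way to generalize this is to drive the swap with a table of left/right index pairs instead of hard-coding every index. The pairs below match the 17-keypoint order listed earlier; this is only a sketch, not code from this repo:

    # Left/right keypoint index pairs for the 17-keypoint COCO order;
    # for a custom skeleton, list your own left/right pairs here.
    FLIP_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (15, 16)]

    def mirror_keypoints(keypoints, flip_pairs=FLIP_PAIRS):
        """keypoints: flat [x0, y0, z0, x1, y1, z1, ...] with x normalized to [0, 1]."""
        kps = list(keypoints)
        # mirror the x coordinates
        for i in range(0, len(kps), 3):
            kps[i] = 1 - kps[i]
        # swap each left/right pair (3 values per keypoint)
        for a, b in flip_pairs:
            kps[a*3:a*3+3], kps[b*3:b*3+3] = kps[b*3:b*3+3], kps[a*3:a*3+3]
        return kps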

    opened by Galaxy-Ding 2
  • Support for Apple Silicon / M1


    Support for GPU-accelerated training on Apple Silicon is apparently only available as of PyTorch version 1.12

    I also have other problems with running it, and before I spend hours on trying to fix it for my setup, did anybody already do it?

    Sorry for already filing an issue, if there was a Q/A section I would have used that before.

    thanks in advance

    opened by ningelsohn 2
  • Question about the ACC computation

    https://github.com/fire717/movenet.pytorch/blob/f248899d974ec207dff48985e4b102190b12cde8/lib/task/task_tools.py#L261 Hello, sorry to bother you, and thanks for sharing. About the points you describe here as "set to -1 and excluded from the subsequent acc computation": do they actually still take part in the acc computation later, or do you remove these -1 points somewhere else? When I print them, these -1 points still seem to be included in the computation: https://github.com/fire717/movenet.pytorch/blob/f248899d974ec207dff48985e4b102190b12cde8/lib/utils/metrics.py#L19

    I hope you can explain this, thank you!

    opened by Dou-Yuxiao 2
  • Question regarding the pre-trained models


    Hey Fire,

    Thanks for the repository. It looks well made and to the point.

    About the pre-trained models, are they the same ones that Google has uploaded, converted into a PyTorch-compatible format, or are they different ones? If they are different, what are they pre-trained on?

    opened by gaurvigoyal 2
  • offset groundtruth


    https://github.com/fire717/movenet.pytorch/blob/bbc81408bd4da49789d912fd08635355fe123e60/lib/data/data_tools.py#L152-L153

    Hello, should the following line be small_y = int(regs[i*2+1,cy,cx]+cy)?

    opened by cassie101 1
  • Questions about modifying the model for multi-class output, dataset format conversion, and deployment

    @fire717 Hello! I am a sophomore on the RoboMaster team of North University of China, responsible for detecting armor plates with neural networks for auto-aiming. The bounding boxes from the YOLO-series models I trained before did not fit the armor-plate outline well in real tests, which caused large errors in the subsequent PnP pose solving. So I changed the traditional YOLO dataset format to the normalized coordinates of the four corner points; my current annotation format looks like this: 1 0.673029 0.373564 0.678429 0.426232 0.830433 0.401262 0.824525 0.351212, where the first number is the class id and the following eight numbers are the normalized coordinates of the armor plate's four corners. I have already trained a model with yolov5-face that directly locates the four corner points (see the attached image ca84c03809b033d4-1.jpg). So I would like to ask how to convert my current annotation format into the COCO format you use. Also, since we need to recognize both the number and the color, I want to decouple them by adding a 1x1 conv in the head that outputs the color separately; I made such a modification on top of YOLOX before, but I don't know where to start with your model. Finally, I previously deployed models through the OpenVINO C++ interface, so could you share some ideas for implementing the post-processing in C++? I'm asking a lot, but I hope you can offer some advice :-)

    opened by Hezhexi2002 4
  • Finetune official models


    Hi, can this repo be used to fine-tune the official tflite models of Lightning and Thunder? Please guide me through the process if this is possible. Thanks

    opened by gsrujana 1
  • About decoding the ground truth

    A question about the following ground-truth line:

    https://github.com/fire717/movenet.pytorch/blob/95ec8535245228aa4335243e68722810e50bcaf8/lib/task/task.py#L187

    Why does it need to go through decode? My understanding is that the positions can be read directly from the JSON file, but here, just like in predict, it seems to be passed through the model's decode. Is there a particular reason for this? Could it introduce error when computing the loss? Thanks.

    opened by Viewpointet 1
  • Question regarding InvertedResidual block implementation


    Thanks to fire717 for sharing the code implementation and the explanatory article; I benefited a lot from reading them.

    I have a small question about a detail of the mobilenet_v2 implementation; could you clarify it for me? This is this repo's implementation of the InvertedResidual module:

    def forward(self, x):
        x = self.conv1(x)
        for _ in range(self.n):
             x = x + self.conv2(x)
        return x
    

    The following is the corresponding part of the torchvision mobilenet_v2 implementation, for reference:

    # building inverted residual blocks
    for t, c, n, s in inverted_residual_setting:
        output_channel = _make_divisible(c * width_mult, round_nearest)
        for i in range(n):
            stride = s if i == 0 else 1
            features.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer))
            input_channel = output_channel
    

    My current understanding of the difference is: in this implementation, conv2 is an InvertedResidual block with a fixed structure and shared weights, so after the input passes through conv1 it goes through (n-1) passes of modules with identical weights; in the torchvision implementation, because block calls the InvertedResidual constructor each time, it creates InvertedResidual blocks with identical structure but independent weights. Is my understanding correct? And does the original MoveNet implementation adopt a similar design?
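    A minimal toy example of the distinction being asked about, using a plain Conv2d as a stand-in for the InvertedResidual block (not code from either repo):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 8, 16, 16)

    # style of this repo's forward(): one block instance reused n times,
    # so every pass shares the same weights
    shared_block = nn.Conv2d(8, 8, 3, padding=1)
    y = x
    for _ in range(3):
        y = y + shared_block(y)

    # torchvision style: n separately constructed blocks, identical structure
    # but independent weights
    separate_blocks = nn.ModuleList([nn.Conv2d(8, 8, 3, padding=1) for _ in range(3)])
    z = x
    for blk in separate_blocks:
        z = z + blk(z)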

    opened by AndyAtCMU 2
  • Single color background predict error

    Has anyone encountered a case where, on a single-color background (such as red), predicted keypoints that should lie on the person get pulled onto the background? How can this be solved? Thanks. (Example image: 699pic_2ao1f7_xy)

    opened by Viewpointet 4
Owner: Mr.Fire