Mesh Graphormer is a new transformer-based method for human pose and mesh reconstruction from an input image

Overview

MeshGraphormer

This is our research code of Mesh Graphormer.

Mesh Graphormer is a new transformer-based method for human pose and mesh reconstruction from an input image. In this work, we study how to combine graph convolutions and self-attention in a transformer to better model both local and global interactions.
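The block below is a conceptual sketch of this idea, not the implementation in this repository: a transformer layer whose tokens (joint and vertex queries) go through multi-head self-attention for global interactions and a graph convolution over the mesh adjacency for local ones. All class and variable names are illustrative.

    # Illustrative sketch only: a transformer block that adds a graph convolution
    # after multi-head self-attention, in the spirit of Mesh Graphormer.
    # `adj` is a row-normalized [N, N] adjacency matrix over mesh vertices/joints.
    import torch
    import torch.nn as nn

    class GraphormerBlockSketch(nn.Module):
        def __init__(self, dim, num_heads, adj):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.gcn_proj = nn.Linear(dim, dim)      # graph-convolution projection
            self.register_buffer("adj", adj)         # mesh connectivity
            self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

        def forward(self, x):                        # x: [B, N, dim] joint/vertex tokens
            h = self.norm1(x)
            x = x + self.attn(h, h, h)[0]                       # global interactions (self-attention)
            x = x + torch.matmul(self.adj, self.gcn_proj(x))    # local interactions (graph convolution)
            x = x + self.mlp(self.norm2(x))
            return x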

Installation

Check INSTALL.md for installation instructions.

Model Zoo and Download

Please download our pre-trained models and other relevant files that are important to run our code.

Check DOWNLOAD.md for details.

Experiments

We provide Python code for training and evaluation.

Check EXP.md for details.

License

Our research code is released under the MIT license. See LICENSE for details.

We use submodules from third parties, such as huggingface/transformers and hassony2/manopth. Please see NOTICE for details.

Our models depend on the SMPL and MANO models. Please note that any use of the SMPL and MANO models is subject to the Software Copyright License for non-commercial scientific research purposes. Please see the SMPL-Model License and MANO License for details.

Contributing

We welcome contributions and suggestions. Please check CONTRIBUTE and CODE_OF_CONDUCT for details.

Citations

If you find our work useful in your research, please consider citing:

@inproceedings{lin2021-mesh-graphormer,
author = {Lin, Kevin and Wang, Lijuan and Liu, Zicheng},
title = {Mesh Graphormer},
booktitle = {ICCV},
year = {2021},
}

Acknowledgments

Our implementation and experiments are built on top of open-source GitHub repositories. We thank all the authors who made their code public, which tremendously accelerates our project progress. If you find these works helpful, please consider citing them as well.

huggingface/transformers

HRNet/HRNet-Image-Classification

nkolot/GraphCMR

akanazawa/hmr

MandyMo/pytorch_HMR

hassony2/manopth

hongsukchoi/Pose2Mesh_RELEASE

mks0601/I2L-MeshNet_RELEASE

open-mmlab/mmdetection

Comments
  • not good result ??

    https://user-images.githubusercontent.com/7630101/137278281-d38daeb7-6d1d-4762-96cb-eb256ea19f74.mp4

    I modified your code and wrote my own test script to run inference on a video, but the results do not look as good as the previous videos show.

    the core code is :

            # forward-pass
            pred_camera, pred_3d_joints, pred_vertices_sub2, pred_vertices_sub, pred_vertices = Graphormer_model(images, smpl, mesh_sampler)
            # obtain 3d joints, which are regressed from the full mesh
            pred_3d_joints_from_smpl = smpl.get_h36m_joints(pred_vertices)
            pred_3d_joints_from_smpl = pred_3d_joints_from_smpl[:,cfg.H36M_J17_TO_J14,:]
    
            # obtain 2d joints, which are projected from 3d joints of smpl mesh
            pred_2d_joints_from_smpl = orthographic_projection(pred_3d_joints_from_smpl, pred_camera)
    

    The skeleton I draw is pred_2d_joints_from_smpl. Did I misunderstand something?
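    One thing worth checking (a hypothetical sketch, not code from this repo): orthographic_projection returns joints in normalized crop coordinates (roughly [-1, 1]), so they usually need to be mapped back to the pixel space of the 224×224 input crop before drawing. The helper below is illustrative.

        # Hypothetical post-processing sketch: map normalized 2D joints back to the
        # 224x224 input crop before drawing the skeleton.
        # `joints_2d_norm` is a [K, 2] numpy array (convert a torch tensor with .cpu().numpy()).
        import cv2
        import numpy as np

        def draw_joints(crop_bgr, joints_2d_norm, img_size=224):
            joints_px = ((joints_2d_norm + 1.0) * 0.5 * img_size).astype(np.int32)
            for x, y in joints_px:
                cv2.circle(crop_bgr, (int(x), int(y)), 3, (0, 255, 0), -1)
            return crop_bgr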

    opened by zhaishengfu 4
  • RuntimeError: The size of tensor a (750) must match the size of tensor b (25) at non-singleton dimension 0

    Hi, thank you for your great work. I faced some problems when running evaluation on 3DPW. I was trying to use 1 GPU instead of 4, but there are some errors that I don't understand. Could you help me see where I need to adjust?

        (gphmr) jyz@jyz-Alienware-Aurora-R8:~/GitHub/MeshGraphormer$ python src/tools/run_gphmer_bodymesh.py --val_yaml 3dpw/test_has_gender.yaml --arch hrnet-w64 --num_workers 4 --per_gpu_eval_batch_size 25 --num_hidden_layers 4 --num_attention_heads 4 --input_feat_dim 2051,512,128 --hidden_feat_dim 1024,256,64 --run_eval_only --resume_checkpoint ./models/graphormer_release/graphormer_3dpw_state_dict.bin
        set os.environ[OMP_NUM_THREADS] to 4
        2021-10-28 04:44:38,749 Graphormer INFO: Using 1 GPUs
        /home/jyz/anaconda3/envs/gphmr/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
          self._set_intXint(row, col, x.flat[0])
        /home/jyz/anaconda3/envs/gphmr/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
          self._set_intXint(row, col, x.flat[0])
        /home/jyz/anaconda3/envs/gphmr/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
          self._set_intXint(row, col, x.flat[0])
        2021-10-28 04:44:53,992 Graphormer INFO: Update config parameter num_hidden_layers: 12 -> 4
        2021-10-28 04:44:53,992 Graphormer INFO: Update config parameter hidden_size: 768 -> 1024
        2021-10-28 04:44:53,992 Graphormer INFO: Update config parameter num_attention_heads: 12 -> 4
        2021-10-28 04:44:53,992 Graphormer INFO: Update config parameter intermediate_size: 3072 -> 2048
        2021-10-28 04:44:54,909 Graphormer INFO: Init model from scratch.
        2021-10-28 04:44:54,909 Graphormer INFO: Update config parameter num_hidden_layers: 12 -> 4
        2021-10-28 04:44:54,909 Graphormer INFO: Update config parameter hidden_size: 768 -> 256
        2021-10-28 04:44:54,909 Graphormer INFO: Update config parameter num_attention_heads: 12 -> 4
        2021-10-28 04:44:54,909 Graphormer INFO: Update config parameter intermediate_size: 3072 -> 512
        2021-10-28 04:44:55,045 Graphormer INFO: Init model from scratch.
        2021-10-28 04:44:55,046 Graphormer INFO: Add Graph Conv
        2021-10-28 04:44:55,046 Graphormer INFO: Update config parameter num_hidden_layers: 12 -> 4
        2021-10-28 04:44:55,046 Graphormer INFO: Update config parameter hidden_size: 768 -> 64
        2021-10-28 04:44:55,046 Graphormer INFO: Update config parameter num_attention_heads: 12 -> 4
        2021-10-28 04:44:55,046 Graphormer INFO: Update config parameter intermediate_size: 3072 -> 128
        2021-10-28 04:44:55,112 Graphormer INFO: Init model from scratch.
        => loading pretrained model models/hrnet/hrnetv2_w64_imagenet_pretrained.pth
        2021-10-28 04:45:00,432 Graphormer INFO: => loading hrnet-v2-w64 model
        2021-10-28 04:45:00,433 Graphormer INFO: Graphormer encoders total parameters: 83318598
        2021-10-28 04:45:00,435 Graphormer INFO: Backbone total parameters: 128059944
        2021-10-28 04:45:00,488 Graphormer INFO: Loading state dict from checkpoint ./models/graphormer_release/graphormer_3dpw_state_dict.bin
        2021-10-28 04:45:07,582 Graphormer INFO: Training parameters Namespace(arch='hrnet-w64', config_name='', data_dir='datasets', device=device(type='cuda'), distributed=False, drop_out=0.1, hidden_feat_dim='1024,256,64', hidden_size=64, img_scale_factor=1, input_feat_dim='2051,512,128', interm_size_scale=2, intermediate_size=128, joints_loss_weight=1000.0, local_rank=0, logging_steps=1000, lr=0.0001, mesh_type='body', model_name_or_path='src/modeling/bert/bert-base-uncased/', num_attention_heads=4, num_gpus=1, num_hidden_layers=4, num_train_epochs=200, num_workers=4, output_dir='output/', per_gpu_eval_batch_size=25, per_gpu_train_batch_size=30, resume_checkpoint='./models/graphormer_release/graphormer_3dpw_state_dict.bin', run_eval_only=True, seed=88, train_yaml='imagenet2012/train.yaml', val_yaml='3dpw/test_has_gender.yaml', vertices_loss_weight=100.0, vloss_w_full=0.33, vloss_w_sub=0.33, vloss_w_sub2=0.33, which_gcn='0,0,1')
        3dpw/test_has_gender.yaml
        Traceback (most recent call last):
          File "src/tools/run_gphmer_bodymesh.py", line 747, in <module>
            main(args)
          File "src/tools/run_gphmer_bodymesh.py", line 734, in main
            run_eval_general(args, val_dataloader, _model, smpl, mesh_sampler)
          File "src/tools/run_gphmer_bodymesh.py", line 374, in run_eval_general
            mesh_sampler)
          File "src/tools/run_gphmer_bodymesh.py", line 412, in run_validate
            gt_vertices = smpl(gt_pose, gt_betas)
          File "/home/jyz/anaconda3/envs/gphmr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
            result = self.forward(*input, **kwargs)
          File "/home/jyz/GitHub/MeshGraphormer/MeshGraphormer/src/modeling/_smpl.py", line 94, in forward
            v_posed = v_shaped + torch.matmul(posedirs, lrotmin[:, :, None]).view(-1, 6890, 3)
        RuntimeError: The size of tensor a (750) must match the size of tensor b (25) at non-singleton dimension 0
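    For illustration only (this is not the repository's fix): the message means two tensors with incompatible leading batch dimensions are being combined inside the SMPL forward pass, as the toy snippet below reproduces. Checking that gt_pose and gt_betas reach smpl() with matching batch sizes is a reasonable first step.

        # Toy reproduction of the error message (illustrative, not the fix):
        import torch
        a = torch.zeros(750, 6890, 3)
        b = torch.zeros(25, 6890, 3)
        a + b  # RuntimeError: The size of tensor a (750) must match the size of tensor b (25) ...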

    opened by CocoaPebble 3
  • body mesh reconstruction test

    Thanks for sharing this wonderful project! I see that you explain how to test hands, but not how to test bodies. How can we test body mesh reconstruction with our own images or videos?

    opened by zhaishengfu 3
  • How to get the 'azureml' file?

    Greetings! When I try to evaluate the code, I get the error "No module named 'azureml'". I checked the repository root and couldn't find the module. I would like to know how to obtain it so the code runs correctly. Thanks for your great work!
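    For what it's worth, azureml is not a file in this repo but part of the Azure ML SDK (installable via the azureml-core pip package). A hedged workaround, if Azure ML logging is not needed, is to make the import optional wherever it occurs; the exact module is not confirmed here and the snippet below is only a sketch.

        # Hedged workaround sketch: make the Azure ML import optional in whichever
        # utility module performs the `azureml` import (exact location not confirmed here).
        try:
            from azureml.core.run import Run
            aml_run = Run.get_context()
        except ImportError:
            aml_run = None  # fall back to plain logging when azureml-core is absent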

    opened by PomIsBest 2
  • how to inference and visualize the output mesh

    Thanks for your project! How can we run inference on our own videos or pictures to obtain the output mesh? Could you please provide the code? Also, how can we visualize the output mesh?
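    Until official demo code is available, a minimal, hypothetical preview of the predicted mesh can be made with matplotlib. Here, pred_vertices and the SMPL face array are assumed to come from the model's forward pass; the helper below is illustrative, not the repo's renderer.

        # Illustrative sketch: quick 3D preview of the predicted mesh with matplotlib.
        # Assumes `pred_vertices` is a [6890, 3] numpy array and `faces` an [F, 3] integer array.
        import matplotlib.pyplot as plt

        def preview_mesh(pred_vertices, faces):
            fig = plt.figure()
            ax = fig.add_subplot(projection="3d")
            ax.plot_trisurf(pred_vertices[:, 0], pred_vertices[:, 1], pred_vertices[:, 2],
                            triangles=faces, linewidth=0.1, alpha=0.8)
            ax.set_box_aspect((1, 1, 1))  # keep proportions roughly correct
            plt.show()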

    opened by liushanyuan18 2
  • about the meshgraphormer architecture problem

    Hello, I want to know two things,

    1. I want to confirm whether the input to the Mesh Graphormer framework and the information flow in the attention mechanism are as I drew in the attached picture IMG_20210908_200139.

    2. The grid features are obtained by extracting features through the CNN, and their dimension is 49×1024. The paper does not mention any upsampling here, so how are they transformed from 49×1024 to 49×2051?

    opened by zhangzhenyu-zzy 2
  • Clarification of loss

    Dear authors, thanks for the great work! In your paper, you mentioned that L1 loss was applied to vertices, 3d joints and 2d joints in section 4.3: Training details.

    To be specific, we apply L1 loss to 3D mesh vertices and body joints. We also apply L1 loss to 2D projected body joints to improve the alignment between the image and the reconstructed mesh.

    However, I noticed that MSE was applied to 2D and 3D joints in your code: https://github.com/microsoft/MeshGraphormer/blob/1c489e35e6bd3848ce0702891e4c8365b584bb8e/src/tools/run_gphmer_bodymesh.py#L168-L171

    I was wondering which loss you actually used, and which loss was used for the provided pretrained model?

    Thank you!
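    For reference, a small sketch (illustrative, not the repo's training code) of the two loss choices being compared:

        # Sketch of the two loss variants being discussed (illustrative only):
        import torch
        import torch.nn as nn

        pred_joints = torch.randn(2, 14, 3)
        gt_joints = torch.randn(2, 14, 3)
        l1_loss = nn.L1Loss()(pred_joints, gt_joints)    # what the paper describes
        mse_loss = nn.MSELoss()(pred_joints, gt_joints)  # what the linked code applies to joints
        print(l1_loss.item(), mse_loss.item())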

    opened by pangyyyyy 1
  • model predictions download links is invalid

    Thank you for your excellent work! The model predictions download link is invalid now. Can you share it again? Model predictions: https://datarelease.blob.core.windows.net/metro/graphormer_release_ckpt200-multisc-pred.zip Looking forward to your reply.

    opened by Rainfalllen 1
  • Some download links is invalid now.

    Thank you for your excellent work! Some download links are invalid now. Can you share them again?

    1. Datasets https://datarelease.blob.core.windows.net/metro/datasets/filename.tar
    2. model predictions https://datarelease.blob.core.windows.net/metro/graphormer_release_ckpt200-multisc-pred.zip Looking forward to your reply.
    opened by julyiii 1
  • Training batch size

    Greetings,

    In the paper, you mention that the training batch size is 32; however, in the training code you provide for the mixed datasets, the --per_gpu_train_batch_size argument is set to 25. Which batch size should be used to reproduce the results?

    Thanks in advance.

    opened by Bozcomlekci 1
  • evaluate H36m on protocol2

    Hi! I have a question about evaluation. Human3.6M Protocol 2 is evaluated on subjects 9 and 11 with a 64-frame step, but I think your code uses a 5-frame step, e.g. _000001.jpg, _000006.jpg, _000011.jpg, ...

    Could you share the 64-frame-step evaluation code?
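    In case it helps while waiting, a hedged sketch of filtering an existing frame list down to a 64-frame step; the filename pattern follows the examples above, and the helper is illustrative, not from the repo.

        # Illustrative sketch: keep every 64th frame from an existing image list.
        import re

        def subsample_64(image_names):
            kept = []
            for name in image_names:
                m = re.search(r"_(\d{6})\.jpg$", name)
                if m and (int(m.group(1)) - 1) % 64 == 0:
                    kept.append(name)
            return kept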

    opened by asw91666 1
  • question about H36M model

    When I trained with the mixed datasets, the best result was produced at the 76th epoch: MPJPE 56.21 and PA-MPJPE 35.62, which differ from the released log (https://datarelease.blob.core.windows.net/metro/models/2021-02-25-graphormer_h36m_log.txt), MPJPE 51.2 and PA-MPJPE 34.5.

    The command is the same as in EXP.md

    python -m torch.distributed.launch --nproc_per_node=8 src/tools/run_gphmer_bodymesh.py --train_yaml Tax-H36m-coco40k-Muco-UP-Mpii/train.yaml --val_yaml human3.6m/valid.protocol2.yaml --arch hrnet-w64 --num_workers 4 --per_gpu_train_batch_size 25 --per_gpu_eval_batch_size 25 --num_hidden_layers 4 --num_attention_heads 4 --lr 1e-4 --num_train_epochs 200 --input_feat_dim 2051,512,128 --hidden_feat_dim 1024,256,64

    Looking forward to your reply @kevinlin311tw, thx

    opened by GithubAccountTo 0
  • About the Predicted Camera Parameters and Orthographic Projection

    Usually, the parameters of a camera model consist of f, c, R, and t, with 1, 2, 3 (rotation angles), and 3 parameters respectively. Even if we work in camera coordinates instead of world coordinates, the projection still requires f and c. In that case the projected 2D point p_2 of a 3D point p_3 should follow the formula $$p_2 = f \cdot p_3[:2] / p_3[2] + c$$ However, I found that the code in this repo for orthographic projection is:

    def orthographic_projection(X, camera):
        """Perform orthographic projection of 3D points X using the camera parameters
        Args:
            X: size = [B, N, 3]
            camera: size = [B, 3]
        Returns:
            Projected 2D points -- size = [B, N, 2]
        """ 
        camera = camera.view(-1, 1, 3)
        X_trans = X[:, :, :2] + camera[:, :, 1:]
        shape = X_trans.shape
        X_2d = (camera[:, :, 0] * X_trans.view(shape[0], -1)).view(shape)
        return X_2d
    

    The 3D points aren't divided by their z values, which results in an orthographic projection rather than a perspective projection.
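    For comparison, a hedged sketch of a perspective projection that does divide by depth; this is only to contrast with the orthographic (weak-perspective) form above, which is the form used in the repo's code, and f and c here are assumed pinhole intrinsics.

        # Illustrative perspective projection, shown only to contrast with the
        # orthographic version above; X is [B, N, 3] in camera coordinates.
        def perspective_projection(X, f, c):
            X_2d = f * X[:, :, :2] / X[:, :, 2:3] + c
            return X_2d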

    opened by ChenZhang-2000 0
  • how to reproduce the model as paper

    How many epochs do I have to train to reproduce the model in the paper? I think I have to use the pretrained model trained on the H36M dataset, right? Then how many epochs do we need for fine-tuning on 3DPW?

    opened by jhkim0759 6
  • How to get smpl pose rotation?

    It seems like the model directly outputs the mesh and obtains joint positions from the SMPL model. Can I get the joint rotations from the positions? Do you have functions for doing that?

    opened by linyu0219 1
  • Resnet-50 backbone performance

    Thanks for your awesome work. HRNet-W64 is too heavy for real-time inference; have you tried ResNet-50? I wonder about its performance.

    Also, I noticed there is another paper (hand mesh) with the repo https://github.com/SeanChenxy/HandMesh, which uses ResNet-18 as the backbone.

    Best!

    opened by StayYouth1993 1