Research code of ICCV 2021 paper "Mesh Graphormer"

Overview

MeshGraphormer

This is our research code of Mesh Graphormer.

Mesh Graphormer is a new transformer-based method for human pose and mesh reconstruction from an input image. In this work, we study how to combine graph convolutions and self-attention in a transformer to better model both local and global interactions.
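
The core idea, combining graph convolutions with self-attention in one encoder, can be illustrated with a short sketch. The block below is a hypothetical simplification (assumed module and parameter names, not the repository's exact implementation): self-attention models global interactions among all joint/vertex tokens, and a graph convolution over the mesh adjacency refines local neighborhoods.

    # Hypothetical sketch of a graph-conv + self-attention block
    # (assumed names, not the repository's exact code).
    import torch
    import torch.nn as nn

    class GraphormerBlock(nn.Module):
        def __init__(self, dim, num_heads, adj):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.register_buffer("adj", adj)       # normalized mesh adjacency, [N, N]
            self.gcn_weight = nn.Linear(dim, dim)  # graph-convolution weight

        def forward(self, x):                      # x: [B, N, dim] joint/vertex tokens
            # Global interactions: multi-head self-attention over all tokens.
            h = self.norm1(x)
            h, _ = self.attn(h, h, h)
            x = x + h
            # Local interactions: graph convolution over the mesh adjacency.
            h = self.norm2(x)
            h = torch.matmul(self.adj, self.gcn_weight(h))  # aggregate neighbor features
            return x + h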

Installation

Check INSTALL.md for installation instructions.

Model Zoo and Download

Please download our pre-trained models and other relevant files that are required to run our code.

Check DOWNLOAD.md for details.

Quick demo

We provide demo code to run end-to-end inference on the test images.

Check DEMO.md for details.

Experiments

We provide Python code for training and evaluation.

Check EXP.md for details.

License

Our research code is released under the MIT license. See LICENSE for details.

We use submodules from third parties, such as huggingface/transformers and hassony2/manopth. Please see NOTICE for details.

Our models depend on the SMPL and MANO models. Please note that any use of the SMPL and MANO models is subject to their Software Copyright Licenses for non-commercial scientific research purposes. Please see the SMPL-Model License and MANO License for details.

Contributing

We welcome contributions and suggestions. Please check CONTRIBUTE and CODE_OF_CONDUCT for details.

Citations

If you find our work useful in your research, please consider citing:

@inproceedings{lin2021-mesh-graphormer,
author = {Lin, Kevin and Wang, Lijuan and Liu, Zicheng},
title = {Mesh Graphormer},
booktitle = {ICCV},
year = {2021},
}

Acknowledgments

Our implementation and experiments are built on top of open-source GitHub repositories. We thank all the authors who made their code public, which tremendously accelerates our project progress. If you find these works helpful, please consider citing them as well.

huggingface/transformers

HRNet/HRNet-Image-Classification

nkolot/GraphCMR

akanazawa/hmr

MandyMo/pytorch_HMR

hassony2/manopth

hongsukchoi/Pose2Mesh_RELEASE

mks0601/I2L-MeshNet_RELEASE

open-mmlab/mmdetection

Comments
  • not good result ??

    https://user-images.githubusercontent.com/7630101/137278281-d38daeb7-6d1d-4762-96cb-eb256ea19f74.mp4

    I modified your code and wrote my own test script to run inference on video, but the results do not look as good as the previous videos show.

    The core code is:

            # forward-pass
            pred_camera, pred_3d_joints, pred_vertices_sub2, pred_vertices_sub, pred_vertices = Graphormer_model(images, smpl, mesh_sampler)
            # obtain 3d joints, which are regressed from the full mesh
            pred_3d_joints_from_smpl = smpl.get_h36m_joints(pred_vertices)
            pred_3d_joints_from_smpl = pred_3d_joints_from_smpl[:,cfg.H36M_J17_TO_J14,:]
    
            # obtain 2d joints, which are projected from 3d joints of smpl mesh
            pred_2d_joints_from_smpl = orthographic_projection(pred_3d_joints_from_smpl, pred_camera)
    

    The skeleton is drawn from pred_2d_joints_from_smpl. Do I misunderstand something?
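
    One thing worth double-checking when overlaying pred_2d_joints_from_smpl on a video frame: the projected joints are in the normalized coordinates of the cropped input (roughly [-1, 1]), which is the convention used for the 2D joint supervision, so they need to be mapped back to pixels of the crop and then into the original frame. A minimal sketch, assuming a square crop of size crop_size placed at (x0, y0) in the original image (hypothetical helper, not the repo's code):

        import numpy as np

        def normalized_joints_to_image(pred_2d_joints, x0, y0, crop_size):
            """Map joints from normalized crop coords (~[-1, 1]) to original-image pixels.

            pred_2d_joints: [N, 2] projected joints for one frame.
            (x0, y0): top-left corner of the square crop in the original frame.
            crop_size: side length of the square crop in pixels.
            """
            joints = np.asarray(pred_2d_joints)
            joints_crop = (joints + 1.0) * 0.5 * crop_size  # [-1, 1] -> [0, crop_size]
            joints_img = joints_crop + np.array([x0, y0])   # shift into the full frame
            return joints_img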

    opened by zhaishengfu 4
  • RuntimeError: The size of tensor a (750) must match the size of tensor b (25) at non-singleton dimension 0

    Hi, thank you for your great work. I faced a problem when running evaluation on 3DPW. I was trying to use 1 GPU instead of 4, but there are some errors that I don't understand. Would you help me see what I need to adjust?

        (gphmr) jyz@jyz-Alienware-Aurora-R8:~/GitHub/MeshGraphormer$ python src/tools/run_gphmer_bodymesh.py --val_yaml 3dpw/test_has_gender.yaml --arch hrnet-w64 --num_workers 4 --per_gpu_eval_batch_size 25 --num_hidden_layers 4 --num_attention_heads 4 --input_feat_dim 2051,512,128 --hidden_feat_dim 1024,256,64 --run_eval_only --resume_checkpoint ./models/graphormer_release/graphormer_3dpw_state_dict.bin
        set os.environ[OMP_NUM_THREADS] to 4
        2021-10-28 04:44:38,749 Graphormer INFO: Using 1 GPUs
        /home/jyz/anaconda3/envs/gphmr/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient. self._set_intXint(row, col, x.flat[0])
        /home/jyz/anaconda3/envs/gphmr/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient. self._set_intXint(row, col, x.flat[0])
        /home/jyz/anaconda3/envs/gphmr/lib/python3.7/site-packages/scipy/sparse/_index.py:84: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient. self._set_intXint(row, col, x.flat[0])
        2021-10-28 04:44:53,992 Graphormer INFO: Update config parameter num_hidden_layers: 12 -> 4
        2021-10-28 04:44:53,992 Graphormer INFO: Update config parameter hidden_size: 768 -> 1024
        2021-10-28 04:44:53,992 Graphormer INFO: Update config parameter num_attention_heads: 12 -> 4
        2021-10-28 04:44:53,992 Graphormer INFO: Update config parameter intermediate_size: 3072 -> 2048
        2021-10-28 04:44:54,909 Graphormer INFO: Init model from scratch.
        2021-10-28 04:44:54,909 Graphormer INFO: Update config parameter num_hidden_layers: 12 -> 4
        2021-10-28 04:44:54,909 Graphormer INFO: Update config parameter hidden_size: 768 -> 256
        2021-10-28 04:44:54,909 Graphormer INFO: Update config parameter num_attention_heads: 12 -> 4
        2021-10-28 04:44:54,909 Graphormer INFO: Update config parameter intermediate_size: 3072 -> 512
        2021-10-28 04:44:55,045 Graphormer INFO: Init model from scratch.
        2021-10-28 04:44:55,046 Graphormer INFO: Add Graph Conv
        2021-10-28 04:44:55,046 Graphormer INFO: Update config parameter num_hidden_layers: 12 -> 4
        2021-10-28 04:44:55,046 Graphormer INFO: Update config parameter hidden_size: 768 -> 64
        2021-10-28 04:44:55,046 Graphormer INFO: Update config parameter num_attention_heads: 12 -> 4
        2021-10-28 04:44:55,046 Graphormer INFO: Update config parameter intermediate_size: 3072 -> 128
        2021-10-28 04:44:55,112 Graphormer INFO: Init model from scratch.
        => loading pretrained model models/hrnet/hrnetv2_w64_imagenet_pretrained.pth
        2021-10-28 04:45:00,432 Graphormer INFO: => loading hrnet-v2-w64 model
        2021-10-28 04:45:00,433 Graphormer INFO: Graphormer encoders total parameters: 83318598
        2021-10-28 04:45:00,435 Graphormer INFO: Backbone total parameters: 128059944
        2021-10-28 04:45:00,488 Graphormer INFO: Loading state dict from checkpoint ./models/graphormer_release/graphormer_3dpw_state_dict.bin
        2021-10-28 04:45:07,582 Graphormer INFO: Training parameters Namespace(arch='hrnet-w64', config_name='', data_dir='datasets', device=device(type='cuda'), distributed=False, drop_out=0.1, hidden_feat_dim='1024,256,64', hidden_size=64, img_scale_factor=1, input_feat_dim='2051,512,128', interm_size_scale=2, intermediate_size=128, joints_loss_weight=1000.0, local_rank=0, logging_steps=1000, lr=0.0001, mesh_type='body', model_name_or_path='src/modeling/bert/bert-base-uncased/', num_attention_heads=4, num_gpus=1, num_hidden_layers=4, num_train_epochs=200, num_workers=4, output_dir='output/', per_gpu_eval_batch_size=25, per_gpu_train_batch_size=30, resume_checkpoint='./models/graphormer_release/graphormer_3dpw_state_dict.bin', run_eval_only=True, seed=88, train_yaml='imagenet2012/train.yaml', val_yaml='3dpw/test_has_gender.yaml', vertices_loss_weight=100.0, vloss_w_full=0.33, vloss_w_sub=0.33, vloss_w_sub2=0.33, which_gcn='0,0,1')
        3dpw/test_has_gender.yaml
        Traceback (most recent call last):
          File "src/tools/run_gphmer_bodymesh.py", line 747, in <module>
            main(args)
          File "src/tools/run_gphmer_bodymesh.py", line 734, in main
            run_eval_general(args, val_dataloader, _model, smpl, mesh_sampler)
          File "src/tools/run_gphmer_bodymesh.py", line 374, in run_eval_general
            mesh_sampler)
          File "src/tools/run_gphmer_bodymesh.py", line 412, in run_validate
            gt_vertices = smpl(gt_pose, gt_betas)
          File "/home/jyz/anaconda3/envs/gphmr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
            result = self.forward(*input, **kwargs)
          File "/home/jyz/GitHub/MeshGraphormer/MeshGraphormer/src/modeling/_smpl.py", line 94, in forward
            v_posed = v_shaped + torch.matmul(posedirs, lrotmin[:, :, None]).view(-1, 6890, 3)
        RuntimeError: The size of tensor a (750) must match the size of tensor b (25) at non-singleton dimension 0

    opened by CocoaPebble 3
  • body mesh reconstruction test

    Thanks for sharing this wonderful project! I see that you explain how to test hands, but not how to test bodies. How can we test body mesh reconstruction with our own images or videos?

    opened by zhaishengfu 3
  • How to get the 'azureml' file?

    Greetings! When I try to evaluate the code, I get the error "No module named 'azureml'". I checked the repository and couldn't find this module. I would like to know how to get it so the code runs correctly. Thanks for your great work!
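
    Assuming azureml is only needed for optional Azure ML run logging (an assumption, not confirmed by the repo), a common workaround is either `pip install azureml-core` or guarding the import so local runs fall back gracefully:

        # Hypothetical guard (assumes azureml is only used for optional Azure ML logging).
        try:
            from azureml.core.run import Run
            aml_run = Run.get_context()
        except ImportError:
            aml_run = None  # fall back to local logging when azureml isn't installed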

    opened by PomIsBest 2
  • how to run inference and visualize the output mesh

    Thanks for your project! How can we run inference and get the output mesh from our own video or picture? Could you please provide the code? Also, how can we visualize the output mesh?

    opened by liushanyuan18 2
  • about the meshgraphormer architecture problem

    Hello, I would like to ask two things:

    1. I want to confirm whether the input to the Mesh Graphormer framework and the information flow in the attention mechanism are as I drew in the attached picture (IMG_20210908_200139).

    2. I understand that the grid features are extracted by the CNN and have dimension 49×1024. The paper does not mention any upsampling, so how are they transformed from 49×1024 to 49×2051? (See the attached image; a sketch of one plausible mapping follows below.)
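
    For context, a hedged guess at the dimension change (assumed layer names and token counts, not necessarily the repository's exact code): the joint/vertex queries are 2051-D because a 2048-D global image feature is concatenated with the 3-D template coordinates (matching the input_feat_dim of 2051 used in the training commands), and the 49×1024 grid features only need a learned linear projection to the same width before being appended as extra tokens:

        import torch
        import torch.nn as nn

        # Hypothetical sketch (assumed shapes): building 2051-D tokens.
        global_feat = torch.randn(1, 2048)          # pooled CNN feature
        template_xyz = torch.randn(1, 14 + 431, 3)  # template joint + coarse-vertex coords (hypothetical counts)
        grid_feat = torch.randn(1, 49, 1024)        # 7x7 CNN grid features

        # Joint/vertex queries: 2048-D image feature repeated and concatenated with 3-D coords.
        queries = torch.cat([global_feat[:, None, :].expand(-1, template_xyz.shape[1], -1),
                             template_xyz], dim=2)               # [1, 445, 2051]

        # Grid features projected to the same width so they can be appended as extra tokens.
        grid_proj = nn.Linear(1024, 2051)
        tokens = torch.cat([queries, grid_proj(grid_feat)], dim=1)  # [1, 445 + 49, 2051]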

    opened by zhangzhenyu-zzy 2
  • Clarification of loss

    Dear authors, thanks for the great work! In your paper, you mentioned that an L1 loss was applied to the vertices, 3D joints, and 2D joints in Section 4.3 (Training details):

    To be specific, we apply L1 loss to 3D mesh vertices and body joints. We also apply L1 loss to 2D projected body joints to improve the alignment between the image and the reconstructed mesh.

    However, I noticed that MSE was applied to 2D and 3D joints in your code: https://github.com/microsoft/MeshGraphormer/blob/1c489e35e6bd3848ce0702891e4c8365b584bb8e/src/tools/run_gphmer_bodymesh.py#L168-L171

    I was wondering which loss you used, and which loss was used for the provided pretrained model?

    Thank you!
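
    For reference, the two criteria being compared differ only in the distance function; a minimal sketch with hypothetical tensor names (not the repository's exact loss code):

        import torch
        import torch.nn.functional as F

        pred_joints = torch.randn(2, 14, 3)  # predicted 3D joints (hypothetical shapes)
        gt_joints = torch.randn(2, 14, 3)    # ground-truth 3D joints

        l1 = F.l1_loss(pred_joints, gt_joints)    # L1, as described in the paper
        mse = F.mse_loss(pred_joints, gt_joints)  # MSE, as in the linked code snippet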

    opened by pangyyyyy 1
  • model predictions download link is invalid

    Thank you for your excellent work! The model predictions download link is invalid now. Can you share it again? Model predictions: https://datarelease.blob.core.windows.net/metro/graphormer_release_ckpt200-multisc-pred.zip Looking forward to your reply.

    opened by Rainfalllen 1
  • Some download links are invalid now.

    Thank you for your excellent work! Some download links are invalid now. Can you share them again?

    1. Datasets: https://datarelease.blob.core.windows.net/metro/datasets/filename.tar
    2. Model predictions: https://datarelease.blob.core.windows.net/metro/graphormer_release_ckpt200-multisc-pred.zip

    Looking forward to your reply.
    opened by julyiii 1
  • Training batch size

    Greetings,

    In the paper, you mention that the training batch size is 32; however, in the training code you provide for the mixed datasets, the --per_gpu_train_batch_size argument is set to 25. Which batch size should be used to reproduce the results?

    Thanks in advance.

    opened by Bozcomlekci 1
  • evaluate H36m on protocol2

    Hi! I have a question about evaluation. Human3.6M Protocol 2 is evaluated on subjects 9 and 11 with a 64-frame step, but I think your code uses a 5-frame step, e.g. _000001.jpg, _000006.jpg, _000011.jpg, ...

    Could you share the 64-frame-step evaluation code?

    opened by asw91666 1
  • question about H36M model

    When I trained with the mixed datasets, the best result was produced at the 76th epoch. The best MPJPE is 56.21 and PA-MPJPE is 35.62, which differ from the released log (https://datarelease.blob.core.windows.net/metro/models/2021-02-25-graphormer_h36m_log.txt), where MPJPE is 51.2 and PA-MPJPE is 34.5.

    The command is the same as in EXP.md:

    python -m torch.distributed.launch --nproc_per_node=8 src/tools/run_gphmer_bodymesh.py --train_yaml Tax-H36m-coco40k-Muco-UP-Mpii/train.yaml --val_yaml human3.6m/valid.protocol2.yaml --arch hrnet-w64 --num_workers 4 --per_gpu_train_batch_size 25 --per_gpu_eval_batch_size 25 --num_hidden_layers 4 --num_attention_heads 4 --lr 1e-4 --num_train_epochs 200 --input_feat_dim 2051,512,128 --hidden_feat_dim 1024,256,64

    Looking forward to your reply @kevinlin311tw, thanks.

    opened by GithubAccountTo 0
  • About the Predicted Camera Parameters and Orthographic Projection

    Usually, the parameters of a camera model consist of f, c, R, and t, with 1, 2, 3 (rotation angles), and 3 parameters respectively. Even if we work in camera coordinates only, instead of world coordinates, the projection formula still requires f and c. In that case, the projected 2D points p_2 of 3D points p_3 should follow the formula $$p_2 = f\cdot(p_3[:2]/p_3[2] - c)$$ And I found that the code in this repo for orthographic projection is:

    def orthographic_projection(X, camera):
        """Perform orthographic projection of 3D points X using the camera parameters
        Args:
            X: size = [B, N, 3]
            camera: size = [B, 3]
        Returns:
            Projected 2D points -- size = [B, N, 2]
        """ 
        camera = camera.view(-1, 1, 3)
        # camera[:, :, 0] is the scale s; camera[:, :, 1:] is the 2D translation (tx, ty)
        X_trans = X[:, :, :2] + camera[:, :, 1:]
        shape = X_trans.shape
        # scale the translated x/y coordinates; the depth (z) is never used
        X_2d = (camera[:, :, 0] * X_trans.view(shape[0], -1)).view(shape)
        return X_2d
    

    The 3D points are never divided by their z values, which results in an orthographic projection rather than a normal perspective projection.
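
    For comparison, a minimal sketch of the two projection models under discussion (hypothetical helpers: the first uses the same (s, tx, ty) weak-perspective convention as the snippet above, the second a pinhole model with focal length f and principal point c):

        import torch

        def weak_perspective(X, camera):
            # camera = [s, tx, ty]; no division by depth, as in the repo's snippet above.
            s, t = camera[:, :1], camera[:, 1:]
            return s[:, None, :] * (X[:, :, :2] + t[:, None, :])

        def perspective(X, f, c):
            # Pinhole projection: divide by depth, then scale by f and shift by c.
            return f * (X[:, :, :2] / X[:, :, 2:3]) + c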

    opened by ChenZhang-2000 0
  • how to reproduce the model as paper

    How many epochs do I have to train to reproduce the model from the paper? I think I have to use the pretrained model trained on the H36M dataset, right? Then how many epochs do we need for fine-tuning on 3DPW?

    opened by jhkim0759 6
  • How to get smpl pose rotation?

    It seems like the model directly outputs the mesh and obtains joint positions from the SMPL model. Can I get the joint rotations from the positions? Do you have any functions for doing that?

    opened by linyu0219 1
  • Resnet-50 backbone performance

    Thanks for your awesome work! hrnet-w64 is too heavy for real-time inference. Have you tried resnet-50? I wonder about its performance.

    Also, I noticed there is another paper (hand mesh) with a git repo, https://github.com/SeanChenxy/HandMesh, which uses resnet-18 as the backbone.

    Best!

    opened by StayYouth1993 1