MAED: Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation
Getting Started
Our code is implemented and tested with Python 3.6 and PyTorch 1.5.
Install PyTorch following the official guide on the PyTorch website.
Then install the requirements in a virtualenv or conda environment:
pip install -r requirements.txt
Data Preparation
Refer to data.md for instructions.
Training
Stage 1 training
In general, you can use PyTorch's distributed launch script to start training.
For example, to train on 2 nodes with 4 GPUs each (2x4=8 GPUs total), run the following on node 0:
python -u -m torch.distributed.launch \
--nnodes=2 \
--node_rank=0 \
--nproc_per_node=4 \
--master_port=<MASTER_PORT> \
--master_addr=<MASTER_NODE_ID> \
--use_env \
train.py --cfg configs/config_stage1.yaml
On node 1, run:
python -u -m torch.distributed.launch \
--nnodes=2 \
--node_rank=1 \
--nproc_per_node=4 \
--master_port=<MASTER_PORT> \
--master_addr=<MASTER_NODE_ID> \
--use_env \
train.py --cfg configs/config_stage1.yaml
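With `--use_env`, `torch.distributed.launch` passes each process its rank through environment variables (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`) rather than a `--local_rank` argument. The sketch below shows how a training script could read them. It is not taken from `train.py`; the helper names `read_launch_env` and `expected_rank` are hypothetical and used only for illustration.

```python
import os


def read_launch_env(env=None):
    """Parse the per-process variables exported by torch.distributed.launch.

    Hypothetical helper: how train.py actually consumes these values is an
    assumption, not something shown in this README.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env["RANK"]),              # global rank across all nodes
        "local_rank": int(env["LOCAL_RANK"]),  # GPU index on this node
        "world_size": int(env["WORLD_SIZE"]),  # nnodes * nproc_per_node
        "master_addr": env["MASTER_ADDR"],
        "master_port": int(env["MASTER_PORT"]),
    }


def expected_rank(node_rank, nproc_per_node, local_rank):
    """Global rank the launcher assigns to a process."""
    return node_rank * nproc_per_node + local_rank


# Example: the third process (local rank 2) on node 1 of a 2x4 job.
# The address and port are placeholder values.
example_env = {
    "RANK": "6",
    "LOCAL_RANK": "2",
    "WORLD_SIZE": "8",
    "MASTER_ADDR": "10.0.0.1",
    "MASTER_PORT": "29500",
}
info = read_launch_env(example_env)
```

In a real script, `local_rank` would typically select the CUDA device, and `rank` and `world_size` would initialize the process group.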
Alternatively, if you use a task scheduling system such as Slurm to submit your training tasks, you can refer to the following script to start training:
# training on 2 nodes, 4 gpus each (2x4=8 gpus total)
sh scripts/run.sh 2 4 configs/config_stage1.yaml
Training checkpoints are saved in results/ by default. You can change this location in the config file.
Stage 2 training
Use the last checkpoint of stage 1 to initialize the model, then start stage 2 training.
# On Node 0.
python -u -m torch.distributed.launch \
--nnodes=2 \
--node_rank=0 \
--nproc_per_node=4 \
--master_port=<MASTER_PORT> \
--master_addr=<MASTER_NODE_ID> \
--use_env \
train.py --cfg configs/config_stage2.yaml --pretrained <PATH_TO_CHECKPOINT_FILE>
On node 1, run the same command with --node_rank=1.
Evaluation
To evaluate the model on the 3DPW test set:
python eval.py --cfg <PATH_TO_EXPERIMENT>/config.yaml --checkpoint <PATH_TO_EXPERIMENT>/model_best.pth.tar --eval_set 3dpw
The evaluation metric is Procrustes-Aligned Mean Per Joint Position Error (PA-MPJPE), reported in mm.
Models | PA-MPJPE ↓ | MPJPE ↓ | PVE ↓ | ACCEL ↓ |
---|---|---|---|---|
HMR (w/o 3DPW) | 81.3 | 130.0 | - | 37.4 |
SPIN (w/o 3DPW) | 59.2 | 96.9 | 116.4 | 29.8 |
MEVA (w/ 3DPW) | 54.7 | 86.9 | - | 11.6 |
VIBE (w/o 3DPW) | 56.5 | 93.5 | 113.4 | 27.1 |
VIBE (w/ 3DPW) | 51.9 | 82.9 | 99.1 | 23.4 |
ours (w/o 3DPW) | 50.7 | 88.8 | 104.5 | 18.0 |
ours (w/ 3DPW) | 45.7 | 79.1 | 92.6 | 17.6 |
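For reference, the PA-MPJPE column can be read as follows: the predicted joints are first aligned to the ground truth with the best similarity transform (scale, rotation, translation), and the mean per-joint Euclidean distance is then computed. The NumPy sketch below follows this standard definition. It is an illustration under that assumption, not the code used in `eval.py`; the function names are hypothetical, and `mpjpe` omits any root-joint alignment that an evaluation pipeline may apply.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint Euclidean distance between (J, 3) arrays."""
    return np.linalg.norm(pred - gt, axis=1).mean()


def procrustes_align(pred, gt):
    """Align pred to gt with the least-squares similarity transform."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g

    # Cross-covariance between centered prediction and ground truth.
    H = p.T @ g
    U, S, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Optimal isotropic scale for the chosen rotation.
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment, in the units of the inputs."""
    return mpjpe(procrustes_align(pred, gt), gt)


# Synthetic check: a scaled, rotated, and shifted copy of the ground truth
# has a large MPJPE but a PA-MPJPE close to zero.
rng = np.random.default_rng(0)
gt = rng.normal(size=(14, 3))
theta = np.pi / 6
Rz = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
pred = 0.5 * gt @ Rz.T + np.array([0.1, -0.2, 0.3])
raw_error = mpjpe(pred, gt)
aligned_error = pa_mpjpe(pred, gt)
```

Because the alignment removes global scale, rotation, and translation, PA-MPJPE measures only the error in the articulated pose, which is why it is lower than MPJPE for every method in the table.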
Citation
@inproceedings{wan2021,
title={Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation},
author={Ziniu Wan and Zhengjia Li and Maoqing Tian and Jianbo Liu and Shuai Yi and Hongsheng Li},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
year = {2021}
}