Official implementation of MSR-GCN (ICCV 2021 paper)

LevonDang

Last update: Nov 7, 2022

Related tags

Deep Learning pytorch multiscale gcn residuals iccv2021 humanmotionprediction

Overview

MSR-GCN

Official implementation of MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction (ICCV 2021 paper)

[Paper] [Supp] [Poster] [Slides]

Authors

Lingwei Dang, School of Computer Science and Engineering, South China University of Technology, China, danglevon@gmail.com
Yongwei Nie, School of Computer Science and Engineering, South China University of Technology, China, nieyongwei@scut.edu.cn
Chengjiang Long, JD Finance America Corporation, USA, cjfykx@gmail.com
Qing Zhang, School of Computer Science and Engineering, Sun Yat-sen University, China, zhangqing.whu.cs@gmail.com
Guiqing Li, School of Computer Science and Engineering, South China University of Technology, China, ligq@scut.edu.cn

Overview

Human motion prediction is a challenging task due to the stochasticity and aperiodicity of future poses. Recently, graph convolutional networks (GCNs) have proven very effective at learning dynamic relations among pose joints, which is helpful for pose prediction. On the other hand, one can abstract a human pose recursively to obtain a set of poses at multiple scales. As the abstraction level increases, the motion of the pose becomes more stable, which also benefits pose prediction. In this paper, we propose a novel multi-scale residual Graph Convolution Network (MSR-GCN) for the human pose prediction task in an end-to-end manner. The GCNs are used to extract features from fine to coarse scale and then from coarse to fine scale. The extracted features at each scale are then combined and decoded to obtain the residuals between the input and target poses. Intermediate supervision is imposed on all the predicted poses, which forces the network to learn more representative features. Our proposed approach is evaluated on two standard benchmark datasets, i.e., the Human3.6M dataset and the CMU Mocap dataset. Experimental results demonstrate that our method outperforms the state-of-the-art approaches.
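
As a rough illustration of the residual decoding and intermediate supervision described above, the following NumPy sketch adds a predicted offset to an input-side sequence at each scale and sums a per-scale error. The joint counts come from the Human3.6M description in this README; the random arrays, the 0.9 factor standing in for a decoder, the mean Euclidean distance, and the equal weighting of scales are all assumptions for illustration, not the released model or its loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint counts per scale, finest first (from the Human3.6M section of this README).
joints_per_scale = [22, 12, 7, 4]
frames = 35  # 10 observed + 25 predicted frames


def scale_loss(pred, target):
    """Mean Euclidean distance per joint; pred/target: (frames, joints, 3)."""
    return float(np.linalg.norm(pred - target, axis=-1).mean())


total = 0.0
for j in joints_per_scale:
    source = rng.normal(size=(frames, j, 3))  # stand-in for the input-side sequence
    target = rng.normal(size=(frames, j, 3))  # stand-in for ground truth at this scale
    residual = 0.9 * (target - source)        # stand-in for a decoder output (assumption)
    pred = source + residual                  # residual connection: input plus predicted offset
    total += scale_loss(pred, target)         # every scale contributes to the loss
```

Supervising every scale, rather than only the finest one, is what "intermediate supervision" refers to in this sketch.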

Dependencies

PyTorch 1.7.0+cu110
Python 3.8.5
NVIDIA RTX 3090

Get the data

Human3.6M in exponential-map format can be downloaded from here.

CMU Mocap was obtained from the repository of the ConvSeq2Seq paper.

About datasets

Human3.6M

A pose in H3.6M has 32 joints, from which we choose 22 and build the multi-scale representation by dividing them as 22 -> 12 -> 7 -> 4.
We use S5 / S11 as the test / validation set and the remaining subjects as the training set. Testing is done on the 15 actions separately, and for each action we use all data instead of the 8 randomly selected samples.
Some of the original 32 joints share the same position.
The input / output length is 10 / 25 frames.
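
The 22 -> 12 -> 7 -> 4 abstraction can be sketched as repeated pooling over groups of joints. In the sketch below, the contiguous grouping produced by `np.array_split` and the use of averaging are assumptions made for illustration; the groupings actually used by the repository are not given in this README.

```python
import numpy as np


def downsample(pose, n_out):
    """Average groups of joints. pose: (joints, 3) -> (n_out, 3)."""
    # Hypothetical grouping: contiguous index ranges of near-equal size.
    groups = np.array_split(np.arange(pose.shape[0]), n_out)
    return np.stack([pose[g].mean(axis=0) for g in groups])


pose22 = np.random.default_rng(0).normal(size=(22, 3))  # one pose, 22 joints in 3D
scales = [pose22]
for n in (12, 7, 4):
    scales.append(downsample(scales[-1], n))  # each scale is built from the previous one

shapes = [s.shape for s in scales]
```

Each coarser pose is derived from the one before it, which matches the recursive abstraction described in the overview.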

CMU Mocap dataset

A pose in CMU Mocap has 38 joints, from which we choose 25 and build the multi-scale representation by dividing them as 25 -> 12 -> 7 -> 4.
CMU Mocap does not have a validation set. Testing is done on the 8 actions separately, and for each action we use all data instead of the 8 randomly selected samples.
Some of the original 38 joints share the same position.
The input / output length is 10 / 25 frames.
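
To make the 10 / 25 split concrete, here is a minimal sketch that cuts a motion sequence into windows of 35 frames and separates each window into 10 input frames and 25 target frames. The function name `make_windows`, the stride of 1, and the synthetic sequence are assumptions; the repository's own sampling may differ.

```python
import numpy as np


def make_windows(seq, input_n=10, output_n=25, stride=1):
    """seq: (frames, features) -> inputs (N, input_n, F), targets (N, output_n, F)."""
    total = input_n + output_n
    starts = range(0, seq.shape[0] - total + 1, stride)
    windows = np.stack([seq[s:s + total] for s in starts])
    return windows[:, :input_n], windows[:, input_n:]


# Synthetic sequence: 100 frames, 25 joints x 3 coordinates per frame.
seq = np.arange(100 * 75, dtype=np.float32).reshape(100, 75)
x, y = make_windows(seq)
```

With 100 frames and a window of 35, this produces 66 windows, so `x` has shape (66, 10, 75) and `y` has shape (66, 25, 75).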

Train

Train on Human3.6M:

python main.py --exp_name=h36m --is_train=1 --output_n=25 --dct_n=35 --test_manner=all

Train on CMU Mocap:

python main.py --exp_name=cmu --is_train=1 --output_n=25 --dct_n=35 --test_manner=all
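
The value `--dct_n=35` equals the input length plus the output length (10 + 25). Assuming that `--dct_n` is the number of discrete cosine transform (DCT) coefficients kept along the time axis, which this README does not state explicitly, the sketch below builds an orthonormal DCT-II basis for 35 frames and checks that a trajectory survives a round trip. The helper name `dct_matrix` and the random trajectory are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis; rows are frequencies, columns are time steps."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    m[0] /= np.sqrt(2.0)  # scale the constant row so the basis is orthonormal
    return m


dct_n = 35
D = dct_matrix(dct_n)

# Hypothetical trajectory: 35 frames of 22 joints x 3 coordinates.
traj = np.random.default_rng(0).normal(size=(dct_n, 66))
coeffs = D @ traj    # time domain -> DCT coefficients
recon = D.T @ coeffs # inverse transform (transpose of an orthonormal matrix)
```

Because the basis is orthonormal, keeping all 35 coefficients makes the transform lossless; keeping fewer would give a smoothed approximation of the trajectory.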

Evaluate and visualize results

Evaluate on Human3.6M:

python main.py --exp_name=h36m --is_load=1 --model_path=ckpt/pretrained/h36m_in10out25dctn35_best_err57.9256.pth --output_n=25 --dct_n=35 --test_manner=all

Evaluate on CMU Mocap:

python main.py --exp_name=cmu --is_load=1 --model_path=ckpt/pretrained/cmu_in10out25dctn35_best_err37.2310.pth --output_n=25 --dct_n=35 --test_manner=all
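
The results are reported at horizons of 80 to 1000 ms. A minimal sketch of how such per-horizon numbers could be computed follows. It assumes that the metric is the mean per-joint position error, which this README does not name, and that the 25 output frames are 40 ms apart, which is inferred from 25 frames covering 1000 ms. The function name and the synthetic data are hypothetical.

```python
import numpy as np

FRAME_MS = 40  # assumption: 25 output frames span 1000 ms
horizons_ms = [80, 160, 320, 400, 560, 1000]
frame_idx = [ms // FRAME_MS - 1 for ms in horizons_ms]  # zero-based frame indices


def mpjpe_per_frame(pred, target):
    """pred/target: (batch, frames, joints, 3) -> mean joint error per frame."""
    return np.linalg.norm(pred - target, axis=-1).mean(axis=(0, 2))


rng = np.random.default_rng(0)
target = rng.normal(size=(4, 25, 22, 3))
pred = target + 1.0  # offset of 1 on every axis, so each joint error is sqrt(3)

err = mpjpe_per_frame(pred, target)
report = {ms: float(err[i]) for ms, i in zip(horizons_ms, frame_idx)}
```

Under the 40 ms assumption, the six horizons correspond to output frames 2, 4, 8, 10, 14, and 25 (one-based).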

Results

H3.6M-10/25/35-all	80	160	320	400	560	1000	-
walking	12.16	22.65	38.65	45.24	52.72	63.05	-
eating	8.39	17.05	33.03	40.44	52.54	77.11	-
smoking	8.02	16.27	31.32	38.15	49.45	71.64	-
discussion	11.98	26.76	57.08	69.74	88.59	117.59	-
directions	8.61	19.65	43.28	53.82	71.18	100.59	-
greeting	16.48	36.95	77.32	93.38	116.24	147.23	-
phoning	10.10	20.74	41.51	51.26	68.28	104.36	-
posing	12.79	29.38	66.95	85.01	116.26	174.33	-
purchases	14.75	32.39	66.13	79.63	101.63	139.15	-
sitting	10.53	21.99	46.26	57.80	78.19	120.02	-
sittingdown	16.10	31.63	62.45	76.84	102.83	155.45	-
takingphoto	9.89	21.01	44.56	56.30	77.94	121.87	-
waiting	10.68	23.06	48.25	59.23	76.33	106.25	-
walkingdog	20.65	42.88	80.35	93.31	111.87	148.21	-
walkingtogether	10.56	20.92	37.40	43.85	52.93	65.91	-
Average	12.11	25.56	51.64	62.93	81.13	114.18	57.93

CMU-10/25/35-all	80	160	320	400	560	1000	-
basketball	10.24	18.64	36.94	45.96	61.12	86.24	-
basketball_signal	3.04	5.62	12.49	16.60	25.43	49.99	-
directing_traffic	6.13	12.60	29.37	39.22	60.46	114.56	-
jumping	15.19	28.85	55.97	69.11	92.38	126.16	-
running	13.17	20.91	29.88	33.37	38.26	43.62	-
soccer	10.92	19.40	37.41	47.00	65.25	101.85	-
walking	6.38	10.25	16.88	20.05	25.48	36.78	-
washwindow	5.41	10.93	24.51	31.79	45.13	70.16	-
Average	8.81	15.90	30.43	37.89	51.69	78.67	37.23
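
The last column of each table is filled only in the Average row. It appears to be the mean of the six per-horizon averages, which also matches the error values embedded in the pretrained checkpoint file names (57.9256 and 37.2310). The short check below recomputes it from the rounded table entries; the variable names are illustrative.

```python
# Per-horizon averages copied from the Average rows of the two tables.
h36m_avg = [12.11, 25.56, 51.64, 62.93, 81.13, 114.18]
cmu_avg = [8.81, 15.90, 30.43, 37.89, 51.69, 78.67]


def mean(values):
    return sum(values) / len(values)


h36m_overall = mean(h36m_avg)  # compare with 57.93 in the table
cmu_overall = mean(cmu_avg)    # compare with 37.23 in the table
```

Both values agree with the table to within the rounding of the individual entries.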

Citation

If you use our code, please cite our work

@InProceedings{Dang_2021_ICCV,
    author    = {Dang, Lingwei and Nie, Yongwei and Long, Chengjiang and Zhang, Qing and Li, Guiqing},
    title     = {MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11467-11476}
}

Acknowledgments

Some of our evaluation code and data processing code were adapted/ported from LearnTrajDep by Wei Mao.

Licence

MIT

Comments

H3.6M dataset may be unavailable

The preprocessed H3.6M dataset used in this paper, prepared by earlier researchers, is no longer available or recommended by researchers in this field. It is strongly recommended to conduct experiments on the official dataset at http://vision.imar.ro/human3.6m/description.php; please refer to https://github.com/facebookresearch/VideoPose3D/blob/main/DATASETS.md for the relevant pre-processing code.

opened by Droliven 2
Question about training

Hi @Droliven ,

Thanks for your work.

According to the code, does this work report results from training for 5000 epochs? How long does training take on the Human3.6M 3D dataset?

opened by LaLaLailalai 1
MemoryError: Unable to allocate 4.55 GiB for an array with shape (35, 17432064)

@Droliven Dear author, I would like to know how much device memory you used. Mine is 16 GB, and once I run python main.py --exp_name=h36m --is_train=1 --output_n=25 --dct_n=35 --test_manner=all, the computer freezes up.

opened by heduo-star 1
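
The size in the error message above can be checked by hand. The sketch below assumes 8-byte (float64) elements, which is inferred from the reported size rather than confirmed from the code, and shows what the same array would occupy in float32. Whether the repository can be switched to float32 without other changes has not been verified here.

```python
shape = (35, 17432064)  # shape reported in the MemoryError
n_elements = shape[0] * shape[1]

bytes_f64 = n_elements * 8  # float64: 8 bytes per element (assumed dtype)
bytes_f32 = n_elements * 4  # float32: 4 bytes per element

gib_f64 = bytes_f64 / 2**30
gib_f32 = bytes_f32 / 2**30
```

The float64 figure comes to roughly 4.55 GiB, matching the message, so this single array alone takes a large share of a 16 GB machine; in float32 it would need about half of that.
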
Performance of different actions.

Dear author, the test does not output the evaluation performance for the different actions or for the prediction horizons in milliseconds. I want to know how to print these performance results (Table 1/2). Many thanks!

opened by Lucky-Maximize 1
Results on AMASS

Hi, thank you for sharing the implementation of your interesting work. I want to compare your method with others on the AMASS dataset. Have you already tried this experiment? If yes, I would kindly ask you to share the results; otherwise, I would like to know if you have any suggestions for performing a fair comparison, possibly showing what I should change in the config file (Index2212/127/74, dim_repeat...).

Thank you.

opened by GDam90 1
Wrong parameter name in README.md

In the Evaluate and visualize results section of README.md, the command is given as python main.py --expname=h36m --is_load=1 --model_path=ckpt/pretrained/h36m_in10out25dctn35_best_epoch82_err57.9256.pth --output_n=25 --dct_n=35 --test_manner=all

The parameter name there is --expname, but in main.py the parameter name is actually --exp_name.

opened by eshanvaid 1
