Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

Mathis Petrovich

Last update: Dec 23, 2022

Related tags

Deep Learning ACTOR

Overview

ACTOR

Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021.

Please visit our webpage for more details.

Bibtex

If you find this code useful in your research, please cite:

@INPROCEEDINGS{petrovich21actor,
  title     = {Action-Conditioned 3{D} Human Motion Synthesis with Transformer {VAE}},
  author    = {Petrovich, Mathis and Black, Michael J. and Varol, G{\"u}l},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year      = {2021}
}

Installation 👷

1. Create conda environment

conda env create -f environment.yml
conda activate actor

Or install the following packages in your pytorch environnement:

pip install tensorboard
pip install matplotlib
pip install ipdb
pip install sklearn
pip install pandas
pip install tqdm
pip install imageio
pip install pyyaml
pip install smplx
pip install chumpy

The code was tested on Python 3.8 and PyTorch 1.7.1.

2. Download the datasets

For all the datasets, be sure to read and follow their license agreements, and cite them accordingly.

For more information about the datasets we use in this research, please check this page, where we provide information on how we obtain/process the datasets and their citations. Please cite the original references for each of the datasets as indicated.

Please install gdown to download directly from Google Drive and then:

bash prepare/download_datasets.sh

Update: Unfortunately, the NTU13 dataset (derived from NTU) is no longer available.

3. Download some SMPL files

bash prepare/download_smpl_files.sh

This will download the SMPL neutral model from this github repo and additionnal files.

If you want to integrate the male and the female versions, you must:

Download the models from the SMPL website
Move them to models/smpl
Change the SMPL_MODEL_PATH variable in src/config.py accordingly.

4. Download the action recogition models

bash prepare/download_recognition_models.sh

Action recognition models are used to extract motion features for evaluation.

For NTU13 and HumanAct12, we use the action recognition models directly from Action2Motion project.

For the UESTC dataset, we train an action recognition model using STGCN, with this command line:

python -m src.train.train_stgcn --dataset uestc --extraction_method vibe --pose_rep rot6d --num_epochs 100 --snapshot 50 --batch_size 64 --lr 0.0001 --num_frames 60 --view all --sampling conseq --sampling_step 1 --glob --no-translation --folder recognition_training

How to use ACTOR 🚀

NTU13

Training

python -m src.train.train_cvae --modelname cvae_transformer_rc_rcxyz_kl --pose_rep rot6d --lambda_kl 1e-5 --jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8 --lr 0.0001 --glob --translation --no-vertstrans --dataset DATASET --num_epochs 2000 --snapshot 100 --folder exp/ntu13

HumanAct12

Training

python -m src.train.train_cvae --modelname cvae_transformer_rc_rcxyz_kl --pose_rep rot6d --lambda_kl 1e-5 --jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8 --lr 0.0001 --glob --translation --no-vertstrans --dataset humanact12 --num_epochs 5000 --snapshot 100 --folder exps/humanact12

UESTC

Training

python -m src.train.train_cvae --modelname cvae_transformer_rc_rcxyz_kl --pose_rep rot6d --lambda_kl 1e-5 --jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8 --lr 0.0001 --glob --translation --no-vertstrans --dataset uestc --num_epochs 1000 --snapshot 100 --folder exps/uestc

Evaluation

python -m src.evaluate.evaluate_cvae PATH/TO/checkpoint_XXXX.pth.tar --batch_size 64 --niter 20

This script will evaluate the trained model, on the epoch XXXX, with 20 different seeds, and put all the results in PATH/TO/evaluation_metrics_XXXX_all.yaml.

If you want to get a table with mean and interval, you can use this script:

python -m src.evaluate.tables.easy_table PATH/TO/evaluation_metrics_XXXX_all.yaml

Pretrained models

You can download pretrained models with this script:

bash prepare/download_pretrained_models.sh

Visualization

Grid of stick figures

 python -m src.visualize.visualize_checkpoint PATH/TO/CHECKPOINT.tar --num_actions_to_sample 5  --num_samples_per_action 5

Each line corresponds to an action. The first column on the right represents a movement of the dataset, and the second column represents the reconstruction of the movement (via encoding/decoding). All other columns on the left are generations with random noise.

Example

Generating and rendering SMPL meshes

Additional dependencies

pip install trimesh
pip install pyrender
pip install imageio-ffmpeg

Generate motions

python -m src.generate.generate_sequences PATH/TO/CHECKPOINT.tar --num_samples_per_action 10 --cpu

It will generate 10 samples per action, and store them in PATH/TO/generation.npy.

Render motions

python -m src.render.rendermotion PATH/TO/generation.npy

It will render the sequences into this folder PATH/TO/generation/.

Examples

Pickup	Raising arms	High knee running	Bending torso	Knee raising

Overview of the available models

List of models

modeltype	architecture	losses
cvae	fc	rc
	gru	rcxyz
	transformer	kl

Construct a model

Follow this: {modeltype}_{architecture} + "_".join(*losses)

For example for the cvae model with Transformer encoder/decoder and with rc, rcxyz and kl loss, you can use: --modelname cvae_transformer_rc_rcxyz_kl.

License

This code is distributed under an MIT LICENSE.

Note that our code depends on other libraries, including SMPL, SMPL-X, PyTorch3D, and uses datasets which each have their own respective licenses that must also be followed.

Comments

Training result of UETSC dataset differs from the pertained model

Hi, first of all, Thank you for releasing the nicely arranged code. Released code really helped understanding of the model, and I appreciate kindly written instructions for the code.

However, I encountered undesirable result on UETSC dataset, which comes from the model I trained from the scratch. I followed instruction on the dataset, also did not modify training script. python -m src.train.train_cvae --modelname cvae_transformer_rc_rcxyz_kl --pose_rep rot6d --lambda_kl 1e-5 --jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8 --lr 0.0001 --glob --translation --no-vertstrans --dataset uestc --num_epochs 1000 --snapshot 100 --folder exps/uestc

I double-checked whether saved opt.yml file from the experiment is identical to the pretrained model, but I found that no configurations were different. Also, I conducted the same experiment twice but results were identical.

This is the train log of the released pretrained model,

and these are the train logs from the trainings I conducted.

As my training result on HumanAct dataset is same with the pre-trained model, I guess no training environmental issue involves here.

Is there any possible reason for this failure of the training?

opened by jinseokbae 4
Train/val/test split for HumanAct12

Hi, Thanks for releasing your code. I am wondering what is the train/val/test split that you are using for HumanAct12? It seems that you are always using the entire dataset for training and evaluating if I am not mistaken.. Tanks for your feedback and your help.

opened by fabienbaradel 4
GLError during render motions

Hello,Mathux! Thanks for your implementation. When I tried to render motions with generation.npy， the error occurs like the following. I have tried the introduction in this link https://github.com/mmatl/pyrender/issues/86, but it does not work. Thank you in advance!

opened by wang-zm18 4
How can i input the output data of decoder into input data of decoder?

I want to expand output length like following method, i.e. run decoder twice. Because i think if i only expand length from 60 to 120 lengths, meaningless output will be generated after 60th frame.

If i generate 120 length(60 length *2) input -> decoder -> output(60 length) -> input -> decoder -> output(60 length) => total 120 length.

opened by SinDongHwan 3
storing predicted rotations in bvh format

Hi,

Is there a way to store model generated rotations as bvh file ? I want to load the generated motion via following package in order to proceed with my research. Thanks.

opened by Neptune-Trojans 3
AttributeError on command visualize grid

Hello,

I am an University student at UHA (Université de Haute Alsace) and as my university project, I have to try to make your code ACTOR to work on Google Colab. But when I got to the part when i tried to visualize the grid I got this error:

I got the same error with the trained models created with the command train or the pretrained models.

Could you please tell me if you have a solution to this problem?

Cordialy, Scherrer Xavier

opened by AllKings 2
Question for embedding dimension

Thanks for the wonderful work @Mathux! I really enjoyed reading paper and also testing it on my environment, while looking at your code and paper, I found out that you've set dimensionality of embedding for VAE to 256 and I'm wondering if there's any reason for this.

Thanks, Joseph

opened by JosephKKim 2
shake in the later sequence

Hi, @Mathux , Thanks for your nice work! I encountered some issue when i generated some motion sequences. Like the shake problem occurred about 60 frames later, do you have any idea about it? thanks!

https://user-images.githubusercontent.com/9010202/144010246-35d2c693-5379-4452-9481-5c926c2ee8d0.mp4

https://user-images.githubusercontent.com/9010202/144010491-b71a6dd7-30ac-4383-905d-9c943c357a03.mp4

opened by bigpo 2
Adding clothes or skin

Hello Dear Author. Thank you very much for your good work. I wanted to ask if there is a way to add clothes, skin, background to these 3d actions.

Thank you very much,

Sardor

opened by sard0r 1
How can I concatenate 2d and 3d tensors?

xseq = torch.cat((self.muQuery[y][None], self.sigmaQuery[y][None], x), axis=0)

This code tries to concatenate 2d and 3d tensors. Can you explain how it works?

Thanks in advance.

opened by Eddie-Hwang 1
missing module "optutils"

Hi, thank you for the code, I have already trained some models and would like to visualize the results. However, the scripts visualize_latent_space.py and visualize_sequence.py are trying to import the module optutils, which seems to be missing from the repository. Could you please add this module?

opened by gabinsane 1

Owner

Mathis Petrovich

PhD student mainly interested in Human Body Shape Analysis, Computer Vision and Optimal Transport.

GitHub

Official PyTorch Implementation of paper "Deep 3D Mask Volume for View Synthesis of Dynamic Scenes", ICCV 2021.

Deep 3D Mask Volume for View Synthesis of Dynamic Scenes Official PyTorch Implementation of paper "Deep 3D Mask Volume for View Synthesis of Dynamic S

17 Oct 12, 2022

This is the official pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" on VQA Task

?? ERASOR (RA-L'21 with ICRA Option) Official page of "ERASOR: Egocentric Ratio of Pseudo Occupancy-based Dynamic Object Removal for Static 3D Point C

225 Dec 29, 2022

Official implementation of the paper Vision Transformer with Progressive Sampling, ICCV 2021.

Vision Transformer with Progressive Sampling This is the official implementation of the paper Vision Transformer with Progressive Sampling, ICCV 2021.

123 Jan 1, 2023

Official implementation of the ICCV 2021 paper "Conditional DETR for Fast Training Convergence".

The DETR approach applies the transformer encoder and decoder architecture to object detection and achieves promising performance. In this paper, we handle the critical issue, slow training convergence, and present a conditional cross-attention mechanism for fast DETR training. Our approach is motivated by that the cross-attention in DETR relies highly on the content embeddings and that the spatial embeddings make minor contributions, increasing the need for high-quality content embeddings and thus increasing the training difficulty.

281 Dec 30, 2022

The Official Implementation of the ICCV-2021 Paper: Semantically Coherent Out-of-Distribution Detection.

SCOOD-UDG (ICCV 2021) This repository is the official implementation of the paper: Semantically Coherent Out-of-Distribution Detection Jingkang Yang,

62 Nov 21, 2022

Official implementation of the ICCV 2021 paper: "The Power of Points for Modeling Humans in Clothing".

The Power of Points for Modeling Humans in Clothing (ICCV 2021) This repository contains the official PyTorch implementation of the ICCV 2021 paper: T

158 Nov 24, 2022

Official implementation of the ICCV 2021 paper "Joint Inductive and Transductive Learning for Video Object Segmentation"

JOINT This is the official implementation of Joint Inductive and Transductive learning for Video Object Segmentation, to appear in ICCV 2021. @inproce

35 Oct 16, 2022

An official implementation of "Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation" (ICCV 2021) in PyTorch.

Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation This is an official implementation of the paper "Exploiting a Joint

35 Oct 26, 2022

[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page >> coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

54 Nov 21, 2022

[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page >> coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

54 Nov 21, 2022

[ICCV 2021] Official PyTorch implementation for Deep Relational Metric Learning.

Deep Relational Metric Learning This repository is the official PyTorch implementation of Deep Relational Metric Learning. Framework Datasets CUB-200-

39 Dec 10, 2022

Official pytorch implementation of "Scaling-up Disentanglement for Image Translation", ICCV 2021.

41 Nov 29, 2022

Official PyTorch implementation of N-ImageNet: Towards Robust, Fine-Grained Object Recognition with Event Cameras (ICCV 2021)

N-ImageNet: Towards Robust, Fine-Grained Object Recognition with Event Cameras Official PyTorch implementation of N-ImageNet: Towards Robust, Fine-Gra

32 Dec 26, 2022

PyTorch implementation of paper: AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer, ICCV 2021.

AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer [Paper] [PyTorch Implementation] [Paddle Implementation] Overview This reposit

148 Dec 30, 2022

Pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering".

TRAnsformer Routing Networks (TRAR) This is an official implementation for ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visu

49 Nov 10, 2022

PyTorch implementation of our ICCV 2021 paper, Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents.

4 May 8, 2022

Official Repository for the ICCV 2021 paper "PixelSynth: Generating a 3D-Consistent Experience from a Single Image"

PixelSynth: Generating a 3D-Consistent Experience from a Single Image (ICCV 2021) Chris Rockwell, David F. Fouhey, and Justin Johnson [Project Website

95 Nov 22, 2022

Official code release for ICCV 2021 paper SNARF: Differentiable Forward Skinning for Animating Non-rigid Neural Implicit Shapes.

235 Dec 26, 2022

Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).

DeepPanoContext (DPC) [Project Page (with interactive results)][Paper] DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context G

66 Nov 16, 2022