Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes (CVPR 2021)

Overview

Long-term-Motion-in-3D-Scenes

This is an implementation of the CVPR'21 paper "Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes".

Please check our paper and the project webpage for more details.

Citation

If you use our code or paper, please consider citing:

@article{wang2020synthesizing,
  title={Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes},
  author={Wang, Jiashun and Xu, Huazhe and Xu, Jingwei and Liu, Sifei and Wang, Xiaolong},
  journal={arXiv preprint arXiv:2012.05522},
  year={2020}
}

Dependencies

Requirements: PyTorch, smplx, and VPoser (see Models below). Part of our code is based on PROXE, whose setup instructions may also help you install the dependencies (see Acknowledgement).

Datasets

We use the PROX and PROXE datasets as our training data. After downloading them, please put them in './data/'. We provide generate_routepose_data.ipynb and generate_sub_data.ipynb for data generation. Note that in PROX, the human meshes and the scene meshes do not live in the same region of world coordinates. Unlike PROX and PROXE, we apply the inverse of the camera extrinsics to the scene mesh, because the scene is the network input and needs to be aligned with the human bodies; this is done in the data generation code. As a result, you do not need to apply any transformation when computing contact. When computing collision, however, you still need to transform the human bodies, as in PROXE, so that they are aligned with the SDF (see the sketch below). Please be careful with this during training and testing, especially if you want to test on other scenes such as Matterport3D. Please put the body_segments data in './data/' as well.
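
To make the coordinate handling concrete, here is a minimal sketch of both transforms, assuming trimesh is installed; the file paths are hypothetical, and cam_extrinsic stands for the per-scene camera-to-world matrix shipped with PROX:

    import json
    import numpy as np
    import trimesh

    # Hypothetical paths; PROX provides one camera-to-world matrix per scene.
    scene = trimesh.load('./data/scenes/MPH16.ply')
    with open('./data/cam2world/MPH16.json') as f:
        cam_extrinsic = np.array(json.load(f))  # 4x4 camera-to-world matrix

    # Align the scene with the human bodies by applying the inverse of the
    # camera extrinsics (our data generation code already does this).
    scene.apply_transform(np.linalg.inv(cam_extrinsic))

    # Contact terms can now be computed directly between body and scene.
    # For the collision term, the SDF remains in world coordinates, so
    # transform the body vertices forward instead, as in PROXE:
    # verts_world = verts @ cam_extrinsic[:3, :3].T + cam_extrinsic[:3, 3]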

Demo

We provide demo.ipynb to help you play with our method. Before running it, please put a downsampled MPH16.ply mesh and the SDF data of this scene in './demo_data/'; you can download them from PROX and PROXE. As above, be careful with the camera extrinsics when testing other scenes, and make sure the human body actually lies inside the scene. The notebook also shows you how to optimize the whole motion.
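
As a rough sketch of how the SDF might be queried for the collision term (collision_loss is our name, not the repo's; sdf, s_grid_min, and s_grid_max follow the data tuple used in the evaluation snippet in the comments below, and the grid's axis order and shapes should be verified against PROXE):

    import torch
    import torch.nn.functional as F

    def collision_loss(body_verts, sdf, s_grid_min, s_grid_max):
        # body_verts: (B, V, 3) body vertices, already transformed into the
        # SDF's world frame; sdf: (B, D, D, D) signed-distance grid;
        # s_grid_min / s_grid_max: (B, 3) grid bounds.
        B, V, _ = body_verts.shape
        # Map vertex positions into the [-1, 1] cube expected by grid_sample.
        norm_verts = (2.0 * (body_verts - s_grid_min.view(B, 1, 3))
                      / (s_grid_max - s_grid_min).view(B, 1, 3) - 1.0)
        dists = F.grid_sample(
            sdf.unsqueeze(1),                # (B, 1, D, D, D)
            norm_verts.view(B, V, 1, 1, 3),  # check the x/y/z axis order of your grid
            padding_mode='border', align_corners=True,
        ).view(B, V)
        # Vertices with a negative signed distance penetrate the scene.
        return (-dists).clamp(min=0).mean()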

Models

We use SMPL-X to represent human bodies. Please download the SMPL-X models and put them in './models/', so that the layout looks like './models/smplx/SMPLX_NEUTRAL.npz'. Please also download the VPoser model and put it in './' (i.e. './vposer_v1_0/').
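
A minimal sketch of loading both models with this layout (the smplx package API is standard; the VPoser call follows the v1 human_body_prior interface, which appears to be the version this repo expects, as discussed in the comments below):

    import torch
    import smplx
    from human_body_prior.tools.model_loader import load_vposer

    # SMPL-X body model, expecting './models/smplx/SMPLX_NEUTRAL.npz'.
    body_model = smplx.create('./models', model_type='smplx', gender='neutral')

    # VPoser (v1) pose prior, expecting the snapshot under './vposer_v1_0/'.
    vposer, _ = load_vposer('./vposer_v1_0', vp_model='snapshot')
    vposer.eval()

    # Decode a 32-dim latent pose into SMPL-X body pose parameters.
    z = torch.zeros(1, 32)
    body_pose = vposer.decode(z, output_type='aa').view(1, -1)  # (1, 63)
    output = body_model(body_pose=body_pose, return_verts=True)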

We also provide our pretrained model here.

Training

After you have generated the data, you can train the networks directly:

python train_subgoal.py
python train_route.py

Please train PoseNet after you have finished training RouteNet, using your own pretrained RouteNet model:

python train_pose.py
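
In case it helps, a hedged sketch of the loading step inside your PoseNet training script (class names and the checkpoint path follow the evaluation snippet in the comments below; adjust them to your own checkout):

    import torch
    from route import ROUTENET

    # Load the RouteNet trained with train_route.py; PoseNet conditions on
    # its (frozen) route predictions during training.
    model_route = ROUTENET(input_dim=9, hid_dim=64).cuda()
    model_route.load_state_dict(torch.load('saved_model/route.model'))
    model_route.eval()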

Acknowledgement

This work was supported, in part, by grants from DARPA LwLL; NSF 1730158 CI-New: Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI); NSF ACI-1541349 CC*DNI Pacific Research Platform; and gifts from Qualcomm and TuSimple. Part of our code is based on PROXE, which may also help you with the dependencies and dataset sections. Many thanks!

License

Apache-2.0 License

Comments
  • Pre-trained model

    Hi, thanks for your nice work!

    How can I get a pre-trained model (0811_in_256_4.model and point.model) of PointNet?

    Could you share the pre-trained model of pointnet?

    Also, could you share the pre-trained models of the CVAE, RouteNet, and PoseNet?

    Thanks!

    opened by seonghyunkim1212 2
  • Second part of the skating loss

    Hi Jiashun, thanks for your nice work! Could you please say more about the intuition behind this for loop? https://github.com/jiashunwang/Long-term-Motion-in-3D-Scenes/blob/5673d342093ff92e677da2d90e0dd8b7dd766c5b/utils_optimize.py#L312

    opened by JiahaoPlus 2
  • Commit of human_body_prior used for this repo?

    Thanks for your interesting work. When I attempt to use the code in generate_sub_data.ipynb, it seems human_body_prior has changed its interface; can you tell me which commit of human_body_prior you used for generate_sub_data.ipynb?

    opened by azuki-miho 1
  • About the middle position (sub goal).

    I have noticed that you set the middle position manually in demo.ipynb. So if I want to synthesize a long-term sequence with only the start and end positions as input, do I need to set the middle position manually myself? As I understand it, the sub-goal network proposed in your paper only synthesizes a body at a given position; it does not generate a middle position during long-term sequence synthesis. Am I right? I am looking forward to your reply.

    opened by Silverster98 1
  • Code for video visualization

    Thanks for the demo instructions. However, the demo only generates an SMPL-X model for each frame; could you publish the code that produces the video visualizations of your method shown on your project page?

    opened by azuki-miho 1
  • Evaluation

    Hi

    I have a question about the evaluation. I implemented evaluation code for the evaluation split of the PROX dataset. In Table 1 of the paper (Ours w/o opt), the translation, orientation, and pose errors are reported as 6.91, 9.71, and 41.17, respectively. The results of my implementation are 59.02, 60.23, and 1459.95, respectively. What's wrong with my implementation?

    import os
    import random
    import torch
    import numpy as np
    
    from route_data import ROUTEDATAEVAL
    from route import ROUTENET
    from pose_after_route import POSEAFTERROUTE
    from utils import GeometryTransformer
    from utils import AverageMeter
    from progress.bar import Bar
    
    def l1_error(prediction, target):
        return torch.abs(prediction - target).sum(dim=1).mean().item()
    
    
    SEED_VALUE = 0
    print(f'Seed value for the experiment {SEED_VALUE}')
    os.environ['PYTHONHASHSEED'] = str(SEED_VALUE)
    random.seed(SEED_VALUE)
    torch.manual_seed(SEED_VALUE)
    np.random.seed(SEED_VALUE)
    
    
    batch_size = 16
    
    dataset = ROUTEDATAEVAL()
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
    
    
    model_route = ROUTENET(input_dim=9,hid_dim=64)
    model_route = model_route.cuda()
    
    print('use pretrained routenet')
    model_route.load_state_dict(torch.load('saved_model/route.model'))
    model_route.eval()
    
    model_pose = POSEAFTERROUTE(input_dim=65-9,hid_dim=256)
    model_pose = model_pose.cuda()
    
    print('use pretrained posenet')
    model_pose.load_state_dict(torch.load('saved_model/pose_after_route.model'))
    model_pose.eval()
    
    error_translation_ = AverageMeter()
    error_orientation_ = AverageMeter()
    error_pose_ = AverageMeter()
    
    num_iter = len(dataloader)
    bar = Bar('==>', max=num_iter)
    
    for j, data in enumerate(dataloader, 0):
    
        input_list, middle_list, frame_name, scene_name, sdf, scene_points, cam_extrinsic, s_grid_min, s_grid_max = data
    
        input_list = input_list[:, [0, -1], :]
        body = middle_list[:, 0:1, 6:16].cuda()
        input_list = torch.cat([input_list[:, :, :6], input_list[:, :, 16:]], dim=2)
        middle_list = torch.cat([middle_list[:, :, :6], middle_list[:, :, 16:]], dim=2)
    
        scene_points = scene_points.cuda()
    
        input_list = input_list.view(-1, 62)
        six_d_input_list = GeometryTransformer.convert_to_6D_rot(input_list)
        six_d_input_list = six_d_input_list.view(-1, 2, 65)
        x = six_d_input_list.cuda()
        x1 = six_d_input_list[:, :, :9].cuda()
    
        middle_list = middle_list.view(-1, 62)
        six_d_middle_list = GeometryTransformer.convert_to_6D_rot(middle_list)
        six_d_middle_list = six_d_middle_list.view(-1, 60, 65)  # 60: 2s 30fps
    
        y = six_d_middle_list[:, :, :9].cuda()
    
        out_route = model_route(x1, scene_points.transpose(1, 2))
    
    pred_trans = out_route[:, :, :3]
        pred_6d = out_route[:, :, 3:]
    
        gt_trans = y[:, :, :3]
        gt_6d = y[:, :, 3:]
    
        route_prediction = out_route.detach().view(x1.shape[0],-1)
    
        out_pose = model_pose(x[:,:,9:],scene_points.transpose(1,2),route_prediction)
    
        y = six_d_middle_list[:, :, 9:].cuda()
    
        pred_body_pose = out_pose[:, :, :32]
        gt_body_pose = y[:, :, :32]
    
        error_translation = l1_error(pred_trans.reshape(-1,3).detach(), gt_trans.reshape(-1,3)) * 100.
        error_orientation = l1_error(pred_6d.reshape(-1,6).detach(), gt_6d.reshape(-1,6)) * 100.
        error_pose = l1_error(pred_body_pose.reshape(-1,32).detach(), gt_body_pose.reshape(-1,32).detach()) * 100.
    
    
        error_translation_.update(error_translation, pred_trans.shape[0] * pred_trans.shape[1])
        error_orientation_.update(error_orientation, pred_trans.shape[0] * pred_trans.shape[1])
        error_pose_.update(error_pose, pred_trans.shape[0] * pred_trans.shape[1])
    
    
    
        summary_string = ' EVAL :[{0}/{1}]'.format(j, num_iter)
        summary_string += ' | ET {error_translation.avg:.4f} | EO {error_orientation.avg:.4f} | EP {error_pose.avg:.4f}'.format(error_translation=error_translation_, error_orientation=error_orientation_, error_pose=error_pose_)
    
        Bar.suffix = summary_string
        bar.next()
    
    bar.finish()
    

    Thanks, Seonghyun

    opened by seonghyunkim1212 13