Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes (CVPR 2021)

Overview

Long-term-Motion-in-3D-Scenes

This is an implementation of the CVPR'21 paper "Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes".

Please check our paper and the project webpage for more details.

Citation

If you use our code or paper, please consider citing:

@article{wang2020synthesizing,
  title={Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes},
  author={Wang, Jiashun and Xu, Huazhe and Xu, Jingwei and Liu, Sifei and Wang, Xiaolong},
  journal={arXiv preprint arXiv:2012.05522},
  year={2020}
}

Dependencies

Requirements: PyTorch, smplx, and VPoser (see Models below). Part of our code is based on PROXE, whose setup instructions may also help you install the dependencies (see Acknowledgement).

Datasets

We use the PROX and PROXE datasets as our training data. After downloading them, please put them in './data/'. We provide generate_routepose_data.ipynb and generate_sub_data.ipynb for data generation. Note that in PROX, the human meshes and the scene meshes do not live in the same region of world coordinates. Unlike PROX and PROXE, we apply the inverse of the camera extrinsics to the scene mesh, because the scene is the network input and needs to be aligned with the human bodies; this is done in the data generation code. As a result, you do not need to apply any transformation when computing contact. When computing collision, however, you still need to transform the human bodies, as in PROXE, so that they are aligned with the SDF (see the sketch below). Please be careful with this during training and testing, especially if you want to test on other scenes such as Matterport3D. Please put the body_segments data in './data/' as well.
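
To make the coordinate handling concrete, here is a minimal sketch of both transforms, assuming trimesh is installed; the file paths are hypothetical, and cam_extrinsic stands for the per-scene camera-to-world matrix shipped with PROX:

    import json
    import numpy as np
    import trimesh

    # Hypothetical paths; PROX provides one camera-to-world matrix per scene.
    scene = trimesh.load('./data/scenes/MPH16.ply')
    with open('./data/cam2world/MPH16.json') as f:
        cam_extrinsic = np.array(json.load(f))  # 4x4 camera-to-world matrix

    # Align the scene with the human bodies by applying the inverse of the
    # camera extrinsics (our data generation code already does this).
    scene.apply_transform(np.linalg.inv(cam_extrinsic))

    # Contact terms can now be computed directly between body and scene.
    # For the collision term, the SDF remains in world coordinates, so
    # transform the body vertices forward instead, as in PROXE:
    # verts_world = verts @ cam_extrinsic[:3, :3].T + cam_extrinsic[:3, 3]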

Demo

We provide demo.ipynb to help you play with our method. Before running it, please put a downsampled MPH16.ply mesh and the SDF data of this scene in './demo_data/'; you can download them from PROX and PROXE. As above, be careful with the camera extrinsics when testing other scenes, and make sure the human body actually lies inside the scene. The notebook also shows you how to optimize the whole motion.
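
As a rough sketch of how the SDF might be queried for the collision term (collision_loss is our name, not the repo's; sdf, s_grid_min, and s_grid_max follow the data tuple used in the evaluation snippet in the comments below, and the grid's axis order and shapes should be verified against PROXE):

    import torch
    import torch.nn.functional as F

    def collision_loss(body_verts, sdf, s_grid_min, s_grid_max):
        # body_verts: (B, V, 3) body vertices, already transformed into the
        # SDF's world frame; sdf: (B, D, D, D) signed-distance grid;
        # s_grid_min / s_grid_max: (B, 3) grid bounds.
        B, V, _ = body_verts.shape
        # Map vertex positions into the [-1, 1] cube expected by grid_sample.
        norm_verts = (2.0 * (body_verts - s_grid_min.view(B, 1, 3))
                      / (s_grid_max - s_grid_min).view(B, 1, 3) - 1.0)
        dists = F.grid_sample(
            sdf.unsqueeze(1),                # (B, 1, D, D, D)
            norm_verts.view(B, V, 1, 1, 3),  # check the x/y/z axis order of your grid
            padding_mode='border', align_corners=True,
        ).view(B, V)
        # Vertices with a negative signed distance penetrate the scene.
        return (-dists).clamp(min=0).mean()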

Models

We use SMPL-X to represent human bodies. Please download the SMPL-X models and put them in './models/', so that the layout looks like './models/smplx/SMPLX_NEUTRAL.npz'. Please also download the VPoser model and put it in './' (i.e. './vposer_v1_0/').
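
A minimal sketch of loading both models with this layout (the smplx package API is standard; the VPoser call follows the v1 human_body_prior interface, which appears to be the version this repo expects, as discussed in the comments below):

    import torch
    import smplx
    from human_body_prior.tools.model_loader import load_vposer

    # SMPL-X body model, expecting './models/smplx/SMPLX_NEUTRAL.npz'.
    body_model = smplx.create('./models', model_type='smplx', gender='neutral')

    # VPoser (v1) pose prior, expecting the snapshot under './vposer_v1_0/'.
    vposer, _ = load_vposer('./vposer_v1_0', vp_model='snapshot')
    vposer.eval()

    # Decode a 32-dim latent pose into SMPL-X body pose parameters.
    z = torch.zeros(1, 32)
    body_pose = vposer.decode(z, output_type='aa').view(1, -1)  # (1, 63)
    output = body_model(body_pose=body_pose, return_verts=True)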

We also provide our pretrained model here.

Training

After you have generated the data, you can train the networks directly:

python train_subgoal.py
python train_route.py

Please train PoseNet after you have finished training RouteNet, using your own pretrained RouteNet model:

python train_pose.py
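
In case it helps, a hedged sketch of the loading step inside your PoseNet training script (class names and the checkpoint path follow the evaluation snippet in the comments below; adjust them to your own checkout):

    import torch
    from route import ROUTENET

    # Load the RouteNet trained with train_route.py; PoseNet conditions on
    # its (frozen) route predictions during training.
    model_route = ROUTENET(input_dim=9, hid_dim=64).cuda()
    model_route.load_state_dict(torch.load('saved_model/route.model'))
    model_route.eval()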

Acknowledgement

This work was supported, in part, by grants from DARPA LwLL; NSF 1730158 CI-New: Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI); NSF ACI-1541349 CC*DNI Pacific Research Platform; and gifts from Qualcomm and TuSimple. Part of our code is based on PROXE, which may also help you with the dependencies and dataset sections. Many thanks!

License

Apache-2.0 License

Comments
  • Pre-trained model

    Hi, thanks for your nice work!

    How can I get a pre-trained model (0811_in_256_4.model and point.model) of PointNet?

    Could you share the pre-trained model of pointnet?

    Also, could you share the pre-trained models of the CVAE, RouteNet, and PoseNet?

    Thanks!

    opened by seonghyunkim1212 2
  • Second part of the skating loss

    Hi Jiashun, thanks for your nice work! Could you please say more about the intuition behind this for loop? https://github.com/jiashunwang/Long-term-Motion-in-3D-Scenes/blob/5673d342093ff92e677da2d90e0dd8b7dd766c5b/utils_optimize.py#L312

    opened by JiahaoPlus 2
  • Commit of human_body_prior used for this repo?

    Thanks for your interesting work. When I attempt to use the code in generate_sub_data.ipynb, it seems human_body_prior has changed its interface; can you tell me which commit of human_body_prior you used for generate_sub_data.ipynb?

    opened by azuki-miho 1
  • About the middle position (sub goal).

    I have noticed that you set the middle position manually in demo.ipynb. So if I want to synthesize a long-term sequence with only the start and end positions as input, do I need to set the middle position manually myself? As I understand it, the sub-goal network proposed in your paper only synthesizes a body at a given position; it does not generate a middle position during long-term sequence synthesis. Am I right? I am looking forward to your reply.

    opened by Silverster98 1
  • Code for video visualization

    Thanks for the demo instructions. However, the demo only generates an SMPL-X model for each frame; could you publish the code that produces the video visualizations of your method shown on your project page?

    opened by azuki-miho 1
  • Evaluation

    Hi

    I have a question about the evaluation. I implemented evaluation code for the evaluation split of the PROX dataset. In Table 1 of the paper (Ours w/o opt), the translation, orientation, and pose errors are reported as 6.91, 9.71, and 41.17, respectively. The results of my implementation are 59.02, 60.23, and 1459.95, respectively. What's wrong with my implementation?

    import os
    import random
    import torch
    import numpy as np
    
    from route_data import ROUTEDATAEVAL
    from route import ROUTENET
    from pose_after_route import POSEAFTERROUTE
    from utils import GeometryTransformer
    from utils import AverageMeter
    from progress.bar import Bar
    
    def l1_error(prediction, target):
        return torch.abs(prediction - target).sum(dim=1).mean().item()
    
    
    SEED_VALUE = 0
    print(f'Seed value for the experiment {SEED_VALUE}')
    os.environ['PYTHONHASHSEED'] = str(SEED_VALUE)
    random.seed(SEED_VALUE)
    torch.manual_seed(SEED_VALUE)
    np.random.seed(SEED_VALUE)
    
    
    batch_size = 16
    
    dataset = ROUTEDATAEVAL()
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
    
    
    model_route = ROUTENET(input_dim=9,hid_dim=64)
    model_route = model_route.cuda()
    
    print('use pretrained routenet')
    model_route.load_state_dict(torch.load('saved_model/route.model'))
    model_route.eval()
    
    model_pose = POSEAFTERROUTE(input_dim=65-9,hid_dim=256)
    model_pose = model_pose.cuda()
    
    print('use pretrained posenet')
    model_pose.load_state_dict(torch.load('saved_model/pose_after_route.model'))
    model_pose.eval()
    
    error_translation_ = AverageMeter()
    error_orientation_ = AverageMeter()
    error_pose_ = AverageMeter()
    
    num_iter = len(dataloader)
    bar = Bar('==>', max=num_iter)
    
    for j, data in enumerate(dataloader, 0):
    
        input_list, middle_list, frame_name, scene_name, sdf, scene_points, cam_extrinsic, s_grid_min, s_grid_max = data
    
        input_list = input_list[:, [0, -1], :]
        body = middle_list[:, 0:1, 6:16].cuda()
        input_list = torch.cat([input_list[:, :, :6], input_list[:, :, 16:]], dim=2)
        middle_list = torch.cat([middle_list[:, :, :6], middle_list[:, :, 16:]], dim=2)
    
        scene_points = scene_points.cuda()
    
        input_list = input_list.view(-1, 62)
        six_d_input_list = GeometryTransformer.convert_to_6D_rot(input_list)
        six_d_input_list = six_d_input_list.view(-1, 2, 65)
        x = six_d_input_list.cuda()
        x1 = six_d_input_list[:, :, :9].cuda()
    
        middle_list = middle_list.view(-1, 62)
        six_d_middle_list = GeometryTransformer.convert_to_6D_rot(middle_list)
        six_d_middle_list = six_d_middle_list.view(-1, 60, 65)  # 60: 2s 30fps
    
        y = six_d_middle_list[:, :, :9].cuda()
    
        out_route = model_route(x1, scene_points.transpose(1, 2))
    
    pred_trans = out_route[:, :, :3]
        pred_6d = out_route[:, :, 3:]
    
        gt_trans = y[:, :, :3]
        gt_6d = y[:, :, 3:]
    
        route_prediction = out_route.detach().view(x1.shape[0],-1)
    
        out_pose = model_pose(x[:,:,9:],scene_points.transpose(1,2),route_prediction)
    
        y = six_d_middle_list[:, :, 9:].cuda()
    
        pred_body_pose = out_pose[:, :, :32]
        gt_body_pose = y[:, :, :32]
    
        error_translation = l1_error(pred_trans.reshape(-1,3).detach(), gt_trans.reshape(-1,3)) * 100.
        error_orientation = l1_error(pred_6d.reshape(-1,6).detach(), gt_6d.reshape(-1,6)) * 100.
        error_pose = l1_error(pred_body_pose.reshape(-1,32).detach(), gt_body_pose.reshape(-1,32).detach()) * 100.
    
    
        error_translation_.update(error_translation, pred_trans.shape[0] * pred_trans.shape[1])
        error_orientation_.update(error_orientation, pred_trans.shape[0] * pred_trans.shape[1])
        error_pose_.update(error_pose, pred_trans.shape[0] * pred_trans.shape[1])
    
    
    
        summary_string = ' EVAL :[{0}/{1}]'.format(j, num_iter)
        summary_string += ' | ET {error_translation.avg:.4f} | EO {error_orientation.avg:.4f} | EP {error_pose.avg:.4f}'.format(error_translation=error_translation_, error_orientation=error_orientation_, error_pose=error_pose_)
    
        Bar.suffix = summary_string
        bar.next()
    
    bar.finish()
    

    Thanks, Seonghyun

    opened by seonghyunkim1212 13