This is the official PyTorch implementation of the CVPR 2020 paper "TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting".

Overview

TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting

Project Page | YouTube | Paper

This is the official PyTorch implementation of the CVPR 2020 paper "TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting".

Environment

conda install pytorch torchvision cudatoolkit=<your cuda version>
conda install pyyaml scikit-image scikit-learn opencv
pip install -r requirements.txt

Data

Mixamo

Mixamo is a synthesized 3D character animation dataset.

  1. Download the Mixamo data here.
  2. Extract it under data/mixamo.

For instructions on downloading the 3D Mixamo data, please refer to this link.

SoloDance

SoloDance is a collection of dancing videos from YouTube. We use DensePose to extract skeleton sequences from these videos for training.

  1. Download the extracted skeleton sequences here.
  2. Extract under data/solo_dance

The original videos can be downloaded here.

Preprocessing

Run sh scripts/preprocess.sh to preprocess the two datasets above.

Pretrained model

Download the pretrained models here.

Inference

  1. For Skeleton Extraction, please consider using a pose estimation library such as Detectron2. The input skeleton sequences must be stored as a NumPy .npy file:

    • The file should contain an array with shape 15 x 2 x length.
    • The first dimension (15) corresponds to the 15 body joints defined here.
    • The second dimension (2) corresponds to the x and y coordinates.
    • The third dimension (length) is the temporal dimension.
  2. For Motion Retargeting Network, we provide the sample command for inference:

# replace the --checkpoint, --source and --target paths with your own
python infer_pair.py \
    --config configs/transmomo.yaml \
    --checkpoint transmomo_mixamo_36_800_24/checkpoints/autoencoder_00200000.pt \
    --source a.npy \
    --target b.npy \
    --source_width 1280 --source_height 720 \
    --target_height 1920 --target_width 1080
  3. For Skeleton-to-Video Rendering, please refer to Everybody Dance Now.
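The input format from step 1 can be assembled with NumPy. The sketch below uses a hypothetical helper, frames_to_skeleton_array (not part of the repository), and assumes you already have one (15, 2) array of x, y coordinates per frame, in the project's joint order; it stacks them along a trailing time axis and checks the resulting shape:

```python
import numpy as np

NUM_JOINTS = 15  # number of body joints in the expected input format


def frames_to_skeleton_array(frames):
    """Stack per-frame (15, 2) keypoint arrays into one (15, 2, length) array.

    Assumes each element of `frames` holds the (x, y) coordinates of the
    15 joints for one frame, already in the joint order the project expects.
    """
    arr = np.stack([np.asarray(f, dtype=np.float32) for f in frames], axis=-1)
    if arr.shape[:2] != (NUM_JOINTS, 2):
        raise ValueError(f"expected shape (15, 2, length), got {arr.shape}")
    return arr


# Example: three dummy frames in which every joint sits at (t, 2t).
frames = [np.tile([t, 2 * t], (NUM_JOINTS, 1)) for t in range(3)]
skel = frames_to_skeleton_array(frames)
print(skel.shape)  # (15, 2, 3)
```

Saving the result with np.save("a.npy", skel) would produce a file that can be passed to --source or --target in the inference command above.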

Training

To train the Motion Retargeting Network, run

python train.py --config configs/transmomo.yaml

To train on the SoloDance dataset, run

python train.py --config configs/transmomo_solo_dance.yaml

Testing

To evaluate the motion retargeting MSE, first generate the retargeted motions with

# use the config that was used for training, and replace --out_dir with your output directory
python test.py \
    --config configs/transmomo.yaml \
    --checkpoint transmomo_mixamo_36_800_24/checkpoints/autoencoder_00200000.pt \
    --out_dir transmomo_mixamo_36_800_24_results

Then compute the MSE with

python scripts/compute_mse.py --in_dir transmomo_mixamo_36_800_24_results  # replace with the previous output directory
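As a rough illustration of the metric, the mean squared error between a retargeted motion and its ground truth can be computed as below. This is a sketch assuming both are NumPy arrays of the same shape; how scripts/compute_mse.py loads the results and averages over joints, frames and samples is not documented here and may differ. motion_mse and mean_mse are hypothetical names:

```python
import numpy as np


def motion_mse(pred, target):
    """Mean squared error between two motion arrays of identical shape,
    e.g. a retargeted (15, 2, length) sequence and its ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    return float(np.mean((pred - target) ** 2))


def mean_mse(pairs):
    """Average the per-pair MSE over an iterable of (pred, target) pairs."""
    errors = [motion_mse(p, t) for p, t in pairs]
    return float(np.mean(errors))


print(motion_mse(np.zeros((15, 2, 4)), np.ones((15, 2, 4))))  # 1.0
```

Averaging per pair first, as here, weights every test sample equally regardless of its length; averaging over all frames at once would instead weight longer sequences more.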

Project Structure

transmomo.pytorch
├── configs - configuration files
├── data - place for storing data
├── docs - documentation
├── lib
│   ├── data.py - datasets and data loaders
│   ├── networks - encoders, decoders, discriminators, etc.
│   ├── trainer.py - training pipeline
│   ├── loss.py - loss functions
│   ├── operation.py - operations, e.g. rotation, projection, etc.
│   └── util - utility functions
├── out - place for storing output
├── infer_pair.py - perform motion retargeting
├── render_interpolate.py - perform motion and body interpolation
├── scripts - scripts for data processing and experiments
├── test.py - test MSE
└── train.py - main entrance for training

TODOs

  • Detailed documentation

  • Add example files

  • Release in-the-wild dancing video dataset (unannotated)

  • Tool for visualizing Mixamo test error

  • Tool for converting keypoint formats

Citation

Z. Yang*, W. Zhu*, W. Wu*, C. Qian, Q. Zhou, B. Zhou, C. C. Loy. "TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. (* indicates equal contribution.)

BibTeX:

@inproceedings{transmomo2020,
  title={TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting},
  author={Yang, Zhuoqian and Zhu, Wentao and Wu, Wayne and Qian, Chen and Zhou, Qiang and Zhou, Bolei and Loy, Chen Change},
  booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

Acknowledgement

This repository is partly based on Rundi Wu's Learning Character-Agnostic Motion for Motion Retargeting in 2D and Xun Huang's MUNIT: Multimodal UNsupervised Image-to-image Translation. The skeleton-to-video rendering part is based on Everybody Dance Now. We sincerely thank them for their inspiration and contribution to the community.

Comments
  • About Triplet Margin Loss

    Hi, thank you for releasing the code!

    I have a few questions about the triplet loss described in Section 3.2.2. It seems the triplet loss evaluates how well a generated pose sequence matches a 'real' sequence. The anchor and positive samples in a triplet are two consecutive results produced by the body encoder with a 'real' sequence as input, and the negative sample is produced by the body encoder from a 'fake' (generated) result. According to Eq. 4 and Eq. 5, the sequences produced from both the 'original' input and the 'limb-scaled' input can be treated as 'real' outputs. Does that mean the anchor and positive samples can be selected from both streams of the proposed model? Is it correct that the triplet loss guides the body encoder to generate the 'same' body structure when the scale of the input pose changes?

    BTW, is there a bug in the calculation of the temporal pairwise cosine similarity of seqs_b in line 43 of loss.py? It seems the temporal similarity should also be calculated within seqs_b to obtain the similarity between an 'anchor' and a 'positive' sample.

    https://github.com/yzhq97/transmomo.pytorch/blob/5d766fb16f511b77f446abda3697b626595a2b2d/lib/loss.py#L42-L44

    Thanks!

    opened by AndrewChiyz 1
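For reference, a triplet margin loss built on cosine similarity, as discussed in this issue, can be sketched as follows. This is a generic formulation under stated assumptions (1-D feature vectors and a hypothetical margin of 0.3), not a reproduction of lib/loss.py:

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two 1-D feature vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def triplet_margin_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss on cosine similarity.

    The loss is zero once the anchor is more similar to the positive than to
    the negative by at least `margin` (a hypothetical default), and grows
    linearly otherwise.
    """
    s_pos = cosine_similarity(anchor, positive)
    s_neg = cosine_similarity(anchor, negative)
    return max(0.0, s_neg - s_pos + margin)


print(triplet_margin_loss([1, 0], [1, 0], [0, 1]))  # 0.0
```

In this form, which samples count as anchor, positive and negative is entirely up to the caller, which is exactly the design question raised above.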
  • Questions about latent space interpolation results

    Hi, first of all, thank you for releasing such great code! My question about the latent space interpolation is how the background changes for the interpolated poses. For example, in Figure 8 of the paper I can see that body structures are successfully interpolated, but what about the frames synthesized from them? It would be helpful if you could provide or describe the synthesized video frames. (Are they just a mixture of two frames?)

    opened by kangyeolk 1
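As background for this question, interpolation in a latent space is commonly done linearly. The sketch below assumes two latent codes stored as NumPy arrays of equal shape and is not necessarily what render_interpolate.py does; interpolate_codes is a hypothetical name. Each interpolated code would still have to be decoded into a skeleton, and per this README, video frames are produced from skeletons by a separate skeleton-to-video renderer based on Everybody Dance Now:

```python
import numpy as np


def interpolate_codes(code_a, code_b, num_steps):
    """Linearly interpolate between two latent codes of the same shape.

    Returns an array of shape (num_steps, *code_a.shape) whose first entry is
    code_a and whose last entry is code_b.
    """
    code_a = np.asarray(code_a, dtype=np.float64)
    code_b = np.asarray(code_b, dtype=np.float64)
    if code_a.shape != code_b.shape:
        raise ValueError("latent codes must have the same shape")
    # One weight per step, shaped to broadcast against the code dimensions.
    alphas = np.linspace(0.0, 1.0, num_steps).reshape(-1, *([1] * code_a.ndim))
    return (1.0 - alphas) * code_a + alphas * code_b


print(interpolate_codes([0.0, 0.0], [2.0, 4.0], 3)[1])  # [1. 2.]
```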
  • Error when running infer_pair.py

    Hello, I am interested in running your code. test.py runs fine; however, when I try to run infer_pair.py with the following command

    python infer_pair.py --config configs/transmomo.yaml --checkpoint transmomo_mixamo_36_800_24/checkpoints/autoencoder_00200000.pt --source data/mixamo/36_800_24/test_random_rotate/SPORTY_GRANY/Pulling_A_Rope/Pulling_A_Rope.npy --target "data/mixamo/36_800_24/test_random_rotate/TY/Golf_Tee_Up_(1)/Golf_Tee_Up_(1).npy" --source_width 1280 --source_height 720 --target_height 1080 --target_width 1920
    

    I get the following error

    Traceback (most recent call last):
      File "infer_pair.py", line 87, in <module>
        main(config, args)
      File "infer_pair.py", line 77, in main
        x_cross = postprocess(x_cross, mean_pose, std_pose, unit=1.0, start=x_src_start)
      File "/home/rk/projects/transmomo.pytorch/lib/util/motion.py", line 27, in postprocess
        motion = globalize_motion(motion, start=start)
      File "/home/rk/projects/transmomo.pytorch/lib/util/motion.py", line 100, in globalize_motion
        centers += start.reshape([2, 1])
    ValueError: cannot reshape array of size 3 into shape (2,1)
    

    The same error occurs with other motion pairs.

    Is there something wrong? Thanks.

    opened by RusticKey 1
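In the traceback, start.reshape([2, 1]) fails because start holds three values. One plausible cause, which this thread does not confirm, is that the .npy files passed as --source and --target contain three coordinates per joint rather than the two required by the 15 x 2 x length input format. A hypothetical helper, to_2d_skeleton (not part of the repository), can check this before inference and, for three-coordinate input, keep two of the axes (an orthographic projection). Which two axes to keep is an assumption that depends on the data's coordinate convention:

```python
import numpy as np


def to_2d_skeleton(motion, axes=(0, 1)):
    """Return a (15, 2, length) array suitable as 2D skeleton input.

    Two-coordinate input is returned unchanged. For three-coordinate input,
    only the coordinates listed in `axes` are kept (orthographic projection);
    the default (0, 1) is an assumption, not a convention from the repository.
    """
    motion = np.asarray(motion)
    if motion.ndim != 3 or motion.shape[0] != 15:
        raise ValueError(f"expected shape (15, C, length), got {motion.shape}")
    if motion.shape[1] == 2:
        return motion
    if motion.shape[1] == 3:
        return motion[:, list(axes), :]
    raise ValueError(f"unsupported number of coordinates: {motion.shape[1]}")


# Example with a dummy 3D sequence of 5 frames.
print(to_2d_skeleton(np.zeros((15, 3, 5))).shape)  # (15, 2, 5)
```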
  • GPU memory

    Thanks for your excellent work. I would like to know how much GPU memory is needed during the training and testing phases, respectively, and how long it takes to train this network.

    opened by swrdZWJ 1
  • What's the 3D key joints format?

    Hello, thank you for sharing the code.

    I am trying to get 3D key joints by calling reconstruct3d() in network.py. It returns an array of shape (45, length).

    What's the order of these 45 data points?

    Thanks

    opened by FredericaLee 1
  • Which point is the center for limb scaling?

    Q1: The results are amazing whether the person is small or large. However, I can't find where the code chooses the center point for scaling the body structure; without one, all limbs would shift together, which doesn't seem reasonable. Also, if we scale every target skeleton frame by frame independently, will the generated result jitter?

    Q2: Do we need just one common model to generate results for all target videos?

    opened by ailias 1
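One common way to keep limbs from shifting independently is to scale each bone about a root joint and rebuild the skeleton outward along the kinematic tree. The sketch below assumes the 15 joints follow the first 15 Body_25 joints with the mid-hip (joint 8) as root, which is consistent with how preprocess_test, quoted further down, recomputes joints 1 and 8. The PARENTS table, the ORDER list and scale_limbs are illustrative and not taken from the repository:

```python
import numpy as np

# Assumed parent of each joint (Body_25 order: 0 Nose, 1 Neck, 2-4 right arm,
# 5-7 left arm, 8 MidHip, 9-11 right leg, 12-14 left leg). Root has parent -1.
PARENTS = [1, 8, 1, 2, 3, 1, 5, 6, -1, 8, 9, 10, 8, 12, 13]
# Joints ordered so that every parent is visited before its children.
ORDER = [8, 1, 9, 12, 0, 2, 5, 10, 13, 3, 6, 11, 14, 4, 7]


def scale_limbs(frame, scales):
    """Scale each bone of one (15, 2) frame about the root joint.

    scales[j] multiplies the bone from PARENTS[j] to joint j. The root stays
    fixed and every child is rebuilt from its already-moved parent, so a limb
    follows the joints above it instead of drifting on its own.
    """
    frame = np.asarray(frame, dtype=np.float64)
    out = frame.copy()
    for j in ORDER:
        p = PARENTS[j]
        if p < 0:
            continue  # the root joint keeps its position
        bone = frame[j] - frame[p]
        out[j] = out[p] + scales[j] * bone
    return out
```

Applying the same scales to every frame keeps bone lengths constant over time; choosing scale factors independently per frame would change bone lengths from frame to frame, which could plausibly appear as jitter.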
  • Body_25 format from Detectron

    I have extracted the keypoints of an image/video using Detectron2, as suggested in the README, but the format is COCO. This project expects the input to be in Body_25 format. How do I convert COCO to Body_25?

    Also, how do I visualize the output of infer.py? I know that I'm supposed to use the motion2video utility, but it doesn't seem to work directly on the output motion of infer.py. I'm not sure what format it expects the data in.

    opened by viggyr 0
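For the conversion question, the sketch below maps the 17 COCO keypoints produced by Detectron2's keypoint models onto the first 15 Body_25 joints. The mapping is an assumption based on the standard COCO and Body_25 joint orders. The two Body_25 joints with no COCO counterpart (Neck and MidHip) are filled with the shoulder and hip midpoints, which matches how preprocess_test, quoted further down, recomputes joints 1 and 8. coco_to_body15 is a hypothetical name; it expects one person's keypoints stacked over frames as an array of shape (length, 17, 2) or (length, 17, 3), where the optional third column is a confidence score:

```python
import numpy as np

# Assumed COCO index for each of the first 15 Body_25 joints; None marks
# joints (Neck, MidHip) that COCO lacks and that are computed as midpoints.
COCO_TO_BODY15 = [0, None, 6, 8, 10, 5, 7, 9, None, 12, 14, 16, 11, 13, 15]


def coco_to_body15(coco):
    """Convert COCO keypoints of shape (length, 17, 2 or 3) into a
    (15, 2, length) array; a third confidence column, if present, is dropped."""
    coco = np.asarray(coco, dtype=np.float32)[..., :2]  # keep x, y only
    length = coco.shape[0]
    out = np.zeros((15, 2, length), dtype=np.float32)
    for body_idx, coco_idx in enumerate(COCO_TO_BODY15):
        if coco_idx is not None:
            out[body_idx] = coco[:, coco_idx, :].T
    out[1] = ((coco[:, 5, :] + coco[:, 6, :]) / 2).T    # Neck: mid-shoulder
    out[8] = ((coco[:, 11, :] + coco[:, 12, :]) / 2).T  # MidHip: mid-hip
    return out
```

The returned array can be saved with np.save and passed to infer_pair.py. Frames in which a keypoint was not detected are not handled here and would need interpolation or filtering first.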
  • I want to convert a video like this and save it as .npy:

    I want to convert a video like this and save it as .npy:

    The file should contain an array with shape 15 x 2 x length. The first dimension (15) corresponds to the 15 body joints defined here. The second dimension (2) corresponds to the x and y coordinates. The third dimension (length) is the temporal dimension.

    opened by LMR2018 0
  • How to generate 3D pose

    Hi! I noticed in the paper that the 2D pose can be generated by projecting 3D into 2D, whereas the retargeting inference this repo provides generates 2D poses. I wonder whether there are functions that can directly generate 3D poses (i.e., every point has x, y, z coordinates instead of x, y). Thanks a lot!

    opened by Alouette98 0
  • limb_norm() related issue

    Hi @yzhq97, I am facing an issue with limb normalization. I have been trying to normalize each limb length to a scale between 0 and 1, but it does not seem straightforward. So far I understand that limb normalization requires normalizing multiple dependent joints. However, limb_norm(x_a, x_b) takes two different motions and scales x_a according to the skeleton structure of the target x_b. How can I scale limbs to between 0 and 1, given that I am not trying to adjust the structure to a target skeleton with limb_norm()?

    opened by RahhulDd 0
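limb_norm(x_a, x_b) adapts one skeleton to another, so it does not directly answer this question. A separate, hypothetical helper can instead rescale a whole motion uniformly so that its longest bone has length 1 and every other bone falls in [0, 1], keeping the body's proportions. The sketch below assumes a (15, 2, length) array in the first 15 Body_25 joints with the mid-hip (joint 8) as root; the PARENTS table, bone_lengths and normalize_limbs_unit are illustrative and not from the repository:

```python
import numpy as np

# Assumed parent of each joint in Body_25 order; the root (MidHip, 8) has -1.
PARENTS = [1, 8, 1, 2, 3, 1, 5, 6, -1, 8, 9, 10, 8, 12, 13]


def bone_lengths(frame):
    """Length of the bone from each joint to its parent in a (15, 2) frame.
    The root joint has no bone and gets length 0."""
    frame = np.asarray(frame, dtype=np.float64)
    lengths = np.zeros(len(PARENTS))
    for j, p in enumerate(PARENTS):
        if p >= 0:
            lengths[j] = np.linalg.norm(frame[j] - frame[p])
    return lengths


def normalize_limbs_unit(motion, eps=1e-8):
    """Uniformly rescale a (15, 2, length) motion so the longest bone over the
    whole sequence has (almost exactly) length 1. One scale is shared by all
    frames, so proportions and bone-length ratios stay constant over time;
    coordinates are taken relative to the root position in the first frame."""
    motion = np.asarray(motion, dtype=np.float64)
    longest = max(bone_lengths(motion[:, :, t]).max() for t in range(motion.shape[2]))
    origin = motion[8, :, 0][None, :, None]  # root (MidHip) at the first frame
    return (motion - origin) / (longest + eps)
```

Using a single scale for the whole sequence, rather than one per frame, avoids bone lengths fluctuating between frames.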
  • Meaning of some parameters in the code

    I would like to know what unit means in this function. Looking forward to your reply.

    def preprocess_test(motion, meanpose, stdpose, unit=128):
        motion = motion * unit

        motion[1, :, :] = (motion[2, :, :] + motion[5, :, :]) / 2
        motion[8, :, :] = (motion[9, :, :] + motion[12, :, :]) / 2

        start = motion[8, :, 0]
        motion = localize_motion(motion)
        motion = normalize_motion(motion, meanpose, stdpose)

        return motion, start

    opened by zq1335030905 5
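Reading the quoted code, unit is a scalar multiplier applied to every coordinate before anything else, so with the default unit=128 a coordinate of 1.0 becomes 128.0; why 128 is chosen is not stated here. After scaling, joints 1 and 8 are overwritten with the midpoints of joints 2 and 5 and of joints 9 and 12, and start records joint 8 at the first frame. The sketch below reproduces only these first steps. localize_motion and normalize_motion are omitted because their definitions are not shown, and scale_and_fix_center_joints is a hypothetical name:

```python
import numpy as np


def scale_and_fix_center_joints(motion, unit=128):
    """First steps of the quoted preprocess_test on a (15, 2, length) array.

    Returns the scaled motion and `start`, the position of joint 8 in frame 0.
    The input array is not modified.
    """
    motion = np.asarray(motion, dtype=np.float64) * unit  # rescale all coords
    motion[1, :, :] = (motion[2, :, :] + motion[5, :, :]) / 2   # joint 1
    motion[8, :, :] = (motion[9, :, :] + motion[12, :, :]) / 2  # joint 8
    start = motion[8, :, 0]
    return motion, start


m = np.zeros((15, 2, 3))
m[9], m[12] = 1.0, 3.0
scaled, start = scale_and_fix_center_joints(m, unit=2)
print(start)  # [4. 4.]
```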