Single-Shot Motion Completion with Transformer

FuxiCV

Last update: Dec 29, 2022

Related tags

Deep Learning SSMCT

Overview

Single-Shot Motion Completion with Transformer

👉 [Preprint] 👈

Abstract

Motion completion is a challenging and long-discussed problem, which is of great significance in film and game applications. For different motion completion scenarios (in-betweening, in-filling, and blending), most previous methods deal with the completion problems with case-by-case designs. In this work, we propose a simple but effective method to solve multiple motion completion problems under a unified framework and achieves a new state of the art accuracy under multiple evaluation settings. Inspired by the recent great success of attention-based models, we consider the completion as a sequence to sequence prediction problem. Our method consists of two modules - a standard transformer encoder with self-attention that learns long-range dependencies of input motions, and a trainable mixture embedding module that models temporal information and discriminates key-frames. Our method can run in a non-autoregressive manner and predict multiple missing frames within a single forward propagation in real time. We finally show the effectiveness of our method in music-dance applications.

State-of-the-art on Lafan1 dataset

With the help of Transformer, we achieve a new SOTA result on Lafan1 dataset.

Lengths = 30	L2Q	L2P	NPSS
Zero-Vel	1.51	6.60	0.2318
Interp.	0.98	2.32	0.2013
ERD-QV	0.69	1.28	0.1328
Ours	0.61	1.10	0.1222

Some results (blue appearaces represent keyframes):

Dance Infilling on Anidance Dataset

We also evaluate our method on the Anidance dataset:

Infilling on the test set (black skeletons are the keyframes):

(From Left to Right: Ours, Interp. and Ground Truth)

Infilling on random keyframes (keyframes are randomly chosen from the test set with a random order for simulating in-the-wild scenario):

(From Left to Right: Ours, Interp. and Ground Truth)

Dance blending

Our method can also work on complex dance movement completion:

Code

Coming soon

Citation

@misc{duan2021singleshot,
      title={Single-Shot Motion Completion with Transformer}, 
      author={Yinglin Duan and Tianyang Shi and Zhengxia Zou and Yenan Lin and Zhehui Qian and Bohan Zhang and Yi Yuan},
      year={2021},
      eprint={2103.00776},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

You might also like...

Code associated with the paper "Deep Optics for Single-shot High-dynamic-range Imaging"

Deep Optics for Single-shot High-dynamic-range Imaging Code associated with the paper "Deep Optics for Single-shot High-dynamic-range Imaging" CVPR, 2

40 Dec 12, 2022

A PyTorch Implementation of Single Shot MultiBox Detector

SSD: Single Shot MultiBox Object Detector, in PyTorch A PyTorch implementation of Single Shot MultiBox Detector from the 2016 paper by Wei Liu, Dragom

4.8k Jan 7, 2023

Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022) Paper | Demo Requirements Python = 3.6 , Pytorch

84 Jan 3, 2023

The personal repository of the work: DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer.

DanceNet3D The personal repository of the work: DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer. Dataset and Results Pleas

36 Dec 21, 2022

Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

ACTOR Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021. Please visit our we

248 Dec 23, 2022

Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

Restormer: Efficient Transformer for High-Resolution Image Restoration Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan,

906 Dec 30, 2022

Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch

Comments

question about visualization

How to show the human in a 3D world? do you use Unity as the tool? and how to bind the output data to the 3D software parameters for presenting the motion?(euler angle? quaternion or something else) Thank you ~

opened by gravitychen 0

Single-Shot Motion Completion with Transformer

Related tags

Overview

Single-Shot Motion Completion with Transformer

Abstract

State-of-the-art on Lafan1 dataset

Dance Infilling on Anidance Dataset

Dance blending

Code

Citation

You might also like...

Code associated with the paper "Deep Optics for Single-shot High-dynamic-range Imaging"

A PyTorch Implementation of Single Shot MultiBox Detector

Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

The personal repository of the work: DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer.

Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch

Byte-based multilingual transformer TTS for low-resource/few-shot language adaptation.

Official code for "Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer. ICCV2021".

Comments

question about visualization

Owner

FuxiCV

Exploring Versatile Prior for Human Motion via Motion Frequency Guidance (3DV2021)

Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

[ICCV 2021 Oral] SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

Code for the SIGIR 2022 paper "Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion"

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

A minimal solution to hand motion capture from a single color camera at over 100fps. Easy to use, plug to run.

VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

SSD: Single Shot MultiBox Detector pytorch implementation focusing on simplicity

FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction

A PyTorch Implementation of Single Shot Scale-invariant Face Detector.

Single-Shot Motion Completion with Transformer

Related tags

Overview

Single-Shot Motion Completion with Transformer

Abstract

State-of-the-art on Lafan1 dataset

Dance Infilling on Anidance Dataset

Dance blending

Code

Citation

You might also like...

Code associated with the paper "Deep Optics for Single-shot High-dynamic-range Imaging"

A PyTorch Implementation of Single Shot MultiBox Detector

Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

The personal repository of the work: *DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer*.

Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch

Byte-based multilingual transformer TTS for low-resource/few-shot language adaptation.

Official code for "Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer. ICCV2021".

Comments

question about visualization

Owner

FuxiCV

Exploring Versatile Prior for Human Motion via Motion Frequency Guidance (3DV2021)

Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

[ICCV 2021 Oral] SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

Code for the SIGIR 2022 paper "Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion"

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

A minimal solution to hand motion capture from a single color camera at over 100fps. Easy to use, plug to run.

VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

SSD: Single Shot MultiBox Detector pytorch implementation focusing on simplicity

FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction

A PyTorch Implementation of Single Shot Scale-invariant Face Detector.

The personal repository of the work: DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer.