Bailando

Code for CVPR 2022 (oral) paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"

[Paper] | [Project Page] | [Video Demo]

Do not hesitate to give a star!

Driving 3D characters to dance following a piece of music is highly challenging due to the spatial constraints applied to poses by choreography norms. In addition, the generated dance sequence also needs to maintain temporal coherency with different music genres. To tackle these challenges, we propose a novel music-to-dance framework, Bailando, with two powerful components: 1) a choreographic memory that learns to summarize meaningful dancing units from 3D pose sequences into a quantized codebook, and 2) an actor-critic Generative Pre-trained Transformer (GPT) that composes these units into a fluent dance coherent with the music. With the learned choreographic memory, dance generation operates on quantized units that meet high choreography standards, so the generated dancing sequences stay within the spatial constraints. To achieve synchronized alignment between diverse motion tempos and music beats, we introduce an actor-critic-based reinforcement learning scheme to the GPT with a newly designed beat-align reward function. Extensive experiments on the standard benchmark demonstrate that our proposed framework achieves state-of-the-art performance both qualitatively and quantitatively. Notably, the learned choreographic memory is shown to discover human-interpretable dancing-style poses in an unsupervised manner.
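For intuition, below is a simplified sketch of the beat-align idea: motion "beats" are taken as local minima of joint speed, and each music beat earns a reward that decays with its distance to the nearest motion beat. This code is illustrative only; the function name, the local-minimum beat detector, and sigma are our assumptions, not the repo's implementation.

import numpy as np

def beat_align_reward(joint_speed, music_beats, sigma=3.0):
    # motion "beats": frames where mean joint speed hits a local minimum
    motion_beats = [t for t in range(1, len(joint_speed) - 1)
                    if joint_speed[t] <= joint_speed[t - 1]
                    and joint_speed[t] <= joint_speed[t + 1]]
    if not motion_beats or not music_beats:
        return 0.0
    # Gaussian squashing: near-misses between music beats and the nearest
    # motion beat still earn partial reward
    return float(np.mean([
        np.exp(-min(abs(t - b) for b in motion_beats) ** 2 / (2 * sigma ** 2))
        for t in music_beats
    ]))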

Code

Environment

PyTorch == 1.6.0

Data preparation

In our experiments, we use AIST++ for both training and evaluation. Please:

1. Visit here to download the AIST++ annotations and unzip them as the './aist_plusplus_final/' folder.
2. Visit here to download all the original music pieces (wav) into './aist_plusplus_final/all_musics'.
3. Set up the AIST++ API from here.
4. Download the required SMPL models from here, make a folder './smpl', and copy the downloaded male SMPL model (the one with '_m' in its name) to 'smpl/SMPL_MALE.pkl'.

Finally, run

./prepare_aistpp_data.sh

to produce the features for training and testing. Alternatively, if you don't wish to process the data yourself, download our preprocessed features from here and extract them as the ./data folder.
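If you want to sanity-check the layout before running the script, a quick check like the following should pass (this helper is illustrative, not part of the repo; the paths are the ones listed above):

import os

# Expected layout per the data-preparation steps above (illustrative check).
expected = [
    './aist_plusplus_final',             # unzipped AIST++ annotations
    './aist_plusplus_final/all_musics',  # original music pieces (wav)
    './smpl/SMPL_MALE.pkl',              # male SMPL model
]
for path in expected:
    assert os.path.exists(path), f'missing: {path}'
print('Data layout looks OK; now run ./prepare_aistpp_data.sh')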

Training

The training of Bailando comprises 4 steps, in the following order. If you are using the slurm workload manager, you can directly run the corresponding shell scripts. Otherwise, please remove the 'srun' parts. Our models are all trained on a single NVIDIA V100 GPU. A kind reminder: the quantization code does not support multi-GPU training.
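Without slurm, each script should boil down to a direct call to main.py; for example, Step 1 can be run as

python -u main.py --config configs/sep_vqvae.yaml --train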

Step 1: Train pose VQ-VAE (without global velocity)

sh srun.sh configs/sep_vqvae.yaml train [your node name] 1

Step 2: Train the global velocity branch of the pose VQ-VAE

sh srun.sh configs/sep_vavqe_root.yaml train [your node name] 1

Step 3: Train motion GPT

sh srun_gpt_all.sh configs/cc_motion_gpt.yaml train [your node name] 1

Step 4: Actor-Critic finetuning on target music

sh srun_actor_critic.sh configs/actor_critic.yaml train [your node name] 1

Evaluation

To test with our pretrained models, please download the weights from here (Google Drive), or download the four weights separately from [weight 1] | [weight 2] | [weight 3] | [weight 4] (坚果云, i.e. Nutstore), into the ./experiments folder.

1. Generate dancing results

To test the VQ-VAE (with or without global shift, as indicated in your config):

sh srun.sh configs/sep_vqvae.yaml eval [your node name] 1

To test GPT:

sh srun_gpt_all.sh configs/cc_motion_gpt.yaml eval [your node name] 1

To test the final results:

sh srun_actor_critic.sh configs/actor_critic.yaml eval [your node name] 1

2. Dance quality evaluations

After generating the dances in the step above, run the following commands.

Step 1: Extract the (kinetic & manual) features of all AIST++ motions (you only need to do this once):

python extract_aist_features.py

Step 2: Compute the evaluation metrics:

python utils/metrics_new.py

It will show exactly the same values reported in the paper. To speed up the computation, comment out Line 184 of utils/metrics_new.py once the ground-truth features have been computed. To test another folder, change Line 182 to your destination, or kindly modify this code into a non-hard-coded version :)
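A minimal sketch of such a non-hard-coded version (the flag names and wiring are our suggestion; adapt them to how utils/metrics_new.py is actually structured):

import argparse

# Hypothetical CLI wrapper: pass the results folder on the command line
# instead of editing Line 182 by hand, and optionally skip ground-truth
# extraction (Line 184) once it has been computed.
parser = argparse.ArgumentParser()
parser.add_argument('--result_dir', required=True,
                    help='folder with generated dances to evaluate')
parser.add_argument('--skip_gt', action='store_true',
                    help='skip recomputing ground-truth features')
args = parser.parse_args()
# ...then use args.result_dir and args.skip_gt in place of the
# hard-coded path and the unconditional ground-truth extraction.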

Choreography for music in the wild

TODO

Citation

@inproceedings{siyao2022bailando,
    title={Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory},
    author={Siyao, Li and Yu, Weijiang and Gu, Tianpei and Lin, Chunze and Wang, Quan and Qian, Chen and Loy, Chen Change and Liu, Ziwei},
    booktitle={CVPR},
    year={2022}
}

License

Our code is released under the MIT License.

Issues
  • Some file missing in data.zip

    Hi, thanks for sharing your code! I downloaded the data.zip that you released, but I got an error. Checking the data, I found that aistpp_music_feat_7.5fps/mJB4.json is empty. Is this a mistake?

    opened by xljh0520 3
  • Why are the audio features extracted twice, with different sampling rates?

    Hi, Siyao~ Thanks for releasing and cleaning the code!!

    May I ask why, in the pre-processing part, the audio (music) features are extracted twice with different sampling rates?

    Precisely, in _prepro_aistpp.py, the audio features are extracted with sampling rate 15360*2,

    while in _prepro_aistpp_music.py, the audio features are extracted with sampling rate 15360*2/8.
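    One way to reconcile the two numbers (an unverified reading, not from the repo): the 8x ratio between the sampling rates mirrors the 8x temporal downsampling between the 60 fps motion data and the 7.5 fps music features.

    sr_full = 15360 * 2       # 30720, used in _prepro_aistpp.py
    sr_down = 15360 * 2 // 8  # 3840, used in _prepro_aistpp_music.py
    assert sr_full / sr_down == 60 / 7.5  # both ratios are 8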

    opened by XiSHEN0220 2
  • Processed data

    Hi, there is no link to the processed data, even though the README says "Otherwise, directly download our preprocessed feature from here as ./data folder if you don't wish to process the data."

    Can you add the data link? Thanks!

    opened by LaLaLailalai 2
  • Where is the definition of "from .utils.torch_utils import assert_shape"?

    Hi @lisiyao21, thank you for releasing the code! Where is the definition of "from .utils.torch_utils import assert_shape"? https://github.com/lisiyao21/Bailando/blob/27fe2b63896a2e31928b22944bac10455413263e/models/encdec.py#L4

    opened by zhangsanfeng86 2
  • No such file or directory: '/mnt/lustre/share/lisiyao1/original_videos/aistpp-audio/'

    Bailando/utils/functional.py", line 130, in img2video music_names = sorted(os.listdir(audio_dir)) FileNotFoundError: [Errno 2] No such file or directory: '/mnt/lustre/share/lisiyao1/original_videos/aistpp-audio/'

    opened by donghaoye 1
  • training problem

    I met an error at Step 1 when running python -u main.py --config configs/sep_vqvae.yaml --train

    Traceback (most recent call last):
      File "main.py", line 56, in <module>
        main()
      File "main.py", line 40, in main
        agent.train()
      File "/share/yanzhen/Bailando/motion_vqvae.py", line 94, in train
        loss.backward()
      File "/root/anaconda3/envs/workspace/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/root/anaconda3/envs/workspace/lib/python3.8/site-packages/torch/autograd/__init__.py", line 166, in backward
        grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
      File "/root/anaconda3/envs/workspace/lib/python3.8/site-packages/torch/autograd/__init__.py", line 67, in _make_grads
        raise RuntimeError("grad can be implicitly created only for scalar outputs")
    RuntimeError: grad can be implicitly created only for scalar outputs
    

    After printing the loss, it looks like tensor([0.2667, 0.2735, 0.2687, 0.2584, 0.2701, 0.2697, 0.2571, 0.2658], device='cuda:0', grad_fn=<GatherBackward>). So do I need to take a mean or sum operation?

    However, even if I take the mean, the training still seems problematic: the loss decreases normally, but in the eval stage the output quants are all zero. Any suggestions?

    The training log is attached for reference.

    log.txt

    @lisiyao21
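    A likely fix for the first error, sketched under the assumption that the vector-valued loss comes from nn.DataParallel gathering one loss per GPU:

    # reduce the gathered per-GPU losses to a scalar before backward()
    loss = loss.mean()  # or .sum(), depending on the intended scaling
    loss.backward()

    Note the reminder in the Training section above that the quantization code does not support multi-GPU training, so the all-zero quants at eval time may also be a symptom of multi-GPU training; a single-GPU run may avoid both problems.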

    opened by haofanwang 0
  • Loading AIST++ data

    When training the CrossCondGPT2 with the cc_motion_gpt.yaml config and loading the AIST++ data using load_data_aist [line 538] in ./utils/functional.py, I'm confused by the for loop at line 572. Because of the 8x downsample rate, the np_music features are read at 7.5 fps, while np_dance is at 60 fps. However, in the for loop, seq_len is associated with np_music, the step of range(...) is move (which is 8 in the config yaml), the step for np_music is i/music_sample_rate (where music_sample_rate is, I think, also 8), and the step for np_dance is 8. As a result, the music training samples actually step by 1 frame per iteration of the range(...) loop. If the music samples stepped by 1 frame, we should get len(np_music) training samples; but since the step in range(...) is 8, len(np_music) - len(np_music)//8 training samples are left behind. In a nutshell, I think the splitting in the for loop doesn't match.

    opened by Jhc-china 0
  • How to show the dancing character via Mixamo?

    According to the code you provided, I can only get a dancing stickman (skeleton). How can I get a dancing character animation? Is there any relevant code or tutorial for Mixamo?

    Thanks a lot.

    opened by LinghaoChan 0
  • Suggestion on import aist_plusplus

    In https://github.com/lisiyao21/Bailando/blob/125b529b3c596b58b9d70f22c05dc3f04a01895c/extract_aist_features.py#L5 I suggest: from aistplusplus_api.aist_plusplus.loader import AISTDataset

    opened by kevink2424 1