Multi-modal Content Creation Model Training Infrastructure including the FACT model (AI Choreographer) implementation.

Overview

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++ [ICCV-2021].

This package contains the model implementation and training infrastructure of our AI Choreographer.

Get started

Pull the code

git clone https://github.com/liruilong940607/mint --recursive

Note: --recursive is important here, as it also clones the orbit submodule automatically.
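
If you already cloned without --recursive, you can still fetch the submodule afterwards with the standard git command (not specific to this repo):

git submodule update --init --recursive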

Install dependencies

conda create -n mint python=3.7
conda activate mint
conda install protobuf numpy
pip install tensorflow absl-py tensorflow-datasets librosa

sudo apt-get install libopenexr-dev
pip install --upgrade OpenEXR
pip install tensorflow-graphics tensorflow-graphics-gpu

git clone https://github.com/arogozhnikov/einops /tmp/einops
cd /tmp/einops/ && pip install . -U

git clone https://github.com/google/aistplusplus_api /tmp/aistplusplus_api
cd /tmp/aistplusplus_api && pip install -r requirements.txt && pip install . -U

Note: if you run into numpy version conflicts, try pip install numpy==1.20.
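
As a quick sanity check that the installation succeeded, you can try importing the key packages in one line (the module names below, in particular aist_plusplus, are assumed from the packages installed above):

python -c "import tensorflow, tensorflow_graphics, librosa, einops, aist_plusplus"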

Get the data

See the website

Get the checkpoint

Download the checkpoints from Google Drive here, and put them in the folder ./checkpoints/.

Run the code

  1. Compile the protocol buffers
protoc ./mint/protos/*.proto
  2. Preprocess the dataset into TFRecords
python tools/preprocessing.py \
    --anno_dir="/mnt/data/aist_plusplus_final/" \
    --audio_dir="/mnt/data/AIST/music/" \
    --split=train
python tools/preprocessing.py \
    --anno_dir="/mnt/data/aist_plusplus_final/" \
    --audio_dir="/mnt/data/AIST/music/" \
    --split=testval
  3. Run training
python trainer.py --config_path ./configs/fact_v5_deeper_t10_cm12.config --model_dir ./checkpoints

Note: you may want to lower the batch_size in the config file if you run into out-of-memory issues.

  4. Run testing and evaluation
# caching the generated motions (seed included) to `./outputs`
python evaluator.py --config_path ./configs/fact_v5_deeper_t10_cm12.config --model_dir ./checkpoints
# calculate FIDs
python tools/calculate_scores.py
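
The generated motions are cached to ./outputs as .npy files. Below is a minimal sketch (it needs numpy and scipy) for loading them and converting the per-joint rotation matrices back to axis-angle, e.g. as a starting point for SMPL-based visualization. The feature layout it assumes, the first 9 values of each frame followed by one flattened 3x3 rotation matrix per joint, is taken from tools/preprocessing.py and the inference comment further down, not from official documentation.

import glob
import numpy as np
from scipy.spatial.transform import Rotation as R

for path in sorted(glob.glob("./outputs/*.npy")):
    motion = np.load(path)               # [num_frames, feature_dim]
    num_frames = motion.shape[0]
    rot_flat = motion[:, 9:]             # drop the leading 9 values of each frame
    num_joints = rot_flat.shape[1] // 9  # one flattened 3x3 rotation matrix per joint
    rot_mats = rot_flat.reshape(num_frames * num_joints, 3, 3)
    axis_angle = R.from_matrix(rot_mats).as_rotvec().reshape(num_frames, num_joints, 3)
    print(path, motion.shape, "->", axis_angle.shape)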

Citation

@inproceedings{li2021dance,
  title={AI Choreographer: Music Conditioned 3D Dance Generation with AIST++},
  author={Ruilong Li and Shan Yang and David A. Ross and Angjoo Kanazawa},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  year = {2021}
}
Comments
  • How to transfer .npy file to video

    Thanks for your nice work. I ran evaluator.py and the output files are .npy. Could you please give me some suggestions on converting the output files to video?

    opened by loserZhang 13
  • Visualization with 3D character

    Hi, thanks for your nice work. I'm new to 3D visualization. Just curious how you visualize the generated 3D motion with a character from Mixamo. Do you use Blender or something similar? Could you point me to some helpful websites or resources?

    Thanks in advance.

    opened by EricGuo5513 10
  • Inference - some output angles seem wrong

    Visualizing the output .npy files of evaluator.py (following the README, with the provided checkpoints), it looks like some of the angles (e.g. shoulders) are wrong, maybe flipped. In this clip, green is the original sample from AIST++ and red is MINT inference.

    To visualize, I implemented the inverse of the operation described here, then used Blender's SMPL-X add-on. Here's my code:

        # Assumed context: mint_data is the loaded .npy output, seq_len its frame count,
        # joint_dim = mint_data.shape[1], and NUM_SMPLX_BODYJOINTS = 21 (SMPL-X body joints).
        from scipy.spatial.transform import Rotation as R

        rotations = mint_data[:seq_len, 9:] # trim first 9 entries according to https://github.com/google-research/mint - mint/tools/preprocessing.py +161
        rotations = rotations.reshape([-1, 3, 3])
        rotations = R.from_matrix(rotations).as_rotvec().reshape([seq_len, (joint_dim-1)//9, 3])
        body_pose = rotations[:, :NUM_SMPLX_BODYJOINTS]  # FIXME - not sure about that (trimming last 3 joints from smpl 24, to smplx 21)
    
    opened by GuyTevet 5
  • bvh_writer instruction

    Hi Ruilong,

    Would it be possible to provide instructions for bvh_writer? I couldn't find the skeleton_csv_filename and joints_to_ignore_csv_filename files.

    Best, Wenjie

    opened by YIN95 4
  • about the crossmodal_train.txt

    Sorry, I've run into a problem: where can I get the file crossmodal_train.txt? In this code it seems to be expected under /mnt/data/AIST/music/, but I can't find where to download it.

    opened by heheal 1
  • TypeError: The `filenames` argument must contain `tf.string` elements. Got `tf.float32` elements.

    I completed the data preprocessing and got the tfrecord files; however, when I tried to train the model, this error occurred. It seems to be a problem with tf.data.TFRecordDataset. Thank you in advance!

    opened by wang-zm18 0
  • Something wrong with fast_processing in inputs_util.py?

    https://github.com/google-research/mint/blob/b8f8bdfbbe3fbfa67831a2ef9bcf71a6a9e74552/mint/utils/inputs_util.py#L101-L103

    It should be:

    example["audio_input"] = example["audio_sequence"][start * audio_sample_rate:start * audio_sample_rate + 
                                                       audio_input_length, :] 
    
    opened by hzxie 0
  • Making the AI Choreographer work in colab

    Hello, is there any guide, or anything to note, for running the AI Choreographer on Colab? I already get a few errors when trying to install the dependencies. I would be grateful for any help running this on Colab.

    opened by fromglow 0
  • Why does the seq_name list repeat 10 times in preprocessing.py?

    Hi authors, thank you for your nice work! I am studying your FID calculation code. I found that at line 170 of preprocessing.py the seq_name list is repeated 10 times, so the evaluator runs 10 times over the testval set. I don't understand this design. https://github.com/google-research/mint/blob/b8f8bdfbbe3fbfa67831a2ef9bcf71a6a9e74552/tools/preprocessing.py#L170

    And can I reproduce the FID scores in your paper with the provided checkpoints?

    Thank you! Best

    opened by by2101 0
  • About the eval result

    My eval result using the given checkpoint contains only a few seconds of valid dance; after that, the result becomes complete nonsense. Is this expected, or did I do something wrong?

    opened by nesciemus 4
  • In calculate_beat_scores.py, what should be the result_files?

    Hi. After evaluation, I tried to run calculate_beat_scores.py. However, the default result_files is '/mnt/data/aist_paper_results/*.pkl', which doesn't work. Could anyone tell me how to generate the motions and replace the default result files? Thank you very much!

    opened by mingdianliu 2
  • The loss keeps decreasing, but FID_k keeps increasing? (TensorFlow & PyTorch)

    I re-implemented a version in PyTorch and added a validation step. I found that the best FID_k, 101, came at epoch 21; the longer I trained after that, the larger and more volatile FID_k became, and I don't know why. The loss drops to around 0.0002 and then essentially stops converging, while FID_k can reach over 7000 and FID_g sits around 25. Training from scratch with the original authors' TensorFlow code, the loss drops to 0.0001 and then essentially stops converging; FID_k reaches over 700 and FID_g around 30, so I also completely fail to reproduce the authors' released pretrained model. @liruilong940607

    TF: 2.3 cuda: 10.1

    pytorch: 1.9.1 cuda: 10.1

    Progress update: the PyTorch re-implementation now converges and reproduces the original authors' metrics. The solution was as follows:

    1. Keep the initialization of every layer consistent with the TF version; carefully check each layer's default initializer.
    2. The training set has only 952 videos, so with multi-GPU training and batch_size set to 32, each epoch contains only a few iterations. The suggested fix is to replicate the loaded file list 10x or 20x, so each epoch runs 10x or 20x as many iterations.
    3. Train for enough epochs. With the training list replicated 10x or 20x, the loss needs to converge to roughly 0.00011, which takes at least 800 epochs: lr=1e-4 for epochs 0-200, lr=1e-5 for epochs 200-500, and lr=1e-6 for epochs 500-800 (a minimal sketch of this schedule follows this comment).

    In addition, I added a 2-layer bidirectional LSTM after the last CrossTransformer and before the final fc layer. With this, the loss converges to around 0.00011 and the best FID_k reaches 22.3, even smaller than the authors' reported number. But a small metric does not mean the generated dances look good; the results are similar to the authors' released model, with only a few sequences looking decent and the rest not great.

    opened by CuberFan 18
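
For reference, here is a minimal PyTorch sketch of the learning-rate schedule described in the comment above (lr=1e-4 for epochs 0-200, 1e-5 for 200-500, 1e-6 for 500-800). The model and optimizer are placeholders, and the schedule reflects the commenter's recipe rather than the official TensorFlow training setup.

import torch

# Placeholder model/optimizer; substitute a FACT re-implementation and its parameters.
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# 1e-4 for epochs 0-200, then 1e-5 until epoch 500, then 1e-6, as described above.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[200, 500], gamma=0.1)

for epoch in range(800):
    # ... run one epoch over the (optionally 10x-20x replicated) training list here ...
    optimizer.step()   # stands in for the per-batch optimizer updates
    scheduler.step()   # advance the schedule once per epoch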