# CMT

Code for the paper *Video Background Music Generation with Controllable Music Transformer* (ACM MM 2021 Best Paper Award).
## Directory Structure

- `src/`: code of the whole pipeline
  - `train.py`: training script; takes an npz file of music data as input to train the model
  - `model.py`: code of the model
  - `gen_midi_conditional.py`: inference script; takes an npz file (representing a video) as input and generates several songs
  - `src/video2npz/`: converts a video into an npz file by extracting motion saliency and motion speed
- `dataset/`: processed dataset for training, in npz format
- `logs/`: logs generated automatically during training; can be used to track the training process
- `exp/`: checkpoints, named after validation loss (e.g. `loss_13_params.pt`)
- `inference/`: processed videos for inference (`.npz`) and generated music (`.mid`)
## Preparation

- clone this repo
- download `lpd_5_prcem_mix_v8_10000.npz` from HERE and put it under `dataset/`
- download the pretrained model `loss_8_params.pt` from HERE and put it under `exp/`
- install `ffmpeg=3.2.4`
- prepare a Python 3 conda environment:

  ```shell
  pip install -r py3_requirements.txt
  ```

- prepare a Python 2 conda environment (for extracting visbeat):

  ```shell
  pip install -r py2_requirements.txt
  ```

  - open the `visbeat` package directory (e.g. `anaconda3/envs/XXXX/lib/python2.7/site-packages/visbeat`) and replace the original `Video_CV.py` with `src/video2npz/Video_CV.py` (a scripted way to do this is sketched after this list)
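If you prefer to script the `Video_CV.py` replacement above, the sketch below locates the installed `visbeat` package and copies the patched file over it. This is a minimal sketch, assuming it runs from the repo root inside the activated Python 2 environment; it works regardless of the conda env name, since the package path is taken from the import itself.

```python
# Minimal sketch: overwrite visbeat's Video_CV.py with the patched version.
# Run from the repo root, inside the activated Python 2 conda environment.
import os
import shutil

import visbeat

# locate the installed package (e.g. .../site-packages/visbeat)
visbeat_dir = os.path.dirname(visbeat.__file__)

# replace the original Video_CV.py with the patched copy shipped in this repo
shutil.copy('src/video2npz/Video_CV.py', os.path.join(visbeat_dir, 'Video_CV.py'))
print('patched ' + os.path.join(visbeat_dir, 'Video_CV.py'))
```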
## Training

- if you want to use another training set, convert the training data from MIDI into npz under `dataset/` (a quick sanity check for the converted file is sketched after this list):

  ```shell
  python midi2numpy_mix.py --midi_dir /PATH/TO/MIDIS/ --out_name data.npz
  ```

- train the model:

  ```shell
  python train.py -n XXX -g 0 1 2 3

  # -n XXX: the name of the experiment; used to name the log file and the checkpoint directory. If XXX is 'debug', checkpoints will not be saved
  # -l (--lr): initial learning rate
  # -b (--batch_size): batch size
  # -p (--path): if used, load model checkpoint from the given path
  # -e (--epochs): number of epochs in training
  # -t (--train_data): path of the training data (.npz file)
  # -g (--gpus): ids of gpus
  # other model hyperparameters: modify the source .py files
  ```
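As referenced above, here is a quick sanity check for a converted training file. This is a minimal sketch that makes no assumption about the key names `midi2numpy_mix.py` writes; it simply enumerates whatever arrays `data.npz` contains.

```python
# Minimal sketch: list the arrays stored in a converted .npz training file.
# Key names depend on what midi2numpy_mix.py writes, so enumerate them
# instead of assuming a particular schema.
import numpy as np

data = np.load('dataset/data.npz', allow_pickle=True)
for key in data.files:
    arr = data[key]
    print(key, getattr(arr, 'shape', None), getattr(arr, 'dtype', None))
```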
## Inference

- convert the input video (MP4 format) into npz (use the Python 2 environment):

  ```shell
  cd src/video2npz
  sh video2npz.sh ../../videos/xxx.mp4
  ```

  - try resizing the video if this step takes a long time

- run the model to generate a `.mid` file:

  ```shell
  python gen_midi_conditional.py -f "../inference/xxx.npz" -c "../exp/loss_8_params.pt"

  # -c: checkpoint to be loaded
  # -f: input npz file
  # -g: id of gpu (only one gpu is needed for inference)
  ```

  - if using another training set, change `decoder_n_class` in `gen_midi_conditional.py` to the `decoder_n_class` in `train.py`

- convert the MIDI into audio: use GarageBand (recommended) or midi2audio (a scripted midi2audio alternative is sketched after this list)

  - set the tempo to the value of `tempo` in `video2npz/metadata.json`

- combine the original video and the audio into a video with BGM:

  ```shell
  ffmpeg -i 'xxx.mp4' -i 'yyy.mp3' -c:v copy -c:a aac -strict experimental -map 0:v:0 -map 1:a:0 'zzz.mp4'

  # xxx.mp4: input video
  # yyy.mp3: audio file generated in the previous step
  # zzz.mp4: output video
  ```
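As an alternative to setting the tempo by hand in GarageBand, the sketch below writes the `tempo` value from `metadata.json` into the generated MIDI and then renders it with midi2audio. midi2audio itself has no tempo option, so the sketch uses mido to rewrite the tempo first; mido, the file paths, and the SoundFont location are assumptions, not part of this repo (a FluidSynth install and a `.sf2` SoundFont are required).

```python
# Minimal sketch: stamp the detected tempo onto the generated MIDI, then
# render it to audio with midi2audio/FluidSynth. Paths and the SoundFont
# location are placeholders; adjust them to your layout.
import json

import mido
from midi2audio import FluidSynth

# tempo detected by the video2npz step (key name per this README)
with open('src/video2npz/metadata.json') as f:
    tempo_bpm = json.load(f)['tempo']

mid = mido.MidiFile('inference/xxx.mid')
tempo_us = mido.bpm2tempo(tempo_bpm)  # BPM -> microseconds per beat

# drop any existing set_tempo messages, preserving delta times
for track in mid.tracks:
    carried, kept = 0, []
    for msg in track:
        if msg.type == 'set_tempo':
            carried += msg.time  # fold its delta into the next message
        else:
            msg.time += carried
            carried = 0
            kept.append(msg)
    track[:] = kept

# insert the desired tempo at the very beginning
mid.tracks[0].insert(0, mido.MetaMessage('set_tempo', tempo=tempo_us, time=0))
mid.save('inference/xxx_tempo.mid')

# requires FluidSynth and a SoundFont (.sf2) on disk
FluidSynth('/path/to/soundfont.sf2').midi_to_audio('inference/xxx_tempo.mid',
                                                   'inference/xxx.wav')
```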