Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Oral)

Overview

CMT

Code for the paper Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Best Paper Award)

[Paper] [Site]

Directory Structure

  • src/: code of the whole pipeline

    • train.py: training script; takes an npz file of music data as input to train the model

    • model.py: code of the model

    • gen_midi_conditional.py: inference script; takes an npz file (representing a video) as input and generates several songs

    • src/video2npz/: converts a video into an npz file by extracting motion saliency and motion speed

  • dataset/: processed training dataset in npz format

  • logs/: logs automatically generated during training; useful for tracking the training process

  • exp/: checkpoints, named after validation loss (e.g. loss_13_params.pt)

  • inference/: processed videos for inference (.npz) and generated music (.mid)

Preparation

  • clone this repo

  • download lpd_5_prcem_mix_v8_10000.npz from HERE and put it under dataset/

  • download pretrained model loss_8_params.pt from HERE and put it under exp/

  • install ffmpeg=3.2.4

  • prepare a Python3 conda environment

    pip install -r py3_requirements.txt
  • prepare a Python2 conda environment (for extracting visbeat)

    • pip install -r py2_requirements.txt
    • open the visbeat package directory (e.g. anaconda3/envs/XXXX/lib/python2.7/site-packages/visbeat) and replace the original Video_CV.py with src/video2npz/Video_CV.py
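
Below is a minimal sketch of preparing the two conda environments; the environment names and the exact Python 3 minor version are placeholders, so adjust them to your setup:

    # Python 3 environment for training and inference
    conda create -n cmt_py3 python=3.7
    conda activate cmt_py3
    pip install -r py3_requirements.txt

    # Python 2 environment for visbeat extraction
    conda create -n cmt_py2 python=2.7
    conda activate cmt_py2
    pip install -r py2_requirements.txt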

Training

  • If you want to use another training set, convert the training data from MIDI into npz under dataset/:

    python midi2numpy_mix.py --midi_dir /PATH/TO/MIDIS/ --out_name data.npz 
  • train the model

    python train.py -n XXX -g 0 1 2 3
    
    # -n XXX: experiment name; used as the name of the log file and the checkpoints directory. If XXX is 'debug', checkpoints will not be saved
    # -l (--lr): initial learning rate
    # -b (--batch_size): batch size
    # -p (--path): if given, load a model checkpoint from this path
    # -e (--epochs): number of training epochs
    # -t (--train_data): path to the training data (.npz file)
    # -g (--gpus): ids of the GPUs to use
    # other model hyperparameters: modify the source .py files
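
For example, to resume training from an existing checkpoint on a custom dataset, combining the flags documented above (the experiment name, paths, and values below are placeholders):

    python train.py -n my_exp -g 0 1 -b 8 -e 100 -l 0.0001 -t ../dataset/data.npz -p ../exp/loss_13_params.pt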

Inference

  • convert the input video (MP4 format) into npz (use the Python 2 environment); an end-to-end command sketch follows this list

    cd src/video2npz
    sh video2npz.sh ../../videos/xxx.mp4
    • if this step takes a long time, try resizing the video to a lower resolution
  • run the model to generate a .mid file:

    python gen_midi_conditional.py -f "../inference/xxx.npz" -c "../exp/loss_8_params.pt"
    
    # -c: checkpoint to be loaded
    # -f: input npz file
    # -g: id of the GPU (only one GPU is needed for inference)
    • if using another training set, change decoder_n_class in gen_midi_conditional.py to match decoder_n_class in train.py
  • convert midi into audio: use GarageBand (recommended) or midi2audio

    • set the tempo to the tempo value in video2npz/metadata.json
  • combine original video and audio into video with BGM

    ffmpeg -i 'xxx.mp4' -i 'yyy.mp3' -c:v copy -c:a aac -strict experimental -map 0:v:0 -map 1:a:0 'zzz.mp4'
    
    # xxx.mp4: input video
    # yyy.mp3: audio file generated in the previous step
    # zzz.mp4: output video
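
Putting the inference steps together, here is a minimal end-to-end command sketch. File names are placeholders; the midi2audio line assumes the midi2audio command-line tool, FluidSynth, and a default SoundFont are installed (GarageBand, as recommended above, also works for rendering):

    # 1. video -> npz (Python 2 environment)
    cd src/video2npz
    sh video2npz.sh ../../videos/xxx.mp4

    # 2. npz -> midi (Python 3 environment)
    cd ..
    python gen_midi_conditional.py -f "../inference/xxx.npz" -c "../exp/loss_8_params.pt"

    # 3. midi -> audio (with GarageBand instead, set the tempo to the value in video2npz/metadata.json)
    midi2audio ../inference/xxx.mid ../inference/xxx.wav

    # 4. mux the original video with the generated audio (from the repo root)
    cd ..
    ffmpeg -i videos/xxx.mp4 -i inference/xxx.wav -c:v copy -c:a aac -strict experimental -map 0:v:0 -map 1:a:0 xxx_with_bgm.mp4
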
Comments
  • Bugs encountered while using the inference code "gen_midi_conditional.py" in the "src/" folder

    Hi, I encountered some bugs while using "gen_midi_conditional.py" to generate MIDI files for a given video. I set up the Python 2 environment from "py2_requirements.txt" and then used "video2npz.sh" to produce an "xxx.npz" file for the video. But when running "gen_midi_conditional.py" I ran into problems; the program output and error report are pasted below:

    Command I used: python3 gen_midi_conditional.py -f ../inference/LGpwmBqJF1Q_HarryPotter2ChamberOfSecrets.npz -c ../exp/train_exp/loss_70_params.pt

    Code standard print:

    inference
    D_MODEL 512  N_LAYER 12  N_HEAD 8 DECODER ATTN causal-linear
    [18, 3, 18, 129, 18, 6, 27, 102, 5025]
    [*] load model from: ../exp/train_exp/loss_70_params.pt
    new song
    [vlog_npz matrix print here]
    ------ initiate ------
    tensor([[[17, 1, 10, 0, 0, 0, 0, 1, 0]]])

    Error print:

    Traceback (most recent call last):
      File "gen_midi_conditional.py", line 104, in <module>
        generate()
      File "gen_midi_conditional.py", line 85, in generate
        res, err_note_number_list, err_beat_number_list = net(is_train=False, vlog=vlog_npz, C=0.7)
      File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
        return self.module(*inputs, **kwargs)
      File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/model.py", line 483, in forward
        return self.inference_from_scratch(**kwargs)
      File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/model.py", line 341, in inference_from_scratch
        h, y_type = self.forward_hidden(input, is_training=False, init_token=pre_init)
      File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/model.py", line 216, in forward_hidden
        init_emb_linear = self.forward_init_token(init_token)
      File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/model.py", line 162, in forward_init_token
        emb_genre = self.init_emb_genre(x[..., 0])
      File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/group/30042/shansongliu/Projects/VideoMusicRecommend/VideoBGMGenerate/src_mm21_py2/utils.py", line 80, in forward
        return self.lut(x) * math.sqrt(self.d_model)
      File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
        return F.embedding(
      File "/data/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/nn/functional.py", line 2183, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    IndexError: index out of range in self

    The inference code, trained model and data (including original video and processed .npz file) are attached in Google drive. Here is the link: https://drive.google.com/drive/folders/1Ch3jjxZrztKAtEvuEhGjxPk2-G0NSYe0?usp=sharing

    Could you help me check this? Really appreciate it.

    Best regards,

    opened by shansongliu 23
  • some question about 'encoder_hidden'

    Hi, @wzk1015. Thank you for your great work. When I read model.py, I found that when calculating y_type, encoder_hidden[:, 7:, :] is used as input to the linear layer. Why not use the whole encoder_hidden as input instead of encoder_hidden[:, 7:, :]?

    y_type = self.proj_type(encoder_hidden[:, 7:, :])

    opened by LqNoob 13
  • Error(s) in loading state_dict for DataParallel

    Hi! I get an error when running the model to generate .mid (using the mm21_py3 environment). I did not change the training code except for using epochs == 1 and batch size == 1. Do I have to set the batch size to 8? Could you help me solve this problem?

    Traceback (most recent call last):
      File "gen_midi_conditional.py", line 102, in <module>
        generate()
      File "gen_midi_conditional.py", line 58, in generate
        net.load_state_dict(torch.load(path_saved_ckpt))
      File "/root/miniconda3/envs/mm21_py3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for DataParallel:
            size mismatch for module.init_emb_genre.lut.weight: copying a param with shape torch.Size([1, 64]) from checkpoint, the shape in current model is torch.Size([7, 64]).
            size mismatch for module.init_emb_instrument.lut.weight: copying a param with shape torch.Size([1, 64]) from checkpoint, the shape in current model is torch.Size([6, 64]).
    
    opened by yxt7979 8
  • User defined features

    Dear author, I've successfully run the Colab, but is there any cell that defines the genre and instruments you mentioned in the paper? Or does the Colab just select them randomly?

    opened by yellow946821 4
  • What are those parameters in matching score calculation?

    Hi, I'm studying your paper, and congratulations for the excellent work!

    I got some detailed questions:

    • Specifically, what are those input parameters in calculating matching score (equation 16)?
    • Why do you need to add the "1()" function for the strength?
    • If the video's strength is coming from the visual beat, how do you handle the different value ranges of simu-note strength and the saliency?
    • Is the code corresponding to this part released, and if so, where is it?

    Thank you!

    opened by btyu 3
  • The role of 'processed_mid'

    Hi, the function midi_to_mp3 in src/midi2mp3.py includes a variable named 'processed_mid', which as far as I can tell converts the raw MIDI to the tempo specified in video2npz/metadata.json. However, the following fs.midi_to_audio call still takes the original midi_path instead of processed_mid to generate the mp3, and I cannot find any other usage of processed_mid in the rest of the code. So I wonder what the role of processed_mid is, and whether the argument to fs.midi_to_audio should be processed_mid rather than midi_path. Thanks!

    opened by JustinYuu 2
  • Google colab error

    [image] Hello author, I've tried to run the Colab you shared, but at this part I ran into a problem like the one shown in the picture. What's wrong with it? I just followed the steps and ran it. Sincerely, William.

    opened by yellow946821 2
  • TypeError: 'NoneType' object is not callable

    It runs well with the code in 'Quick-start' from https://github.com/idiap/fast-transformers, but I get this error when running train.py with no modification:

    name: train_default
    args Namespace(batch_size='1', epochs=200, gpus=None, lr=0.0001, name='train_default', path=None, train_data='../dataset/lpd_5_prcem_mix_v8_10000.npz')
    num of encoder classes: [  18    3   18  129   18    6   20  102 4865] [7, 1, 6]
    D_MODEL 512  N_LAYER 12  N_HEAD 8 DECODER ATTN causal-linear
    >>>>>: [  18    3   18  129   18    6   20  102 4865]
    DEVICE COUNT: 1
    VISIBLE: 0
    n_parameters: 39,006,324
        train_data: dataset
        batch_size: 1
        num_batch: 3039
        train_x: (3039, 9999, 9)
        train_y: (3039, 9999, 9)
        train_mask: (3039, 9999)
        lr_init: 0.0001
        DECAY_EPOCH: []
        DECAY_RATIO: 0.1
    Traceback (most recent call last):
      File "train.py", line 226, in <module>
        train_dp()
      File "train.py", line 169, in train_dp
        losses = net(is_train=True, x=batch_x, target=batch_y, loss_mask=batch_mask, init_token=batch_init)
      File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/jovyan/work/test-mm21/video-bgm-generation/src/model.py", line 482, in forward
        return self.train_forward(**kwargs)
      File "/home/jovyan/work/test-mm21/video-bgm-generation/src/model.py", line 450, in train_forward
        h, y_type = self.forward_hidden(x, memory=None, is_training=True, init_token=init_token)
      File "/home/jovyan/work/test-mm21/video-bgm-generation/src/model.py", line 221, in forward_hidden
        encoder_hidden = self.transformer_encoder(encoder_pos_emb, attn_mask)
      File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/opt/conda/lib/python3.6/site-packages/fast_transformers/transformers.py", line 138, in forward
        x = layer(x, attn_mask=attn_mask, length_mask=length_mask)
      File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/opt/conda/lib/python3.6/site-packages/fast_transformers/transformers.py", line 81, in forward
        key_lengths=length_mask
      File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/opt/conda/lib/python3.6/site-packages/fast_transformers/attention/attention_layer.py", line 109, in forward
        key_lengths
      File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/opt/conda/lib/python3.6/site-packages/fast_transformers/attention/causal_linear_attention.py", line 101, in forward
        values
      File "/opt/conda/lib/python3.6/site-packages/fast_transformers/attention/causal_linear_attention.py", line 23, in causal_linear
        V_new = causal_dot_product(Q, K, V)
      File "/opt/conda/lib/python3.6/site-packages/fast_transformers/causal_product/__init__.py", line 48, in forward
        product
    TypeError: 'NoneType' object is not callable
    
    opened by xxmlala 2
  • TypeError: 'NoneType' object is not callable

    Hello, according to py3_requirements.txt, I set up a pytorch-1.9.1 environment. But when I tried to run train.py, it returned a TypeError. The details are as follows. If you can give me some suggestions, I would be very grateful.

    name:
    args Namespace(batch_size=2, epochs=200, gpus=None, lr=0.0001, name='', path=None, train_data='../dataset/lpd_5_prcem_mix_v8_10000.npz')
    num of encoder classes: [  18    3   18  129   18    6   20  102 4865] [1 1 1]
    D_MODEL 512  N_LAYER 12  N_HEAD 8 DECODER ATTN causal-linear
    >>>>>: [  18    3   18  129   18    6   20  102 4865]
    DEVICE COUNT: 2
    VISIBLE: 0,1
    n_parameters: 39,005,620
        train_data: dataset
        batch_size: 2
        num_batch: 1519
        train_x: (3039, 9999, 9)
        train_y: (3039, 9999, 9)
        train_mask: (3039, 9999)
        lr_init: 0.0001
        DECAY_EPOCH: []
        DECAY_RATIO: 0.1
    Traceback (most recent call last):
      File "train.py", line 219, in <module>
        train_dp()
      File "train.py", line 162, in train_dp
        losses = net(is_train=True, x=batch_x, target=batch_y, loss_mask=batch_mask, init_token=batch_init)
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
        output.reraise()
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/_utils.py", line 425, in reraise
        raise self.exc_type(msg)
    TypeError: Caught TypeError in replica 0 on device 0.
    Original Traceback (most recent call last):
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
        output = module(*input, **kwargs)
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/bing/CODE/video-bgm-generation/src/model.py", line 482, in forward
        return self.train_forward(**kwargs)
      File "/home/bing/CODE/video-bgm-generation/src/model.py", line 450, in train_forward
        h, y_type = self.forward_hidden(x, memory=None, is_training=True, init_token=init_token)
      File "/home/bing/CODE/video-bgm-generation/src/model.py", line 221, in forward_hidden
        encoder_hidden = self.transformer_encoder(encoder_pos_emb, attn_mask)
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/fast_transformers/transformers.py", line 138, in forward
        x = layer(x, attn_mask=attn_mask, length_mask=length_mask)
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/fast_transformers/transformers.py", line 81, in forward
        key_lengths=length_mask
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/fast_transformers/attention/attention_layer.py", line 109, in forward
        key_lengths
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/fast_transformers/attention/causal_linear_attention.py", line 101, in forward
        values
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/fast_transformers/attention/causal_linear_attention.py", line 23, in causal_linear
        V_new = causal_dot_product(Q, K, V)
      File "/home/bing/anaconda3/envs/torch-1.9/lib/python3.7/site-packages/fast_transformers/causal_product/__init__.py", line 48, in forward
        product
    TypeError: 'NoneType' object is not callable

    Looking forward to your reply!

    opened by binghuang21 2
  • LPD 5 midi2numpy issue

    Hi, I'm very interested in your work and successfully trained it on your provided 'lpd_5_prcem_mix_v8_10000.npz' and 'loss_8_params.pt'. Moreover, I'd like to try new datasets, so I first ran the "midi2numpy_mix.py" script on the original LPD-5 cleaned MIDI files, but I ran into the following problem:

    python midi2numpy_mix.py --midi_dir /home/video-bgm-generation-main/dataset/clean_midi/Zero --out_name data.npz
      0%|          | 0/39 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "midi2numpy_mix.py", line 243, in <module>
        midi2numpy(id_list)
      File "midi2numpy_mix.py", line 196, in midi2numpy
        midi = MIDI(id)
      File "midi2numpy_mix.py", line 149, in __init__
        self.bars = self._get_bars()
      File "midi2numpy_mix.py", line 156, in _get_bars
        bars[note.bar].append(note)
    AttributeError: 'Note' object has no attribute 'bar'

    Thus I am a little confused about how exactly to produce your proposed npz training data. Looking forward to your help! Thank you~

    opened by 2000222 2
  • Objective Evaluation code

    Hello author, how do you evaluate the quality of the generated music? Is there any open-source implementation of the three objective evaluation metrics mentioned in the paper?

    1. Pitch Class Histogram Entropy
    2. Grooving Pattern Similarity
    3. Structureness Indicator
    opened by borishanzju 1
  • Objective Evaluation

    Hello author, I use https://github.com/slSeanWU/MusDr to evaluate the generated MIDI. But after I process the MIDI file to get the pickle file, the MusDr repository also needs a corresponding csv file to compute the objective evaluation values. How do you convert the processed pickle file to csv?

    opened by borishanzju 2
Owner

Zhaokai Wang, undergraduate student from Beihang University