MoCoGAN: Decomposing Motion and Content for Video Generation

Overview

This repository contains an implementation and further details of MoCoGAN: Decomposing Motion and Content for Video Generation by Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, and Jan Kautz.

CVPR Poster:

Representation

MoCoGAN is a generative model for videos: it generates videos from random inputs. It features separated representations of motion and content, offering control over what is generated. For example, MoCoGAN can generate the same object performing different actions, as well as the same action performed by different objects.

MoCoGAN Representation
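
In MoCoGAN, each video is generated from a single content code that is held fixed across frames, while a recurrent network maps a sequence of noise vectors to per-frame motion codes. A minimal sketch of this sampling scheme in PyTorch (the dimensions follow the default training flags used in this repository; the class and method names are illustrative, not the repository's exact API):

    import torch
    import torch.nn as nn

    class LatentSampler(nn.Module):
        """Illustrative sketch of MoCoGAN's motion/content decomposition (not the repo's exact code)."""

        def __init__(self, dim_content=50, dim_motion=10):
            super().__init__()
            self.dim_content = dim_content
            self.dim_motion = dim_motion
            # A GRU cell turns i.i.d. noise into a correlated sequence of motion codes.
            self.recurrent = nn.GRUCell(dim_motion, dim_motion)

        def forward(self, num_videos, video_len):
            # One content code per video, repeated for every frame.
            z_content = torch.randn(num_videos, self.dim_content)
            z_content = z_content.unsqueeze(1).expand(-1, video_len, -1)

            # One motion code per frame, produced by unrolling the GRU over noise.
            h = torch.zeros(num_videos, self.dim_motion)
            z_motion = []
            for _ in range(video_len):
                e = torch.randn(num_videos, self.dim_motion)
                h = self.recurrent(e, h)
                z_motion.append(h)
            z_motion = torch.stack(z_motion, dim=1)

            # Each frame is generated from [content, motion_t]; fixing one factor
            # while resampling the other yields "same identity, different motion"
            # or "different identities, same motion".
            return torch.cat([z_content, z_motion], dim=2)

    z = LatentSampler()(num_videos=4, video_len=16)   # (4, 16, 60) latent codes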

Examples of generated videos

We trained MoCoGAN on the MUG Facial Expression Database to generate facial expressions. When fixing the content code and changing the motion code, it generated videos of the same person performing different expressions. When fixing the motion code and changing the content code, it generated videos of different people performing the same expression. In the figure shown below, each column has a fixed identity, and each row shows the same action:

Facial expressions

We trained MoCoGAN on a human action dataset where content is represented by the performer executing several actions. When fixing the content code and changing the motion code, it generated videos of the same person performing different actions. When fixing the motion code and changing the content code, it generated videos of different people performing the same action. Each pair of images represents the same action executed by different people:

Human actions

We have collected a large-scale TaiChi dataset including 4.5K videos of TaiChi performers. Below are videos generated by MoCoGAN.

TaiChi

Training MoCoGAN

Please refer to the wiki page for detailed training instructions.
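
For reference, a training invocation looks like the one reported in the issues below (the flags and dataset path are an example, not the only supported configuration; see the wiki for the full instructions):

    python train.py \
        --image_batch 32 \
        --video_batch 32 \
        --use_infogan \
        --use_noise \
        --noise_sigma 0.1 \
        --image_discriminator PatchImageDiscriminator \
        --video_discriminator CategoricalVideoDiscriminator \
        --print_every 100 \
        --every_nth 2 \
        --dim_z_content 50 \
        --dim_z_motion 10 \
        --dim_z_category 4 \
        ../data/actions ../logs/actions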

Citation

If you use MoCoGAN in your research, please cite our paper:

Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz, "MoCoGAN: Decomposing Motion and Content for Video Generation"

@inproceedings{Tulyakov:2018:MoCoGAN,
 title={{MoCoGAN}: Decomposing motion and content for video generation},
 author={Tulyakov, Sergey and Liu, Ming-Yu and Yang, Xiaodong and Kautz, Jan},
 booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
 pages = {1526--1535},
 year={2018}
}

Other implementations:

  1. Alternative pytorch implementation
  2. Chainer implementation
Comments
  • Inception Score on UCF101

    Inception Score on UCF101

    Hi,

    I am trying to reproduce the Inception Score results on the UCF101 dataset. Could you please point out which model and parameters (number of generated videos, splits) were used for the stated result? Did you use the implementation from the TGAN paper or another repository?

    Thanks in advance!

    opened by VladYushchenko 9
  • A Question regarding generate_videos.py

    A Question regarding generate_videos.py

    Dear the author of MocoGAN:

    I am deeply impressed about your fantastic work. And I really appreciate that you've opened the source code of this project.

    I have a small problem when using the generate_videos.py file. After I trained the model, I ran

    "python generate_videos.py --num_videos 10 --output_format gif --number_of_frames 16 ../logs/actions/generator_21700.pytorch output"

    and the following error occurred:


    Traceback (most recent call last):
      File "generate_videos.py", line 61, in <module>
        v, _ = generator.sample_videos(1, int(args['--number_of_frames']))
      File "/mocogan/src/models.py", line 268, in sample_videos
        z, z_category_labels = self.sample_z_video(num_samples, video_len)
      File "/mocogan/src/models.py", line 259, in sample_z_video
        z_motion = self.sample_z_m(num_samples, video_len)
      File "/mocogan/src/models.py", line 224, in sample_z_m
        h_t.append(self.recurrent(e_t, h_t[-1]))
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 224, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py", line 682, in forward
        self.bias_ih, self.bias_hh,
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py", line 49, in GRUCell
        gi = F.linear(input, w_ih)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 555, in linear
        output = input.matmul(weight.t())
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 560, in matmul
        return torch.matmul(self, other)
      File "/usr/local/lib/python2.7/dist-packages/torch/functional.py", line 173, in matmul
        return torch.mm(tensor1, tensor2)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 579, in mm
        return Addmm.apply(output, self, matrix, 0, 1, True)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/blas.py", line 26, in forward
        matrix1, matrix2, out=output)
    TypeError: torch.addmm received an invalid combination of arguments - got (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor), but expected one of:

    • (torch.cuda.FloatTensor source, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
    • (torch.cuda.FloatTensor source, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
    • (float beta, torch.cuda.FloatTensor source, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
    • (torch.cuda.FloatTensor source, float alpha, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
    • (float beta, torch.cuda.FloatTensor source, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
    • (torch.cuda.FloatTensor source, float alpha, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
    • (float beta, torch.cuda.FloatTensor source, float alpha, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out) didn't match because some of the arguments have invalid types: (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor)
    • (float beta, torch.cuda.FloatTensor source, float alpha, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out) didn't match because some of the arguments have invalid types: (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor)


    I think there must be some mistake I made, but could you look into it and give me any clue?
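
    For what it's worth, the type mix in the error above (a torch.cuda.FloatTensor multiplied with a plain torch.FloatTensor inside the GRUCell) suggests that the sampled noise and the recurrent weights ended up on different devices. A rough workaround sketch, assuming the loaded checkpoint and everything it samples should live on the GPU (this is an illustration, not the repository's documented fix):

      import torch

      # Hypothetical workaround: make sure every submodule of the loaded generator
      # (including the GRUCell that produces motion codes) is on the same device
      # as the noise it consumes before sampling.
      generator = torch.load('../logs/actions/generator_21700.pytorch')
      generator = generator.cuda()
      generator.eval()
      videos, _ = generator.sample_videos(1, 16)   # path and frame count taken from the command above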

    opened by TheIllusion 7
  • got segmentation fault when tried to run train.py

    got segmentation fault when tried to run train.py

    I followed the steps from the wiki and built the environment manually. I am not using Docker. I got a segmentation fault (core dumped) error while running this command:

      python train.py \
      --image_batch 32 \
      --video_batch 32 \
      --use_infogan \
      --use_noise \
      --noise_sigma 0.1 \
      --image_discriminator PatchImageDiscriminator \
      --video_discriminator CategoricalVideoDiscriminator \
      --print_every 100 \
      --every_nth 2 \
      --dim_z_content 50 \
      --dim_z_motion 10 \
      --dim_z_category 4 \
      ../data/actions ../logs/actions

    Can you please help me with this?

    opened by maniyar2jaimin 6
  • Got segmentation fault(core dumped) when tried to run train.py

    Got segmentation fault(core dumped) when tried to run train.py

    I have not used Docker and installed the dependencies with pip as instructed in the wiki. The error actually occurs in the loggers file while importing tensorflow; I found this out while debugging. Please provide a solution @sergeytulyakov. Segmentation fault (core dumped): train.py calls trainers.py, which calls loggers.py. #6

    opened by vipulbjj 5
  • RuntimeError: arguments are located on different GPUs

    RuntimeError: arguments are located on different GPUs

    Hi, I was running your code on my GPUs and this error occurred. I tried using a single GPU, but the problem still exists. I was wondering if you know how to solve it?

    opened by Fanny-Yuan 4
  • Code for Image-to-video Translation

    Code for Image-to-video Translation

    Dear author,

    You have mentioned that the MoCoGAN model can be adapted to the image-to-video task. Would you mind sharing your implementation?

    Thank you so much

    opened by zyong812 3
  • Issue on - executing (MoCoGAN Paper)

    Issue on - executing (MoCoGAN Paper)

    Hi, I am using Python 3.6.1 :: Anaconda custom (64-bit) and Ubuntu 14.04.5 LTS. I am trying to execute the code at https://github.com/sergeytulyakov/mocogan, following the steps at https://github.com/sergeytulyakov/mocogan/wiki/Training-MoCoGAN.

    I am getting the error below while executing the code. Would you please help?

    Training:

    • Executed the below command from the command line:

      shiba@shiba:~/Downloads/mocogan-master/src$ python train.py \
      --image_batch 32 \
      --video_batch 32 \
      --use_infogan \
      --use_noise \
      --noise_sigma 0.1 \
      --image_discriminator PatchImageDiscriminator \
      --video_discriminator CategoricalVideoDiscriminator \
      --print_every 100 \
      --every_nth 2 \
      --dim_z_content 50 \
      --dim_z_motion 10 \
      --dim_z_category 4 \
      ../data/actions ../logs/actions

    The output is:

      /home/shiba/anaconda3/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
        return f(*args, **kwds)
      {'--batches': '100000', '--dim_z_category': '4', '--dim_z_content': '50', '--dim_z_motion': '10', '--every_nth': '2', '--image_batch': '32', '--image_dataset': '', '--image_discriminator': 'PatchImageDiscriminator', '--image_size': '64', '--n_channels': '3', '--noise_sigma': '0.1', '--print_every': '100', '--use_categories': False, '--use_infogan': True, '--use_noise': True, '--video_batch': '32', '--video_discriminator': 'CategoricalVideoDiscriminator', '--video_length': '16', '<dataset>': '../data/actions', '<log_folder>': '../logs/actions'}
      Traceback (most recent call last):
        File "train.py", line 104, in <module>
          dataset = data.VideoFolderDataset(args['<dataset>'], cache=os.path.join(args['<dataset>'], 'local.db'))
        File "/home/shiba/Downloads/mocogan-master/src/data.py", line 24, in __init__
          self.images, self.lengths = pickle.load(f)
      TypeError: a bytes-like object is required, not 'str'
      *** Error in `python': double free or corruption (!prev): 0x0000000000bfda20 ***
      Aborted (core dumped)

    opened by ShibaPrasad 3
  • Dataset Formatter

    Dataset Formatter

    Is there any code that converts an mp4 (or another video format) into the 2-dimensional JPGs used as training input that could be included or referenced?
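
    There does not appear to be a converter bundled here, but judging from the VideoFolderDataset code quoted in the "EOFError: Ran out of input" issue below (it divides the longer image side by the shorter one to get a frame count), the loader seems to expect each video as a single image whose square frames are stacked along one axis. A rough conversion sketch under that assumption, with hypothetical file names:

      import cv2
      import numpy as np

      def video_to_strip(video_path, out_path, size=64):
          # Stack every frame of a video vertically into one tall JPEG of square frames.
          # The strip format is inferred from VideoFolderDataset, not a documented spec.
          cap = cv2.VideoCapture(video_path)
          frames = []
          while True:
              ok, frame = cap.read()
              if not ok:
                  break
              frames.append(cv2.resize(frame, (size, size)))
          cap.release()
          if frames:
              cv2.imwrite(out_path, np.vstack(frames))

      video_to_strip('example.mp4', 'example.jpg')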

    opened by jhawgs 2
  • Invariable Image Size

    Invariable Image Size

    I have been working with the model, and I am trying to generate images of size 128x128. I changed the --image_size option to 128. For reference, here is the full command.

    $ python3 train.py  \
          --image_batch 32 \
          --video_batch 32 \
          --use_noise \
          --noise_sigma 0.1 \
          --image_discriminator PatchImageDiscriminator \
          --video_discriminator PatchVideoDiscriminator \
          --print_every 100 \
          --every_nth 2 \
          --dim_z_content 50 \
          --dim_z_motion 10 --image_size 128 \
          ../data/fb-128 ../logs/fb-2
    

    The initial output is the following, which verifies that the option was acknowledged by the program.

    {'--batches': '100000',
     '--dim_z_category': '6',
     '--dim_z_content': '50',
     '--dim_z_motion': '10',
     '--every_nth': '2',
     '--image_batch': '32',
     '--image_dataset': '',
     '--image_discriminator': 'PatchImageDiscriminator',
     '--image_size': '128',
     '--n_channels': '3',
     '--noise_sigma': '0.1',
     '--print_every': '100',
     '--use_categories': False,
     '--use_infogan': False,
     '--use_noise': True,
     '--video_batch': '32',
     '--video_discriminator': 'PatchVideoDiscriminator',
     '--video_length': '16',
     '<dataset>': '../data/fb-128',
     '<log_folder>': '../logs/fb-2'}
    

    The program then runs, but doesn't produce images of size 128x128 and continues to create images of size 64x64. Additionally, saved models show no increase in size, contrary to the expected increase in response to a larger output size. I have traced the bug to the model definitions, specifically the following lines.

    self.main = nn.Sequential(
                nn.ConvTranspose2d(dim_z, ngf * 8, 4, 1, 0, bias=False),
                nn.BatchNorm2d(ngf * 8),
                nn.ReLU(True),
                nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ngf * 4),
                nn.ReLU(True),
                nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ngf * 2),
                nn.ReLU(True),
                nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ngf),
                nn.ReLU(True),
                nn.ConvTranspose2d(ngf, self.n_channels, 4, 2, 1, bias=False),
                nn.Tanh()
            )
    

    Note: this is only the generator definition. I would expect that each of the discriminators would also need a change analogous to what might help here.

    I have tried several different approaches, including changing n_channels, but to no avail. I just can't seem to find the point from which the size of 64x64 originates. I did notice that 64 is the product of 8, 4, 2, and 1, which are the coefficients of ngf in each output size, but I don't see how that would affect the final output size of n_channels.

    Although I would love to see a fix, if anybody knows where 64x64 comes from, I can do more poking, and probably find a solution myself.
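
    For what it's worth, the 64 appears to come from the fixed depth of that stack rather than from any option: the first ConvTranspose2d maps the 1x1 latent to a 4x4 feature map (kernel 4, stride 1, no padding), and each of the four following layers (kernel 4, stride 2, padding 1) doubles the spatial size, giving 4 -> 8 -> 16 -> 32 -> 64. Reaching 128x128 would therefore require one more doubling stage in the generator (and a matching extra downsampling layer in each discriminator). A hedged sketch of such a modified generator head, untested and not an official patch:

      self.main = nn.Sequential(
          nn.ConvTranspose2d(dim_z, ngf * 8, 4, 1, 0, bias=False),        # 1x1   -> 4x4
          nn.BatchNorm2d(ngf * 8),
          nn.ReLU(True),
          nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),      # 4x4   -> 8x8
          nn.BatchNorm2d(ngf * 4),
          nn.ReLU(True),
          nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),      # 8x8   -> 16x16
          nn.BatchNorm2d(ngf * 2),
          nn.ReLU(True),
          nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),          # 16x16 -> 32x32
          nn.BatchNorm2d(ngf),
          nn.ReLU(True),
          nn.ConvTranspose2d(ngf, ngf, 4, 2, 1, bias=False),              # 32x32 -> 64x64 (extra stage)
          nn.BatchNorm2d(ngf),
          nn.ReLU(True),
          nn.ConvTranspose2d(ngf, self.n_channels, 4, 2, 1, bias=False),  # 64x64 -> 128x128
          nn.Tanh()
      )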

    opened by jhawgs 1
  • Video generation ffmpeg error

    Video generation ffmpeg error

    Hi,

    I get the following ffmpeg error when trying to generate a video.

    ffmpeg version 2.8.11-0ubuntu0.16.04.1 Copyright (c) 2000-2017 the FFmpeg developers
      built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 20160609
      configuration: --prefix=/usr --extra-version=0ubuntu0.16.04.1 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --cc=cc --cxx=g++ --enable-gpl --enable-shared --disable-stripping --disable-decoder=libopenjpeg --disable-decoder=libschroedinger --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librtmp --enable-libschroedinger --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzvbi --enable-openal --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzmq --enable-frei0r --enable-libx264 --enable-libopencv
      libavutil      54. 31.100 / 54. 31.100
      libavcodec     56. 60.100 / 56. 60.100
      libavformat    56. 40.101 / 56. 40.101
      libavdevice    56.  4.100 / 56.  4.100
      libavfilter     5. 40.101 /  5. 40.101
      libavresample   2.  1.  0 /  2.  1.  0
      libswscale      3.  1.101 /  3.  1.101
      libswresample   1.  2.101 /  1.  2.101
      libpostproc    53.  3.100 / 53.  3.100
    Input #0, rawvideo, from 'pipe:':
      Duration: N/A, start: 0.000000, bitrate: 786 kb/s
        Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 64x64, 786 kb/s, 8 tbr, 8 tbn, 8 tbc
    [swscaler @ 0x751940] deprecated pixel format used, make sure you did set range correctly
    [gif @ 0x740420] GIF muxer supports only a single video GIF stream.
    Output #0, gif, to '../data/0.gif':
      Metadata:
        encoder         : Lavf56.40.101
        Stream #0:0: Video: mjpeg, yuvj444p(pc), 64x64, q=2-31, 200 kb/s, 8 fps, 8 tbn, 8 tbc
        Metadata:
          encoder         : Lavc56.60.100 mjpeg
    Stream mapping:
      Stream #0:0 -> #0:0 (rawvideo (native) -> mjpeg (native))
    Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
    
    opened by medhini 1
  • Video generation error

    Video generation error

    Hi, I'm getting this error when executing the generate_videos.py script:

    root@508aee39a995:/mocogan/src# python generate_videos.py ../logs/dances/generator_100000.pytorch ../output 
    Traceback (most recent call last):
      File "generate_videos.py", line 61, in <module>
        v, _ = generator.sample_videos(1, int(args['--number_of_frames']))
      File "/mocogan/src/models.py", line 268, in sample_videos
        z, z_category_labels = self.sample_z_video(num_samples, video_len)
      File "/mocogan/src/models.py", line 259, in sample_z_video
        z_motion = self.sample_z_m(num_samples, video_len)
      File "/mocogan/src/models.py", line 224, in sample_z_m
        h_t.append(self.recurrent(e_t, h_t[-1]))
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 224, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py", line 682, in forward
        self.bias_ih, self.bias_hh,
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py", line 49, in GRUCell
        gi = F.linear(input, w_ih)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 555, in linear
        output = input.matmul(weight.t())
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 560, in matmul
        return torch.matmul(self, other)
      File "/usr/local/lib/python2.7/dist-packages/torch/functional.py", line 173, in matmul
        return torch.mm(tensor1, tensor2)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 579, in mm
        return Addmm.apply(output, self, matrix, 0, 1, True)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/blas.py", line 26, in forward
        matrix1, matrix2, out=output)
    TypeError: torch.addmm received an invalid combination of arguments - got (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor), but expected one of:
     * (torch.cuda.FloatTensor source, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
     * (torch.cuda.FloatTensor source, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
     * (float beta, torch.cuda.FloatTensor source, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
     * (torch.cuda.FloatTensor source, float alpha, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
     * (float beta, torch.cuda.FloatTensor source, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
     * (torch.cuda.FloatTensor source, float alpha, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
     * (float beta, torch.cuda.FloatTensor source, float alpha, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
          didn't match because some of the arguments have invalid types: (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor)
     * (float beta, torch.cuda.FloatTensor source, float alpha, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
          didn't match because some of the arguments have invalid types: (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor)
    

    Any ideas?

    opened by velascoluis 1
  • ValueError: Expected target size (16, 16, 256, 256), got torch.Size([16])

    ValueError: Expected target size (16, 16, 256, 256), got torch.Size([16])

    Hi, I'm training my model and I set batch_size, image_batch, and video_batch all equal to 16, but I ran into this problem. It occurs at:

      File "/home/ydj/MoCoGAN/trainers.py", line 268, in train
        self.video_batch_size, use_categories=self.use_categories)
      File "/home/ydj/MoCoGAN/trainers.py", line 180, in train_discriminator
        l_discriminator += self.category_criterion(real_categorical.squeeze(), categories_gt.long())

    Does anyone know how to solve this?

    opened by Fanny-Yuan 0
  • Future frame prediction

    Future frame prediction

    Hi, according to the paper, you also experimented with a variant of MoCoGAN for future frame prediction, and I am interested in how that variant is constructed. Are the details available to be released? Thank you!

    opened by bhdeng 0
  • EOFError: Ran out of input

    EOFError: Ran out of input

    I am trying to use it with Python 3.

    However, the following error is reported:

    python train.py --image_batch 32 --video_batch 32 --use_infogan --use_noise --noise_sigma 0.1 --image_discriminator PatchImageDiscriminator --video_discriminator CategoricalVideoDiscriminator --print_every 100 --every_nth 2 --dim_z_content 50 --dim_z_motion 10 --dim_z_category 4 /slow/junyan/VideoSynthesis/mocogan/data/actions logs/actions
    {'--batches': '100000', '--dim_z_category': '4', '--dim_z_content': '50', '--dim_z_motion': '10', '--every_nth': '2', '--image_batch': '32', '--image_dataset': '', '--image_discriminator': 'PatchImageDiscriminator', '--image_size': '64', '--n_channels': '3', '--noise_sigma': '0.1', '--print_every': '100', '--use_categories': False, '--use_infogan': True, '--use_noise': True, '--video_batch': '32', '--video_discriminator': 'CategoricalVideoDiscriminator', '--video_length': '16', '<dataset>': '/slow/junyan/VideoSynthesis/mocogan/data/actions', '<log_folder>': 'logs/actions'}
    /root/anaconda3/lib/python3.6/site-packages/torchvision/transforms/transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
      "please use transforms.Resize instead.")
    /slow/junyan/VideoSynthesis/mocogan/data/actions/local.db
    Traceback (most recent call last):
      File "train.py", line 104, in <module>
        dataset = data.VideoFolderDataset(args['<dataset>'], cache=os.path.join(args['<dataset>'], 'local.db'))
      File "/slow/junyan/VideoSynthesis/mocogan/src/data.py", line 24, in __init__
        print(pickle.load(f))
    EOFError: Ran out of input

    Here is the code:

    class VideoFolderDataset(torch.utils.data.Dataset):
        def __init__(self, folder, cache, min_len=32):
            dataset = ImageFolder(folder)
            self.total_frames = 0
            self.lengths = []
            self.images = []
            print(cache)
            if cache is not None and os.path.exists(cache):
                with open(cache, 'rb') as f:
                    print(pickle.load(f))
            else:
                for idx, (im, categ) in enumerate(
                        tqdm.tqdm(dataset, desc="Counting total number of frames")):
                    img_path, _ = dataset.imgs[idx]
                    shorter, longer = min(im.width, im.height), max(im.width, im.height)
                    length = longer // shorter
                    if length >= min_len:
                        self.images.append((img_path, categ))
                        self.lengths.append(length)
    
                if cache is not None:
                    with open(cache, 'wb') as f:
                        pickle.dump((self.images, self.lengths), f)
    
            self.cumsum = np.cumsum([0] + self.lengths)
            print("Total number of frames {}".format(np.sum(self.lengths)))
    
    opened by momo1986 3
  • Problem in running generate_videos.py

    Problem in running generate_videos.py

    #3

    After applying the ffmpeg fix you described in the issue referred to above, I'm getting this error.

    Attached is a screenshot of the error: https://drive.google.com/file/d/13AuoobWDDfAEC4yNQQpiRVllvu-NRjwt/view?usp=sharing @sergeytulyakov, please help.

    opened by vipulbjj 2
Owner
Sergey Tulyakov