MoCoGAN: Decomposing Motion and Content for Video Generation

Overview

This repository contains an implementation and further details of MoCoGAN: Decomposing Motion and Content for Video Generation by Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, and Jan Kautz.

CVPR Poster:

Representation

MoCoGAN is a generative model for videos: it generates videos from random inputs. It features separated representations of motion and content, offering control over what is generated. For example, MoCoGAN can generate the same object performing different actions, as well as the same action performed by different objects.

MoCoGAN Representation
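
In MoCoGAN, each video is generated from a single content code that is held fixed across frames, while a recurrent network maps a sequence of noise vectors to per-frame motion codes. A minimal sketch of this sampling scheme in PyTorch (the dimensions follow the default training flags used in this repository; the class and method names are illustrative, not the repository's exact API):

    import torch
    import torch.nn as nn

    class LatentSampler(nn.Module):
        """Illustrative sketch of MoCoGAN's motion/content decomposition (not the repo's exact code)."""

        def __init__(self, dim_content=50, dim_motion=10):
            super().__init__()
            self.dim_content = dim_content
            self.dim_motion = dim_motion
            # A GRU cell turns i.i.d. noise into a correlated sequence of motion codes.
            self.recurrent = nn.GRUCell(dim_motion, dim_motion)

        def forward(self, num_videos, video_len):
            # One content code per video, repeated for every frame.
            z_content = torch.randn(num_videos, self.dim_content)
            z_content = z_content.unsqueeze(1).expand(-1, video_len, -1)

            # One motion code per frame, produced by unrolling the GRU over noise.
            h = torch.zeros(num_videos, self.dim_motion)
            z_motion = []
            for _ in range(video_len):
                e = torch.randn(num_videos, self.dim_motion)
                h = self.recurrent(e, h)
                z_motion.append(h)
            z_motion = torch.stack(z_motion, dim=1)

            # Each frame is generated from [content, motion_t]; fixing one factor
            # while resampling the other yields "same identity, different motion"
            # or "different identities, same motion".
            return torch.cat([z_content, z_motion], dim=2)

    z = LatentSampler()(num_videos=4, video_len=16)   # (4, 16, 60) latent codes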

Examples of generated videos

We trained MoCoGAN on the MUG Facial Expression Database to generate facial expressions. When fixing the content code and changing the motion code, it generated videos of the same person performing different expressions. When fixing the motion code and changing the content code, it generated videos of different people performing the same expression. In the figure shown below, each column has a fixed identity, and each row shows the same action:

Facial expressions

We trained MoCoGAN on a human action dataset where content is represented by the performer executing several actions. When fixing the content code and changing the motion code, it generated videos of the same person performing different actions. When fixing the motion code and changing the content code, it generated videos of different people performing the same action. Each pair of images represents the same action executed by different people:

Human actions

We have collected a large-scale TaiChi dataset including 4.5K videos of TaiChi performers. Below are videos generated by MoCoGAN.

TaiChi

Training MoCoGAN

Please refer to the wiki page for detailed training instructions.
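
For reference, a training invocation looks like the one reported in the issues below (the flags and dataset path are an example, not the only supported configuration; see the wiki for the full instructions):

    python train.py \
        --image_batch 32 \
        --video_batch 32 \
        --use_infogan \
        --use_noise \
        --noise_sigma 0.1 \
        --image_discriminator PatchImageDiscriminator \
        --video_discriminator CategoricalVideoDiscriminator \
        --print_every 100 \
        --every_nth 2 \
        --dim_z_content 50 \
        --dim_z_motion 10 \
        --dim_z_category 4 \
        ../data/actions ../logs/actions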

Citation

If you use MoCoGAN in your research, please cite our paper:

Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz, "MoCoGAN: Decomposing Motion and Content for Video Generation"

@inproceedings{Tulyakov:2018:MoCoGAN,
 title={{MoCoGAN}: Decomposing motion and content for video generation},
 author={Tulyakov, Sergey and Liu, Ming-Yu and Yang, Xiaodong and Kautz, Jan},
 booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
 pages = {1526--1535},
 year={2018}
}

Other implementations:

  1. Alternative pytorch implementation
  2. Chainer implementation
Comments
  • Inception Score on UCF101

    Inception Score on UCF101

    Hi,

    I am trying to reproduce the Inception Score results on the UCF101 dataset. Could you please point out which model and parameters (number of generated videos, splits) were used for the stated result? Did you use the implementation from the TGAN paper or another repository?

    Thanks in advance!

    opened by VladYushchenko 9
  • A Question regarding generate_videos.py

    A Question regarding generate_videos.py

    Dear the author of MocoGAN:

    I am deeply impressed about your fantastic work. And I really appreciate that you've opened the source code of this project.

    I have a small problem when using the generate_videos.py file. After I trained the model, I ran

    "python generate_videos.py --num_videos 10 --output_format gif --number_of_frames 16 ../logs/actions/generator_21700.pytorch output"

    and the following error occurred:


    Traceback (most recent call last):
      File "generate_videos.py", line 61, in <module>
        v, _ = generator.sample_videos(1, int(args['--number_of_frames']))
      File "/mocogan/src/models.py", line 268, in sample_videos
        z, z_category_labels = self.sample_z_video(num_samples, video_len)
      File "/mocogan/src/models.py", line 259, in sample_z_video
        z_motion = self.sample_z_m(num_samples, video_len)
      File "/mocogan/src/models.py", line 224, in sample_z_m
        h_t.append(self.recurrent(e_t, h_t[-1]))
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 224, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py", line 682, in forward
        self.bias_ih, self.bias_hh,
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py", line 49, in GRUCell
        gi = F.linear(input, w_ih)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 555, in linear
        output = input.matmul(weight.t())
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 560, in matmul
        return torch.matmul(self, other)
      File "/usr/local/lib/python2.7/dist-packages/torch/functional.py", line 173, in matmul
        return torch.mm(tensor1, tensor2)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 579, in mm
        return Addmm.apply(output, self, matrix, 0, 1, True)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/blas.py", line 26, in forward
        matrix1, matrix2, out=output)
    TypeError: torch.addmm received an invalid combination of arguments - got (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor), but expected one of:

    • (torch.cuda.FloatTensor source, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
    • (torch.cuda.FloatTensor source, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
    • (float beta, torch.cuda.FloatTensor source, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
    • (torch.cuda.FloatTensor source, float alpha, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
    • (float beta, torch.cuda.FloatTensor source, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
    • (torch.cuda.FloatTensor source, float alpha, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
    • (float beta, torch.cuda.FloatTensor source, float alpha, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out) didn't match because some of the arguments have invalid types: (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor)
    • (float beta, torch.cuda.FloatTensor source, float alpha, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out) didn't match because some of the arguments have invalid types: (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor)


    I think there must be some mistake I made, but could you look into it and give me any clue?
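
    For what it's worth, the type mix in the error above (a torch.cuda.FloatTensor multiplied with a plain torch.FloatTensor inside the GRUCell) suggests that the sampled noise and the recurrent weights ended up on different devices. A rough workaround sketch, assuming the loaded checkpoint and everything it samples should live on the GPU (this is an illustration, not the repository's documented fix):

      import torch

      # Hypothetical workaround: make sure every submodule of the loaded generator
      # (including the GRUCell that produces motion codes) is on the same device
      # as the noise it consumes before sampling.
      generator = torch.load('../logs/actions/generator_21700.pytorch')
      generator = generator.cuda()
      generator.eval()
      videos, _ = generator.sample_videos(1, 16)   # path and frame count taken from the command above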

    opened by TheIllusion 7
  • got segmentation fault when tried to run train.py

    got segmentation fault when tried to run train.py

    I followed the steps from the wiki and built the environment manually. I am not using Docker. I got a segmentation fault (core dumped) error while running this command:

      python train.py \
      --image_batch 32 \
      --video_batch 32 \
      --use_infogan \
      --use_noise \
      --noise_sigma 0.1 \
      --image_discriminator PatchImageDiscriminator \
      --video_discriminator CategoricalVideoDiscriminator \
      --print_every 100 \
      --every_nth 2 \
      --dim_z_content 50 \
      --dim_z_motion 10 \
      --dim_z_category 4 \
      ../data/actions ../logs/actions

    Can you please help me with this?

    opened by maniyar2jaimin 6
  • Got segmentation fault(core dumped) when tried to run train.py

    Got segmentation fault(core dumped) when tried to run train.py

    I have not used Docker and installed the dependencies with pip as instructed in the wiki. The error actually occurs in the loggers file while importing tensorflow; I found this out while debugging. Please provide a solution @sergeytulyakov. Segmentation fault (core dumped): train.py calls trainers.py, which calls loggers.py. #6

    opened by vipulbjj 5
  • RuntimeError: arguments are located on different GPUs

    RuntimeError: arguments are located on different GPUs

    Hi, I was running your code on my GPUs and this error occurred. I tried using a single GPU, but the problem still exists. I was wondering if you know how to solve it?

    opened by Fanny-Yuan 4
  • Code for Image-to-video Translation

    Code for Image-to-video Translation

    Dear author,

    You have mentioned that the MoCoGAN model can be adapted to the image-to-video task. Would you mind sharing your implementation?

    Thank you so much

    opened by zyong812 3
  • Issue on - executing (MoCoGAN Paper)

    Issue on - executing (MoCoGAN Paper)

    Hi, I am using Python 3.6.1 :: Anaconda custom (64-bit) and Ubuntu 14.04.5 LTS. I am trying to execute the code at https://github.com/sergeytulyakov/mocogan, following the steps at https://github.com/sergeytulyakov/mocogan/wiki/Training-MoCoGAN.

    I am getting the error below while executing the code. Would you please help?

    Training:

    • Executed the below command from the command line:

      shiba@shiba:~/Downloads/mocogan-master/src$ python train.py \
      --image_batch 32 \
      --video_batch 32 \
      --use_infogan \
      --use_noise \
      --noise_sigma 0.1 \
      --image_discriminator PatchImageDiscriminator \
      --video_discriminator CategoricalVideoDiscriminator \
      --print_every 100 \
      --every_nth 2 \
      --dim_z_content 50 \
      --dim_z_motion 10 \
      --dim_z_category 4 \
      ../data/actions ../logs/actions

    The output is:

      /home/shiba/anaconda3/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
        return f(*args, **kwds)
      {'--batches': '100000', '--dim_z_category': '4', '--dim_z_content': '50', '--dim_z_motion': '10', '--every_nth': '2', '--image_batch': '32', '--image_dataset': '', '--image_discriminator': 'PatchImageDiscriminator', '--image_size': '64', '--n_channels': '3', '--noise_sigma': '0.1', '--print_every': '100', '--use_categories': False, '--use_infogan': True, '--use_noise': True, '--video_batch': '32', '--video_discriminator': 'CategoricalVideoDiscriminator', '--video_length': '16', '<dataset>': '../data/actions', '<log_folder>': '../logs/actions'}
      Traceback (most recent call last):
        File "train.py", line 104, in <module>
          dataset = data.VideoFolderDataset(args['<dataset>'], cache=os.path.join(args['<dataset>'], 'local.db'))
        File "/home/shiba/Downloads/mocogan-master/src/data.py", line 24, in __init__
          self.images, self.lengths = pickle.load(f)
      TypeError: a bytes-like object is required, not 'str'
      *** Error in `python': double free or corruption (!prev): 0x0000000000bfda20 ***
      Aborted (core dumped)

    opened by ShibaPrasad 3
  • Dataset Formatter

    Dataset Formatter

    Is there any code that converts an mp4 (or another video format) into the 2-dimensional JPGs used as training input that could be included or referenced?
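
    There does not appear to be a converter bundled here, but judging from the VideoFolderDataset code quoted in the "EOFError: Ran out of input" issue below (it divides the longer image side by the shorter one to get a frame count), the loader seems to expect each video as a single image whose square frames are stacked along one axis. A rough conversion sketch under that assumption, with hypothetical file names:

      import cv2
      import numpy as np

      def video_to_strip(video_path, out_path, size=64):
          # Stack every frame of a video vertically into one tall JPEG of square frames.
          # The strip format is inferred from VideoFolderDataset, not a documented spec.
          cap = cv2.VideoCapture(video_path)
          frames = []
          while True:
              ok, frame = cap.read()
              if not ok:
                  break
              frames.append(cv2.resize(frame, (size, size)))
          cap.release()
          if frames:
              cv2.imwrite(out_path, np.vstack(frames))

      video_to_strip('example.mp4', 'example.jpg')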

    opened by jhawgs 2
  • Invariable Image Size

    Invariable Image Size

    I have been working with the model, and I am trying to generate images of size 128x128. I changed the --image_size option to 128. For reference, here is the full command.

    $ python3 train.py  \
          --image_batch 32 \
          --video_batch 32 \
          --use_noise \
          --noise_sigma 0.1 \
          --image_discriminator PatchImageDiscriminator \
          --video_discriminator PatchVideoDiscriminator \
          --print_every 100 \
          --every_nth 2 \
          --dim_z_content 50 \
          --dim_z_motion 10 --image_size 128 \
          ../data/fb-128 ../logs/fb-2
    

    The initial output is the following, which verifies that the option was acknowledged by the program.

    {'--batches': '100000',
     '--dim_z_category': '6',
     '--dim_z_content': '50',
     '--dim_z_motion': '10',
     '--every_nth': '2',
     '--image_batch': '32',
     '--image_dataset': '',
     '--image_discriminator': 'PatchImageDiscriminator',
     '--image_size': '128',
     '--n_channels': '3',
     '--noise_sigma': '0.1',
     '--print_every': '100',
     '--use_categories': False,
     '--use_infogan': False,
     '--use_noise': True,
     '--video_batch': '32',
     '--video_discriminator': 'PatchVideoDiscriminator',
     '--video_length': '16',
     '<dataset>': '../data/fb-128',
     '<log_folder>': '../logs/fb-2'}
    

    The program then runs, but doesn't produce images of size 128x128 and continues to create images of size 64x64. Additionally, saved models show no increase in size, contrary to the expected increase in response to a larger output size. I have traced the bug to the model definitions, specifically the following lines.

    self.main = nn.Sequential(
                nn.ConvTranspose2d(dim_z, ngf * 8, 4, 1, 0, bias=False),
                nn.BatchNorm2d(ngf * 8),
                nn.ReLU(True),
                nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ngf * 4),
                nn.ReLU(True),
                nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ngf * 2),
                nn.ReLU(True),
                nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ngf),
                nn.ReLU(True),
                nn.ConvTranspose2d(ngf, self.n_channels, 4, 2, 1, bias=False),
                nn.Tanh()
            )
    

    Note: this is only the generator definition. I would expect that each of the discriminators would also need a change analogous to what might help here.

    I have tried several different approaches, including changing n_channels, but to no avail. I just can't seem to find the point from which the size of 64x64 originates. I did notice that 64 is the product of 8, 4, 2, and 1, which are the coefficients of ngf in each output size, but I don't see how that would affect the final output size of n_channels.

    Although I would love to see a fix, if anybody knows where 64x64 comes from, I can do more poking, and probably find a solution myself.
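
    For what it's worth, the 64 appears to come from the fixed depth of that stack rather than from any option: the first ConvTranspose2d maps the 1x1 latent to a 4x4 feature map (kernel 4, stride 1, no padding), and each of the four following layers (kernel 4, stride 2, padding 1) doubles the spatial size, giving 4 -> 8 -> 16 -> 32 -> 64. Reaching 128x128 would therefore require one more doubling stage in the generator (and a matching extra downsampling layer in each discriminator). A hedged sketch of such a modified generator head, untested and not an official patch:

      self.main = nn.Sequential(
          nn.ConvTranspose2d(dim_z, ngf * 8, 4, 1, 0, bias=False),        # 1x1   -> 4x4
          nn.BatchNorm2d(ngf * 8),
          nn.ReLU(True),
          nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),      # 4x4   -> 8x8
          nn.BatchNorm2d(ngf * 4),
          nn.ReLU(True),
          nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),      # 8x8   -> 16x16
          nn.BatchNorm2d(ngf * 2),
          nn.ReLU(True),
          nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),          # 16x16 -> 32x32
          nn.BatchNorm2d(ngf),
          nn.ReLU(True),
          nn.ConvTranspose2d(ngf, ngf, 4, 2, 1, bias=False),              # 32x32 -> 64x64 (extra stage)
          nn.BatchNorm2d(ngf),
          nn.ReLU(True),
          nn.ConvTranspose2d(ngf, self.n_channels, 4, 2, 1, bias=False),  # 64x64 -> 128x128
          nn.Tanh()
      )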

    opened by jhawgs 1
  • Video generation ffmpeg error

    Video generation ffmpeg error

    Hi,

    I get the following ffmpeg error when trying to generate a video.

    ffmpeg version 2.8.11-0ubuntu0.16.04.1 Copyright (c) 2000-2017 the FFmpeg developers
      built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 20160609
      configuration: --prefix=/usr --extra-version=0ubuntu0.16.04.1 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --cc=cc --cxx=g++ --enable-gpl --enable-shared --disable-stripping --disable-decoder=libopenjpeg --disable-decoder=libschroedinger --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librtmp --enable-libschroedinger --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzvbi --enable-openal --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzmq --enable-frei0r --enable-libx264 --enable-libopencv
      libavutil      54. 31.100 / 54. 31.100
      libavcodec     56. 60.100 / 56. 60.100
      libavformat    56. 40.101 / 56. 40.101
      libavdevice    56.  4.100 / 56.  4.100
      libavfilter     5. 40.101 /  5. 40.101
      libavresample   2.  1.  0 /  2.  1.  0
      libswscale      3.  1.101 /  3.  1.101
      libswresample   1.  2.101 /  1.  2.101
      libpostproc    53.  3.100 / 53.  3.100
    Input #0, rawvideo, from 'pipe:':
      Duration: N/A, start: 0.000000, bitrate: 786 kb/s
        Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 64x64, 786 kb/s, 8 tbr, 8 tbn, 8 tbc
    [swscaler @ 0x751940] deprecated pixel format used, make sure you did set range correctly
    [gif @ 0x740420] GIF muxer supports only a single video GIF stream.
    Output #0, gif, to '../data/0.gif':
      Metadata:
        encoder         : Lavf56.40.101
        Stream #0:0: Video: mjpeg, yuvj444p(pc), 64x64, q=2-31, 200 kb/s, 8 fps, 8 tbn, 8 tbc
        Metadata:
          encoder         : Lavc56.60.100 mjpeg
    Stream mapping:
      Stream #0:0 -> #0:0 (rawvideo (native) -> mjpeg (native))
    Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
    
    opened by medhini 1
  • Video generation error

    Video generation error

    Hi, I'm getting this error when executing the generate_videos.py script:

    root@508aee39a995:/mocogan/src# python generate_videos.py ../logs/dances/generator_100000.pytorch ../output 
    Traceback (most recent call last):
      File "generate_videos.py", line 61, in <module>
        v, _ = generator.sample_videos(1, int(args['--number_of_frames']))
      File "/mocogan/src/models.py", line 268, in sample_videos
        z, z_category_labels = self.sample_z_video(num_samples, video_len)
      File "/mocogan/src/models.py", line 259, in sample_z_video
        z_motion = self.sample_z_m(num_samples, video_len)
      File "/mocogan/src/models.py", line 224, in sample_z_m
        h_t.append(self.recurrent(e_t, h_t[-1]))
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 224, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py", line 682, in forward
        self.bias_ih, self.bias_hh,
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py", line 49, in GRUCell
        gi = F.linear(input, w_ih)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 555, in linear
        output = input.matmul(weight.t())
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 560, in matmul
        return torch.matmul(self, other)
      File "/usr/local/lib/python2.7/dist-packages/torch/functional.py", line 173, in matmul
        return torch.mm(tensor1, tensor2)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 579, in mm
        return Addmm.apply(output, self, matrix, 0, 1, True)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/blas.py", line 26, in forward
        matrix1, matrix2, out=output)
    TypeError: torch.addmm received an invalid combination of arguments - got (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor), but expected one of:
     * (torch.cuda.FloatTensor source, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
     * (torch.cuda.FloatTensor source, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
     * (float beta, torch.cuda.FloatTensor source, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
     * (torch.cuda.FloatTensor source, float alpha, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
     * (float beta, torch.cuda.FloatTensor source, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
     * (torch.cuda.FloatTensor source, float alpha, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
     * (float beta, torch.cuda.FloatTensor source, float alpha, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
          didn't match because some of the arguments have invalid types: (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor)
     * (float beta, torch.cuda.FloatTensor source, float alpha, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
          didn't match because some of the arguments have invalid types: (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor)
    

    Any ideas?

    opened by velascoluis 1
  • ValueError: Expected target size (16, 16, 256, 256), got torch.Size([16])

    ValueError: Expected target size (16, 16, 256, 256), got torch.Size([16])

    Hi, I'm training my model and I set batch_size, image_batch, and video_batch all equal to 16, but I ran into this problem. It occurs at:

      File "/home/ydj/MoCoGAN/trainers.py", line 268, in train
        self.video_batch_size, use_categories=self.use_categories)
      File "/home/ydj/MoCoGAN/trainers.py", line 180, in train_discriminator
        l_discriminator += self.category_criterion(real_categorical.squeeze(), categories_gt.long())

    Does anyone know how to solve this?

    opened by Fanny-Yuan 0
  • Future frame prediction

    Future frame prediction

    Hi, according to the paper, you also experimented with a variant of MoCoGAN for future frame prediction, and I am interested in how that variant is constructed. Are the details available to be released? Thank you!

    opened by bhdeng 0
  • EOFError: Ran out of input

    EOFError: Ran out of input

    I am trying to use it with Python 3.

    However, the following error is reported:

    python train.py --image_batch 32 --video_batch 32 --use_infogan --use_noise --noise_sigma 0.1 --image_discriminator PatchImageDiscriminator --video_discriminator CategoricalVideoDiscriminator --print_every 100 --every_nth 2 --dim_z_content 50 --dim_z_motion 10 --dim_z_category 4 /slow/junyan/VideoSynthesis/mocogan/data/actions logs/actions
    {'--batches': '100000', '--dim_z_category': '4', '--dim_z_content': '50', '--dim_z_motion': '10', '--every_nth': '2', '--image_batch': '32', '--image_dataset': '', '--image_discriminator': 'PatchImageDiscriminator', '--image_size': '64', '--n_channels': '3', '--noise_sigma': '0.1', '--print_every': '100', '--use_categories': False, '--use_infogan': True, '--use_noise': True, '--video_batch': '32', '--video_discriminator': 'CategoricalVideoDiscriminator', '--video_length': '16', '<dataset>': '/slow/junyan/VideoSynthesis/mocogan/data/actions', '<log_folder>': 'logs/actions'}
    /root/anaconda3/lib/python3.6/site-packages/torchvision/transforms/transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
      "please use transforms.Resize instead.")
    /slow/junyan/VideoSynthesis/mocogan/data/actions/local.db
    Traceback (most recent call last):
      File "train.py", line 104, in <module>
        dataset = data.VideoFolderDataset(args['<dataset>'], cache=os.path.join(args['<dataset>'], 'local.db'))
      File "/slow/junyan/VideoSynthesis/mocogan/src/data.py", line 24, in __init__
        print(pickle.load(f))
    EOFError: Ran out of input

    Here is the code:

    class VideoFolderDataset(torch.utils.data.Dataset):
        def __init__(self, folder, cache, min_len=32):
            dataset = ImageFolder(folder)
            self.total_frames = 0
            self.lengths = []
            self.images = []
            print(cache)
            if cache is not None and os.path.exists(cache):
                with open(cache, 'rb') as f:
                    print(pickle.load(f))
            else:
                for idx, (im, categ) in enumerate(
                        tqdm.tqdm(dataset, desc="Counting total number of frames")):
                    img_path, _ = dataset.imgs[idx]
                    shorter, longer = min(im.width, im.height), max(im.width, im.height)
                    length = longer // shorter
                    if length >= min_len:
                        self.images.append((img_path, categ))
                        self.lengths.append(length)
    
                if cache is not None:
                    with open(cache, 'wb') as f:
                        pickle.dump((self.images, self.lengths), f)
    
            self.cumsum = np.cumsum([0] + self.lengths)
            print("Total number of frames {}".format(np.sum(self.lengths)))
    
    opened by momo1986 3
  • Problem in running generate_videos.py

    Problem in running generate_videos.py

    #3

    After applying the ffmpeg fix you described in the issue referred to above, I'm getting this error.

    Attached is a screenshot of the error: https://drive.google.com/file/d/13AuoobWDDfAEC4yNQQpiRVllvu-NRjwt/view?usp=sharing @sergeytulyakov, please help.

    opened by vipulbjj 2
Owner
Sergey Tulyakov