Digan - Official PyTorch implementation of Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks

Sihyun Yu

Last update: Dec 31, 2022

Overview

DIGAN (ICLR 2022)

Official PyTorch implementation of "Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks" by Sihyun Yu*, Jihoon Tack*, Sangwoo Mo*, Hyunsu Kim, Junho Kim, Jung-Woo Ha, Jinwoo Shin.

TL;DR: We make video generation scalable by leveraging implicit neural representations.

Illustration of the (a) generator and (b) discriminator of DIGAN. The generator creates a video INR weight from random content and motion vectors, which produces an image that corresponds to the input 2D grids {(x, y)} and time t. Two discriminators determine the reality of each image and motion (from a pair of images and their time difference), respectively.
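
To make the role of the 2D grid {(x, y)} and time t more concrete, here is a minimal sketch of how such query coordinates could be built. It is an illustration only, not the repository's code: the function name `make_coords` and the choice of normalizing x and y to [-1, 1] are assumptions.

```python
from typing import List, Tuple


def make_coords(height: int, width: int, t: float) -> List[Tuple[float, float, float]]:
    """Return (x, y, t) triples covering a height x width grid.

    x and y are normalized to [-1, 1]; this range is an assumption made
    for illustration, not something taken from the DIGAN code.
    """
    def axis(n: int) -> List[float]:
        if n == 1:
            return [0.0]
        return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

    xs, ys = axis(width), axis(height)
    # Row-major order: all x positions of the first row, then the next row.
    return [(x, y, t) for y in ys for x in xs]


# The same frame time queried on a coarse and on a denser grid.
coords_lo = make_coords(4, 4, t=0.5)
coords_hi = make_coords(8, 8, t=0.5)
```

Since an implicit representation is evaluated per coordinate, asking for another resolution or another time step only changes the list of inputs, not the function being queried.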

1. Environment setup

conda create -n digan python=3.8
conda activate digan

pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

pip install hydra-core==1.0.6
pip install tqdm scipy scikit-learn av ninja
pip install click gitpython requests psutil einops tensorboardX

2. Dataset

One should organize the video dataset as follows:

UCF-101

UCF-101
|-- train
    |-- class1
        |-- video1.avi
        |-- video2.avi
        |-- ...
    |-- class2
        |-- video1.avi
        |-- video2.avi
        |-- ...
    |-- ...
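
Before training, it can help to confirm that a dataset folder actually matches the tree above. The following is a small sketch using only the standard library; the helper name `list_videos` and the demo on a temporary directory are illustrative assumptions and not part of the repository.

```python
import os
import tempfile
from typing import Dict, List


def list_videos(root: str, split: str = "train", ext: str = ".avi") -> Dict[str, List[str]]:
    """Map each class folder under <root>/<split> to its sorted video files."""
    split_dir = os.path.join(root, split)
    if not os.path.isdir(split_dir):
        raise FileNotFoundError(f"expected a '{split}' directory under {root}")
    videos: Dict[str, List[str]] = {}
    for cls in sorted(os.listdir(split_dir)):
        cls_dir = os.path.join(split_dir, cls)
        if not os.path.isdir(cls_dir):
            continue  # ignore stray files next to the class folders
        videos[cls] = sorted(
            f for f in os.listdir(cls_dir) if f.lower().endswith(ext)
        )
    return videos


# Demo on a throwaway tree that mirrors the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    for cls in ("class1", "class2"):
        os.makedirs(os.path.join(tmp, "train", cls))
        for name in ("video1.avi", "video2.avi"):
            open(os.path.join(tmp, "train", cls, name), "wb").close()
    demo_layout = list_videos(tmp)
```

An empty list for a class, or a missing `train` directory, points to a layout problem before any training time is spent.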

Dataset download

Link: UCF-101, Sky Time lapse, TaiChi-HD
For the Kinetics-food dataset, see prepare_data/README.md

3. Training

To train the model, navigate to the project directory and run:

python src/infra/launch.py hydra.run.dir=. +experiment_name=<EXP_NAME> +dataset.name=<DATASET>

You can change training options by editing configs/main.yml and configs/digan.yml.
Supported values for <DATASET>: {UCF-101, sky, taichi, kinetics}
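
If you launch runs from a script, the command line above can be assembled and checked programmatically. This is a sketch only: `build_train_command` is a hypothetical helper, and the dataset names are copied from the list above without checking how the code itself validates them.

```python
import shlex
from typing import List

# Dataset names as listed in this README.
DATASETS = ("UCF-101", "sky", "taichi", "kinetics")


def build_train_command(experiment_name: str, dataset: str) -> List[str]:
    """Assemble the launch command as an argument list (hypothetical helper)."""
    if not experiment_name:
        raise ValueError("experiment_name must not be empty")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    return [
        "python",
        "src/infra/launch.py",
        "hydra.run.dir=.",
        f"+experiment_name={experiment_name}",
        f"+dataset.name={dataset}",
    ]


cmd = build_train_command("exp01", "kinetics")
cmd_str = shlex.join(cmd)  # printable form, e.g. for logging
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the project directory.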

4. Evaluation (FVD and KVD)

python src/scripts/compute_fvd_kvd.py --network_pkl <MODEL_PATH> --data_path <DATA_PATH>
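
For reference, FVD is a Fréchet distance between Gaussian fits of features that a pretrained video network extracts from real and generated clips. The symbols below are notation introduced here for illustration: \mu_r, \Sigma_r are the mean and covariance of the real-video features and \mu_g, \Sigma_g those of the generated ones. How the script estimates these statistics is not described in this README.

```latex
\mathrm{FVD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

KVD is the kernel-based counterpart: it drops the Gaussian assumption and compares the two feature sets with a maximum mean discrepancy estimate.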

5. Video generation

Generate and visualize videos (as gif and mp4):

python src/scripts/generate_videos.py --network_pkl <MODEL_PATH> --outdir <OUTPUT_PATH>

6. Results

Generated video results of DIGAN on TaiChi (top) and Sky (bottom) datasets.
More generated video results are available at the following site.

Citation

@inproceedings{
    yu2022generating,
    title={Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks},
    author={Yu, Sihyun and Tack, Jihoon and Mo, Sangwoo and Kim, Hyunsu and Kim, Junho and Ha, Jung-Woo and Shin, Jinwoo},
    booktitle={International Conference on Learning Representations},
    year={2022},
}

Reference

This code is mainly built upon the StyleGAN2-ada and INR-GAN repositories.
We also used code from the following repositories: DiffAug, VideoGPT, MDGAN

License

Copyright 2022-present NAVER Corp.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Comments

About FVD computing
Hello! I have two questions about FVD computing.

frechet_video_distance.py forces the generated sequence to be of length 16

fake = torch.cat([rearrange( G(z, c, timesteps=16, noise_mode='const')[0].clamp(-1, 1).cpu(), '(b t) c h w -> b t h w c', t=16) for z, c in zip(grid_z, grid_c)])

If I want to train DIGAN with clip lengths of 32 or 128, what should I do for FVD computing?

How long does it take to compute FVD each time? Following your training setting, each evaluation takes about 30 minutes and runs every 400k images, so the total time spent on FVD would be about 0.5 hrs * 25000 / 400 ≈ 31 hrs. This is far more time-consuming than FID computation in StyleGAN2. Is this also the case when you train your own models?
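
The estimate in the question can be reproduced as follows. The sketch takes the quoted figures at face value (a 25000 kimg run, one evaluation per 400 kimg, 0.5 hours per evaluation); the function and variable names are illustrative.

```python
def total_eval_hours(total_kimg: float, eval_every_kimg: float, hours_per_eval: float) -> float:
    """Total evaluation time if one evaluation runs every `eval_every_kimg`."""
    return hours_per_eval * total_kimg / eval_every_kimg


fvd_hours = total_eval_hours(total_kimg=25000, eval_every_kimg=400, hours_per_eval=0.5)
print(fvd_hours)  # 31.25
```

Raising the evaluation interval shrinks this cost proportionally: evaluating every 2000 kimg instead would give 6.25 hours with the same assumptions.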

Thank you very much!
opened by johannwyh 5
How to perform space extrapolation?

I trained digan on a dataset at 128x128 resolution. I now intend to generate the output at 256x256 resolution. However, when I load the pretrained model, the output img_resolution is set at 128x128. I have tried changing the output resolution at multiple places, however, I am unable to do so. Any help on this would be appreciated.

opened by skymanaditya1 3
README file
Following the README file, I failed to run the project. Here are some suggestions and questions:

I think the author should at least state that the project only supports Linux.

I just want to train the model, but the guide is too brief. First, what does "<EXP_NAME>" mean? Is it just an arbitrary label, so that any text is fine? Second, how do I change the training options? Can I run the project successfully without any changes? Third, where should the data directory go? data/UCF-101?

It seems launch.py is just a wrapper around train.py. Since some default settings may not be available to everyone, why not also document how to run train.py directly?

I have looked through the project, code quality is good, but the README is really a disaster.
opened by nicolgo 2
MoCoGAN-HD comparison

Hi, you compare to MoCoGAN-HD on Taichi where they do not report results on this dataset in their paper. I assume you used their repo to train on Taichi. Can you please share the checkpoint you used because I am trying to compare to both of your works.

Also can you share information how you did the time extrapolation? So how did you adjust Ts?

opened by torxxtorxx 2
modulated_conv2d in ToRGBLayer

Hi,

Thank you for your work.

I was looking at your code and I noticed that in ToRGBLayer, the modulated_conv2d function is used to generate the RGB frames. Does this mean that the network is not fully implicit but contains convolutions in the last layer, or did I miss something?

https://github.com/sihyun-yu/digan/blob/8368d5bfd73db38a9593ce10bb246cd7c37ddf9f/src/training/networks.py#L386

Thank you for your help!

opened by zacjiang 2

About the GPU requirement

Dear authors,

Hello! First of all, thank you for your inspiring work!

I encountered an issue with multi-GPU training on my 8 V100-16G GPUs. When distributing models across GPUs,

if rank == 0:
    print(f'Distributing across {num_gpus} GPUs...')
ddp_modules = dict()
for name, module in [('G_mapping', G.mapping), ('G_synthesis', G.synthesis), ('D', D), (None, G_ema), ('augment_pipe', augment_pipe)]:
    if rank == 0:
        print("[Distributing] Module {} ...".format(name))
    
    if (num_gpus > 1) and (module is not None) and len(list(module.parameters())) != 0:
        module.requires_grad_(True)
        module = torch.nn.parallel.DistributedDataParallel(module, device_ids=[device], broadcast_buffers=False,
                                                           find_unused_parameters=False)
        module.requires_grad_(False)
    
    if rank == 0:
        print("[Distributed] Module {}".format(name))
    
    if name is not None:
        ddp_modules[name] = module

the process failed on first module G_mapping, reporting

[Distributing] Module G_mapping ...
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1640811806235/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:957, unhandled cuda error, NCCL version 21.0.3
ncclUnhandledCudaError: Call to CUDA function failed.

The GPU memory consumption is as follows:

wangyuhan-8-v100         Sat Mar  5 12:28:13 2022  460.73.01
[0] Tesla V100-SXM2-16GB | 36'C,  22 % | 15415 / 16160 MB | yuhan:python/31701(1283M) yuhan:python/31696(6905M) yuhan:python/31699(1151M) yuhan:python/31700(1241M) yuhan:python/31697(1283M) yuhan:python/31698(1283M) yuhan:python/31702(1175M) yuhan:python/31703(1099M)
[1] Tesla V100-SXM2-16GB | 37'C,   0 % |  2022 / 16160 MB | yuhan:python/31697(2019M)
[2] Tesla V100-SXM2-16GB | 38'C,   0 % |  2022 / 16160 MB | yuhan:python/31698(2019M)
[3] Tesla V100-SXM2-16GB | 39'C,   0 % |  2014 / 16160 MB | yuhan:python/31699(2011M)
[4] Tesla V100-SXM2-16GB | 35'C,   0 % |  2014 / 16160 MB | yuhan:python/31700(2011M)
[5] Tesla V100-SXM2-16GB | 35'C,   0 % |  2022 / 16160 MB | yuhan:python/31701(2019M)
[6] Tesla V100-SXM2-16GB | 36'C,   0 % |  2014 / 16160 MB | yuhan:python/31702(2011M)
[7] Tesla V100-SXM2-16GB | 37'C,   0 % |  2014 / 16160 MB | yuhan:python/31703(2011M)

I am not very familiar with this, but GPU 0 seems to be running out of memory. I am wondering whether that is the reason behind the ncclUnhandledCudaError.
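
One way to read the listing above is to count which processes hold memory on which GPU. The sketch below parses lines in that format with a regular expression. The sample is abbreviated from the log, and the helper name `processes_per_gpu` is an illustrative assumption. A process that appears both on GPU 0 and on its own GPU often means each rank also created a CUDA context on device 0, although the log alone cannot confirm that this is the cause here.

```python
import re
from typing import Dict, List, Tuple

# Matches entries such as "yuhan:python/31697(2019M)".
PROC_RE = re.compile(r"(\w+):(\w+)/(\d+)\((\d+)M\)")


def processes_per_gpu(lines: List[str]) -> Dict[int, List[Tuple[int, int]]]:
    """Map GPU index -> list of (pid, memory in MB) parsed from each line."""
    result: Dict[int, List[Tuple[int, int]]] = {}
    for line in lines:
        head = re.match(r"\[(\d+)\]", line)
        if head is None:
            continue  # not a per-GPU line (e.g. the header)
        procs = [(int(pid), int(mem)) for _user, _cmd, pid, mem in PROC_RE.findall(line)]
        result[int(head.group(1))] = procs
    return result


# Abbreviated sample in the same format as the listing above.
sample = [
    "[0] Tesla V100-SXM2-16GB | 36'C,  22 % | 15415 / 16160 MB | "
    "yuhan:python/31696(6905M) yuhan:python/31697(1283M)",
    "[1] Tesla V100-SXM2-16GB | 37'C,   0 % |  2022 / 16160 MB | "
    "yuhan:python/31697(2019M)",
]
usage = processes_per_gpu(sample)

# PIDs that hold memory on GPU 0 and on at least one other GPU.
pids_on_gpu0 = {pid for pid, _mem in usage.get(0, [])}
pids_elsewhere = {pid for gpu, procs in usage.items() if gpu != 0 for pid, _mem in procs}
shared_pids = sorted(pids_on_gpu0 & pids_elsewhere)
```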

Could you please help me figure out what caused this error? Is your implementation working on 16GB V100 GPUs?

Thank you very much.

opened by johannwyh 2

Zip dataset

Hi,

Thanks for your work! I want to use your repo with a .zip dataset, but I get the following error:

File "/DIGAN/training/dataset.py", line 538, in __init__
    classes, class_to_idx = find_classes(path)
File "/DIGAN/training/dataset.py", line 68, in find_classes
    classes = [d for d in os.listdir(dir) if os.path.isdir(os.path.join(dir, d))]
NotADirectoryError: [Errno 20] Not a directory: '/DIGAN/data/dataset.zip'
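
The traceback shows `os.listdir` being called on the .zip path, which only works for directories. As a sketch of one possible workaround, class names can be listed from either a folder or an archive. `find_classes_any` is a hypothetical helper and not part of the repository, and it assumes the archive stores class folders at its top level.

```python
import os
import tempfile
import zipfile
from typing import List


def find_classes_any(path: str) -> List[str]:
    """List top-level class folders from a directory or a .zip archive."""
    if os.path.isdir(path):
        return sorted(
            d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))
        )
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            # A member such as "class1/video1.avi" contributes the class "class1".
            return sorted(
                {name.split("/", 1)[0] for name in zf.namelist() if "/" in name.strip("/")}
            )
    raise ValueError(f"{path} is neither a directory nor a zip archive")


# Demo with a small archive created on the fly.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "dataset.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("class1/video1.avi", b"")
        zf.writestr("class2/video1.avi", b"")
    zip_classes = find_classes_any(zip_path)
```

Unpacking the archive into the directory layout from section 2 avoids the problem without touching the code.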

Also I wanted to ask if I can somehow combine your model with the FVD evaluation of StyleGAN-V. Can you maybe integrate their evaluation protocol into your pipeline on the fly during training? I am having problems doing that and I think their evaluation protocol uses a better FVD evaluation

opened by torxxtorxx 1
dataset - ImageFolderDataset
Thanks for sharing your great work. I found the following issues in dataset.py:

Path for kinetics and Sky datasets: Before the line #579, there should be:

if 'kinetics' in self._path or 'KINETICS' in self._path or 'SKY' in self._path:
    if train:
        dir_path = os.path.join(self._path, 'train')
    else:
        dir_path = os.path.join(self._path, 'val')

and line #579 should be changed to:

self._all_fnames = {
    os.path.relpath(os.path.join(root, fname), start=dir_path)
    for root, _dirs, files in os.walk(dir_path)
    for fname in files
}

Otherwise, it won't work for the data from Kinetics and Sky datasets.

def _get_zipfile() is not defined in the code, but it is used in lines #582 and #607. The following lines can be added after the line #598 :

def _get_zipfile(self):
    assert self._type == 'zip'
    if self._zipfile is None:
        self._zipfile = zipfile.ZipFile(self._path)
    return self._zipfile

In line #599, def _file_ext(fname) should be changed to def _file_ext(self, fname).
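
The signature of a helper like `_file_ext` has to agree with how it is declared and called. The sketch below uses toy classes, not the repository's dataset class, to show two mismatches that each raise a `TypeError`, plus one consistent variant. Whether either mismatch matches the actual state of dataset.py is an assumption.

```python
import os


class NoSelf:
    # Plain method without `self`: the instance is passed as `fname`,
    # so the real argument becomes one positional argument too many.
    def _file_ext(fname):
        return os.path.splitext(fname)[1].lower()


class StaticWithSelf:
    # Static method that still lists `self`: nothing is bound automatically,
    # so the single argument fills `self` and `fname` is missing.
    @staticmethod
    def _file_ext(self, fname):
        return os.path.splitext(fname)[1].lower()


class Consistent:
    # Static method taking only the file name.
    @staticmethod
    def _file_ext(fname):
        return os.path.splitext(fname)[1].lower()


def error_of(obj) -> str:
    """Return the TypeError message raised by obj._file_ext, or '' if none."""
    try:
        obj._file_ext("clip.PNG")
    except TypeError as exc:
        return str(exc)
    return ""


no_self_error = error_of(NoSelf())
static_error = error_of(StaticWithSelf())
fixed_ext = Consistent()._file_ext("clip.PNG")
```

Either `def _file_ext(self, fname)` without the decorator or `@staticmethod` with `def _file_ext(fname)` is consistent. Mixing the two produces one of the errors shown.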
opened by denabazazian 1
ModuleNotFoundError: No module named 'torchsde'

Hi,

Thank you for your work. When I run generate_videos.py on the pretrained checkpoints, it gave out the following error:

Loading networks from "../digan/pretrained/ucf-101-train-test.pkl"...
Traceback (most recent call last):
  File "src/scripts/generate_videos.py", line 59, in <module>
    generate_videos()
  File "/mnt/home/v_jiangshihao/miniconda3/envs/digan/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/mnt/home/v_jiangshihao/miniconda3/envs/digan/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/mnt/home/v_jiangshihao/miniconda3/envs/digan/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/mnt/home/v_jiangshihao/miniconda3/envs/digan/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/mnt/home/v_jiangshihao/miniconda3/envs/digan/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "src/scripts/generate_videos.py", line 38, in generate_videos
    G = legacy.load_network_pkl(f)['G_ema'].to(device).eval() # type: ignore
  File "/mnt/home/v_jiangshihao/digan_new/src/legacy.py", line 21, in load_network_pkl
    data = _LegacyUnpickler(f).load()
  File "/mnt/home/v_jiangshihao/digan_new/src/torch_utils/persistence.py", line 190, in _reconstruct_persistent_obj
    module = _src_to_module(meta.module_src)
  File "/mnt/home/v_jiangshihao/digan_new/src/torch_utils/persistence.py", line 226, in _src_to_module
    exec(src, module.__dict__) # pylint: disable=exec-used
  File "<string>", line 14, in <module>
ModuleNotFoundError: No module named 'torchsde'

Do you know what's the cause of that? Thanks for your help!
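
The traceback ends inside `_src_to_module`, which executes module source stored in the pickle, so every import in that stored source has to be available in the current environment. `torchsde` is not among the packages installed in section 1, so installing it (it is published on PyPI under that name) is a likely fix, though this has not been verified against the checkpoint. A small sketch for checking modules up front, with `missing_modules` as a hypothetical helper:

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is deliberately nonexistent.
missing = missing_modules(["json", "a_module_that_does_not_exist_123"])
```

Calling `missing_modules(["torch", "torchsde", "einops"])` before loading a checkpoint reports what still needs to be installed.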

opened by zacjiang 1
Training Error (CUDA error: CUBLAS_STATUS_EXECUTION_FAILED )

Hi, thanks for your great work. I am planning to train your model on a custom dataset. I encounter the following error:

CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`

I tried several ways to solve this issue, such as reducing the batch size to 1, reducing the number of GPUs to 1, and reducing the image resolution to 64x64. I am training on NVIDIA Titan Xp GPUs with 12GB of memory. No luck so far!

Can you help me resolve this issue?

opened by gkuberreddy 0
Error on training from scratch
Hi I am getting an error on running the training script using the command provided in the README.md

python src/infra/launch.py hydra.run.dir=. +experiment_name=exp01 +dataset.name=kinetics

as below:

"self._image_fnames = sorted(fname for fname in self._all_fnames if self._file_ext(fname) in PIL.Image.EXTENSION) TypeError: _file_ext() missing 1 required positional argument: 'fname'"

I have the data for kinetics processed according to the steps in prepare_data folder.
opened by sourav-roni 2
Added latent optimisation code for performing video-related tasks

Hi, I could not find the code for performing video-related tasks that were shown in the paper such as video interpolation, extrapolation, inversion, etc. I added these functionalities on top of your repository here - https://github.com/skymanaditya1/digan/blob/master/src/scripts/project.py.

Please let me know if this looks okay to you and if you would like, I can create a PR for the same (with the refactoring of course).

opened by skymanaditya1 1
