vid2vid
Project | YouTube(short) | YouTube(full) | arXiv | Paper(full)

PyTorch implementation for high-resolution (e.g., 2048x1024) photorealistic video-to-video translation. It can be used for turning semantic label maps into photorealistic videos, synthesizing people talking from edge maps, or generating human motions from poses. The core of video-to-video translation is image-to-image translation; some of our work in that space can be found in pix2pixHD and SPADE.

Video-to-Video Synthesis
Ting-Chun Wang¹, Ming-Yu Liu¹, Jun-Yan Zhu², Guilin Liu¹, Andrew Tao¹, Jan Kautz¹, Bryan Catanzaro¹
¹NVIDIA Corporation, ²MIT CSAIL
In Neural Information Processing Systems (NeurIPS) 2018

Video-to-Video Translation

  • Label-to-Streetview Results

  • Edge-to-Face Results

  • Pose-to-Body Results

  • Frame Prediction Results

Prerequisites

  • Linux or macOS
  • Python 3
  • NVIDIA GPU + CUDA cuDNN
  • PyTorch 0.4

Getting Started

Installation

  • Install the Python libraries dominate and requests.
pip install dominate requests
  • If you plan to train with face datasets, please install dlib.
pip install dlib
  • If you plan to train with pose datasets, please install DensePose and/or OpenPose.
  • Clone this repo:
git clone https://github.com/NVIDIA/vid2vid
cd vid2vid
  • Docker image: if you have difficulty building the repo, a Docker image can be found in the docker folder.

Testing

  • Please first download the example datasets by running python scripts/download_datasets.py.

  • Next, compile a snapshot of FlowNet2 by running python scripts/download_flownet2.py.

  • Cityscapes

    • Please download the pre-trained Cityscapes model by:

      python scripts/street/download_models.py
    • To test the model (bash ./scripts/street/test_2048.sh):

      #!./scripts/street/test_2048.sh
      python test.py --name label2city_2048 --label_nc 35 --loadSize 2048 --n_scales_spatial 3 --use_instance --fg --use_single_G

      The test results will be saved in: ./results/label2city_2048/test_latest/.

    • We also provide a smaller model trained with a single GPU, which produces slightly worse performance at 1024 x 512 resolution.

      • Please download the model by
      python scripts/street/download_models_g1.py
      • To test the model (bash ./scripts/street/test_g1_1024.sh):
      #!./scripts/street/test_g1_1024.sh
      python test.py --name label2city_1024_g1 --label_nc 35 --loadSize 1024 --n_scales_spatial 3 --use_instance --fg --n_downsample_G 2 --use_single_G
    • You can find more example scripts in the scripts/street/ directory.

  • Faces

    • Please download the pre-trained model by:
      python scripts/face/download_models.py
    • To test the model (bash ./scripts/face/test_512.sh):
      #!./scripts/face/test_512.sh
      python test.py --name edge2face_512 --dataroot datasets/face/ --dataset_mode face --input_nc 15 --loadSize 512 --use_single_G
      The test results will be saved in: ./results/edge2face_512/test_latest/.

Dataset

  • Cityscapes
    • We use the Cityscapes dataset as an example. To train a model on the full dataset, please download it from the official website (registration required).
    • We apply a pre-trained segmentation algorithm to get the corresponding semantic maps (train_A) and instance maps (train_inst).
    • Please add the obtained images to the datasets folder in the same way the example images are provided.
  • Face
    • We use the FaceForensics dataset. We then use landmark detection to estimate the face keypoints, and interpolate them to get face edges (a rough sketch of this step follows this list).
  • Pose
    • We use random dancing videos found on YouTube. We then apply DensePose / OpenPose to estimate the poses for each frame.
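
As a rough illustration of the face preprocessing step above, the sketch below detects dlib landmarks on a frame and connects them into an edge map. It is only a hedged sketch, not the repository's data/face_landmark_detection.py; the cv2 dependency, the shape_predictor_68_face_landmarks.dat model file, and the per-part keypoint grouping are assumptions.

    # Illustrative sketch only (assumptions noted above); the repo's script may differ.
    import cv2
    import dlib
    import numpy as np

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')  # assumed model path

    def frame_to_edge_map(frame_bgr):
        # Return a one-channel edge map drawn from the detected facial landmarks.
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        edge = np.zeros(gray.shape, dtype=np.uint8)
        faces = detector(gray, 1)
        if len(faces) == 0:
            return edge
        shape = predictor(gray, faces[0])
        pts = np.array([[p.x, p.y] for p in shape.parts()], dtype=np.int32)
        # Connect consecutive keypoints of each facial part (jaw, brows, nose, eyes, mouth);
        # the actual script interpolates smoother curves between the keypoints.
        parts = [range(0, 17), range(17, 22), range(22, 27), range(27, 36),
                 range(36, 42), range(42, 48), range(48, 68)]
        for part in parts:
            poly = pts[list(part)].reshape(-1, 1, 2)
            cv2.polylines(edge, [poly], isClosed=False, color=255, thickness=1)
        return edge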

Training with Cityscapes dataset

  • First, download the FlowNet2 checkpoint file by running python scripts/download_models_flownet2.py.
  • Training with 8 GPUs:
    • We adopt a coarse-to-fine approach, sequentially increasing the resolution from 512 x 256, 1024 x 512, to 2048 x 1024.
    • Train a model at 512 x 256 resolution (bash ./scripts/street/train_512.sh)
    #!./scripts/street/train_512.sh
    python train.py --name label2city_512 --label_nc 35 --gpu_ids 0,1,2,3,4,5,6,7 --n_gpus_gen 6 --n_frames_total 6 --use_instance --fg
    • Train a model at 1024 x 512 resolution (must train 512 x 256 first) (bash ./scripts/street/train_1024.sh):
    #!./scripts/street/train_1024.sh
    python train.py --name label2city_1024 --label_nc 35 --loadSize 1024 --n_scales_spatial 2 --num_D 3 --gpu_ids 0,1,2,3,4,5,6,7 --n_gpus_gen 4 --use_instance --fg --niter_step 2 --niter_fix_global 10 --load_pretrain checkpoints/label2city_512

If you have TensorFlow installed, you can see TensorBoard logs in ./checkpoints/label2city_1024/logs by adding --tf_log to the training scripts.

  • Training with a single GPU:

    • We trained our models using multiple GPUs. For convenience, we provide some sample training scripts (train_g1_XXX.sh) for single-GPU users, up to 1024 x 512 resolution. Again, a coarse-to-fine approach is adopted (256 x 128, 512 x 256, 1024 x 512). Performance is not guaranteed with these scripts.
    • For example, to train a 256 x 128 video with a single GPU (bash ./scripts/street/train_g1_256.sh)
    #!./scripts/street/train_g1_256.sh
    python train.py --name label2city_256_g1 --label_nc 35 --loadSize 256 --use_instance --fg --n_downsample_G 2 --num_D 1 --max_frames_per_gpu 6 --n_frames_total 6
  • Training at full (2k x 1k) resolution

    • Training at full resolution (2048 x 1024) requires 8 GPUs with at least 24G memory (bash ./scripts/street/train_2048.sh). If only GPUs with 12G/16G memory are available, please use the script ./scripts/street/train_2048_crop.sh, which will crop the images during training. Performance is not guaranteed with this script.

Training with face datasets

  • If you haven't already, please first download the example dataset by running python scripts/download_datasets.py.
  • Run the following command to compute face landmarks for the training dataset:
    python data/face_landmark_detection.py train
  • Run the example script (bash ./scripts/face/train_512.sh)
    python train.py --name edge2face_512 --dataroot datasets/face/ --dataset_mode face --input_nc 15 --loadSize 512 --num_D 3 --gpu_ids 0,1,2,3,4,5,6,7 --n_gpus_gen 6 --n_frames_total 12  
  • For single-GPU users, example scripts are in train_g1_XXX.sh. These scripts are not fully tested, so please use them at your own discretion. If you still hit out-of-memory errors, try reducing max_frames_per_gpu.
  • More example scripts can be found in scripts/face/.
  • Please refer to More Training/Test Details for more explanations about training flags.

Training with pose datasets

  • If you haven't already, please first download the example dataset by running python scripts/download_datasets.py.
  • Example DensePose and OpenPose results are included. If you plan to use your own dataset, please generate these results and arrange them in the same way as the example dataset.
  • Run the example script (bash ./scripts/pose/train_256p.sh)
    python train.py --name pose2body_256p --dataroot datasets/pose --dataset_mode pose --input_nc 6 --num_D 2 --resize_or_crop ScaleHeight_and_scaledCrop --loadSize 384 --fineSize 256 --gpu_ids 0,1,2,3,4,5,6,7 --batchSize 8 --max_frames_per_gpu 3 --no_first_img --n_frames_total 12 --max_t_step 4
  • Again, for single-GPU users, example scripts are in train_g1_XXX.sh. These scripts are not fully tested, so please use them at your own discretion. If you still hit out-of-memory errors, try reducing max_frames_per_gpu.
  • More example scripts can be found in scripts/pose/.
  • Please refer to More Training/Test Details for more explanations about training flags.

Training with your own dataset

  • If your input is a label map, please generate one-channel label maps whose pixel values correspond to the object labels (i.e., 0, 1, ..., N-1, where N is the number of labels). This is because we need to generate one-hot vectors from the label maps (see the sketch after this list). Please use --label_nc N during both training and testing.
  • If your input is not a label map, please specify --input_nc N where N is the number of input channels (The default is 3 for RGB images).
  • The default setting for preprocessing is scaleWidth, which scales the width of all training images to opt.loadSize (1024) while keeping the aspect ratio. If you want a different setting, please change it with the --resize_or_crop option. For example:
    • scaleWidth_and_crop first resizes the image to have width opt.loadSize and then does random cropping of size (opt.fineSize, opt.fineSize).
    • crop skips the resizing step and only performs random cropping.
    • scaledCrop crops the image while retaining the original aspect ratio.
    • randomScaleHeight randomly scales the image height to be between opt.loadSize and opt.fineSize.
    • none does no preprocessing other than making sure the image is divisible by 32.
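
As a hedged illustration of why label maps must be one-channel with values 0, 1, ..., N-1, the snippet below (an assumed helper, not code from this repository) scatters a label map into an N-channel one-hot tensor with N = --label_nc:

    import torch

    def label_map_to_one_hot(label_map, label_nc):
        # label_map: integer tensor of shape (1, H, W) with values in [0, label_nc - 1]
        h, w = label_map.shape[-2:]
        one_hot = torch.zeros(label_nc, h, w)
        one_hot.scatter_(0, label_map.long().view(1, h, w), 1.0)
        return one_hot  # (label_nc, H, W): one channel per object label

    # Example with --label_nc 35 (Cityscapes):
    dummy = torch.randint(0, 35, (1, 256, 512))
    print(label_map_to_one_hot(dummy, 35).shape)  # torch.Size([35, 256, 512])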

More Training/Test Details

  • We generate frames in the video sequentially, where the generation of the current frame depends on previous frames. To generate the first frame for the model, there are 3 different ways:

      1. Using another generator which was trained on generating single images (e.g., pix2pixHD) by specifying --use_single_G. This is the option we use in the test scripts.
      2. Using the first frame in the real sequence by specifying --use_real_img.
      3. Forcing the model to also synthesize the first frame by specifying --no_first_img. This must be trained separately before inference.
  • The way we train the model is as follows: suppose we have 8 GPUs, 4 for generators and 4 for discriminators, and we want to train on 28 frames. Also, assume each GPU can generate only one frame. The first GPU generates the first frame and passes it to the next GPU, and so on. After the first 4 frames are generated, they are passed to the 4 discriminator GPUs to compute the losses. The last generated frame then becomes the input to the next batch, and the next 4 frames in the training sequence are loaded onto the GPUs. This is repeated 7 times (4 x 7 = 28) to train on all 28 frames (a back-of-the-envelope sketch of this arithmetic appears at the end of this section).

  • Some important flags:

    • n_gpus_gen: the number of GPUs to use for generators (while the others are used for discriminators). We separate generators and discriminators into different GPUs since when dealing with high resolutions, even one frame cannot fit in a GPU. If the number is set to -1, there is no separation and all GPUs are used for both generators and discriminators (only works for low-res images).
    • n_frames_G: the number of input frames to feed into the generator network; i.e., n_frames_G - 1 is the number of frames we look into the past. The default is 3 (conditioned on the previous two frames).
    • n_frames_D: the number of frames to feed into the temporal discriminator. The default is 3.
    • n_scales_spatial: the number of scales in the spatial domain. We train from the coarsest scale and all the way to the finest scale. The default is 3.
    • n_scales_temporal: the number of scales for the temporal discriminator. The finest scale takes in the sequence in the original frame rate. The coarser scales subsample the frames by a factor of n_frames_D before feeding the frames into the discriminator. For example, if n_frames_D = 3 and n_scales_temporal = 3, the discriminator effectively sees 27 frames. The default is 3.
    • max_frames_per_gpu: the number of frames in one GPU during training. If you run into an out-of-memory error, please first try to reduce this number. If your GPU memory can fit more frames, try to make this number bigger to make training faster. The default is 1.
    • max_frames_backpropagate: the number of frames that loss backpropagates to previous frames. For example, if this number is 4, the loss on frame n will backpropagate to frame n-3. Increasing this number will slightly improve the performance, but also cause training to be less stable. The default is 1.
    • n_frames_total: the total number of frames in a sequence we want to train with. We gradually increase this number during training.
    • niter_step: the number of epochs after which we double n_frames_total. The default is 5.
    • niter_fix_global: if this number is not 0, only train the finest spatial scale for this number of epochs before starting to fine-tune all scales.
    • batchSize: the number of sequences to train at a time. We normally set batchSize to 1 since often, one sequence is enough to occupy all GPUs. If you want to do batchSize > 1, currently only batchSize == n_gpus_gen is supported.
    • no_first_img: if not specified, the model will assume the first frame is given and synthesize the successive frames. If specified, the model will also try to synthesize the first frame instead.
    • fg: if specified, use the foreground-background separation model as stated in the paper. The foreground labels must be specified by --fg_labels.
    • no_flow: if specified, do not use flow warping and directly synthesize frames. We found this usually still works reasonably well when the background is static, while saving memory and training time.
    • sparse_D: if specified, only apply temporal discriminator on sparse frames in the sequence. This helps save memory while having little effect on performance.
  • For other flags, please see options/train_options.py and options/base_options.py for all the training flags; see options/test_options.py and options/base_options.py for all the test flags.

  • Additional flags for edge2face examples:

    • no_canny_edge: do not use canny edges for background as input.
    • no_dist_map: by default, we use a distance transform of the face edge map as input. If this flag is specified, the edge maps are used directly.
  • Additional flags for pose2body examples:

    • densepose_only: use only densepose results as input. Please also remember to change input_nc to be 3.
    • openpose_only: use only openpose results as input. Please also remember to change input_nc to be 3.
    • add_face_disc: add an additional discriminator that only works on the face region.
    • remove_face_labels: remove the DensePose results for the face and add noise to the OpenPose face results, so the network becomes more robust to different face shapes. This is important if you plan to do inference on half-body videos (otherwise, this flag is usually unnecessary).
    • random_drop_prob: the probability of randomly dropping each pose segment during training, so the network becomes more robust to missing poses at inference time. The default is 0.05.
    • basic_point_only: if specified, only use basic joint keypoints for OpenPose output, without using any hand or face keypoints.
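
As a back-of-the-envelope sketch of how some of the flags above interact (numbers only; the helper names, and the assumption that the sequence length simply doubles every niter_step epochs, are ours, not code from this repository):

    def frames_per_forward_pass(n_gpus_gen, max_frames_per_gpu):
        # Frames generated before the discriminator GPUs compute losses
        # (4 generator GPUs x 1 frame each = 4 in the 8-GPU example above).
        return n_gpus_gen * max_frames_per_gpu

    def passes_per_sequence(n_frames_total, n_gpus_gen, max_frames_per_gpu):
        # 28 frames / 4 frames per pass = 7 passes in the example above.
        return n_frames_total // frames_per_forward_pass(n_gpus_gen, max_frames_per_gpu)

    def sequence_length_at_epoch(start_frames, niter_step, epoch):
        # Assumes n_frames_total starts at start_frames and doubles every niter_step epochs.
        return start_frames * 2 ** (epoch // niter_step)

    def temporal_receptive_field(n_frames_D, n_scales_temporal):
        # Each coarser temporal scale subsamples by n_frames_D, so the discriminator
        # effectively covers n_frames_D ** n_scales_temporal frames (3 ** 3 = 27).
        return n_frames_D ** n_scales_temporal

    print(passes_per_sequence(28, 4, 1))       # 7
    print(sequence_length_at_epoch(6, 5, 12))  # 24
    print(temporal_receptive_field(3, 3))      # 27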

Citation

If you find this useful for your research, please cite the following paper.

@inproceedings{wang2018vid2vid,
   author    = {Ting-Chun Wang and Ming-Yu Liu and Jun-Yan Zhu and Guilin Liu
                and Andrew Tao and Jan Kautz and Bryan Catanzaro},
   title     = {Video-to-Video Synthesis},
   booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},   
   year      = {2018},
}

Acknowledgments

We thank Karan Sapra, Fitsum Reda, and Matthieu Le for generating the segmentation maps for us. We also thank Lisa Rhee for allowing us to use her dance videos for training. We thank William S. Peebles for proofreading the paper.
This code borrows heavily from pytorch-CycleGAN-and-pix2pix and pix2pixHD.

Issues
  • TypeError: __init__() got an unexpected keyword argument 'track_running_stats'

    TypeError: __init__() got an unexpected keyword argument 'track_running_stats'

    I have installed this repo in an NVIDIA Docker environment with CUDA 8.0, cuDNN 6.0, Miniconda with a Python 3.6 virtualenv, and PyTorch 0.2.0.

    When I run ./scripts/test_2048.sh, I get the error below:

    ------------ Options ------------- aspect_ratio: 1.0 batchSize: 1 checkpoints_dir: ./checkpoints dataroot: datasets/Cityscapes/ dataset_mode: temporal debug: False display_id: 0 display_winsize: 512 feat_num: 3 fg: True fg_labels: [26] fineSize: 512 gpu_ids: [0] how_many: 300 input_nc: 3 isTrain: False label_feat: False label_nc: 35 loadSize: 2048 load_features: False load_pretrain: max_dataset_size: inf model: vid2vid nThreads: 2 n_blocks: 9 n_blocks_local: 3 n_downsample_E: 3 n_downsample_G: 3 n_frames_G: 3 n_gpus_gen: 1 n_local_enhancers: 1 n_scales_spatial: 3 name: label2city_2048 ndf: 64 nef: 32 netE: simple netG: composite ngf: 128 no_first_img: False no_flip: False norm: batch ntest: inf output_nc: 3 phase: test resize_or_crop: scaleWidth results_dir: ./results/ serial_batches: False tf_log: False use_instance: True use_real_img: False use_single_G: True which_epoch: latest -------------- End ---------------- CustomDatasetDataLoader dataset [TestDataset] was created vid2vid ---------- Networks initialized -------------

    Traceback (most recent call last): File "test.py", line 24, in model = create_model(opt) File "/vid2vid/models/models.py", line 19, in create_model modelG.initialize(opt) File "/vid2vid/models/vid2vid_model_G.py", line 51, in initialize self.netG_i = self.load_single_G() if self.use_single_G else None File "/vid2vid/models/vid2vid_model_G.py", line 270, in load_single_G netG = networks.define_G(input_nc, opt.output_nc, 0, 32, 'local', 4, 'instance', 0, self.gpu_ids, opt) File "/vid2vid/models/networks.py", line 39, in define_G netG = LocalEnhancer(input_nc, output_nc, ngf, n_downsampling, opt.n_blocks, opt.n_local_enhancers, opt.n_blocks_local, norm_layer) File "/vid2vid/models/networks.py", line 320, in init model_global = GlobalGenerator(input_nc, output_nc, ngf_global, n_downsample_global, n_blocks_global, norm_layer).model File "/vid2vid/models/networks.py", line 286, in init model = [nn.ReflectionPad2d(3), nn.Conv2d(input_nc, ngf, kernel_size=7, padding=0), norm_layer(ngf), activation] TypeError: init() got an unexpected keyword argument 'track_running_stats' (py3) [email protected]:/vid2vid#

    Can anyone tell me how to solve it?

    Thanks!

    opened by zzzkk2009 15
  • ModuleNotFoundError: No module named 'resample2d_cuda'

    ModuleNotFoundError: No module named 'resample2d_cuda'

    Hi,

    I faced an issue, 'ModuleNotFoundError: No module named 'resample2d_cuda'. Do you know how to solve this?

    The 'resample2d_package' folder (D:\download\vid2vid\models\flownet2_pytorch\networks\resample2d_package) contains the following:
    __pycache__, __init__.py, resample2d.py, resample2d_cuda.cc, resample2d_kernel.cu, resample2d_kernel.cuh, setup.py

    The following is the command I ran:

    D:\download\vid2vid>python test.py --name label2city_2048 --dataroot datasets/Cityscapes/test_A --loadSize 2048 --n_scales_spatial 3 --use_instance --fg --use_single_G ------------ Options ------------- aspect_ratio: 1.0 batchSize: 1 checkpoints_dir: ./checkpoints dataroot: datasets/Cityscapes/test_A dataset_mode: temporal debug: False display_id: 0 display_winsize: 512 feat_num: 3 fg: True fg_labels: [26] fineSize: 512 gpu_ids: [0] how_many: 300 input_nc: 3 isTrain: False label_feat: False label_nc: 35 loadSize: 2048 load_features: False load_pretrain: max_dataset_size: inf model: vid2vid nThreads: 2 n_blocks: 9 n_blocks_local: 3 n_downsample_E: 3 n_downsample_G: 3 n_frames_G: 3 n_gpus_gen: 1 n_local_enhancers: 1 n_scales_spatial: 3 name: label2city_2048 ndf: 64 nef: 32 netE: simple netG: composite ngf: 128 no_first_img: False no_flip: False norm: batch ntest: inf output_nc: 3 phase: test resize_or_crop: scaleWidth results_dir: ./results/ serial_batches: False tf_log: False use_instance: True use_real_img: False use_single_G: True which_epoch: latest -------------- End ---------------- CustomDatasetDataLoader dataset [TestDataset] was created vid2vid Traceback (most recent call last): File "test.py", line 24, in model = create_model(opt) File "D:\download\vid2vid\models\models.py", line 7, in create_model from .vid2vid_model_G import Vid2VidModelG File "D:\download\vid2vid\models\vid2vid_model_G.py", line 13, in from . import networks File "D:\download\vid2vid\models\networks.py", line 12, in from .flownet2_pytorch.networks.resample2d_package.resample2d import Resample2d File "D:\download\vid2vid\models\flownet2_pytorch\networks\resample2d_package\resample2d.py", line 3, in import resample2d_cuda ModuleNotFoundError: No module named 'resample2d_cuda'

    opened by TatsuyaOsugi 13
  • Understanding the flow for training faces

    Understanding the flow for training faces

    Please correct me if I am wrong (I am focusing just on faces).

    As I understand it, vid2vid lets you provide a video in which each frame acts as labeled data for training. So once one has a trained model, given input data of just edge maps, vid2vid will try to create a face (based on the training data) from the edge maps.

    I am not clear, though, on how to do this with train.py. Do I need to generate edge maps myself for each frame of my video?

    Ideally, I want to just provide vid2vid a single video file (say, an .avi) and have it generate edge maps for each frame itself and output a trained model.

    Thank you @tcwang0509 @junyanz

    When answering, please include CLI commands that I can copy and paste, directions that I can follow immediately, or changes to the Python code that might be needed.

    opened by fxfactorial 9
  • Using model on own data set.

    Using model on own data set.

    Hi, this is an extremely interesting project. I have another dataset that I would like to use: images taken in 15-minute increments that I want to transform into other images. We have used pix2pix for this problem, but we also wanted to see if vid2vid yields better results. How do the images need to be formatted or organized before training on them? Any help would be much appreciated.

    opened by yuanzhou15 7
  • ModuleNotFoundError: No module named 'models.flownet2_pytorch.networks.resample2d_package._ext'

    ModuleNotFoundError: No module named 'models.flownet2_pytorch.networks.resample2d_package._ext'

    Failed to test vid2vid...

    ➜  vid2vid git:(master) python test.py --name label2city_2048 --label_nc 35 --loadSize 2048 --n_scales_spatial 3 --use_instance --fg --use_single_G
    ------------ Options -------------
    add_face_disc: False
    aspect_ratio: 1.0
    batchSize: 1
    checkpoints_dir: ./checkpoints
    dataroot: datasets/Cityscapes/
    dataset_mode: temporal
    debug: False
    densepose_only: False
    display_id: 0
    display_winsize: 512
    feat_num: 3
    fg: True
    fg_labels: [26]
    fineSize: 512
    gpu_ids: [0]
    how_many: 300
    input_nc: 3
    isTrain: False
    label_feat: False
    label_nc: 35
    loadSize: 2048
    load_features: False
    load_pretrain: 
    max_dataset_size: inf
    model: vid2vid
    nThreads: 2
    n_blocks: 9
    n_blocks_local: 3
    n_downsample_E: 3
    n_downsample_G: 3
    n_frames_G: 3
    n_gpus_gen: 1
    n_local_enhancers: 1
    n_scales_spatial: 3
    name: label2city_2048
    ndf: 64
    nef: 32
    netE: simple
    netG: composite
    ngf: 128
    no_canny_edge: False
    no_dist_map: False
    no_first_img: False
    no_flip: False
    no_flow: False
    norm: batch
    ntest: inf
    openpose_only: False
    output_nc: 3
    phase: test
    random_drop_prob: 0.2
    remove_face_labels: False
    resize_or_crop: scaleWidth
    results_dir: ./results/
    serial_batches: False
    start_frame: 0
    tf_log: False
    use_instance: True
    use_real_img: False
    use_single_G: True
    which_epoch: latest
    -------------- End ----------------
    CustomDatasetDataLoader
    dataset [TestDataset] was created
    vid2vid
    Traceback (most recent call last):
      File "test.py", line 25, in <module>
        model = create_model(opt)
      File "....../vid2vid/models/models.py", line 7, in create_model
        from .vid2vid_model_G import Vid2VidModelG
      File "....../vid2vid/models/vid2vid_model_G.py", line 13, in <module>
        from . import networks
      File "....../vid2vid/models/networks.py", line 12, in <module>
        from .flownet2_pytorch.networks.resample2d_package.resample2d import Resample2d
    ModuleNotFoundError: No module named 'models.flownet2_pytorch'
    
    opened by jiapei100 6
  • Training a face model with train_g1_256.sh

    Training a face model with train_g1_256.sh

    Did anyone successfully train a model with train_g1_256.sh? The readme says that single-GPU models were not fully tested. For some reason, training finishes after only a few hours, and when I try to test I get:

    ISSUE: Pretrained network G0 has fewer layers; The following are not initialized:
    ['model_down_img', 'model_down_seg', 'model_final_flow', 'model_final_img', 'model_final_w', 'model_res_flow', 'model_res_img', 'model_up_flow', 'model_up_img']
    
    opened by petergerten 5
  • spatio-temporal growing

    spatio-temporal growing

    Hey, in your paper you mention in the experiment section: "Implementation details. We train our network in a spatio-temporally progressive manner. In particular, we start with generating low-resolution videos with few frames, and all the way up to generating full resolution videos with 30 (or more) frames."

    How exactly did you do the scaling? I looked through your code, but couldn't find anything related to it. In particular, I would like to know whether you increase both spatial/temporal size at the same time, or one after another, and whether you adjusted other hparams when using it.

    What I mean by adjusting hparams is that this progressive growing is mainly used to reduce training time, I guess. So if the model would usually take N epochs to train on high-res videos from scratch, you probably trained it for M < N epochs per progressive-growing stage, right? Then what is M? N / #stages? And did you use a larger batch size for smaller resolutions, or make any other notable hparam changes?

    opened by fa9r 5
  • RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:88'

    RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:88'

    Hello! When I set gpu_ids to '0,1', the program throws this error. Can anyone help me?

    opened by moshifengyan 5
  • 'Segmentation fault (core dumped)' still exists.

    'Segmentation fault (core dumped)' still exists.

    I downloaded the latest version (Aug 24th) and followed the steps in the 'Testing' section of the readme, but I still get this error. My CUDA version is 9.0, and my PyTorch version is 0.4.1.

    Can someone help me?

    ------------ Options -------------
    aspect_ratio: 1.0
    batchSize: 1
    checkpoints_dir: ./checkpoints
    dataroot: datasets/Cityscapes/test_A
    dataset_mode: temporal
    debug: False
    display_id: 0
    display_winsize: 512
    feat_num: 3
    fg: True
    fg_labels: [26]
    fineSize: 512
    gpu_ids: [1, 2]
    how_many: 300
    input_nc: 3
    isTrain: False
    label_feat: False
    label_nc: 35
    loadSize: 2048
    load_features: False
    load_pretrain: 
    max_dataset_size: inf
    model: vid2vid
    nThreads: 2
    n_blocks: 9
    n_blocks_local: 3
    n_downsample_E: 3
    n_downsample_G: 3
    n_frames_G: 3
    n_gpus_gen: 2
    n_local_enhancers: 1
    n_scales_spatial: 3
    name: label2city_2048
    ndf: 64
    nef: 32
    netE: simple
    netG: composite
    ngf: 128
    no_first_img: False
    no_flip: False
    norm: batch
    ntest: inf
    output_nc: 3
    phase: test
    resize_or_crop: scaleWidth
    results_dir: ./results/
    serial_batches: False
    tf_log: False
    use_instance: True
    use_real_img: False
    use_single_G: True
    which_epoch: latest
    -------------- End ----------------
    CustomDatasetDataLoader
    dataset [TestDataset] was created
    vid2vid
    ---------- Networks initialized -------------
    -----------------------------------------------
    Doing 28 frames
    Segmentation fault (core dumped)
    
    opened by joe1chief 5
  • Add Dockerfile and launch script

    Add Dockerfile and launch script

    I hope this Docker image is useful to others! It would have saved me a couple of days of trial-and-error work fighting competing dependencies.

    I've tested this Dockerfile as far as running bash ./scripts/test_1024_g1.sh. I have a single-GPU setup, so I can't test any multi-GPU features.

    If I encounter further issues as I use vid2vid, I'll update this pull request.

    opened by dustinfreeman 4
  • FID evaluation

    FID evaluation

    Hello. Will code for evaluating FID be provided?

    opened by CheungBH 0
  • RuntimeError: Legacy autograd function with non-static forward method is deprecated.

    RuntimeError: Legacy autograd function with non-static forward method is deprecated.

    When running

    python train.py --name edge2face_512 --dataroot datasets/face/ --dataset_mode face --input_nc 15 --loadSize 256 --num_D 1 --max_frames_per_gpu 2 --n_frames_total 6
    

    I am getting this error (screenshot attached). I am using PyTorch 1.9.0 and torchvision 0.10.0.

    Note: I tried to compile using PyTorch 0.4.1 but was unable to compile resample2d_cuda. Even after running install.sh, I get an import error:

    import resample2d_cuda
    ImportError: libc10.so: cannot open shared object file: No such file or directory
    
    opened by avani17101 1
  • Vid

    Vid

    opened by bigDonJuan 1
  • RuntimeError: CUDA error: throwing an instance of 'c10::Error'

    RuntimeError: CUDA error: throwing an instance of 'c10::Error'

    Hi all,

    I'm getting this error while training vid2vid:

    RuntimeError: CUDA error: the launch timed out and was terminated
    terminate called after throwing an instance of 'c10::Error' 
    

    The error pops up when the script starts the validation phase; here is the entire log: log.txt

    I'm training vid2vid on the KITTI dataset using a server with 8 x 12GB Quadro M6000 GPUs. I'm also attaching the config file: ampO1.txt

    Any suggestions?

    Thanks :)

    opened by alamayreh 0
  • How to test “pose-to-body”?

    How to test “pose-to-body”?

    Thanks for sharing! If I finish training pose-to-body, how do I run test.py, and what is the Python command? And where will the weights be saved after training?

    opened by guofengming11 0
  • Error

    Error

    Does anyone know anything about this message: File "/content/f7b7c7758a46da49f84bc68b47997d69/vid2vid/models/base_model.py", line 148, in get_edges edge[:,:,:,:,1:] = edge[:,:,:,:,1:] | (t[:,:,:,:,1:] != t[:,:,:,:,:-1]) RuntimeError: Expected object of scalar type Byte but got scalar type Bool for argument #2 'other' in call to _th_or ?

    opened by BeahIF 1
  • Inference time - How much FPS is possible?

    Inference time - How much FPS is possible?

    Has there been any evaluation of the inference time for the body-to-body or face-to-face transformation? Would it be possible to get the model running in real time (over 10 FPS), and what would the FPS depend on?

    opened by Zrrr1997 0
  • Is there any ways to improve the output quality of the pose model?

    Is there any ways to improve the output quality of the pose model?

    I'm creating a dataset using this model, and I need the fingers in the output to be clear. Is there a way to improve the output quality of hand signs?

    opened by arulpraveent 0
  • Using Openpose docker image with vid2vid

    Using Openpose docker image with vid2vid

    I tried finding information about this, but couldn't.

    I have problems installing OpenPose, but its docker image works fine. Now, how do I use this image with vid2vid?

    Any pointers will be great. Thank you.

    opened by karims 0
  • Sequence length: How to limit that to 30, it is increasing automatically as the no. of epochs is increasing

    Sequence length: How to limit that to 30, it is increasing automatically as the no. of epochs is increasing

    I'm custom-training vid2vid for pose-to-body generation, and below is an extract of the logs from my training run. Despite my specifying n_frames_total 30 in the training parameters, it says: --------- Updating training sequence length to 120 ---------

    Is there any way to limit this to 30, or is this the way it has to train to get good results? Can anyone clarify?

    ----------------Parameters used---------------- TTUR: False add_face_disc: False basic_point_only: False batchSize: 8 beta1: 0.5 checkpoints_dir: ./checkpoints continue_train: True dataroot: /mnt/FS/datasets dataset_mode: pose debug: False densepose_only: False display_freq: 100 display_id: 0 display_winsize: 512 feat_num: 3 fg: False fg_labels: [26] fineSize: 256 fp16: False gan_mode: ls gpu_ids: [0, 1, 2, 3, 4, 5, 6, 7] input_nc: 6 isTrain: True label_feat: False label_nc: 0 lambda_F: 10.0 lambda_T: 10.0 lambda_feat: 10.0 loadSize: 384 load_features: False load_pretrain: local_rank: 0 lr: 0.0002 max_dataset_size: inf max_frames_backpropagate: 1 max_frames_per_gpu: 5 max_t_step: 1 model: vid2vid nThreads: 2 n_blocks: 9 n_blocks_local: 3 n_downsample_E: 3 n_downsample_G: 3 n_frames_D: 3 n_frames_G: 3 n_frames_total: 30 n_gpus_gen: 8 n_layers_D: 3 n_local_enhancers: 1 n_scales_spatial: 1 n_scales_temporal: 2 name: /mnt/FS/datasets/vid2vid/test ndf: 64 nef: 32 netE: simple netG: composite ngf: 64 niter: 10 niter_decay: 10 niter_fix_global: 0 niter_step: 5 no_canny_edge: False no_dist_map: False no_first_img: False no_flip: False no_flow: False no_ganFeat: False no_html: False no_vgg: False norm: batch num_D: 2 openpose_only: False output_nc: 3 phase: train pool_size: 1 print_freq: 100 random_drop_prob: 0.05 random_scale_points: False remove_face_labels: False resize_or_crop: Scaleheight_and_scaledCrop save_epoch_freq: 1 save_latest_freq: 1000 serial_batches: False sparse_D: False tf_log: False use_instance: False use_single_G: False which_epoch: latest -------------- End ---------------- CustomDatasetDataLoader dataset [PoseDataset] was created #training videos = 5070 vid2vid ---------- Networks initialized -------------

    ---------- Networks initialized -------------

    Resuming from epoch 14 at iteration 144 update learning rate: 0.000200 -> 0.000140 update learning rate: 0.000200 -> 0.000140 --------- Updating training sequence length to 120 --------- -------- Updating number of backpropagated frames to 1 ----------

    opened by pranavraikote 0