PyTorch implementation of our method for high-resolution (e.g., 2048x1024) photorealistic video-to-video translation.

Overview





vid2vid

Project | YouTube(short) | YouTube(full) | arXiv | Paper(full)

PyTorch implementation for high-resolution (e.g., 2048x1024) photorealistic video-to-video translation. It can be used for turning semantic label maps into photo-realistic videos, synthesizing people talking from edge maps, or generating human motions from poses. The core of video-to-video translation is image-to-image translation. Some of our work in that space can be found in pix2pixHD and SPADE.

Video-to-Video Synthesis
Ting-Chun Wang¹, Ming-Yu Liu¹, Jun-Yan Zhu², Guilin Liu¹, Andrew Tao¹, Jan Kautz¹, Bryan Catanzaro¹
¹NVIDIA Corporation, ²MIT CSAIL
In Neural Information Processing Systems (NeurIPS) 2018

Video-to-Video Translation

  • Label-to-Streetview Results

  • Edge-to-Face Results

  • Pose-to-Body Results

  • Frame Prediction Results

Prerequisites

  • Linux or macOS
  • Python 3
  • NVIDIA GPU + CUDA cuDNN
  • PyTorch 0.4

Getting Started

Installation

  • Install the Python libraries dominate and requests.
pip install dominate requests
  • If you plan to train with face datasets, please install dlib.
pip install dlib
  • If you plan to train with pose datasets, please install DensePose and/or OpenPose.
  • Clone this repo:
git clone https://github.com/NVIDIA/vid2vid
cd vid2vid
  • Docker image: if you have difficulty building the repo, a Docker image can be found in the docker folder.

Testing

  • Please first download the example dataset by running python scripts/download_datasets.py.

  • Next, compile a snapshot of FlowNet2 by running python scripts/download_flownet2.py.

  • Cityscapes

    • Please download the pre-trained Cityscapes model by:

      python scripts/street/download_models.py
    • To test the model (bash ./scripts/street/test_2048.sh):

      #!./scripts/street/test_2048.sh
      python test.py --name label2city_2048 --label_nc 35 --loadSize 2048 --n_scales_spatial 3 --use_instance --fg --use_single_G

      The test results will be saved in: ./results/label2city_2048/test_latest/.

    • We also provide a smaller model trained with a single GPU, which produces slightly worse performance at 1024 x 512 resolution.

      • Please download the model by
      python scripts/street/download_models_g1.py
      • To test the model (bash ./scripts/street/test_g1_1024.sh):
      #!./scripts/street/test_g1_1024.sh
      python test.py --name label2city_1024_g1 --label_nc 35 --loadSize 1024 --n_scales_spatial 3 --use_instance --fg --n_downsample_G 2 --use_single_G
    • You can find more example scripts in the scripts/street/ directory.

  • Faces

    • Please download the pre-trained model by:
      python scripts/face/download_models.py
    • To test the model (bash ./scripts/face/test_512.sh):
      #!./scripts/face/test_512.sh
      python test.py --name edge2face_512 --dataroot datasets/face/ --dataset_mode face --input_nc 15 --loadSize 512 --use_single_G
      The test results will be saved in: ./results/edge2face_512/test_latest/.

Dataset

  • Cityscapes
    • We use the Cityscapes dataset as an example. To train a model on the full dataset, please download it from the official website (registration required).
    • We apply a pre-trained segmentation algorithm to get the corresponding semantic maps (train_A) and instance maps (train_inst).
    • Please add the obtained images to the datasets folder in the same way the example images are provided.
  • Face
    • We use the FaceForensics dataset. We then use landmark detection to estimate the face keypoints, and interpolate them to get face edges (see the sketch after this list).
  • Pose
    • We use random dancing videos found on YouTube. We then apply DensePose / OpenPose to estimate the poses for each frame.
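
For the Face entry above, here is a minimal sketch of how edge maps could be derived from detected landmarks. It is illustrative only and does not reproduce the repo's data/face_landmark_detection.py exactly; it assumes dlib's 68-point predictor file shape_predictor_68_face_landmarks.dat is available, and it simply connects the keypoints with polylines instead of interpolating them with splines.

    # Hypothetical sketch: turn a video frame into a face edge map.
    import cv2
    import dlib
    import numpy as np

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')  # assumed to exist locally

    # Index ranges of the 68-point annotation that form connected contours.
    CONTOURS = [range(0, 17),                  # jaw line
                range(17, 22), range(22, 27),  # eyebrows
                range(27, 31), range(31, 36),  # nose bridge and nostrils
                range(36, 42), range(42, 48),  # eyes
                range(48, 60)]                 # outer lip

    def edge_map(frame_bgr):
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        edges = np.zeros(gray.shape, dtype=np.uint8)
        for rect in detector(gray, 1):
            shape = predictor(gray, rect)
            pts = np.array([[p.x, p.y] for p in shape.parts()], dtype=np.int32)
            for idx in CONTOURS:
                cv2.polylines(edges, [pts[list(idx)]], isClosed=False, color=255, thickness=1)
        return edges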

Training with Cityscapes dataset

  • First, download the FlowNet2 checkpoint file by running python scripts/download_models_flownet2.py.
  • Training with 8 GPUs:
    • We adopt a coarse-to-fine approach, sequentially increasing the resolution from 512 x 256, 1024 x 512, to 2048 x 1024.
    • Train a model at 512 x 256 resolution (bash ./scripts/street/train_512.sh)
    #!./scripts/street/train_512.sh
    python train.py --name label2city_512 --label_nc 35 --gpu_ids 0,1,2,3,4,5,6,7 --n_gpus_gen 6 --n_frames_total 6 --use_instance --fg
    • Train a model at 1024 x 512 resolution (must train 512 x 256 first) (bash ./scripts/street/train_1024.sh):
    #!./scripts/street/train_1024.sh
    python train.py --name label2city_1024 --label_nc 35 --loadSize 1024 --n_scales_spatial 2 --num_D 3 --gpu_ids 0,1,2,3,4,5,6,7 --n_gpus_gen 4 --use_instance --fg --niter_step 2 --niter_fix_global 10 --load_pretrain checkpoints/label2city_512

If you have TensorFlow installed, you can see TensorBoard logs in ./checkpoints/label2city_1024/logs by adding --tf_log to the training scripts.

  • Training with a single GPU:

    • We trained our models using multiple GPUs. For convenience, we provide some sample training scripts (train_g1_XXX.sh) for single-GPU users, up to 1024 x 512 resolution. Again, a coarse-to-fine approach is adopted (256 x 128, 512 x 256, 1024 x 512). Performance is not guaranteed when using these scripts.
    • For example, to train a 256 x 128 video with a single GPU (bash ./scripts/street/train_g1_256.sh)
    #!./scripts/street/train_g1_256.sh
    python train.py --name label2city_256_g1 --label_nc 35 --loadSize 256 --use_instance --fg --n_downsample_G 2 --num_D 1 --max_frames_per_gpu 6 --n_frames_total 6
  • Training at full (2k x 1k) resolution

    • Training at full resolution (2048 x 1024) requires 8 GPUs with at least 24 GB of memory each (bash ./scripts/street/train_2048.sh). If only GPUs with 12 GB/16 GB memory are available, please use the script ./scripts/street/train_2048_crop.sh, which crops the images during training. Performance is not guaranteed with this script.

Training with face datasets

  • If you haven't already, please first download the example dataset by running python scripts/download_datasets.py.
  • Run the following command to compute face landmarks for the training dataset:
    python data/face_landmark_detection.py train
  • Run the example script (bash ./scripts/face/train_512.sh)
    python train.py --name edge2face_512 --dataroot datasets/face/ --dataset_mode face --input_nc 15 --loadSize 512 --num_D 3 --gpu_ids 0,1,2,3,4,5,6,7 --n_gpus_gen 6 --n_frames_total 12  
  • For single-GPU users, example scripts are provided in train_g1_XXX.sh. These scripts are not fully tested; please use them at your own discretion. If you still hit out-of-memory errors, try reducing max_frames_per_gpu.
  • More example scripts can be found in scripts/face/.
  • Please refer to More Training/Test Details for more explanations about training flags.

Training with pose datasets

  • If you haven't already, please first download the example dataset by running python scripts/download_datasets.py.
  • Example DensePose and OpenPose results are included. If you plan to use your own dataset, please generate these results and put them in the same way the example dataset is provided.
  • Run the example script (bash ./scripts/pose/train_256p.sh)
    python train.py --name pose2body_256p --dataroot datasets/pose --dataset_mode pose --input_nc 6 --num_D 2 --resize_or_crop ScaleHeight_and_scaledCrop --loadSize 384 --fineSize 256 --gpu_ids 0,1,2,3,4,5,6,7 --batchSize 8 --max_frames_per_gpu 3 --no_first_img --n_frames_total 12 --max_t_step 4
  • Again, for single-GPU users, example scripts are provided in train_g1_XXX.sh. These scripts are not fully tested; please use them at your own discretion. If you still hit out-of-memory errors, try reducing max_frames_per_gpu.
  • More example scripts can be found in scripts/pose/.
  • Please refer to More Training/Test Details for more explanations about training flags.

Training with your own dataset

  • If your input is a label map, please generate one-channel label maps whose pixel values correspond to the object labels (i.e., 0, 1, ..., N-1, where N is the number of labels). This is because we need to generate one-hot vectors from the label maps (see the sketch after this list). Please use --label_nc N during both training and testing.
  • If your input is not a label map, please specify --input_nc N, where N is the number of input channels (the default is 3 for RGB images).
  • The default setting for preprocessing is scaleWidth, which will scale the width of all training images to opt.loadSize (1024) while keeping the aspect ratio. If you want a different setting, please change it by using the --resize_or_crop option. For example, scaleWidth_and_crop first resizes the image to have width opt.loadSize and then does random cropping of size (opt.fineSize, opt.fineSize). crop skips the resizing step and only performs random cropping. scaledCrop crops the image while retaining the original aspect ratio. randomScaleHeight will randomly scale the image height to be between opt.loadSize and opt.fineSize. If you don't want any preprocessing, please specify none, which will do nothing other than making sure the image is divisible by 32.
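
To make the label-map requirement above concrete, here is a minimal sketch (names are illustrative; the repo performs this conversion internally when --label_nc N is set) of how a one-channel label map is expanded into the one-hot tensor the network consumes:

    import torch

    def label_map_to_one_hot(label_map, num_labels):
        """Expand an integer label map of shape (1, H, W), with values in
        0..num_labels-1, into a one-hot tensor of shape (num_labels, H, W)."""
        h, w = label_map.shape[-2:]
        one_hot = torch.zeros(num_labels, h, w)
        return one_hot.scatter_(0, label_map.long(), 1.0)

    # Example: a 4-label map (--label_nc 4) on a tiny 2x3 image.
    labels = torch.tensor([[[0, 1, 2], [3, 0, 1]]])
    print(label_map_to_one_hot(labels, num_labels=4).shape)  # torch.Size([4, 2, 3])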

More Training/Test Details

  • We generate frames in the video sequentially, where the generation of the current frame depends on previous frames. To generate the first frame for the model, there are three different ways:

      1. Using another generator which was trained on generating single images (e.g., pix2pixHD) by specifying --use_single_G. This is the option we use in the test scripts.
      2. Using the first frame in the real sequence by specifying --use_real_img.
      3. Forcing the model to also synthesize the first frame by specifying --no_first_img. This model must be trained separately before inference.
  • The way we train the model is as follows: suppose we have 8 GPUs, 4 for generators and 4 for discriminators, and we want to train 28 frames. Also, assume each GPU can generate only one frame. The first GPU generates the first frame and passes it to the next GPU, and so on. After the 4 frames are generated, they are passed to the 4 discriminator GPUs to compute the losses. Then the last generated frame becomes the input to the next batch, and the next 4 frames in the training sequence are loaded into the GPUs. This is repeated 7 times (4 x 7 = 28) to train all 28 frames. A simplified sketch of this schedule is shown below.
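
    The following is a purely illustrative, single-process sketch of that chunked schedule; the toy generator, discriminator, and frames below are stand-ins (the real logic lives in train.py and models/vid2vid_model_G.py and also handles flow estimation, multiple scales, and actual GPU placement):

      import torch
      import torch.nn as nn

      # Toy stand-ins so the schedule below actually runs.
      generator = nn.Conv2d(3, 3, 3, padding=1)        # previous frame -> next frame
      discriminator = nn.Conv2d(6, 1, 3, padding=1)    # scores (fake, real) frame pairs

      frames = [torch.rand(1, 3, 64, 64) for _ in range(28)]  # a 28-frame training clip
      frames_per_chunk = 4                                     # one frame per generator GPU

      prev_frame = frames[0]                    # the first frame is assumed given
      for start in range(1, len(frames), frames_per_chunk):
          chunk_real = frames[start:start + frames_per_chunk]
          chunk_fake = []
          for real in chunk_real:               # conceptually, GPU i makes frame i and
              fake = generator(prev_frame)       # hands it to GPU i+1
              chunk_fake.append(fake)
              prev_frame = fake
          # The discriminator GPUs then score the whole generated chunk at once.
          loss = torch.stack([discriminator(torch.cat([f, r], dim=1)).mean()
                              for f, r in zip(chunk_fake, chunk_real)]).mean()
          loss.backward()
          prev_frame = prev_frame.detach()       # last fake frame seeds the next chunk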

  • Some important flags:

    • n_gpus_gen: the number of GPUs to use for generators (while the others are used for discriminators). We separate generators and discriminators into different GPUs since when dealing with high resolutions, even one frame cannot fit in a GPU. If the number is set to -1, there is no separation and all GPUs are used for both generators and discriminators (only works for low-res images).
    • n_frames_G: the number of input frames to feed into the generator network; i.e., n_frames_G - 1 is the number of frames we look into the past. The default is 3 (conditioned on the previous two frames).
    • n_frames_D: the number of frames to feed into the temporal discriminator. The default is 3.
    • n_scales_spatial: the number of scales in the spatial domain. We train from the coarsest scale and all the way to the finest scale. The default is 3.
    • n_scales_temporal: the number of scales for the temporal discriminator. The finest scale takes in the sequence at the original frame rate. The coarser scales subsample the frames by a factor of n_frames_D before feeding them into the discriminator. For example, if n_frames_D = 3 and n_scales_temporal = 3, the discriminator effectively sees 27 frames. The default is 3.
    • max_frames_per_gpu: the number of frames in one GPU during training. If you run into an out-of-memory error, please first try to reduce this number. If your GPU memory can fit more frames, try to make this number bigger to make training faster. The default is 1.
    • max_frames_backpropagate: the number of frames through which the loss backpropagates to previous frames. For example, if this number is 4, the loss on frame n will backpropagate to frame n-3. Increasing this number will slightly improve performance, but will also make training less stable. The default is 1.
    • n_frames_total: the total number of frames in a sequence we want to train with. We gradually increase this number during training.
    • niter_step: the number of epochs after which we double n_frames_total. The default is 5.
    • niter_fix_global: if this number is not 0, only train the finest spatial scale for this number of epochs before starting to fine-tune all scales.
    • batchSize: the number of sequences to train at a time. We normally set batchSize to 1 since often, one sequence is enough to occupy all GPUs. If you want to do batchSize > 1, currently only batchSize == n_gpus_gen is supported.
    • no_first_img: if not specified, the model will assume the first frame is given and synthesize the successive frames. If specified, the model will also try to synthesize the first frame instead.
    • fg: if specified, use the foreground-background separation model as stated in the paper. The foreground labels must be specified by --fg_labels.
    • no_flow: if specified, do not use flow warping and directly synthesize frames. We found this usually still works reasonably well when the background is static, while saving memory and training time.
    • sparse_D: if specified, only apply temporal discriminator on sparse frames in the sequence. This helps save memory while having little effect on performance.
  • For other flags, please see options/train_options.py and options/base_options.py for all the training flags; see options/test_options.py and options/base_options.py for all the test flags.

  • Additional flags for edge2face examples:

    • no_canny_edge: do not use canny edges for background as input.
    • no_dist_map: by default, we use a distance transform on the face edge map as input. This flag will make it directly use edge maps.
  • Additional flags for pose2body examples:

    • densepose_only: use only densepose results as input. Please also remember to change input_nc to be 3.
    • openpose_only: use only openpose results as input. Please also remember to change input_nc to be 3.
    • add_face_disc: add an additional discriminator that only works on the face region.
    • remove_face_labels: remove the DensePose results for the face and add noise to the OpenPose face results, so the network becomes more robust to different face shapes. This is important if you plan to do inference on half-body videos (if not, this flag is usually unnecessary).
    • random_drop_prob: the probability of randomly dropping each pose segment during training, so the network becomes more robust to missing poses at inference time (see the sketch at the end of this section). The default is 0.05.
    • basic_point_only: if specified, only use basic joint keypoints for OpenPose output, without using any hand or face keypoints.
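
As a rough illustration of the random_drop_prob behaviour described above (this is my own simplification, not the repo's exact code; it assumes the pose input is a part-index map whose values 1..num_segments label body parts):

    import random
    import torch

    def randomly_drop_pose_segments(pose_map, num_segments, drop_prob=0.05):
        """Zero out each body-part segment with probability drop_prob, so the
        generator learns to cope with missing detections at inference time."""
        out = pose_map.clone()
        for seg_id in range(1, num_segments + 1):
            if random.random() < drop_prob:
                out[out == seg_id] = 0   # erase this segment entirely
        return out

    # Example: a DensePose-style part map with 24 segments.
    part_map = torch.randint(0, 25, (1, 256, 256))
    augmented = randomly_drop_pose_segments(part_map, num_segments=24, drop_prob=0.05)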

Citation

If you find this useful for your research, please cite the following paper.

@inproceedings{wang2018vid2vid,
   author    = {Ting-Chun Wang and Ming-Yu Liu and Jun-Yan Zhu and Guilin Liu
                and Andrew Tao and Jan Kautz and Bryan Catanzaro},
   title     = {Video-to-Video Synthesis},
   booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},   
   year      = {2018},
}

Acknowledgments

We thank Karan Sapra, Fitsum Reda, and Matthieu Le for generating the segmentation maps for us. We also thank Lisa Rhee for allowing us to use her dance videos for training. We thank William S. Peebles for proofreading the paper.
This code borrows heavily from pytorch-CycleGAN-and-pix2pix and pix2pixHD.

Comments
  • TypeError: __init__() got an unexpected keyword argument 'track_running_stats'

    I have installed this repo in an nvidia-docker env with: cuda 8.0, cudnn 6.0, miniconda with a python 3.6 virtualenv, and pytorch 0.2.0.

    When I run ./scripts/test_2048.sh, I get the below error:

    ------------ Options -------------
    aspect_ratio: 1.0
    batchSize: 1
    checkpoints_dir: ./checkpoints
    dataroot: datasets/Cityscapes/
    dataset_mode: temporal
    debug: False
    display_id: 0
    display_winsize: 512
    feat_num: 3
    fg: True
    fg_labels: [26]
    fineSize: 512
    gpu_ids: [0]
    how_many: 300
    input_nc: 3
    isTrain: False
    label_feat: False
    label_nc: 35
    loadSize: 2048
    load_features: False
    load_pretrain: 
    max_dataset_size: inf
    model: vid2vid
    nThreads: 2
    n_blocks: 9
    n_blocks_local: 3
    n_downsample_E: 3
    n_downsample_G: 3
    n_frames_G: 3
    n_gpus_gen: 1
    n_local_enhancers: 1
    n_scales_spatial: 3
    name: label2city_2048
    ndf: 64
    nef: 32
    netE: simple
    netG: composite
    ngf: 128
    no_first_img: False
    no_flip: False
    norm: batch
    ntest: inf
    output_nc: 3
    phase: test
    resize_or_crop: scaleWidth
    results_dir: ./results/
    serial_batches: False
    tf_log: False
    use_instance: True
    use_real_img: False
    use_single_G: True
    which_epoch: latest
    -------------- End ----------------
    CustomDatasetDataLoader
    dataset [TestDataset] was created
    vid2vid
    ---------- Networks initialized -------------

    Traceback (most recent call last):
      File "test.py", line 24, in <module>
        model = create_model(opt)
      File "/vid2vid/models/models.py", line 19, in create_model
        modelG.initialize(opt)
      File "/vid2vid/models/vid2vid_model_G.py", line 51, in initialize
        self.netG_i = self.load_single_G() if self.use_single_G else None
      File "/vid2vid/models/vid2vid_model_G.py", line 270, in load_single_G
        netG = networks.define_G(input_nc, opt.output_nc, 0, 32, 'local', 4, 'instance', 0, self.gpu_ids, opt)
      File "/vid2vid/models/networks.py", line 39, in define_G
        netG = LocalEnhancer(input_nc, output_nc, ngf, n_downsampling, opt.n_blocks, opt.n_local_enhancers, opt.n_blocks_local, norm_layer)
      File "/vid2vid/models/networks.py", line 320, in __init__
        model_global = GlobalGenerator(input_nc, output_nc, ngf_global, n_downsample_global, n_blocks_global, norm_layer).model
      File "/vid2vid/models/networks.py", line 286, in __init__
        model = [nn.ReflectionPad2d(3), nn.Conv2d(input_nc, ngf, kernel_size=7, padding=0), norm_layer(ngf), activation]
    TypeError: __init__() got an unexpected keyword argument 'track_running_stats'
    (py3) root@0d93b7e85c1e:/vid2vid#

    Can anyone tell me how to solve it?

    thks!

    opened by zzzkk2009 15
  • ModuleNotFoundError: No module named 'resample2d_cuda'

    Hi,

    I faced an issue, 'ModuleNotFoundError: No module named 'resample2d_cuda'. Do you know how to solve this?

    The 'resample2d_package' folder (D:\download\vid2vid\models\flownet2_pytorch\networks\resample2d_package) contains the following:
    __pycache__, __init__.py, resample2d.py, resample2d_cuda.cc, resample2d_kernel.cu, resample2d_kernel.cuh, setup.py

    The following is the cmd command and its output:

    D:\download\vid2vid>python test.py --name label2city_2048 --dataroot datasets/Cityscapes/test_A --loadSize 2048 --n_scales_spatial 3 --use_instance --fg --use_single_G
    ------------ Options -------------
    aspect_ratio: 1.0
    batchSize: 1
    checkpoints_dir: ./checkpoints
    dataroot: datasets/Cityscapes/test_A
    dataset_mode: temporal
    debug: False
    display_id: 0
    display_winsize: 512
    feat_num: 3
    fg: True
    fg_labels: [26]
    fineSize: 512
    gpu_ids: [0]
    how_many: 300
    input_nc: 3
    isTrain: False
    label_feat: False
    label_nc: 35
    loadSize: 2048
    load_features: False
    load_pretrain: 
    max_dataset_size: inf
    model: vid2vid
    nThreads: 2
    n_blocks: 9
    n_blocks_local: 3
    n_downsample_E: 3
    n_downsample_G: 3
    n_frames_G: 3
    n_gpus_gen: 1
    n_local_enhancers: 1
    n_scales_spatial: 3
    name: label2city_2048
    ndf: 64
    nef: 32
    netE: simple
    netG: composite
    ngf: 128
    no_first_img: False
    no_flip: False
    norm: batch
    ntest: inf
    output_nc: 3
    phase: test
    resize_or_crop: scaleWidth
    results_dir: ./results/
    serial_batches: False
    tf_log: False
    use_instance: True
    use_real_img: False
    use_single_G: True
    which_epoch: latest
    -------------- End ----------------
    CustomDatasetDataLoader
    dataset [TestDataset] was created
    vid2vid
    Traceback (most recent call last):
      File "test.py", line 24, in <module>
        model = create_model(opt)
      File "D:\download\vid2vid\models\models.py", line 7, in create_model
        from .vid2vid_model_G import Vid2VidModelG
      File "D:\download\vid2vid\models\vid2vid_model_G.py", line 13, in <module>
        from . import networks
      File "D:\download\vid2vid\models\networks.py", line 12, in <module>
        from .flownet2_pytorch.networks.resample2d_package.resample2d import Resample2d
      File "D:\download\vid2vid\models\flownet2_pytorch\networks\resample2d_package\resample2d.py", line 3, in <module>
        import resample2d_cuda
    ModuleNotFoundError: No module named 'resample2d_cuda'

    opened by TatsuyaOsugi 13
  • Download scripts no longer working

    ModuleNotFoundError: No module named 'scripts.download_gdrive'

    They need to be updated as the scripts are now in specific subfolders. Also, I believe an __init__.py file is required in the scripts folder.

    opened by fniroui 11
  • Understanding the flow for training faces

    Please correct me if I am wrong. ( I am focusing just on faces)

    As I understand, vid2vid lets you provide a video from which each frame is like labeled data for training. So once one has a trained model, then given any input data of just edge-maps, then vid2vid will try to create a face (based on the trained data) from the edge maps.

    I am not clear though how to do this with train.py. Do I need to generate edge-maps myself for each frame of my video?

    Ideally I would like to just provide vid2vid a single video file (say, an .avi), and have vid2vid generate edge maps itself for each frame and output a trained model.

    Thank you @tcwang0509 @junyanz

    When answering, please include CLI commands that I can copy-paste, directions that I can immediately follow, or changes to Python code that might be needed.

    opened by fxfactorial 9
  • Using model on own data set.

    Hi, this is an extremely interesting project. I have another dataset that I would like to use. The data I want to use are images taken in 15-minute increments, and I want to transform them into another image. We have used pix2pix for this problem set, but we also wanted to see if vid2vid will yield better results. I would like to know how the images need to be formatted or organized before training on them. Any help would be much appreciated.

    opened by yuanzhou15 8
  • ModuleNotFoundError: No module named 'models.flownet2_pytorch.networks.resample2d_package._ext'

    Failed to test vid2vid...

    ➜  vid2vid git:(master) python test.py --name label2city_2048 --label_nc 35 --loadSize 2048 --n_scales_spatial 3 --use_instance --fg --use_single_G
    ------------ Options -------------
    add_face_disc: False
    aspect_ratio: 1.0
    batchSize: 1
    checkpoints_dir: ./checkpoints
    dataroot: datasets/Cityscapes/
    dataset_mode: temporal
    debug: False
    densepose_only: False
    display_id: 0
    display_winsize: 512
    feat_num: 3
    fg: True
    fg_labels: [26]
    fineSize: 512
    gpu_ids: [0]
    how_many: 300
    input_nc: 3
    isTrain: False
    label_feat: False
    label_nc: 35
    loadSize: 2048
    load_features: False
    load_pretrain: 
    max_dataset_size: inf
    model: vid2vid
    nThreads: 2
    n_blocks: 9
    n_blocks_local: 3
    n_downsample_E: 3
    n_downsample_G: 3
    n_frames_G: 3
    n_gpus_gen: 1
    n_local_enhancers: 1
    n_scales_spatial: 3
    name: label2city_2048
    ndf: 64
    nef: 32
    netE: simple
    netG: composite
    ngf: 128
    no_canny_edge: False
    no_dist_map: False
    no_first_img: False
    no_flip: False
    no_flow: False
    norm: batch
    ntest: inf
    openpose_only: False
    output_nc: 3
    phase: test
    random_drop_prob: 0.2
    remove_face_labels: False
    resize_or_crop: scaleWidth
    results_dir: ./results/
    serial_batches: False
    start_frame: 0
    tf_log: False
    use_instance: True
    use_real_img: False
    use_single_G: True
    which_epoch: latest
    -------------- End ----------------
    CustomDatasetDataLoader
    dataset [TestDataset] was created
    vid2vid
    Traceback (most recent call last):
      File "test.py", line 25, in <module>
        model = create_model(opt)
      File "....../vid2vid/models/models.py", line 7, in create_model
        from .vid2vid_model_G import Vid2VidModelG
      File "....../vid2vid/models/vid2vid_model_G.py", line 13, in <module>
        from . import networks
      File "....../vid2vid/models/networks.py", line 12, in <module>
        from .flownet2_pytorch.networks.resample2d_package.resample2d import Resample2d
    ModuleNotFoundError: No module named 'models.flownet2_pytorch'
    
    opened by jiapei100 6
  • spatio-temporal growing

    Hey, in your paper you mention in the experiment section: "Implementation details. We train our network in a spatio-temporally progressive manner. In particular, we start with generating low-resolution videos with few frames, and all the way up to generating full resolution videos with 30 (or more) frames."

    How exactly did you do the scaling? I looked through your code, but couldn't find anything related to it. In particular, I would like to know whether you increase both spatial/temporal size at the same time, or one after another, and whether you adjusted other hparams when using it.

    What I mean by adjusting hparams is that this progressive growing is mainly used to reduce train time I guess, so if the model would usually take N epochs to train on high res vids from scratch, you probably trained it on M<N epochs per progressive growing stage right? Then what is M? N/#stages? And did you use larger batch size for smaller resolutions, or make any other notable hparam changes?

    opened by fa9r 5
  • Training a face model with train_g1_256.sh

    Did anyone successfully train a model with train_g1_256.sh? The readme says that single-GPU models were not too well tested. For some reason the training finishes after only a few hours, and when trying to test I get:

    ISSUE: Pretrained network G0 has fewer layers; The following are not initialized:
    ['model_down_img', 'model_down_seg', 'model_final_flow', 'model_final_img', 'model_final_w', 'model_res_flow', 'model_res_img', 'model_up_flow', 'model_up_img']
    
    opened by petergerten 5
  • 'Segmentation fault (core dumped)' still exists.

    I downloaded the latest version (Aug 24th) and followed the steps in the 'Testing' section of the readme, but I still get this error. My CUDA version is 9.0 and my PyTorch version is 0.4.1.

    Can someone help me?

    ------------ Options -------------
    aspect_ratio: 1.0
    batchSize: 1
    checkpoints_dir: ./checkpoints
    dataroot: datasets/Cityscapes/test_A
    dataset_mode: temporal
    debug: False
    display_id: 0
    display_winsize: 512
    feat_num: 3
    fg: True
    fg_labels: [26]
    fineSize: 512
    gpu_ids: [1, 2]
    how_many: 300
    input_nc: 3
    isTrain: False
    label_feat: False
    label_nc: 35
    loadSize: 2048
    load_features: False
    load_pretrain: 
    max_dataset_size: inf
    model: vid2vid
    nThreads: 2
    n_blocks: 9
    n_blocks_local: 3
    n_downsample_E: 3
    n_downsample_G: 3
    n_frames_G: 3
    n_gpus_gen: 2
    n_local_enhancers: 1
    n_scales_spatial: 3
    name: label2city_2048
    ndf: 64
    nef: 32
    netE: simple
    netG: composite
    ngf: 128
    no_first_img: False
    no_flip: False
    norm: batch
    ntest: inf
    output_nc: 3
    phase: test
    resize_or_crop: scaleWidth
    results_dir: ./results/
    serial_batches: False
    tf_log: False
    use_instance: True
    use_real_img: False
    use_single_G: True
    which_epoch: latest
    -------------- End ----------------
    CustomDatasetDataLoader
    dataset [TestDataset] was created
    vid2vid
    ---------- Networks initialized -------------
    -----------------------------------------------
    Doing 28 frames
    Segmentation fault (core dumped)
    
    opened by joe1chief 5
  • Add Dockerfile and launch script

    I hope this docker image is useful to others! It would have saved me a couple of days of trial-and-error work fighting competing dependencies.

    I've tested this Dockerfile so far as it can run bash ./scripts/test_1024_g1.sh. I have a single GPU setup so I can't test any multi-GPU features.

    If I encounter further issues as I use vid2vid, I'll update this pull request.

    opened by dustinfreeman 4
  • Sometimes ran into RuntimeError: Given groups=1, weight of size [64, 18, 7, 7]... when training.

    After I train the model with the following parameters:

    python train.py --name pose \
    --dataroot datasets/pose --dataset_mode pose \
    --input_nc 6 --ngf 64 --num_D 2 \
    --resize_or_crop scaleHeight_and_scaledCrop --loadSize 288 --fineSize 256 \
    --niter 5 --niter_decay 5 \
    --n_frames_total 20 --max_t_step 4 \
    --max_frames_per_gpu 8
    

    Logs

    (epoch: 8, iters: 18006, time: 4.986) D_T_fake0: 0.064 D_T_fake1: 0.228 D_T_real0: 0.167 D_T_real1: 0.120 D_fake: 0.161 D_real: 0.498 G_GAN: 2.842 G_GAN_Feat: 5.241 G_T_GAN>
    (epoch: 8, iters: 18106, time: 5.215) D_T_fake0: 0.038 D_T_fake1: 0.124 D_T_real0: 0.072 D_T_real1: 0.155 D_fake: 0.494 D_real: 0.542 G_GAN: 2.327 G_GAN_Feat: 5.192 G_T_GAN>
    (epoch: 8, iters: 18206, time: 5.250) D_T_fake0: 0.050 D_T_fake1: 0.053 D_T_real0: 0.139 D_T_real1: 0.108 D_fake: 0.361 D_real: 0.640 G_GAN: 2.527 G_GAN_Feat: 4.626 G_T_GAN>
    (epoch: 8, iters: 18306, time: 5.295) D_T_fake0: 0.019 D_T_fake1: 0.229 D_T_real0: 0.278 D_T_real1: 0.049 D_fake: 0.573 D_real: 0.695 G_GAN: 2.337 G_GAN_Feat: 4.622 G_T_GAN>
    

    Traceback

    Traceback (most recent call last):
      File "train.py", line 148, in <module>
        train()
      File "train.py", line 55, in train
        fake_B, fake_B_raw, flow, weight, real_A, real_Bp, fake_B_last = modelG(input_A, input_B, inst_A, fake_B_prev_last)
      File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/vid2vid/models/models.py", line 37, in forward
        outputs = self.model(*inputs, **kwargs, dummy_bs=self.pad_bs)
      File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/vid2vid/models/vid2vid_model_G.py", line 133, in forward
        fake_B, fake_B_raw, flow, weight = self.generate_frame_train(netG, real_A_all, fake_B_prev, start_gpu, is_first_frame)
      File "/vid2vid/models/vid2vid_model_G.py", line 178, in generate_frame_train
        fake_B_feat, flow_feat, fake_B_fg_feat, use_raw_only)
      File "/vid2vid/models/networks.py", line 204, in forward
        downsample = self.model_down_seg(input) + self.model_down_img(img_prev)
      File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
        input = module(input)
      File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 320, in forward
        self.padding, self.dilation, self.groups)
    RuntimeError: Given groups=1, weight of size [64, 18, 7, 7], expected input[1, 12, 262, 198] to have 18 channels, but got 12 channels instead
    

    I've tried to continue training this model, but the error still happened after 10,000~100,000 iterations.

    opened by sheiun 3
  • RuntimeError: DataLoader worker (pid(s) 22100) exited unexpectedly

    Dear and Near,

    I tried all the possibilities. But no luck. Could anyone help me to figure out the issue? Your help is very much appreciated!

    I am using a Windows 11 PC with an NVIDIA GeForce RTX 3090 GPU.

    ------------ Options -------------
    add_face_disc: False
    aspect_ratio: 1.0
    basic_point_only: False
    batchSize: 1
    checkpoints_dir: ./checkpoints
    dataroot: datasets/Cityscapes/
    dataset_mode: temporal
    debug: False
    densepose_only: False
    display_id: 0
    display_winsize: 512
    feat_num: 3
    fg: True
    fg_labels: [26]
    fineSize: 512
    fp16: False
    gpu_ids: [0]
    how_many: 300
    input_nc: 3
    isTrain: False
    label_feat: False
    label_nc: 35
    loadSize: 1024
    load_features: False
    load_pretrain: 
    local_rank: 0
    max_dataset_size: inf
    model: vid2vid
    nThreads: 2
    n_blocks: 9
    n_blocks_local: 3
    n_downsample_E: 3
    n_downsample_G: 2
    n_frames_G: 3
    n_gpus_gen: 1
    n_local_enhancers: 1
    n_scales_spatial: 3
    name: label2city_1024_g1
    ndf: 64
    nef: 32
    netE: simple
    netG: composite
    ngf: 128
    no_canny_edge: False
    no_dist_map: False
    no_first_img: False
    no_flip: False
    no_flow: False
    norm: batch
    ntest: inf
    openpose_only: False
    output_nc: 3
    phase: test
    random_drop_prob: 0.05
    random_scale_points: False
    remove_face_labels: False
    resize_or_crop: scaleWidth
    results_dir: ./results/
    serial_batches: False
    start_frame: 0
    tf_log: False
    use_instance: True
    use_real_img: False
    use_single_G: True
    which_epoch: latest
    -------------- End ----------------
    CustomDatasetDataLoader
    dataset [TestDataset] was created
    vid2vid
    ---------- Networks initialized -------------
    -----------------------------------------------
    Doing 28 frames
    []
    Num GPUs Available:  0
    2022-09-07 18:43:37.812230: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2022-09-07 18:43:37.818042: I tensorflow/compiler/xla/service/service.cc:170] XLA service 0x2d8390eac00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2022-09-07 18:43:37.818151: I tensorflow/compiler/xla/service/service.cc:178]   StreamExecutor device (0): Host, Default Version
    [0]
    Device ID (unmasked): 0
    Device ID (masked): 0
    a+b=42
    2022-09-07 18:43:37.823294: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
    ------------ Options -------------
    add_face_disc: False
    aspect_ratio: 1.0
    basic_point_only: False
    batchSize: 1
    checkpoints_dir: ./checkpoints
    dataroot: datasets/Cityscapes/
    dataset_mode: temporal
    debug: False
    densepose_only: False
    display_id: 0
    display_winsize: 512
    feat_num: 3
    fg: True
    fg_labels: [26]
    fineSize: 512
    fp16: False
    gpu_ids: [0]
    how_many: 300
    input_nc: 3
    isTrain: False
    label_feat: False
    label_nc: 35
    loadSize: 1024
    load_features: False
    load_pretrain: 
    local_rank: 0
    max_dataset_size: inf
    model: vid2vid
    nThreads: 2
    n_blocks: 9
    n_blocks_local: 3
    n_downsample_E: 3
    n_downsample_G: 2
    n_frames_G: 3
    n_gpus_gen: 1
    n_local_enhancers: 1
    n_scales_spatial: 3
    name: label2city_1024_g1
    ndf: 64
    nef: 32
    netE: simple
    netG: composite
    ngf: 128
    no_canny_edge: False
    no_dist_map: False
    no_first_img: False
    no_flip: False
    no_flow: False
    norm: batch
    ntest: inf
    openpose_only: False
    output_nc: 3
    phase: test
    random_drop_prob: 0.05
    random_scale_points: False
    remove_face_labels: False
    resize_or_crop: scaleWidth
    results_dir: ./results/
    serial_batches: False
    start_frame: 0
    tf_log: False
    use_instance: True
    use_real_img: False
    use_single_G: True
    which_epoch: latest
    -------------- End ----------------
    CustomDatasetDataLoader
    dataset [TestDataset] was created
    vid2vid
    ---------- Networks initialized -------------
    -----------------------------------------------
    Doing 28 frames
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\multiprocessing\spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\multiprocessing\spawn.py", line 125, in _main
        prepare(preparation_data)
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\multiprocessing\spawn.py", line 236, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
        main_content = runpy.run_path(main_path,
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\runpy.py", line 265, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\runpy.py", line 97, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "D:\Python_Code\venv\vid2vid-master\test.py", line 75, in <module>
        for i, data in enumerate(dataset):
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\site-packages\torch\utils\data\dataloader.py", line 368, in __iter__
        return self._get_iterator()
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\site-packages\torch\utils\data\dataloader.py", line 314, in _get_iterator
        return _MultiProcessingDataLoaderIter(self)
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\site-packages\torch\utils\data\dataloader.py", line 927, in __init__
        w.start()
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\multiprocessing\context.py", line 327, in _Popen
        return Popen(process_obj)
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError: 
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.
    
            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:
    
                if __name__ == '__main__':
                    freeze_support()
                    ...
    
            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.
    Traceback (most recent call last):
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\site-packages\torch\utils\data\dataloader.py", line 1011, in _try_get_data
        data = self._data_queue.get(timeout=timeout)
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\multiprocessing\queues.py", line 108, in get
        raise Empty
    _queue.Empty
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "D:/Python_Code/venv/vid2vid-master/test.py", line 75, in <module>
        for i, data in enumerate(dataset):
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\site-packages\torch\utils\data\dataloader.py", line 530, in __next__
        data = self._next_data()
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\site-packages\torch\utils\data\dataloader.py", line 1207, in _next_data
        idx, data = self._get_data()
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\site-packages\torch\utils\data\dataloader.py", line 1173, in _get_data
        success, data = self._try_get_data()
      File "C:\ProgramData\Miniconda3\envs\tf2.4\lib\site-packages\torch\utils\data\dataloader.py", line 1024, in _try_get_data
        raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
    RuntimeError: DataLoader worker (pid(s) 22100) exited unexpectedly
    
    opened by am-official 0
  • Errors when running on CPU without CUDA

    When I set --gpu_id=-1, I start getting errors from the models when they are initialized, like:

    Traceback (most recent call last):
      File "train.py", line 148, in <module>
        train()
      File "train.py", line 28, in train
        models = create_model(opt)
      File "C:\Users\moish\Desktop\vid2vid\models\models.py", line 76, in create_model
        modelG.initialize(opt)
      File "C:\Users\moish\Desktop\vid2vid\models\vid2vid_model_G.py", line 59, in initialize
        self.n_frames_per_gpu = min(self.opt.max_frames_per_gpu, self.opt.n_frames_total // self.n_gpus) # number of frames in each GPU
    ZeroDivisionError: integer division or modulo by zero
    

    When I then set --n_gpus_gen 1, it gets past that error, but then I get:

    Traceback (most recent call last):
      File "train.py", line 148, in <module>
        train()
      File "train.py", line 28, in train
        models = create_model(opt)
      File "C:\Users\moish\Desktop\vid2vid\models\models.py", line 78, in create_model
        modelD.initialize(opt)
      File "C:\Users\moish\Desktop\vid2vid\models\vid2vid_model_D.py", line 22, in initialize
        self.gpu_ids = ([opt.gpu_ids[0]] + opt.gpu_ids[gpu_split_id:]) if opt.n_gpus_gen != len(opt.gpu_ids) else opt.gpu_ids
    IndexError: list index out of range
    

    Forcefully setting it to -1 fixes it but then gives me

    Traceback (most recent call last):
      File "train.py", line 148, in <module>
        train()
      File "train.py", line 28, in train
        models = create_model(opt)
      File "C:\Users\moish\Desktop\vid2vid\models\models.py", line 78, in create_model
        modelD.initialize(opt)
      File "C:\Users\moish\Desktop\vid2vid\models\vid2vid_model_D.py", line 37, in initialize
        opt.num_D, not opt.no_ganFeat, gpu_ids=self.gpu_ids)
      File "C:\Users\moish\Desktop\vid2vid\models\networks.py", line 66, in define_D
        netD.cuda(gpu_ids[0])
      File "C:\Users\moish\.conda\envs\vid2vid\lib\site-packages\torch\nn\modules\module.py", line 304, in cuda
        return self._apply(lambda t: t.cuda(device))
      File "C:\Users\moish\.conda\envs\vid2vid\lib\site-packages\torch\nn\modules\module.py", line 201, in _apply
        module._apply(fn)
      File "C:\Users\moish\.conda\envs\vid2vid\lib\site-packages\torch\nn\modules\module.py", line 201, in _apply
        module._apply(fn)
      File "C:\Users\moish\.conda\envs\vid2vid\lib\site-packages\torch\nn\modules\module.py", line 223, in _apply
        param_applied = fn(param)
      File "C:\Users\moish\.conda\envs\vid2vid\lib\site-packages\torch\nn\modules\module.py", line 304, in <lambda>
        return self._apply(lambda t: t.cuda(device))
    RuntimeError: Device index must not be negative
    

    I'm stuck y'all.

    opened by PinPointPing 0
  • RuntimeError: Legacy autograd function with non-static forward method is deprecated.

    When running

    python train.py --name edge2face_512 --dataroot datasets/face/ --dataset_mode face --input_nc 15 --loadSize 256 --num_D 1 --max_frames_per_gpu 2 --n_frames_total 6
    

    I am getting this error (screenshot attached). I am using pytorch 1.9.0 and torchvision 0.10.0.

    Note: I tried to compile using pytorch 0.4.1 but was unable to compile resample2d_cuda. Even after running install.sh I get an import error:

    import resample2d_cuda
    ImportError: libc10.so: cannot open shared object file: No such file or directory
    
    opened by avani17101 2
  • RuntimeError: CUDA error: throwing an instance of 'c10::Error'

    Hi all,

    I'm getting this error while training vid2vid;

    RuntimeError: CUDA error: the launch timed out and was terminated
    terminate called after throwing an instance of 'c10::Error' 
    

    The error pops up when the script starts the validation phase; here is the entire log; log.txt

    I'm training vid2vid on the Kitti dataset using a server with 8 X 12GB Quadro M6000 GPUs. I'm also attaching the config file; ampO1.txt

    Any suggestions?

    Thanks :)

    opened by alamayreh 1
  • How to test “pose-to-body”?

    Thanks for sharing! If I finish training pose-to-body, how do I run test.py, and what is the python command? And where will the weights be saved after training?

    opened by guofengming11 0