Official implementation of the paper 'High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network' in CVPR 2021

Related tags

Deep Learning LPTN
Overview

LPTN

Paper | Supplementary Material | Poster

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
Jie Liang*, Hui Zeng*, and Lei Zhang.
In CVPR 2021.

Abstract

Existing image-to-image translation (I2IT) methods are either constrained to low-resolution images or long inference time due to their heavy computational burden on the convolution of high-resolution feature maps. In this paper, we focus on speeding-up the high-resolution photorealistic I2IT tasks based on closed-form Laplacian pyramid decomposition and reconstruction. Specifically, we reveal that the attribute transformations, such as illumination and color manipulation, relate more to the low-frequency component, while the content details can be adaptively refined on high-frequency components. We consequently propose a Laplacian Pyramid Translation Network (LPTN) to simultaneously perform these two tasks, where we design a lightweight network for translating the low-frequency component with reduced resolution and a progressive masking strategy to efficiently refine the high-frequency ones. Our model avoids most of the heavy computation consumed by processing high-resolution feature maps and faithfully preserves the image details. Extensive experimental results on various tasks demonstrate that the proposed method can translate 4K images in real-time using one normal GPU while achieving comparable transformation performance against existing methods.

Overall pipeline of the LPTN:

pipeline

For more details, please refer to our paper.
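The core idea is that a Laplacian pyramid splits an image, in closed form and losslessly, into a small low-frequency image plus per-level high-frequency residuals, so the heavy translation network only has to run at low resolution. Below is a minimal PyTorch sketch of such a decomposition and its exact inverse; it illustrates the general technique, not the exact pyramid module used in this repo (see the code under codes/ for that).

import torch
import torch.nn.functional as F

def lap_decompose(img, num_high=3):
    # Each level stores a high-frequency residual; the last entry is the
    # low-frequency image at 1/2^num_high of the original resolution.
    pyramid, current = [], img
    for _ in range(num_high):
        down = F.interpolate(current, scale_factor=0.5, mode='bilinear', align_corners=False)
        up = F.interpolate(down, size=current.shape[2:], mode='bilinear', align_corners=False)
        pyramid.append(current - up)  # high-frequency residual
        current = down
    pyramid.append(current)           # low-frequency component
    return pyramid

def lap_reconstruct(pyramid):
    # Closed-form inverse: upsample and add the residuals back, coarse to fine.
    image = pyramid[-1]
    for high in reversed(pyramid[:-1]):
        image = F.interpolate(image, size=high.shape[2:], mode='bilinear', align_corners=False) + high
    return image

x = torch.rand(1, 3, 480, 720)
pyr = lap_decompose(x)
print((lap_reconstruct(pyr) - x).abs().max())  # reconstruction error is ~0

Because the residuals are stored exactly, reconstruction is lossless; LPTN exploits this by translating only the low-frequency component with a small network and refining the residuals with lightweight masks.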

Getting started

  • Clone this repo.
git clone https://github.com/csjliang/LPTN
cd LPTN
  • Install dependencies (Python 3 + NVIDIA GPU + CUDA; Anaconda is recommended).
pip install -r requirement.txt
  • Download the dataset (FiveK in 480p) and create the lmdb files (to accelerate training); a quick lmdb sanity check follows the commands below.
PYTHONPATH="./:${PYTHONPATH}" python scripts/data_preparation/download_datasets.py
PYTHONPATH="./:${PYTHONPATH}" python scripts/data_preparation/create_lmdb.py
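If training later fails while decoding images (e.g. imfrombytes receiving None bytes), the lmdb was probably not created correctly or the paths in the yml do not match it. A quick sanity check is to open the lmdb and decode one entry, as sketched below; the path is illustrative and the layout assumed is the BasicSR-style one that scripts/data_preparation/create_lmdb.py produces.

import lmdb
import cv2
import numpy as np

# Illustrative path: point this at whichever .lmdb folder create_lmdb.py produced.
env = lmdb.open('datasets/FiveK/FiveK_train_source.lmdb', readonly=True, lock=False, readahead=False)
with env.begin() as txn:
    key, buf = next(iter(txn.cursor()))  # first (key, value) pair in the database
    img = cv2.imdecode(np.frombuffer(buf, np.uint8), cv2.IMREAD_COLOR)
    print(key.decode(), None if img is None else img.shape)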

Training

First, check and adapt the yml file options/train/LPTN/train_FiveK.yml, then

  • Single GPU:
PYTHONPATH="./:${PYTHONPATH}" CUDA_VISIBLE_DEVICES=0 python codes/train.py -opt options/train/LPTN/train_FiveK.yml
  • Distributed Training:
PYTHONPATH="./:${PYTHONPATH}" CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 codes/train.py -opt options/train/LPTN/train_FiveK.yml --launcher pytorch

Training files (logs, models, training states and visualizations) will be saved in the directory ./experiments/{name}

Evaluation

First, check and adapt the yml files options/test/LPTN/test_FiveK.yml and options/test/LPTN/test_speed_FiveK.yml, then

  • Calculate metrics and save visual results:
PYTHONPATH="./:${PYTHONPATH}" CUDA_VISIBLE_DEVICES=0 python codes/test.py -opt options/test/LPTN/test_FiveK.yml
  • Test inference speed:
PYTHONPATH="./:${PYTHONPATH}" CUDA_VISIBLE_DEVICES=0 python codes/test_speed.py -opt options/test/LPTN/test_speed_FiveK.yml

Evaluation outputs (logs and visualizations) will be saved in the directory ./results/{name}

Use Pretrained Models

  • Download the pretrained model from GoogleDrive and move it to the directory experiments/pretrained_models.

  • Set pretrain_network_g in test_FiveK.yml to the path of the pretrained model and run the evaluation; a quick checkpoint-inspection sketch follows below.
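If you want to double-check the downloaded weights before pointing pretrain_network_g at them, a rough inspection sketch follows; the file name is illustrative, and nesting the weights under a 'params' key is an assumption based on the BasicSR checkpoint convention this codebase borrows.

import torch

# Illustrative file name: use whatever checkpoint you downloaded from Google Drive.
ckpt = torch.load('experiments/pretrained_models/net_g_FiveK.pth', map_location='cpu')
# BasicSR-style checkpoints usually store the generator weights under 'params' (assumption, verify).
state = ckpt.get('params', ckpt) if isinstance(ckpt, dict) else ckpt
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))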

Notes

  • We have optimized the training process and improved the performance (now reaching 22.9 dB on FiveK at 480p).

  • We will release the day2night and sum2win datasets later.

Citation

If you use this dataset or code for your research, please cite our paper.

@inproceedings{jie2021LPTN,
  title={High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network},
  author={Liang, Jie and Zeng, Hui and Zhang, Lei},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Acknowledgement

We borrowed the training and validation framework from the excellent BasicSR project.

Contact

Should you have any questions, please contact me via [email protected].

Comments
  • Can conditional (paired) image-to-image translation be done using LPTN?

    Can conditional (paired) image-to-image translation be done using LPTN?

    Hello @csjliang,

    Thank you for sharing your implementation.

    I am working on shadow generation for cars. I tried the pix2pix model, but the results were a bit pixelated and I could not get high-resolution images. I want to work with images of size 1200x1600; is it possible to do paired image-to-image translation with your implementation? If so, what changes do I need to make to get it working?

    Thanks in advance.

    Best, @vgthengane

    opened by vgthengane 8
  • the pretrained model test results not good

    the pretrained model test results not good

    I used your pretrained model to test images from the FiveK dataset, but some results show a "ghost shadow" (e.g. the middle image below). Could you help fix this issue, or give me some guidance on how to avoid it? Thanks a lot. 4501_LPTN_FiveK_480p

    opened by semchan 6
  • Problem downloading the dataset

    Problem downloading the dataset

    When I run PYTHONPATH="./:${PYTHONPATH}" python scripts/data_preparation/download_datasets.py, I get the error below. Has the link expired?

    Traceback (most recent call last):
      File "scripts/data_preparation/download_datasets.py", line 56, in <module>
        download_dataset(file_ids[dataset])
      File "scripts/data_preparation/download_datasets.py", line 26, in download_dataset
        download_file_from_google_drive(file_id, save_path)
      File "/mnt/lbh/new/Problems/LPTN-main/codes/utils/download_util.py", line 23, in download_file_from_google_drive
        response = session.get(URL, params=params, stream=True)
      File "/home/lbh/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 546, in get
        return self.request('GET', url, **kwargs)
      File "/home/lbh/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
        resp = self.send(prep, **send_kwargs)
      File "/home/lbh/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
        r = adapter.send(request, **kwargs)
      File "/home/lbh/anaconda3/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
        raise ConnectionError(e, request=request)
    requests.exceptions.ConnectionError: HTTPSConnectionPool(host='docs.google.com', port=443): Max retries exceeded with url: /uc?export=download&id=1oAORKd-TPnPwZvhcnEEJqc1ogT7KgFtx (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f0325ff1310>: Failed to establish a new connection: [Errno 110] Connection timed out'))

    opened by bohuisir 5
  • About summer2winter dataset

    About summer2winter dataset

    Hi, thank you for your wonderful work! I want to train a summer2winter model, but there is no HD dataset for it. Could you please share your summer2winter dataset? Thanks!

    opened by hughwcq 4
  • AttributeError: 'NoneType' object has no attribute '__buffer__'

    AttributeError: 'NoneType' object has no attribute '__buffer__'

    Hi, I am running the single-GPU training script and get the following error:

      File "codes/train.py", line 250, in <module>
        main()
      File "codes/train.py", line 179, in main
        prefetcher = CUDAPrefetcher(train_loader, opt)
      File "/data/code_fusion/Pytorch-LPTN/LPTN-main/codes/data/prefetch_dataloader.py", line 103, in __init__
        self.preload()
      File "/data/code_fusion/Pytorch-LPTN/LPTN-main/codes/data/prefetch_dataloader.py", line 109, in preload
        self.batch = next(self.loader)  # self.batch is a dict
      File "/home/chenyuanpeng/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
        data = self._next_data()
      File "/home/chenyuanpeng/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
        return self._process_data(data)
      File "/home/chenyuanpeng/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
        data.reraise()
      File "/home/chenyuanpeng/.local/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
        raise self.exc_type(msg)
    AttributeError: Caught AttributeError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/chenyuanpeng/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/chenyuanpeng/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/chenyuanpeng/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/data/code_fusion/Pytorch-LPTN/LPTN-main/codes/data/unpair_image_dataset.py", line 53, in __getitem__
        img_gt = imfrombytes(img_bytes, float32=True)
      File "/data/code_fusion/Pytorch-LPTN/LPTN-main/codes/utils/img_util.py", line 166, in imfrombytes
        img_np = np.frombuffer(content, np.uint8)
    AttributeError: 'NoneType' object has no attribute '__buffer__'
    

    However, the number of images reported in the training log is not 0:

    2021-05-21 10:48:20,393 INFO: Dataset UnPairedImageDataset - FiveK is created.
    2021-05-21 10:48:20,393 INFO: Training statistics:
    	Number of train images: 2250
    	Dataset enlarge ratio: 100
    	Batch size per gpu: 32
    	World size (gpu number): 1
    	Require iter number per epoch: 7032
    	Total epochs: 43; iters: 300000.
    2021-05-21 10:48:20,460 INFO: Dataset PairedImageDataset - FiveK_val is created.
    2021-05-21 10:48:20,460 INFO: Number of val images/folders in FiveK_val: 500
    
    opened by 24werewolf 4
  •  Hi, I have some questions about your code

    Hi, I have some questions about your code

    I want to ask about the LP (Laplacian pyramid). I used your code to decompose the ground truth into high- and low-frequency components and then applied the inverse LP transform, but there is a big gap between the result and the ground truth; for example, the MSE can exceed 100. Why is that? Have you verified this?

    opened by wangweiran970922 3
  • Only support single GPU validation?

    Only support single GPU validation?

    Dear csjliang

    When I run distributed training as described in the README, I get the following:

    2021-07-25 20:04:34,446 INFO: [LPTN_..][epoch: 16, iter: 19,800, lr:(1.000e-04,)] [eta: 2 days, 18:43:45, time (data): 0.178 (0.001)] l_g_pix: 3.1238e+01 l_g_gan: 8.7738e+01 l_d_real: 7.0186e+01 out_d_real: -7.0186e+01 l_d_fake: -8.7537e+01 out_d_fake: -8.7537e+01
    2021-07-25 20:06:00,337 INFO: [LPTN_..][epoch: 17, iter: 19,900, lr:(1.000e-04,)] [eta: 2 days, 18:42:21, time (data): 0.504 (0.001)] l_g_pix: 2.0734e+01 l_g_gan: 9.3872e+01 l_d_real: 6.7697e+01 out_d_real: -6.7697e+01 l_d_fake: -9.4580e+01 out_d_fake: -9.4580e+01
    2021-07-25 20:07:30,459 INFO: [LPTN_..][epoch: 17, iter: 20,000, lr:(1.000e-04,)] [eta: 2 days, 18:41:57, time (data): 0.202 (0.001)] l_g_pix: 3.0153e+01 l_g_gan: 9.9768e+01 l_d_real: 7.4591e+01 out_d_real: -7.4591e+01 l_d_fake: -9.9862e+01 out_d_fake: -9.9862e+01
    2021-07-25 20:07:30,460 INFO: Saving models and training states.
    0%| | 0/998 [00:00<?, ?image/s]
    2021-07-25 20:07:30,515 INFO: Only support single GPU validation.
    Traceback (most recent call last):
      File "codes/train.py", line 249, in <module>
        main()
      File "codes/train.py", line 226, in main
        model.validation(val_loader, current_iter, tb_logger,
      File "/home/delight-gpu/project/LPTN/codes/models/base_model.py", line 45, in validation
        self.dist_validation(dataloader, current_iter, tb_logger, save_img)
      File "/home/delight-gpu/project/LPTN/codes/models/lptn_model.py", line 169, in dist_validation
        self.nondist_validation(dataloader, current_iter, tb_logger, save_img)
      File "/home/delight-gpu/project/LPTN/codes/models/lptn_model.py", line 225, in nondist_validation
        metric_module, metric_type)(result_img, gt_img, **opt_)
    UnboundLocalError: local variable 'gt_img' referenced before assignment

    (the same UnboundLocalError is raised by each of the three worker processes)

    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 11662) of binary: /home/delight-gpu/anaconda3/envs/lptn/bin/python
    ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
    INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
    INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result: restart_count=1 master_addr=127.0.0.1 master_port=4321 group_rank=0 group_world_size=1 local_ranks=[0, 1, 2] role_ranks=[0, 1, 2] global_ranks=[0, 1, 2] role_world_sizes=[3, 3, 3] global_world_sizes=[3, 3, 3]

    Each of the three restart attempts then fails while re-initializing the process group:

    Traceback (most recent call last):
      File "codes/train.py", line 249, in <module>
        main()
      File "codes/train.py", line 128, in main
        opt = parse_options(is_train=True)
      File "codes/train.py", line 43, in parse_options
        init_dist(args.launcher)
      File "/home/delight-gpu/project/LPTN/codes/utils/dist_util.py", line 14, in init_dist
        _init_dist_pytorch(backend, **kwargs)
      File "/home/delight-gpu/project/LPTN/codes/utils/dist_util.py", line 25, in _init_dist_pytorch
        dist.init_process_group(backend=backend, **kwargs)
      File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 547, in init_process_group
        _store_based_barrier(rank, store, timeout)
      File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 219, in _store_based_barrier
        raise RuntimeError(
    RuntimeError: Timed out initializing process group in store based barrier on rank: 0, for key: store_based_barrier_key:1 (world_size=3, worker_count=6, timeout=0:30:00)

    (ranks 1 and 2 fail with the same timeout; on the second and third restarts worker_count grows to 9 and then 12)

    torchelastic also warns CHILD PROCESS FAILED WITH NO ERROR_FILE and suggests decorating the entry point with @record from torch.distributed.elastic.multiprocessing.errors, and the run finally aborts with:

    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

         codes/train.py FAILED

    Root Cause:
    [0]: time: 2021-07-25_21:37:41 rank: 0 (local_rank: 0) exitcode: 1 (pid: 32378) error_file: <N/A> msg: "Process failed with exitcode 1"
    Other Failures:
    [1]: time: 2021-07-25_21:37:41 rank: 1 (local_rank: 1) exitcode: 1 (pid: 32379) error_file: <N/A> msg: "Process failed with exitcode 1"
    [2]: time: 2021-07-25_21:37:41 rank: 2 (local_rank: 2) exitcode: 1 (pid: 32380) error_file: <N/A> msg: "Process failed with exitcode 1"

    opened by azuryl 3
  • A question about running the code

    A question about running the code

    Hello, I ran into a problem when running the code today. The third line of LPTN/codes/metrics/metric_util.py is: from basicsr.utils.matlab_functions import bgr2ycbcr, but I cannot find basicsr.utils.matlab_functions anywhere in the project. Was part of the code not uploaded?

    opened by duweidongzju 3
  • train PairedImageDataset on fivek

    train PairedImageDataset on fivek

    I followed the steps here to use your code to train PairedImageDataset on FiveK. After 20,000 iterations I checked the training results and found an "over-exposure" effect (as shown in the figure below: the right is the input, the middle is the prediction, and the left is the GT). All training hyperparameters are the defaults in your code. Can you guide me on how to reduce the over-exposure? Thanks a lot.


    opened by semchan 3
  • implementation differs from paper ?

    implementation differs from paper ?

    cannot understand this block

    def forward(self, x, pyr_original, fake_low):
            pyr_result = []
            mask = self.model(x)
            for i in range(self.num_high):
                mask = nn.functional.interpolate(mask, size=(pyr_original[-2-i].shape[2], pyr_original[-2-i].shape[3]))
                result_highfreq = torch.mul(pyr_original[-2-i], mask) + pyr_original[-2-i]
                ...
    

    here you multiply mask from first high freq model with original high freq map

    but then add original high freq map?

    I don't see that in the paper


    opened by iperov 3
  • Drawing inference on unpaired images

    Drawing inference on unpaired images

    Hi,

    I would like to know if there is a way to run inference on a single image; for example, I input a summer-time image and get a winter-time image as the output. So far, everything I have tried expects a ground-truth image as well. I tried modifying the test_FiveK.yml file by changing the dataset type to UnPairedImageDataset, but even then the model expects a ground-truth file. Am I missing something here? Any help would be appreciated. Thanks

    opened by Suhaskj1691 2
  • How to train this model with my own data?

    How to train this model with my own data?

    I have an unpaired dataset and would like to perform image-to-image translation with this model. I tried both the disk and lmdb modes, but found both time-consuming, with most of the time spent loading data rather than computing. I believe I've got something wrong. What is the simplest way to train this model with my own data?

    opened by Red-Fairy 0
Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals

LapDepth-release This repository is a Pytorch implementation of the paper "Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals" M

Minsoo Song 205 Dec 30, 2022
The official implementation of paper Siamese Transformer Pyramid Networks for Real-Time UAV Tracking, accepted by WACV22

SiamTPN Introduction This is the official implementation of the SiamTPN (WACV2022). The tracker intergrates pyramid feature network and transformer in

Robotics and Intelligent Systems Control @ NYUAD 28 Nov 25, 2022
Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian (CVPR 2022)

Pop-Out Motion Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian (CVPR 2022) Jihyun Lee*, Minhyuk Sung*, Hyunjin Kim, Tae-Ky

Jihyun Lee 88 Nov 22, 2022
Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

Restormer: Efficient Transformer for High-Resolution Image Restoration Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan,

Syed Waqas Zamir 906 Dec 30, 2022
Official implementation of the paper 'Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution'

DASR Paper Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution Jie Liang, Hui Zeng, and Lei Zhang. In arxiv preprint. Abs

null 81 Dec 28, 2022
The official pytorch implemention of the CVPR paper "Temporal Modulation Network for Controllable Space-Time Video Super-Resolution".

This is the official PyTorch implementation of TMNet in the CVPR 2021 paper "Temporal Modulation Network for Controllable Space-Time VideoSuper-Resolu

Gang Xu 95 Oct 24, 2022
Official codebase for running the small, filtered-data GLIDE model from GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.

GLIDE This is the official codebase for running the small, filtered-data GLIDE model from GLIDE: Towards Photorealistic Image Generation and Editing w

OpenAI 2.9k Jan 4, 2023
Official PyTorch implementation of "VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization" (CVPR 2021)

VITON-HD — Official PyTorch Implementation VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization Seunghwan Choi*1, Sunghyun Pa

Seunghwan Choi 250 Jan 6, 2023
Official implement of Paper:A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sening images

A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images 深度监督影像融合网络DSIFN用于高分辨率双时相遥感影像变化检测 Of

Chenxiao Zhang 135 Dec 19, 2022
Real-Time High-Resolution Background Matting

Real-Time High-Resolution Background Matting Official repository for the paper Real-Time High-Resolution Background Matting. Our model requires captur

Peter Lin 6.1k Jan 3, 2023
Official and maintained implementation of the paper "OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data" [BMVC 2021].

OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data Christoph Reich, Tim Prangemeier, Özdemir Cetin & Heinz Koeppl | Pr

Christoph Reich 23 Sep 21, 2022
[CVPR 2022] Official code for the paper: "A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration"

MDCA Calibration This is the official PyTorch implementation for the paper: "A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved

MDCA Calibration 21 Dec 22, 2022
Official PyTorch implementation of the paper "Deep Constrained Least Squares for Blind Image Super-Resolution", CVPR 2022.

Deep Constrained Least Squares for Blind Image Super-Resolution [Paper] This is the official implementation of 'Deep Constrained Least Squares for Bli

MEGVII Research 141 Dec 30, 2022
Official implementation of the paper 'Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution' in CVPR 2022

LDL Paper | Supplementary Material Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution Jie Liang*, Hu

null 150 Dec 26, 2022
Real-CUGAN - Real Cascade U-Nets for Anime Image Super Resolution

Real Cascade U-Nets for Anime Image Super Resolution 中文 | English Real-CUGAN

tarsin 111 Dec 28, 2022
This is an official pytorch implementation of Lite-HRNet: A Lightweight High-Resolution Network.

Lite-HRNet: A Lightweight High-Resolution Network Introduction This is an official pytorch implementation of Lite-HRNet: A Lightweight High-Resolution

HRNet 675 Dec 25, 2022
Official code of "R2RNet: Low-light Image Enhancement via Real-low to Real-normal Network."

R2RNet Official code of "R2RNet: Low-light Image Enhancement via Real-low to Real-normal Network." Jiang Hai, Zhu Xuan, Ren Yang, Yutong Hao, Fengzhu

null 77 Dec 24, 2022
[ICCV 2021] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [arXiv] [Project Page] @inproceedings{ huang2021fapn, title={{FaPN}: Feature-alig

Shihua Huang 23 Jul 22, 2022
Adaptive Pyramid Context Network for Semantic Segmentation (APCNet CVPR'2019)

Adaptive Pyramid Context Network for Semantic Segmentation (APCNet CVPR'2019) Introduction Official implementation of Adaptive Pyramid Context Network

null 21 Nov 9, 2022