Dear csjliang,

When I run distributed training following the README, training proceeds normally until validation runs at iteration 20,000. At that point every rank crashes with an UnboundLocalError on gt_img, and every torchelastic restart afterwards times out while initializing the process group, until all retry attempts are exhausted.
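For reference, I launched training roughly like this (3 GPUs, adapted from the README's distributed-training command; the option file is a placeholder for my local config):

PYTHONPATH="./:${PYTHONPATH}" CUDA_VISIBLE_DEVICES=0,1,2 python -m torch.distributed.launch --nproc_per_node=3 --master_port=4321 codes/train.py -opt <my_train_config.yml> --launcher pytorch

The full log follows: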
2021-07-25 20:04:34,446 INFO: [LPTN_..][epoch: 16, iter: 19,800, lr:(1.000e-04,)] [eta: 2 days, 18:43:45, time (data): 0.178 (0.001)] l_g_pix: 3.1238e+01 l_g_gan: 8.7738e+01 l_d_real: 7.0186e+01 out_d_real: -7.0186e+01 l_d_fake: -8.7537e+01 out_d_fake: -8.7537e+01
2021-07-25 20:06:00,337 INFO: [LPTN_..][epoch: 17, iter: 19,900, lr:(1.000e-04,)] [eta: 2 days, 18:42:21, time (data): 0.504 (0.001)] l_g_pix: 2.0734e+01 l_g_gan: 9.3872e+01 l_d_real: 6.7697e+01 out_d_real: -6.7697e+01 l_d_fake: -9.4580e+01 out_d_fake: -9.4580e+01
2021-07-25 20:07:30,459 INFO: [LPTN_..][epoch: 17, iter: 20,000, lr:(1.000e-04,)] [eta: 2 days, 18:41:57, time (data): 0.202 (0.001)] l_g_pix: 3.0153e+01 l_g_gan: 9.9768e+01 l_d_real: 7.4591e+01 out_d_real: -7.4591e+01 l_d_fake: -9.9862e+01 out_d_fake: -9.9862e+01
2021-07-25 20:07:30,460 INFO: Saving models and training states.
0%| | 0/998 [00:00<?, ?image/s]
2021-07-25 20:07:30,515 INFO: Only support single GPU validation.
Traceback (most recent call last):
File "codes/train.py", line 249, in <module>
main()
File "codes/train.py", line 226, in main
model.validation(val_loader, current_iter, tb_logger,
File "/home/delight-gpu/project/LPTN/codes/models/base_model.py", line 45, in validation
self.dist_validation(dataloader, current_iter, tb_logger, save_img)
File "/home/delight-gpu/project/LPTN/codes/models/lptn_model.py", line 169, in dist_validation
self.nondist_validation(dataloader, current_iter, tb_logger, save_img)
File "/home/delight-gpu/project/LPTN/codes/models/lptn_model.py", line 225, in nondist_validation
metric_module, metric_type)(result_img, gt_img, **opt_)
UnboundLocalError: local variable 'gt_img' referenced before assignment
(The other two ranks print the identical UnboundLocalError traceback.)
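In case it helps, my reading of the UnboundLocalError is that gt_img is only assigned when a validation batch actually contains a 'gt' image, so the metric call in nondist_validation crashes when the configured val set has no ground truth. A minimal, self-contained reproduction of that pattern (made-up names for illustration, not the repo's actual code):

# Hypothetical reproduction of the failing pattern; not LPTN's real code.
def validate_one(val_data, with_metrics=True):
    result_img = val_data['lq']        # stand-in for the network output, always bound
    if 'gt' in val_data:
        gt_img = val_data['gt']        # gt_img is only bound when GT is provided
    if with_metrics:
        # Without a 'gt' entry this raises:
        # UnboundLocalError: local variable 'gt_img' referenced before assignment
        return abs(result_img - gt_img)

validate_one({'lq': 0.5, 'gt': 0.7})   # works
validate_one({'lq': 0.5})              # reproduces the UnboundLocalError

So it looks like the metrics assume paired validation data; please correct me if the real cause is different.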
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 11662) of binary: /home/delight-gpu/anaconda3/envs/lptn/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=1
master_addr=127.0.0.1
master_port=4321
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2]
role_ranks=[0, 1, 2]
global_ranks=[0, 1, 2]
role_world_sizes=[3, 3, 3]
global_world_sizes=[3, 3, 3]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_l8eumjpm/none_twj5_557/attempt_1/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_l8eumjpm/none_twj5_557/attempt_1/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_l8eumjpm/none_twj5_557/attempt_1/2/error.json
Traceback (most recent call last):
File "codes/train.py", line 249, in <module>
main()
File "codes/train.py", line 128, in main
opt = parse_options(is_train=True)
File "codes/train.py", line 43, in parse_options
init_dist(args.launcher)
File "/home/delight-gpu/project/LPTN/codes/utils/dist_util.py", line 14, in init_dist
_init_dist_pytorch(backend, **kwargs)
File "/home/delight-gpu/project/LPTN/codes/utils/dist_util.py", line 25, in _init_dist_pytorch
dist.init_process_group(backend=backend, **kwargs)
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 547, in init_process_group
_store_based_barrier(rank, store, timeout)
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 219, in _store_based_barrier
raise RuntimeError(
RuntimeError: Timed out initializing process group in store based barrier on rank: 0, for key: store_based_barrier_key:1 (world_size=3, worker_count=6, timeout=0:30:00)
(Ranks 1 and 2 print the same traceback interleaved, also with worker_count=6.)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 28662) of binary: /home/delight-gpu/anaconda3/envs/lptn/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 2/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=2
master_addr=127.0.0.1
master_port=4321
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2]
role_ranks=[0, 1, 2]
global_ranks=[0, 1, 2]
role_world_sizes=[3, 3, 3]
global_world_sizes=[3, 3, 3]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_l8eumjpm/none_twj5_557/attempt_2/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_l8eumjpm/none_twj5_557/attempt_2/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_l8eumjpm/none_twj5_557/attempt_2/2/error.json
Traceback (most recent call last):
File "codes/train.py", line 249, in
main()
File "codes/train.py", line 128, in main
opt = parse_options(is_train=True)
File "codes/train.py", line 43, in parse_options
init_dist(args.launcher)
File "/home/delight-gpu/project/LPTN/codes/utils/dist_util.py", line 14, in init_dist
_init_dist_pytorch(backend, **kwargs)
File "/home/delight-gpu/project/LPTN/codes/utils/dist_util.py", line 25, in _init_dist_pytorch
dist.init_process_group(backend=backend, **kwargs)
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 547, in init_process_group
_store_based_barrier(rank, store, timeout)
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 219, in _store_based_barrier
raise RuntimeError(
RuntimeError: Timed out initializing process group in store based barrier on rank: 2, for key: store_based_barrier_key:1 (world_size=3, worker_count=9, timeout=0:30:00)
(Ranks 0 and 1 print the same timed-out traceback interleaved, also with worker_count=9.)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 30527) of binary: /home/delight-gpu/anaconda3/envs/lptn/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 1/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=3
master_addr=127.0.0.1
master_port=4321
group_rank=0
group_world_size=1
local_ranks=[0, 1, 2]
role_ranks=[0, 1, 2]
global_ranks=[0, 1, 2]
role_world_sizes=[3, 3, 3]
global_world_sizes=[3, 3, 3]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_l8eumjpm/none_twj5_557/attempt_3/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_l8eumjpm/none_twj5_557/attempt_3/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_l8eumjpm/none_twj5_557/attempt_3/2/error.json
(Ranks 0 and 1 print the same timed-out traceback interleaved, with worker_count=12; rank 2's copy follows.)
Traceback (most recent call last):
File "codes/train.py", line 249, in
main()
File "codes/train.py", line 128, in main
opt = parse_options(is_train=True)
File "codes/train.py", line 43, in parse_options
init_dist(args.launcher)
File "/home/delight-gpu/project/LPTN/codes/utils/dist_util.py", line 14, in init_dist
_init_dist_pytorch(backend, **kwargs)
File "/home/delight-gpu/project/LPTN/codes/utils/dist_util.py", line 25, in _init_dist_pytorch
dist.init_process_group(backend=backend, **kwargs)
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 547, in init_process_group
_store_based_barrier(rank, store, timeout)
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 219, in _store_based_barrier
raise RuntimeError(
RuntimeError: Timed out initializing process group in store based barrier on rank: 2, for key: store_based_barrier_key:1 (world_size=3, worker_count=12, timeout=0:30:00)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 32378) of binary: /home/delight-gpu/anaconda3/envs/lptn/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (FAILED). Waiting 300 seconds for other agents to finish
/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:70: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0010943412780761719 seconds
{"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 0, "group_rank": 0, "worker_id": "32378", "role": "default", "hostname": "LIGHT-24B.PC.CS.CMU.EDU", "state": "FAILED", "total_run_time": 22569, "rdzv_backend": "static", "raw_error": "{"message": ""}", "metadata": "{"group_world_size": 1, "entry_point": "python", "local_rank": [0], "role_rank": [0], "role_world_size": [3]}", "agent_restarts": 3}}
{"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 1, "group_rank": 0, "worker_id": "32379", "role": "default", "hostname": "LIGHT-24B.PC.CS.CMU.EDU", "state": "FAILED", "total_run_time": 22569, "rdzv_backend": "static", "raw_error": "{"message": ""}", "metadata": "{"group_world_size": 1, "entry_point": "python", "local_rank": [1], "role_rank": [1], "role_world_size": [3]}", "agent_restarts": 3}}
{"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 2, "group_rank": 0, "worker_id": "32380", "role": "default", "hostname": "LIGHT-24B.PC.CS.CMU.EDU", "state": "FAILED", "total_run_time": 22569, "rdzv_backend": "static", "raw_error": "{"message": ""}", "metadata": "{"group_world_size": 1, "entry_point": "python", "local_rank": [2], "role_rank": [2], "role_world_size": [3]}", "agent_restarts": 3}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 0, "worker_id": null, "role": "default", "hostname": "LIGHT-24B.PC.CS.CMU.EDU", "state": "SUCCEEDED", "total_run_time": 22569, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python"}", "agent_restarts": 3}}
/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py:354: UserWarning:
CHILD PROCESS FAILED WITH NO ERROR_FILE
CHILD PROCESS FAILED WITH NO ERROR_FILE
Child process 32378 (local_rank 0) FAILED (exitcode 1)
Error msg: Process failed with exitcode 1
Without writing an error file to <N/A>.
While this DOES NOT affect the correctness of your application,
no trace information about the error will be available for inspection.
Consider decorating your top level entrypoint function with
torch.distributed.elastic.multiprocessing.errors.record. Example:
from torch.distributed.elastic.multiprocessing.errors import record
@record
def trainer_main(args):
# do train
warnings.warn(_no_error_file_warning_msg(rank, failure))
Traceback (most recent call last):
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/launch.py", line 173, in
main()
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/launch.py", line 169, in main
run(args)
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/run.py", line 621, in run
elastic_launch(
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 348, in wrapper
return f(*args, **kwargs)
File "/home/delight-gpu/anaconda3/envs/lptn/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
codes/train.py FAILED
=======================================
Root Cause:
[0]:
time: 2021-07-25_21:37:41
rank: 0 (local_rank: 0)
exitcode: 1 (pid: 32378)
error_file: <N/A>
msg: "Process failed with exitcode 1"
Other Failures:
[1]:
time: 2021-07-25_21:37:41
rank: 1 (local_rank: 1)
exitcode: 1 (pid: 32379)
error_file: <N/A>
msg: "Process failed with exitcode 1"
[2]:
time: 2021-07-25_21:37:41
rank: 2 (local_rank: 2)
exitcode: 1 (pid: 32380)
error_file: <N/A>
msg: "Process failed with exitcode 1"