root@4460709f4b11:/workspace# bash scripts/test_training.sh
/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
WARNING:torch.distributed.run:torch.distributed.launch
is Deprecated. Use torch.distributed.run
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : train.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 1
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:29500
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_oe27bhot/none_vqivpge9
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0]
role_ranks=[0]
global_ranks=[0]
role_world_sizes=[1]
global_world_sizes=[1]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_oe27bhot/none_vqivpge9/attempt_0/0/error.json
Downloading: "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth" to /root/.cache/torch/hub/checkpoints/vgg19-dcbb9e9d.pth
100%|█████████████████████████████████████████████████████████████████████| 548M/548M [01:40<00:00, 5.71MB/s]
/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:3590: UserWarning: Default upsampling behavior when mode=bicubic is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
warnings.warn(
/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:3638: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
warnings.warn(
/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ../c10/core/TensorImpl.h:1153.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:70: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0005445480346679688 seconds
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 0, "group_rank": 0, "worker_id": "386", "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 125, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python", "local_rank": [0], "role_rank": [0], "role_world_size": [1]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 0, "worker_id": null, "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 125, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python"}", "agent_restarts": 0}}
python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/spade.yaml >> /tmp/unit_test.log [Success]
/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
WARNING:torch.distributed.run:torch.distributed.launch
is Deprecated. Use torch.distributed.run
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : train.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 1
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:29500
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_4bbnta_g/none_vmrfh3_4
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0]
role_ranks=[0]
global_ranks=[0]
role_world_sizes=[1]
global_world_sizes=[1]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_4bbnta_g/none_vmrfh3_4/attempt_0/0/error.json
/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ../c10/core/TensorImpl.h:1153.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:70: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0006163120269775391 seconds
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 0, "group_rank": 0, "worker_id": "936", "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python", "local_rank": [0], "role_rank": [0], "role_world_size": [1]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 0, "worker_id": null, "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python"}", "agent_restarts": 0}}
python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/pix2pixHD.yaml >> /tmp/unit_test.log [Success]
/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
WARNING:torch.distributed.run:torch.distributed.launch
is Deprecated. Use torch.distributed.run
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : train.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 1
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:29500
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_62ca57fw/none_cbejg5ie
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0]
role_ranks=[0]
global_ranks=[0]
role_world_sizes=[1]
global_world_sizes=[1]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_62ca57fw/none_cbejg5ie/attempt_0/0/error.json
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:70: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0006213188171386719 seconds
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 0, "group_rank": 0, "worker_id": "1313", "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python", "local_rank": [0], "role_rank": [0], "role_world_size": [1]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 0, "worker_id": null, "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python"}", "agent_restarts": 0}}
python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/munit.yaml >> /tmp/unit_test.log [Success]
/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
WARNING:torch.distributed.run:torch.distributed.launch
is Deprecated. Use torch.distributed.run
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : train.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 1
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:29500
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_peoaucrn/none_je_rxzez
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0]
role_ranks=[0]
global_ranks=[0]
role_world_sizes=[1]
global_world_sizes=[1]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_peoaucrn/none_je_rxzez/attempt_0/0/error.json
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:70: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0005629062652587891 seconds
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 0, "group_rank": 0, "worker_id": "1561", "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python", "local_rank": [0], "role_rank": [0], "role_world_size": [1]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 0, "worker_id": null, "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python"}", "agent_restarts": 0}}
python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/munit_patch.yaml >> /tmp/unit_test.log [Success]
/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
WARNING:torch.distributed.run:torch.distributed.launch
is Deprecated. Use torch.distributed.run
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : train.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 1
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:29500
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_gf_rgw_1/none_yd9hagbt
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0]
role_ranks=[0]
global_ranks=[0]
role_world_sizes=[1]
global_world_sizes=[1]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_gf_rgw_1/none_yd9hagbt/attempt_0/0/error.json
/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:3638: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
warnings.warn(
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:70: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0006194114685058594 seconds
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 0, "group_rank": 0, "worker_id": "1809", "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 15, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python", "local_rank": [0], "role_rank": [0], "role_world_size": [1]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 0, "worker_id": null, "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 15, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python"}", "agent_restarts": 0}}
python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/unit.yaml >> /tmp/unit_test.log [Success]
/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
WARNING:torch.distributed.run:torch.distributed.launch
is Deprecated. Use torch.distributed.run
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : train.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 1
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:29500
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_k4k83su_/none_doq690x_
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0]
role_ranks=[0]
global_ranks=[0]
role_world_sizes=[1]
global_world_sizes=[1]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_k4k83su_/none_doq690x_/attempt_0/0/error.json
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:70: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0004730224609375 seconds
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 0, "group_rank": 0, "worker_id": "2358", "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 15, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python", "local_rank": [0], "role_rank": [0], "role_world_size": [1]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 0, "worker_id": null, "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 15, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python"}", "agent_restarts": 0}}
python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/funit.yaml >> /tmp/unit_test.log [Success]
/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
WARNING:torch.distributed.run:torch.distributed.launch
is Deprecated. Use torch.distributed.run
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : train.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 1
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:29500
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_qmiyb0yp/none_nw6jcn_x
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0]
role_ranks=[0]
global_ranks=[0]
role_world_sizes=[1]
global_world_sizes=[1]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_qmiyb0yp/none_nw6jcn_x/attempt_0/0/error.json
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:70: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0007641315460205078 seconds
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 0, "group_rank": 0, "worker_id": "2600", "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 15, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python", "local_rank": [0], "role_rank": [0], "role_world_size": [1]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 0, "worker_id": null, "role": "default", "hostname": "4460709f4b11", "state": "SUCCEEDED", "total_run_time": 15, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python"}", "agent_restarts": 0}}
python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/coco_funit.yaml >> /tmp/unit_test.log [Success]
Traceback (most recent call last):
  File "train.py", line 168, in <module>
    main()
  File "train.py", line 92, in main
    trainer = get_trainer(cfg, net_G, net_D,
  File "/workspace/imaginaire/utils/trainer.py", line 59, in get_trainer
    trainer = trainer_lib.Trainer(cfg, net_G, net_D,
  File "/workspace/imaginaire/trainers/vid2vid.py", line 44, in __init__
    super(Trainer, self).__init__(cfg, net_G, net_D, opt_G,
  File "/workspace/imaginaire/trainers/base.py", line 99, in __init__
    self._init_loss(cfg)
  File "/workspace/imaginaire/trainers/vid2vid.py", line 145, in _init_loss
    self.criteria['Flow'] = FlowLoss(cfg)
  File "/workspace/imaginaire/losses/flow.py", line 59, in __init__
    self.flowNet = flow_module.FlowNet(pretrained=True)
  File "/workspace/imaginaire/third_party/flow_net/flow_net.py", line 30, in __init__
    checkpoint = torch.load(flownet2_path,
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 777, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
python train.py --single_gpu --config configs/unit_test/vid2vid_street.yaml >> /tmp/unit_test.log [Failure]
root@4460709f4b11:/workspace#
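Note on the single failure above: "invalid load key, '<'" from torch.load usually means the file being unpickled starts with a '<' character, i.e. an HTML page was saved in place of the real checkpoint (a common symptom of an interrupted or confirmation-gated download, e.g. from Google Drive). In other words, the FlowNet2 weights loaded by imaginaire/third_party/flow_net/flow_net.py are likely a bad download rather than a code bug. Below is a minimal diagnostic sketch; the checkpoint path is an assumption for illustration, so substitute whatever flownet2_path resolves to in flow_net.py.

# Minimal diagnostic sketch (assumed path, not the exact one used by imaginaire):
# print the size and first bytes of the supposed checkpoint so an HTML page is obvious.
import os

def inspect_checkpoint(path):
    """Show whether a 'checkpoint' file looks like a real binary or a stray HTML page."""
    if not os.path.isfile(path):
        print(f"{path}: missing")
        return
    with open(path, "rb") as f:
        head = f.read(16)
    print(f"{path}: {os.path.getsize(path)} bytes, starts with {head!r}")
    if head.lstrip().startswith(b"<"):
        print("Looks like HTML, not a checkpoint -- re-download the FlowNet2 weights.")

inspect_checkpoint("/workspace/checkpoints/flownet2.pth.tar")  # hypothetical path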