Imaginaire - the PyTorch Library of NVIDIA's Deep Imagination Team

Overview


Docs | License | Installation | Model Zoo

Imaginaire is a PyTorch library that contains optimized implementations of several image and video synthesis methods developed at NVIDIA.

License

Imaginaire is released under the NVIDIA Software License. For commercial use, please contact NVIDIA Research Inquiries.

What's inside?


We provide a tutorial for each model. Click on a model name below and your browser will take you to its tutorial page.

Supervised Image-to-Image Translation

| Algorithm Name | Feature | Publication |
| --- | --- | --- |
| pix2pixHD | Learn a mapping that converts a semantic image to a high-resolution photorealistic image. | Wang et al., CVPR 2018 |
| SPADE | Improve pix2pixHD on handling diverse input labels and delivering better output quality. | Park et al., CVPR 2019 |

Unsupervised Image-to-Image Translation

| Algorithm Name | Feature | Publication |
| --- | --- | --- |
| UNIT | Learn a one-to-one mapping between two visual domains. | Liu et al., NeurIPS 2017 |
| MUNIT | Learn a many-to-many mapping between two visual domains. | Huang et al., ECCV 2018 |
| FUNIT | Learn a style-guided image translation model that can generate translations in unseen domains. | Liu et al., ICCV 2019 |
| COCO-FUNIT | Improve FUNIT with a content-conditioned style encoding scheme for style code computation. | Saito et al., ECCV 2020 |

Video-to-video Translation

| Algorithm Name | Feature | Publication |
| --- | --- | --- |
| vid2vid | Learn a mapping that converts a semantic video to a photorealistic video. | Wang et al., NeurIPS 2018 |
| fs-vid2vid | Learn a subject-agnostic mapping that converts a semantic video and an example image to a photorealistic video. | Wang et al., NeurIPS 2019 |

World-to-world Translation

| Algorithm Name | Feature | Publication |
| --- | --- | --- |
| wc-vid2vid | Improve vid2vid on view consistency and long-term consistency. | Mallya et al., ECCV 2020 |
| GANcraft | Convert semantic block worlds to realistic-looking worlds. | Hao et al., ICCV 2021 |

Comments
  • error: vid2vid_street.yaml >> /tmp/unit_test.log [Failure] when running `bash scripts/test_training.sh`

    ```
    root@4460709f4b11:/workspace# bash scripts/test_training.sh
    /opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
    WARNING:torch.distributed.run:torch.distributed.launch is Deprecated. Use torch.distributed.run
    INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
      entrypoint : train.py  min_nodes : 1  max_nodes : 1  nproc_per_node : 1
      rdzv_backend : static  rdzv_endpoint : 127.0.0.1:29500  max_restarts : 3
    INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers.
    Downloading: "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth" to /root/.cache/torch/hub/checkpoints/vgg19-dcbb9e9d.pth
    100%|██████████| 548M/548M [01:40<00:00, 5.71MB/s]
    [W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
    INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished.
    python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/spade.yaml >> /tmp/unit_test.log [Success]
    [... the same launcher startup logs and warnings repeat for each of the following configs ...]
    python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/pix2pixHD.yaml >> /tmp/unit_test.log [Success]
    python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/munit.yaml >> /tmp/unit_test.log [Success]
    python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/munit_patch.yaml >> /tmp/unit_test.log [Success]
    python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/unit.yaml >> /tmp/unit_test.log [Success]
    python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/funit.yaml >> /tmp/unit_test.log [Success]
    python -m torch.distributed.launch --nproc_per_node=1 train.py --config configs/unit_test/coco_funit.yaml >> /tmp/unit_test.log [Success]
    Traceback (most recent call last):
      File "train.py", line 168, in <module>
        main()
      File "train.py", line 92, in main
        trainer = get_trainer(cfg, net_G, net_D,
      File "/workspace/imaginaire/utils/trainer.py", line 59, in get_trainer
        trainer = trainer_lib.Trainer(cfg, net_G, net_D,
      File "/workspace/imaginaire/trainers/vid2vid.py", line 44, in __init__
        super(Trainer, self).__init__(cfg, net_G, net_D, opt_G,
      File "/workspace/imaginaire/trainers/base.py", line 99, in __init__
        self._init_loss(cfg)
      File "/workspace/imaginaire/trainers/vid2vid.py", line 145, in _init_loss
        self.criteria['Flow'] = FlowLoss(cfg)
      File "/workspace/imaginaire/losses/flow.py", line 59, in __init__
        self.flowNet = flow_module.FlowNet(pretrained=True)
      File "/workspace/imaginaire/third_party/flow_net/flow_net.py", line 30, in __init__
        checkpoint = torch.load(flownet2_path,
      File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 608, in load
        return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
      File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 777, in _legacy_load
        magic_number = pickle_module.load(f, **pickle_load_args)
    _pickle.UnpicklingError: invalid load key, '<'.
    python train.py --single_gpu --config configs/unit_test/vid2vid_street.yaml >> /tmp/unit_test.log [Failure]
    root@4460709f4b11:/workspace#
    ```
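    A `_pickle.UnpicklingError: invalid load key, '<'` from `torch.load` almost always means the file on disk begins with `<`, i.e. an HTML page (for example a download-quota or redirect page) was saved in place of the real flownet2 checkpoint. A minimal, hypothetical sketch for checking this before calling `torch.load`; the helper names are illustrative and not part of Imaginaire:

    ```python
    # Sketch: detect whether a "checkpoint" file is actually an HTML error page.
    # (Illustrative helpers only; not Imaginaire's actual download/validation code.)
    import zipfile

    def looks_like_html(path):
        """True if the file starts with '<' -- the classic sign of a saved HTML page."""
        with open(path, "rb") as f:
            head = f.read(64).lstrip()
        return head.startswith(b"<")  # torch.load fails with "invalid load key, '<'" on these

    def looks_like_torch_checkpoint(path):
        """True for zip-based checkpoints ('PK' magic) or legacy pickle files (0x80 protocol byte)."""
        with open(path, "rb") as f:
            head = f.read(2)
        return head == b"PK" or zipfile.is_zipfile(path) or head[:1] == b"\x80"
    ```

    If the check fails, deleting the bogus file and re-downloading the flownet2 weights manually is usually enough.
    
    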

    opened by D-Mad 19
  • Inference error we cannot resolve: 'Adam' object has no attribute '_step_count'

    We have trained and are now trying to run inference (using the new corrected config file found in issue #106). However, we are getting the following error. We have traced through the code and cannot see anything we can do to resolve this issue. Can you please advise?

    • We have looked at multiple config files online and none mention this particular parameter as it relates to the Adam optimizer

    • We have even run inference on the same data used for validation during training, rather than our own data, although both have the exact same structure.
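    One plausible cause (an assumption on our side, not confirmed by this thread) is that the optimizer was pickled under a PyTorch release that had not yet introduced `_step_count`, an attribute the LR schedulers set and read, so the restored `Adam` simply lacks it. A generic, hypothetical sketch of the backfill workaround, with stand-in names (`FakeOptimizer` substitutes for the restored Adam object):

    ```python
    # Hypothetical sketch: after unpickling an object saved by an older library
    # version, backfill any attributes the newer code expects (this mirrors the
    # missing Adam._step_count). Not Imaginaire's actual fix.
    def backfill_missing_attrs(obj, defaults):
        """Set each attribute from `defaults` on obj only if it is absent."""
        for name, value in defaults.items():
            if not hasattr(obj, name):
                setattr(obj, name, value)
        return obj

    class FakeOptimizer:
        """Stand-in for a torch.optim.Adam restored from an old checkpoint."""
        pass

    opt = backfill_missing_attrs(FakeOptimizer(), {"_step_count": 0})
    # opt._step_count now exists, so code reading it no longer raises AttributeError
    ```

    Attributes that are already present are left untouched, so applying the backfill to a checkpoint saved by a current version is a no-op.
    
    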

    Using random seed 0
    cudnn benchmark: True
    cudnn deterministic: False
    LMDB ROOT ['dataset/val/']
    Creating metadata
    ['images', 'poses-openpose']
    Data file extensions: {'images': 'jpg', 'poses-openpose': 'json'}
    Searching in dir: images
    Found 40 sequences
    Found 1524 files
    Folder at dataset/val/images opened.
    Folder at dataset/val/poses-openpose opened.
    Num datasets: 1
    Num sequences: 40
    Max sequence length: 40
    Epoch length: 40
    Using random seed 0
    Concatenate images:
        ext: jpg
        num_channels: 3
        normalize: True for input.
    	Num. of channels in the input image: 3
    Concatenate images:
        ext: jpg
        num_channels: 3
        normalize: True for input.
    Concatenate poses-openpose:
        ext: json
        num_channels: 3
        interpolator: None
        normalize: False
        pre_aug_ops: decode_json, convert::imaginaire.utils.visualization.pose::openpose_to_npy
        post_aug_ops: vis::imaginaire.utils.visualization.pose::draw_openpose_npy for input.
    	Num. of channels in the input label: 3
    Concatenate images:
        ext: jpg
        num_channels: 3
        normalize: True for input.
    	Num. of channels in the input image: 3
    Concatenate images:
        ext: jpg
        num_channels: 3
        normalize: True for input.
    	Num. of channels in the input image: 3
    Concatenate images:
        ext: jpg
        num_channels: 3
        normalize: True for input.
    	Num. of channels in the input image: 3
    Initialized temporal embedding network with the reference one.
    Concatenate images:
        ext: jpg
        num_channels: 3
        normalize: True for input.
    Concatenate poses-openpose:
        ext: json
        num_channels: 3
        interpolator: None
        normalize: False
        pre_aug_ops: decode_json, convert::imaginaire.utils.visualization.pose::openpose_to_npy
        post_aug_ops: vis::imaginaire.utils.visualization.pose::draw_openpose_npy for input.
    	Num. of channels in the input label: 3
    Concatenate images:
        ext: jpg
        num_channels: 3
        normalize: True for input.
    	Num. of channels in the input image: 3
    Initialize net_G and net_D weights using type: xavier gain: 0.02
    Using random seed 0
    net_G parameter count: 91,147,294
    net_D parameter count: 5,598,018
    Use custom initialization for the generator.
    Opt D Adam (
    Parameter Group 0
        amsgrad: False
        betas: (0.0, 0.999)
        eps: 1e-08
        initial_lr: 0.0004
        lr: 0.0004
        weight_decay: 0
    )
    Setup trainer.
    Using automatic mixed precision training.
    Augmentation policy: 
    GAN mode: hinge
    Perceptual loss:
    	Mode: vgg19
    Loss GAN                  Weight 1.0
    Loss FeatureMatching      Weight 10.0
    Loss Perceptual           Weight 10.0
    Loss Flow                 Weight 10.0
    Loss Flow_L1              Weight 10.0
    Loss Flow_Warp            Weight 10.0
    Loss Flow_Mask            Weight 10.0
    Done with loading the checkpoint.
    configs for inference
    finetune: True
    finetune_iter: 100
    few_shot_seq_index: 0
    few_shot_frame_index: 0
    driving_seq_index: 1
    output dir:  projects/fs_vid2vid/output/face_forensics
    Epoch length: 40
      0% 0/40 [00:00<?, ?it/s]person dict [160.594, 118.24, 0.251877, 192.575, 144.063, 0.824015, 225.618, 146.141, 0.722625, 228.763, 187.411, 0.577983, 222.522, 197.745, 0.0691224, 156.441, 141.995, 0.65878, 136.839, 176.029, 0.816474, 122.349, 203.979, 0.806893, 194.681, 207.035, 0.582138, 219.419, 204.992, 0.580149, 214.285, 253.546, 0.750408, 233.867, 303.093, 0.767982, 172.951, 209.085, 0.521009, 161.562, 260.727, 0.755334, 152.304, 317.576, 0.783057, 0, 0, 0, 161.64, 113.054, 0.485235, 201.904, 119.295, 0.144664, 175.014, 114.108, 0.830086, 109.928, 324.766, 0.654595, 114.095, 326.869, 0.640277, 160.523, 325.834, 0.651386, 214.269, 302.072, 0.6639, 218.372, 302.011, 0.653834, 237.02, 310.333, 0.700539]
    person dict [167.8, 86.1641, 0.864743, 120.294, 97.5681, 0.558493, 94.4901, 93.4588, 0.498503, 148.18, 96.528, 0.296182, 163.661, 101.676, 0.0693238, 145.079, 102.756, 0.410478, 200.818, 102.709, 0.455387, 277.273, 103.817, 0.660783, 117.187, 174.002, 0.430313, 99.6447, 175.023, 0.37583, 111.004, 234.891, 0.673355, 102.742, 301.006, 0.74067, 132.683, 174.004, 0.437687, 136.82, 226.63, 0.563235, 128.554, 281.403, 0.770815, 161.654, 76.9165, 0.851908, 169.845, 79.0123, 0.305025, 134.727, 70.7234, 0.842317, 0, 0, 0, 154.368, 287.597, 0.292605, 152.313, 284.509, 0.335124, 121.323, 286.564, 0.595389, 147.154, 313.39, 0.593413, 138.878, 316.507, 0.520831, 94.4743, 307.206, 0.651808]
    Training layers:  ['conv_img', 'up', 'weight_generator.fc']
      0% 0/40 [00:01<?, ?it/s]
    Traceback (most recent call last):
      File "/content/drive/My Drive/imaginaire/inference.py", line 99, in <module>
        main()
      File "/content/drive/My Drive/imaginaire/inference.py", line 95, in main
        trainer.test(test_data_loader, args.output_dir, cfg.inference_args)
      File "/content/drive/My Drive/imaginaire/imaginaire/trainers/fs_vid2vid.py", line 168, in test
        output = self.test_single(data, output_dir, inference_args)
      File "/content/drive/My Drive/imaginaire/imaginaire/trainers/vid2vid.py", line 376, in test_single
        self.finetune(data, inference_args)
      File "/content/drive/My Drive/imaginaire/imaginaire/trainers/fs_vid2vid.py", line 287, in finetune
        self.gen_update(data_finetune)
      File "/content/drive/My Drive/imaginaire/imaginaire/trainers/vid2vid.py", line 265, in gen_update
        self.get_dis_losses(net_D_output)
      File "/content/drive/My Drive/imaginaire/imaginaire/trainers/vid2vid.py", line 642, in get_dis_losses
        if self.last_step_count_D == self.opt_D._step_count:
    **AttributeError: 'Adam' object has no attribute '_step_count'**
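
    The comparison at `vid2vid.py` line 642 reads `self.opt_D._step_count`, an attribute that a plain `torch.optim.Adam` restored at inference time does not necessarily carry. A minimal defensive sketch (a workaround idea, not an official fix; `FakeAdam` is just a stand-in object) seeds the attribute before it is read:

```python
class FakeAdam:
    """Stand-in for an optimizer object restored without `_step_count`."""

opt_D = FakeAdam()

# Seed the attribute if it is missing, so that comparisons such as
# `self.last_step_count_D == self.opt_D._step_count` cannot raise.
if not hasattr(opt_D, "_step_count"):
    opt_D._step_count = 0

print(opt_D._step_count)  # 0
```

    The same one-line `hasattr` guard could be applied to the real optimizer object right after the checkpoint is loaded.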
    
    opened by grewe 17
  • [FSVID2VID - - POSE ] TypeError: conv2d() received an invalid combination of arguments


    I am trying to train a custom pose dataset using only openpose.

    I get the following error:

    /content/drive/My Drive/imaginaire
    Using random seed 2
    Training with 1 GPUs.
    Make folder logs/2021_1110_2255_34_ampO1
    cudnn benchmark: True
    cudnn deterministic: False
    LMDB ROOT ['dataset/train']
    Creating metadata
    ['human_instance_maps', 'images', 'poses-openpose']
    Data file extensions: {'images': 'jpg', 'poses-openpose': 'json', 'human_instance_maps': 'png'}
    Searching in dir: images
    Found 336 sequences
    Found 12934 files
    Folder at dataset/train/images opened.
    Folder at dataset/train/poses-openpose opened.
    Folder at dataset/train/human_instance_maps opened.
    Num datasets: 1
    Num sequences: 336
    Max sequence length: 40
    Epoch length: 336
    LMDB ROOT ['dataset/val']
    Creating metadata
    ['human_instance_maps', 'images', 'poses-openpose']
    Data file extensions: {'images': 'jpg', 'poses-openpose': 'json', 'human_instance_maps': 'png'}
    Searching in dir: images
    Found 40 sequences
    Found 1524 files
    Folder at dataset/val/images opened.
    Folder at dataset/val/poses-openpose opened.
    Folder at dataset/val/human_instance_maps opened.
    Num datasets: 1
    Num sequences: 40
    Max sequence length: 40
    Epoch length: 40
    Train dataset length: 336
    Val dataset length: 40
    Using random seed 2
    Concatenate images: ext: jpg num_channels: 3 normalize: True computed_on_the_fly: False is_mask: False pre_aug_ops: None post_aug_ops: None for input.
        Num. of channels in the input image: 3
    Concatenate images: ext: jpg num_channels: 3 normalize: True computed_on_the_fly: False is_mask: False pre_aug_ops: None post_aug_ops: None for input.
    Concatenate poses-openpose: ext: json num_channels: 3 interpolator: None normalize: False pre_aug_ops: decode_json, convert::imaginaire.utils.visualization.pose::openpose_to_npy post_aug_ops: vis::imaginaire.utils.visualization.pose::draw_openpose_npy computed_on_the_fly: False is_mask: False for input.
    Concatenate human_instance_maps: ext: png num_channels: 3 is_mask: True normalize: False computed_on_the_fly: False pre_aug_ops: None post_aug_ops: None for input.
        Num. of channels in the input label: 3
    Concatenate images: ext: jpg num_channels: 3 normalize: True computed_on_the_fly: False is_mask: False pre_aug_ops: None post_aug_ops: None for input.
        Num. of channels in the input image: 3
    Concatenate images: ext: jpg num_channels: 3 normalize: True computed_on_the_fly: False is_mask: False pre_aug_ops: None post_aug_ops: None for input.
        Num. of channels in the input image: 3
    Concatenate images: ext: jpg num_channels: 3 normalize: True computed_on_the_fly: False is_mask: False pre_aug_ops: None post_aug_ops: None for input.
        Num. of channels in the input image: 3
    Initialized temporal embedding network with the reference one.
    Concatenate images: ext: jpg num_channels: 3 normalize: True computed_on_the_fly: False is_mask: False pre_aug_ops: None post_aug_ops: None for input.
    Concatenate poses-openpose: ext: json num_channels: 3 interpolator: None normalize: False pre_aug_ops: decode_json, convert::imaginaire.utils.visualization.pose::openpose_to_npy post_aug_ops: vis::imaginaire.utils.visualization.pose::draw_openpose_npy computed_on_the_fly: False is_mask: False for input.
    Concatenate human_instance_maps: ext: png num_channels: 3 is_mask: True normalize: False computed_on_the_fly: False pre_aug_ops: None post_aug_ops: None for input.
        Num. of channels in the input label: 3
    Concatenate images: ext: jpg num_channels: 3 normalize: True computed_on_the_fly: False is_mask: False pre_aug_ops: None post_aug_ops: None for input.
        Num. of channels in the input image: 3
    Initialize net_G and net_D weights using type: xavier gain: 0.02
    Using random seed 2
    net_G parameter count: 91,147,294
    net_D parameter count: 6,292,963
    Use custom initialization for the generator.
    Setup trainer.
    Using automatic mixed precision training.
    Augmentation policy:
    GAN mode: hinge
    Perceptual loss:
        Mode: vgg19
    Loss GAN                  Weight 1.0
    Loss FeatureMatching      Weight 10.0
    Loss Perceptual           Weight 10.0
    Loss GAN_face             Weight 10.0
    Loss FeatureMatching_face Weight 10.0
    Loss Flow                 Weight 10.0
    Loss Flow_L1              Weight 10.0
    Loss Flow_Warp            Weight 10.0
    Loss Flow_Mask            Weight 10.0
    TRAIN DATASET := <imaginaire.trainers.fs_vid2vid.Trainer object at 0x7f553eec3fa0>
    No checkpoint found.
    Epoch 0 ...
    Epoch length: 336
    ------ Now start training 4 frames -------
    Traceback (most recent call last):
      File "/content/drive/My Drive/imaginaire/train.py", line 169, in <module>
        main()
      File "/content/drive/My Drive/imaginaire/train.py", line 129, in main
        for it, data in enumerate(train_data_loader):
      File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
        data = self._next_data()
      File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
        return self._process_data(data)
      File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
        data.reraise()
      File "/usr/local/lib/python3.9/site-packages/torch/_utils.py", line 434, in reraise
        raise exception
    KeyError: Caught KeyError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
        data = fetcher.fetch(index)
      File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/content/drive/My Drive/imaginaire/imaginaire/datasets/paired_videos.py", line 288, in __getitem__
        return self._getitem(index)
      File "/content/drive/My Drive/imaginaire/imaginaire/datasets/paired_few_shot_videos.py", line 303, in _getitem
        data = self.apply_ops(data, self.full_data_ops, full_data=True)
      File "/content/drive/My Drive/imaginaire/imaginaire/datasets/base.py", line 428, in apply_ops
        data = op(data)
      File "/content/drive/My Drive/imaginaire/imaginaire/model_utils/fs_vid2vid.py", line 108, in crop_face_from_data
        landmarks = data['landmarks-dlib68_xy']
    KeyError: 'landmarks-dlib68_xy'

    So I tried removing the crop_face_from_data from the config file, and now I get this error message:

    TRAIN DATASET :=  <imaginaire.trainers.fs_vid2vid.Trainer object at 0x7f09ac9606a0>
    No checkpoint found.
    Epoch 0 ...
    Epoch length: 336
    ------ Now start training 4 frames -------
    Traceback (most recent call last):
      File "/content/drive/My Drive/imaginaire/train.py", line 169, in <module>
        main()
      File "/content/drive/My Drive/imaginaire/train.py", line 141, in main
        trainer.gen_update(
      File "/content/drive/My Drive/imaginaire/imaginaire/trainers/vid2vid.py", line 254, in gen_update
        net_G_output = self.net_G(data_t)
      File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/content/drive/My Drive/imaginaire/imaginaire/utils/trainer.py", line 195, in forward
        return self.module(*args, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/content/drive/My Drive/imaginaire/imaginaire/generators/fs_vid2vid.py", line 151, in forward
        self.weight_generator(ref_images, ref_labels, label, is_first_frame)
      File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/content/drive/My Drive/imaginaire/imaginaire/generators/fs_vid2vid.py", line 586, in forward
        self.encode_reference(ref_image, ref_label, label, k)
      File "/content/drive/My Drive/imaginaire/imaginaire/generators/fs_vid2vid.py", line 644, in encode_reference
        x_label = self.ref_label_first(ref_label)
      File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/content/drive/My Drive/imaginaire/imaginaire/layers/conv.py", line 142, in forward
        x = layer(x)
      File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1120, in _call_impl
        result = forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 446, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    TypeError: conv2d() received an invalid combination of arguments - got (NoneType, Tensor, Parameter, tuple, tuple, tuple, int), but expected one of:
     * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
          didn't match because some of the arguments have invalid types: (NoneType, Tensor, Parameter, tuple, tuple, tuple, int)
     * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups)
          didn't match because some of the arguments have invalid types: (NoneType, Tensor, Parameter, tuple, tuple, tuple, int)
    
    

    Here is the edited yaml file attached. Is there any way to solve this?
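
    The `NoneType` reaching `conv2d` suggests that, once the face-crop op was removed, the reference label tensor was never built, so a `None` flowed all the way into the generator. A small hedged helper (names are illustrative, not Imaginaire API) can fail with a readable message before `F.conv2d` ever sees a `None`:

```python
def check_not_none(**named_tensors):
    """Raise a readable error instead of letting a None reach conv2d."""
    missing = [name for name, value in named_tensors.items() if value is None]
    if missing:
        raise ValueError(
            f"These inputs are None before the forward pass: {missing}; "
            "check that the dataset config still produces every key the "
            "generator expects (e.g. the reference label)."
        )

# Hypothetical usage at the top of a forward pass; strings stand in for tensors:
try:
    check_not_none(ref_image="tensor-ok", ref_label=None)
except ValueError as e:
    print(e)
```

    Dropping such a check at the entry of `encode_reference` would turn the opaque `conv2d() received an invalid combination of arguments` into a message naming the missing input.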

    opened by dikshantjn 13
  • dancing datasets#lmdb


    I have processed the data into lmdb format; the location is datasets/youtubeDancing/lmdb/train, and the roots entry in ampO1.yaml is also datasets/youtubeDancing/lmdb/train. But during training, the following occurs:

    Found 0 sequences
    Found 0 files
    Folder at datasets/youtubeDancing/lmdb/train/images opened.
    Folder at datasets/youtubeDancing/lmdb/train/pose_maps-densepose opened.
    Folder at datasets/youtubeDancing/lmdb/train/poses-openpose opened.
    Folder at datasets/youtubeDancing/lmdb/train/human_instance_maps opened.

    Hope someone can help, thanks! (screenshot)
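
    `Found 0 sequences` means the loader saw the LMDB folders but no sequence list. Since the build script writes an `all_filenames.json` next to the per-type folders (visible in the directory listings elsewhere in these issues), a quick stdlib check of that file is a good first diagnostic. The snippet below builds a tiny stand-in root so it is runnable anywhere; the real path and the exact JSON layout (simplified here to sequence -> frame list) are assumptions:

```python
import json
import os

root = "/tmp/demo_lmdb_train"  # stand-in for datasets/youtubeDancing/lmdb/train

# Build a minimal stand-in all_filenames.json so the snippet runs anywhere.
os.makedirs(root, exist_ok=True)
with open(os.path.join(root, "all_filenames.json"), "w") as f:
    json.dump({"seq0001": ["frame000329", "frame000330"]}, f)

# The actual check: the loader can only find sequences that are listed here.
with open(os.path.join(root, "all_filenames.json")) as f:
    sequences = json.load(f)

print(len(sequences))  # number of sequences the loader will see
assert all(frames for frames in sequences.values()), "a sequence has no frames"
```

    If the real file is missing or empty, the LMDB build step did not pick up the raw data, which would explain the `Found 0 sequences` output.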

    opened by Mumuwei 8
  • mode collapse


    hi, when I trained with coco_funit, the results are normal in the first few epochs, but mode collapse appears from the 59th epoch. Is this normal? Did it also appear during your training? (attached: epoch_00063_iteration_000094000)

    opened by hugo-xie 8
  • AttributeError: module 'torch.distributed' has no attribute 'is_initialized' for fs_vid2vid


    I use Windows 10. I am trying to train on a custom dataset using this command:

    python -m torch.distributed.launch --nproc_per_node=8 train.py --config configs/projects/fs_vid2vid/YouTubeDancing/ampO1.yaml

    It gives me the following error:

    Traceback (most recent call last):
      File "train.py", line 93, in <module>
        main()
      File "train.py", line 56, in main
        train_data_loader, val_data_loader = get_train_and_val_dataloader(cfg)
      File "C:\Users\student\Downloads\imaginaire-master\imaginaire-master\imaginaire\utils\dataset.py", line 75, in get_train_and_val_dataloader
        train_data_loader = _get_data_loader(
      File "C:\Users\student\Downloads\imaginaire-master\imaginaire-master\imaginaire\utils\dataset.py", line 46, in _get_data_loader
        not_distributed = not_distributed or not dist.is_initialized()
    AttributeError: module 'torch.distributed' has no attribute 'is_initialized'
    Traceback (most recent call last):
      File "C:\ProgramData\Anaconda3\lib\runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "C:\ProgramData\Anaconda3\lib\runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\distributed\launch.py", line 263, in <module>
        main()
      File "C:\ProgramData\Anaconda3\lib\site-packages\torch\distributed\launch.py", line 258, in main
        raise subprocess.CalledProcessError(returncode=process.returncode,
    subprocess.CalledProcessError: Command '['C:\ProgramData\Anaconda3\python.exe', '-u', 'train.py', '--local_rank=7', '--config', 'configs/projects/fs_vid2vid/YouTubeDancing/ampO1.yaml']' returned non-zero exit status 1.
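
    Windows builds of PyTorch historically shipped without the distributed package, so `torch.distributed.is_initialized` may simply not exist there. A defensive guard (a sketch of the pattern, not the library's actual code) checks availability before initialization; `SimpleNamespace` below just stands in for a `torch.distributed` module missing the function:

```python
from types import SimpleNamespace

def safe_is_initialized(dist_module):
    """True only if the distributed backend is present AND initialized."""
    is_available = getattr(dist_module, "is_available", None)
    is_initialized = getattr(dist_module, "is_initialized", None)
    if is_available is None or is_initialized is None or not is_available():
        return False
    return bool(is_initialized())

# A torch.distributed without distributed support (as on some Windows builds):
bare_dist = SimpleNamespace()
print(safe_is_initialized(bare_dist))  # False, instead of AttributeError
```

    Note also that `--nproc_per_node=8` asks for eight processes; on a single-GPU Windows box, dropping the `torch.distributed.launch` wrapper entirely and running `train.py --single_gpu` sidesteps the distributed code path altogether.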

    opened by dikshantjn 6
  • inference on fs-vid-to-vid


    @arunmallya @dumerrill @mjgarland Hey, thanks for sharing your code. I tried running your face-forensics pre-trained model on custom reference test data; however, the results are not good.

    Can you suggest anything I should try?

    https://user-images.githubusercontent.com/22388014/115153448-8e1f8880-a093-11eb-8731-82414e5795c6.mp4

    opened by SURABHI-GUPTA 6
  • Ask a question about YoutubeDance datasets file paths in fs-vid2vid


    Hello, I want to implement the pose synthesis in fs-vid2vid. I have downloaded the set of YouTube dancing datasets that you provided, converted them to OpenPose and DensePose formats, and finally generated the LMDB files. The file path is shown below:

    pose
    └───lmdb
        └───train
            └───human_instance_maps
                   └───data.mdb
                   └───lock.mdb
            └───images
                   └───data.mdb
                   └───lock.mdb
            └───poses-openpose
                   └───data.mdb
                   └───lock.mdb
            └───pose_maps-densepose
                   └───data.mdb
                   └───lock.mdb
            └───all_filenames.json
            └───metadata.json
        └───val
            └───human_instance_maps
                   └───data.mdb
                   └───lock.mdb
            └───images
                         ...(similar to train file path)
    └───raw
        └───train
            └───human_instance_maps
                   └───000000
                         └───frame000329_INDS.png
                         └───frame000330_INDS.png
                                           ...
                   └───000001
                                .......
                   └───000002
                                .......
                            .......
            └───images
                   └───000000
                         └───frame000329.jpg
                         └───frame000330.jpg
                                           ...
                   └───000001
                                .......
                   └───000002
                                .......
                            .......
            └───poses-openpose
                   └───000000
                         └───frame000329_keypoints.json
                         └───frame000330_keypoints.json
                                           ...
                   └───000001
                                .......
                   └───000002
                                .......
                            .......
            └───pose_maps-densepose
                   └───000000
                         └───frame000329_IUV.png
                         └───frame000330_IUV.png
                                           ...
                   └───000001
                                .......
                   └───000002
                                .......
                            .......
    

    Now, I use python -m torch.distributed.launch --nproc_per_node=4 train.py --config configs/projects/fs_vid2vid/YouTubeDancing/ampO1.yaml to train on this dataset. The roots part of ampO1.yaml is shown below:

    train:
        roots:
            - ./datasets/pose/lmdb/train/
        batch_size: 6
        initial_sequence_length: 4
        max_sequence_length: 16
        augmentations:
            resize_smallest_side: 540
            horizontal_flip: False

    val:
        roots:
            - ./datasets/pose/lmdb/val/
        batch_size: 1
        augmentations:
            resize_smallest_side: 540
            horizontal_flip: False
    

    However, I get the error

    Traceback (most recent call last):
      File "train.py", line 99, in <module>
        main()
      File "train.py", line 93, in main
        trainer.end_of_epoch(data, current_epoch, current_iteration)
    UnboundLocalError: local variable 'data' referenced before assignment
    

    I found that the reason for this problem is that the for loop for it, data in enumerate(train_data_loader): (train.py, line 77) is never executed. When I debugged the code, the train_dataset object (dataset.py, line 74) is as shown below. (screenshot)

    Is there a problem with my dataset path, or is there something I need to improve?
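
    The `UnboundLocalError` here is a symptom, not the cause: `data` is only assigned inside `for it, data in enumerate(train_data_loader):`, so when the dataset resolves to zero sequences the loop body never runs and the later `end_of_epoch(data, ...)` call touches an unassigned name. A hedged sketch of the failing pattern, plus a guard that surfaces the real problem:

```python
def run_epoch(train_data_loader):
    """Mimics the pattern in train.py: `data` only exists if the loop ran."""
    data = None  # guard: give the name a value before the loop
    for it, data in enumerate(train_data_loader):
        pass  # the training step would go here
    if data is None:
        raise RuntimeError(
            "train_data_loader produced no batches - the dataset root is "
            "probably wrong or the LMDB is empty (0 sequences found)."
        )
    return data

try:
    run_epoch([])  # an empty loader reproduces the error deterministically
except RuntimeError as e:
    print(e)
```

    So the practical fix is not in train.py but in the data: make the loader actually find sequences (correct roots path, non-empty LMDB), and the `UnboundLocalError` disappears.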

    opened by HappyDeepLearning 6
  • Google colab notebooks, anyone?


    Hello, has anyone gotten this working with Google Colab yet? I'd be very excited to check it out! I'm working on it, but haven't had success yet.

    One major issue: Colab is running cuda 10.1, and it looks like it may not be possible to do a local installation of cuda 10.2, which is needed for Imaginaire to work.

    opened by terekita 6
  • Different results in content encoding in munit with amp O1 or amp O0


    Hi there,

    First of all thank you for this amazing library. It really helps people like me to bootstrap in the amazing world of generative networks!

    That said I have noticed a strange behaviour when running the training on munit/afhq_doc2cat:

    • when you run in amp O1 optimization level, the content reconstruction error diverges (error above 3.5 all the time)
    • however, running exactly the same settings but with amp O0, the same content reconstruction error converges to values as small as 0.6. See attached content_recon.png.

    I wonder if this is the expected behaviour.

    Additional information:

    • the same behaviour happens when using torch.cuda.amp instead of apex
    • the same behaviour happens when using my dataset which is not about dogs and cats ;)
    • the style encoding reconstruction error does NOT seem to suffer from the same issue. See attached style_recon.png.

    Configuration:

    • ubuntu 18.04
    • 2 V100 GPUs
    • nvidia driver 450.66
    • pytorch 1.6
    • cuda 10.2.89
    • cudnn 8.04.30

    Looking forward to reading from you,

    Pierre

    opened by theponpon 6
  • UnboundLocalError: local variable 'data' referenced before assignment


    @arunmallya I opened this new issue since you closed the last one. As far as I have checked, I haven't made any changes to the directory structure; it's the same as the one mentioned in the readme file. I haven't made any changes to the input configuration either, so I'm not sure why this error is showing up.

    Dir structure:

    +---test
    |   +---human_instance_maps
    |   |   +---seq0001
    |   |   \---seq0002
    |   +---images
    |   |   +---seq0001
    |   |   \---seq0002
    |   +---poses-openpose
    |   |   +---seq0001
    |   |   \---seq0002
    |   \---pose_maps-densepose
    |       +---seq0001
    |       \---seq0002
    \---train
        +---human_instance_maps
        |   +---seq0001
        |   \---seq0002
        +---images
        |   +---seq0001
        |   \---seq0002
        +---poses-openpose
        |   +---seq0001
        |   \---seq0002
        \---pose_maps-densepose
            +---seq0001
            \---seq0002

    The images have a .jpg extension and the instance maps and pose maps have a .png extension; the names of the files in each dir are the same too.

    Error stack:

    Traceback (most recent call last):
      File "train.py", line 93, in <module>
        main()
      File "train.py", line 87, in main
        trainer.end_of_epoch(data, current_epoch, current_iteration)
    UnboundLocalError: local variable 'data' referenced before assignment
    Traceback (most recent call last):
      File "/home/ubuntu/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/ubuntu/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
        main()
      File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
        raise subprocess.CalledProcessError(returncode=process.returncode,
    subprocess.CalledProcessError: Command '['/home/ubuntu/anaconda3/bin/python3', '-u', 'train.py', '--local_rank=0', '--single_gpu', '--config', 'configs/projects/fs_vid2vid/YouTubeDancing/ampO1.yaml']' returned non-zero exit status 1.

    The command that I used to create the lmdb dataset is "python3 scripts/build_lmdb.py --config configs/projects/fs_vid2vid/YouTubeDancing/ampO1.yaml --data_root /mnt/fs/datasets/fsv/jay-arul-long-shot-fsv/train --output_root ~/imaginaire/datasets/youtubeDancing/lmdb/train --paired"

    opened by arulpraveent 5
  • CVE-2007-4559 Patch


    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this project's lead researcher, Kasimir Schulz.
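
    The mitigation the patch describes - validating member paths before `extractall` - can be sketched in a few lines of stdlib Python (a generic sketch, not the exact code from the pull request):

```python
import os
import tarfile

def safe_extractall(tar, path="."):
    """Refuse to extract any member that would escape `path` (CVE-2007-4559)."""
    base = os.path.realpath(path)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(path, member.name))
        # Every resolved target must stay inside the destination directory.
        if target != base and not target.startswith(base + os.sep):
            raise RuntimeError(f"blocked path traversal attempt: {member.name}")
    tar.extractall(path)
```

    Called in place of a bare `tar.extractall()`, a crafted member name like `../../evil.txt` raises instead of writing outside the target directory.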

    opened by TrellixVulnTeam 0
  • Experiment of using Instance Normalization vs Layer Normalization on the Decoder (MUNIT)


    Here are the results of using different normalization on the decoder. (figure 1)

    Based on the computations performed by the normalization methods, the MUNIT architecture can be summarized as follows.

    (figure 2)

    This means that, since there is no tuning of channel correlation in the upsampling layers (as is done by Adaptive Instance Normalization in StyleGAN), using instance normalization during upsampling destroys the channel correlation tuned by the ResNet + Adaptive Instance Normalization blocks.

    opened by tom99763 0
  • No module named 'imaginaire'


    I tried to build the lmdbs following the link, but get an error: Error while finding module specification for 'imaginaire.tools.build_lmdb' (ModuleNotFoundError: No module named 'imaginaire')

    I have already installed the third-party packages (installed with conda), so I want to know what caused this error.
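
    `No module named 'imaginaire'` when running `python -m imaginaire.tools.build_lmdb` usually just means the repository root is not on `sys.path` - running the command from inside the checkout, or exporting `PYTHONPATH=/path/to/imaginaire`, normally fixes it. The same idea expressed in Python (the checkout location below is hypothetical):

```python
import os
import sys

# Hypothetical location of your imaginaire checkout.
repo_root = os.path.expanduser("~/imaginaire")

# Putting the repo root first on sys.path lets `import imaginaire` resolve
# to the package directory inside the checkout.
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)

print(sys.path[0])
```

    Equivalently, `cd` into the checkout and run `PYTHONPATH=$PWD python -m imaginaire.tools.build_lmdb ...` from there.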

    opened by Nyoko74 0
  • fs_vid2vid model inference outputs are all 0

    fs_vid2vid model inference outputs are all 0

    opened by Linxi-ZHAO 1
  • NVMLError(ret) pynvml.nvml.NVMLError_InvalidArgument: Invalid Argument


    Hi,

    I am using Ubuntu 20.04 with an Nvidia RTX 3090. When I followed the instructions to train the model, it always gives me this error:

        File "/opt/conda/lib/python3.8/site-packages/pynvml/nvml.py", line 366, in check_return
            raise NVMLError(ret)
        pynvml.nvml.NVMLError_InvalidArgument: Invalid Argument

    Does anyone know any possible solutions? That would be very helpful. (screenshot)

    opened by GabrielZZZ 3
  • How to stabilize the MUNIT training process?

    I use my own dataset to train MUNIT (the task is similar to synthetic2cityscape), but the training process is very unstable. How can I make the training procedure more stable? Some of the charts look like the ones below. (screenshots) The yaml file is as below, modified from configs/projects/munit/summer2winter_hd/ampO1.yaml:

    pretrained_weight: 17gYCHgWD9xM_EFqid1S3b3MXBjIvElAI
    inference_args:
        # Translates images from domain A to B or from B to A.
        a2b: True
        # Samples the style code from the prior distribution or uses the style code
        # encoded from the input images in the other domain.
        random_style: True
    
    # How often do you want to log the training stats.
    logging_iter: 10
    # Number of training epochs.
    max_iter: 100000
    # Whether to benchmark speed or not.
    speed_benchmark: True
    
    image_display_iter: 500
    image_save_iter: 5000
    snapshot_save_iter: 5000
    trainer:
        type: imaginaire.trainers.munit
        model_average_config:
            enabled: True
        amp_config:
            enabled: True
        gan_mode: hinge
        perceptual_mode: vgg19
        perceptual_layers: 'relu_4_1'
        loss_weight:
            gan: 1
            image_recon: 10
            content_recon: 1
            style_recon: 1
            perceptual: 0
            cycle_recon: 10
            gp: 0
            consistency_reg: 0
        init:
            type: orthogonal
            gain: 1
    gen_opt:
        type: adam
        lr: 0.0001
        adam_beta1: 0.5
        adam_beta2: 0.999
        lr_policy:
            type: constant
    dis_opt:
        type: adam
        lr: 0.0004
        adam_beta1: 0.5
        adam_beta2: 0.999
        lr_policy:
            type: constant
    gen:
        type: imaginaire.generators.munit
        latent_dim: 8
        num_filters: 64
        num_filters_mlp: 256
        num_res_blocks: 4
        num_mlp_blocks: 2
        num_downsamples_style: 4
        num_downsamples_content: 3
        content_norm_type: instance
        style_norm_type: none
        decoder_norm_type: instance
        weight_norm_type: spectral
        pre_act: True
    dis:
        type: imaginaire.discriminators.munit
        patch_wise: True
        num_filters: 48
        max_num_filters: 1024
        num_layers: 5
        activation_norm_type: none
        weight_norm_type: spectral
    
    # Data options.
    data:
        # Name of this dataset.
        name: fusion2cityscape
        # Which dataloader to use?
        type: imaginaire.datasets.unpaired_images
        # How many data loading workers per GPU?
        num_workers: 8
        input_types:
            - images_a:
                # If not specified, is None by default.
                ext: png
                # If not specified, is None by default.
                num_channels: 3
                # If not specified, is None by default.
                normalize: True
            - images_b:
                # If not specified, is None by default.
                ext: png
                # If not specified, is None by default.
                num_channels: 3
                # If not specified, is None by default.
                normalize: True
    
        # Train dataset details.
        train:
            # Input LMDBs.
            roots:
                - /data/hdd2/zhangrui/dataset/fusion2cityscape_raw/train
            # Batch size per GPU.
            batch_size: 8
            # Data augmentations to be performed in given order.
            augmentations:
                # First resize all inputs to this size.
                resize_h_w: 480, 640 
                # Horizontal flip?
                horizontal_flip: True
                # Crop size.
                random_crop_h_w: 480, 640
    
        # Val dataset details.
        val:
            # Input LMDBs.
            roots:
                - /data/hdd2/zhangrui/dataset/fusion2cityscape_raw/test
            # Batch size per GPU.
            batch_size: 1
            # If resize_h_w is not given, then it is assumed to be same as crop_h_w.
            augmentations:
                center_crop_h_w: 480, 640
    
    test_data:
        # Name of this dataset.
        name: fusion2cityscape
        # Which dataloader to use?
        type: imaginaire.datasets.unpaired_images
        input_types:
            - images_a:
                  ext: png
                  num_channels: 3
                  normalize: True
            - images_b:
                  ext: png
                  num_channels: 3
                  normalize: True
    
        # Whether the images in the two domains are paired (aligned).
        paired: False
        # Test dataset details.
        test:
            is_lmdb: False
            roots:
                - /data/hdd2/zhangrui/dataset/fusion2cityscape_raw/test
            # Batch size per GPU.
            batch_size: 1
            # If resize_h_w is not given, then it is assumed to be same as crop_h_w.
            augmentations:
                resize_smallest_side: 1024
    
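A couple of small, hypothetical helpers illustrating how values in the config above might be consumed (Imaginaire's own dataloader and discriminator do this internally; `parse_h_w` and the channel-doubling convention are assumptions for illustration, not the library's actual API):

```python
def parse_h_w(value: str) -> tuple[int, int]:
    """Parse an 'H, W' size string such as '480, 640' into integers.

    resize_h_w, random_crop_h_w, and center_crop_h_w in the config
    above all use this comma-separated format.
    """
    h, w = (int(part.strip()) for part in value.split(","))
    return h, w

print(parse_h_w("480, 640"))  # (480, 640)

# Under the common convention of doubling the discriminator's channel
# count per layer, capped at max_num_filters, the dis settings above
# (num_filters: 48, max_num_filters: 1024, num_layers: 5) would give:
num_filters, max_num_filters, num_layers = 48, 1024, 5
widths = [min(num_filters * 2 ** i, max_num_filters) for i in range(num_layers)]
print(widths)  # [48, 96, 192, 384, 768]
```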
    opened by NYCXI 2
Owner: NVIDIA Research Projects