Thank you for open-sourcing this project!
I was trying to train semanticGAN on our own dataset (3 labels) by running:

```
python -m torch.distributed.launch --nproc_per_node=N_GPU --master_port=AVAILABLE_PORT train_seg_gan.py --img_dataset=IMG_DATASET_LOC --seg_dataset=SEG_DATASET_LOC --inception=INCEPTION_FILE
```

(Note: I have already updated `seg_dim` to 3 for our case in argparse, and reduced the `color_map` in dataset.py to only 3 colors.)
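For reference, my edits look roughly like this; this is a paraphrase of what I changed, not the exact code from the repo, and the RGB values are just the ones from our own label map:

```python
# train_seg_gan.py -- argparse: number of segmentation classes for our dataset
parser.add_argument('--seg_dim', type=int, default=3)

# dataset.py -- color map trimmed down to our 3 classes
color_map = {
    0: (0, 0, 0),      # background
    1: (255, 0, 0),    # class 1
    2: (0, 255, 0),    # class 2
}
```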
However, after the code initializes, I run into this error:

```
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
```

Here is the entire output:

```
/home/user/anaconda3/envs/semanticGAN/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
FutureWarning,
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Loading unlabel dataloader with size 5
Loading unlabel dataloader with size 5
Loading train dataloader with size 4
Loading train dataloader with size 4
Loading val dataloader with size 2
Loading val dataloader with size 2
Loading unlabel dataloader with size 5
Loading unlabel dataloader with size 5
Loading train dataloader with size 4
Loading train dataloader with size 4
Loading val dataloader with size 2
Loading val dataloader with size 2
[W reducer.cpp:1303] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[... the same warning is printed once by each of the other three ranks ...]
Traceback (most recent call last):
  File "train_seg_gan.py", line 737, in <module>
    g_optim, d_img_optim, d_seg_optim, g_ema, device, writer)
  File "train_seg_gan.py", line 386, in train
    fake_img, latents, mean_path_length
  File "train_seg_gan.py", line 111, in g_path_regularize
    outputs=(fake_img * noise).sum(), inputs=latents, create_graph=True
  File "/home/user/anaconda3/envs/semanticGAN/lib/python3.7/site-packages/torch/autograd/__init__.py", line 236, in grad
    inputs, allow_unused, accumulate_grad=False)
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
[... the same traceback is printed by each of the other three ranks ...]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 113892) of binary: /home/user/anaconda3/envs/semanticGAN/bin/python
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/semanticGAN/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/user/anaconda3/envs/semanticGAN/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/user/anaconda3/envs/semanticGAN/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/user/anaconda3/envs/semanticGAN/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/user/anaconda3/envs/semanticGAN/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/user/anaconda3/envs/semanticGAN/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/home/user/anaconda3/envs/semanticGAN/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/user/anaconda3/envs/semanticGAN/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train_seg_gan.py FAILED
------------------------------------------------------------
```
The error seems to be in this function:

```python
def g_path_regularize(fake_img, latents, mean_path_length, decay=0.01):
    noise = torch.randn_like(fake_img) / math.sqrt(
        fake_img.shape[2] * fake_img.shape[3]
    )
    grad, = autograd.grad(
        outputs=(fake_img * noise).sum(), inputs=latents, create_graph=True
    )
    path_lengths = torch.sqrt(grad.pow(2).sum(2).mean(1))

    path_mean = mean_path_length + decay * (path_lengths.mean() - mean_path_length)

    path_penalty = (path_lengths - path_mean).pow(2).mean()

    return path_penalty, path_mean.detach(), path_lengths
```
Specifically, it fails in the `autograd.grad` call (lines 110-111 of train_seg_gan.py), where the output is `(fake_img * noise).sum()` and the input is `latents`.
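For what it's worth, here is a minimal standalone snippet (my own toy example, not from the repo) that reproduces the same RuntimeError whenever one of the inputs to `autograd.grad` never participates in computing the output. This makes me suspect that in my run `latents` is somehow not connected to `fake_img` in the autograd graph:

```python
import torch
from torch import autograd

x = torch.randn(4, 8, requires_grad=True)
y = torch.randn(4, 8, requires_grad=True)

# y never participates in computing the output
out = (x * 2).sum()

# Raises: RuntimeError: One of the differentiated Tensors appears to not
# have been used in the graph. Set allow_unused=True if this is the
# desired behavior.
grads = autograd.grad(outputs=out, inputs=(x, y), create_graph=True)
```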
Please let me know if I'm making an error in how I execute the command, or if I need to make further changes to the code, given that we're trying to train on our own dataset.
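I realize I could silence the error by passing `allow_unused=True` as the message suggests, along the lines of the sketch below, but I suspect that would only mask the underlying problem, since `autograd.grad` then returns `None` for an unused input and the `path_lengths` computation would fail anyway:

```python
# Sketch only: this silences the RuntimeError, but grad comes back as None
# if latents truly is not part of the graph, so the following
# grad.pow(2) would then fail with an AttributeError.
grad, = autograd.grad(
    outputs=(fake_img * noise).sum(), inputs=latents,
    create_graph=True, allow_unused=True,
)
```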
Thank you for your time!