When I run the PMF code on the SensatUrban dataset, it fails with "UnboundLocalError: local variable 'mean_acc' referenced before assignment" and "UnboundLocalError: local variable 'lr' referenced before assignment". How can I solve this problem?
The complete error output is as follows:
/home/yczhou/anaconda3/envs/buct-bishe/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
FutureWarning,
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
| distributed init (rank 1): env://
| distributed init (rank 2): env://
| distributed init (rank 0): env://
>> Init a recoder at ../../../experiments/PMF-sensat/log_SensatUrban_PMFNet-resnet101_bs12-lr0.001_baseline-timestamp
loading data frame...
0it [00:00, ?it/s]
Using 0 data frame from train split
loading data frame...
0it [00:00, ?it/s]
Using 0 data frame from val split
Generate 0 samples from train split
Generate 0 samples from val split
loading data frame...
0it [00:00, ?it/s]
Using 0 data frame from train split
loading data frame...
0it [00:00, ?it/s]
Using 0 data frame from val split
Generate 0 samples from train split
Generate 0 samples from val split
focal_loss alpha: [ 0. 1. 1. 1. 2. 2.5 1. 3. 1. 1. 1. 1. 10. 2.5]
loading data frame...
0it [00:00, ?it/s]
Using 0 data frame from train split
loading data frame...
0it [00:00, ?it/s]
Using 0 data frame from val split
Generate 0 samples from train split
Generate 0 samples from val split
[IOU EVAL] IGNORE: tensor([0])
[IOU EVAL] INCLUDE: tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
[IOU EVAL] IGNORE: tensor([0])
[IOU EVAL] INCLUDE: tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
/home/yczhou/anaconda3/envs/buct-bishe/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
===init env success===
[IOU EVAL] IGNORE: tensor([0])
[IOU EVAL] INCLUDE: tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
[IOU EVAL] IGNORE: tensor([0])
[IOU EVAL] INCLUDE: tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
/home/yczhou/anaconda3/envs/buct-bishe/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
[IOU EVAL] IGNORE: ===init env success===
tensor([0])
[IOU EVAL] INCLUDE: tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
[IOU EVAL] IGNORE: tensor([0])
[IOU EVAL] INCLUDE: tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
/home/yczhou/anaconda3/envs/buct-bishe/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
===init env success===
Traceback (most recent call last):
File "main.py", line 148, in <module>
exp.run()
File "main.py", line 99, in run
self.trainer.run(epoch, mode="Train")
File "/home/yczhou/PMF-master/tasks/sensat_urban/pmf/trainer.py", line 528, in run
"Acc": mean_acc.item(),
UnboundLocalError: local variable 'mean_acc' referenced before assignment
Traceback (most recent call last):
File "main.py", line 148, in <module>
exp.run()
File "main.py", line 99, in run
self.trainer.run(epoch, mode="Train")
File "/home/yczhou/PMF-master/tasks/sensat_urban/pmf/trainer.py", line 528, in run
"Acc": mean_acc.item(),
UnboundLocalError: local variable 'mean_acc' referenced before assignment
Traceback (most recent call last):
File "main.py", line 148, in <module>
exp.run()
File "main.py", line 99, in run
self.trainer.run(epoch, mode="Train")
File "/home/yczhou/PMF-master/tasks/sensat_urban/pmf/trainer.py", line 448, in run
tag="{}_lr".format(mode), scalar_value=lr, global_step=epoch)
UnboundLocalError: local variable 'lr' referenced before assignment
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 48867) of binary: /home/yczhou/anaconda3/envs/buct-bishe/bin/python
Traceback (most recent call last):
File "/home/yczhou/anaconda3/envs/buct-bishe/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/yczhou/anaconda3/envs/buct-bishe/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/yczhou/anaconda3/envs/buct-bishe/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home/yczhou/anaconda3/envs/buct-bishe/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/yczhou/anaconda3/envs/buct-bishe/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/yczhou/anaconda3/envs/buct-bishe/lib/python3.7/site-packages/torch/distributed/run.py", line 718, in run
)(*cmd_args)
File "/home/yczhou/anaconda3/envs/buct-bishe/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/yczhou/anaconda3/envs/buct-bishe/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
main.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2022-04-15_18:58:42
host : zkti
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 48868)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2022-04-15_18:58:42
host : zkti
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 48869)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2022-04-15_18:58:42
host : zkti
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 48867)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
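From the log above, the dataloaders find no data at all ("Using 0 data frame from train split", "Generate 0 samples from train split"), so my guess is that `mean_acc` and `lr` are only assigned inside the per-batch loop in trainer.py, which never executes when the loader is empty. Below is a minimal sketch of that failure pattern; it is not the actual PMF trainer code, and all names other than `mean_acc` and `lr` are placeholders of mine:

# Minimal sketch (NOT the actual PMF trainer): variables assigned only
# inside the batch loop stay unbound when the dataloader yields zero batches.
def run_epoch(dataloader):
    for batch in dataloader:
        # placeholder values; in the real trainer these would come from
        # the optimizer and the accuracy evaluator
        lr = 0.001
        mean_acc = batch.float().mean()
    # with an empty dataloader the loop body never runs, so the lines
    # below raise UnboundLocalError, matching the tracebacks above
    print("lr:", lr)
    print("Acc:", mean_acc.item())

run_epoch([])  # stands in for a DataLoader built from 0 samples

If that is indeed the cause, the dataset path in the config or the SensatUrban preprocessing output is probably what I need to check, rather than trainer.py itself, but I am not sure what the expected data layout is.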