Hi there,
When I train the model on multiple GPUs by setting gpus=2 in pl.Trainer(), it throws an error:
TypeError: cannot pickle 'module' object
How can I solve this problem? Thanks!
...
trainer = pl.Trainer(
    deterministic=True,
    gpus=2,  # <---------
    checkpoint_callback=False,
    max_epochs=config.max_epoch,
    auto_lr_find=True,
    sync_batchnorm=True,
    # check_val_every_n_epoch=1,
    val_check_interval=0.25,
)
...
python 3.8.3
torch '1.7.1+cu110'
Ubuntu 18.04.5 LTS
Global seed set to 19961206
Data List: data/test_adc.txt
Song Size: 12
100%|██████████| 12/12 [00:01<00:00, 9.73it/s]
[W Context.cpp:69] Warning: torch.set_deterministic is in beta, and its design and functionality may change in the future. (function operator())
/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:849: UserWarning: You requested multiple GPUs but did not specify a backend, e.g. `Trainer(strategy="dp"|"ddp"|"ddp2")`. Setting `strategy="ddp_spawn"` for you.
rank_zero_warn(
/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:147: LightningDeprecationWarning: Setting `Trainer(checkpoint_callback=False)` is deprecated in v1.5 and will be removed in v1.7. Please consider using `Trainer(enable_checkpointing=False)`.
rank_zero_deprecation(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
Traceback (most recent call last):
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data4/chengfang/.vscode-server/extensions/ms-python.python-2022.0.1814523869/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
cli.main()
File "/data4/chengfang/.vscode-server/extensions/ms-python.python-2022.0.1814523869/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
run()
File "/data4/chengfang/.vscode-server/extensions/ms-python.python-2022.0.1814523869/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data4/chengfang/project/melodyExtraction/TONet/main.py", line 163, in <module>
train()
File "/data4/chengfang/project/melodyExtraction/TONet/main.py", line 97, in train
trainer.fit(model, train_dataloader, test_dataloaders)
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
self._dispatch()
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
self.training_type_plugin.start_training(self)
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 173, in start_training
self.spawn(self.new_process, trainer, self.mp_queue, return_result=False)
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 201, in spawn
mp.spawn(self._wrapped_function, args=(function, args, kwargs, return_queue), nprocs=self.num_processes)
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 148, in start_processes
process.start()
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
return Popen(process_obj)
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/data4/chengfang/.conda/envs/melody/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'module' object
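
The traceback ends in ForkingPickler, which suggests that something passed to the spawned processes (the model, a dataloader, or the trainer itself) holds a reference to a Python module. That is not confirmed for this code. Below is a minimal sketch of how such an object could be tracked down. `HoldsModule` and `find_unpicklable_attrs` are hypothetical names made up for illustration; they are not part of PyTorch Lightning or of this project.

```python
import pickle
import types


class HoldsModule:
    """Hypothetical stand-in for a model or dataset that stores a module as an attribute."""

    def __init__(self):
        self.lib = types        # a module object kept on the instance (assumed cause)
        self.hidden_size = 128  # an ordinary, picklable attribute


def find_unpicklable_attrs(obj):
    """Try to pickle each instance attribute and collect the ones that fail."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # Reports only the attribute that holds the module.
    print(find_unpicklable_attrs(HoldsModule()))
```

Running the same helper on the real model and dataset objects before calling trainer.fit() may show which attribute is involved. Only top-level instance attributes are checked, so a module nested deeper (for example inside a config object) would need the helper applied to that inner object as well.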