TRAINING FROM SCRATCH
Use load_from_local loader
NHEADS 4
NO WEIGHT SHARING!!!
/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp.py:20: LightningDeprecationWarning: The `pl.plugins.training_type.ddp.DDPPlugin` is deprecated in v1.6 and will be removed in v1.8. Use `pl.strategies.ddp.DDPStrategy` instead.
rank_zero_deprecation(
/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:287: LightningDeprecationWarning: Passing `Trainer(accelerator='ddp')` has been deprecated in v1.5 and will be removed in v1.7. Use `Trainer(strategy='ddp')` instead.
rank_zero_deprecation(
/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:307: LightningDeprecationWarning: Passing <pytorch_lightning.plugins.training_type.ddp.DDPPlugin object at 0x7fcac5539c40> `strategy` to the `plugins` flag in Trainer has been deprecated in v1.5 and will be removed in v1.7. Use `Trainer(strategy=<pytorch_lightning.plugins.training_type.ddp.DDPPlugin object at 0x7fcac5539c40>)` instead.
rank_zero_deprecation(
Using 16bit native Automatic Mixed Precision (AMP)
/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/utilities.py:91: PossibleUserWarning: `max_epochs` was not set. Setting it to 1000 epochs. To train without an epoch limit, set `max_epochs=-1`.
rank_zero_warn(
/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:151: LightningDeprecationWarning: Setting `Trainer(checkpoint_callback=<pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint object at 0x7fcac54c0c10>)` is deprecated in v1.5 and will be removed in v1.7. Please consider using `Trainer(enable_checkpointing=<pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint object at 0x7fcac54c0c10>)`.
rank_zero_deprecation(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
TRAINING FROM SCRATCH
Use load_from_local loader
NHEADS 4
NO WEIGHT SHARING!!!
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [4,5]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [4,5]
  | Name             | Type                  | Params
-----------------------------------------------------------
0 | bu_model         | AssociativeEmbedding_ | 28.6 M
1 | kp_embed_cnn     | Sequential            | 595 K
2 | person_embed_cnn | Sequential            | 595 K
3 | pos_embed        | PositionEmbeddingSine | 0
4 | group_model      | GroupingModel         | 2.6 M
-----------------------------------------------------------
32.4 M Trainable params
54.9 K Non-trainable params
32.5 M Total params
64.932 Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]
loading annotations into memory...
loading annotations into memory...
Done (t=0.42s)
creating index...
Done (t=0.43s)
creating index...
index created!
index created!
=> num_images: 5000
=> num_images: 5000
Sanity Checking DataLoader 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.34s/it]
/center-group/mmpose/mmpose/datasets/datasets/bottom_up/bottom_up_coco.py:264: RuntimeWarning: overflow encountered in half_scalars
area = (np.max(kpt[:, 0]) - np.min(kpt[:, 0])) * (
/anaconda3/envs/centergroup/lib/python3.8/site-packages/json_tricks/encoders.py:367: UserWarning: json-tricks: numpy scalar serialization is experimental and may work differently in future versions
warnings.warn('json-tricks: numpy scalar serialization is experimental and may work differently in future versions')
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *keypoints*
loading annotations into memory...
DONE (t=0.66s).
Accumulating evaluation results...
DONE (t=0.04s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.000
loading annotations into memory...
Done (t=9.15s)
creating index...
index created!
Done (t=9.16s)
creating index...
=> num_images: 64115
index created!
=> num_images: 64115
Epoch 0: 0%| | 0/34558 [00:00<?, ?it/s]
/anaconda3/envs/centergroup/lib/python3.8/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
/anaconda3/envs/centergroup/lib/python3.8/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
Epoch 0: 0%|▎ | 126/34558 [00:45<3:29:19, 2.74it/s, loss=nan, v_num=0-34]
Traceback (most recent call last):
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1237, in _run
results = self._run_stage()
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1324, in _run_stage
return self._run_train()
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1354, in _run_train
self.fit_loop.run()
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
batch_output = self.batch_loop.run(batch, batch_idx)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
result = self._run_optimization(
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
self.trainer._call_lightning_module_hook(
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1596, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/center-group/centergroup/models/centergroup.py", line 385, in optimizer_step
super().optimizer_step(
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1625, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 278, in optimizer_step
optimizer_output = super().optimizer_step(optimizer, opt_idx, closure, model, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/native_amp.py", line 85, in optimizer_step
closure_result = closure()
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
self._result = self.closure(*args, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
step_output = self._step_fn()
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1766, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 344, in training_step
return self.model(*args, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 82, in forward
output = self.module.training_step(*inputs, **kwargs)
File "/center-group/centergroup/models/centergroup.py", line 327, in training_step
losses = compute_group_loss(preds, person_batch['vis_target'], person_batch['loc_target'],
File "/center-group/centergroup/core/train_utils.py", line 178, in compute_group_loss
assert not torch.isnan(person_tp_loss).any() and not torch.isnan(losses['person_loss'])
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "tools/train.py", line 126, in <module>
main()
File "tools/train.py", line 123, in main
trainer.fit(model, datamodule, ckpt_path=ckpt_path)
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
self._call_and_handle_interrupt(
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 737, in _call_and_handle_interrupt
self.strategy.reconciliate_processes(traceback.format_exc())
File "/anaconda3/envs/centergroup/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 446, in reconciliate_processes
raise DeadlockDetectedException(f"DeadLock detected from rank: {self.global_rank} \n {trace}")
pytorch_lightning.utilities.exceptions.DeadlockDetectedException: DeadLock detected from rank: 0