Hello, I made the following mistake during training, how should I solve it?
I added it in adet/data/builtin.py before training:
_PERDEFINED_SPLITS_YOUTUBEVIS_VIDEO = {
'youtubevis_train':
# ('youtubevis/train/', 'youtubevis/annotations/train.json'),
('/media/lin/file/VIS/datasets/youtube-vis2021/train/JPEGImages', '/media/lin/file/VIS/datasets/youtube-vis2021/train/instances.json'),
'youtubevis_valid':
# ('youtubevis/valid/', 'youtubevis/annotations/valid.json'),
('/media/lin/file/VIS/datasets/youtube-vis2021/valid/JPEGImages', '/media/lin/file/VIS/datasets/youtube-vis2021/valid/instances.json'),
'youtubevis_test':
('youtubevis/test/', 'youtubevis/annotations/test.json'),
}
metadata_youtubevis_video = {
'thing_classes': [
'airplane', 'bear', 'bird', 'boat', 'car',
'cat', 'cow', 'deer', 'dog', 'duck',
'earless_seal', 'elephant', 'fish', 'flying_disc', 'fox',
'frog', 'giant_panda', 'giraffe', 'horse', 'leopard',
'lizard', 'monkey', 'motorbike', 'mouse', 'parrot',
'person', 'rabbit', 'shark', 'skateboard', 'snake',
'snowboard', 'squirrel', 'surfboard', 'tennis_racket', 'tiger',
'train', 'truck', 'turtle', 'whale', 'zebra'
]
}
Then:
I'm detectron2 / data/datasets/builtin_meta. Registered in py VIS_CATEGORIES, and modified to them
def _get_coco_instances_meta():
thing_ids = [k["id"] for k in VIS_CATEGORIES if k["isthing"] == 1]
thing_colors = [k["color"] for k in VIS_CATEGORIES if k["isthing"] == 1]
assert len(thing_ids) == 40, len(thing_ids)
# Mapping from the incontiguous COCO category id to an id in [0, 79]
thing_dataset_id_to_contiguous_id = {k: i for i, k in enumerate(thing_ids)}
thing_classes = [k["name"] for k in VIS_CATEGORIES if k["isthing"] == 1]
ret = {
"thing_dataset_id_to_contiguous_id": thing_dataset_id_to_contiguous_id,
"thing_classes": thing_classes,
"thing_colors": thing_colors,
}
return ret
Then,change NUM_CLASSES in adet/config/defaults.py to 40
My final training order is:python tools/train_net.py --config configs/CrossVIS/R_50_1x.yaml MODEL.WEIGHTS CondInst_MS_R_50_1x.pth
Error is as follows
[05/16 20:27:26 adet.data.common]: Serializing 89750 elements to byte tensors and concatenating them all ...
[05/16 20:27:26 adet.data.common]: Serialized dataset takes 216.92 MiB
[05/16 20:27:26 adet.data.build]: Using training sampler TrainingSampler
[05/16 20:27:26 fvcore.common.checkpoint]: [Checkpointer] Loading from CondInst_MS_R_50_1x.pth ...
WARNING [05/16 20:27:26 fvcore.common.checkpoint]: Skip loading parameter 'proposal_generator.fcos_head.cls_logits.weight' to the model due to incompatible shapes: (80, 256, 3, 3) in the checkpoint but (40, 256, 3, 3) in the model! You might want to double check if this is expected.
WARNING [05/16 20:27:26 fvcore.common.checkpoint]: Skip loading parameter 'proposal_generator.fcos_head.cls_logits.bias' to the model due to incompatible shapes: (80,) in the checkpoint but (40,) in the model! You might want to double check if this is expected.
WARNING [05/16 20:27:26 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
cls.{bias, weight}
mask_head._iter
proposal_generator.fcos_head.cls_logits.{bias, weight}
proposal_generator.fcos_head.reid_pred.{bias, weight}
proposal_generator.fcos_head.reid_pred_bn.{bias, running_mean, running_var, weight}
proposal_generator.fcos_head.reid_tower.0.{bias, weight}
proposal_generator.fcos_head.reid_tower.1.{bias, weight}
proposal_generator.fcos_head.reid_tower.10.{bias, weight}
proposal_generator.fcos_head.reid_tower.3.{bias, weight}
proposal_generator.fcos_head.reid_tower.4.{bias, weight}
proposal_generator.fcos_head.reid_tower.6.{bias, weight}
proposal_generator.fcos_head.reid_tower.7.{bias, weight}
proposal_generator.fcos_head.reid_tower.9.{bias, weight}
[05/16 20:27:26 adet.trainer]: Starting training from iteration 0
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [32,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [33,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [34,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [35,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [36,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [37,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [38,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [39,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [40,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [41,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [42,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [43,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [44,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [45,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [46,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [47,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [48,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [49,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [50,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [51,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [52,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [53,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [54,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [0,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [1,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [2,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [3,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [4,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [5,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [6,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [7,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [8,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [9,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [10,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [11,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [12,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [13,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [14,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [15,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [16,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [17,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [18,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [19,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [20,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [21,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [22,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [23,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [24,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [25,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [26,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [27,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [28,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [29,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [30,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [31,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds"
failed.
Traceback (most recent call last):
File "tools/train_net.py", line 231, in
args=(args, ),
File "/home/lin/.conda/envs/crossvis/lib/python3.7/site-packages/detectron2/engine/launch.py", line 82, in launch
main_func(*args)
File "tools/train_net.py", line 219, in main
return trainer.train()
File "tools/train_net.py", line 97, in train
self.train_loop(self.start_iter, self.max_iter)
File "tools/train_net.py", line 87, in train_loop
self.run_step()
File "/home/lin/.conda/envs/crossvis/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 494, in run_step
self._trainer.run_step()
File "/home/lin/.conda/envs/crossvis/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 285, in run_step
losses.backward()
File "/home/lin/.conda/envs/crossvis/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/lin/.conda/envs/crossvis/lib/python3.7/site-packages/torch/autograd/init.py", line 147, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered