python main.py --dataset_dir --mode train --model_name HCN --dataset_name NTU-RGB-D-CV --num 01
I get "RuntimeError: cuda runtime error (710) : device-side assert triggered" on the first training batch of epoch 1.
PyTorch version: 1.4.0 (Python 3.5, Anaconda)
2020-02-15 23:45:01,865:INFO: seed: 0
2020-02-15 23:45:01,865:INFO: batch_size: 64
2020-02-15 23:45:01,866:INFO: scheduler_gamma3: 0.5
2020-02-15 23:45:01,866:INFO: lr_step: [100, 160, 200]
2020-02-15 23:45:01,866:INFO: gpu_id: 0
2020-02-15 23:45:01,866:INFO: lr_decay_type: exp
2020-02-15 23:45:01,867:INFO: patience: 20
2020-02-15 23:45:01,867:INFO: test_feeder_args: {'window_size': 32, 'normalization': False, 'random_valid_choose': False, 'random_shift': False, 'p_interval': [0.95], 'origin_transfer': 0, 'debug': False, 'data_path': None, 'random_move': False, 'crop_resize': True, 'label_path': None}
2020-02-15 23:45:01,867:INFO: model_version: HCN
2020-02-15 23:45:01,867:INFO: dataset_name: NTU-RGB-D-CV
2020-02-15 23:45:01,868:INFO: data_parallel: False
2020-02-15 23:45:01,868:INFO: optimizer: Adam
2020-02-15 23:45:01,868:INFO: restore_file: None
2020-02-15 23:45:01,868:INFO: model_args: {'window_size': 32, 'num_class': 60, 'num_person': 2, 'num_joint': 25, 'in_channel': 3, 'out_channel': 64}
2020-02-15 23:45:01,869:INFO: loss_args: {'type': 'CE'}
2020-02-15 23:45:01,869:INFO: scheduler_gamma: 0.1
2020-02-15 23:45:01,869:INFO: dataset_dir: /home/ashish/Documents/BTP/HCN-pytorch-master/feeder/data/
2020-02-15 23:45:01,869:INFO: start_epoch: 0
2020-02-15 23:45:01,869:INFO: num_epochs: 400
2020-02-15 23:45:01,870:INFO: weight_decay: 0.0001
2020-02-15 23:45:01,870:INFO: lr: 0.001
2020-02-15 23:45:01,870:INFO: num_workers: 4
2020-02-15 23:45:01,870:INFO: clip: 0.5
weight initial finished!
2020-02-15 23:45:03,664:INFO: HCN(
  (conv1): Sequential(
    (0): Conv2d(3, 64, kernel_size=(1, 1), stride=(1, 1))
    (1): ReLU()
  )
  (conv2): Conv2d(64, 32, kernel_size=(3, 1), stride=(1, 1), padding=(1, 0))
  (conv3): Sequential(
    (0): Conv2d(25, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv4): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): Dropout2d(p=0.5, inplace=False)
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv1m): Sequential(
    (0): Conv2d(3, 64, kernel_size=(1, 1), stride=(1, 1))
    (1): ReLU()
  )
  (conv2m): Conv2d(64, 32, kernel_size=(3, 1), stride=(1, 1), padding=(1, 0))
  (conv3m): Sequential(
    (0): Conv2d(25, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv4m): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): Dropout2d(p=0.5, inplace=False)
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv5): Sequential(
    (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Dropout2d(p=0.5, inplace=False)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv6): Sequential(
    (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Dropout2d(p=0.5, inplace=False)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc7): Sequential(
    (0): Linear(in_features=1024, out_features=512, bias=True)
    (1): ReLU()
    (2): Dropout2d(p=0.5, inplace=False)
  )
  (fc8): Linear(in_features=512, out_features=60, bias=True)
)
2020-02-15 23:45:03,665:INFO: Loading the datasets...
2020-02-15 23:45:03,945:INFO: - done.
2020-02-15 23:45:03,945:INFO: Starting training for 400 epoch(s)
2020-02-15 23:45:03,945:INFO: lr decay:exp
/home/ashish/anaconda3/lib/python3.5/site-packages/torch/optim/lr_scheduler.py:122: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
2020-02-15 23:45:03,946:INFO: Epoch 1/400
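The UserWarning printed just before this epoch line is unrelated to the crash, but it says the scheduler is stepped before the optimizer. A minimal sketch of the ordering the warning asks for, assuming "lr_decay_type: exp" corresponds to torch.optim.lr_scheduler.ExponentialLR (main.py may build its scheduler differently). The toy linear model, random data, and gamma=0.5 are hypothetical stand-ins for HCN, the NTU loader, and the real schedule settings:

```python
import warnings

import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins: a linear model and random data instead of HCN / NTU.
model = nn.Linear(8, 60)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0001)
# Assumption: "exp" decay maps to ExponentialLR; gamma=0.5 is illustrative only.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.5)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(64, 8)
labels = torch.randint(0, 60, (64,))  # valid class indices 0..59

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for epoch in range(3):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()   # update the weights first...
        scheduler.step()   # ...then advance the schedule, once per epoch

final_lr = optimizer.param_groups[0]["lr"]
# Collect any "lr_scheduler.step() before optimizer.step()" warnings.
order_warnings = [w for w in caught if "lr_scheduler.step()" in str(w.message)]
print(final_lr, len(order_warnings))
```

With this order no ordering warning is raised and the learning rate after three epochs is 0.001 * 0.5**3.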
0%| | 0/1 [00:00<?, ?it/s]/home/ashish/anaconda3/lib/python3.5/site-packages/torch/nn/functional.py:2416: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/ashish/anaconda3/lib/python3.5/site-packages/torch/nn/functional.py:2416: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
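The two deprecation warnings above come from a call to nn.functional.upsample somewhere in the data or model code (the log does not show which file). F.upsample forwards its arguments to F.interpolate, so the call can usually be renamed in place. A minimal sketch, assuming a skeleton clip laid out as (batch, channels, frames, joints) and a temporal resize to the window_size of 32 shown in the feeder args; the tensor shape and resize mode are hypothetical:

```python
import torch
import torch.nn.functional as F

# Hypothetical skeleton clip: 1 sample, 3 coordinates, 75 frames, 25 joints.
clip = torch.randn(1, 3, 75, 25)

# Deprecated form (emits the UserWarning seen in the log):
#   F.upsample(clip, size=(32, 25), mode="bilinear", align_corners=False)
# Same arguments passed to the non-deprecated function:
resized = F.interpolate(clip, size=(32, 25), mode="bilinear", align_corners=False)
print(resized.shape)  # torch.Size([1, 3, 32, 25])
```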
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [2,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [3,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [6,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [10,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=110 error=710 : device-side assert triggered
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 522, in <module>
    args.model_dir,logger, params.restore_file)
  File "main.py", line 319, in train_and_evaluate
    train_metrics,train_confusion_meter = train(model, optimizer, loss_fn, train_dataloader, metrics, params,logger)
  File "main.py", line 87, in train
    loss_bag = loss_fn(output_batch,labels_batch,current_epoch=params.current_epoch, params=params)
  File "/home/ashish/Documents/BTP/HCN-pytorch-master/model/HCN.py", line 151, in loss_fn
    CE = nn.CrossEntropyLoss()(outputs, labels)
  File "/home/ashish/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ashish/anaconda3/lib/python3.5/site-packages/torch/nn/modules/loss.py", line 916, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/home/ashish/anaconda3/lib/python3.5/site-packages/torch/nn/functional.py", line 2021, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/ashish/anaconda3/lib/python3.5/site-packages/torch/nn/functional.py", line 1838, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:110
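The assertion `t >= 0 && t < n_classes` in ClassNLLCriterion.cu means at least one target passed to nn.CrossEntropyLoss in loss_fn lies outside 0..n_classes-1; with num_class: 60 in model_args, the valid labels are 0 to 59. Because the check runs inside a CUDA kernel, the resulting message is generic, so validating the labels on the host first gives a readable error. Below is a minimal sketch in plain Python so it runs without a GPU; the helper names find_bad_labels, check_labels, and looks_one_based are hypothetical and not part of HCN-pytorch. A label file that stores NTU action classes as 1 to 60 instead of 0 to 59 would trigger exactly this assert, but that is an assumption the log does not confirm.

```python
from typing import List, Sequence


def find_bad_labels(labels: Sequence[int], num_class: int) -> List[int]:
    """Return the positions whose label is outside the valid range 0..num_class-1."""
    return [i for i, t in enumerate(labels) if not 0 <= int(t) < num_class]


def check_labels(labels: Sequence[int], num_class: int) -> None:
    """Raise a readable ValueError instead of a CUDA device-side assert."""
    bad = find_bad_labels(labels, num_class)
    if bad:
        values = sorted({int(labels[i]) for i in bad})
        raise ValueError(
            "{} of {} labels outside [0, {}): e.g. {}".format(
                len(bad), len(labels), num_class, values[:5]
            )
        )


def looks_one_based(labels: Sequence[int], num_class: int) -> bool:
    """Heuristic (assumption): labels spanning 1..num_class suggest 1-based classes."""
    values = [int(t) for t in labels]
    if not values:
        return False
    return min(values) >= 1 and max(values) == num_class


# Usage with hypothetical 1-based labels:
labels = [1, 5, 60, 12]
try:
    check_labels(labels, num_class=60)
except ValueError as err:
    print(err)  # 1 of 4 labels outside [0, 60): e.g. [60]
if looks_one_based(labels, 60):
    labels = [t - 1 for t in labels]  # shift to 0-based: [0, 4, 59, 11]
check_labels(labels, num_class=60)  # passes now
```

In the training loop this could be called as check_labels(labels_batch.tolist(), 60) just before loss_fn. Running once with the environment variable CUDA_LAUNCH_BLOCKING=1, or computing the loss on CPU, is another way to get an error that points at the offending call.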