Hi, I am trying to train on my COCO-style dataset with your scripts, but I don't know which bash script should be used for training. (Could you please briefly explain the function of each script?) I used "scripts/train/lambda/coco/train.sh" for training, but an error occurred. The command was:
cd /data_2/lutianhao/code/MIPNet/
CUDA_VISIBLE_DEVICES=4,5,6,7, python tools/lambda/train_lambda_real.py
--cfg experiments/coco/hrnet/w48_384x288_adam_lr1e-3.yaml
GPUS '(0,1,2,3,)'
OUTPUT_DIR 'Outputs/outputs/lambda/lambda_coco_real_waffle'
LOG_DIR 'Outputs/logs/lambda/lambda_coco_real_waffle'
TEST.MODEL_FILE 'models/pytorch/pose_coco/pose_hrnet_w48_384x288.pth'
DATASET.TRAIN_DATASET 'coco_lambda'
DATASET.TRAIN_SET 'train2017'
DATASET.TRAIN_IMAGE_DIR '/data_2/lutianhao/datasets/pose/coco2017/train2017'
DATASET.TRAIN_ANNOTATION_FILE '/data_2/lutianhao/datasets/pose/coco2017/annotations/person_keypoints_train2017.json'
DATASET.TRAIN_DATASET_TYPE 'coco_lambda'
DATASET.TEST_DATASET 'coco'
DATASET.TEST_SET 'val2017'
DATASET.TEST_IMAGE_DIR '/data_2/lutianhao/datasets/pose/coco2017/val2017'
DATASET.TEST_ANNOTATION_FILE '/data_2/lutianhao/datasets/pose/coco2017/annotations/person_keypoints_val2017.json'
DATASET.TEST_DATASET_TYPE 'coco'
TRAIN.LR 0.001
TRAIN.BEGIN_EPOCH 0
TRAIN.END_EPOCH 110
TRAIN.LR_STEP '(70, 100)'
TRAIN.BATCH_SIZE_PER_GPU 2
TEST.BATCH_SIZE_PER_GPU 1
TEST.USE_GT_BBOX True
EPOCH_EVAL_FREQ 1
PRINT_FREQ 100
MODEL.NAME 'pose_hrnet_se_lambda'
MODEL.SE_MODULES '[False, False, True, True]'
The error is:
GAMMA1: 0.99
GAMMA2: 0.0
LR: 0.001
LR_FACTOR: 0.1
LR_STEP: [70, 100]
MOMENTUM: 0.9
NESTEROV: False
OPTIMIZER: adam
RESUME: False
SHUFFLE: True
WD: 0.0001
WORKERS: 24
=> init weights from normal distribution
=> loading pretrained model models/pytorch/imagenet/hrnet_w48-8ef0771d.pth
Total Parameters: 63,746,081
Total Multiply Adds (For Convolution and Linear Layers only): 46.562052726745605 GFLOPs
Number of Layers
Conv2d : 293 layers BatchNorm2d : 292 layers ReLU : 271 layers Bottleneck : 4 layers BasicBlock : 104 layers Upsample : 28 layers HighResolutionModule : 8 layers AdaptiveAvgPool2d : 5 layers Linear : 20 layers Sigmoid : 10 layers BatchNorm1d : 5 layers SELambdaLayer : 5 layers SELambdaModule : 2 layers
=> loading model from models/pytorch/pose_coco/pose_hrnet_w48_384x288.pth
=> loading from latest_state_dict at models/pytorch/pose_coco/pose_hrnet_w48_384x288.pth
loading annotations into memory...
Done (t=31.87s)
creating index...
index created!
=> classes: ['background', 'person']
=> num_images: 118287
loading from cache from cache/coco_lambda/train2017/gt_db.pkl
done!
=> load 149813 samples
loading annotations into memory...
Done (t=4.04s)
creating index...
index created!
=> classes: ['background', 'person']
=> num_images: 5000
=> load 6352 samples
=> resuming optimizer from models/pytorch/pose_coco/pose_hrnet_w48_384x288.pth
=> updated lr schedule is [70, 100]
training on lambda
Epoch: [0][0/18727] Time 64.338s (64.338s) Speed 0.2 samples/s Data 10.114s (10.114s) Loss 0.00020 (0.00020) Accuracy 0.513 (0.513) model_grad 0.000568 (0.000568) DivLoss -0.00074 (-0.00074) PoseLoss 0.00020 (0.00020)
Traceback (most recent call last):
File "tools/lambda/train_lambda_real.py", line 280, in <module>
main()
File "tools/lambda/train_lambda_real.py", line 242, in main
final_output_dir, tb_log_dir, writer_dict, print_prefix='lambda')
File "/data_2/lutianhao/code/MIPNet/tools/lambda/../../lib/core/train.py", line 464, in train_lambda
suffix += '_[{}:{}]'.format(count, round(lambda_a[count + B].item(), 2))
IndexError: index 16 is out of bounds for dimension 0 with size 16
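
To make the numbers concrete, here is a small sketch of the index arithmetic as I read it from the log. It is not the actual code in lib/core/train.py. I am assuming that B is the total batch size (2 per GPU x 4 GPUs = 8, which matches 149813 samples / 18727 iterations) and that lambda_a holds 2 * B = 16 entries. Under that assumption, lambda_a[count + B] is only valid for count in 0..7, so the error means count reached 8. The helper name lambda_suffix and the bounds check are hypothetical, only there to illustrate a guarded version of line 464.

```python
def lambda_suffix(lambda_a, batch_size, num_items):
    """Build the '_[i:value]' suffix, stopping before an out-of-range index.

    Hypothetical helper: mirrors the formatting of line 464 in the traceback,
    but uses plain Python floats instead of tensors.
    """
    suffix = ''
    for count in range(num_items):
        idx = count + batch_size
        if idx >= len(lambda_a):
            # Without this guard, idx == len(lambda_a) raises IndexError.
            break
        suffix += '_[{}:{}]'.format(count, round(lambda_a[idx], 2))
    return suffix


# Assumed sizes taken from the command and the log above.
B = 2 * 4                          # TRAIN.BATCH_SIZE_PER_GPU * number of GPUs
lambda_a = [0.0] * B + [1.0] * B   # 16 entries, matching "size 16" in the error

# count = 8 gives index 8 + 8 = 16, which is one past the last valid index (15).
first_bad_index = B + B
```

If this assumption is right, the loop that increments count runs for more items than B. Is that expected with my settings?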