Can you please help me interpret the following error? Is it a problem with the CUDA version? I am not very experienced, and I would like to understand it so that I can fix it and continue.
WARNING: NVIDIA binaries may not be bound with --writable
[0706 13:49:52 @voc.py:279] Register dataset ['VOC2007/instances_trainval', 'VOC2007/instances_test', 'VOC2012/instances_trainval']
[0706 13:49:52 @coco.py:271] Register dataset ['VOC2007/instances_trainval', 'VOC2007/instances_test', 'VOC2012/instances_trainval', 'train2017', 'val2017', 'coco_train2017', 'coco_val2017', 'coco_train2014', 'coco_val2014', 'coco_valminusminival2014', 'coco_minival2014', 'coco_val2017_100']
[0706 13:49:52 @coco.py:205] Register dataset ['VOC2007/instances_trainval', 'VOC2007/instances_test', 'VOC2012/instances_trainval', 'train2017', 'val2017', 'coco_train2017', 'coco_val2017', 'coco_train2014', 'coco_val2014', 'coco_valminusminival2014', 'coco_minival2014', 'coco_val2017_100', 'coco_train2017.1@1', 'coco_train2017.1@1-unlabeled', 'coco_train2017.1@2', 'coco_train2017.1@2-unlabeled', 'coco_train2017.1@5', 'coco_train2017.1@5-unlabeled', 'coco_train2017.1@10', 'coco_train2017.1@10-unlabeled', 'coco_train2017.1@20', 'coco_train2017.1@20-unlabeled', 'coco_train2017.1@30', 'coco_train2017.1@30-unlabeled', 'coco_train2017.1@40', 'coco_train2017.1@40-unlabeled', 'coco_train2017.1@50', 'coco_train2017.1@50-unlabeled', 'coco_train2017.2@1', 'coco_train2017.2@1-unlabeled', 'coco_train2017.2@2', 'coco_train2017.2@2-unlabeled', 'coco_train2017.2@5', 'coco_train2017.2@5-unlabeled', 'coco_train2017.2@10', 'coco_train2017.2@10-unlabeled', 'coco_train2017.2@20', 'coco_train2017.2@20-unlabeled', 'coco_train2017.2@30', 'coco_train2017.2@30-unlabeled', 'coco_train2017.2@40', 'coco_train2017.2@40-unlabeled', 'coco_train2017.2@50', 'coco_train2017.2@50-unlabeled', 'coco_train2017.3@1', 'coco_train2017.3@1-unlabeled', 'coco_train2017.3@2', 'coco_train2017.3@2-unlabeled', 'coco_train2017.3@5', 'coco_train2017.3@5-unlabeled', 'coco_train2017.3@10', 'coco_train2017.3@10-unlabeled', 'coco_train2017.3@20', 'coco_train2017.3@20-unlabeled', 'coco_train2017.3@30', 'coco_train2017.3@30-unlabeled', 'coco_train2017.3@40', 'coco_train2017.3@40-unlabeled', 'coco_train2017.3@50', 'coco_train2017.3@50-unlabeled', 'coco_train2017.4@1', 'coco_train2017.4@1-unlabeled', 'coco_train2017.4@2', 'coco_train2017.4@2-unlabeled', 'coco_train2017.4@5', 'coco_train2017.4@5-unlabeled', 'coco_train2017.4@10', 'coco_train2017.4@10-unlabeled', 'coco_train2017.4@20', 'coco_train2017.4@20-unlabeled', 'coco_train2017.4@30', 'coco_train2017.4@30-unlabeled', 'coco_train2017.4@40', 'coco_train2017.4@40-unlabeled', 'coco_train2017.4@50', 'coco_train2017.4@50-unlabeled', 'coco_train2017.5@1', 'coco_train2017.5@1-unlabeled', 'coco_train2017.5@2', 'coco_train2017.5@2-unlabeled', 'coco_train2017.5@5', 'coco_train2017.5@5-unlabeled', 'coco_train2017.5@10', 'coco_train2017.5@10-unlabeled', 'coco_train2017.5@20', 'coco_train2017.5@20-unlabeled', 'coco_train2017.5@30', 'coco_train2017.5@30-unlabeled', 'coco_train2017.5@40', 'coco_train2017.5@40-unlabeled', 'coco_train2017.5@50', 'coco_train2017.5@50-unlabeled', 'coco_train2017.0@100-extra', 'coco_train2017.0@100-extra-unlabeled', 'coco_unlabeled2017']
[0706 13:49:52 @coco.py:260] Register dataset ['VOC2007/instances_trainval', 'VOC2007/instances_test', 'VOC2012/instances_trainval', 'train2017', 'val2017', 'coco_train2017', 'coco_val2017', 'coco_train2014', 'coco_val2014', 'coco_valminusminival2014', 'coco_minival2014', 'coco_val2017_100', 'coco_train2017.1@1', 'coco_train2017.1@1-unlabeled', 'coco_train2017.1@2', 'coco_train2017.1@2-unlabeled', 'coco_train2017.1@5', 'coco_train2017.1@5-unlabeled', 'coco_train2017.1@10', 'coco_train2017.1@10-unlabeled', 'coco_train2017.1@20', 'coco_train2017.1@20-unlabeled', 'coco_train2017.1@30', 'coco_train2017.1@30-unlabeled', 'coco_train2017.1@40', 'coco_train2017.1@40-unlabeled', 'coco_train2017.1@50', 'coco_train2017.1@50-unlabeled', 'coco_train2017.2@1', 'coco_train2017.2@1-unlabeled', 'coco_train2017.2@2', 'coco_train2017.2@2-unlabeled', 'coco_train2017.2@5', 'coco_train2017.2@5-unlabeled', 'coco_train2017.2@10', 'coco_train2017.2@10-unlabeled', 'coco_train2017.2@20', 'coco_train2017.2@20-unlabeled', 'coco_train2017.2@30', 'coco_train2017.2@30-unlabeled', 'coco_train2017.2@40', 'coco_train2017.2@40-unlabeled', 'coco_train2017.2@50', 'coco_train2017.2@50-unlabeled', 'coco_train2017.3@1', 'coco_train2017.3@1-unlabeled', 'coco_train2017.3@2', 'coco_train2017.3@2-unlabeled', 'coco_train2017.3@5', 'coco_train2017.3@5-unlabeled', 'coco_train2017.3@10', 'coco_train2017.3@10-unlabeled', 'coco_train2017.3@20', 'coco_train2017.3@20-unlabeled', 'coco_train2017.3@30', 'coco_train2017.3@30-unlabeled', 'coco_train2017.3@40', 'coco_train2017.3@40-unlabeled', 'coco_train2017.3@50', 'coco_train2017.3@50-unlabeled', 'coco_train2017.4@1', 'coco_train2017.4@1-unlabeled', 'coco_train2017.4@2', 'coco_train2017.4@2-unlabeled', 'coco_train2017.4@5', 'coco_train2017.4@5-unlabeled', 'coco_train2017.4@10', 'coco_train2017.4@10-unlabeled', 'coco_train2017.4@20', 'coco_train2017.4@20-unlabeled', 'coco_train2017.4@30', 'coco_train2017.4@30-unlabeled', 'coco_train2017.4@40', 'coco_train2017.4@40-unlabeled', 'coco_train2017.4@50', 'coco_train2017.4@50-unlabeled', 'coco_train2017.5@1', 'coco_train2017.5@1-unlabeled', 'coco_train2017.5@2', 'coco_train2017.5@2-unlabeled', 'coco_train2017.5@5', 'coco_train2017.5@5-unlabeled', 'coco_train2017.5@10', 'coco_train2017.5@10-unlabeled', 'coco_train2017.5@20', 'coco_train2017.5@20-unlabeled', 'coco_train2017.5@30', 'coco_train2017.5@30-unlabeled', 'coco_train2017.5@40', 'coco_train2017.5@40-unlabeled', 'coco_train2017.5@50', 'coco_train2017.5@50-unlabeled', 'coco_train2017.0@100-extra', 'coco_train2017.0@100-extra-unlabeled', 'coco_unlabeled2017', 'coco_unlabeledtrainval20class']
[0706 13:49:52 @logger.py:138] Directory '/home/vlamp/Documents/STAC/RESULTS' backuped to '/home/vlamp/Documents/STAC/RESULTS0706-134952'
[0706 13:49:52 @logger.py:92] Argv: /home/vlamp/Documents/STAC/detection/train_stg1_bdd.py --logdir /home/vlamp/Documents/STAC/RESULTS/ --simple_path --config BACKBONE.WEIGHTS=/home/vlamp/Documents/STAC/DATA_STAC/coco/ImageNet-R50-AlignPadding.npz DATA.BASEDIR=/home/vlamp/Documents/STAC/DATA_STAC/coco MODE_MASK=False FRCNN.BATCH_PER_IM=64 PREPROC.TRAIN_SHORT_EDGE_SIZE=[500,800] TRAIN.EVAL_PERIOD=20 TRAIN.AUGTYPE_LAB=default
[0706 13:49:54 @train_stg1_bdd.py:87] Environment Information:
sys.platform linux
Python 3.6.9 (default, Apr 18 2020, 01:56:04) [GCC 8.4.0]
Tensorpack v0.10.1-9-g9c1b1b7b-dirty
Numpy 1.16.4
TensorFlow 1.14.0/v1.14.0-rc1-22-gaf24dc91b5
TF Compiler Version 4.8.5
TF CUDA support True
TF MKL support False
TF XLA support False
Nvidia Driver /.singularity.d/libs/libnvidia-ml.so
CUDA /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudart.so.10.1.243
CUDNN /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.4
NCCL
CUDA_VISIBLE_DEVICES 0,1
GPU 0,1 Tesla T4
Free RAM 369.15/376.54 GB
CPU Count 40
cv2 4.2.0
msgpack 1.0.0
python-prctl False
list(_C.DATA.TRAIN) = ['train2017']
list(_C.DATA.VAL) = ('val2017',)
datasets = ['train2017', 'val2017']
_C.DATA.CLASS_NAMES = ['BG', 'car', 'pedestrian', 'big vehicle', 'bicycle', 'motorcycle']
[0706 13:49:54 @config.py:352] Config: ------------------------------------------
{'BACKBONE': {'FREEZE_AFFINE': False,
'FREEZE_AT': 2,
'NORM': 'FreezeBN',
'RESNET_NUM_BLOCKS': [3, 4, 6, 3],
'STRIDE_1X1': False,
'TF_PAD_MODE': False,
'WEIGHTS': '/home/vlamp/Documents/STAC/DATA_STAC/coco/ImageNet-R50-AlignPadding.npz'},
'CASCADE': {'BBOX_REG_WEIGHTS': [[10.0, 10.0, 5.0, 5.0], [20.0, 20.0, 10.0, 10.0],
[30.0, 30.0, 15.0, 15.0]],
'IOUS': [0.5, 0.6, 0.7]},
'DATA': {'ABSOLUTE_COORD': True,
'BASEDIR': '/home/vlamp/Documents/STAC/DATA_STAC/coco',
'CLASS_NAMES': ['BG', 'car', 'pedestrian', 'big vehicle', 'bicycle', 'motorcycle'],
'NUM_CATEGORY': 5,
'NUM_WORKERS': 24,
'TRAIN': ('train2017',),
'UNLABEL': ('',),
'VAL': ('val2017',)},
'EVAL': {'PSEUDO_INFERENCE': False},
'FPN': {'ANCHOR_SIZES': (32, 64, 128, 256, 512),
'ANCHOR_STRIDES': (4, 8, 16, 32, 64),
'CASCADE': False,
'FRCNN_CONV_HEAD_DIM': 256,
'FRCNN_FC_HEAD_DIM': 1024,
'FRCNN_HEAD_FUNC': 'fastrcnn_2fc_head',
'MRCNN_HEAD_FUNC': 'maskrcnn_up4conv_head',
'NORM': 'None',
'NUM_CHANNEL': 256,
'PROPOSAL_MODE': 'Level',
'RESOLUTION_REQUIREMENT': 32},
'FRCNN': {'BATCH_PER_IM': 64,
'BBOX_REG_WEIGHTS': [10.0, 10.0, 5.0, 5.0],
'FG_RATIO': 0.25,
'FG_THRESH': 0.5},
'MODE_FPN': True,
'MODE_MASK': False,
'MRCNN': {'ACCURATE_PASTE': True, 'HEAD_DIM': 256},
'PREPROC': {'MAX_SIZE': 1344.0,
'PIXEL_MEAN': [123.675, 116.28, 103.53],
'PIXEL_STD': [58.395, 57.12, 57.375],
'TEST_SHORT_EDGE_SIZE': 800,
'TRAIN_SHORT_EDGE_SIZE': [500, 800]},
'RPN': {'ANCHOR_RATIOS': (0.5, 1.0, 2.0),
'ANCHOR_SIZES': (32, 64, 128, 256, 512),
'ANCHOR_STRIDE': 16,
'BATCH_PER_IM': 256,
'CROWD_OVERLAP_THRESH': 9.99,
'FG_RATIO': 0.5,
'HEAD_DIM': 1024,
'MIN_SIZE': 0,
'NEGATIVE_ANCHOR_THRESH': 0.3,
'NUM_ANCHOR': 15,
'POSITIVE_ANCHOR_THRESH': 0.7,
'PROPOSAL_NMS_THRESH': 0.7,
'TEST_PER_LEVEL_NMS_TOPK': 1000,
'TEST_POST_NMS_TOPK': 1000,
'TEST_PRE_NMS_TOPK': 6000,
'TRAIN_PER_LEVEL_NMS_TOPK': 2000,
'TRAIN_POST_NMS_TOPK': 2000,
'TRAIN_PRE_NMS_TOPK': 12000},
'TEST': {'FRCNN_NMS_THRESH': 0.5,
'RESULTS_PER_IM': 100,
'RESULT_SCORE_THRESH': 0.05,
'RESULT_SCORE_THRESH_VIS': 0.5},
'TRAIN': {'AUGTYPE': 'strong',
'AUGTYPE_LAB': 'default',
'BASE_LR': 0.01,
'CHECKPOINT_PERIOD': 20,
'CONFIDENCE': 0.9,
'EVAL_PERIOD': 20,
'GAMMA': 0.1,
'LR_SCHEDULE': [120000, 160000, 180000],
'NO_PRN_LOSS': False,
'NUM_GPUS': 2,
'STAGE': 1,
'STARTING_EPOCH': 1,
'STEPS_PER_EPOCH': 500,
'WARMUP': 1000,
'WARMUP_INIT_LR': 0.0033000000000000004,
'WEIGHT_DECAY': 0.0001,
'WU': 2.0},
'TRAINER': 'replicated'}
[0706 13:49:54 @train_stg1_bdd.py:106] Warm Up Schedule (steps, value): [(0, 0.0033000000000000004), (1000, 0.01)]
[0706 13:49:54 @train_stg1_bdd.py:107] LR Schedule (epochs, value): [(2, 0.01), (960.0, 0.001), (1280.0, 0.00010000000000000002)]
loading annotations into memory...
Done (t=5.18s)
creating index...
index created!
[0706 13:49:59 @coco.py:60] Instances loaded from /home/vlamp/Documents/STAC/DATA_STAC/coco/annotations/instances_train2017.json.
100%|##########| 69403/69403 [00:03<00:00, 20915.84it/s]
[0706 13:50:03 @timer.py:45] Load annotations for instances_train2017.json finished, time:3.3659 sec.
[0706 13:50:05 @data.py:79] Ground-Truth category distribution:
| class | #box | class | #box | class | #box |
|:-------:|:-------|:----------:|:-------|:-----------:|:-------|
| car | 713210 | pedestrian | 91349 | big vehicle | 41643 |
| bicycle | 7210 | motorcycle | 3002 | | |
| total | 856414 | | | | |
[0706 13:50:05 @data.py:416] Filtered 0 images which contain no non-crowd groudtruth boxes. Total #images for training: 69403
[0706 13:50:05 @augmentation.py:171] ----------------------------------------------------------------------------------------------------
[0706 13:50:05 @augmentation.py:172] Augmentation type default: []
[0706 13:50:05 @augmentation.py:173] ----------------------------------------------------------------------------------------------------
[0706 13:50:05 @data.py:107] Use affine-enabled TrainingDataPreprocessor_aug
[0706 13:50:05 @train_stg1_bdd.py:112] Total passes of the training set is: 20.748
[0706 13:50:05 @sessinit.py:294] Loading dictionary from /home/vlamp/Documents/STAC/DATA_STAC/coco/ImageNet-R50-AlignPadding.npz ...
[0706 13:50:06 @training.py:48] [DataParallel] Training a model of 2 towers.
[0706 13:50:06 @interface.py:41] Automatically applying StagingInput on the DataFlow.
[0706 13:50:06 @input_source.py:221] Setting up the queue 'QueueInput/input_queue' for CPU prefetching ...
[0706 13:50:06 @training.py:108] Building graph for training tower 0 on device /gpu:0 ...
[0706 13:50:06 @argtools.py:138] WRN Some BatchNorm layer uses moving_mean/moving_variance in training.
[0706 13:50:06 @registry.py:90] 'conv0': [1, 3, ?, ?] --> [1, 64, ?, ?]
[0706 13:50:06 @registry.py:90] 'pool0': [1, 64, ?, ?] --> [1, 64, ?, ?]
[0706 13:50:06 @registry.py:90] 'group0/block0/conv1': [1, 64, ?, ?] --> [1, 64, ?, ?]
[0706 13:50:06 @registry.py:90] 'group0/block0/conv2': [1, 64, ?, ?] --> [1, 64, ?, ?]
[0706 13:50:06 @registry.py:90] 'group0/block0/conv3': [1, 64, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:06 @registry.py:90] 'group0/block0/convshortcut': [1, 64, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:06 @registry.py:90] 'group0/block1/conv1': [1, 256, ?, ?] --> [1, 64, ?, ?]
[0706 13:50:06 @registry.py:90] 'group0/block1/conv2': [1, 64, ?, ?] --> [1, 64, ?, ?]
[0706 13:50:06 @registry.py:90] 'group0/block1/conv3': [1, 64, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:06 @registry.py:90] 'group0/block2/conv1': [1, 256, ?, ?] --> [1, 64, ?, ?]
[0706 13:50:06 @registry.py:90] 'group0/block2/conv2': [1, 64, ?, ?] --> [1, 64, ?, ?]
[0706 13:50:06 @registry.py:90] 'group0/block2/conv3': [1, 64, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:06 @registry.py:90] 'group1/block0/conv1': [1, 256, ?, ?] --> [1, 128, ?, ?]
[0706 13:50:07 @registry.py:90] 'group1/block0/conv2': [1, 128, ?, ?] --> [1, 128, ?, ?]
[0706 13:50:07 @registry.py:90] 'group1/block0/conv3': [1, 128, ?, ?] --> [1, 512, ?, ?]
[0706 13:50:07 @registry.py:90] 'group1/block0/convshortcut': [1, 256, ?, ?] --> [1, 512, ?, ?]
[0706 13:50:07 @registry.py:90] 'group1/block1/conv1': [1, 512, ?, ?] --> [1, 128, ?, ?]
[0706 13:50:07 @registry.py:90] 'group1/block1/conv2': [1, 128, ?, ?] --> [1, 128, ?, ?]
[0706 13:50:07 @registry.py:90] 'group1/block1/conv3': [1, 128, ?, ?] --> [1, 512, ?, ?]
[0706 13:50:07 @registry.py:90] 'group1/block2/conv1': [1, 512, ?, ?] --> [1, 128, ?, ?]
[0706 13:50:07 @registry.py:90] 'group1/block2/conv2': [1, 128, ?, ?] --> [1, 128, ?, ?]
[0706 13:50:07 @registry.py:90] 'group1/block2/conv3': [1, 128, ?, ?] --> [1, 512, ?, ?]
[0706 13:50:07 @registry.py:90] 'group1/block3/conv1': [1, 512, ?, ?] --> [1, 128, ?, ?]
[0706 13:50:07 @registry.py:90] 'group1/block3/conv2': [1, 128, ?, ?] --> [1, 128, ?, ?]
[0706 13:50:07 @registry.py:90] 'group1/block3/conv3': [1, 128, ?, ?] --> [1, 512, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block0/conv1': [1, 512, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block0/conv2': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block0/conv3': [1, 256, ?, ?] --> [1, 1024, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block0/convshortcut': [1, 512, ?, ?] --> [1, 1024, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block1/conv1': [1, 1024, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block1/conv2': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block1/conv3': [1, 256, ?, ?] --> [1, 1024, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block2/conv1': [1, 1024, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block2/conv2': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block2/conv3': [1, 256, ?, ?] --> [1, 1024, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block3/conv1': [1, 1024, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block3/conv2': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block3/conv3': [1, 256, ?, ?] --> [1, 1024, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block4/conv1': [1, 1024, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block4/conv2': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block4/conv3': [1, 256, ?, ?] --> [1, 1024, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block5/conv1': [1, 1024, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block5/conv2': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'group2/block5/conv3': [1, 256, ?, ?] --> [1, 1024, ?, ?]
[0706 13:50:07 @registry.py:90] 'group3/block0/conv1': [1, 1024, ?, ?] --> [1, 512, ?, ?]
[0706 13:50:07 @registry.py:90] 'group3/block0/conv2': [1, 512, ?, ?] --> [1, 512, ?, ?]
[0706 13:50:07 @registry.py:90] 'group3/block0/conv3': [1, 512, ?, ?] --> [1, 2048, ?, ?]
[0706 13:50:07 @registry.py:90] 'group3/block0/convshortcut': [1, 1024, ?, ?] --> [1, 2048, ?, ?]
[0706 13:50:07 @registry.py:90] 'group3/block1/conv1': [1, 2048, ?, ?] --> [1, 512, ?, ?]
[0706 13:50:07 @registry.py:90] 'group3/block1/conv2': [1, 512, ?, ?] --> [1, 512, ?, ?]
[0706 13:50:07 @registry.py:90] 'group3/block1/conv3': [1, 512, ?, ?] --> [1, 2048, ?, ?]
[0706 13:50:07 @registry.py:90] 'group3/block2/conv1': [1, 2048, ?, ?] --> [1, 512, ?, ?]
[0706 13:50:07 @registry.py:90] 'group3/block2/conv2': [1, 512, ?, ?] --> [1, 512, ?, ?]
[0706 13:50:07 @registry.py:90] 'group3/block2/conv3': [1, 512, ?, ?] --> [1, 2048, ?, ?]
[0706 13:50:07 @registry.py:80] 'fpn' input: [1, 256, ?, ?], [1, 512, ?, ?], [1, 1024, ?, ?], [1, 2048, ?, ?]
[0706 13:50:07 @registry.py:90] 'fpn/lateral_1x1_c2': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'fpn/lateral_1x1_c3': [1, 512, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'fpn/lateral_1x1_c4': [1, 1024, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'fpn/lateral_1x1_c5': [1, 2048, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'fpn/upsample_lat5': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:07 @registry.py:90] 'fpn/upsample_lat4': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:08 @registry.py:90] 'fpn/upsample_lat3': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:08 @registry.py:90] 'fpn/posthoc_3x3_p2': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:08 @registry.py:90] 'fpn/posthoc_3x3_p3': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:08 @registry.py:90] 'fpn/posthoc_3x3_p4': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:08 @registry.py:90] 'fpn/posthoc_3x3_p5': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:08 @registry.py:90] 'fpn/maxpool_p6': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:08 @registry.py:93] 'fpn' output: [1, 256, ?, ?], [1, 256, ?, ?], [1, 256, ?, ?], [1, 256, ?, ?], [1, 256, ?, ?]
[0706 13:50:08 @registry.py:80] 'rpn' input: [1, 256, ?, ?]
[0706 13:50:08 @registry.py:90] 'rpn/conv0': [1, 256, ?, ?] --> [1, 256, ?, ?]
[0706 13:50:08 @registry.py:90] 'rpn/class': [1, 256, ?, ?] --> [1, 3, ?, ?]
[0706 13:50:08 @registry.py:90] 'rpn/box': [1, 256, ?, ?] --> [1, 12, ?, ?]
[0706 13:50:08 @registry.py:93] 'rpn' output: [?, ?, 3], [?, ?, 3, 4]
[0706 13:50:09 @registry.py:80] 'fastrcnn' input: [?, 256, 7, 7]
[0706 13:50:10 @registry.py:90] 'fastrcnn/fc6': [?, 256, 7, 7] --> [?, 1024]
[0706 13:50:10 @registry.py:90] 'fastrcnn/fc7': [?, 1024] --> [?, 1024]
[0706 13:50:10 @registry.py:93] 'fastrcnn' output: [?, 1024]
[0706 13:50:10 @registry.py:80] 'fastrcnn/outputs' input: [?, 1024]
[0706 13:50:10 @registry.py:90] 'fastrcnn/outputs/class': [?, 1024] --> [?, 6]
[0706 13:50:10 @registry.py:90] 'fastrcnn/outputs/box': [?, 1024] --> [?, 24]
[0706 13:50:10 @registry.py:93] 'fastrcnn/outputs' output: [?, 6], [?, 6, 4]
[0706 13:50:10 @regularize.py:97] regularize_cost() found 57 variables to regularize.
[0706 13:50:10 @regularize.py:21] The following tensors will be regularized: group1/block0/conv1/W:0, group1/block0/conv2/W:0, group1/block0/conv3/W:0, group1/block0/convshortcut/W:0, group1/block1/conv1/W:0, group1/block1/conv2/W:0, group1/block1/conv3/W:0, group1/block2/conv1/W:0, group1/block2/conv2/W:0, group1/block2/conv3/W:0, group1/block3/conv1/W:0, group1/block3/conv2/W:0, group1/block3/conv3/W:0, group2/block0/conv1/W:0, group2/block0/conv2/W:0, group2/block0/conv3/W:0, group2/block0/convshortcut/W:0, group2/block1/conv1/W:0, group2/block1/conv2/W:0, group2/block1/conv3/W:0, group2/block2/conv1/W:0, group2/block2/conv2/W:0, group2/block2/conv3/W:0, group2/block3/conv1/W:0, group2/block3/conv2/W:0, group2/block3/conv3/W:0, group2/block4/conv1/W:0, group2/block4/conv2/W:0, group2/block4/conv3/W:0, group2/block5/conv1/W:0, group2/block5/conv2/W:0, group2/block5/conv3/W:0, group3/block0/conv1/W:0, group3/block0/conv2/W:0, group3/block0/conv3/W:0, group3/block0/convshortcut/W:0, group3/block1/conv1/W:0, group3/block1/conv2/W:0, group3/block1/conv3/W:0, group3/block2/conv1/W:0, group3/block2/conv2/W:0, group3/block2/conv3/W:0, fpn/lateral_1x1_c2/W:0, fpn/lateral_1x1_c3/W:0, fpn/lateral_1x1_c4/W:0, fpn/lateral_1x1_c5/W:0, fpn/posthoc_3x3_p2/W:0, fpn/posthoc_3x3_p3/W:0, fpn/posthoc_3x3_p4/W:0, fpn/posthoc_3x3_p5/W:0, rpn/conv0/W:0, rpn/class/W:0, rpn/box/W:0, fastrcnn/fc6/W:0, fastrcnn/fc7/W:0, fastrcnn/outputs/class/W:0, fastrcnn/outputs/box/W:0
[0706 13:50:12 @training.py:108] Building graph for training tower 1 on device /gpu:1 ...
[0706 13:50:14 @regularize.py:97] regularize_cost() found 57 variables to regularize.
[0706 13:50:16 @collection.py:152] Size of these collections were changed in tower1: (tf.GraphKeys.MODEL_VARIABLES: 161->194)
[0706 13:50:16 @collection.py:165] These collections were modified but restored in tower1: (tf.GraphKeys.SUMMARIES: 76->77)
[0706 13:50:20 @training.py:350] 'sync_variables_from_main_tower' includes 607 operations.
[0706 13:50:20 @model_utils.py:67] List of Trainable Variables:
name shape #elements
group1/block0/conv1/W [1, 1, 256, 128] 32768
group1/block0/conv1/bn/gamma [128] 128
group1/block0/conv1/bn/beta [128] 128
group1/block0/conv2/W [3, 3, 128, 128] 147456
group1/block0/conv2/bn/gamma [128] 128
group1/block0/conv2/bn/beta [128] 128
group1/block0/conv3/W [1, 1, 128, 512] 65536
group1/block0/conv3/bn/gamma [512] 512
group1/block0/conv3/bn/beta [512] 512
group1/block0/convshortcut/W [1, 1, 256, 512] 131072
group1/block0/convshortcut/bn/gamma [512] 512
group1/block0/convshortcut/bn/beta [512] 512
group1/block1/conv1/W [1, 1, 512, 128] 65536
group1/block1/conv1/bn/gamma [128] 128
group1/block1/conv1/bn/beta [128] 128
group1/block1/conv2/W [3, 3, 128, 128] 147456
group1/block1/conv2/bn/gamma [128] 128
group1/block1/conv2/bn/beta [128] 128
group1/block1/conv3/W [1, 1, 128, 512] 65536
group1/block1/conv3/bn/gamma [512] 512
group1/block1/conv3/bn/beta [512] 512
group1/block2/conv1/W [1, 1, 512, 128] 65536
group1/block2/conv1/bn/gamma [128] 128
group1/block2/conv1/bn/beta [128] 128
group1/block2/conv2/W [3, 3, 128, 128] 147456
group1/block2/conv2/bn/gamma [128] 128
group1/block2/conv2/bn/beta [128] 128
group1/block2/conv3/W [1, 1, 128, 512] 65536
group1/block2/conv3/bn/gamma [512] 512
group1/block2/conv3/bn/beta [512] 512
group1/block3/conv1/W [1, 1, 512, 128] 65536
group1/block3/conv1/bn/gamma [128] 128
group1/block3/conv1/bn/beta [128] 128
group1/block3/conv2/W [3, 3, 128, 128] 147456
group1/block3/conv2/bn/gamma [128] 128
group1/block3/conv2/bn/beta [128] 128
group1/block3/conv3/W [1, 1, 128, 512] 65536
group1/block3/conv3/bn/gamma [512] 512
group1/block3/conv3/bn/beta [512] 512
group2/block0/conv1/W [1, 1, 512, 256] 131072
group2/block0/conv1/bn/gamma [256] 256
group2/block0/conv1/bn/beta [256] 256
group2/block0/conv2/W [3, 3, 256, 256] 589824
group2/block0/conv2/bn/gamma [256] 256
group2/block0/conv2/bn/beta [256] 256
group2/block0/conv3/W [1, 1, 256, 1024] 262144
group2/block0/conv3/bn/gamma [1024] 1024
group2/block0/conv3/bn/beta [1024] 1024
group2/block0/convshortcut/W [1, 1, 512, 1024] 524288
group2/block0/convshortcut/bn/gamma [1024] 1024
group2/block0/convshortcut/bn/beta [1024] 1024
group2/block1/conv1/W [1, 1, 1024, 256] 262144
group2/block1/conv1/bn/gamma [256] 256
group2/block1/conv1/bn/beta [256] 256
group2/block1/conv2/W [3, 3, 256, 256] 589824
group2/block1/conv2/bn/gamma [256] 256
group2/block1/conv2/bn/beta [256] 256
group2/block1/conv3/W [1, 1, 256, 1024] 262144
group2/block1/conv3/bn/gamma [1024] 1024
group2/block1/conv3/bn/beta [1024] 1024
group2/block2/conv1/W [1, 1, 1024, 256] 262144
group2/block2/conv1/bn/gamma [256] 256
group2/block2/conv1/bn/beta [256] 256
group2/block2/conv2/W [3, 3, 256, 256] 589824
group2/block2/conv2/bn/gamma [256] 256
group2/block2/conv2/bn/beta [256] 256
group2/block2/conv3/W [1, 1, 256, 1024] 262144
group2/block2/conv3/bn/gamma [1024] 1024
group2/block2/conv3/bn/beta [1024] 1024
group2/block3/conv1/W [1, 1, 1024, 256] 262144
group2/block3/conv1/bn/gamma [256] 256
group2/block3/conv1/bn/beta [256] 256
group2/block3/conv2/W [3, 3, 256, 256] 589824
group2/block3/conv2/bn/gamma [256] 256
group2/block3/conv2/bn/beta [256] 256
group2/block3/conv3/W [1, 1, 256, 1024] 262144
group2/block3/conv3/bn/gamma [1024] 1024
group2/block3/conv3/bn/beta [1024] 1024
group2/block4/conv1/W [1, 1, 1024, 256] 262144
group2/block4/conv1/bn/gamma [256] 256
group2/block4/conv1/bn/beta [256] 256
group2/block4/conv2/W [3, 3, 256, 256] 589824
group2/block4/conv2/bn/gamma [256] 256
group2/block4/conv2/bn/beta [256] 256
group2/block4/conv3/W [1, 1, 256, 1024] 262144
group2/block4/conv3/bn/gamma [1024] 1024
group2/block4/conv3/bn/beta [1024] 1024
group2/block5/conv1/W [1, 1, 1024, 256] 262144
group2/block5/conv1/bn/gamma [256] 256
group2/block5/conv1/bn/beta [256] 256
group2/block5/conv2/W [3, 3, 256, 256] 589824
group2/block5/conv2/bn/gamma [256] 256
group2/block5/conv2/bn/beta [256] 256
group2/block5/conv3/W [1, 1, 256, 1024] 262144
group2/block5/conv3/bn/gamma [1024] 1024
group2/block5/conv3/bn/beta [1024] 1024
group3/block0/conv1/W [1, 1, 1024, 512] 524288
group3/block0/conv1/bn/gamma [512] 512
group3/block0/conv1/bn/beta [512] 512
group3/block0/conv2/W [3, 3, 512, 512] 2359296
group3/block0/conv2/bn/gamma [512] 512
group3/block0/conv2/bn/beta [512] 512
group3/block0/conv3/W [1, 1, 512, 2048] 1048576
group3/block0/conv3/bn/gamma [2048] 2048
group3/block0/conv3/bn/beta [2048] 2048
group3/block0/convshortcut/W [1, 1, 1024, 2048] 2097152
group3/block0/convshortcut/bn/gamma [2048] 2048
group3/block0/convshortcut/bn/beta [2048] 2048
group3/block1/conv1/W [1, 1, 2048, 512] 1048576
group3/block1/conv1/bn/gamma [512] 512
group3/block1/conv1/bn/beta [512] 512
group3/block1/conv2/W [3, 3, 512, 512] 2359296
group3/block1/conv2/bn/gamma [512] 512
group3/block1/conv2/bn/beta [512] 512
group3/block1/conv3/W [1, 1, 512, 2048] 1048576
group3/block1/conv3/bn/gamma [2048] 2048
group3/block1/conv3/bn/beta [2048] 2048
group3/block2/conv1/W [1, 1, 2048, 512] 1048576
group3/block2/conv1/bn/gamma [512] 512
group3/block2/conv1/bn/beta [512] 512
group3/block2/conv2/W [3, 3, 512, 512] 2359296
group3/block2/conv2/bn/gamma [512] 512
group3/block2/conv2/bn/beta [512] 512
group3/block2/conv3/W [1, 1, 512, 2048] 1048576
group3/block2/conv3/bn/gamma [2048] 2048
group3/block2/conv3/bn/beta [2048] 2048
fpn/lateral_1x1_c2/W [1, 1, 256, 256] 65536
fpn/lateral_1x1_c2/b [256] 256
fpn/lateral_1x1_c3/W [1, 1, 512, 256] 131072
fpn/lateral_1x1_c3/b [256] 256
fpn/lateral_1x1_c4/W [1, 1, 1024, 256] 262144
fpn/lateral_1x1_c4/b [256] 256
fpn/lateral_1x1_c5/W [1, 1, 2048, 256] 524288
fpn/lateral_1x1_c5/b [256] 256
fpn/posthoc_3x3_p2/W [3, 3, 256, 256] 589824
fpn/posthoc_3x3_p2/b [256] 256
fpn/posthoc_3x3_p3/W [3, 3, 256, 256] 589824
fpn/posthoc_3x3_p3/b [256] 256
fpn/posthoc_3x3_p4/W [3, 3, 256, 256] 589824
fpn/posthoc_3x3_p4/b [256] 256
fpn/posthoc_3x3_p5/W [3, 3, 256, 256] 589824
fpn/posthoc_3x3_p5/b [256] 256
rpn/conv0/W [3, 3, 256, 256] 589824
rpn/conv0/b [256] 256
rpn/class/W [1, 1, 256, 3] 768
rpn/class/b [3] 3
rpn/box/W [1, 1, 256, 12] 3072
rpn/box/b [12] 12
fastrcnn/fc6/W [12544, 1024] 12845056
fastrcnn/fc6/b [1024] 1024
fastrcnn/fc7/W [1024, 1024] 1048576
fastrcnn/fc7/b [1024] 1024
fastrcnn/outputs/class/W [1024, 6] 6144
fastrcnn/outputs/class/b [6] 6
fastrcnn/outputs/box/W [1024, 24] 24576
fastrcnn/outputs/box/b [24] 24
Number of trainable variables: 156
Number of parameters (elements): 41147437
Storage space needed for all trainable variables: 156.97MB
[0706 13:50:20 @base.py:207] Setup callbacks graph ...
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
[0706 13:50:27 @argtools.py:138] WRN "import prctl" failed! Install python-prctl so that processes can be cleaned with guarantee.
[0706 13:50:29 @prof.py:291] [HostMemoryTracker] Free RAM in setup_graph() is 364.27 GB.
[0706 13:50:29 @tower.py:135] Building graph for predict tower 'tower-pred-0' on device /gpu:0 ...
[0706 13:50:30 @collection.py:152] Size of these collections were changed in tower-pred-0: (tf.GraphKeys.MODEL_VARIABLES: 194->227)
[0706 13:50:30 @collection.py:165] These collections were modified but restored in tower-pred-0: (tf.GraphKeys.SUMMARIES: 76->77)
[0706 13:50:30 @tower.py:135] Building graph for predict tower 'tower-pred-1' on device /gpu:1 with variable scope 'tower1'...
[0706 13:50:31 @collection.py:152] Size of these collections were changed in tower-pred-1: (tf.GraphKeys.MODEL_VARIABLES: 227->260)
[0706 13:50:31 @collection.py:165] These collections were modified but restored in tower-pred-1: (tf.GraphKeys.SUMMARIES: 76->77)
loading annotations into memory...
Done (t=0.75s)
creating index...
index created!
[0706 13:50:31 @coco.py:60] Instances loaded from /home/vlamp/Documents/STAC/DATA_STAC/coco/annotations/instances_val2017.json.
100%|##########| 9921/9921 [00:00<00:00, 725119.19it/s]
[0706 13:50:31 @timer.py:45] Load annotations for instances_val2017.json finished, time:0.0151 sec.
[0706 13:50:31 @data.py:456] Found 9921 images for inference.
loading annotations into memory...
Done (t=0.83s)
creating index...
index created!
[0706 13:50:32 @coco.py:60] Instances loaded from /home/vlamp/Documents/STAC/DATA_STAC/coco/annotations/instances_val2017.json.
0%| | 0/9921 [00:00<?, ?it/s]
100%|##########| 9921/9921 [00:00<00:00, 739211.43it/s]
[0706 13:50:32 @timer.py:45] Load annotations for instances_val2017.json finished, time:0.0150 sec.
[0706 13:50:32 @data.py:456] Found 9921 images for inference.
loading annotations into memory...
Done (t=0.82s)
creating index...
index created!
[0706 13:50:33 @coco.py:60] Instances loaded from /home/vlamp/Documents/STAC/DATA_STAC/coco/annotations/instances_val2017.json.
0%| | 0/9921 [00:00<?, ?it/s]
100%|##########| 9921/9921 [00:00<00:00, 744062.40it/s]
[0706 13:50:33 @timer.py:45] Load annotations for instances_val2017.json finished, time:0.0149 sec.
[0706 13:50:33 @data.py:456] Found 9921 images for inference.
loading annotations into memory...
Done (t=0.77s)
creating index...
index created!
[0706 13:50:34 @coco.py:60] Instances loaded from /home/vlamp/Documents/STAC/DATA_STAC/coco/annotations/instances_val2017.json.
0%| | 0/9921 [00:00<?, ?it/s]
100%|##########| 9921/9921 [00:00<00:00, 713481.88it/s]
[0706 13:50:34 @timer.py:45] Load annotations for instances_val2017.json finished, time:0.0153 sec.
[0706 13:50:34 @data.py:456] Found 9921 images for inference.
[0706 13:50:34 @summary.py:47] [MovingAverageSummary] 73 operations in collection 'MOVING_SUMMARY_OPS' will be run with session hooks.
[0706 13:50:34 @summary.py:94] Summarizing collection 'summaries' of size 76.
[0706 13:50:34 @base.py:228] Creating the session ...
2020-07-06 13:50:34.737615: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-07-06 13:50:34.743032: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-07-06 13:50:34.887781: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x14c78d20 executing computations on platform CUDA. Devices:
2020-07-06 13:50:34.887822: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-07-06 13:50:34.887827: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (1): Tesla T4, Compute Capability 7.5
2020-07-06 13:50:34.890055: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2494125000 Hz
2020-07-06 13:50:34.893901: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x14a0c4f0 executing computations on platform Host. Devices:
2020-07-06 13:50:34.893919: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-07-06 13:50:34.896069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:3b:00.0
2020-07-06 13:50:34.896771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:d8:00.0
2020-07-06 13:50:34.897783: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898069: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898242: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898401: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898538: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898705: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.901746: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-07-06 13:50:34.901764: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-07-06 13:50:34.901834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-06 13:50:34.901840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0 1
2020-07-06 13:50:34.901845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N Y
2020-07-06 13:50:34.901848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1: Y N
MultiProcessMapDataZMQ successfully cleaned-up.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
    self._extend_graph()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by {{node AllReduceGrads/NcclAllReduce}}with these attrs: [shared_name="c0", T=DT_FLOAT, num_devices=2, reduction="sum"]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'
[[AllReduceGrads/NcclAllReduce]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vlamp/Documents/STAC/detection/train_stg1_bdd.py", line 180, in <module>
    launch_train_with_config(traincfg, trainer)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/interface.py", line 99, in launch_train_with_config
    extra_callbacks=config.extra_callbacks)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 342, in train_with_defaults
    steps_per_epoch, starting_epoch, max_epoch)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 313, in train
    self.initialize(session_creator, session_init)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/utils/argtools.py", line 168, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/tower.py", line 147, in initialize
    super(TowerTrainer, self).initialize(session_creator, session_init)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/utils/argtools.py", line 168, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 230, in initialize
    self.sess = session_creator.create_session()
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sesscreate.py", line 88, in create_session
    run(tf.global_variables_initializer())
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sesscreate.py", line 86, in run
    sess.run(op)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by node AllReduceGrads/NcclAllReduce (defined at usr/local/lib/python3.6/dist-packages/tensorpack/graph_builder/utils.py:154) with these attrs: [shared_name="c0", T=DT_FLOAT, num_devices=2, reduction="sum"]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'
[[AllReduceGrads/NcclAllReduce]]
Errors may have originated from an input operation.
Input Source operations connected to node AllReduceGrads/NcclAllReduce:
 tower0/gradients/AddN_126 (defined at usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/optimizer.py:29)
/cm/local/apps/slurm/var/spool/job18434303/slurm_script: line 29: t: command not found
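The `Could not dlopen library` lines name six CUDA 10.0 libraries (`libcudart.so.10.0`, `libcublas.so.10.0`, `libcufft.so.10.0`, `libcurand.so.10.0`, `libcusolver.so.10.0`, `libcusparse.so.10.0`) that TensorFlow could not find, after which it logged "Skipping registering GPU devices". One way to confirm which of these the dynamic loader can resolve is a small `ctypes` script. This is only a minimal sketch, not part of the original setup; it assumes Python 3 on Linux and that it is run inside the same Singularity container and Slurm job environment, since `LD_LIBRARY_PATH` is read by the loader only when the process starts.

```python
import ctypes

# Library names copied from the log: the six that failed to load,
# plus libcuda.so.1 and libcudnn.so.7, which loaded successfully (for comparison).
CUDA_10_LIBS = [
    "libcuda.so.1",
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def check_libs(names):
    """Try to load each shared library by name, as dlopen() would
    (searching LD_LIBRARY_PATH and the system loader cache).

    Returns a dict mapping each name to None if it loaded,
    or to the loader's error message if it did not.
    """
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:  # raised when the library cannot be found or loaded
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, err in check_libs(CUDA_10_LIBS).items():
        status = "OK" if err is None else "MISSING: " + err
        print("{}: {}".format(name, status))
```

Any line reported as `MISSING` corresponds to a library the loader cannot find with the current `LD_LIBRARY_PATH`, which is the same condition TensorFlow reports in the `dso_loader.cc` lines.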