Semi-supervised learning for object detection

Overview

Source code for STAC: A Simple Semi-Supervised Learning Framework for Object Detection

STAC is a simple yet effective SSL framework for visual object detection, paired with a data augmentation strategy. STAC deploys highly confident pseudo labels of localized objects from unlabeled images and updates the model by enforcing consistency via strong augmentations.
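Schematically (a sketch of the objective as described in the paper; λ_u is the unsupervised loss weight and τ the pseudo-label confidence threshold, exposed below as TRAIN.WU and TRAIN.CONFIDENCE):

\mathcal{L}(\theta) = \mathcal{L}_s(x_l, y_l; \theta) + \lambda_u \, \mathcal{L}_u\big(\mathcal{A}(x_u), \hat{y}_u; \theta\big)

where \hat{y}_u are pseudo labels retained only when their confidence exceeds \tau, and \mathcal{A} denotes strong augmentation.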

This code is intended for research use only. It is not an official Google product.

Instructions

Install dependencies

Set the global environment variables.

export PRJROOT=/path/to/your/project/directory/STAC
export DATAROOT=/path/to/your/dataroot
export COCODIR=$DATAROOT/coco
export VOCDIR=$DATAROOT/voc
export PYTHONPATH=$PYTHONPATH:${PRJROOT}/third_party/FasterRCNN:${PRJROOT}/third_party/auto_augment:${PRJROOT}/third_party/tensorpack

Install a virtual environment in the root folder of the project

cd ${PRJROOT}

sudo apt install python3-dev python3-virtualenv python3-tk imagemagick
virtualenv -p python3 --system-site-packages env3
. env3/bin/activate
pip install -r requirements.txt

# Make sure your TensorFlow version is 1.14, not only inside the virtual environment
# but also on your machine; 1.15 can cause OOM issues.
python -c 'import tensorflow as tf; print(tf.__version__)'

# install coco apis
pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
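Optionally, you can also confirm that TensorFlow sees your GPUs before training (tf.test.is_gpu_available is available in TF 1.14):

# should print True on a working GPU setup
python -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'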

(Optional) Install tensorpack

A compatible version of tensorpack is already included at third_party/tensorpack. To install the latest version instead:

cd ${PRJROOT}/third_party
pip install --upgrade git+https://github.com/tensorpack/tensorpack.git

Download COCO/PASCAL VOC data and pre-trained models

Download data

See DATA.md

Download backbone model

cd ${COCODIR}
wget http://models.tensorpack.com/FasterRCNN/ImageNet-R50-AlignPadding.npz
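As an optional sanity check that the backbone weights downloaded correctly (the file should be on the order of 100 MB, not a few kilobytes):

# list the downloaded weights file with its size
ls -lh ${COCODIR}/ImageNet-R50-AlignPadding.npz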

Training

There are three steps:

  1. Train a standard detector on labeled data (detection/scripts/coco/train_stg1.sh).
  2. Predict pseudo boxes and labels of unlabeled data using the trained detector (detection/scripts/coco/eval_stg1.sh).
  3. Use labeled data and unlabeled data with pseudo labels to train a STAC detector (detection/scripts/coco/train_stg2.sh).

Besides the step-by-step instructions here, detection/scripts/coco/train_stac.sh provides a combined script to train STAC.

detection/scripts/voc/train_stac.sh is a combined script to train STAC on PASCAL VOC.

The following example uses 10% of train2017 as labeled data and the remaining 90% of train2017 as unlabeled data.
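The dataset name encodes the split: coco_train2017.<seed>@<percent> selects <percent>% of train2017 under a fixed random seed, and the -unlabeled suffix denotes the complementary images (these split names are registered by the data loader, as the logs in the comments below show). For example, a 5% split under seed 2 would look like:

DATASET=coco_train2017.2@5
UNLABELED_DATASET=${DATASET}-unlabeled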

Step 0: Set variables

cd ${PRJROOT}/detection

# Labeled and Unlabeled datasets
DATASET=coco_train2017.1@10
UNLABELED_DATASET=${DATASET}-unlabeled

# PATH to save trained models
CKPT_PATH=result/${DATASET}

# PATH to save pseudo labels for unlabeled data
PSEUDO_PATH=${CKPT_PATH}/PSEUDO_DATA

# Train with 8 GPUs
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
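If you have fewer GPUs, export only the devices you have; the trainer picks up the GPU count from this variable. Note that the default schedule assumes 8 GPUs, so the effective batch size and learning-rate schedule may need adjusting (see the single-GPU OOM discussion in the comments below). For example:

# single-GPU run
export CUDA_VISIBLE_DEVICES=0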

Step 1: Train FasterRCNN on labeled data

. scripts/coco/train_stg1.sh

Set TRAIN.AUGTYPE_LAB=strong to apply strong data augmentation.

# --simple_path makes train_log/${DATASET}/${EXPNAME} the exact save location
python3 train_stg1.py \
    --logdir ${CKPT_PATH} --simple_path --config \
    BACKBONE.WEIGHTS=${COCODIR}/ImageNet-R50-AlignPadding.npz \
    DATA.BASEDIR=${COCODIR} \
    DATA.TRAIN="('${DATASET}',)" \
    MODE_MASK=False \
    FRCNN.BATCH_PER_IM=64 \
    PREPROC.TRAIN_SHORT_EDGE_SIZE="[500,800]" \
    TRAIN.EVAL_PERIOD=20 \
    TRAIN.AUGTYPE_LAB='default'
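With this schedule, the final checkpoint is model-180000 (the checkpoint Step 2 loads below). Once training finishes, you can confirm it exists:

# tensorpack checkpoints are stored as index/data file pairs
ls ${CKPT_PATH}/model-180000*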

Step 2: Generate pseudo labels of unlabeled data

. scripts/coco/eval_stg1.sh

Evaluate using COCO metrics and save eval.json

# Create the pseudo path if needed (mkdir -p is a no-op for existing directories)
mkdir -p ${PSEUDO_PATH}

# Evaluate the model as a sanity check
# model-180000 is the last checkpoint
# save eval.json at ${PSEUDO_PATH}

python3 predict.py \
    --evaluate ${PSEUDO_PATH}/eval.json \
    --load "${CKPT_PATH}"/model-180000 \
    --config \
    DATA.BASEDIR=${COCODIR} \
    DATA.TRAIN="('${UNLABELED_DATASET}',)"

Generate pseudo labels for unlabeled data

Set EVAL.PSEUDO_INFERENCE=True to use original images rather than resized ones for inference.

# Extract pseudo label
python3 predict.py \
    --predict_unlabeled ${PSEUDO_PATH} \
    --load "${CKPT_PATH}"/model-180000 \
    --config \
    DATA.BASEDIR=${COCODIR} \
    DATA.TRAIN="('${UNLABELED_DATASET}',)" \
    EVAL.PSEUDO_INFERENCE=True
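Before moving to Step 3, it is worth checking that the pseudo labels were written where the next stage expects them (the dataloader reads ${PSEUDO_PATH}/pseudo_data.npy, as noted below):

ls -lh ${PSEUDO_PATH}/pseudo_data.npy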

Step 3: Train STAC

. scripts/coco/train_stg2.sh

The dataloader loads pseudo labels from ${PSEUDO_PATH}/pseudo_data.npy.

Apply default augmentation on labeled data and strong augmentation on unlabeled data.

TRAIN.CONFIDENCE (the confidence threshold for keeping pseudo labels) and TRAIN.WU (the weight on the unsupervised loss) are the two major hyperparameters of the method.

python3 train_stg2.py \
    --logdir=${CKPT_PATH}/STAC --simple_path \
    --pseudo_path=${PSEUDO_PATH} \
    --config \
    BACKBONE.WEIGHTS=${COCODIR}/ImageNet-R50-AlignPadding.npz \
    DATA.BASEDIR=${COCODIR} \
    DATA.TRAIN="('${DATASET}',)" \
    DATA.UNLABEL="('${UNLABELED_DATASET}',)" \
    MODE_MASK=False \
    FRCNN.BATCH_PER_IM=64 \
    PREPROC.TRAIN_SHORT_EDGE_SIZE="[500,800]" \
    TRAIN.EVAL_PERIOD=20 \
    TRAIN.AUGTYPE_LAB='default' \
    TRAIN.AUGTYPE='strong' \
    TRAIN.CONFIDENCE=0.9 \
    TRAIN.WU=2
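To evaluate the trained STAC detector, the same predict.py entry point from Step 2 can be reused (a sketch; adjust the checkpoint index to your schedule):

python3 predict.py \
    --evaluate ${CKPT_PATH}/STAC/eval.json \
    --load ${CKPT_PATH}/STAC/model-180000 \
    --config \
    DATA.BASEDIR=${COCODIR}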

Tensorboard

All training logs and TensorBoard data are under ${PRJROOT}/detection/train_log. Visualize them with

tensorboard --logdir=${PRJROOT}/detection/train_log
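If training runs on a remote machine, a standard SSH tunnel lets you open the dashboard in a local browser (user@remote-host is a placeholder):

# on your local machine
ssh -L 6006:localhost:6006 user@remote-host
# then on the remote machine
tensorboard --logdir=${PRJROOT}/detection/train_log --port 6006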

Citation

@inproceedings{sohn2020detection,
  title={A Simple Semi-Supervised Learning Framework for Object Detection},
  author={Kihyuk Sohn and Zizhao Zhang and Chun-Liang Li and Han Zhang and Chen-Yu Lee and Tomas Pfister},
  year={2020},
  booktitle={arXiv:2005.04757}
}

Acknowledgement

Comments
  • Troubles on running code with single GPU (RTX 2070 SUPER) and 16 GB RAM


    Hello! I was trying to run the code, but my process was always killed by the system with an 'out of memory' error. I already set batch size == 1, so what else can I do to run the code?

    opened by happyvictor008 11
  • Confirmation for some training details


    Hi. I want to confirm some details of the second, self-training stage. Are all the hyper-parameters (including the batch size, the thresholds for positives and negatives, the number of proposals in the RCNN head, etc.) the same for both the supervised and unsupervised losses? Also, is the unsupervised loss imposed on both the RPN and the RCNN head? Thanks.

    opened by strongwolf 9
  • GPU only using half of available memory


    Hello,

    Whether I use 1 or 2 GPUs (RTX 2080 Ti), only half of the capacity is used:

    1 GPU: (screenshot omitted)

    2 GPUs: (screenshot omitted)

    I tried to increase the batch size (FRCNN.BATCH_PER_IM in train_stg1.sh) and also the number of workers (_C.DATA.NUM_WORKERS in third_party/FasterRCNN/FasterRCNN/config.py) but it doesn't seem to make a difference.

    Since TensorFlow is supposed to use all available memory, is this being done on purpose?

    Thank you for your time.

    opened by aoussou 8
  • the total_cost and wd_cost become nan.


    Hi, when I test your code with train_stg1.sh to train the teacher model, the logs show that the total_cost and wd_cost become NaN. I did not change any code. The data and GPU settings are as follows: DATASET='coco_train2017.1@10', CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 (screenshot omitted)

    opened by bobzhang123 6
  • Considering label scores in loss function


    I can't seem to find where in the code the score of a pseudo box is taken into account. Specifically, where can we see the effect of zero-scored boxes (those that didn't pass the confidence threshold)? To the best of my inquiry, it is missing from the code, though emphasized in the paper. Thanks!

    opened by ErezBeyond 4
  • STAC_JSON.tar is broken


    STAC_JSON.tar in the project is just 61 KB, while the one at https://storage.cloud.google.com/gresearch/ssl_detection/STAC_JSON.tar is 120 MB, but it can't be downloaded.

    opened by jiollos 2
  • about using my own data


    Hi, I want to use my own COCO-format data with your framework. However, it seems that I need to prepare annotation files (json) for unlabeled data and put them under "$COCODIR/annotations/semi_supervised". Is that true? But my unlabeled data does not have labels. What should I do? Looking forward to a practical answer from anyone who can help!

    opened by evangel-jiang 2
  • Have you ever tried to generate pseudo-labels online?


    In STAC, the pseudo-labels are predicted in an offline manner, i.e., a network is first trained on labeled data and then used to predict pseudo-labels; it is a multi-stage training scheme. However, in FixMatch, also your work, the pseudo-labels are predicted in an online manner, i.e., generated within each mini-batch; it is a one-stage training scheme.

    opened by Chen-Song 2
  • Question about Table 1 in the paper


    Hi,

    Thanks for providing this interesting work and releasing the code. I am curious about the implementation details in Table 1 of the main paper.

    (1) Are the results shown in Table 1 produced using the COCO2017 validation set (5000 instances)?

    (2) In Table 1, does 100% COCO mean that you use 100% of the supervised COCO2017 data as the labeled set, the external COCO2017_unlabeled data as the unlabeled set, and the COCO2017 validation set as the evaluation set?

    Thank you!

    opened by ycliu93 2
  • dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory


    Can you please interpret the following error for me? Is it a problem with my CUDA version? I am not very experienced, and I would like to understand it so that I can fix it and continue.

    (Log excerpt, heavily trimmed; the full log also contains the dataset registration lists, the config dump, the layer-by-layer graph construction, and the trainable-variable table.)

    WARNING: NVIDIA binaries may not be bound with --writable
    [0706 13:49:54 @train_stg1_bdd.py:87] Environment Information:
    Python 3.6.9 / Tensorpack v0.10.1 / TensorFlow 1.14.0 / TF CUDA support True
    CUDA /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudart.so.10.1.243
    CUDNN /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.4
    GPU 0,1 Tesla T4

    2020-07-06 13:50:34.898069: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
    (the same dlerror appears for libcudart, libcufft, libcurand, libcusolver, and libcusparse)
    2020-07-06 13:50:34.901764: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...

    tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by node AllReduceGrads/NcclAllReduce (defined at usr/local/lib/python3.6/dist-packages/tensorpack/graph_builder/utils.py:154) with these attrs: [shared_name="c0", T=DT_FLOAT, num_devices=2, reduction="sum"] Registered devices: [CPU, XLA_CPU, XLA_GPU] Registered kernels: device='GPU'

     [[AllReduceGrads/NcclAllReduce]]

    opened by vaslamp 2
  • VOC Training and Data Scripts missing


    Hi @zizhaozhang , thanks for providing the code to this paper.

    I'm trying to replicate the VOC results, but the instructions here are incomplete. When will you provide the necessary scripts to train the VOC model, and is it possible for us to easily adapt prepare_coco_data.py to do so in the meantime?

    Thanks for the help.

    opened by varunnair18 2
  • Bump tensorflow-gpu from 1.14.0 to 2.9.3


    Bumps tensorflow-gpu from 1.14.0 to 2.9.3.

    Release notes and changelog

    Sourced from tensorflow-gpu's releases and changelog: releases 2.9.3, 2.9.2, and 2.8.4 each introduce several vulnerability fixes. (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 0
  • Bump numpy from 1.16.4 to 1.22.0


    Bumps numpy from 1.16.4 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)
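
    For example (our illustration, assuming NumPy >= 1.22):

        import numpy as np

        np.dtype("uint32")      # lowercase dtype strings are unaffected
        try:
            np.dtype("Uint32")  # removed numeric-style alias now raises
        except TypeError as err:
            print(err)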

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)
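
    The recommended replacements look like this (our sketch, not from the release notes):

        import pickle
        from io import StringIO
        import numpy as np

        # pickle.loads replaces the removed numpy.loads
        arr = np.arange(3)
        restored = pickle.loads(pickle.dumps(arr))

        # numpy.genfromtxt with usemask replaces ndfromtxt/mafromtxt:
        #   usemask=False -> plain ndarray (old ndfromtxt behavior)
        #   usemask=True  -> masked array  (old mafromtxt behavior)
        plain = np.genfromtxt(StringIO("1,2\n3,4\n"), delimiter=",", usemask=False)
        masked = np.genfromtxt(StringIO("1,2\n3,\n"), delimiter=",", usemask=True)
        print(restored, plain, masked, sep="\n")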

    ... (truncated)


    dependencies 
    opened by dependabot[bot] 0
  • Skipping cancelled dequeue attempt with queue not closed

    1. ERROR LOG (first epoch)

        [1210 18:09:10 @param.py:158] [HyperParamSetter] At global_step=0, learning_rate is set to 0.001000
        [1210 18:09:11 @prof.py:294] [HostMemoryTracker] Free RAM in before_train() is 238.12 GB.
        [1210 18:09:11 @stac_helper.py:83] ----------------------------------------------------------------------------------------------------
        [1210 18:09:11 @stac_helper.py:84] Model save path: result/VOC2007/instances_trainval
        [1210 18:09:11 @stac_helper.py:85] ----------------------------------------------------------------------------------------------------
        [1210 18:09:11 @eval.py:313] [EvalCallback] Will evaluate every 20 epochs
        [1210 18:09:28 @base.py:273] Start Epoch 1 ...
        0%| |0/500[00:00<?,?it/s]
        2021-12-10 18:09:43.544891: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
        2021-12-10 18:10:23.596973: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
        0%| |0/500[02:46<?,?it/s]
        2021-12-10 18:12:16.766932: W tensorflow/core/kernels/queue_base.cc:277] _0_QueueInput/input_queue: Skipping cancelled enqueue attempt with queue not closed
        Traceback (most recent call last):
          File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
            return fn(*args)
          File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
            options, feed_dict, fetch_list, target_list, run_metadata)
          File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
            run_metadata)
        tensorflow.python.framework.errors_impl.DeadlineExceededError: Timed out waiting for notification

    2. Environment Information:


        sys.platform          linux
        Python                3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
        Tensorpack            v0.9.8-61-g4ac2e22b-dirty
        Numpy                 1.16.4
        TensorFlow            1.14.0/v1.14.0-rc1-22-gaf24dc91b5
        TF Compiler Version   4.8.5
        TF CUDA support       True
        TF MKL support        False
        TF XLA support        False
        Nvidia Driver         /usr/lib64/libnvidia-ml.so.460.73.01
        CUDA                  /mnt/lustre/share/cuda-10.0/lib64/libcudart.so.10.0.130
        CUDNN                 /mnt/lustre/share/cuda-10.0/lib64/libcudnn.so.7.4.1
        NCCL
        CUDA_VISIBLE_DEVICES  1,2,3,4
        GPU 0,1,2,3,4,5,6,7   Tesla V100-SXM2-32GB
        Free RAM              344.40/376.39 GB
        CPU Count             48
        cv2                   4.1.1
        msgpack               1.0.3
        python-prctl          False


    opened by Liang-ZX 0
  • About the augmentation in teacher model

    In the first stage, the teacher model is trained with weak augmentation. However, in your experiments the model trained with strong augmentation outperforms the one trained with weak augmentation. Why not use the strongly augmented model as the teacher?
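
    For readers skimming the thread, the weak/strong asymmetry the question refers to, as a runnable toy sketch (every name here is hypothetical, not the repo's API):

        import random

        def weak_aug(image):    # e.g. flip only: keeps teacher boxes reliable
            return image

        def strong_aug(image):  # e.g. color jitter + cutout: heavy distortion
            return [v + random.gauss(0.0, 0.5) for v in image]

        def teacher_predict(image, tau=0.9):
            # stand-in for detector inference + confidence thresholding
            return [(v, 0.95) for v in image if abs(v) > tau]

        image = [1.2, 0.3, 2.0]
        pseudo_labels = teacher_predict(weak_aug(image))  # labels from the weak view
        student_view = strong_aug(image)                  # student trains on the hard view
        print(pseudo_labels, student_view)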

    opened by chenshi3 0
  • Training on a single GPU (Losses keep fluctuating and do not converge)

    Hi,

    I am training the Faster RCNN model on 10% of labelled COCO data. While training with 1 GPU, the losses don't converge. Based on an earlier issue (https://github.com/google-research/ssl_detection/issues/12), I understand that with 1 GPU the batch size is limited to 1 due to tensorpack constraints, which may be too small for the network to train and converge. If that's the case, what are the alternatives? Is the only option to move away from tensorpack in order to use a larger batch size?

    Any inputs/suggestions are more than welcome as I am a bit stuck at the moment and do not have access to more than 1 GPU.

    Regards, Chandra
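
    One generic workaround worth noting here is gradient accumulation: summing gradients over several single-image steps emulates a larger effective batch before each optimizer update. A hedged TF1-style sketch (matching the repo's TF 1.14, but not code from this repo; the toy loss is a stand-in for the detector loss):

        import tensorflow as tf  # TF1 graph mode

        ACCUM_STEPS = 8
        x = tf.Variable(3.0)
        loss = tf.square(x)  # stand-in for the per-image detector loss
        opt = tf.train.MomentumOptimizer(learning_rate=1e-2, momentum=0.9)

        tvars = tf.trainable_variables()
        accum = [tf.Variable(tf.zeros_like(v), trainable=False) for v in tvars]
        grads = tf.gradients(loss, tvars)
        accum_op = tf.group(*[a.assign_add(g) for a, g in zip(accum, grads)])
        apply_op = opt.apply_gradients(
            [(a / ACCUM_STEPS, v) for a, v in zip(accum, tvars)])
        zero_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accum])

        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for step in range(ACCUM_STEPS):
                sess.run(accum_op)  # one "micro-batch" per step
            sess.run(apply_op)      # one optimizer update per ACCUM_STEPS steps
            sess.run(zero_op)

    Whether this helps here depends on how cleanly it can be wired into tensorpack's training loop; lowering the learning rate in proportion to the smaller batch is another commonly tried adjustment.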

    opened by nuschandra 0
Owner
Google Research
Data-Uncertainty Guided Multi-Phase Learning for Semi-supervised Object Detection

An official implementation of paper Data-Uncertainty Guided Multi-Phase Learning for Semi-supervised Object Detection

null 11 Nov 23, 2022
CVPR2022 paper "Dense Learning based Semi-Supervised Object Detection"

[CVPR2022] DSL: Dense Learning based Semi-Supervised Object Detection. DSL is the first work on an Anchor-Free detector for Semi-Supervised Object Detection

Bhchen 69 Dec 8, 2022
PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection

Unbiased Teacher for Semi-Supervised Object Detection This is the PyTorch implementation of our paper: Unbiased Teacher for Semi-Supervised Object Detection

Facebook Research 366 Dec 28, 2022
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

This repo is the official implementation of "Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework". @inproceedings{zhou2021insta

null 34 Dec 31, 2022
Group R-CNN for Point-based Weakly Semi-supervised Object Detection (CVPR2022)

Group R-CNN for Point-based Weakly Semi-supervised Object Detection (CVPR2022) By Shilong Zhang*, Zhuoran Yu*, Liyang Liu*, Xinjiang Wang, Aojun Zhou,

Shilong Zhang 129 Dec 24, 2022
Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Tom-R.T.Kvalvaag 2 Dec 17, 2021
UniMoCo: Unsupervised, Semi-Supervised and Full-Supervised Visual Representation Learning

UniMoCo: Unsupervised, Semi-Supervised and Full-Supervised Visual Representation Learning. This is the official PyTorch implementation for the UniMoCo paper

dddzg 49 Jan 2, 2023
[CVPR 2021] MiVOS - Mask Propagation module. Reproduced STM (and better) with training code :star2:. Semi-supervised video object segmentation evaluation.

MiVOS (CVPR 2021) - Mask Propagation Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang [arXiv] [Paper PDF] [Project Page] [Papers with Code] This repo impleme

Rex Cheng 106 Jan 3, 2023
Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time

Semi Hand-Object Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time (CVPR 2021).

null 96 Dec 27, 2022
CoSMA: Convolutional Semi-Regular Mesh Autoencoder. From Paper "Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes"

Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes Implementation of CoSMA: Convolutional Semi-Regular Mesh Autoencoder arXiv p

Fraunhofer SCAI 10 Oct 11, 2022
Yolo object detection - Yolo object detection with python

How to run: download required files (make build_image, make download). Docker versio

null 3 Jan 26, 2022
code for paper "Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning" by Zhongzheng Ren*, Raymond A. Yeh*, Alexander G. Schwing.

Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning Overview This code is for paper: Not All Unlabeled Data are Equa

Jason Ren 22 Nov 23, 2022
PyTorch code for the paper: FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning This is the PyTorch implementation of our paper: FeatMatch: Feature-Based Augmentat

null 43 Nov 19, 2022
Code for CoMatch: Semi-supervised Learning with Contrastive Graph Regularization

CoMatch: Semi-supervised Learning with Contrastive Graph Regularization (Salesforce Research) This is a PyTorch implementation of the CoMatch paper [B

Salesforce 107 Dec 14, 2022
Semi-supervised Learning for Sentiment Analysis

Neural-Semi-supervised-Learning-for-Text-Classification-Under-Large-Scale-Pretraining. Code, models and datasets for "Neural Semi-supervised Learning fo

null 47 Jan 1, 2023
The implementation of the algorithm in the paper "Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data" published in ICML 2020.

DS3L This is the code for paper "Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data" published in ICML 2020. Setups The code is implem

Guolz 36 Oct 19, 2022
noisy labels; missing labels; semi-supervised learning; entropy; uncertainty; robustness and generalisation.

ProSelfLC: CVPR 2021 ProSelfLC: Progressive Self Label Correction for Training Robust Deep Neural Networks For any specific discussion or potential fu

amos_xwang 57 Dec 4, 2022
Semi Supervised Learning for Medical Image Segmentation, a collection of literature reviews and code implementations.

Semi-supervised-learning-for-medical-image-segmentation. Recently, semi-supervised image segmentation has become a hot topic in medical image computing.

Healthcare Intelligence Laboratory 1.3k Jan 3, 2023