Semi-supervised learning for object detection

Overview

Source code for STAC: A Simple Semi-Supervised Learning Framework for Object Detection

STAC is a simple yet effective SSL framework for visual object detection, paired with a data augmentation strategy. STAC deploys highly confident pseudo labels of localized objects from unlabeled images and updates the model by enforcing consistency via strong augmentations.
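Schematically (a sketch of the objective as described in the paper; λ_u is the unsupervised loss weight and τ the pseudo-label confidence threshold, exposed below as TRAIN.WU and TRAIN.CONFIDENCE):

\mathcal{L}(\theta) = \mathcal{L}_s(x_l, y_l; \theta) + \lambda_u \, \mathcal{L}_u\big(\mathcal{A}(x_u), \hat{y}_u; \theta\big)

where \hat{y}_u are pseudo labels retained only when their confidence exceeds \tau, and \mathcal{A} denotes strong augmentation.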

This code is intended for research use only. It is not an official Google product.

Instructions

Install dependencies

Set the global environment variables.

export PRJROOT=/path/to/your/project/directory/STAC
export DATAROOT=/path/to/your/dataroot
export COCODIR=$DATAROOT/coco
export VOCDIR=$DATAROOT/voc
export PYTHONPATH=$PYTHONPATH:${PRJROOT}/third_party/FasterRCNN:${PRJROOT}/third_party/auto_augment:${PRJROOT}/third_party/tensorpack

Install a virtual environment in the root folder of the project

cd ${PRJROOT}

sudo apt install python3-dev python3-virtualenv python3-tk imagemagick
virtualenv -p python3 --system-site-packages env3
. env3/bin/activate
pip install -r requirements.txt

# Make sure your TensorFlow version is 1.14, not only inside the virtual environment
# but also on your machine; 1.15 can cause OOM issues.
python -c 'import tensorflow as tf; print(tf.__version__)'

# install coco apis
pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
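Optionally, you can also confirm that TensorFlow sees your GPUs before training (tf.test.is_gpu_available is available in TF 1.14):

# should print True on a working GPU setup
python -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'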

(Optional) Install tensorpack

A compatible version of tensorpack is already included at third_party/tensorpack. To install the latest version instead:

cd ${PRJROOT}/third_party
pip install --upgrade git+https://github.com/tensorpack/tensorpack.git

Download COCO/PASCAL VOC data and pre-trained models

Download data

See DATA.md

Download backbone model

cd ${COCODIR}
wget http://models.tensorpack.com/FasterRCNN/ImageNet-R50-AlignPadding.npz
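As an optional sanity check that the backbone weights downloaded correctly (the file should be on the order of 100 MB, not a few kilobytes):

# list the downloaded weights file with its size
ls -lh ${COCODIR}/ImageNet-R50-AlignPadding.npz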

Training

There are three steps:

  1. Train a standard detector on labeled data (detection/scripts/coco/train_stg1.sh).
  2. Predict pseudo boxes and labels of unlabeled data using the trained detector (detection/scripts/coco/eval_stg1.sh).
  3. Use labeled data and unlabeled data with pseudo labels to train a STAC detector (detection/scripts/coco/train_stg2.sh).

Besides the step-by-step instructions here, detection/scripts/coco/train_stac.sh provides a combined script to train STAC.

detection/scripts/voc/train_stac.sh is a combined script to train STAC on PASCAL VOC.

The following example uses 10% of train2017 as labeled data and the remaining 90% of train2017 as unlabeled data.
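The dataset name encodes the split: coco_train2017.<seed>@<percent> selects <percent>% of train2017 under a fixed random seed, and the -unlabeled suffix denotes the complementary images (these split names are registered by the data loader, as the logs in the comments below show). For example, a 5% split under seed 2 would look like:

DATASET=coco_train2017.2@5
UNLABELED_DATASET=${DATASET}-unlabeled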

Step 0: Set variables

cd ${PRJROOT}/detection

# Labeled and Unlabeled datasets
DATASET=coco_train2017.1@10
UNLABELED_DATASET=${DATASET}-unlabeled

# PATH to save trained models
CKPT_PATH=result/${DATASET}

# PATH to save pseudo labels for unlabeled data
PSEUDO_PATH=${CKPT_PATH}/PSEUDO_DATA

# Train with 8 GPUs
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
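If you have fewer GPUs, export only the devices you have; the trainer picks up the GPU count from this variable. Note that the default schedule assumes 8 GPUs, so the effective batch size and learning-rate schedule may need adjusting (see the single-GPU OOM discussion in the comments below). For example:

# single-GPU run
export CUDA_VISIBLE_DEVICES=0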

Step 1: Train FasterRCNN on labeled data

. scripts/coco/train_stg1.sh

Set TRAIN.AUGTYPE_LAB=strong to apply strong data augmentation.

# --simple_path makes train_log/${DATASET}/${EXPNAME} the exact save location
python3 train_stg1.py \
    --logdir ${CKPT_PATH} --simple_path --config \
    BACKBONE.WEIGHTS=${COCODIR}/ImageNet-R50-AlignPadding.npz \
    DATA.BASEDIR=${COCODIR} \
    DATA.TRAIN="('${DATASET}',)" \
    MODE_MASK=False \
    FRCNN.BATCH_PER_IM=64 \
    PREPROC.TRAIN_SHORT_EDGE_SIZE="[500,800]" \
    TRAIN.EVAL_PERIOD=20 \
    TRAIN.AUGTYPE_LAB='default'
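With this schedule, the final checkpoint is model-180000 (the checkpoint Step 2 loads below). Once training finishes, you can confirm it exists:

# tensorpack checkpoints are stored as index/data file pairs
ls ${CKPT_PATH}/model-180000*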

Step 2: Generate pseudo labels of unlabeled data

. scripts/coco/eval_stg1.sh

Evaluate using COCO metrics and save eval.json

# Create the pseudo path if needed (mkdir -p is a no-op for existing directories)
mkdir -p ${PSEUDO_PATH}

# Evaluate the model as a sanity check
# model-180000 is the last checkpoint
# save eval.json at ${PSEUDO_PATH}

python3 predict.py \
    --evaluate ${PSEUDO_PATH}/eval.json \
    --load "${CKPT_PATH}"/model-180000 \
    --config \
    DATA.BASEDIR=${COCODIR} \
    DATA.TRAIN="('${UNLABELED_DATASET}',)"

Generate pseudo labels for unlabeled data

Set EVAL.PSEUDO_INFERENCE=True to use original images rather than resized ones for inference.

# Extract pseudo label
python3 predict.py \
    --predict_unlabeled ${PSEUDO_PATH} \
    --load "${CKPT_PATH}"/model-180000 \
    --config \
    DATA.BASEDIR=${COCODIR} \
    DATA.TRAIN="('${UNLABELED_DATASET}',)" \
    EVAL.PSEUDO_INFERENCE=True
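Before moving to Step 3, it is worth checking that the pseudo labels were written where the next stage expects them (the dataloader reads ${PSEUDO_PATH}/pseudo_data.npy, as noted below):

ls -lh ${PSEUDO_PATH}/pseudo_data.npy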

Step 3: Train STAC

. scripts/coco/train_stg2.sh

The dataloader loads pseudo labels from ${PSEUDO_PATH}/pseudo_data.npy.

Apply default augmentation on labeled data and strong augmentation on unlabeled data.

TRAIN.CONFIDENCE (the confidence threshold for keeping pseudo labels) and TRAIN.WU (the weight on the unsupervised loss) are the two major hyperparameters of the method.

python3 train_stg2.py \
    --logdir=${CKPT_PATH}/STAC --simple_path \
    --pseudo_path=${PSEUDO_PATH} \
    --config \
    BACKBONE.WEIGHTS=${COCODIR}/ImageNet-R50-AlignPadding.npz \
    DATA.BASEDIR=${COCODIR} \
    DATA.TRAIN="('${DATASET}',)" \
    DATA.UNLABEL="('${UNLABELED_DATASET}',)" \
    MODE_MASK=False \
    FRCNN.BATCH_PER_IM=64 \
    PREPROC.TRAIN_SHORT_EDGE_SIZE="[500,800]" \
    TRAIN.EVAL_PERIOD=20 \
    TRAIN.AUGTYPE_LAB='default' \
    TRAIN.AUGTYPE='strong' \
    TRAIN.CONFIDENCE=0.9 \
    TRAIN.WU=2
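To evaluate the trained STAC detector, the same predict.py entry point from Step 2 can be reused (a sketch; adjust the checkpoint index to your schedule):

python3 predict.py \
    --evaluate ${CKPT_PATH}/STAC/eval.json \
    --load ${CKPT_PATH}/STAC/model-180000 \
    --config \
    DATA.BASEDIR=${COCODIR}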

Tensorboard

All training logs and TensorBoard data are under ${PRJROOT}/detection/train_log. Visualize them with

tensorboard --logdir=${PRJROOT}/detection/train_log
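If training runs on a remote machine, a standard SSH tunnel lets you open the dashboard in a local browser (user@remote-host is a placeholder):

# on your local machine
ssh -L 6006:localhost:6006 user@remote-host
# then on the remote machine
tensorboard --logdir=${PRJROOT}/detection/train_log --port 6006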

Citation

@inproceedings{sohn2020detection,
  title={A Simple Semi-Supervised Learning Framework for Object Detection},
  author={Kihyuk Sohn and Zizhao Zhang and Chun-Liang Li and Han Zhang and Chen-Yu Lee and Tomas Pfister},
  year={2020},
  booktitle={arXiv:2005.04757}
}

Acknowledgement

Comments
  • Troubles on running code with single GPU (RTX 2070 SUPER) and 16 GB RAM


    Hello! I was trying to run the code, but my process was always killed by the system with an 'out of memory' error. I already set batch size == 1, so what else can I do to run the code?

    opened by happyvictor008 11
  • Confirmation for some training details


    Hi. I want to confirm some details of the second, self-training stage. Are all the hyper-parameters (including the batch size, the thresholds for positives and negatives, the number of proposals in the RCNN head, etc.) the same for both the supervised and unsupervised losses? Also, is the unsupervised loss imposed on both the RPN and the RCNN head? Thanks.

    opened by strongwolf 9
  • GPU only using half of available memory


    Hello,

    Whether I use 1 or 2 GPUs (RTX 2080 Ti), only half of the capacity is used:

    1 GPU: (screenshot omitted)

    2 GPUs: (screenshot omitted)

    I tried to increase the batch size (FRCNN.BATCH_PER_IM in train_stg1.sh) and also the number of workers (_C.DATA.NUM_WORKERS in third_party/FasterRCNN/FasterRCNN/config.py) but it doesn't seem to make a difference.

    Since TensorFlow is supposed to use all available memory, is this being done on purpose?

    Thank you for your time.

    opened by aoussou 8
  • the total_cost and wd_cost become nan.


    Hi, when I test your code with train_stg1.sh to train the teacher model, the logs show that the total_cost and wd_cost become NaN. I did not change any code. The data and GPU settings are as follows: DATASET='coco_train2017.1@10', CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 (screenshot omitted)

    opened by bobzhang123 6
  • Considering label scores in loss function


    I can't seem to find where in the code the score of a pseudo box is taken into account. Specifically, where can we see the effect of zero-scored boxes (those that didn't pass the confidence threshold)? To the best of my inquiry, it is missing from the code, though emphasized in the paper. Thanks!

    opened by ErezBeyond 4
  • STAC_JSON.tar is broken


    STAC_JSON.tar in the project is just 61 KB, while the one at https://storage.cloud.google.com/gresearch/ssl_detection/STAC_JSON.tar is 120 MB, but it can't be downloaded.

    opened by jiollos 2
  • about using my own data


    Hi, I want to use my own COCO-format data with your framework. However, it seems that I need to prepare annotation files (json) for unlabeled data and put them under "$COCODIR/annotations/semi_supervised". Is that true? But my unlabeled data does not have labels. What should I do? Looking forward to a practical answer from anyone who can help!

    opened by evangel-jiang 2
  • Have you ever tried to generate pseudo-labels online?


    In STAC, the pseudo-labels are predicted in an offline manner, i.e., a network is first trained on labeled data and then used to predict pseudo-labels; it is a multi-stage training scheme. However, in FixMatch, also your work, the pseudo-labels are predicted in an online manner, i.e., generated within each mini-batch; it is a one-stage training scheme.

    opened by Chen-Song 2
  • Question about Table 1 in the paper


    Hi,

    Thanks for providing this interesting work and releasing the code. I am curious about the implementation details in Table 1 of the main paper.

    (1) Are the results shown in Table 1 produced using the COCO2017 validation set (5000 instances)?

    (2) In Table 1, does 100% COCO mean that you use 100% of the supervised COCO2017 data as the labeled set, the external COCO2017_unlabeled data as the unlabeled set, and the COCO2017 validation set as the evaluation set?

    Thank you!

    opened by ycliu93 2
  • dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory


    Can you please interpret the following error for me? Is it a problem with my CUDA version? I am not very experienced, and I would like to understand it so that I can fix it and continue.

    (Log excerpt, heavily trimmed; the full log also contains the dataset registration lists, the config dump, the layer-by-layer graph construction, and the trainable-variable table.)

    WARNING: NVIDIA binaries may not be bound with --writable
    [0706 13:49:54 @train_stg1_bdd.py:87] Environment Information:
    Python 3.6.9 / Tensorpack v0.10.1 / TensorFlow 1.14.0 / TF CUDA support True
    CUDA /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudart.so.10.1.243
    CUDNN /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.4
    GPU 0,1 Tesla T4

    2020-07-06 13:50:34.898069: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
    (the same dlerror appears for libcudart, libcufft, libcurand, libcusolver, and libcusparse)
    2020-07-06 13:50:34.901764: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...

    tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by node AllReduceGrads/NcclAllReduce (defined at usr/local/lib/python3.6/dist-packages/tensorpack/graph_builder/utils.py:154) with these attrs: [shared_name="c0", T=DT_FLOAT, num_devices=2, reduction="sum"] Registered devices: [CPU, XLA_CPU, XLA_GPU] Registered kernels: device='GPU'

     [[AllReduceGrads/NcclAllReduce]]

    opened by vaslamp 2
  • VOC Training and Data Scripts missing


    Hi @zizhaozhang , thanks for providing the code to this paper.

    I'm trying to replicate the VOC results, but the instructions here are incomplete. When will you provide the necessary scripts to train the VOC model, and is it possible for us to easily adapt prepare_coco_data.py to do so in the meantime?

    Thanks for the help.

    opened by varunnair18 2
  • Bump tensorflow-gpu from 1.14.0 to 2.9.3


    Bumps tensorflow-gpu from 1.14.0 to 2.9.3.

    Release notes and changelog

    Sourced from tensorflow-gpu's releases and changelog: releases 2.9.3, 2.9.2, and 2.8.4 each introduce several vulnerability fixes. (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 0
  • Bump numpy from 1.16.4 to 1.22.0


    Bumps numpy from 1.16.4 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)
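
    For example (our illustration, assuming NumPy >= 1.22):

        import numpy as np

        np.dtype("uint32")      # lowercase dtype strings are unaffected
        try:
            np.dtype("Uint32")  # removed numeric-style alias now raises
        except TypeError as err:
            print(err)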

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)
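
    The recommended replacements look like this (our sketch, not from the release notes):

        import pickle
        from io import StringIO
        import numpy as np

        # pickle.loads replaces the removed numpy.loads
        arr = np.arange(3)
        restored = pickle.loads(pickle.dumps(arr))

        # numpy.genfromtxt with usemask replaces ndfromtxt/mafromtxt:
        #   usemask=False -> plain ndarray (old ndfromtxt behavior)
        #   usemask=True  -> masked array  (old mafromtxt behavior)
        plain = np.genfromtxt(StringIO("1,2\n3,4\n"), delimiter=",", usemask=False)
        masked = np.genfromtxt(StringIO("1,2\n3,\n"), delimiter=",", usemask=True)
        print(restored, plain, masked, sep="\n")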

    ... (truncated)


    dependencies 
    opened by dependabot[bot] 0
  • Skipping cancelled dequeue attempt with queue not closed

    1. ERROR LOG (first epoch)

        [1210 18:09:10 @param.py:158] [HyperParamSetter] At global_step=0, learning_rate is set to 0.001000
        [1210 18:09:11 @prof.py:294] [HostMemoryTracker] Free RAM in before_train() is 238.12 GB.
        [1210 18:09:11 @stac_helper.py:83] ----------------------------------------------------------------------------------------------------
        [1210 18:09:11 @stac_helper.py:84] Model save path: result/VOC2007/instances_trainval
        [1210 18:09:11 @stac_helper.py:85] ----------------------------------------------------------------------------------------------------
        [1210 18:09:11 @eval.py:313] [EvalCallback] Will evaluate every 20 epochs
        [1210 18:09:28 @base.py:273] Start Epoch 1 ...
        0%| |0/500[00:00<?,?it/s]
        2021-12-10 18:09:43.544891: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
        2021-12-10 18:10:23.596973: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
        0%| |0/500[02:46<?,?it/s]
        2021-12-10 18:12:16.766932: W tensorflow/core/kernels/queue_base.cc:277] _0_QueueInput/input_queue: Skipping cancelled enqueue attempt with queue not closed
        Traceback (most recent call last):
          File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
            return fn(*args)
          File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
            options, feed_dict, fetch_list, target_list, run_metadata)
          File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
            run_metadata)
        tensorflow.python.framework.errors_impl.DeadlineExceededError: Timed out waiting for notification

    2. Environment Information:


        sys.platform          linux
        Python                3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
        Tensorpack            v0.9.8-61-g4ac2e22b-dirty
        Numpy                 1.16.4
        TensorFlow            1.14.0/v1.14.0-rc1-22-gaf24dc91b5
        TF Compiler Version   4.8.5
        TF CUDA support       True
        TF MKL support        False
        TF XLA support        False
        Nvidia Driver         /usr/lib64/libnvidia-ml.so.460.73.01
        CUDA                  /mnt/lustre/share/cuda-10.0/lib64/libcudart.so.10.0.130
        CUDNN                 /mnt/lustre/share/cuda-10.0/lib64/libcudnn.so.7.4.1
        NCCL
        CUDA_VISIBLE_DEVICES  1,2,3,4
        GPU 0,1,2,3,4,5,6,7   Tesla V100-SXM2-32GB
        Free RAM              344.40/376.39 GB
        CPU Count             48
        cv2                   4.1.1
        msgpack               1.0.3
        python-prctl          False


    opened by Liang-ZX 0
  • About the augmentation in teacher model

    In the first stage, the teacher model is trained with weak augmentation. However, in your experiments the model trained with strong augmentation outperforms the one trained with weak augmentation. Why not use the strongly augmented model as the teacher?
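
    For readers skimming the thread, the weak/strong asymmetry the question refers to, as a runnable toy sketch (every name here is hypothetical, not the repo's API):

        import random

        def weak_aug(image):    # e.g. flip only: keeps teacher boxes reliable
            return image

        def strong_aug(image):  # e.g. color jitter + cutout: heavy distortion
            return [v + random.gauss(0.0, 0.5) for v in image]

        def teacher_predict(image, tau=0.9):
            # stand-in for detector inference + confidence thresholding
            return [(v, 0.95) for v in image if abs(v) > tau]

        image = [1.2, 0.3, 2.0]
        pseudo_labels = teacher_predict(weak_aug(image))  # labels from the weak view
        student_view = strong_aug(image)                  # student trains on the hard view
        print(pseudo_labels, student_view)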

    opened by chenshi3 0
  • Training on a single GPU (Losses keep fluctuating and do not converge)

    Hi,

    I am training the Faster RCNN model on 10% of labelled COCO data. While training with 1 GPU, the losses don't converge. Based on an earlier issue (https://github.com/google-research/ssl_detection/issues/12), I understand that with 1 GPU the batch size is limited to 1 due to tensorpack constraints, which may be too small for the network to train and converge. If that's the case, what are the alternatives? Is the only option to move away from tensorpack in order to use a larger batch size?

    Any inputs/suggestions are more than welcome as I am a bit stuck at the moment and do not have access to more than 1 GPU.

    Regards, Chandra
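
    One generic workaround worth noting here is gradient accumulation: summing gradients over several single-image steps emulates a larger effective batch before each optimizer update. A hedged TF1-style sketch (matching the repo's TF 1.14, but not code from this repo; the toy loss is a stand-in for the detector loss):

        import tensorflow as tf  # TF1 graph mode

        ACCUM_STEPS = 8
        x = tf.Variable(3.0)
        loss = tf.square(x)  # stand-in for the per-image detector loss
        opt = tf.train.MomentumOptimizer(learning_rate=1e-2, momentum=0.9)

        tvars = tf.trainable_variables()
        accum = [tf.Variable(tf.zeros_like(v), trainable=False) for v in tvars]
        grads = tf.gradients(loss, tvars)
        accum_op = tf.group(*[a.assign_add(g) for a, g in zip(accum, grads)])
        apply_op = opt.apply_gradients(
            [(a / ACCUM_STEPS, v) for a, v in zip(accum, tvars)])
        zero_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accum])

        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for step in range(ACCUM_STEPS):
                sess.run(accum_op)  # one "micro-batch" per step
            sess.run(apply_op)      # one optimizer update per ACCUM_STEPS steps
            sess.run(zero_op)

    Whether this helps here depends on how cleanly it can be wired into tensorpack's training loop; lowering the learning rate in proportion to the smaller batch is another commonly tried adjustment.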

    opened by nuschandra 0
Owner
Google Research
Data-Uncertainty Guided Multi-Phase Learning for Semi-supervised Object Detection

An official implementation of paper Data-Uncertainty Guided Multi-Phase Learning for Semi-supervised Object Detection

null 11 Nov 23, 2022
CVPR2022 paper "Dense Learning based Semi-Supervised Object Detection"

[CVPR2022] DSL: Dense Learning based Semi-Supervised Object Detection. DSL is the first work on an Anchor-Free detector for Semi-Supervised Object Detection

Bhchen 69 Dec 8, 2022
PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection

Unbiased Teacher for Semi-Supervised Object Detection This is the PyTorch implementation of our paper: Unbiased Teacher for Semi-Supervised Object Detection

Facebook Research 366 Dec 28, 2022
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

This repo is the official implementation of "Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework". @inproceedings{zhou2021insta

null 34 Dec 31, 2022
Group R-CNN for Point-based Weakly Semi-supervised Object Detection (CVPR2022)

Group R-CNN for Point-based Weakly Semi-supervised Object Detection (CVPR2022) By Shilong Zhang*, Zhuoran Yu*, Liyang Liu*, Xinjiang Wang, Aojun Zhou,

Shilong Zhang 129 Dec 24, 2022
Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Tom-R.T.Kvalvaag 2 Dec 17, 2021
UniMoCo: Unsupervised, Semi-Supervised and Full-Supervised Visual Representation Learning

UniMoCo: Unsupervised, Semi-Supervised and Full-Supervised Visual Representation Learning. This is the official PyTorch implementation for the UniMoCo paper

dddzg 49 Jan 2, 2023
[CVPR 2021] MiVOS - Mask Propagation module. Reproduced STM (and better) with training code :star2:. Semi-supervised video object segmentation evaluation.

MiVOS (CVPR 2021) - Mask Propagation Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang [arXiv] [Paper PDF] [Project Page] [Papers with Code] This repo impleme

Rex Cheng 106 Jan 3, 2023
Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time

Semi Hand-Object Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time (CVPR 2021).

null 96 Dec 27, 2022
CoSMA: Convolutional Semi-Regular Mesh Autoencoder. From Paper "Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes"

Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes Implementation of CoSMA: Convolutional Semi-Regular Mesh Autoencoder arXiv p

Fraunhofer SCAI 10 Oct 11, 2022
Yolo object detection - Yolo object detection with python

How to run: download required files (make build_image, make download). Docker versio

null 3 Jan 26, 2022
code for paper "Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning" by Zhongzheng Ren*, Raymond A. Yeh*, Alexander G. Schwing.

Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning Overview This code is for paper: Not All Unlabeled Data are Equa

Jason Ren 22 Nov 23, 2022
PyTorch code for the paper: FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning This is the PyTorch implementation of our paper: FeatMatch: Feature-Based Augmentat

null 43 Nov 19, 2022
Code for CoMatch: Semi-supervised Learning with Contrastive Graph Regularization

CoMatch: Semi-supervised Learning with Contrastive Graph Regularization (Salesforce Research) This is a PyTorch implementation of the CoMatch paper [B

Salesforce 107 Dec 14, 2022
Semi-supervised Learning for Sentiment Analysis

Neural-Semi-supervised-Learning-for-Text-Classification-Under-Large-Scale-Pretraining. Code, models and datasets for "Neural Semi-supervised Learning fo

null 47 Jan 1, 2023
The implementation of the algorithm in the paper "Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data" published in ICML 2020.

DS3L This is the code for paper "Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data" published in ICML 2020. Setups The code is implem

Guolz 36 Oct 19, 2022
noisy labels; missing labels; semi-supervised learning; entropy; uncertainty; robustness and generalisation.

ProSelfLC: CVPR 2021 ProSelfLC: Progressive Self Label Correction for Training Robust Deep Neural Networks For any specific discussion or potential fu

amos_xwang 57 Dec 4, 2022
Semi Supervised Learning for Medical Image Segmentation, a collection of literature reviews and code implementations.

Semi-supervised-learning-for-medical-image-segmentation. Recently, semi-supervised image segmentation has become a hot topic in medical image computing.

Healthcare Intelligence Laboratory 1.3k Jan 3, 2023