A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility

Tensorpack

Last update: Jan 9, 2023


Overview

Tensorpack is a neural network training interface based on TensorFlow.

Features:

It's Yet Another TF high-level API, with speed and flexibility built together.

Focus on training speed.
- Speed comes for free with Tensorpack -- it uses TensorFlow in an efficient way with no extra overhead. On common CNNs, it runs training 1.2~5x faster than the equivalent Keras code. Your training can probably get faster if written with Tensorpack.
- Data-parallel multi-GPU/distributed training strategies are available off the shelf. They scale as well as Google's official benchmark.
- See tensorpack/benchmarks for some benchmark scripts.
Focus on large datasets.
- You don't usually need tf.data. Symbolic programming often makes data processing harder. Tensorpack helps you efficiently process large datasets (e.g. ImageNet) in pure Python with autoparallelization.
It's not a model wrapper.
- There are too many symbolic function wrappers in the world. Tensorpack includes only a few common models. But you can use any symbolic function library inside Tensorpack, including tf.layers/Keras/slim/tflearn/tensorlayer/....

See the tutorials and documentation to learn more about these features.

Examples:

We refuse toy examples. Instead of showing tiny CNNs trained on MNIST/Cifar10, we provide training scripts that reproduce well-known papers.

We refuse low-quality implementations. Unlike most open source repos which only implement papers, Tensorpack examples faithfully reproduce papers, demonstrating its flexibility for actual research.

Vision:

Train ResNet and other models on ImageNet
Train Mask/Faster R-CNN on COCO object detection
Unsupervised learning with Momentum Contrast (MoCo)
Generative Adversarial Network(GAN) variants, including DCGAN, InfoGAN, Conditional GAN, WGAN, BEGAN, DiscoGAN, Image to Image, CycleGAN
DoReFa-Net: train binary / low-bitwidth CNN on ImageNet
Fully-convolutional Network for Holistically-Nested Edge Detection(HED)
Spatial Transformer Networks on MNIST addition
Visualize CNN saliency maps
Similarity learning on MNIST

Reinforcement Learning:

Deep Q-Network(DQN) variants on Atari games, including DQN, DoubleDQN, DuelingDQN.
Asynchronous Advantage Actor-Critic(A3C) with demos on OpenAI Gym

Speech / NLP:

Install:

Dependencies:

Python 3.3+.
Python bindings for OpenCV. (Optional, but required by a lot of features)
TensorFlow ≥ 1.5, < 2
- TF is not required if you only want to use tensorpack.dataflow alone as a data processing library
- TF2 is supported if used in graph mode (and use tf.compat.v1 when needed)

pip install --upgrade git+https://github.com/tensorpack/tensorpack.git
# or add `--user` to install to user's local directories

Please note that tensorpack is not yet stable. If you use tensorpack in your code, remember to pin the exact version of tensorpack you use in your dependencies.
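As a rough illustration of checking such a pin at start-up, the sketch below compares the installed version against an expected string using only the standard library. The helper names and the version string "0.11" are placeholders for illustration, not values recommended by the project.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package, pinned):
    """True only if the package is installed at exactly the pinned version."""
    return installed_version(package) == pinned


# "0.11" is a placeholder pin chosen for this sketch.
if not matches_pin("tensorpack", "0.11"):
    print("tensorpack is missing or differs from the pinned version")
```

When installing from git, a version string may not identify a commit, so pinning the commit in the pip URL (pip accepts a `@<ref>` suffix on `git+https://` URLs) is another option.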

Citing Tensorpack:

If you use Tensorpack in your research or wish to refer to the examples, please cite with:

@misc{wu2016tensorpack,
  title={Tensorpack},
  author={Wu, Yuxin and others},
  howpublished={\url{https://github.com/tensorpack/}},
  year={2016}
}

Comments

Run Inference after training
Hello! I am sorry if this is unrelated to Tensorpack. I ran ResNet on the Cifar10 dataset with Trained Ternary Quantization. Now I don't know how to run inference on the saved checkpoint after training. I have already read "Don't Use Training Metagraph for Inference" in the Tensorpack documentation. However, I still don't know exactly how to use the snippet below:

a, b = tf.placeholder(...), tf.placeholder(...)
with TowerContext('', is_training=False):
    model.build_graph(a, b)

Could you guide me on how to do that? Thank you in advance!
usage
opened by minhson 58

error running alexnet_dorefa.py

environment: tensorflow1.13.0(in docker) cuda8.0 cudnn6 anaconda2

I get an error running alexnet_dorefa.py. Strangely, /root/tensorpack_data contains a caffe_ilsvrc12.tar.gz file that is only 4kb in size, although it should be 17MB. This is a little confusing to me. Any help is appreciated! @ppwwyyxx the error looks like this:

root@997991b14e71:/data/home/users/ccc/projects/tensorpack/examples/DoReFa-Net# ./alexnet-dorefa.py --dorefa 1,2,6 --data /data/data/ImageNetOrigin --gpu 4,5,6,7
/root/anaconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
[0703 06:54:57 @logger.py:109] WRN Log directory train_log/alexnet-dorefa-1,2,6 exists! Use 'd' to delete it. 
[0703 06:54:57 @logger.py:112] WRN If you're resuming from a previous run, you can choose to keep it.
Press any other key to exit. 
Select Action: k (keep) / d (delete) / q (quit):d
[0703 06:54:58 @logger.py:74] Argv: ./alexnet-dorefa.py --dorefa 1,2,6 --data /data/data/ImageNetOrigin --gpu 4,5,6,7
[0703 06:54:58 @alexnet-dorefa.py:222] Batch per tower: 64
[0703 06:54:58 @fs.py:88] WRN Env var $TENSORPACK_DATASET not set, using /root/tensorpack_data for datasets.
caffe_ilsvrc12.tar.gz: 8.19kB [00:00, 26.0kB/s]
Succesfully downloaded caffe_ilsvrc12.tar.gz. 2942 bytes.
Traceback (most recent call last):
  File "./alexnet-dorefa.py", line 224, in <module>
    config = get_config()
  File "./alexnet-dorefa.py", line 147, in get_config
    data_train = get_data('train')
  File "./alexnet-dorefa.py", line 143, in get_data
    args.data, dataset_name, BATCH_SIZE, augmentors)
  File "/data/home/users/ccc/projects/tensorpack/examples/DoReFa-Net/imagenet_utils.py", line 101, in get_imagenet_dataflow
    ds = dataset.ILSVRC12(datadir, name, shuffle=True)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorpack/dataflow/dataset/ilsvrc.py", line 247, in __init__
    dir, name, meta_dir, shuffle, dir_structure)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorpack/dataflow/dataset/ilsvrc.py", line 158, in __init__
    meta = ILSVRCMeta(meta_dir)
  File "/root/anaconda2/lib/python2.7/site-packages/tensorpack/dataflow/dataset/ilsvrc.py", line 32, in __init__
    self._download_caffe_meta()
  File "/root/anaconda2/lib/python2.7/site-packages/tensorpack/dataflow/dataset/ilsvrc.py", line 57, in _download_caffe_meta
    tarfile.open(fpath, 'r:gz').extractall(self.dir)
  File "/root/anaconda2/lib/python2.7/tarfile.py", line 1693, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/root/anaconda2/lib/python2.7/tarfile.py", line 1751, in gzopen
    raise ReadError("not a gzip file")
tarfile.ReadError: not a gzip file
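
The log above reports a download of only 2942 bytes followed by "not a gzip file", which suggests the saved file is not the real archive (possibly an error page, though that is a guess). A quick way to check is to look at the first two bytes, since every gzip stream starts with 0x1f 0x8b. The sketch below uses only the standard library; `looks_like_gzip` and the temporary file names are made up for the illustration.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.tar.gz")
    bad = os.path.join(tmp, "bad.tar.gz")
    with gzip.open(good, "wb") as f:   # a real gzip stream
        f.write(b"payload")
    with open(bad, "wb") as f:         # e.g. an HTML error page saved under a .tar.gz name
        f.write(b"<html>error page</html>")
    good_ok = looks_like_gzip(good)
    bad_ok = looks_like_gzip(bad)

print(good_ok, bad_ok)  # True False
```

If the check fails for the cached file, deleting it and downloading the archive again (or fetching it manually) would be the next thing to try.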

examples

opened by brisker 55

Quantizing Gradients - Meaning of max0() operator in DoReFa v2 paper?
Thank you for your help so far.

(1) In section 2.5 on quantizing gradients you use an operator called max₀ but do not define it. I did not find a definition in the XNOR or BNN papers either. What does this operator do? How is it different from the regular max() operator?

(2) Second, you say that dr / 2max₀(|dr|) + 1/2 is an affine transform to map the gradient into [0,1], but it seems like in your code you apply an additional step to manually clip the values. Why do you need this additional step?

Code: https://github.com/ppwwyyxx/tensorpack/blob/master/examples/DoReFa-Net/dorefa.py

def grad_fg(op, x):
    rank = x.get_shape().ndims
    assert rank is not None
    maxx = tf.reduce_max(tf.abs(x), list(range(1,rank)), keep_dims=True)
    x = x / maxx
    n = float(2**bitG-1)
    x = x * 0.5 + 0.5 + tf.random_uniform(
        tf.shape(x), minval=-0.5/n, maxval=0.5/n)
    x = tf.clip_by_value(x, 0.0, 1.0)  # this is the extra step not in the paper
    x = quantize(x, bitG) - 0.5
    return x * maxx * 2

(3) I am also having trouble understanding this line, could you please explain? - maxx = tf.reduce_max(tf.abs(x), list(range(1,rank)), keep_dims=True).

It seems like list(range(1,rank)) is somehow related to your statement that "Here dr = ∂c/∂r is the back-propagated gradient of the output r of some layer, and the maximum is taken over all axis of the gradient tensor dr except for the mini-batch axis (therefore each instance in a mini-batch will have its own scaling factor)", but I do not understand this sentence either. Thank you for your help!
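
For questions (2) and (3), a small NumPy sketch may help. NumPy stands in for TensorFlow here, and the toy tensor and bitG = 2 are made-up values. It shows that reducing over axes 1..rank-1 with keepdims leaves one scaling factor per mini-batch instance, and that the affine map alone lands in [0, 1] while the added uniform noise of up to ±0.5/n can push values slightly outside that range, which is presumably why the quoted code clips.

```python
import numpy as np


def per_instance_max_abs(x):
    """Max of |x| over every axis except axis 0 (the mini-batch axis)."""
    reduce_axes = tuple(range(1, x.ndim))  # NumPy analogue of list(range(1, rank))
    return np.max(np.abs(x), axis=reduce_axes, keepdims=True)


# Toy "gradient": a batch of 2 instances, each of shape (2, 2). Values are made up.
dr = np.array([[[0.5, 2.0], [-1.0, 0.25]],
               [[-0.1, 0.05], [0.2, -0.4]]])

maxx = per_instance_max_abs(dr)   # shape (2, 1, 1): one factor per instance (2.0 and 0.4)
mapped = dr / maxx * 0.5 + 0.5    # affine map: every value now lies in [0, 1]

bitG = 2                          # hypothetical gradient bit width
n = float(2 ** bitG - 1)
noise_bound = 0.5 / n             # the uniform noise lies in [-noise_bound, noise_bound]

# Worst cases: an element already at 1 (or 0) plus the largest (or smallest) noise.
highest = mapped.max() + noise_bound
lowest = mapped.min() - noise_bound
clipped = np.clip([lowest, highest], 0.0, 1.0)

print(maxx.shape, mapped.min(), mapped.max())
print(lowest, highest, clipped)
```

Because keepdims retains the reduced axes with size 1, `dr / maxx` broadcasts each instance's factor over that instance only.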
examples
opened by the-bobo 35
train on an Atari game: Breakout-v0 (Utilization of gpu and convergence)

Hello Yuxin,

I am training on an Atari game and I noticed that GPU utilization (nvidia-smi -l) is very low (~10-50%). Could you comment on that, please?

nvidia-smi-l.txt

Could you also tell me whether my training is going all right, please? It has been running for quite a long time and I would like to make sure that there is progress.

Part of the output:
................
[0120 23:32:23 @timer.py:46] Epoch 273 (global_step 1638000) finished, time:2611.25sec.
[0120 23:32:24 @stats.py:101] SummaryGradient/conv0/W/rms: 0.0015963
[0120 23:32:24 @stats.py:101] SummaryGradient/conv0/b/rms: 0.034784
[0120 23:32:24 @stats.py:101] SummaryGradient/conv1/W/rms: 0.00075034
[0120 23:32:24 @stats.py:101] SummaryGradient/conv1/b/rms: 0.014863
[0120 23:32:24 @stats.py:101] SummaryGradient/conv2/W/rms: 0.00071202
[0120 23:32:24 @stats.py:101] SummaryGradient/conv2/b/rms: 0.0056869
[0120 23:32:24 @stats.py:101] SummaryGradient/conv3/W/rms: 0.00084989
[0120 23:32:24 @stats.py:101] SummaryGradient/conv3/b/rms: 0.0093001
[0120 23:32:24 @stats.py:101] SummaryGradient/fc-pi/W/rms: 0.0036259
[0120 23:32:24 @stats.py:101] SummaryGradient/fc-pi/b/rms: 0.0050046
[0120 23:32:24 @stats.py:101] SummaryGradient/fc-v/W/rms: 0.023725
[0120 23:32:24 @stats.py:101] SummaryGradient/fc-v/b/rms: 0.030802
[0120 23:32:24 @stats.py:101] SummaryGradient/fc0/W/rms: 0.00015396
[0120 23:32:24 @stats.py:101] SummaryGradient/fc0/b/rms: 0.0010555
[0120 23:32:24 @stats.py:101] SummaryGradient/prelu/alpha/rms: 0.083734
[0120 23:32:24 @stats.py:101] async_global_step: 1.638e+06
[0120 23:32:24 @stats.py:101] cost: 0.010786
[0120 23:32:24 @stats.py:101] input_queue_size: 2.3367e-37
[0120 23:32:24 @stats.py:101] learning_rate: 0.0001
[0120 23:32:24 @stats.py:101] policy_loss: -0.57677
[0120 23:32:24 @stats.py:101] predict_reward: 2.8047
[0120 23:32:24 @stats.py:101] rms_advantage: 0.20093
[0120 23:32:24 @stats.py:101] value_loss: 2.9039
[0120 23:32:24 @stats.py:101] xentropy_loss: -189.29
[0120 23:32:25 @timer.py:42] Start Epoch 274 (global_step 1644000) ...
100%|#####################################################################|6000/6000[43:57<00:00, 2.22it/s]
[0121 00:16:22 @timer.py:46] Epoch 274 (global_step 1644000) finished, time:2637.45sec.
[2017-01-21 00:16:24,998] Making new env: Breakout-v0
[2017-01-21 00:16:25,189] Making new env: Breakout-v0
100%|#########################################################################|16/16[06:02<00:00, 0.05it/s]
[0121 00:22:28 @common.py:76] Waiting for all the workers to finish the last run...
[0121 00:22:28 @stats.py:101] SummaryGradient/conv0/W/rms: 0.0017033
[0121 00:22:28 @stats.py:101] SummaryGradient/conv0/b/rms: 0.030689
[0121 00:22:28 @stats.py:101] SummaryGradient/conv1/W/rms: 0.00074152
[0121 00:22:28 @stats.py:101] SummaryGradient/conv1/b/rms: 0.01373
[0121 00:22:28 @stats.py:101] SummaryGradient/conv2/W/rms: 0.00068949
[0121 00:22:28 @stats.py:101] SummaryGradient/conv2/b/rms: 0.005354
[0121 00:22:28 @stats.py:101] SummaryGradient/conv3/W/rms: 0.00080288
[0121 00:22:28 @stats.py:101] SummaryGradient/conv3/b/rms: 0.0079926
[0121 00:22:28 @stats.py:101] SummaryGradient/fc-pi/W/rms: 0.0033409
[0121 00:22:28 @stats.py:101] SummaryGradient/fc-pi/b/rms: 0.0056811
[0121 00:22:28 @stats.py:101] SummaryGradient/fc-v/W/rms: 0.01776
[0121 00:22:28 @stats.py:101] SummaryGradient/fc-v/b/rms: 0.026071
[0121 00:22:28 @stats.py:101] SummaryGradient/fc0/W/rms: 0.00015412
[0121 00:22:28 @stats.py:101] SummaryGradient/fc0/b/rms: 0.001081
[0121 00:22:28 @stats.py:101] SummaryGradient/prelu/alpha/rms: 0.088892
[0121 00:22:28 @stats.py:101] async_global_step: 1.644e+06
[0121 00:22:28 @stats.py:101] cost: 0.0021201
[0121 00:22:28 @stats.py:101] input_queue_size: 0.00082628
[0121 00:22:28 @stats.py:101] learning_rate: 0.0001
[0121 00:22:28 @stats.py:101] max_score: 864
[0121 00:22:28 @stats.py:101] mean_score: 543.19
[0121 00:22:28 @stats.py:101] policy_loss: -1.5347
[0121 00:22:28 @stats.py:101] predict_reward: 2.6608
[0121 00:22:28 @stats.py:101] rms_advantage: 0.19512
[0121 00:22:28 @stats.py:101] value_loss: 2.762
[0121 00:22:28 @stats.py:101] xentropy_loss: -191.18
[0121 00:22:28 @group.py:42] Callbacks took 364.255 sec in total.
Periodic-Evaluator: 363.350sec
[0121 00:22:28 @timer.py:42] Start Epoch 275 (global_step 1650000) ...
......................
examples

opened by ghost 33
Train Faster RCNN
I get an error when training Faster R-CNN based on your example; however, with your model, I am able to evaluate its performance and get the same results you posted on GitHub.

Always include the following:

What you did. (command you run if using examples; post or describe your code if not)

./examples/FasterRCNN/train.py --load snapshots/tensorpack/COCO-ResNet50-FasterRCNN.npz --gpu 2,3 --datadir /path/to/COCO14 --logdir snapshots/fasterRCNN-ResNet50

What you observed. (training logs)

[1116 16:23:10 @graph.py:70] Running Op sync_variables_from_main_tower ...
2017-11-16 16:23:10.457645: E tensorflow/stream_executor/cuda/cuda_driver.cc:1299] could not retrieve CUDA device count: CUDA_ERROR_NOT_INITIALIZED
[1116 16:23:14 @param.py:144] After epoch 0, learning_rate will change to 0.00300000
[1116 16:23:15 @base.py:209] Start Epoch 1 ...

and then the program sits idle forever. Is it related to the line about CUDA_ERROR_NOT_INITIALIZED?

Your environment (TF version, GPUs), if it matters. TF version 1.4.0, Python-3.6, CUDA 9, CUDNN-7. Tensorpack version: the newest commit.

Others:

If I comment out ds = PrefetchDataZMQ(ds, 1) in the get_train_dataflow function of data.py, training runs. If I replace ds = PrefetchDataZMQ(ds, 1) with ds = PrefetchData(ds, 500, 1), it works as well.

Thanks.
opened by chunfuchen 32

Build ZMQ-operator

I tried to compile your custom-operator on my machine and get

Compiling user ops ...
make: Entering directory '/home/patwie/git/tensorpack/tensorpack/user_ops'
[dep] zmq_recv_op.cc ...
In file included from zmq_conn.h:8:0,
                 from zmq_recv_op.cc:10:
zmq.hpp:84:36: error: missing binary operator before token "("
 #if ZMQ_VERSION >= ZMQ_MAKE_VERSION(3, 3, 0)

Could you briefly comment on which zmq version you use? I had to change

//#include <zmq.hpp> into
#include "zmq.hpp"

and use https://github.com/zeromq/cppzmq

But I am still getting the error.

enhancement

opened by PatWie 32

Bug Reports: How to deal with ValueError: Cannot feed value of shape (224, 224, 3) for Tensor 'input:0', which has shape '(?, 224, 224, 3)'

The first run after rebooting the server seems to be OK. Subsequent attempts give me this error message.

The log is as below:

[1026 20:26:32 @logger.py:74] Argv: main.py
[1026 20:26:32 @tensor_net.py:46] Running on 2 towers. Batch size per tower: 64
[1026 20:26:32 @fs.py:89] WRN Env var $TENSORPACK_DATASET not set, using /home/hgao/tensorpack_data for datasets.
[1026 20:26:34 @prefetch.py:263] [PrefetchDataZMQ] Will fork a dataflow more than one times. This assumes the datapoints are i.i.d.
[1026 20:26:34 @ilsvrc.py:118] Assuming directory /tempspace2/hgao/data/imagenet/val has original structure.
[1026 20:26:34 @param.py:189] Use ./logdir/hyper.txt to set hyperparam: 'learning_rate'.
[1026 20:26:34 @inference_runner.py:83] InferenceRunner will eval on an InputSource of size 782
[1026 20:27:04 @input_source.py:178] Setting up the queue 'QueueInput/input_queue' for CPU prefetching ...
[1026 20:27:04 @input_source.py:459] Setting up StagingArea for GPU prefetching ...
[1026 20:27:04 @training.py:41] Training a model of 2 towers
[1026 20:27:04 @training.py:92] Building graph for training tower 0 on device LeastLoadedDeviceSetter-/gpu:0...
[1026 20:27:06 @regularize.py:108] Add REGULARIZATION_LOSSES of 58 tensors on the total cost.
[1026 20:27:07 @training.py:92] Building graph for training tower 1 on device LeastLoadedDeviceSetter-/gpu:1...
[1026 20:27:08 @regularize.py:108] Add REGULARIZATION_LOSSES of 58 tensors on the total cost.
[1026 20:27:10 @model_utils.py:47] Model Parameters:
name shape dim device

conv_s/weights:0 [3, 3, 3, 32] 864 /device:GPU:0 conv_s/batch_norm/gamma:0 [32] 32 /device:GPU:1 conv_s/batch_norm/beta:0 [32] 32 /device:GPU:1 conv_1_0/conv1/conv/weights:0 [3, 3, 32, 1] 288 /device:GPU:1 conv_1_0/conv1/batch_norm/gamma:0 [32] 32 /device:GPU:1 conv_1_0/conv1/batch_norm/beta:0 [32] 32 /device:GPU:1 conv_1_0/conv2/weights:0 [1, 1, 32, 64] 2048 /device:GPU:1 conv_1_0/conv2/batch_norm/gamma:0 [64] 64 /device:GPU:0 conv_1_0/conv2/batch_norm/beta:0 [64] 64 /device:GPU:0 conv_1_1/conv1/conv/weights:0 [3, 3, 64, 1] 576 /device:GPU:0 conv_1_1/conv1/batch_norm/gamma:0 [64] 64 /device:GPU:0 conv_1_1/conv1/batch_norm/beta:0 [64] 64 /device:GPU:0 conv_1_1/conv2/weights:0 [1, 1, 64, 128] 8192 /device:GPU:0 conv_1_1/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_1_1/conv2/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_1_2/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:1 conv_1_2/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_1_2/conv1/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_1_2/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:1 conv_1_2/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_1_2/conv2/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_1_3/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:0 conv_1_3/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_1_3/conv1/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_1_3/conv2/weights:0 [1, 1, 128, 256] 32768 /device:GPU:0 conv_1_3/conv2/batch_norm/gamma:0 [256] 256 /device:GPU:1 conv_1_3/conv2/batch_norm/beta:0 [256] 256 /device:GPU:1 conv_1_4/conv1/conv/weights:0 [3, 3, 256, 1] 2304 /device:GPU:1 conv_1_4/conv1/batch_norm/gamma:0 [256] 256 /device:GPU:1 conv_1_4/conv1/batch_norm/beta:0 [256] 256 /device:GPU:1 conv_1_4/conv2/weights:0 [1, 1, 256, 256] 65536 /device:GPU:1 conv_1_4/conv2/batch_norm/gamma:0 [256] 256 /device:GPU:0 conv_1_4/conv2/batch_norm/beta:0 [256] 256 /device:GPU:0 conv_1_5/conv1/conv/weights:0 [3, 3, 256, 1] 2304 /device:GPU:0 
conv_1_5/conv1/batch_norm/gamma:0 [256] 256 /device:GPU:0 conv_1_5/conv1/batch_norm/beta:0 [256] 256 /device:GPU:0 conv_1_5/conv2/weights:0 [1, 1, 256, 512] 131072 /device:GPU:0 conv_1_5/conv2/batch_norm/gamma:0 [512] 512 /device:GPU:1 conv_1_5/conv2/batch_norm/beta:0 [512] 512 /device:GPU:1 conv_2/group_0_conv0/conv/weights:0 [1, 1, 4, 1, 1] 4 /device:GPU:1 conv_2/group_0/conv_0/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:1 conv_2/group_0/conv_0/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_0/conv1/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_0/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:1 conv_2/group_0/conv_0/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_0/conv2/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_1/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:1 conv_2/group_0/conv_1/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_1/conv1/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_1/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:1 conv_2/group_0/conv_1/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_1/conv2/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_2/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:1 conv_2/group_0/conv_2/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_2/conv1/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_2/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:1 conv_2/group_0/conv_2/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_2/conv2/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_3/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:1 conv_2/group_0/conv_3/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_3/conv1/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_3/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:1 
conv_2/group_0/conv_3/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_3/conv2/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_4/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:1 conv_2/group_0/conv_4/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_4/conv1/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_0/conv_4/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:1 conv_2/group_0/conv_4/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_0/conv_4/conv2/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_1_conv0/conv/weights:0 [1, 1, 4, 1, 1] 4 /device:GPU:0 conv_2/group_1/conv_0/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:0 conv_2/group_1/conv_0/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_1/conv_0/conv1/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_1/conv_0/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:0 conv_2/group_1/conv_0/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_1/conv_0/conv2/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_1/conv_1/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:1 conv_2/group_1/conv_1/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_1/conv_1/conv1/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_1/conv_1/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:1 conv_2/group_1/conv_1/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_1/conv_1/conv2/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_1/conv_2/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:0 conv_2/group_1/conv_2/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_1/conv_2/conv1/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_1/conv_2/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:0 conv_2/group_1/conv_2/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_1/conv_2/conv2/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_1/conv_3/conv1/conv/weights:0 [3, 3, 128, 
1] 1152 /device:GPU:1 conv_2/group_1/conv_3/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_1/conv_3/conv1/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_1/conv_3/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:1 conv_2/group_1/conv_3/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_1/conv_3/conv2/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_1/conv_4/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:0 conv_2/group_1/conv_4/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_1/conv_4/conv1/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_1/conv_4/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:0 conv_2/group_1/conv_4/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_1/conv_4/conv2/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_2_conv0/conv/weights:0 [1, 1, 4, 1, 1] 4 /device:GPU:1 conv_2/group_2/conv_0/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:1 conv_2/group_2/conv_0/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_2/conv_0/conv1/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_2/conv_0/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:1 conv_2/group_2/conv_0/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_2/conv_0/conv2/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_2/conv_1/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:0 conv_2/group_2/conv_1/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_2/conv_1/conv1/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_2/conv_1/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:0 conv_2/group_2/conv_1/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_2/conv_1/conv2/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_2/conv_2/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:1 conv_2/group_2/conv_2/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_2/conv_2/conv1/batch_norm/beta:0 [128] 128 /device:GPU:1 
conv_2/group_2/conv_2/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:1 conv_2/group_2/conv_2/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_2/conv_2/conv2/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_2/conv_3/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:0 conv_2/group_2/conv_3/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_2/conv_3/conv1/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_2/conv_3/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:0 conv_2/group_2/conv_3/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_2/conv_3/conv2/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_2/conv_4/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:1 conv_2/group_2/conv_4/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_2/conv_4/conv1/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_2/conv_4/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:1 conv_2/group_2/conv_4/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_2/conv_4/conv2/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_3_conv0/conv/weights:0 [1, 1, 4, 1, 1] 4 /device:GPU:0 conv_2/group_3/conv_0/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:0 conv_2/group_3/conv_0/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_3/conv_0/conv1/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_3/conv_0/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:0 conv_2/group_3/conv_0/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_3/conv_0/conv2/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_3/conv_1/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:1 conv_2/group_3/conv_1/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_3/conv_1/conv1/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_3/conv_1/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:1 conv_2/group_3/conv_1/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_3/conv_1/conv2/batch_norm/beta:0 
[128] 128 /device:GPU:0 conv_2/group_3/conv_2/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:0 conv_2/group_3/conv_2/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_3/conv_2/conv1/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_3/conv_2/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:0 conv_2/group_3/conv_2/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_3/conv_2/conv2/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_3/conv_3/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:1 conv_2/group_3/conv_3/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_3/conv_3/conv1/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_2/group_3/conv_3/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:1 conv_2/group_3/conv_3/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_3/conv_3/conv2/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_3/conv_4/conv1/conv/weights:0 [3, 3, 128, 1] 1152 /device:GPU:0 conv_2/group_3/conv_4/conv1/batch_norm/gamma:0 [128] 128 /device:GPU:0 conv_2/group_3/conv_4/conv1/batch_norm/beta:0 [128] 128 /device:GPU:0 conv_2/group_3/conv_4/conv2/weights:0 [1, 1, 128, 128] 16384 /device:GPU:0 conv_2/group_3/conv_4/conv2/batch_norm/gamma:0 [128] 128 /device:GPU:1 conv_2/group_3/conv_4/conv2/batch_norm/beta:0 [128] 128 /device:GPU:1 conv_3_0/conv1/conv/weights:0 [3, 3, 512, 1] 4608 /device:GPU:1 conv_3_0/conv1/batch_norm/gamma:0 [512] 512 /device:GPU:1 conv_3_0/conv1/batch_norm/beta:0 [512] 512 /device:GPU:1 conv_3_0/conv2/weights:0 [1, 1, 512, 1024] 524288 /device:GPU:1 conv_3_0/conv2/batch_norm/gamma:0 [1024] 1024 /device:GPU:0 conv_3_0/conv2/batch_norm/beta:0 [1024] 1024 /device:GPU:0 conv_3_1/conv1/conv/weights:0 [3, 3, 1024, 1] 9216 /device:GPU:0 conv_3_1/conv1/batch_norm/gamma:0 [1024] 1024 /device:GPU:0 conv_3_1/conv1/batch_norm/beta:0 [1024] 1024 /device:GPU:0 conv_3_1/conv2/weights:0 [1, 1, 1024, 1024] 1048576 /device:GPU:0 conv_3_1/conv2/batch_norm/gamma:0 [1024] 1024 /device:GPU:1 
conv_3_1/conv2/batch_norm/beta:0 [1024] 1024 /device:GPU:1
out/pool/batch_norm/gamma:0 [1024] 1024 /device:GPU:1
out/pool/batch_norm/beta:0 [1024] 1024 /device:GPU:1
out/dense/weights:0 [1024, 1000] 1024000 /device:GPU:1
out/dense/biases:0 [1000] 1000 /device:GPU:0
Total #vars=179, #param=3251000 (12.40 MB assuming all float32)
[1026 20:27:10 @base.py:207] Setup callbacks graph ...
[1026 20:27:11 @input_source.py:178] Setting up the queue 'DataParallelInferenceRunner/QueueInput/input_queue' for CPU prefetching ...
[1026 20:27:11 @predictor_factory.py:54] Building predictor tower 'InferenceTower0' on device /gpu:0 ...
[1026 20:27:12 @predictor_factory.py:54] Building predictor tower 'InferenceTower1' on device /gpu:1 ...
[1026 20:27:13 @summary.py:34] Maintain moving average summary of 4 tensors.
[1026 20:27:13 @graph.py:91] Applying collection UPDATE_OPS of 232 ops.
[1026 20:27:16 @base.py:212] Creating the session ...
[1026 20:27:19 @base.py:216] Initializing the session ...
[1026 20:27:19 @base.py:223] Graph Finalized.
[1026 20:27:21 @concurrency.py:36] Starting EnqueueThread DataParallelInferenceRunner/QueueInput/input_queue ...
[1026 20:27:21 @concurrency.py:36] Starting EnqueueThread QueueInput/input_queue ...
[1026 20:27:21 @input_source.py:418] Pre-filling staging area ...
[1026 20:27:21 @input_source.py:140] ERR Exception in EnqueueThread DataParallelInferenceRunner/QueueInput/input_queue:
Traceback (most recent call last):
  File "/tempspace/hgao/py3.6/lib/python3.6/site-packages/tensorpack/input_source/input_source.py", line 133, in run
    self.op.run(feed_dict=feed)
  File "/tempspace/hgao/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2084, in run
    _run_using_default_session(self, feed_dict, self.graph, session)
  File "/tempspace/hgao/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 4542, in _run_using_default_session
    session.run(operation, feed_dict)
  File "/tempspace/hgao/py3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/tempspace/hgao/py3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1096, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (224, 224, 3) for Tensor 'input:0', which has shape '(?, 224, 224, 3)'
[1026 20:27:22 @input_source.py:146] EnqueueThread DataParallelInferenceRunner/QueueInput/input_queue Exited.
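
The error message says that a single image of shape (224, 224, 3) was fed to a placeholder that expects a leading batch axis. One possible cause, which is an assumption since the thread's resolution is not shown here, is that the dataflow given to the inference runner yields unbatched datapoints. The NumPy sketch below illustrates the shape difference; `batch_datapoints` is a hypothetical helper written for this example, not a Tensorpack function (Tensorpack's own `BatchData` dataflow serves this purpose).

```python
import numpy as np


def batch_datapoints(datapoints, batch_size):
    """Group (image, label) datapoints into (images, labels) batches."""
    images, labels = [], []
    for image, label in datapoints:
        images.append(image)
        labels.append(label)
        if len(images) == batch_size:
            yield np.stack(images), np.asarray(labels)
            images, labels = [], []
    if images:  # final, smaller batch
        yield np.stack(images), np.asarray(labels)


single = np.zeros((224, 224, 3), dtype=np.float32)
with_batch_axis = single[np.newaxis]   # adds a leading axis of size 1

toy_data = [(single, i) for i in range(5)]
batch_shapes = [imgs.shape for imgs, _ in batch_datapoints(toy_data, batch_size=2)]

print(single.shape)           # (224, 224, 3): no batch axis
print(with_batch_axis.shape)  # (1, 224, 224, 3): compatible with (?, 224, 224, 3)
print(batch_shapes)
```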

opened by HongyangGao 31
MultiProcessRunner RuntimeError
If you're asking about an unexpected problem which you do not know the root cause, use this template. PLEASE DO NOT DELETE THIS TEMPLATE, FILL IT:

If you already know the root cause to your problem, feel free to delete everything in this template.

1. What you did:

(1) If you're using examples, what's the command you run:

(2) If you're using examples, have you made any changes to the examples? Paste git status; git diff here:

(3) If not using examples, tell us what you did:

It's always better to copy-paste what you did than to describe them.

Please try to provide enough information to let other reproduce your issues. Without reproducing the issue, we may not be able to investigate it.

I tried to follow the "Efficient Dataflow" tutorial, continuing from https://github.com/tensorpack/tensorpack/issues/1209.

2. What you observed:

(1) Include the ENTIRE logs here:

It's always better to copy-paste what you observed instead of describing them.

It's always better to paste as much as possible, although sometimes a partial log is OK.

Tensorpack typically saves stdout to its training log. If stderr is relevant, you can run a command with my_command 2>&1 | tee logs.txt to save both stdout and stderr to one file.

[0528 10:55:08 @parallel.py:195] WRN MultiProcessRunner does support Windows. However, Windows requires more strict picklability on processes, which may lead of failure on some of the code.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\multiprocessing\spawn.py", line 115, in _main
    prepare(preparation_data)
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\multiprocessing\spawn.py", line 226, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\multiprocessing\spawn.py", line 278, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\runpy.py", line 254, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\AI_Workspace\z_debug\load_lmdb.py", line 79, in <module>
    load_lmdb3()
  File "C:\AI_Workspace\z_debug\load_lmdb.py", line 69, in load_lmdb3
    ds = MultiProcessRunner(ds, 5000, 1) # NOTE: PrefetchData() deprecated in May 2019
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\site-packages\tensorpack\dataflow\parallel.py", line 214, in __init__
    start_proc_mask_signal(self.procs)
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\site-packages\tensorpack\utils\concurrency.py", line 244, in start_proc_mask_signal
    p.start()
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\site-packages\tensorpack\utils\concurrency.py", line 216, in mask_sigint
    yield True
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\site-packages\tensorpack\utils\concurrency.py", line 244, in start_proc_mask_signal
    p.start()
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\multiprocessing\context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\multiprocessing\popen_spawn_win32.py", line 34, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\multiprocessing\spawn.py", line 144, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\dps42\AppData\Local\Continuum\miniconda3\envs\dps42_dev\lib\multiprocessing\spawn.py", line 137, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

I will attach the code here: z_debug.zip

But please notice that the LMDB file I'm using is too large to be attached to the zip file. The LMDB file was created from the same "debug2.py" but with more images and data entries.

In load_lmdb3(), the call to MultiProcessRunner() crashes with a RuntimeError. Could this be another Windows issue? I got the same error before PrefetchData() was renamed to MultiProcessRunner().
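
For reference, the idiom that the RuntimeError message asks for can be sketched with the standard library alone. This is a minimal illustration, not tensorpack code: `square` and `run_pool` are hypothetical names, and in the script above the body of `run_pool` would correspond to the code that constructs MultiProcessRunner. The point is only that any code which starts child processes is reached through the `if __name__ == '__main__':` guard, so that a child process re-importing the main module (as happens with the "spawn" start method on Windows) does not try to start processes again.

```python
import multiprocessing as mp


def square(x):
    # Work done in a child process; it must be defined at module level
    # so that it can be pickled and found by the child.
    return x * x


def run_pool():
    # Hypothetical stand-in for the code that starts worker processes.
    with mp.Pool(2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Only the original process reaches this branch. A spawned child that
    # re-imports this file sees a different __name__ and skips it.
    mp.freeze_support()  # only needed when freezing to an executable
    print(run_pool())
```

Whether this alone resolves the crash reported here depends on the rest of the script, for example on whether every object handed to the child processes can be pickled.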

(2) Other observations, if any: For example, CPU/GPU utilization, output images, tensorboard curves, if relevant to your issue.

3. What you expected, if not obvious.

If you expect higher speed, please read http://tensorpack.readthedocs.io/tutorial/performance-tuning.html before posting.

If you expect certain accuracy, only in one of the two conditions can we help with it: (1) You're unable to reproduce the accuracy documented in tensorpack examples. (2) It appears to be a tensorpack bug.

Otherwise, how to train a model to certain accuracy is a machine learning question. We do not answer machine learning questions and it is your responsibility to figure out how to make your models more accurate.

4. Your environment:

Paste the output of this command: python -c 'import tensorpack.tfutils as u; print(u.collect_env_info())' If this command failed, tell us your version of Python/TF/tensorpack.

You can install Tensorpack master by pip install -U git+https://github.com/ppwwyyxx/tensorpack.git and see if your issue is already solved.

If you're not using tensorpack under a normal command line shell (e.g., using an IDE or jupyter notebook), please retry under a normal command line shell.

Include relevant hardware information, e.g. number of GPUs used for training, amount of RAM.

You may often want to provide extra information related to your issue, but at the minimum please try to provide the above information accurately to save effort in the investigation.

Windows 10. I think no GPU was used at the moment.
enhancement
opened by dps42 30
how to adapt model-agnostic meta learning in tensorpack
Hello,

I would like to do model-agnostic meta-learning (MAML) in tensorpack. The training algorithm for a classification task using MAML is as follows:

We have f_θ as the model with parameters θ; α and β are hyperparameters. L denotes the cross-entropy loss.

In each iteration, sample [inputa, inputb, labela, labelb] from the training set.

Forward inputa through f_θ and evaluate the gradient of the cross-entropy loss.

Compute the adapted parameters with gradient descent:

θ' = θ − α ∇_θ L(f_θ(inputa), labela)

Then update θ ← θ − β ∇_θ L(f_{θ'}(inputb), labelb)

https://arxiv.org/abs/1703.03400

The source code of model-agnostic meta learning from github is below:

for j in range(num_updates - 1):
    loss = self.loss_func(self.forward(inputa, fast_weights, reuse=True), labela)
    grads = tf.gradients(loss, list(fast_weights.values()))
    if FLAGS.stop_grad:
        grads = [tf.stop_gradient(grad) for grad in grads]
    gradients = dict(zip(fast_weights.keys(), grads))
    fast_weights = dict(zip(fast_weights.keys(), [fast_weights[key] - self.update_lr*gradients[key] for key in fast_weights.keys()]))
    output = self.forward(inputb, fast_weights, reuse=True)
    task_outputbs.append(output)
    task_lossesb.append(self.loss_func(output, labelb))

task_output = [task_outputa, task_outputbs, task_lossa, task_lossesb]

https://github.com/cbfinn/maml/blob/master/maml.py

I'd like to know how, in tensorpack with its trainers, I can access the model weights θ within a training iteration: forward inputa, compute the gradient descent step to get the adapted weights θ', and then update the model weights θ using task_lossesb, as is normally done at the end of an iteration.
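
To make the two-level update concrete, here is a toy sketch in plain Python. It is not tensorpack or TensorFlow code and does not answer the trainer question. It assumes a one-parameter linear model f_θ(x) = θ·x with a squared-error loss in place of cross entropy, so that the derivative of θ' with respect to θ can be written by hand; the names `mse_and_grad` and `maml_step` are made up for illustration.

```python
def mse_and_grad(theta, xs, ys):
    # Loss 1/(2n) * sum (theta*x - y)^2 and its derivative w.r.t. theta.
    n = len(xs)
    loss = sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)
    grad = sum((theta * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad


def maml_step(theta, xa, ya, xb, yb, alpha, beta):
    # Inner step on batch a: theta' = theta - alpha * dL_a/dtheta
    _, grad_a = mse_and_grad(theta, xa, ya)
    theta_adapted = theta - alpha * grad_a

    # Loss on batch b, evaluated at the adapted parameter theta'
    loss_b, grad_b = mse_and_grad(theta_adapted, xb, yb)

    # Chain rule through the inner step:
    # dtheta'/dtheta = 1 - alpha * d2L_a/dtheta2, and for this model
    # the second derivative is the constant mean(x^2) over batch a.
    hess_a = sum(x * x for x in xa) / len(xa)
    meta_grad = (1.0 - alpha * hess_a) * grad_b

    # Outer step: theta <- theta - beta * dL_b(theta')/dtheta
    return theta - beta * meta_grad, loss_b, meta_grad


new_theta, loss_b, meta_grad = maml_step(
    0.0, [1.0, 2.0], [2.0, 4.0], [3.0], [6.0], alpha=0.1, beta=0.01)
print(new_theta, loss_b, meta_grad)
```

In a TensorFlow graph, the quoted code gets the same chain-rule term automatically, because the second-batch loss is built from fast_weights and is therefore differentiable with respect to the original weights (unless FLAGS.stop_grad is set, which drops that term).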
usage
opened by john81923 30
Better ModelDesc
The original design was not thought through carefully enough: it is not clear how the graph is built, or what one can and cannot do inside build_graph. E.g.:

Is it OK to create placeholders inside build_graph?

Which symbolic functions are allowed and which are not? (e.g. tf.layers.batch_norm? tf.train.input_producer?)

What to put in get_inputs and what not? Is this interface even necessary?

FIXED by introducing TowerTrainer, TowerFunc, TowerTensorHandle ~~How to access a tensor a bit later? Because setting self.xxx sadly doesn't work (#287), and using the tensor names is not easy. (#315, #317, #442)~~

RESOLVED Use return cost for a single-cost ModelDesc. For other types of models, you need to write your own trainer anyway, so you'll build the graph yourself. On the contrary, self.cost needs to be set. This seems very hard-coded, and the reason behind it is that self.cost is only set because some (but not all) trainers need it. This contract between Model and Trainer needs to be addressed in a clearer way.

FIXED ~~What's worse, some examples are now actually using self.xxx. Technically they should not rely on this unsupported use.~~

Fancy dynamic stuff might also be hard, but I'm not very familiar with it.

Some example use cases that are hard or too tricky to handle with the current interface:

Input data has a different layout (needs a different placeholder) in training vs inference.

Access some tensors in all towers.

Mix of data/model parallel. A special case is to create some variables (not reuse) in each tower.

Nothing should be deprecated because the current interface works well for most problems. But I'm thinking about new ones which can expose more of the graph building process to users.
enhancement
opened by ppwwyyxx 30

Stuck in Pre-filling StagingArea

Hi there, thanks for tensorpack! I am training a segmentation model on Cityscapes. I wrote a dataflow modeled on get_imagenet_dataflow():

def __iter__(self):
    for img_addr, gt_addr in self.lst:
        img = cv2.cvtColor(cv2.imread(img_addr, cv2.IMREAD_COLOR), cv2.COLOR_BGR2RGB)
        gt = cv2.imread(gt_addr, cv2.IMREAD_GRAYSCALE)
        yield [img, gt]

I tested this dataflow with the code below. It prints the numpy array and reaches about 30 it/s (8 cores), but then it suddenly stops somewhere, e.g. at 250/5000.

ds = PrefetchDataZMQ(ds, parallel)
ds = BatchData(ds, batch_size, remainder=False)
ds.reset_state()
print(next(ds.get_data()))
TestDataSpeed(ds).start()

Then I ran training with SyncMultiGPUTrainerParameterServer. The problem is that it gets stuck at "Pre-filling StagingArea", as shown below. At the start, the CPU runs at 104% with little GPU memory usage. After about 10-15 minutes, CPU usage drops and GPU usage increases, but there is no computation on the GPU (GPU-Util stays at 0%). I have no idea what I did wrong. Could you give me some insight? Thanks so much.

[0926 11:25:49 @base.py:211] Initializing the session ...
[0926 11:25:49 @base.py:218] Graph Finalized.
[0926 11:25:50 @concurrency.py:37] Starting EnqueueThread QueueInput/input_queue ...
[0926 11:26:01 @param.py:148] [HyperParamSetter] At global_step=0, learning_rate will change to 0.00025000
[0926 11:26:03 @base.py:250] Start Epoch 1 ...
  0%|                                                                                                              |0/371[00:00<?,?it/s]
[0926 11:26:03 @input_source.py:550] Pre-filling StagingArea ...
[0926 11:26:05 @input_source.py:554] 1 element was put into StagingArea on each tower.

My environment:

Python version: Python 2.7
TF version: tf 1.6.0
Tensorpack version: 0.8.9.
OS: Ubuntu 16.04
Hardware information: E5 2630, 4 1080Ti GPUs.

usage

opened by s7ev3n 27

Add MMEval support for COCO detection evaluation
Hi, thanks for this nice work!

This PR provides a new evaluation tool for examples/FasterRCNN: MMEval.

MMEval is a unified evaluation library for multiple machine-learning libraries. Its home page is https://github.com/open-mmlab/mmeval

coco_det_mmeval.py supports multi-GPU and multi-node evaluation with MPI4PY:

# run evaluation
python tensorpack_mmeval.py --load <model_path>

# launch multi-GPU evaluation with mpirun
mpirun -np 8 python tensorpack_mmeval.py --load <model_path>

We tested this evaluation script on COCO-MaskRCNN-R50C41x and got the same evaluation results as those reported by Tensorpack.

Related reference: https://github.com/open-mmlab/mmeval/tree/main/examples/tensorpack
opened by ice-tong 0

Option to disable the tqdm progress bars

Could you guys add the option to disable the tqdm progress bar? I made the code change here, adding a keyword argument "pbar_disable", but I'm not able to check it in.

def send_dataflow_zmq(df, addr, hwm=50, format=None, bind=False, pbar_disable=False):
    """
    Run DataFlow and send data to a ZMQ socket addr.
    It will serialize and send each datapoint to this address with a PUSH socket.
    This function never returns.

    Args:
        df (DataFlow): Will infinitely loop over the DataFlow.
        addr: a ZMQ socket endpoint.
        hwm (int): ZMQ high-water mark (buffer size)
        format (str): The serialization format.
             Default format uses :mod:`utils.serialize`.
             This format works with :class:`dataflow.RemoteDataZMQ`.
             An alternate format is 'zmq_ops', used by https://github.com/tensorpack/zmq_ops
             and :class:`input_source.ZMQInput`.
        bind (bool): whether to bind or connect to the endpoint address.
        pbar_disable (bool): whether to disable the tqdm progress bar.
    """
    assert format in [None, 'zmq_op', 'zmq_ops']
    if format is None:
        dump_fn = dumps
    else:
        from zmq_ops import dump_arrays
        dump_fn = dump_arrays

    ctx = zmq.Context()
    socket = ctx.socket(zmq.PUSH)
    socket.set_hwm(hwm)
    if bind:
        socket.bind(addr)
    else:
        socket.connect(addr)
    try:
        df.reset_state()
        logger.info("Serving data to {} with {} format ...".format(
            addr, 'default' if format is None else 'zmq_ops'))
        INTERVAL = 200
        q = deque(maxlen=INTERVAL)

        try:
            total = len(df)
        except NotImplementedError:
            total = 0
        tqdm_args = get_tqdm_kwargs(
            leave=True, smoothing=0.8, disable=pbar_disable)
        tqdm_args['bar_format'] = tqdm_args['bar_format'] + "{postfix}"
        while True:
            with tqdm.trange(total, **tqdm_args) as pbar:
                for dp in df:
                    start = time.time()
                    socket.send(dump_fn(dp), copy=False)
                    q.append(time.time() - start)
                    pbar.update(1)
                    if pbar.n % INTERVAL == 0:
                        avg = "{:.3f}".format(sum(q) / len(q))
                        pbar.set_postfix({'AvgSendLat': avg})
    finally:
        logger.info("Exiting send_dataflow_zmq ...")
        socket.setsockopt(zmq.LINGER, 0)
        socket.close()
        if not ctx.closed:
            ctx.destroy(0)

opened by actuallyaswin 0

Issue when using automatic mixed precision in training with evaluation callback

1. What you did:

I tried to use automatic mixed precision via a graph rewrite when training a MaskRCNN model. As described here: https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/train/experimental/enable_mixed_precision_graph_rewrite, I added the following line at the end of GeneralizedRCNN.optimizer() in generalized_rcnn: opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

2. What you observed:

When I train the model without evaluation callback, there is no issue at all. Once it is trained, if I load the model with OfflinePredictor, it also works well. However, if I train the model with evaluation callback, I get the following error during the first evaluation:

InternalError                             Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py in _do_call(self, fn, *args)
   1364     try:
-> 1365       return fn(*args)
   1366     except errors.OpError as e:

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1349       return self._call_tf_sessionrun(options, feed_dict, fetch_list,
-> 1350                                       target_list, run_metadata)
   1351 

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1442                                             fetch_list, target_list,
-> 1443                                             run_metadata)
   1444 

InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(12032000, 1), b.shape=(1, 4), m=12032000, n=4, k=1
	 [[{{node tower-pred-0/fpn/upsample_lat4/Tensordot/MatMul}}]]
  (1) Internal: Blas GEMM launch failed : a.shape=(12032000, 1), b.shape=(1, 4), m=12032000, n=4, k=1
	 [[{{node tower-pred-0/fpn/upsample_lat4/Tensordot/MatMul}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

InternalError                             Traceback (most recent call last)

/opt/conda/lib/python3.7/site-packages/tensorpack/train/interface.py in launch_train_with_config(config, trainer)
     97         starting_epoch=config.starting_epoch,
     98         max_epoch=config.max_epoch,
---> 99         extra_callbacks=config.extra_callbacks)
    100 
    101 

/opt/conda/lib/python3.7/site-packages/tensorpack/train/base.py in train_with_defaults(self, _sentinel, callbacks, monitors, session_creator, session_init, steps_per_epoch, starting_epoch, max_epoch, extra_callbacks)
    340         self.train(callbacks, monitors,
    341                    session_creator, session_init,
--> 342                    steps_per_epoch, starting_epoch, max_epoch)
    343 
    344     def __new__(cls, *args, **kwargs):

/opt/conda/lib/python3.7/site-packages/tensorpack/train/base.py in train(self, callbacks, monitors, session_creator, session_init, steps_per_epoch, starting_epoch, max_epoch)
    312         self.setup_callbacks(callbacks, monitors)
    313         self.initialize(session_creator, session_init)
--> 314         self.main_loop(steps_per_epoch, starting_epoch, max_epoch)
    315 
    316     def train_with_defaults(

/opt/conda/lib/python3.7/site-packages/tensorpack/utils/argtools.py in wrapper(*args, **kwargs)
    166         cache.add(func)
    167 
--> 168         return func(*args, **kwargs)
    169 
    170     return wrapper

/opt/conda/lib/python3.7/site-packages/tensorpack/train/base.py in main_loop(self, steps_per_epoch, starting_epoch, max_epoch)
    284 
    285                     # trigger epoch outside the timing region.
--> 286                     self._callbacks.trigger_epoch()
    287                 logger.info("Training has finished!")
    288             except (StopTraining, tf.errors.OutOfRangeError) as e:

/opt/conda/lib/python3.7/site-packages/tensorpack/callbacks/base.py in trigger_epoch(self)
    154 
    155     def trigger_epoch(self):
--> 156         self._trigger_epoch()
    157 
    158     def _trigger_epoch(self):

/opt/conda/lib/python3.7/site-packages/tensorpack/callbacks/group.py in _trigger_epoch(self)
     93             display_name = str(cb)
     94             with tm.timed_callback(display_name):
---> 95                 cb.trigger_epoch()
     96         tm.log()
     97 

/opt/conda/lib/python3.7/site-packages/tensorpack/callbacks/base.py in trigger_epoch(self)
    154 
    155     def trigger_epoch(self):
--> 156         self._trigger_epoch()
    157 
    158     def _trigger_epoch(self):

/opt/conda/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
    433                 raise CancelledError()
    434             elif self._state == FINISHED:
--> 435                 return self.__get_result()
    436             else:
    437                 raise TimeoutError()

/opt/conda/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

/opt/conda/lib/python3.7/concurrent/futures/thread.py in run(self)
     55 
     56         try:
---> 57             result = self.fn(*self.args, **self.kwargs)
     58         except BaseException as exc:
     59             self.future.set_exception(exc)

/home/jovyan/eval.py in predict_dataflow()
--> 157               outputs = predict_image(img, model_func)

/home/jovyan/eval.py in predict_image(img, model_func)
---> 46     outputs = model_func(img)

/opt/conda/lib/python3.7/site-packages/tensorpack/predict/base.py in __call__(self, *dp)
     39             list[array]: list of outputs
     40         """
---> 41         output = self._do_call(dp)
     42         if self.return_input:
     43             return (dp, output)

/opt/conda/lib/python3.7/site-packages/tensorpack/predict/base.py in _do_call(self, dp)
    134         # run_metadata = tf.RunMetadata()
    135         # options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
--> 136         return self._callable(*dp)
    137 
    138 

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py in _generic_run(*feed_args, **kwargs)
   1230             feed: feed_val for feed, feed_val in zip(feed_list, feed_args)
   1231         }
-> 1232         return self.run(fetches, feed_dict=feed_dict, **kwargs)
   1233 
   1234       return _generic_run

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    954     try:
    955       result = self._run(None, fetches, feed_dict, options_ptr,
--> 956                          run_metadata_ptr)
    957       if run_metadata:
    958         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1178     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1179       results = self._do_run(handle, final_targets, final_fetches,
-> 1180                              feed_dict_tensor, options, run_metadata)
   1181     else:
   1182       results = []

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1357     if handle is None:
   1358       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1359                            run_metadata)
   1360     else:
   1361       return self._do_call(_prun_fn, handle, feeds, fetches)

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py in _do_call(self, fn, *args)
   1382                     '\nsession_config.graph_options.rewrite_options.'
   1383                     'disable_meta_optimizer = True')
-> 1384       raise type(e)(node_def, op, message)
   1385 
   1386   def _extend_graph(self):

InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(12032000, 1), b.shape=(1, 4), m=12032000, n=4, k=1
	 [[node tower-pred-0/fpn/upsample_lat4/Tensordot/MatMul (defined at /opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Internal: Blas GEMM launch failed : a.shape=(12032000, 1), b.shape=(1, 4), m=12032000, n=4, k=1
	 [[node tower-pred-0/fpn/upsample_lat4/Tensordot/MatMul (defined at /opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'tower-pred-0/fpn/upsample_lat4/Tensordot/MatMul':
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/opt/conda/lib/python3.7/site-packages/traitlets/config/application.py", line 845, in launch_instance
    app.start()
  File "/opt/conda/lib/python3.7/site-packages/ipykernel/kernelapp.py", line 612, in start
    self.io_loop.start()
  File "/opt/conda/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 199, in start
    self.asyncio_loop.run_forever()
  File "/opt/conda/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 688, in <lambda>
    lambda f: self._run_callback(functools.partial(callback, future))
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 814, in inner
    self.ctx_run(self.run)
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 775, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 374, in dispatch_queue
    yield self.process_one()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 250, in wrapper
    runner = Runner(ctx_run, result, future, yielded)
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 741, in __init__
    self.ctx_run(self.run)
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 775, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 358, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "/opt/conda/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 261, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "/opt/conda/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 538, in execute_request
    user_expressions, allow_stdin,
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 234, in wrapper
    yielded = ctx_run(next, result)
  File "/opt/conda/lib/python3.7/site-packages/ipykernel/ipkernel.py", line 302, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/opt/conda/lib/python3.7/site-packages/ipykernel/zmqshell.py", line 539, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2895, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2940, in _run_cell
    return runner(coro)
  File "/opt/conda/lib/python3.7/site-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3166, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3357, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-f9d37edbca59>", line 23, in <module>
    commit_hash = "unknown",
  File "/home/jovyan/train.py", line 315, in train_mask_rcnn
    launch_train_with_config(traincfg, trainer)
  File "/opt/conda/lib/python3.7/site-packages/tensorpack/train/interface.py", line 99, in launch_train_with_config
    extra_callbacks=config.extra_callbacks)
  File "/opt/conda/lib/python3.7/site-packages/tensorpack/train/base.py", line 342, in train_with_defaults
    steps_per_epoch, starting_epoch, max_epoch)
  File "/opt/conda/lib/python3.7/site-packages/tensorpack/train/base.py", line 312, in train
    self.setup_callbacks(callbacks, monitors)
  File "/opt/conda/lib/python3.7/site-packages/tensorpack/utils/argtools.py", line 168, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorpack/train/base.py", line 209, in setup_callbacks
    self._callbacks.setup_graph(weakref.proxy(self))
  File "/opt/conda/lib/python3.7/site-packages/tensorpack/callbacks/base.py", line 59, in setup_graph
    self._setup_graph()
  File "/opt/conda/lib/python3.7/site-packages/tensorpack/callbacks/group.py", line 68, in _setup_graph
    cb.setup_graph(self.trainer)
  File "/opt/conda/lib/python3.7/site-packages/tensorpack/callbacks/base.py", line 59, in setup_graph
    self._setup_graph()
  File "/home/jovyan/eval.py", line 305, in _setup_graph
    self.predictors = [self._build_predictor(k % num_gpu) for k in range(self.num_predictor)]
  File "/home/jovyan/eval.py", line 305, in <listcomp>
    self.predictors = [self._build_predictor(k % num_gpu) for k in range(self.num_predictor)]
  File "/home/jovyan/eval.py", line 319, in _build_predictor
    return self.trainer.get_predictor(self._in_names, self._out_names, device=idx)
  File "/opt/conda/lib/python3.7/site-packages/tensorpack/train/tower.py", line 136, in get_predictor
    self.tower_func(*input.get_input_tensors())
  File "/opt/conda/lib/python3.7/site-packages/tensorpack/tfutils/tower.py", line 291, in __call__
    output = self._tower_fn(*args)
  File "/home/jovyan/modeling/generalized_rcnn.py", line 129, in build_graph
    features = self.backbone(image)
  File "/home/jovyan/modeling/generalized_rcnn.py", line 307, in backbone
    p23456 = fpn_model('fpn', c2345)
  File "/opt/conda/lib/python3.7/site-packages/tensorpack/models/registry.py", line 173, in wrapped_func
    outputs = func(*args, **actual_args)
  File "/home/jovyan/modeling/model_fpn.py", line 65, in fpn_model
    lat = lat + upsample2x('upsample_lat{}'.format(6 - idx), lat_sum_5432[-1])
  File "/home/jovyan/modeling/model_fpn.py", line 51, in upsample2x
    data_format='channels_first')
  File "/opt/conda/lib/python3.7/site-packages/tensorpack/models/registry.py", line 173, in wrapped_func
    outputs = func(*args, **actual_args)
  File "/opt/conda/lib/python3.7/site-packages/tensorpack/models/pool.py", line 127, in FixedUnPooling
    ret = tf.tensordot(x, mat, axes=1)  # bxcxhxwxshxsw
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 4071, in tensordot
    ab_matmul = matmul(a_reshape, b_reshape)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 2754, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 6136, in mat_mul
    name=name)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

4. Your environment:

sys.platform          linux
Python                3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
Tensorpack            v0.10.1-0-g8f831349
Numpy                 1.19.5
TensorFlow            1.15.5/v1.15.5-1-g7d0c58b5326
TF Compiler Version   7.3.1 20180303
TF CUDA support       True
TF MKL support        False
TF XLA support        False
Nvidia Driver         /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.450.51.06
CUDA                  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudart.so.11.0.221
CUDNN                 /usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.4
NCCL                  /usr/lib/x86_64-linux-gnu/libnccl.so.2.7.8
CUDA_VISIBLE_DEVICES  Unspecified
GPU 0                 Tesla T4
Free RAM              21.86/29.45 GB
CPU Count             8
Horovod               0.21.3
cv2                   4.4.0
msgpack               1.0.2
python-prctl          False

Question: is it possible to run the evaluation callback while training with automatic mixed precision (inference already works outside of training), or are changes needed to make it work?

opened by martinjammes 0

Is there an analogue for parallel Dataset.interleave in Dataflow?
A typical data loading pipeline in TensorFlow using tf.data.Dataset might look something like this:

dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.interleave(
    tf.data.TFRecordDataset,
    num_parallel_calls=reader_num_threads)
dataset = dataset.batch(batch_size, drop_remainder=True)
dataset = dataset.map(
    lambda serialized_example: tf.io.parse_example(serialized_example, features),
    num_parallel_calls=parser_num_threads)

Obviously, I'm not trying to use Dataflow to parse TFRecords, but it is somewhat of an analogous workflow of wanting to parallelize reading multiple file iterators at a time. I understand how to do the parallel map using Dataflow, but I don't quite see how to do the parallel interleave. Any tips?
enhancement
opened by cyc 6
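
For illustration only, the parallel-interleave idea asked about above can be sketched in plain Python with threads and a bounded queue. This is not a Tensorpack or tf.data API: `parallel_interleave`, `make_iterators`, and `buffer_size` are hypothetical names, and the per-file readers are stand-in iterators. Like `Dataset.interleave` with parallel calls, the sketch reads several sources at once and does not guarantee output order.

```python
import queue
import threading

_DONE = object()  # sentinel: one source iterator is exhausted


def parallel_interleave(make_iterators, buffer_size=16):
    """Yield items from several iterators that are read concurrently.

    make_iterators: list of zero-argument callables, each returning an
    iterator (hypothetical stand-ins for per-file readers).
    """
    out = queue.Queue(maxsize=buffer_size)

    def worker(make_iter):
        try:
            for item in make_iter():
                out.put(item)  # blocks when the buffer is full
        finally:
            out.put(_DONE)  # always signal completion, even on error

    threads = [
        threading.Thread(target=worker, args=(m,), daemon=True)
        for m in make_iterators
    ]
    for t in threads:
        t.start()

    remaining = len(threads)
    while remaining:
        item = out.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item


# Three fake "files", each yielding three records.
readers = [lambda i=i: iter(range(i * 100, i * 100 + 3)) for i in range(3)]
result = sorted(parallel_interleave(readers))  # sorted: order is not deterministic
```

A generator like this could in principle be wrapped in a DataFlow's `__iter__`, with a parallel map applied afterwards; whether that fits a given pipeline is an assumption to verify.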

Why doesn't MultiProcessMapData() stop?

I tried something very simple with MultiProcessMapData():

from tensorpack import *

class MyFlow(DataFlow):
    def __init__(self, n):
        super().__init__()
        self.n = n

    def __iter__(self):
        for i in range(self.n):
            yield i

    def __len__(self):
        return self.n

def f(i):
    return i*10

d0 = MyFlow(10)
d1 = MultiProcessMapData(d0, num_proc = 4, map_func=f, buffer_size=10, strict=False)
d1.reset_state()

for i in d1:
    print(i)
print("end")

In this example, the loop never stops; it just keeps producing numbers. If I set strict to True, the code produces 5 numbers (0, 10, 20, 30, 40) and then freezes. Is this the expected behaviour? I am using the latest version of Tensorpack on macOS. Thank you.

opened by hsinhaoyu 2
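
As a hedged sketch of one possible workaround: if the non-strict mapper behaves like an endlessly repeated source (an assumption that matches the never-ending loop described above), one pass can be taken by truncating the stream to the length of the source. The generator `endless_map` below is a hypothetical pure-Python stand-in, not Tensorpack code, and unlike a real parallel mapper it preserves order.

```python
import itertools


def endless_map(source, func):
    """Stand-in for a non-strict parallel mapper: re-iterates `source` forever."""
    while True:
        for x in source:
            yield func(x)


source = list(range(10))

# Take exactly one epoch's worth of datapoints from the infinite stream.
one_epoch = list(
    itertools.islice(endless_map(source, lambda i: i * 10), len(source))
)
print(one_epoch)
```

With a real parallel mapper the truncated pass may mix datapoints from consecutive passes over the source, so this only bounds the count, not the exact set.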

[Placeholder]Detectron2 fbnet backbone

It was amazing to see detectron2; it's like the best of PyTorch and TensorFlow. Thank you for the great library.

According to @wat3rbro (https://github.com/facebookresearch/detectron2/issues/12#issuecomment-565566046 and https://github.com/facebookresearch/detectron2/issues/12#issuecomment-566822670), mobile-friendly models are coming soon.

Creating this issue as a placeholder to support the fbnet backbone once those models are available.

Once again thank you for the great library. Pardon if the category is wrong.

opened by no-1ne 0

Releases (doc-v0.9.0.1)

doc-v0.9.0.1 (Jan 18, 2019)

Source code (tar.gz)
Source code (zip)
tensorpack.docset.tgz (2.21 MB)

0.7.1-docs (Nov 12, 2017)

Source code (tar.gz)
Source code (zip)
tensorpack.docset.tgz (2.00 MB)

0.4.0-doc (Aug 16, 2017)

Source code (tar.gz)
Source code (zip)
tensorpack.docset.tgz (1.95 MB)

Owner

Tensorpack

Flax is a neural network ecosystem for JAX that is designed for flexibility.

The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.

U-2-Net: U Square Net - Modified for paired image training of style transfer

Speed-Test - You can check your internet speed using this tool

Neural networks applied in recognizing guitar chords using python, AutoML.NET with C# and .NET Core

U^2-Net - Portrait matting This repository explores possibilities of using the original u^2-net model for portrait matting.

RGBD-Net - This repository contains a pytorch lightning implementation for the 3DV 2021 RGBD-Net paper.

QuakeLabeler is a Python package to create and manage your seismic training data, processes, and visualization in a single place — so you can focus on building the next big thing.

Accelerate Neural Net Training by Progressively Freezing Layers

Simple codebase for flexible neural net training

A complete, self-contained example for training ImageNet at state-of-the-art speed with FFCV

Reimplementation of the paper `Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words? (ACL2020)`

Neural-net-from-scratch - A simple Neural Network from scratch in Python using the Pymathrix library

🔥RandLA-Net in Tensorflow (CVPR 2020, Oral & IEEE TPAMI 2021)

Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.

This repository contains notebook implementations of the following Neural Process variants: Conditional Neural Processes (CNPs), Neural Processes (NPs), Attentive Neural Processes (ANPs).

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

Scripts of Machine Learning Algorithms from Scratch. Implementations of machine learning models and algorithms using nothing but NumPy with a focus on accessibility. Aims to cover everything from basic to advanced.