TensorFlow tutorials and best practices.

Effective TensorFlow 2

Table of Contents

Part I: TensorFlow 2 Fundamentals

  1. TensorFlow 2 Basics
  2. Broadcasting the good and the ugly
  3. Take advantage of the overloaded operators
  4. Control flow operations: conditionals and loops
  5. Prototyping kernels and advanced visualization with Python ops
  6. Numerical stability in TensorFlow

We updated the guide to follow the newly released TensorFlow 2.x API. If you want the original guide for TensorFlow 1.x see the v1 branch.

To install TensorFlow 2.0 (alpha) follow the instructions on the official website:

pip install tensorflow==2.0.0-alpha0

We aim to gradually expand this series by adding new articles and to keep the content up to date with the latest releases of the TensorFlow API. If you have suggestions on how to improve this series or find the explanations ambiguous, feel free to create an issue, send patches, or reach out by email.

Part I: TensorFlow 2.0 Fundamentals

TensorFlow Basics

TensorFlow 2 underwent a massive redesign to make the API more accessible and easier to use. If you are familiar with NumPy you will find yourself right at home when using TensorFlow 2. Unlike TensorFlow 1, which was purely symbolic, TensorFlow 2 hides its symbolic nature under the hood and looks like any other imperative library such as NumPy. It's important to note that the change is mostly an interface change, and TensorFlow 2 can still take advantage of its symbolic machinery to do everything that TensorFlow 1.x can do (e.g. automatic differentiation and massively parallel computation on TPUs/GPUs).

Let's start with a simple example: we want to multiply two random matrices. First we look at an implementation done in NumPy:

import numpy as np

x = np.random.normal(size=[10, 10])
y = np.random.normal(size=[10, 10])
z = np.dot(x, y)

print(z)

Now we perform the exact same computation, this time in TensorFlow 2.0:

import tensorflow as tf

x = tf.random.normal([10, 10])
y = tf.random.normal([10, 10])
z = tf.matmul(x, y)

print(z)

Like NumPy, TensorFlow 2 performs the computation immediately and produces the result. The only difference is that TensorFlow stores the result in a tf.Tensor, which can easily be converted to a NumPy array by calling the tf.Tensor.numpy() member function:

print(z.numpy())

To understand how powerful symbolic computation can be, let's have a look at another example. Assume that we have samples from a curve (say f(x) = 5x^2 + 3) and we want to estimate f(x) based on these samples. We define a parametric function g(x, w) = w0 x^2 + w1 x + w2, which is a function of the input x and latent parameters w. Our goal is then to find the latent parameters such that g(x, w) ≈ f(x). This can be done by minimizing the following loss function: L(w) = ∑ (f(x) - g(x, w))^2. Although there's a closed form solution for this simple problem, we opt to use a more general approach that can be applied to any differentiable function: stochastic gradient descent. We simply compute the average gradient of L(w) with respect to w over a set of sample points and move in the opposite direction.

Here's how it can be done in TensorFlow:

import numpy as np
import tensorflow as tf

# Assuming we know that the desired function is a polynomial of 2nd degree, we
# allocate a vector of size 3 to hold the coefficients and initialize it with
# random noise.
w = tf.Variable(tf.random.normal([3, 1]))

# We use the Adam optimizer with learning rate set to 0.1 to minimize the loss.
opt = tf.optimizers.Adam(0.1)

def model(x):
    # We define yhat to be our estimate of y.
    f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
    yhat = tf.squeeze(tf.matmul(f, w), 1)
    return yhat

def compute_loss(y, yhat):
    # The loss is defined to be the l2 distance between our estimate of y and its
    # true value. We also added a shrinkage term, to ensure the resulting weights
    # would be small.
    loss = tf.nn.l2_loss(yhat - y) + 0.1 * tf.nn.l2_loss(w)
    return loss

def generate_data():
    # Generate some training data based on the true function
    x = np.random.uniform(-10.0, 10.0, size=100).astype(np.float32)
    y = 5 * np.square(x) + 3
    return x, y

def train_step():
    x, y = generate_data()

    def _loss_fn():
        yhat = model(x)
        loss = compute_loss(y, yhat)
        return loss
    
    opt.minimize(_loss_fn, [w])

for _ in range(1000):
    train_step()

print(w.numpy())

By running this piece of code you should see a result close to this:

[4.9924135, 0.00040895029, 3.4504161]

This is a relatively close approximation of the true coefficients (5, 0, 3).
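For comparison, the closed form solution mentioned earlier can be sketched with plain NumPy. This is only an illustration under a few assumptions: it uses ordinary least squares via np.linalg.lstsq, it omits the shrinkage term used in compute_loss, and the names rng, features and w_closed are introduced here and are not part of the original code.

```python
import numpy as np

# Sample training data from the same true function f(x) = 5x^2 + 3.
rng = np.random.default_rng(0)
x = rng.uniform(-10.0, 10.0, size=100)
y = 5 * np.square(x) + 3

# Design matrix with columns [x^2, x, 1], matching
# g(x, w) = w0 x^2 + w1 x + w2.
features = np.stack([np.square(x), x, np.ones_like(x)], axis=1)

# Ordinary least squares (no shrinkage term, unlike compute_loss above).
w_closed, _, _, _ = np.linalg.lstsq(features, y, rcond=None)

print(w_closed)
```

Because the data is noise free and no shrinkage is applied, the recovered coefficients should match (5, 0, 3) up to floating point error.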

Note that in the above code we are running TensorFlow in imperative mode (i.e. operations get executed instantly), which is not very efficient. TensorFlow 2.0 can also turn a given piece of Python code into a graph, which can then be optimized and efficiently parallelized on GPUs and TPUs. To get all those benefits we simply need to decorate the train_step function with the tf.function decorator:

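@tf.function
def train_step(x, y):
    # x and y are passed in as arguments: NumPy code inside a tf.function runs
    # only once, at trace time, so calling generate_data() here would freeze
    # the training data to a single batch.
    def _loss_fn():
        yhat = model(x)
        loss = compute_loss(y, yhat)
        return loss

    opt.minimize(_loss_fn, [w])

for _ in range(1000):
    train_step(*generate_data())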

What's cool about tf.function is that it's also able to convert basic Python statements like while, for and if into native TensorFlow functions. We will get to that later.

This is just the tip of the iceberg of what TensorFlow can do. Many problems, such as optimizing large neural networks with millions of parameters, can be implemented efficiently in TensorFlow in just a few lines of code. TensorFlow takes care of scaling across multiple devices and threads, and supports a variety of platforms.

Broadcasting the good and the ugly

TensorFlow supports broadcasting elementwise operations. Normally when you want to perform operations like addition and multiplication, you need to make sure that the shapes of the operands match, e.g. you can’t add a tensor of shape [3, 2] to a tensor of shape [3, 4]. But there’s a special case: dimensions of size 1. TensorFlow implicitly tiles the tensor across its size-1 dimensions to match the shape of the other operand. So it’s valid to add a tensor of shape [3, 2] to a tensor of shape [3, 1]:

import tensorflow as tf

a = tf.constant([[1., 2.], [3., 4.]])
b = tf.constant([[1.], [2.]])
# c = a + tf.tile(b, [1, 2])
c = a + b

print(c)
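To check whether two shapes can be broadcast together before running an operation, one option is tf.broadcast_static_shape. The sketch below wraps it in a small helper; the helper name broadcast_result_shape is hypothetical and only used for illustration.

```python
import tensorflow as tf

def broadcast_result_shape(shape_a, shape_b):
    """Returns the broadcast shape, or raises ValueError if incompatible.

    Hypothetical helper wrapping tf.broadcast_static_shape.
    """
    return tf.broadcast_static_shape(
        tf.TensorShape(shape_a), tf.TensorShape(shape_b))

# [3, 2] and [3, 1] are compatible: the size-1 dimension is tiled.
print(broadcast_result_shape([3, 2], [3, 1]))

# [3, 2] and [3, 4] are not compatible.
try:
    broadcast_result_shape([3, 2], [3, 4])
    compatible = True
except ValueError:
    compatible = False
print(compatible)
```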

Broadcasting allows us to perform implicit tiling which makes the code shorter, and more memory efficient, since we don’t need to store the result of the tiling operation. One neat place that this can be used is when combining features of varying length. In order to concatenate features of varying length we commonly tile the input tensors, concatenate the result and apply some nonlinearity. This is a common pattern across a variety of neural network architectures:

a = tf.random.uniform([5, 3, 5])
b = tf.random.uniform([5, 1, 6])

# concat a and b and apply nonlinearity
tiled_b = tf.tile(b, [1, 3, 1])
c = tf.concat([a, tiled_b], 2)
d = tf.keras.layers.Dense(10, activation=tf.nn.relu)(c)

print(d)

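But this can be done more efficiently with broadcasting. We use the fact that a linear map distributes over addition, so f(m(x + y)) is equal to f(mx + my). Applying a linear layer to the concatenation of a and b is the same as applying one linear layer to a, another to b, and adding the results. So we can do the linear operations separately and let broadcasting take the place of the explicit tiling and concatenation: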

pa = tf.keras.layers.Dense(10)(a)
pb = tf.keras.layers.Dense(10)(b)
d = tf.nn.relu(pa + pb)

print(d)
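The equivalence between the two versions can be checked numerically. The following is a minimal NumPy sketch, assuming a single bias-free weight matrix w (a name introduced here) whose first 5 rows act on the features of a and whose last 6 rows act on the features of b:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(size=(5, 3, 5))
b = rng.uniform(size=(5, 1, 6))

# Weights of one linear layer over the 5 + 6 = 11 concatenated features.
w = rng.normal(size=(11, 10))

# Explicit version: tile b, concatenate, then project.
tiled_b = np.tile(b, (1, 3, 1))
explicit = np.concatenate([a, tiled_b], axis=2) @ w

# Implicit version: project a and b separately with the matching slices of
# w, then let broadcasting expand the size-1 dimension of the b projection.
implicit = a @ w[:5] + b @ w[5:]

print(explicit.shape, implicit.shape)
print(np.allclose(explicit, implicit))
```

The implicit version never materializes the tiled copy of b, which is where the memory saving comes from.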

In fact this piece of code is pretty general and can be applied to tensors of arbitrary shape as long as broadcasting between tensors is possible:

def merge(a, b, units, activation=None):
    pa = tf.keras.layers.Dense(units)(a)
    pb = tf.keras.layers.Dense(units)(b)
    c = pa + pb
    if activation is not None:
        c = activation(c)
    return c

So far we discussed the good part of broadcasting. But what’s the ugly part you may ask? Implicit assumptions almost always make debugging harder to do. Consider the following example:

a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
c = tf.reduce_sum(a + b)

print(c)

What do you think the value of c would be after evaluation? If you guessed 6, that’s wrong. It’s going to be 12. This is because when the ranks of two tensors don’t match, TensorFlow automatically adds a leading dimension of size 1 to the tensor with lower rank before the elementwise operation. Here b is treated as having shape [1, 2] while a has shape [2, 1], so both are broadcast to [2, 2]. The result of the addition is [[2, 3], [3, 4]], and reducing over all elements gives 12.

The way to avoid this problem is to be as explicit as possible. Had we specified which dimension we would want to reduce across, catching this bug would have been much easier:

a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
c = tf.reduce_sum(a + b, 0)

print(c)

Here the value of c would be [5, 7], and we immediately would guess based on the shape of the result that there’s something wrong. A general rule of thumb is to always specify the dimensions in reduction operations and when using tf.squeeze.
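Another way to be explicit is to refuse implicit broadcasting where it is not intended. The sketch below uses a hypothetical helper, strict_add, that compares static shapes before adding. It is an illustration of the idea, not an existing TensorFlow function.

```python
import tensorflow as tf

def strict_add(a, b):
    """Adds two tensors only if their static shapes match exactly.

    Hypothetical helper for illustration.
    """
    if a.shape != b.shape:
        raise ValueError(f"Shape mismatch: {a.shape} vs {b.shape}")
    return a + b

a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])

# The accidental broadcast is now caught instead of silently producing 12.
try:
    strict_add(a, b)
    caught = False
except ValueError as e:
    caught = True
    print(e)

# Reshaping b to match a states the intent explicitly.
c = tf.reduce_sum(strict_add(a, tf.reshape(b, [2, 1])))
print(c.numpy())  # 6.0
```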

Take advantage of the overloaded operators

Just like NumPy, TensorFlow overloads a number of Python operators to make building graphs easier and the code more readable.

The slicing op is one of the overloaded operators that can make indexing tensors very easy:

z = x[begin:end]  # z = tf.slice(x, [begin], [end-begin])

Be very careful when using this op though. The slicing op is very inefficient and often better avoided, especially when the number of slices is high. To understand how inefficient this op can be let's look at an example. We want to manually perform reduction across the rows of a matrix:

import tensorflow as tf
import time

x = tf.random.uniform([500, 10])

z = tf.zeros([10])

start = time.time()
for i in range(500):
    z += x[i]
print("Took %f seconds." % (time.time() - start))

On my MacBook Pro, this took 0.045 seconds to run, which is quite slow. The reason is that we are calling the slice op 500 times. A better choice would have been to use the tf.unstack op to slice the matrix into a list of vectors all at once:

z = tf.zeros([10])
for x_i in tf.unstack(x):
    z += x_i

This took 0.01 seconds. Of course, the right way to do this simple reduction is to use tf.reduce_sum op:

z = tf.reduce_sum(x, axis=0)

This took 0.0001 seconds, which is roughly 450x faster than the original implementation.

TensorFlow also overloads a range of arithmetic and logical operators:

z = -x  # z = tf.negative(x)
z = x + y  # z = tf.add(x, y)
z = x - y  # z = tf.subtract(x, y)
z = x * y  # z = tf.multiply(x, y)
z = x / y  # z = tf.math.truediv(x, y)
z = x // y  # z = tf.math.floordiv(x, y)
z = x % y  # z = tf.math.floormod(x, y)
z = x ** y  # z = tf.pow(x, y)
z = x @ y  # z = tf.matmul(x, y)
z = x > y  # z = tf.greater(x, y)
z = x >= y  # z = tf.greater_equal(x, y)
z = x < y  # z = tf.less(x, y)
z = x <= y  # z = tf.less_equal(x, y)
z = abs(x)  # z = tf.abs(x)
z = x & y  # z = tf.logical_and(x, y)
z = x | y  # z = tf.logical_or(x, y)
z = x ^ y  # z = tf.math.logical_xor(x, y)
z = ~x  # z = tf.logical_not(x)

You can also use the augmented version of these ops. For example x += y and x **= 2 are also valid.

Note that Python doesn't allow overloading "and", "or", and "not" keywords.

In the TensorFlow 2.0 alpha release that this guide targets, the equal (==) and not equal (!=) operators are not overloaded elementwise the way they are in NumPy. Later 2.x releases do compare tensors elementwise with == and !=, but the function versions tf.equal and tf.not_equal work in either case.
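A minimal sketch of the function versions, using small integer tensors chosen only for illustration:

```python
import tensorflow as tf

x = tf.constant([1, 2, 3])
y = tf.constant([1, 0, 3])

# Elementwise comparisons that return boolean tensors.
eq = tf.equal(x, y)
ne = tf.not_equal(x, y)

print(eq.numpy())
print(ne.numpy())
```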

Control flow operations: conditionals and loops

When building complex models such as recurrent neural networks you may need to control the flow of operations through conditionals and loops. In this section we introduce a number of commonly used control flow ops.

Let's assume you want to decide whether to multiply or add two given tensors based on a predicate. This can be implemented with either Python's built-in if statement or the tf.cond function:

a = tf.constant(1)
b = tf.constant(2)

p = tf.constant(True)

# Alternatively:
# x = tf.cond(p, lambda: a + b, lambda: a * b)
x = a + b if p else a * b

print(x.numpy())

Since the predicate is True in this case, the output would be the result of the addition, which is 3.

Most of the time when using TensorFlow you are working with large tensors and want to perform operations in batch. A related conditional operation is tf.where, which like tf.cond takes a predicate, but selects the output elementwise based on the condition.

a = tf.constant([1, 1])
b = tf.constant([2, 2])

p = tf.constant([True, False])

x = tf.where(p, a + b, a * b)

print(x.numpy())

This will return [3, 2].
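As a further illustration of elementwise selection, tf.where can be used to zero out the negative entries of a tensor. This is a small sketch with made-up values. Note that both candidate tensors are computed in full before the selection happens.

```python
import tensorflow as tf

x = tf.constant([-1.0, 2.0, -3.0, 4.0])

# Keep x where the condition holds, take 0 elsewhere.
clipped = tf.where(x > 0, x, tf.zeros_like(x))

print(clipped.numpy())
```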

Another widely used control flow operation is tf.while_loop. It allows building dynamic loops in TensorFlow that operate on sequences of variable length. Let's see how we can generate the Fibonacci sequence with tf.while_loop:

@tf.function
def fibonacci(n):
    a = tf.constant(1)
    b = tf.constant(1)

    for i in range(2, n):
        a, b = b, a + b
    
    return b
    
n = tf.constant(5)
b = fibonacci(n)
    
print(b.numpy())

This will print 5. Note that tf.function automatically converts the given Python code to use tf.while_loop, so we don't need to directly interact with the TF API.

Now imagine we want to keep the whole Fibonacci sequence. We may update the loop body to keep a record of the history of values:

@tf.function
def fibonacci(n):
    a = tf.constant(1)
    b = tf.constant(1)
    c = tf.constant([1, 1])

    for i in range(2, n):
        a, b = b, a + b
        c = tf.concat([c, [b]], 0)
    
    return c
    
n = tf.constant(5)
b = fibonacci(n)
    
print(b.numpy())

Now if you try running this, TensorFlow will complain that the shape of one of the loop variables is changing. One way to fix this is to use "shape invariants". The most direct way to specify them is the low-level tf.while_loop API:

n = tf.constant(5)

def cond(i, a, b, c):
    return i < n

def body(i, a, b, c):
    a, b = b, a + b
    c = tf.concat([c, [b]], 0)
    return i + 1, a, b, c

i, a, b, c = tf.while_loop(
    cond, body, (2, 1, 1, tf.constant([1, 1])),
    shape_invariants=(tf.TensorShape([]),
                      tf.TensorShape([]),
                      tf.TensorShape([]),
                      tf.TensorShape([None])))

print(c.numpy())

This is not only getting ugly, but is also pretty inefficient. Note that we are building a lot of intermediary tensors that we don't use. TensorFlow has a better solution for this kind of growing arrays. Meet tf.TensorArray. Let's do the same thing this time with tensor arrays:

@tf.function
def fibonacci(n):
    a = tf.constant(1)
    b = tf.constant(1)

    c = tf.TensorArray(tf.int32, n)
    c = c.write(0, a)
    c = c.write(1, b)

    for i in range(2, n):
        a, b = b, a + b
        c = c.write(i, b)
    
    return c.stack()

n = tf.constant(5)
c = fibonacci(n)
    
print(c.numpy())

TensorFlow while loops and tensor arrays are essential tools for building complex recurrent neural networks. As an exercise, try implementing beam search using tf.while_loop. Can you make it more efficient with tensor arrays?

Prototyping kernels and advanced visualization with Python ops

Operation kernels in TensorFlow are entirely written in C++ for efficiency. But writing a TensorFlow kernel in C++ can be quite a pain. So, before spending hours implementing your kernel you may want to prototype something quickly, however inefficient. With tf.py_function() you can turn any piece of Python code into a TensorFlow operation.

For example, this is how you can implement a simple ReLU nonlinearity kernel in TensorFlow as a Python op:

import numpy as np
import tensorflow as tf

def relu(inputs):
    # Define the op in python
    def _py_relu(x):
        return np.maximum(x, 0.)

    # Define the op's gradient in python
    def _py_relu_grad(x):
        return np.float32(x > 0)
    
    @tf.custom_gradient
    def _relu(x):
        y = tf.py_function(_py_relu, [x], tf.float32)
        
        def _relu_grad(dy):
            return dy * tf.py_function(_py_relu_grad, [x], tf.float32)

        return y, _relu_grad

    return _relu(inputs)

To verify that the gradients are correct you can compute both the analytical and the numerical gradients and compare the values.

# Compute analytical gradient
x = tf.random.normal([10], dtype=np.float32)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = relu(x)
g = tape.gradient(y, x)
print(g)

# Compute numerical gradient
dx_n = 1e-5
dy_n = relu(x + dx_n) - relu(x)
g_n = dy_n / dx_n
print(g_n)

The numbers should be very close.
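The same check can be made a little more robust with central differences in float64. The sketch below is a standalone NumPy illustration, not the tf.py_function version above; the helper names and the test points (chosen away from the kink at 0) are assumptions made for the example.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    # Analytical gradient of ReLU (taken to be 0 at x == 0).
    return (x > 0).astype(x.dtype)

def numerical_grad(f, x, eps=1e-6):
    # Central difference: (f(x + eps) - f(x - eps)) / (2 * eps).
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Test points away from 0, where ReLU is not differentiable.
x = np.array([-2.0, -0.5, 0.5, 2.0])

analytical = relu_grad(x)
numerical = numerical_grad(relu, x)

print(analytical)
print(numerical)
```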

Note that this implementation is pretty inefficient and is only useful for prototyping, since the Python code is not parallelizable and won't run on GPU. Once you have verified your idea, you will definitely want to write it as a C++ kernel.

In practice we commonly use Python ops to do visualization in TensorBoard. Consider the case where you are building an image classification model and want to visualize your model predictions during training. TensorFlow allows visualizing images with the tf.summary.image() function:

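images = tf.zeros([1, 28, 28, 3])  # Stand-in batch: [batch, height, width, channels]
with tf.summary.create_file_writer("logs").as_default():  # "logs" is an example directory
    tf.summary.image("image", images, step=0)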

But this only visualizes the input image. In order to visualize the predictions you have to find a way to add annotations to the image which may be almost impossible with existing ops. An easier way to do this is to do the drawing in python, and wrap it in a python op:

import io

import matplotlib.pyplot as plt
import numpy as np
import PIL.Image
import tensorflow as tf

def visualize_labeled_images(images, labels, max_outputs=3, name="image", step=None):
    def _visualize_image(image, label):
        # Do the actual drawing in python
        fig = plt.figure(figsize=(3, 3), dpi=80)
        ax = fig.add_subplot(111)
        ax.imshow(image[::-1,...])
        ax.text(0, 0, str(label),
          horizontalalignment="left",
          verticalalignment="top")
        fig.canvas.draw()

        # Write the plot as a memory file.
        buf = io.BytesIO()
        fig.savefig(buf, format="png")
        plt.close(fig)  # Release the figure so repeated calls don't leak memory.
        buf.seek(0)

        # Read the image and convert to a [height, width, channels] numpy array
        img = PIL.Image.open(buf)
        return np.array(img)

    def _visualize_images(images, labels):
        # Only display the given number of examples in the batch
        outputs = []
        for i in range(max_outputs):
            output = _visualize_image(images[i].numpy(), labels[i].numpy())
            outputs.append(output)
        return np.array(outputs, dtype=np.uint8)

    # Run the python op.
    figs = tf.py_function(_visualize_images, [images, labels], tf.uint8)
    # With step=None, tf.summary.image falls back to the default summary step.
    return tf.summary.image(name, figs, step=step)

Note that since summaries are usually only evaluated once in a while (not per step), this implementation may be used in practice without worrying about efficiency.

Numerical stability in TensorFlow

When using any numerical computation library such as NumPy or TensorFlow, it's important to note that writing mathematically correct code doesn't necessarily lead to correct results. You also need to make sure that the computations are stable.

Let's start with a simple example. From primary school we know that x * y / y is equal to x for any non-zero value of y. But let's see if that's always true in practice:

import numpy as np

x = np.float32(1)

y = np.float32(1e-50)  # y would be stored as zero
z = x * y / y

print(z)  # prints nan

The reason for the incorrect result is that y is simply too small for float32 type. A similar problem occurs when y is too large:

y = np.float32(1e39)  # y would be stored as inf
z = x * y / y

print(z)  # prints nan

The smallest positive value that float32 type can represent is 1.4013e-45 and anything below that would be stored as zero. Also, any number beyond 3.40282e+38, would be stored as inf.

print(np.nextafter(np.float32(0), np.float32(1)))  # prints 1.4013e-45
print(np.finfo(np.float32).max)  # print 3.40282e+38
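The limits above can be inspected with np.finfo. The sketch below also prints the smallest positive normal value (tiny), which is larger than the 1.4013e-45 quoted above because the latter is a subnormal number. The probe values 1e-46 and 3.5e38 are arbitrary choices for illustration.

```python
import numpy as np

info = np.finfo(np.float32)
smallest_subnormal = np.nextafter(np.float32(0), np.float32(1))

print(info.tiny)            # smallest positive normal value
print(smallest_subnormal)   # smallest positive subnormal value
print(info.max)             # largest finite value
print(info.eps)             # gap between 1.0 and the next larger float32

# Values outside the representable range collapse to 0 or inf.
with np.errstate(over="ignore", under="ignore"):
    underflow = np.float32(1e-46)
    overflow = np.float32(3.5e38)

print(underflow, overflow)
```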

To make sure that your computations are stable, you want to avoid values with very small or very large absolute value. This may sound obvious, but these kinds of problems can become extremely hard to debug, especially when doing gradient descent in TensorFlow. This is because you not only need to make sure that all the values in the forward pass are within the valid range of your data types, but you also need to make sure of the same for the backward pass (during gradient computation).

Let's look at a real example. We want to compute the softmax over a vector of logits. A naive implementation would look something like this:

import tensorflow as tf

def unstable_softmax(logits):
    exp = tf.exp(logits)
    return exp / tf.reduce_sum(exp)

print(unstable_softmax([1000., 0.]).numpy())  # prints [ nan, 0.]

Note that computing the exponential of even relatively small logits produces gigantic values that are out of float32 range. The largest valid logit for our naive softmax implementation is ln(3.40282e+38) = 88.7; anything beyond that leads to a nan outcome.
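The overflow threshold can be checked directly in NumPy. In this small sketch the probe values 88 and 89 are picked only because they sit on either side of the threshold:

```python
import numpy as np

max_float32 = np.finfo(np.float32).max

# Largest input for which exp() still fits in float32.
threshold = np.log(max_float32)
print(threshold)

with np.errstate(over="ignore"):
    below = np.exp(np.float32(88.0))  # still finite
    above = np.exp(np.float32(89.0))  # overflows to inf

print(below, above)
```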

But how can we make this more stable? The solution is rather simple. It's easy to see that exp(x - c) / ∑ exp(x - c) = exp(x) / ∑ exp(x). Therefore we can subtract any constant from the logits and the result would remain the same. We choose this constant to be the maximum of logits. This way the domain of the exponential function would be limited to [-inf, 0], and consequently its range would be [0.0, 1.0] which is desirable:

import tensorflow as tf

def softmax(logits):
    exp = tf.exp(logits - tf.reduce_max(logits))
    return exp / tf.reduce_sum(exp)

print(softmax([1000., 0.]).numpy())  # prints [ 1., 0.]
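The softmax above reduces over the whole tensor, which is fine for a single vector of logits. For a batch of logits you would typically normalize each row separately. A possible sketch, assuming the class dimension is the last axis (the name batched_softmax is introduced here for illustration):

```python
import tensorflow as tf

def batched_softmax(logits, axis=-1):
    # Subtract the per-row maximum so the largest exponent is exp(0) = 1.
    shifted = logits - tf.reduce_max(logits, axis=axis, keepdims=True)
    exp = tf.exp(shifted)
    return exp / tf.reduce_sum(exp, axis=axis, keepdims=True)

logits = tf.constant([[1000., 0.], [1., 1.]])
probs = batched_softmax(logits)

print(probs.numpy())
# The built-in op can be used for comparison.
print(tf.nn.softmax(logits).numpy())
```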

Let's look at a more complicated case. Consider we have a classification problem. We use the softmax function to produce probabilities from our logits. We then define our loss function to be the cross entropy between our predictions and the labels. Recall that cross entropy for a categorical distribution can be simply defined as xe(p, q) = -∑ p_i log(q_i). So a naive implementation of the cross entropy would look like this:

def unstable_softmax_cross_entropy(labels, logits):
    logits = tf.math.log(softmax(logits))
    return -tf.reduce_sum(labels * logits)

labels = tf.constant([0.5, 0.5])
logits = tf.constant([1000., 0.])

xe = unstable_softmax_cross_entropy(labels, logits)

print(xe.numpy())  # prints inf

Note that in this implementation, as the softmax output approaches zero, the log's output approaches negative infinity, which causes instability in our computation. We can rewrite this by expanding the softmax and doing some simplifications:

def softmax_cross_entropy(labels, logits):
    scaled_logits = logits - tf.reduce_max(logits)
    normalized_logits = scaled_logits - tf.reduce_logsumexp(scaled_logits)
    return -tf.reduce_sum(labels * normalized_logits)

labels = tf.constant([0.5, 0.5])
logits = tf.constant([1000., 0.])

xe = softmax_cross_entropy(labels, logits)

print(xe.numpy())  # prints 500.0
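The simplification used in softmax_cross_entropy can be written out as follows, where x denotes the logits and p the labels. Subtracting the maximum first does not change the result, since the softmax is invariant to adding a constant to all logits.

```latex
\begin{aligned}
\log \mathrm{softmax}(x)_i
  &= \log \frac{\exp(x_i)}{\sum_j \exp(x_j)}
   = x_i - \log \sum_j \exp(x_j), \\
\mathrm{xe}\bigl(p, \mathrm{softmax}(x)\bigr)
  &= -\sum_i p_i \Bigl( x_i - \log \sum_j \exp(x_j) \Bigr).
\end{aligned}
```

The log and the exp of the softmax cancel analytically, so the computation never takes the log of a value that has underflowed to zero.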

We can also verify that the gradients are computed correctly:

with tf.GradientTape() as tape:
    tape.watch(logits)
    xe = softmax_cross_entropy(labels, logits)
    
g = tape.gradient(xe, logits)
print(g.numpy())  # prints [0.5, -0.5]

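which is correct: the gradient of the cross entropy with respect to the logits is softmax(logits) - labels = [1, 0] - [0.5, 0.5].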
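In practice you may prefer the built-in op, which applies the same kind of stabilization internally. The sketch below feeds it the same labels and logits as a batch of one example, so it returns one loss value per row:

```python
import tensorflow as tf

# Same values as above, with a leading batch dimension of size 1.
labels = tf.constant([[0.5, 0.5]])
logits = tf.constant([[1000., 0.]])

xe = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

print(xe.numpy())
```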

To reiterate, extra care must be taken when doing gradient descent to make sure that the outputs of your functions as well as the gradients for each layer are within a valid range. Exponential and logarithmic functions, when used naively, are especially problematic because they can map small numbers to enormous ones and the other way around.

Comments
  • Please add a license to this repo

    Please add a license to this repo

    Could you please add an explicit LICENSE file to the repo so that it's clear under what terms the content is provided, and under what terms user contributions are licensed?

    Per GitHub docs on licensing:

    [...] without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work. If you're creating an open source project, we strongly encourage you to include an open source license.

    Thanks!

    opened by mbrukman 5
  • how does the `get_shape` function work with placeholders?

    how does the `get_shape` function work with placeholders?

    I tried The code exapmle b = tf.placeholder(tf.float32, [None, 10, 32]); shape = get_shape(b) , but when I print out the shape, it show tensor objects, rather than the dynamic/static shape as expected.

    I wonder how can I use this get_shape function in a session properly in order to get a placeholder's shape?

    Thx!

    opened by JenkinsY94 4
  • Why use static shapes while converting the Tensor of rank 3 to rank 2?

    Why use static shapes while converting the Tensor of rank 3 to rank 2?

    In the example for converting the Tensor of rank 3 to rank 2, a combination of static and dynamic shapes are used (based on the get_sahpe function). Is not it enough to use the dynamic shapes for this purpose as follows? What is the merit of using static shapes?

    b = tf.placeholder(tf.float32, [None, 10, 32])
    shape = tf.shape(b)
    b = tf.reshape(b, [shape[0], shape[1] * shape[2]])
    
    opened by h-amirkhani 2
  • Avoiding blocking of processes due to lack of data

    Avoiding blocking of processes due to lack of data

    The code relevant to this issue can be found here Situation of the problem I am using tf.contrib.staging.StagingArea for efficient usage of GPUs by prefetching. To explain the issue better I am taking a small part of the snippet from the above code here :

    with tf.device("/gpu:0"):
            runningcorrect = tf.get_variable("runningcorrect", [], dtype=tf.float32, initializer=tf.zeros_initializer(), trainable=False)
            runningnum = tf.get_variable("runningnum", [], dtype=tf.float32, initializer=tf.zeros_initializer(), trainable=False)
        for i in range(numgpus):
            with tf.variable_scope(tf.get_variable_scope(), reuse=i>0) as vscope:
                with tf.device('/gpu:{}'.format(i)):
                    with tf.name_scope('GPU-Tower-{}'.format(i)) as scope:
                        stagingarea = tf.contrib.staging.StagingArea([tf.float32, tf.int32], shapes=[[trainbatchsize, 3, 221, 221], [trainbatchsize]], capacity=20)
                        stagingclarify.append(stagingarea.clear())
                        putop = stagingarea.put(input_iterator.get_next())
                        train_put_list.append(putop)
                        getop = stagingarea.get()
                        train_get_list.append(getop)
                        elem = train_get_list[i]
                        net, networksummaries =  overfeataccurate(elem[0],numclasses=1000)
    

    So I am using a tf.contrib.staging.StagingArea on each GPU. Each StagingArea takes its input from a tf.contrib.data.Dataset using a tf.contrib.data.Iterator. For each GPU the input is taken from the StagingArea using a StagingArea.get() op.

    The Problem Initially the training works fine. Towards the end of an epoch however, when a StagingArea does not get trainbatchsize number of tensors and the tf.contrib.data.Iterator has produced a tf.errors.OutOfRangeError, the training is blocked. It is clear that why this problem is happening. However I am not able to think of a clean way to correct this problem. Can I get insights into this issue ?

    opened by ghost 2
  • Explain why tf.nn.softmax isn't used for entropy

    Explain why tf.nn.softmax isn't used for entropy

    Hello!

    Thank you for the wonderful guide! There's one thing I'm confused about: the recipe for entropy uses a manually-defined softmax function instead of tf.nn.softmax. Is there a particular reason for this, or was it just to demonstrate how to implement both numerically-stable softmax and entropy?

    Cheers!

    opened by mrahtz 2
  • Added a small clarification to a snippet for order of execution

    Added a small clarification to a snippet for order of execution

    I added some clarifications because people coming over from Python or C++ often get flipped by the fact that a=a+b does not refer to the process of assignment but rather to an operation.

    opened by ghost 2
  • Mistake in

    Mistake in "broadcasting good and ugly"

    a =  tf.random_uniform([5, 3, 5])
    b = tf.random_uniform([5, 1, 6])
    # concat a and b and apply nonlinearity
    tiled_b = tf.tile(b, [1, 3, 1])
    c = tf.concat([a, tiled_b], 2)
    

    Should be a = tf.random_uniform([5, 3, 6]) of b = tf.random_uniform([5, 1, 5])

    opened by berylsheep-up 1
  • Where are the trainable variables placed in the

    Where are the trainable variables placed in the "Multi-GPU processing with data parallelism " ?

    I mean if there are 4 gpus that can be used for data parallelism. Where are the variables placed ? All variables are placed on the gpu:0 or in some kind of other allocation approach? If all the variables are placed on the gpu0, it seems possible to meet the OOM (Out of Memory ) issue. Waiting for your reply, thanks!

    opened by zhangjcqq 1
  • tf.AUTO_REUSE work with tf.layers.conv2d

    tf.AUTO_REUSE work with tf.layers.conv2d

    In 'Scopes and when to use them` section

    with tf.variable_scope("scope", reuse=tf.AUTO_REUSE):
      features1 = tf.layers.conv2d(image1, filters=32, kernel_size=3)
      features2 = tf.layers.conv2d(image2, filters=32, kernel_size=3)
    

    The above conv2d layer won't share weights, it seems that we have to explicitly specify name attribute of tf.layers.conv2d to share weights like,

    with tf.variable_scope("scope", reuse=tf.AUTO_REUSE):
      features1 = tf.layers.conv2d(image1, filters=32, kernel_size=3, name='conv2d')
      features2 = tf.layers.conv2d(image2, filters=32, kernel_size=3, name='conv2d')
    

    tf.version = 1.8.0

    Thanks

    opened by raytroop 1
  • Detailed comments.

    Detailed comments.

    The material is great. I can understand how Tensorflow works and go deeper. But there is a lack of some commenting on special arguments. As an example, in the stochastic gradient descent code, there are some parameters for the axis values that I can't understand. It would be great if you provide them too. I know that's a little too much too ask, but for a beginner like me, that's crucial. I've also tried reading the docs of Tensorflow, but they are not good at explaining the axis terms in a simple and understandable way. Thanks!

    opened by gholomia 1
  • Mistake in

    Mistake in "Scopes and when to use them"

    I've tried to run the penultimate example of the "Scopes and when to use them" section like so:

    image1 = tf.placeholder(tf.float32, shape=[None, 100, 100, 3])
    image2 = tf.placeholder(tf.float32, shape=[None, 100, 100, 3])
    
    features1 = tf.layers.conv2d(image1, filters=32, kernel_size=3)
    
    # Use the same convolution weights to process the second image
    with tf.variable_scope(tf.get_variable_scope(), reuse=True):
        features2 = tf.layers.conv2d(image2, filters=32, kernel_size=3)
    

    but I got:

    ValueError: Variable conv2d_1/kernel does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?
    

    So I tried:

    conv1_weights = tf.get_variable('conv1_w', [3, 3, 3, 64])
    features1 = tf.nn.conv2d(image1, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
    
    # Use the same convolution weights to process the second image
    with tf.variable_scope(tf.get_variable_scope(), reuse=True):
        conv1_weights = tf.get_variable('conv1_w')
        features2 = tf.nn.conv2d(image2, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
    

    and this does work (at least in terms of raising errors), but does not provide the segue to the final example (which uses tf.layers.conv2d).

    Perhaps you know a way of modifying the current example so that it's runnable?

    EDIT: I should have added: my tf.__version__ is '1.6.0-rc0'

    opened by MTDzi 1
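
A possible workaround for the issue above, offered as a minimal sketch rather than the guide's own fix. The error message mentions conv2d_1/kernel, which suggests the second tf.layers.conv2d call was given a fresh auto-generated name instead of reusing the variables of the first call. In TensorFlow 1.x, passing the same explicit name= to both calls is the usual way around this. Assuming TensorFlow 2.x with tf.keras (not the 1.6.0-rc0 build from the report), the same weight sharing can be expressed by creating one layer object and calling it twice. The random input tensors and the batch size of 4 are hypothetical stand-ins for the placeholders in the issue.

```python
import tensorflow as tf

# Hypothetical stand-ins for the two placeholders in the issue
# (a batch of 4 RGB images of size 100x100 each).
image1 = tf.random.normal([4, 100, 100, 3])
image2 = tf.random.normal([4, 100, 100, 3])

# Create the layer once; its kernel and bias are built on the first call.
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=3)

features1 = conv(image1)
# Calling the same layer object again reuses the same convolution weights.
features2 = conv(image2)

print(features1.shape, features2.shape)
print([tuple(v.shape) for v in conv.trainable_variables])
```

Because both calls go through the same conv object, only one kernel and one bias exist, so gradients from both images would update the same weights.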
Owner
Vahid Kazemi
PhD in Computer Vision & Robotics Ex. Google, Waymo, Microsoft, & Snap.