Minimal deep learning library written from scratch in Python, using NumPy/CuPy.



Project status: experimental, unstable.

SmallPebble is a minimal/toy automatic differentiation/deep learning library written from scratch in Python, using NumPy/CuPy.

The implementation is in


  • Relatively simple implementation.
  • Powerful API for creating models.
  • Various operations, such as matmul, conv2d, maxpool2d.
  • Broadcasting support.
  • Eager or lazy execution.
  • It's easy to add new SmallPebble functions.
  • GPU support, if CuPy is used.

Graphs are built implicitly via Python objects referencing Python objects. The only real step taken towards improving performance is to use NumPy/CuPy.
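
As a rough illustration of what an "implicit graph" means here, the sketch below stores, on every result, references to the objects it was computed from, together with a local gradient function for each. Gradients are then found by walking those references backwards. This is not SmallPebble's actual code: the names Var, add, mul and grads are made up for this example, and broadcasting is ignored.

```python
from collections import defaultdict

import numpy as np


class Var:
    """Toy variable: a value plus references to the variables it came from."""

    def __init__(self, array, local_gradients=()):
        self.array = np.asarray(array, dtype=float)
        # Tuple of (parent Var, function: upstream gradient -> gradient w.r.t. parent).
        self.local_gradients = local_gradients


def add(a, b):
    return Var(a.array + b.array, ((a, lambda g: g), (b, lambda g: g)))


def mul(a, b):
    return Var(a.array * b.array, ((a, lambda g: g * b.array), (b, lambda g: g * a.array)))


def grads(output):
    """Sum the gradient contributions of every path from `output` back to each Var."""
    result = defaultdict(lambda: 0.0)

    def walk(var, upstream):
        for parent, to_parent in var.local_gradients:
            g = to_parent(upstream)
            result[parent] = result[parent] + g
            walk(parent, g)

    walk(output, np.ones_like(output.array))
    return result


x = Var([1.0, 2.0])
w = Var([3.0, 4.0])
y = add(mul(x, w), x)  # y = x * w + x
g = grads(y)
print(g[x])  # dy/dx = w + 1
print(g[w])  # dy/dw = x
```

No graph object is ever created explicitly; the graph is just the chain of local_gradients references reachable from y.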

Should I use this?

You probably want a more efficient and featureful framework, such as JAX, PyTorch, TensorFlow, etc.

Read on to see:

  • Examples of deep learning models created and trained using SmallPebble.
  • A brief guide to using SmallPebble.

For an introduction to autodiff and an even more minimal autodiff implementation, look here.

import matplotlib.pyplot as plt
import numpy as np
import smallpebble as sp
from smallpebble.misc import load_data
from tqdm import tqdm

Training a neural network on MNIST

Load the dataset, and create a validation set.

X_train, y_train, _, _ = load_data('mnist')  # load / download from
X_train = X_train/255

# Separate out data for validation.
X = X_train[:50_000, ...]
y = y_train[:50_000]
X_eval = X_train[50_000:60_000, ...]
y_eval = y_train[50_000:60_000]

Build a model.

X_in = sp.Placeholder()
y_true = sp.Placeholder()

h = sp.linearlayer(28*28, 100)(X_in)
h = sp.Lazy(sp.leaky_relu)(h)
h = sp.linearlayer(100, 100)(h)
h = sp.Lazy(sp.leaky_relu)(h)
h = sp.linearlayer(100, 10)(h)
y_pred = sp.Lazy(sp.softmax)(h)
loss = sp.Lazy(sp.cross_entropy)(y_pred, y_true)

learnables = sp.get_learnables(y_pred)

loss_vals = []
validation_acc = []
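
For readers unfamiliar with the last two nodes of the model, here is a NumPy-only sketch of the usual softmax and cross-entropy definitions. It assumes integer class labels, and sp.softmax and sp.cross_entropy may differ in details such as numerical stabilisation or how per-example losses are reduced.

```python
import numpy as np


def softmax(logits):
    """Turn each row of scores into probabilities that sum to 1."""
    # Subtracting the row maximum does not change the result, but avoids overflow.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)


def cross_entropy(probs, labels):
    """Per-example negative log-probability of the true class (integer labels assumed)."""
    rows = np.arange(len(labels))
    return -np.log(probs[rows, labels])


logits = np.array([[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
labels = np.array([0, 2])
probs = softmax(logits)
losses = cross_entropy(probs, labels)
print(probs)
print(losses)
```

The first example is scored confidently and correctly, so its loss is smaller than that of the second, where all three classes are equally likely.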

Train model, and measure performance on validation dataset.


eval_batch = sp.batch(X_eval, y_eval, BATCH_SIZE)

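for i, (xbatch, ybatch) in tqdm(enumerate(sp.batch(X, y, BATCH_SIZE)), total=NUM_EPOCHS):
    if i > NUM_EPOCHS: break
    # Feed the batch into the placeholders
    # (assumes images are wrapped in sp.Variable and labels are passed as a plain array).
    X_in.assign_value(sp.Variable(xbatch))
    y_true.assign_value(ybatch)
    loss_val = loss.run()  # run the graph
    if np.isnan(loss_val.array):
        print("loss is nan, aborting.")
        break
    loss_vals.append(loss_val.array)
    # Compute gradients, and carry out learning step.
    gradients = sp.get_gradients(loss_val)
    sp.sgd_step(learnables, gradients, 3e-4)
    # Compute validation accuracy:
    x_eval_batch, y_eval_batch = next(eval_batch)
    X_in.assign_value(sp.Variable(x_eval_batch))
    predictions = y_pred.run()
    predictions = np.argmax(predictions.array, axis=1)
    accuracy = (y_eval_batch == predictions).mean()
    validation_acc.append(accuracy)

plt.figure(figsize=(14, 4))
plt.subplot(1, 2, 1)
plt.title('Loss')
plt.plot(loss_vals)
plt.subplot(1, 2, 2)
plt.title('Validation accuracy')
plt.plot(validation_acc)
plt.suptitle('Neural network trained on MNIST, using SmallPebble.')
plt.ylim([0, 1])
plt.show()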
301it [00:03, 94.26it/s]                         


Training a convolutional neural network on MNIST

Make a function that creates trainable convolutional layers:

def convlayer(height, width, depth, n_kernels, strides=[1,1]):
    # Initialise kernels:
    sigma = np.sqrt(6 / (height*width*depth+height*width*n_kernels))
    kernels_init = sigma*(np.random.random([height, width, depth, n_kernels]) - .5)
    # Wrap with sp.Variable, so we can compute gradients:
    kernels = sp.Variable(kernels_init)
    # Flag as learnable, so we can extract from the model to train:
    kernels = sp.learnable(kernels)
    # Curry, to set `strides`:
    func = lambda images, kernels: sp.conv2d(images, kernels, strides=strides, padding='SAME')
    # Curry, to use the kernels created here:
    return lambda images: sp.Lazy(func)(images, kernels)

Define a model.

X_in = sp.Placeholder()
y_true = sp.Placeholder()

h = convlayer(height=3, width=3, depth=1, n_kernels=16)(X_in)
h = sp.Lazy(sp.leaky_relu)(h)
h = sp.Lazy(lambda a: sp.maxpool2d(a, 2, 2, strides=[2, 2]))(h)

h = sp.Lazy(lambda x: sp.reshape(x, [-1, 14*14*16]))(h)
h = sp.linearlayer(14*14*16, 64)(h)
h = sp.Lazy(sp.leaky_relu)(h)

h = sp.linearlayer(64, 10)(h)
y_pred = sp.Lazy(sp.softmax)(h)
loss = sp.Lazy(sp.cross_entropy)(y_pred, y_true)

learnables = sp.get_learnables(y_pred)

loss_vals = []
validation_acc = []
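
The 14*14*16 in the reshape above comes from tracking the image size through the layers. A small sketch of that arithmetic follows. It assumes the common conventions that 'SAME' padding gives an output size of ceil(input / stride) and that the 2x2 max-pool with stride 2 uses no padding. The helper names are made up for illustration.

```python
import math


def same_conv_size(size, stride=1):
    """Spatial size after a convolution with 'SAME' padding (assumed convention)."""
    return math.ceil(size / stride)


def pool_size(size, window=2, stride=2):
    """Spatial size after pooling without padding."""
    return (size - window) // stride + 1


side = same_conv_size(28)  # conv, stride 1, 'SAME': stays 28
side = pool_size(side)     # 2x2 max-pool, stride 2: 14
n_kernels = 16
flat_features = side * side * n_kernels
print(side, flat_features)  # 14 3136
```

Applying the same conv-then-pool arithmetic twice to a 32x32 input gives a side of 8, which matches the 8*8*32 used in the CIFAR model further down.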

# Check we get the dimensions we expected.
X_in.assign_value(sp.Variable(X[0:3, :].reshape([-1, 28, 28, 1])))
print(y_pred.run().array.shape)
(3, 10)

eval_batch = sp.batch(X_eval.reshape([-1,28,28,1]), y_eval, BATCH_SIZE)

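for i, (xbatch, ybatch) in tqdm(
    enumerate(sp.batch(X.reshape([-1,28,28,1]), y, BATCH_SIZE)), total=NUM_EPOCHS):
    if i > NUM_EPOCHS: break
    # Feed the batch into the placeholders
    # (assumes images are wrapped in sp.Variable and labels are passed as a plain array).
    X_in.assign_value(sp.Variable(xbatch))
    y_true.assign_value(ybatch)
    loss_val = loss.run()  # run the graph
    if np.isnan(loss_val.array):
        print("Aborting, loss is nan.")
        break
    loss_vals.append(loss_val.array)
    # Compute gradients, and carry out learning step.
    gradients = sp.get_gradients(loss_val)
    sp.sgd_step(learnables, gradients, 3e-4)
    # Compute validation accuracy:
    x_eval_batch, y_eval_batch = next(eval_batch)
    X_in.assign_value(sp.Variable(x_eval_batch))
    predictions = y_pred.run()
    predictions = np.argmax(predictions.array, axis=1)
    accuracy = (y_eval_batch == predictions).mean()
    validation_acc.append(accuracy)

plt.figure(figsize=(14, 4))
plt.subplot(1, 2, 1)
plt.title('Loss')
plt.plot(loss_vals)
plt.subplot(1, 2, 2)
plt.title('Validation accuracy')
plt.plot(validation_acc)
plt.suptitle('CNN trained on MNIST, using SmallPebble.')
plt.ylim([0, 1])
plt.show()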
301it [03:35,  1.40it/s]                         


Training a CNN on CIFAR

Load the dataset.

X_train, y_train, _, _ = load_data('cifar')
X_train = X_train/255

# Separate out some data for validation.
X = X_train[:45_000, ...]
y = y_train[:45_000]
X_eval = X_train[45_000:50_000, ...]
y_eval = y_train[45_000:50_000]

Plot, to check it's the right data.

# This code is from:

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

for i in range(25):


Define the model. Due to my lack of RAM, it is kept relatively small.

X_in = sp.Placeholder()
y_true = sp.Placeholder()

h = convlayer(height=3, width=3, depth=3, n_kernels=16)(X_in)
h = sp.Lazy(sp.leaky_relu)(h)
h = sp.Lazy(lambda a: sp.maxpool2d(a, 2, 2, strides=[2, 2]))(h)

h = convlayer(height=3, width=3, depth=16, n_kernels=32)(h)
h = sp.Lazy(sp.leaky_relu)(h)
h = sp.Lazy(lambda a: sp.maxpool2d(a, 2, 2, strides=[2, 2]))(h)

h = sp.Lazy(lambda x: sp.reshape(x, [-1, 8*8*32]))(h)
h = sp.linearlayer(8*8*32, 64)(h)
h = sp.Lazy(sp.leaky_relu)(h)

h = sp.linearlayer(64, 10)(h)
h = sp.Lazy(sp.softmax)(h)

y_pred = h
loss = sp.Lazy(sp.cross_entropy)(y_pred, y_true)

learnables = sp.get_learnables(y_pred)

loss_vals = []
validation_acc = []

# Check we get the expected dimensions
X_in.assign_value(sp.Variable(X[0:3, :].reshape([-1, 32, 32, 3])))
print(y_pred.run().array.shape)
(3, 10)

Train the model.


eval_batch = sp.batch(X_eval, y_eval, BATCH_SIZE)

for i, (xbatch, ybatch) in tqdm(enumerate(sp.batch(X, y, BATCH_SIZE)), total=NUM_EPOCHS):
    if i > NUM_EPOCHS: break
    xbatch_images = xbatch.reshape([-1, 32, 32, 3])
    # Feed the batch into the placeholders
    # (assumes labels are passed as a plain array).
    X_in.assign_value(sp.Variable(xbatch_images))
    y_true.assign_value(ybatch)
    loss_val = loss.run()  # run the graph
    if np.isnan(loss_val.array):
        print("Aborting, loss is nan.")
        break
    loss_vals.append(loss_val.array)
    # Compute gradients, and carry out learning step.
    gradients = sp.get_gradients(loss_val)
    sp.sgd_step(learnables, gradients, 3e-3)
    # Compute validation accuracy:
    x_eval_batch, y_eval_batch = next(eval_batch)
    X_in.assign_value(sp.Variable(x_eval_batch.reshape([-1, 32, 32, 3])))
    predictions = y_pred.run()
    predictions = np.argmax(predictions.array, axis=1)
    accuracy = (y_eval_batch == predictions).mean()
    validation_acc.append(accuracy)

plt.figure(figsize=(14, 4))
plt.subplot(1, 2, 1)
plt.title('Loss')
plt.plot(loss_vals)
plt.subplot(1, 2, 2)
plt.title('Validation accuracy')
plt.plot(validation_acc)
plt.show()
3001it [25:16,  1.98it/s]                            


...And we see some improvement, despite the model's small size, the unsophisticated optimisation method and the difficulty of the task.

Brief guide to using SmallPebble

SmallPebble provides the following building blocks to make models with:

  • sp.Variable
  • SmallPebble operations, such as sp.add, sp.mul, etc.
  • sp.get_gradients
  • sp.Lazy
  • sp.Placeholder (this is really just sp.Lazy on the identity function)
  • sp.learnable
  • sp.get_learnables

The following examples show how these are used.

sp.Variable & sp.get_gradients

With SmallPebble, you can:

  • Wrap NumPy arrays in sp.Variable
  • Apply SmallPebble operations (e.g. sp.matmul, sp.add, etc.)
  • Compute gradients with sp.get_gradients
a = sp.Variable(np.random.random([2, 2]))
b = sp.Variable(np.random.random([2, 2]))
c = sp.Variable(np.random.random([2]))
y = sp.mul(a, b) + c
print('y.array:\n', y.array)

gradients = sp.get_gradients(y)
grad_a = gradients[a]
grad_b = gradients[b]
grad_c = gradients[c]
print('grad_a:\n', grad_a)
print('grad_b:\n', grad_b)
print('grad_c:\n', grad_c)
y.array:
 [[0.50222439 0.67745659]
 [0.68666171 0.58330707]]
grad_a:
 [[0.56436821 0.2581522 ]
 [0.89043144 0.25750461]]
grad_b:
 [[0.11665152 0.85303194]
 [0.28106794 0.48955456]]
grad_c:
 [2. 2.]

Note that y is computed straight away, i.e. the (forward) computation happens immediately.

Also note that y is a sp.Variable and we could continue to carry out SmallPebble operations on it.
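
The last gradient printed, for c, is [2. 2.] because c has shape [2] and is broadcast across both rows of a * b, so each element of c contributes to two elements of y. Here is a NumPy-only sketch of that bookkeeping. It is a generic illustration of how broadcast gradients are usually reduced, not a description of SmallPebble's internals.

```python
import numpy as np

a = np.random.random([2, 2])
b = np.random.random([2, 2])
c = np.random.random([2])

y = a * b + c  # c is broadcast from shape (2,) to (2, 2)

# Start from a gradient of 1 for every element of y.
upstream = np.ones_like(y)

grad_a = upstream * b          # same shape as a
grad_b = upstream * a          # same shape as b
grad_c = upstream.sum(axis=0)  # sum over the broadcast axis, giving shape (2,)

print(grad_c)  # [2. 2.]
```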

sp.Lazy & sp.Placeholder

Lazy graphs are constructed using sp.Lazy and sp.Placeholder.

lazy_node = sp.Lazy(lambda a, b: a + b)(1, 2)
print(lazy_node)
<smallpebble.smallpebble.Lazy object at 0x7fbc92d58d50>
a = sp.Lazy(lambda a: a)(2)
y = sp.Lazy(lambda a, b, c: a * b + c)(a, 3, 4)
print(y)
<smallpebble.smallpebble.Lazy object at 0x7fbc92d41d50>

Forward computation does not happen immediately - only when .run() is called.

a = sp.Placeholder()
b = sp.Variable(np.random.random([2, 2]))
y = sp.Lazy(sp.matmul)(a, b)

a.assign_value(sp.Variable(np.array([[1,2], [3,4]])))

result = y.run()
print('result.array:\n', result.array)
result.array:
 [[1.01817665 2.54693119]
 [2.42244218 5.69810698]]

You can use .run() as many times as you like.

Let's change the placeholder value and re-run the graph:

a.assign_value(sp.Variable(np.array([[10,20], [30,40]])))
result = y.run()
print('result.array:\n', result.array)
result.array:
 [[10.18176654 25.46931189]
 [24.22442177 56.98106985]]

Finally, let's compute gradients:

gradients = sp.get_gradients(result)

Note that sp.get_gradients is called on result, which is a sp.Variable, not on y, which is a sp.Lazy instance.
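
To make the deferred execution concrete, here is a toy sketch of how a lazy node and a placeholder could be written in plain Python. The classes ToyLazy and ToyPlaceholder are hypothetical and only mimic the behaviour described in this section; SmallPebble's own sp.Lazy and sp.Placeholder may be implemented differently.

```python
class ToyLazy:
    """Toy deferred call: stores a function and its arguments, evaluates on run()."""

    def __init__(self, function):
        self.function = function
        self.arguments = []

    def __call__(self, *arguments):
        self.arguments = list(arguments)
        return self

    def run(self):
        # Evaluate lazy arguments first; pass everything else through unchanged.
        values = [a.run() if isinstance(a, ToyLazy) else a for a in self.arguments]
        return self.function(*values)


class ToyPlaceholder(ToyLazy):
    """A lazy identity function whose single argument is supplied later."""

    def __init__(self):
        super().__init__(lambda value: value)

    def assign_value(self, value):
        self.arguments = [value]


a = ToyPlaceholder()
y = ToyLazy(lambda a, b, c: a * b + c)(a, 3, 4)

a.assign_value(2)
first = y.run()   # 2 * 3 + 4
a.assign_value(10)
second = y.run()  # 10 * 3 + 4
print(first, second)  # 10 34
```

Nothing is computed when y is built. Each call to run() walks the stored arguments and evaluates them with whatever value the placeholder currently holds.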

sp.learnable & sp.get_learnables

Use sp.learnable to flag parameters as learnable, allowing them to be extracted from a lazy graph with sp.get_learnables.

This enables the workflow of building a model, while flagging parameters as learnable, and then extracting all the parameters in one go at the end.

a = sp.Placeholder()
b = sp.learnable(sp.Variable(np.random.random([2, 1])))
y = sp.Lazy(sp.matmul)(a, b)
y = sp.Lazy(sp.add)(y, sp.learnable(sp.Variable(np.array([5]))))

learnables = sp.get_learnables(y)

for learnable in learnables:
    print(learnable)
<smallpebble.smallpebble.Variable object at 0x7fbc60b6ebd0>
<smallpebble.smallpebble.Variable object at 0x7fbc60b6ec50>
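
Once the learnables have been collected, a training step pairs each one with its gradient. The sketch below shows what a plain SGD update of that form typically does. It uses a stand-in Param class and a hypothetical toy_sgd_step function, and is not the implementation of sp.sgd_step.

```python
import numpy as np


class Param:
    """Stand-in for a learnable variable holding an array."""

    def __init__(self, array):
        self.array = np.asarray(array, dtype=float)


def toy_sgd_step(params, gradients, learning_rate):
    """Move each parameter a small step against its gradient, in place."""
    for param in params:
        param.array -= learning_rate * gradients[param]


w = Param([1.0, -2.0])
# Minimise sum(w ** 2), whose gradient is 2 * w.
for _ in range(100):
    gradients = {w: 2 * w.array}
    toy_sgd_step([w], gradients, learning_rate=0.1)
print(w.array)  # both entries end up very close to 0
```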

Switching between NumPy and CuPy

We can dynamically switch between NumPy and CuPy:

import cupy
import numpy
import smallpebble as sp

# Switch to CuPy.
sp.array_library = cupy

# And back to NumPy again:
sp.array_library = numpy
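
If the same script has to run on machines with and without CuPy, the choice can be made at start-up. The helper below is a hypothetical convenience, not part of SmallPebble. It only checks whether the cupy package is installed, which does not guarantee that a working GPU is present.

```python
import importlib.util

import numpy


def pick_array_library(prefer_gpu=True):
    """Return the cupy module if it is installed and wanted, otherwise numpy."""
    if prefer_gpu and importlib.util.find_spec("cupy") is not None:
        import cupy
        return cupy
    return numpy


array_library = pick_array_library(prefer_gpu=False)
print(array_library.__name__)  # numpy

# With SmallPebble imported as sp, the result would be assigned as shown above:
# sp.array_library = pick_array_library()
```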
  • Contributing to SmallPebble

    @sradc This is the best repository I have seen in weeks, for two reasons: first, I wanted to know more about automatic differentiation, and second, it is to the point. I found this through Aurélien Géron's retweet. I want to contribute, but I am hazy on where to start and on what the core point of this repo is. Could you please tell me where I can start?

    opened by AdityaKane2001 10
  • Open for hacktoberfest?

    Hacktoberfest is right around the corner and I would like to contribute a bit! As you might recall (#2), I am interested in adding some valuable functionality to this repo. It is really easy to get your repo counted for Hacktoberfest.

    It'd be great if you could sign up this repo for hacktoberfest. It'll be really great, but I'll still contribute in case you choose otherwise. I really think this repo is a good idea. (I tried to mimic something like this but failed due to lack of time and interest)

    opened by AdityaKane2001 7
  • Restructured and modularized code

    As discussed I have modularized the code in a way I think is best. Please take a look at the structure and let me know if you think there need to be any changes. All tests are passing.

    # New structure
    ├── smallpebble/
    │   ├── core/
    │   │   ├──
    │   │   └──
    │   │       ├── AssignmentError
    │   │       ├── Lazy
    │   │       ├── Placeholder
    │   │       ├── Variable
    │   │       ├── add
    │   │       ├── add_at
    │   │       ├── broadcastinfo
    │   │       ├── div
    │   │       ├── enable_broadcast
    │   │       ├── exp
    │   │       ├── expand_dims
    │   │       ├── get_gradients
    │   │       ├── getitem
    │   │       ├── log
    │   │       ├── matmul
    │   │       ├── matrix_transpose
    │   │       ├── maxax
    │   │       ├── mul
    │   │       ├── neg
    │   │       ├── np_add_at
    │   │       ├── np_strided_sliding_view
    │   │       ├── reshape
    │   │       ├── setat
    │   │       ├── square
    │   │       ├── sub
    │   │       ├── sum
    │   │       └── where 
    │   ├── nn/
    │   │   ├──
    │   │   ├──
    │   │   │   ├── conv2d
    │   │   │   ├── maxpool2d      
    │   │   │   ├── pad        
    │   │   │   ├── pad_amounts
    │   │   │   ├── padding2d
    │   │   │   ├── patches_index
    │   │   │   └── strided_sliding_view
    │   │   └──
    │   │       ├── Adam
    │   │       ├── batch
    │   │       ├── convlayer
    │   │       ├── cross_entropy
    │   │       ├── get_learnables
    │   │       ├── he_init
    │   │       ├── leaky_relu
    │   │       ├── learnable
    │   │       ├── linearlayer
    │   │       ├── onehot
    │   │       ├── sgd_step
    │   │       └── softmax
    │   ├── misc/
    │   │   └── <unchanged>
    │   ├── tests/
    │   │   └── <unchanged>
    │   ├──
    │   │   └── <unchanged>
    │   └──
    │       └── <unchanged>
    ├── LICENSE
    ├── README
    └── ...
    opened by AdityaKane2001 6
  • Why enclosing Variable inside lambda instead of the variable only?

    First of all thanks a lot for your amazing autodiff post.

    For Nth-order derivatives you wrote "2. To change local_gradients to contain functions, instead of local gradient values". I think that storing only a Variable with the weight/local_gradient would be sufficient, as the lambda function (multiply_by_locgrad) only multiplies the path value by a constant factor. get_gradients would then use path_value * local_gradient, where both are of Variable type.

    side-note: I have combined that with a tape/array for storing variables to save memory, but the combinatorial explosion in the number of possible paths in the graph does too much allocation especially for nth-derivative.

    opened by Islam0mar 6
  • single file, type hints, cupy version, docstrings, conda env

    • reverted to mostly single file
    • added type hints (hopefully mostly correct)
    • updated cupy version, made changes to keep it working
    • updated some docstrings
    • added conda environment.yml
    opened by sradc 1
  • Variable.local_gradients intended behavior is unclear

    class Variable:
        "To be used in calculations to be differentiated."
        def __init__(self, array, local_gradients=[]):
            self.array = array
            self.local_gradients = local_gradients

    this snippet would lead to:

    >>> var1 = Variable(0)
    >>> var1.local_gradients.append(1)
    >>> var2 = Variable(0)
    >>> print(var2.local_gradients)
    [1]

    This could lead to hard-to-track bugs. Unless this behavior is really needed, it would be advisable to change it, or at least to state in the code comments or docs that the default local_gradients list is shared between all instances created without an explicit value.
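
    The behavior described comes from Python evaluating a default argument once, when the function is defined, so every instance created without an explicit local_gradients receives the same list object. Below is a minimal standalone sketch of the pitfall and of the usual None-default fix. The class names are illustrative, and this is not necessarily how the library resolved the issue.

```python
class SharedDefault:
    def __init__(self, array=None, local_gradients=[]):
        self.array = array
        # The default list is created once and reused by every default call.
        self.local_gradients = local_gradients


class FreshDefault:
    def __init__(self, array=None, local_gradients=None):
        self.array = array
        # Create a new list per instance when none is supplied.
        self.local_gradients = [] if local_gradients is None else local_gradients


var1, var2 = SharedDefault(), SharedDefault()
var1.local_gradients.append(1)
print(var2.local_gradients)  # [1]

var3, var4 = FreshDefault(), FreshDefault()
var3.local_gradients.append(1)
print(var4.local_gradients)  # []
```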

    opened by Fazel94 1
