SmallPebble
Project status: experimental, unstable.
SmallPebble is a minimal, toy automatic differentiation / deep learning library, written from scratch in Python using NumPy/CuPy.
The implementation is in smallpebble.py.
Features:
- Relatively simple implementation.
- Powerful API for creating models.
- Various operations, such as matmul, conv2d, maxpool2d.
- Broadcasting support.
- Eager or lazy execution.
- It's easy to add new SmallPebble functions.
- GPU support, via CuPy.
Graphs are built implicitly via Python objects referencing Python objects. The only real step taken towards improving performance is to use NumPy/CuPy.
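For example, here is a small sketch (using only the public API demonstrated later in this README) contrasting eager execution, where results are computed immediately, with lazy execution, where the graph only runs when asked to:

import numpy as np
import smallpebble as sp

a = sp.Variable(np.ones([2, 2]))
b = sp.Variable(np.ones([2, 2]))

# Eager: the forward computation happens immediately.
eager_result = sp.add(a, b)
print(eager_result.array)

# Lazy: nothing is computed until .run() is called.
lazy_result = sp.Lazy(sp.add)(a, b)
print(lazy_result.run().array)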
Should I use this?
You probably want a more efficient and featureful framework, such as JAX, PyTorch, TensorFlow, etc.
Read on to see:
- Examples of deep learning models created and trained using SmallPebble.
- A brief guide to using SmallPebble.
For an introduction to autodiff and an even more minimal autodiff implementation, look here.
import matplotlib.pyplot as plt
import numpy as np
import smallpebble as sp
from smallpebble.misc import load_data
from tqdm import tqdm
Training a neural network on MNIST
Load the dataset, and create a validation set.
X_train, y_train, _, _ = load_data('mnist') # load / download from openml.org
X_train = X_train/255
# Separate out data for validation.
X = X_train[:50_000, ...]
y = y_train[:50_000]
X_eval = X_train[50_000:60_000, ...]
y_eval = y_train[50_000:60_000]
Build a model.
X_in = sp.Placeholder()
y_true = sp.Placeholder()
h = sp.linearlayer(28*28, 100)(X_in)
h = sp.Lazy(sp.leaky_relu)(h)
h = sp.linearlayer(100, 100)(h)
h = sp.Lazy(sp.leaky_relu)(h)
h = sp.linearlayer(100, 10)(h)
y_pred = sp.Lazy(sp.softmax)(h)
loss = sp.Lazy(sp.cross_entropy)(y_pred, y_true)
learnables = sp.get_learnables(y_pred)
loss_vals = []
validation_acc = []
Train model, and measure performance on validation dataset.
NUM_EPOCHS = 300
BATCH_SIZE = 200
eval_batch = sp.batch(X_eval, y_eval, BATCH_SIZE)
for i, (xbatch, ybatch) in tqdm(enumerate(sp.batch(X, y, BATCH_SIZE)), total=NUM_EPOCHS):
    if i > NUM_EPOCHS: break
    X_in.assign_value(sp.Variable(xbatch))
    y_true.assign_value(ybatch)
    loss_val = loss.run()  # run the graph
    if np.isnan(loss_val.array):
        print("loss is nan, aborting.")
        break
    loss_vals.append(loss_val.array)
    # Compute gradients, and carry out learning step.
    gradients = sp.get_gradients(loss_val)
    sp.sgd_step(learnables, gradients, 3e-4)
    # Compute validation accuracy:
    x_eval_batch, y_eval_batch = next(eval_batch)
    X_in.assign_value(sp.Variable(x_eval_batch))
    predictions = y_pred.run()
    predictions = np.argmax(predictions.array, axis=1)
    accuracy = (y_eval_batch == predictions).mean()
    validation_acc.append(accuracy)
plt.figure(figsize=(14, 4))
plt.subplot(1, 2, 1)
plt.title('Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.plot(loss_vals)
plt.subplot(1, 2, 2)
plt.title('Validation accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.suptitle('Neural network trained on MNIST, using SmallPebble.')
plt.ylim([0, 1])
plt.plot(validation_acc)
plt.show()
301it [00:03, 94.26it/s]
Training a convolutional neural network on MNIST
Make a function that creates trainable convolutional layers:
def convlayer(height, width, depth, n_kernels, strides=[1,1]):
    # Initialise kernels:
    sigma = np.sqrt(6 / (height*width*depth + height*width*n_kernels))
    kernels_init = sigma*(np.random.random([height, width, depth, n_kernels]) - .5)
    # Wrap with sp.Variable, so we can compute gradients:
    kernels = sp.Variable(kernels_init)
    # Flag as learnable, so we can extract from the model to train:
    kernels = sp.learnable(kernels)
    # Curry, to set `strides`:
    func = lambda images, kernels: sp.conv2d(images, kernels, strides=strides, padding='SAME')
    # Curry, to use the kernels created here:
    return lambda images: sp.Lazy(func)(images, kernels)
Define a model.
X_in = sp.Placeholder()
y_true = sp.Placeholder()
h = convlayer(height=3, width=3, depth=1, n_kernels=16)(X_in)
h = sp.Lazy(sp.leaky_relu)(h)
h = sp.Lazy(lambda a: sp.maxpool2d(a, 2, 2, strides=[2, 2]))(h)
h = sp.Lazy(lambda x: sp.reshape(x, [-1, 14*14*16]))(h)
h = sp.linearlayer(14*14*16, 64)(h)
h = sp.Lazy(sp.leaky_relu)(h)
h = sp.linearlayer(64, 10)(h)
y_pred = sp.Lazy(sp.softmax)(h)
loss = sp.Lazy(sp.cross_entropy)(y_pred, y_true)
learnables = sp.get_learnables(y_pred)
loss_vals = []
validation_acc = []
# Check we get the dimensions we expected.
X_in.assign_value(sp.Variable(X_train[0:3,:].reshape([-1,28,28,1])))
y_true.assign_value(y_train[0])
h.run().array.shape
(3, 10)
NUM_EPOCHS = 300
BATCH_SIZE = 200
eval_batch = sp.batch(X_eval.reshape([-1,28,28,1]), y_eval, BATCH_SIZE)
for i, (xbatch, ybatch) in tqdm(
        enumerate(sp.batch(X.reshape([-1,28,28,1]), y, BATCH_SIZE)), total=NUM_EPOCHS):
    if i > NUM_EPOCHS: break
    X_in.assign_value(sp.Variable(xbatch))
    y_true.assign_value(ybatch)
    loss_val = loss.run()
    if np.isnan(loss_val.array):
        print("Aborting, loss is nan.")
        break
    loss_vals.append(loss_val.array)
    # Compute gradients, and carry out learning step.
    gradients = sp.get_gradients(loss_val)
    sp.sgd_step(learnables, gradients, 3e-4)
    # Compute validation accuracy:
    x_eval_batch, y_eval_batch = next(eval_batch)
    X_in.assign_value(sp.Variable(x_eval_batch))
    predictions = y_pred.run()
    predictions = np.argmax(predictions.array, axis=1)
    accuracy = (y_eval_batch == predictions).mean()
    validation_acc.append(accuracy)
plt.figure(figsize=(14, 4))
plt.subplot(1, 2, 1)
plt.title('Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.plot(loss_vals)
plt.subplot(1, 2, 2)
plt.title('Validation accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.suptitle('CNN trained on MNIST, using SmallPebble.')
plt.ylim([0, 1])
plt.plot(validation_acc)
plt.show()
301it [03:35, 1.40it/s]
Training a CNN on CIFAR
Load the dataset.
X_train, y_train, _, _ = load_data('cifar')
X_train = X_train/255
# Separate out some data for validation.
X = X_train[:45_000, ...]
y = y_train[:45_000]
X_eval = X_train[45_000:50_000, ...]
y_eval = y_train[45_000:50_000]
Plot, to check it's the right data.
# This code is from: https://www.tensorflow.org/tutorials/images/cnn
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck']
plt.figure(figsize=(8,8))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train[i,:].reshape(32,32,3), cmap=plt.cm.binary)
    plt.xlabel(class_names[y_train[i]])
plt.show()
Define the model. Due to limited RAM, it is kept relatively small.
X_in = sp.Placeholder()
y_true = sp.Placeholder()
h = convlayer(height=3, width=3, depth=3, n_kernels=16)(X_in)
h = sp.Lazy(sp.leaky_relu)(h)
h = sp.Lazy(lambda a: sp.maxpool2d(a, 2, 2, strides=[2, 2]))(h)
h = convlayer(height=3, width=3, depth=16, n_kernels=32)(h)
h = sp.Lazy(sp.leaky_relu)(h)
h = sp.Lazy(lambda a: sp.maxpool2d(a, 2, 2, strides=[2, 2]))(h)
h = sp.Lazy(lambda x: sp.reshape(x, [-1, 8*8*32]))(h)
h = sp.linearlayer(8*8*32, 64)(h)
h = sp.Lazy(sp.leaky_relu)(h)
h = sp.linearlayer(64, 10)(h)
h = sp.Lazy(sp.softmax)(h)
y_pred = h
loss = sp.Lazy(sp.cross_entropy)(y_pred, y_true)
learnables = sp.get_learnables(y_pred)
loss_vals = []
validation_acc = []
# Check we get the expected dimensions
X_in.assign_value(sp.Variable(X[0:3, :].reshape([-1, 32, 32, 3])))
h.run().array.shape
(3, 10)
Train the model.
NUM_EPOCHS = 3000
BATCH_SIZE = 32
eval_batch = sp.batch(X_eval, y_eval, BATCH_SIZE)
for i, (xbatch, ybatch) in tqdm(enumerate(sp.batch(X, y, BATCH_SIZE)), total=NUM_EPOCHS):
    if i > NUM_EPOCHS: break
    xbatch_images = xbatch.reshape([-1, 32, 32, 3])
    X_in.assign_value(sp.Variable(xbatch_images))
    y_true.assign_value(ybatch)
    loss_val = loss.run()
    if np.isnan(loss_val.array):
        print("Aborting, loss is nan.")
        break
    loss_vals.append(loss_val.array)
    # Compute gradients, and carry out learning step.
    gradients = sp.get_gradients(loss_val)
    sp.sgd_step(learnables, gradients, 3e-3)
    # Compute validation accuracy:
    x_eval_batch, y_eval_batch = next(eval_batch)
    X_in.assign_value(sp.Variable(x_eval_batch.reshape([-1, 32, 32, 3])))
    predictions = y_pred.run()
    predictions = np.argmax(predictions.array, axis=1)
    accuracy = (y_eval_batch == predictions).mean()
    validation_acc.append(accuracy)
plt.figure(figsize=(14, 4))
plt.subplot(1, 2, 1)
plt.title('Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.plot(loss_vals)
plt.subplot(1, 2, 2)
plt.title('Validation accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.plot(validation_acc)
plt.show()
3001it [25:16, 1.98it/s]
...And we see some improvement, despite the model's small size, the unsophisticated optimisation method and the difficulty of the task.
Brief guide to using SmallPebble
SmallPebble provides the following building blocks to make models with:
- sp.Variable
- SmallPebble operations, such as sp.add, sp.mul, etc.
- sp.get_gradients
- sp.Lazy
- sp.Placeholder (this is really just sp.Lazy on the identity function)
- sp.learnable
- sp.get_learnables
The following examples show how these are used.
sp.Variable & sp.get_gradients
With SmallPebble, you can:
- Wrap NumPy arrays in sp.Variable
- Apply SmallPebble operations (e.g. sp.matmul, sp.add, etc.)
- Compute gradients with sp.get_gradients
a = sp.Variable(np.random.random([2, 2]))
b = sp.Variable(np.random.random([2, 2]))
c = sp.Variable(np.random.random([2]))
y = sp.mul(a, b) + c
print('y.array:\n', y.array)
gradients = sp.get_gradients(y)
grad_a = gradients[a]
grad_b = gradients[b]
grad_c = gradients[c]
print('grad_a:\n', grad_a)
print('grad_b:\n', grad_b)
print('grad_c:\n', grad_c)
y.array:
[[0.50222439 0.67745659]
[0.68666171 0.58330707]]
grad_a:
[[0.56436821 0.2581522 ]
[0.89043144 0.25750461]]
grad_b:
[[0.11665152 0.85303194]
[0.28106794 0.48955456]]
grad_c:
[2. 2.]
Note that y is computed straight away, i.e. the (forward) computation happens immediately. Also note that y is a sp.Variable, and we could continue to carry out SmallPebble operations on it.
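For instance, continuing from the snippet above, further operations can be applied to y, and gradients flow back through the new operations too (a small sketch):

z = sp.mul(y, y)  # keep operating on y eagerly
gradients_z = sp.get_gradients(z)
print(gradients_z[a])  # gradient of z with respect to a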
sp.Lazy & sp.Placeholder
Lazy graphs are constructed using sp.Lazy and sp.Placeholder.
lazy_node = sp.Lazy(lambda a, b: a + b)(1, 2)
print(lazy_node)
print(lazy_node.run())
<smallpebble.smallpebble.Lazy object at 0x7fbc92d58d50>
3
a = sp.Lazy(lambda a: a)(2)
y = sp.Lazy(lambda a, b, c: a * b + c)(a, 3, 4)
print(y)
print(y.run())
<smallpebble.smallpebble.Lazy object at 0x7fbc92d41d50>
10
Forward computation does not happen immediately - only when .run() is called.
a = sp.Placeholder()
b = sp.Variable(np.random.random([2, 2]))
y = sp.Lazy(sp.matmul)(a, b)
a.assign_value(sp.Variable(np.array([[1,2], [3,4]])))
result = y.run()
print('result.array:\n', result.array)
result.array:
[[1.01817665 2.54693119]
[2.42244218 5.69810698]]
You can use .run() as many times as you like.
Let's change the placeholder value and re-run the graph:
a.assign_value(sp.Variable(np.array([[10,20], [30,40]])))
result = y.run()
print('result.array:\n', result.array)
result.array:
[[10.18176654 25.46931189]
[24.22442177 56.98106985]]
Finally, let's compute gradients:
gradients = sp.get_gradients(result)
Note that sp.get_gradients is called on result, which is a sp.Variable, not on y, which is a sp.Lazy instance.
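As in the eager example earlier, the returned dict is keyed by the sp.Variable instances that took part in the forward pass, so (assuming b participated in the last .run(), as above) its gradient can be read off directly:

grad_b = gradients[b]
print('grad_b:\n', grad_b)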
sp.learnable & sp.get_learnables
Use sp.learnable to flag parameters as learnable, allowing them to be extracted from a lazy graph with sp.get_learnables.
This enables the workflow of building a model, while flagging parameters as learnable, and then extracting all the parameters in one go at the end.
a = sp.Placeholder()
b = sp.learnable(sp.Variable(np.random.random([2, 1])))
y = sp.Lazy(sp.matmul)(a, b)
y = sp.Lazy(sp.add)(y, sp.learnable(sp.Variable(np.array([5]))))
learnables = sp.get_learnables(y)
for learnable in learnables:
    print(learnable)
<smallpebble.smallpebble.Variable object at 0x7fbc60b6ebd0>
<smallpebble.smallpebble.Variable object at 0x7fbc60b6ec50>
Switching between NumPy and CuPy
We can dynamically switch between NumPy and CuPy:
import cupy
import numpy
import smallpebble as sp
# Switch to CuPy.
sp.array_library = cupy
# And back to NumPy again:
sp.array_library = numpy
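If CuPy may not be installed, the following convenience sketch (my own addition, not part of SmallPebble) uses CuPy when it is available and otherwise stays on NumPy:

import numpy
import smallpebble as sp

try:
    import cupy
    sp.array_library = cupy  # use GPU arrays
except ImportError:
    sp.array_library = numpy  # fall back to the CPU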