A tiny scalar-valued autograd engine and a neural net library on top of it with a PyTorch-like API

Overview

micrograd

A tiny Autograd engine (with a bite! :)). Implements backpropagation (reverse-mode autodiff) over a dynamically built DAG and a small neural networks library on top of it with a PyTorch-like API. Both are tiny, with about 100 and 50 lines of code respectively. The DAG only operates over scalar values, so e.g. we chop up each neuron into all of its individual tiny adds and multiplies. However, this is enough to build up entire deep neural nets doing binary classification, as the demo notebook shows. Potentially useful for educational purposes.
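
For intuition, here is a minimal sketch (not taken from the repo) of a single neuron written out as the individual scalar adds and multiplies mentioned above, using the same Value API shown in the example below:

from micrograd.engine import Value

x1, x2 = Value(2.0), Value(0.0)    # inputs
w1, w2 = Value(-3.0), Value(1.0)   # weights
b = Value(6.8)                     # bias
act = x1*w1 + x2*w2 + b            # the neuron's weighted sum, one scalar op at a time
out = act.relu()                   # nonlinearity
out.backward()                     # populates .grad on every Value above
print(w1.grad)                     # d(out)/d(w1), here 2.0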

Installation

pip install micrograd

Example usage

Below is a slightly contrived example showing a number of the supported operations:

from micrograd.engine import Value

a = Value(-4.0)
b = Value(2.0)
c = a + b
d = a * b + b**3
c += c + 1
c += 1 + c + (-a)
d += d * 2 + (b + a).relu()
d += 3 * d + (b - a).relu()
e = c - d
f = e**2
g = f / 2.0
g += 10.0 / f
print(f'{g.data:.4f}') # prints 24.7041, the outcome of this forward pass
g.backward()
print(f'{a.grad:.4f}') # prints 138.8338, i.e. the numerical value of dg/da
print(f'{b.grad:.4f}') # prints 645.5773, i.e. the numerical value of dg/db

Training a neural net

The notebook demo.ipynb provides a full demo of training a 2-layer neural network (MLP) binary classifier. This is achieved by initializing a neural net from the micrograd.nn module, implementing a simple SVM "max-margin" binary classification loss, and using SGD for optimization. As shown in the notebook, using a 2-layer neural net with two 16-node hidden layers we achieve the following decision boundary on the moon dataset:

[figure: decision boundary of the trained 2-layer MLP on the moon dataset]
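
As a rough sketch (not the notebook's exact code; it assumes the micrograd.nn API of MLP, parameters() and zero_grad(), plus hypothetical toy inputs X and labels y in {-1, +1}), one can combine the max-margin loss with plain SGD roughly like this; the notebook's version additionally adds L2 regularization:

from micrograd.nn import MLP

# hypothetical toy data: 2D points with labels -1 / +1
X = [[1.0, -2.0], [0.5, 0.5], [-1.0, 1.5]]
y = [1, -1, -1]

model = MLP(2, [16, 16, 1])  # 2 inputs, two 16-node hidden layers, 1 output score

for step in range(20):
    # forward pass: SVM "max-margin" hinge loss, relu(1 - y_i * score_i), averaged
    scores = [model(xi) for xi in X]
    loss = sum((1 - yi * si).relu() for yi, si in zip(y, scores)) * (1.0 / len(X))

    # backward pass
    model.zero_grad()
    loss.backward()

    # SGD update
    for p in model.parameters():
        p.data -= 0.05 * p.grad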

Tracing / visualization

For added convenience, the notebook trace_graph.ipynb produces graphviz visualizations. For example, the graph below is of a simple 2D neuron, produced by calling draw_dot on the output of the code below, and it shows both the data (left number in each node) and the gradient (right number in each node).

from micrograd import nn
from micrograd.engine import Value

n = nn.Neuron(2)
x = [Value(1.0), Value(-2.0)]
y = n(x)
dot = draw_dot(y)  # draw_dot is defined in trace_graph.ipynb
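
Assuming draw_dot returns a graphviz Digraph (as it does in the notebook), the result can be displayed inline in Jupyter simply by evaluating dot, or written to disk (the filename here is arbitrary):

dot.render('neuron2d', format='svg')  # writes neuron2d.svg; needs the graphviz system package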

[figure: graphviz trace of the 2D neuron, showing data and grad at each node]

Running tests

To run the unit tests you will have to install PyTorch, which the tests use as a reference for verifying the correctness of the calculated gradients. Then simply:

python -m pytest
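
For reference, the checks compare micrograd's gradients against PyTorch's autograd on the same expression; a minimal sketch of that kind of test (not the repo's exact test file) looks like:

import torch
from micrograd.engine import Value

def test_simple_grad():
    # micrograd
    a = Value(-4.0)
    z = 2 * a + 2 + a
    y = (z * z).relu() + z * a
    y.backward()
    amg = a

    # the same computation in PyTorch
    a = torch.tensor(-4.0, dtype=torch.double, requires_grad=True)
    z = 2 * a + 2 + a
    y = (z * z).relu() + z * a
    y.backward()

    assert abs(amg.grad - a.grad.item()) < 1e-6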

License

MIT

Comments
  • Add exp tanh to value

    You might already have these changes locally, but I figured I'd upload this just in case, since people that watched the YouTube video probably expect these functions to be implemented.

    opened by bradklingensmith 4
  • Incorrect gradient when non-leaf Values are re-used

    Thank you @evcu for raising this. My little 2D toy problem converged, and instead of going on to proper tests and double-checking through the recursion I got all trigger-happy and amused with puppies. The core issue is that if variables are re-used then their gradient will be accumulated for each path. Do you think this reference-counting idea could work as a simpler solution? The idea is to suppress backward() calls until the very last one.

    (Love your Stylized puppy in your branch btw! :D)

    
    class Value:
        """ stores a single scalar value and its gradient """
    
        def __init__(self, data):
            self.data = data
            self.grad = 0
            self.backward = lambda: None
            self.refs = 0              # number of downstream consumers of this Value
    
        def __add__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data + other.data)
            self.refs += 1
            other.refs += 1
            
            def backward():
                # defer propagating until the last consumer has called backward()
                if out.refs > 1:
                    out.refs -= 1
                    return
                self.grad += out.grad
                other.grad += out.grad
                self.backward()
                other.backward()
            out.backward = backward
    
            return out
    
        def __radd__(self, other):
            return self.__add__(other)
    
        def __mul__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data * other.data)
            self.refs += 1
            other.refs += 1
            
            def backward():
                if out.refs > 1:
                    out.refs -= 1
                    return
                self.grad += other.data * out.grad
                other.grad += self.data * out.grad
                self.backward()
                other.backward()
            out.backward = backward
    
            return out
    
        def __rmul__(self, other):
            return self.__mul__(other)
    
        def relu(self):
            out = Value(0 if self.data < 0 else self.data)
            self.refs += 1
            def backward():
                if out.refs > 1:
                    out.refs -= 1
                    return
                self.grad += (out.data > 0) * out.grad
                self.backward()
            out.backward = backward
            return out
    
        def __repr__(self):
            return f"Value(data={self.data}, grad={self.grad})"
    
    opened by karpathy 3
  • `other` should have a gradient in `__pow__` (?)

    Hey Andrej -- just want to say thanks so much for your YouTube video on micrograd. The video has been absolutely enlightening.

    Quick question -- while re-implementing micrograd on my end, I noticed that __pow__ (in Value) was missing a back-propagation definition for other. Is this expected? https://github.com/karpathy/micrograd/blob/c911406e5ace8742e5841a7e0df113ecb5d54685/micrograd/engine.py#L39-L40

    opened by rokkosan 2
  • PyPI package

    Feature

    • Convert Micrograd into a PyPI package.

    Need

    • Would make it easier for institutions or bootcamps to adopt this for their students.
    • As an organizer of the Data Science Club at SJSU, I would love to introduce the fundamentals with this library.
    opened by aaditkapoor 2
  • the svm max-loss had been implemented to have a negative score when t…

    …he label was positive and vice versa, probably to avoid implementing sub. Changed the loss and the demo, still avoiding implementing sub, but the sign of the score now matches the label.

    opened by sinjax 2
  • Issue with zero_grad?

    Hi, unless I'm misunderstanding something, zero_grad in nn.py is zeroing out the gradients on the parameter nodes, but shouldn't it do it on all the nodes in the graph? Otherwise the inner nodes will keep accumulating them.
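
    For reference, a short sketch of the behavior being described (assuming the Module base class in micrograd.nn): zero_grad only walks parameters(), so intermediate Values created during the forward pass are left untouched.

        class Module:

            def zero_grad(self):
                for p in self.parameters():
                    p.grad = 0   # resets parameter gradients only, not inner graph nodes

            def parameters(self):
                return []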

    opened by sky87 1
  • Simplify topo sorted traversal to depth first traversal

    Each node points to 0 or 1 other node, never to more. There's only one unique path by which each node can be reached by going backwards from the output node. This means that in the backward() method v is never already in visited. The algorithm thus simplifies to a depth-first traversal.

    Side-note: if a node could be reached by more than 1 path from the output node, the node would have a different gradient for each of the paths. So, while you could reuse nodes for the forward calculation and get correct results, there's no facility to store several gradients per node. Thus it's unnecessary to account for this possibility here.

    opened by mbertheau 1
  • Bugfix : Value class initializer

    The bug only pops up if a crazy person like me accidentally or intentionally re-initializes a Value, i.e. passes an existing Value back into the Value constructor.

    Before the fix:

        a = Value(2.0)
        b = Value(a)
        print(b)   # output: Value(data=Value(data=2.0, grad=0), grad=0)

    After the fix:

        a = Value(2.0)
        b = Value(a)
        print(b)   # output: Value(data=2.0, grad=0)

    It's very minor and can be ignored, but it would mean a lot as my very first open-source contribution.

    opened by Kulwa-silya 1
  • Noob question about backprop implementation

    Hello, I came across this from your YT video tutorials, thank you for making these!

    In engine.py, you implement backpropagation using an explicit topological ordering of the graph. Are there any reasons why we would not recursively call _backward for every child? E.g. implement the backward function in Value as such:

        def backward(self):
            self._backward()
            for v in self._prev:
                v.backward()
    

    Does it have something to do with how backprop is implemented in actual NN libraries? Is recursion harder to parallelise in practice compared to using topological ordering?

    Thank you

    opened by khanigoo 0
  • Homework Assignment Error with softmax activation function

    Hi @karpathy, I was solving the assignment mentioned in the YouTube video. In the softmax function, I was getting the following error: TypeError: unsupported operand type(s) for +: 'int' and 'Value'

    This is the line where I am getting the error

    def softmax(logits):
      counts = [logit.exp() for logit in logits]
      denominator = sum(counts) # the TypeError is raised here
      out = [c / denominator for c in counts]
      return out
    

    And, my add function in Value Class is the following

    def __add__(self, other): # exactly as in the video
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')
        
        def _backward():
          self.grad += 1.0 * out.grad
          other.grad += 1.0 * out.grad
        out._backward = _backward
        
        return out
    

    So my question is about the built-in sum over the list. It presumably works by repeatedly adding the elements together (counts[0] + counts[1] + ...), so this __add__ should work fine. But I am not sure why it is not working; am I missing something? Thanks in advance

    opened by Athe-kunal 1
  • micrograd.NET: C# port for .NET developers

    Hi Andrej, thanks for this excellent library! It may be useful not only to Python developers but also to C#, F#, Pascal, etc. developers, so I wrote a C# port for the .NET ecosystem. The basic info about this is here: https://github.com/ColorfulSoft/System.AI/blob/master/Docs/micrograd.NET.md Best, Gleb S. Brykin

    opened by GlebSBrykin 0
  • Added note about .NET version

    Added note about a .NET C# reimplementation.

    I know that you don't accept PRs and I don't even expect you to endorse or comment on this work but I just wanted you to know that micrograd is now living in other languages as well.

    opened by tymtam2 0
  • More accurate accuracy

    I know you don't accept pull requests, but you may want to make the change where the log shows accuracy at the point of logging. At the moment it shows the "old" accuracy.

    opened by tymtam2 1
  • handle single input neuron returned by layer with single output

    The Layer class special-cases layers with a single output neuron:

    return out[0] if len(out) == 1 else out
    

    This is nice for loss functions but currently breaks for intermediate layers with a single neuron.

    from micrograd.nn import MLP
    from micrograd.engine import Value
    
    # works
    model = MLP(2, [16, 16, 1])
    x = [Value(1.0), Value(-2.0)]
    print(model(x))
    
    # fails: intermediate layer with 1 neuron
    model = MLP(2, [16, 1, 1])
    x = [Value(1.0), Value(-2.0)]
    print(model(x))
    

    This PR fixes this by correctly handling a single input Value.

    Note that I currently only check for Value instances explicitly, which is good enough for intermediate layers.

    To also handle networks with a single input, we should additionally check for float and int scalars. I kept it as minimal as possible for now, but I can add that if you want. (Technically, it would probably be best to check whether the input is iterable or not, but doing that robustly requires a try statement.)
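
    For illustration, a minimal sketch of the kind of check described above (not the actual PR diff), assuming Layer keeps its neurons in self.neurons:

        def __call__(self, x):
            # a previous layer with a single neuron returns a bare Value rather than a list,
            # so wrap it back into a list before feeding the neurons
            if isinstance(x, Value):
                x = [x]
            out = [n(x) for n in self.neurons]
            return out[0] if len(out) == 1 else out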

    opened by jonasrauber 0