A tiny scalar-valued autograd engine and a neural net library on top of it with a PyTorch-like API

Overview

micrograd

A tiny Autograd engine (with a bite! :)). Implements backpropagation (reverse-mode autodiff) over a dynamically built DAG and a small neural networks library on top of it with a PyTorch-like API. Both are tiny, with about 100 and 50 lines of code respectively. The DAG only operates over scalar values, so e.g. we chop up each neuron into all of its individual tiny adds and multiplies. However, this is enough to build up entire deep neural nets doing binary classification, as the demo notebook shows. Potentially useful for educational purposes.
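
For intuition, here is a minimal sketch (not taken from the repo) of a single neuron written out as the individual scalar adds and multiplies mentioned above, using the same Value API shown in the example below:

from micrograd.engine import Value

x1, x2 = Value(2.0), Value(0.0)    # inputs
w1, w2 = Value(-3.0), Value(1.0)   # weights
b = Value(6.8)                     # bias
act = x1*w1 + x2*w2 + b            # the neuron's weighted sum, one scalar op at a time
out = act.relu()                   # nonlinearity
out.backward()                     # populates .grad on every Value above
print(w1.grad)                     # d(out)/d(w1), here 2.0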

Installation

pip install micrograd

Example usage

Below is a slightly contrived example showing a number of the supported operations:

from micrograd.engine import Value

a = Value(-4.0)
b = Value(2.0)
c = a + b
d = a * b + b**3
c += c + 1
c += 1 + c + (-a)
d += d * 2 + (b + a).relu()
d += 3 * d + (b - a).relu()
e = c - d
f = e**2
g = f / 2.0
g += 10.0 / f
print(f'{g.data:.4f}') # prints 24.7041, the outcome of this forward pass
g.backward()
print(f'{a.grad:.4f}') # prints 138.8338, i.e. the numerical value of dg/da
print(f'{b.grad:.4f}') # prints 645.5773, i.e. the numerical value of dg/db

Training a neural net

The notebook demo.ipynb provides a full demo of training a 2-layer neural network (MLP) binary classifier. This is achieved by initializing a neural net from the micrograd.nn module, implementing a simple SVM "max-margin" binary classification loss, and using SGD for optimization. As shown in the notebook, using a 2-layer neural net with two 16-node hidden layers we achieve the following decision boundary on the moon dataset:

[figure: decision boundary of the trained 2-layer MLP on the moon dataset]
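
As a rough sketch (not the notebook's exact code; it assumes the micrograd.nn API of MLP, parameters() and zero_grad(), plus hypothetical toy inputs X and labels y in {-1, +1}), one can combine the max-margin loss with plain SGD roughly like this; the notebook's version additionally adds L2 regularization:

from micrograd.nn import MLP

# hypothetical toy data: 2D points with labels -1 / +1
X = [[1.0, -2.0], [0.5, 0.5], [-1.0, 1.5]]
y = [1, -1, -1]

model = MLP(2, [16, 16, 1])  # 2 inputs, two 16-node hidden layers, 1 output score

for step in range(20):
    # forward pass: SVM "max-margin" hinge loss, relu(1 - y_i * score_i), averaged
    scores = [model(xi) for xi in X]
    loss = sum((1 - yi * si).relu() for yi, si in zip(y, scores)) * (1.0 / len(X))

    # backward pass
    model.zero_grad()
    loss.backward()

    # SGD update
    for p in model.parameters():
        p.data -= 0.05 * p.grad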

Tracing / visualization

For added convenience, the notebook trace_graph.ipynb produces graphviz visualizations. For example, the graph below is of a simple 2D neuron, produced by calling draw_dot on the output of the code below, and it shows both the data (left number in each node) and the gradient (right number in each node).

from micrograd import nn
from micrograd.engine import Value

n = nn.Neuron(2)
x = [Value(1.0), Value(-2.0)]
y = n(x)
dot = draw_dot(y)  # draw_dot is defined in trace_graph.ipynb
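
Assuming draw_dot returns a graphviz Digraph (as it does in the notebook), the result can be displayed inline in Jupyter simply by evaluating dot, or written to disk (the filename here is arbitrary):

dot.render('neuron2d', format='svg')  # writes neuron2d.svg; needs the graphviz system package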

[figure: graphviz trace of the 2D neuron, showing data and grad at each node]

Running tests

To run the unit tests you will have to install PyTorch, which the tests use as a reference for verifying the correctness of the calculated gradients. Then simply:

python -m pytest
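
For reference, the checks compare micrograd's gradients against PyTorch's autograd on the same expression; a minimal sketch of that kind of test (not the repo's exact test file) looks like:

import torch
from micrograd.engine import Value

def test_simple_grad():
    # micrograd
    a = Value(-4.0)
    z = 2 * a + 2 + a
    y = (z * z).relu() + z * a
    y.backward()
    amg = a

    # the same computation in PyTorch
    a = torch.tensor(-4.0, dtype=torch.double, requires_grad=True)
    z = 2 * a + 2 + a
    y = (z * z).relu() + z * a
    y.backward()

    assert abs(amg.grad - a.grad.item()) < 1e-6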

License

MIT

Comments
  • Add exp tanh to value

    You might already have these changes locally, but I figured I'd upload this just in case, since people that watched the YouTube video probably expect these functions to be implemented.

    opened by bradklingensmith 4
  • Incorrect gradient when non-leaf Values are re-used

    Thank you @evcu for raising this. My little 2D toy problem converged, and instead of going on to proper tests and double-checking through the recursion I got all trigger-happy and amused with puppies. The core issue is that if variables are re-used then their gradient will be accumulated for each path. Do you think this reference-counting idea could work as a simpler solution? The idea is to suppress backward() calls until the very last one.

    (Love your Stylized puppy in your branch btw! :D)

    
    class Value:
        """ stores a single scalar value and its gradient """
    
        def __init__(self, data):
            self.data = data
            self.grad = 0
            self.backward = lambda: None
            self.refs = 0              # number of downstream consumers of this Value
    
        def __add__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data + other.data)
            self.refs += 1
            other.refs += 1
            
            def backward():
                # defer propagating until the last consumer has called backward()
                if out.refs > 1:
                    out.refs -= 1
                    return
                self.grad += out.grad
                other.grad += out.grad
                self.backward()
                other.backward()
            out.backward = backward
    
            return out
    
        def __radd__(self, other):
            return self.__add__(other)
    
        def __mul__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data * other.data)
            self.refs += 1
            other.refs += 1
            
            def backward():
                if out.refs > 1:
                    out.refs -= 1
                    return
                self.grad += other.data * out.grad
                other.grad += self.data * out.grad
                self.backward()
                other.backward()
            out.backward = backward
    
            return out
    
        def __rmul__(self, other):
            return self.__mul__(other)
    
        def relu(self):
            out = Value(0 if self.data < 0 else self.data)
            self.refs += 1
            def backward():
                if out.refs > 1:
                    out.refs -= 1
                    return
                self.grad += (out.data > 0) * out.grad
                self.backward()
            out.backward = backward
            return out
    
        def __repr__(self):
            return f"Value(data={self.data}, grad={self.grad})"
    
    opened by karpathy 3
  • `other` should have a gradient in `__pow__` (?)

    Hey Andrej -- just want to say thanks so much for your YouTube video on micrograd. The video has been absolutely enlightening.

    Quick question -- while re-implementing micrograd on my end, I noticed that __pow__ (in Value) was missing a back-propagation definition for other. Is this expected? https://github.com/karpathy/micrograd/blob/c911406e5ace8742e5841a7e0df113ecb5d54685/micrograd/engine.py#L39-L40

    opened by rokkosan 2
  • PyPI package

    Feature

    • Convert Micrograd into a PyPI package.

    Need

    • Would make it easier for institutions or bootcamps to adopt this for their students.
    • As an organizer of the Data Science Club at SJSU, I would love to introduce the fundamentals with this library.
    opened by aaditkapoor 2
  • the svm max-loss had been implemented to have a negative score when t…

    …he label was positive and vice versa, probably to avoid implementing sub. Changed the loss and the demo, still avoiding implementing sub, but the sign of the score now matches the label.

    opened by sinjax 2
  • Issue with zero_grad?

    Hi, unless I'm misunderstanding something, zero_grad in nn.py is zeroing out the gradients on the parameter nodes, but shouldn't it do it on all the nodes in the graph? Otherwise the inner nodes will keep accumulating them.
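
    For reference, a short sketch of the behavior being described (assuming the Module base class in micrograd.nn): zero_grad only walks parameters(), so intermediate Values created during the forward pass are left untouched.

        class Module:

            def zero_grad(self):
                for p in self.parameters():
                    p.grad = 0   # resets parameter gradients only, not inner graph nodes

            def parameters(self):
                return []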

    opened by sky87 1
  • Simplify topo sorted traversal to depth first traversal

    Each node points to 0 or 1 other node, never to more. There's only one unique path by which each node can be reached by going backwards from the output node. This means that in the backward() method v is never already in visited. The algorithm thus simplifies to a depth-first traversal.

    Side-note: if a node could be reached by more than 1 path from the output node, the node would have a different gradient for each of the paths. So, while you could reuse nodes for the forward calculation and get correct results, there's no facility to store several gradients per node. Thus it's unnecessary to account for this possibility here.

    opened by mbertheau 1
  • Bugfix : Value class initializer

    The bug only pops up if a crazy person like me accidentally or intentionally re-initializes a Value, i.e. passes an existing Value back into the Value constructor.

    Before the fix:

        a = Value(2.0)
        b = Value(a)
        print(b)   # output: Value(data=Value(data=2.0, grad=0), grad=0)

    After the fix:

        a = Value(2.0)
        b = Value(a)
        print(b)   # output: Value(data=2.0, grad=0)

    It's very minor and can be ignored, but it would mean a lot as my very first open-source contribution.

    opened by Kulwa-silya 1
  • Noob question about backprop implementation

    Hello, I came across this from your YT video tutorials, thank you for making these!

    In engine.py, you implement backpropagation using an explicit topological ordering of the graph. Are there any reasons why we would not recursively call _backward for every child? E.g. implement the backward function in Value as such:

        def backward(self):
            self._backward()
            for v in self._prev:
                v.backward()
    

    Does it have something to do with how backprop is implemented in actual NN libraries? Is recursion harder to parallelise in practice compared to using topological ordering?

    Thank you

    opened by khanigoo 0
  • Homework Assignment Error with softmax activation function

    Hi @karpathy, I was solving the assignment mentioned in the YouTube video. In the softmax function, I was getting the following error: TypeError: unsupported operand type(s) for +: 'int' and 'Value'

    This is the line where I am getting the error

    def softmax(logits):
      counts = [logit.exp() for logit in logits]
      denominator = sum(counts) # the TypeError is raised here
      out = [c / denominator for c in counts]
      return out
    

    And, my add function in Value Class is the following

    def __add__(self, other): # exactly as in the video
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')
        
        def _backward():
          self.grad += 1.0 * out.grad
          other.grad += 1.0 * out.grad
        out._backward = _backward
        
        return out
    

    So my question is about the built-in sum over the list. It presumably works by repeatedly adding the elements together (counts[0] + counts[1] + ...), so this __add__ should work fine. But I am not sure why it is not working; am I missing something? Thanks in advance

    opened by Athe-kunal 1
  • micrograd.NET: C# port for .NET developers

    Hi Andrej, thanks for this excellent library! It may be useful not only to Python developers but also to C#, F#, Pascal, etc. developers, so I wrote a C# port for the .NET ecosystem. The basic info about this is here: https://github.com/ColorfulSoft/System.AI/blob/master/Docs/micrograd.NET.md Best, Gleb S. Brykin

    opened by GlebSBrykin 0
  • Added note about .NET version

    Added note about a .NET C# reimplementation.

    I know that you don't accept PRs and I don't even expect you to endorse or comment on this work but I just wanted you to know that micrograd is now living in other languages as well.

    opened by tymtam2 0
  • More accurate accuracy

    I know you don't accept pull requests, but you may want to make the change where the log shows accuracy at the point of logging. At the moment it shows the "old" accuracy.

    opened by tymtam2 1
  • handle single input neuron returned by layer with single output

    The Layer class special-cases layers with a single output neuron:

    return out[0] if len(out) == 1 else out
    

    This is nice for loss functions but currently breaks for intermediate layers with a single neuron.

    from micrograd.nn import MLP
    from micrograd.engine import Value
    
    # works
    model = MLP(2, [16, 16, 1])
    x = [Value(1.0), Value(-2.0)]
    print(model(x))
    
    # fails: intermediate layer with 1 neuron
    model = MLP(2, [16, 1, 1])
    x = [Value(1.0), Value(-2.0)]
    print(model(x))
    

    This PR fixes this by correctly handling a single input Value.

    Note that I currently only check for Value instances explicitly, which is good enough for intermediate layers.

    To also handle networks with a single input, we should additionally check for float and int scalars. I kept it as minimal as possible for now, but I can add that if you want. (Technically, it would probably be best to check whether the input is iterable or not, but doing that robustly requires a try statement.)
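
    For illustration, a minimal sketch of the kind of check described above (not the actual PR diff), assuming Layer keeps its neurons in self.neurons:

        def __call__(self, x):
            # a previous layer with a single neuron returns a bare Value rather than a list,
            # so wrap it back into a list before feeding the neurons
            if isinstance(x, Value):
                x = [x]
            out = [n(x) for n in self.neurons]
            return out[0] if len(out) == 1 else out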

    opened by jonasrauber 0