MiniTorch - a DIY teaching library for machine learning engineers

Overview

This repo contains the full student code for MiniTorch. It is designed as a single repo that can be completed part by part by following the guidebook. It uses GitHub CI to run the tests for each module.

MiniTorch is a DIY teaching library for machine learning engineers who wish to learn about the internal concepts underlying deep learning systems. It is a pure Python re-implementation of the Torch API designed to be simple, easy to read, tested, and incremental. The final library can run Torch code. The project was developed for the course 'Machine Learning Engineering' at Cornell Tech.

To get started, first read the setup guide to build your workspace. Then work through each of the modules in order. Minimal computational resources are required. Starting code for each module is available on GitHub, and each module builds incrementally on the previous ones.

Enjoy!

Sasha Rush (@srush_nlp) with Ge Gao and Anton Abilov

Topics covered:

  • Basic Neural Networks and Modules
  • Autodifferentiation for Scalars
  • Tensors, Views, and Strides
  • Parallel Tensor Operations
  • GPU / CUDA Programming in Numba
  • Convolutions and Pooling
  • Advanced NN Functions
Comments
  • Module 2: tensor_functions: Expand the backward result of Add

    Description

    The provided Add class does not pass the tests since we need to expand() grad_output to handle broadcasting.

    This PR adds expand() to the Add function.

    (If it was actually intentional for expand() not to be used, then we should instead add TODOs in Add to notify students that Add needs to be fixed. We should then also mention expand() in the docs, since it is not obvious that students need to use this function, which is not mentioned anywhere.)

    opened by tomtseng 2
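
The broadcasting problem described in the comment above can be illustrated with a minimal sketch. It uses NumPy rather than MiniTorch, and the helper names `unbroadcast` and `add_backward` are hypothetical; it is not the code from the PR, only an illustration of why the gradient of a broadcast Add has to be summed back down to each input's shape.

```python
import numpy as np


def unbroadcast(grad, shape):
    """Sum `grad` down to `shape`, reversing NumPy-style broadcasting.

    Hypothetical helper for illustration; not part of MiniTorch.
    """
    # Sum away leading dimensions that broadcasting added.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Sum over dimensions that were stretched from size 1.
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad


def add_backward(a_shape, b_shape, grad_output):
    """Gradient of a + b: pass grad_output through, reduced to each input's shape."""
    return unbroadcast(grad_output, a_shape), unbroadcast(grad_output, b_shape)


a = np.ones((2, 3))
b = np.ones((3,))                  # broadcast across the first dimension in a + b
grad_output = np.ones((2, 3))      # same shape as the output a + b

grad_a, grad_b = add_backward(a.shape, b.shape, grad_output)
print(grad_a.shape, grad_b.shape)  # (2, 3) (3,)
print(grad_b)                      # [2. 2. 2.]
```

Without the reduction step, `grad_b` would have shape (2, 3) and would no longer match `b`, which is the kind of shape mismatch the comment reports.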
  • Please add a license to this repo

    First, thank you for sharing this project with us!

    Could you please add an explicit LICENSE file to this repo (and other repos in the org) so that it's clear under what terms the content is provided, and under what terms user contributions are licensed?

    Per GitHub docs on licensing:

    [...] without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work. If you're creating an open source project, we strongly encourage you to include an open source license.

    Thanks!

    opened by mbrukman 0
  • Project dependencies may have API risk issues

    Hi, in minitorch, inappropriate dependency version constraints can introduce risks.

    Below are the dependencies and version constraints that the project is using:

    numpy==1.19.1
    numba==0.49
    pytest==6.0.1
    pytest-env
    pytest-runner==5.2
    hypothesis==4.38
    flake8==3.8.3
    black==19.10b0
    colorama==0.4.3
    pep8-naming==0.11.1
    darglint==1.8.0
    tbb
    

    The == version constraint risks dependency conflicts because it pins each dependency too strictly. Constraints with no upper bound, or *, risk missing-API errors because the latest version of a dependency may have removed some APIs.

    After further analysis, in this project the version constraint for numpy can be changed to >=1.8.0,<=1.23.0rc3, and the constraint for hypothesis can be changed to >=5.19.0,<=6.47.3.

    These changes should reduce dependency conflicts as much as possible while allowing the latest versions that do not cause errors in the project.

    The project calls all of the following methods.

    The calling methods from the numpy
    numpy.testing.assert_allclose
    
    The calling methods from the hypothesis
    hypothesis.strategies.permutations
    random.random
    hypothesis.strategies.lists
    hypothesis.strategies.integers
    hypothesis.strategies.floats
    
    The calling methods from the all methods
    cls.variable
    Context
    minitorch.TensorData
    range
    self.conv_and_pool
    self.backend.LT.apply
    self.backend.Neg.apply
    threadsperblock.blockspergrid.f
    minitorch.fast_ops.tensor_reduce
    shape
    self.model.forward
    minitorch.fast_ops.tensor_reduce.parallel_diagnostics
    RParam
    run_mnist_multiclass.ImageTrain.train
    grad_output.permute
    networkx.MultiDiGraph.add_edge
    col1.empty
    mul_reduce
    run_sentiment.SentenceSentimentTrain.train
    col2.slider
    self.SentimentCNN.super.__init__
    ctx.save_for_backward
    torch.tensor.float
    random.seed
    numpy.vstack
    super.__setattr__
    tensor_data.shape_broadcast
    eval
    loss.view.backward
    SentenceSentimentTrain
    CNNSentimentKim
    TensorTrain
    self.backend.ReLU.apply
    numpy.testing.assert_allclose
    minitorch.logsoftmax
    B.A.sum
    streamlit.empty.write
    Network2.forward
    cls.forward
    embeddings.permute
    col1.text_input
    int.i.i.i.np.array.astype.str.replace.replace.replace
    streamlit.empty
    int
    child_lines.append
    self.backend.View.apply
    stack.append
    minitorch.tensor.sigmoid
    steps.append
    minitorch.operators.prod
    streamlit.empty.progress
    self.dropout
    MyModule.named_parameters
    embeds.permute
    model.forward
    model.train
    col3.number_input
    plot_out
    self._ensure_tensor
    y.shape.prob.log.sum.view
    math.exp
    tensor_ops.zip
    id_map
    model.eval
    hypothesis.settings.load_profile
    HIDDEN.TensorTrain.train
    Network
    self.linear
    col1.number_input
    datasets.xor
    minitorch.tensor.type_
    y.append
    datasets.load_dataset
    x.self.linear2.sigmoid
    get_image_id
    self.backend.Sigmoid.apply
    neg_map
    SentenceSentimentTrain.train
    self.backend.Permute.apply
    make_tensor_backend
    show_tensor.tensor_figure
    tensor_ops.matrix_multiply
    st_select_index
    tensor_ops.map
    self.indices
    streamlit.markdown
    criterion
    minitorch.avgpool2d
    float
    embeddings.GloveEmbedding
    History
    hypothesis.strategies.integers
    get_train
    st_eval_error_message.reshape
    add_reduce
    plotly.graph_objects.Figure.update_layout
    make_mnist
    select_fn.keys
    probs.log.sum.reshape
    make_scatters
    minitorch.is_constant
    streamlit.markdown.markdown
    streamlit.beta_columns
    draw.permute
    prob.log
    b.a.is_close.all
    y.shape.loss.sum.view.backward
    grad_weight.permute.tuple
    random.shuffle
    tensor_conv2d
    st_eval_error_message
    model.out.to_numpy
    vals1.f.sum
    a._new
    df.append
    minitorch.Tensor.make
    X.torch.tensor.self.model.forward.detach
    Tensor
    self.zeros
    graph_builder.GraphBuilder.run
    streamlit.number_input
    numpy.array
    interface.streamlit_utils.get_img_tag
    minitorch.rand.sum
    mat.transpose
    sorted
    tensor_zip
    data.X.torch.tensor.model.forward.view
    box_adder
    self.bias.value.view
    TrainCls
    plot
    out.view.view
    transpose
    treduce
    add_one_box
    plot_tensor.add_annotation
    weight.permute
    numba.cuda.is_available
    streamlit.sidebar.markdown
    f.backward
    ob.append
    b.append
    tuple
    dict
    range.append
    n.__dict__.items
    ys.append
    streamlit.write
    streamlit.slider
    TorchTrain
    criterion.backward
    scalar
    numpy.random.RandomState.rand
    weight.permute.tuple
    abs
    streamlit.empty.dataframe
    pandas.DataFrame
    globals
    run_sentiment.encode_sentiment_data
    super
    torch.cat
    minitorch.rand.type_
    get_dataset
    W.H.BATCH.x.view.model.forward.view
    make_oned
    numba.njit
    h.self.layer3.forward.sigmoid
    plot_tensor.show
    b.a.is_close.all.item
    self.__dict__.values
    self.bias.append
    GraphBuilder.run
    GraphBuilder.run.add_edge
    load_data
    self._tensor.to_string
    pred.y.sum
    streamlit.subheader
    tensor_data.TensorData.shape_broadcast
    streamlit.cache
    random.random
    streamlit.plotly_chart
    hypothesis.settings.register_profile
    a.contiguous.view
    streamlit.set_page_config
    Mul.apply
    a._tensor.permute
    GPUBackend.FastTensorBackend.args.BACKEND.HIDDEN.FastTrain.train
    predictions_dataframe
    pandas.DataFrame.apply
    int.i.i.i.np.array.astype.str.replace
    tzip
    streamlit.selectbox
    tensor_conv1d
    minitorch.MathTest._tests
    plotly.graph_objects.Figure.add_trace
    torch.nn.Parameter
    a.append
    join
    numba.cuda.to_device
    input.permute.tuple
    plotly.graph_objects.Layout
    torch.nn.BCELoss
    hypothesis.strategies.lists
    GraphBuilder
    torch.tensor.max
    datasets_map.keys
    minitorch.fast_ops.tensor_map
    minitorch.Conv2dFun.apply
    in_size.batch.x.view.self.out_size.in_size.self.weights.value.view.sum
    name.model.graph.plot_out.show
    val_x.numpy.array.reshape
    input.tuple
    streamlit.text_area
    st_visualize_tensor
    loss.sum.view
    y.shape.loss.sum
    min
    TrainCls.train
    a.is_close
    minitorch.index_to_position
    visdom.Visdom.images
    Network2
    self._tensor.to_cuda_
    self.zero_derivative_
    self.layer1.forward
    fn
    self.backend.Exp.apply
    to_index
    PAGES.keys
    plot_tensor
    torch.nn.Conv1d
    self.forward
    model.squeeze
    dim.cols.write
    argparse.ArgumentParser.add_argument
    tensor_matrix_multiply
    y
    self.out_size.in_size.self.weights.value.view.in_size.batch.x.view.view
    b.contiguous.view.tuple
    minitorch.to_index
    ScalarTrain
    streamlit.beta_expander
    streamlit.table
    y.shape.loss.sum.view
    Network2.parameters
    map
    minitorch.Tensor
    y.shape.prob.log.sum
    construct_tensor
    x.x.map.list.np.array.ravel
    zip
    numpy.hstack
    plotly.graph_objects.Figure
    self.sum
    argparse.ArgumentParser.parse_args
    Conv2d
    visdom.Visdom
    self.backend.Copy.apply
    col2.button
    a.tuple
    torch.nn.Dropout
    col2.text_input
    tensor.Tensor
    run_mnist_multiclass.ImageTrain
    self.backend.All.apply
    self.weights.append
    project.run_sentiment.encode_sentiment_data
    streamlit.progress.progress
    out.tuple
    torch.nn.ModuleList
    out.view.to_cuda_
    threadsperblock.blockspergrid.tensor_matrix_multiply
    graph_builder.build_tensor_expression
    cur.is_leaf
    set
    y2.out.get_data.sum
    make_pts
    random.randint
    streamlit.button
    predictions_array.append
    tensor.Tensor.make.requires_grad_
    self.get
    X.append
    graph_builder.build_expression
    bool
    minitorch.datasets
    minitorch.fast_ops.tensor_map.parallel_diagnostics
    self.backend._id_map
    index_to_position
    load_glue_dataset
    show_expression_interface.render_show_expression
    self.is_leaf
    enumerate
    vals2.f.sum
    grad_weight.permute.permute
    train
    max
    conv
    TrainCls.run_one
    model.parameters
    exec
    minitorch.fast_ops.tensor_zip
    str
    minitorch.Parameter
    p.grad.zero_
    tmm.parallel_diagnostics
    torch.nn.Sigmoid
    MMLinear
    a.contiguous.view.contiguous
    zero
    y.copy
    h.self.layer2.forward.relu
    y.shape.prob.sum
    project.interface.streamlit_utils.render_function
    tensor_ops.reduce
    flatten
    torch.nn.Linear
    plotly.express.imshow
    ImageTrain.train
    backpropagate
    list
    graph_builder.GraphBuilder
    HIDDEN.ScalarTrain.train
    raw_vals.append
    project.run_sentiment.CNNSentimentKim
    coords.append
    losses.append
    self._tensor.set
    grad_output.zeros
    st_visualize_storage
    p.update
    super.backward
    embeddings_lookup.emb
    X.self.model.forward.view
    plot_tensor.update_layout
    self.layer2.forward
    B.A.sum.backward
    G.nx.nx_pydot.to_pydot.to_string
    end.self.layer3.forward.sigmoid
    minitorch.operators.sigmoid
    draw
    minitorch.tensor.view
    self.contiguous
    tensor.permute
    self.backend.Log.apply
    math.cos
    out.view.tuple
    in_size.batch.x.view.self.out_size.in_size.self.weights.value.view.sum.view
    streamlit.sidebar.radio
    super.__init__
    oa.append
    self.backend.MatMul.apply
    grad_output.permute.tuple
    tensor_interface.render_tensor_sandbox
    Conv1d
    x.self.conv1.relu
    math_interface.render_math_sandbox
    streamlit.checkbox
    input.zeros
    self.index
    Ze.extend
    n_cols.idx.cols.number_input
    Xs.append
    central_difference
    Tensor.make._type_
    x
    streamlit.text_input
    self.conv3
    model.zero_grad
    x.x1.map.list.np.array.ravel
    unwrap_tuple
    model
    Tensor.make
    FastTrain
    run_mnist_multiclass.make_mnist
    probs.log.sum.view
    i.i.i.np.array.astype
    tensor_reduce
    plot_tensor.add_trace
    self.model.parameters
    Parameter
    self.backend.Mul.apply
    self._tensor.tuple
    x.requires_grad_
    X.self.model.forward.view.get_data
    log_fn
    weights.append
    self.add_parameter
    minitorch.SGD.zero_grad
    x.self.conv3.relu
    grad_output.tuple
    _tensor
    data.N.loss.backward
    torch.rand
    max_reduce
    interface.train.render_train_interface
    validation_accuracy.append
    minitorch.SGD
    self.backend.Sum.apply
    model_output.to_numpy
    minitorch.MathTestVariable._tests
    self.fc
    shapes
    torch.tensor
    data.N.loss.sum.view.backward
    zero._type_
    minitorch.matmul
    Linear
    minitorch.make_tensor_backend
    inspect.getsource
    page
    self.out_size.in_size.self.weights.value.view.in_size.batch.x.view.minitorch.matmul.view
    autodiff.History
    loss.reshape.item
    queue.append
    a.contiguous
    tmm
    MyModule
    streamlit.empty.plotly_chart
    repr
    x.self.layer1.forward.relu
    threadsperblock.blockspergrid.jit_sum_practice
    operators.log
    x.zero_grad_
    isinstance
    tensor_data
    scalar.backward
    b.tuple
    self.backend.IsClose.apply
    threadsperblock.blockspergrid.jit_mm_practice
    data.N.loss.sum
    f.sum
    self.conv2
    numpy.ones
    len
    a.zeros.tuple
    self.value.requires_grad_
    time.time
    set.add
    a.contiguous.view.tuple
    datasets.split
    fast_ops.FastOps.reduce
    col1.selectbox.datasets_map
    networkx.nx_pydot.to_pydot
    mnist.MNIST.load_training
    self.linear1
    numpy.round
    scalars
    minitorch.tensor.requires_grad_
    operators.log_back
    self.backend.EQ.apply
    probs.log.sum
    b.contiguous.view.contiguous
    plotly.graph_objects.Contour
    x.sigmoid.view
    val_ys.append
    torch.nn.functional.relu
    minitorch.zeros
    streamlit.empty.markdown
    inv_back_zip
    networkx.MultiDiGraph.add_node
    numba.cuda.is_cuda_array
    values.append
    numba.cuda.jit
    print
    minitorch.fast_ops.tensor_zip.parallel_diagnostics
    minitorch.make_tensor_functions
    a.contiguous.view.zeros
    minitorch.max
    hasattr
    BATCH.x.view.model.forward.view
    strides_from_shape
    operators.prod
    argparse.ArgumentParser
    GraphBuilder.run.add_node
    self.layer3.forward
    Ye.extend
    add_zip
    y.out.sum
    streamlit.text
    self.backend._add_reduce
    minitorch.operators.is_close
    minitorch.rand
    minitorch.prod
    G.nx.nx_pydot.to_pydot.create_svg
    make_pts.append
    reversed
    plotly.graph_objects.Mesh3d
    a._tensor.is_contiguous
    sentence.split
    TrainCls.run_many
    join.pop
    Inv.apply
    int.i.i.i.np.array.astype.str.replace.replace
    out.sum.backward
    val_losses.append
    math.sin
    get_predictions_array
    torch.optim.Adam.step
    s_.split
    numpy.random.RandomState
    probs.log
    input.zeros.tuple
    self.backend.Add.apply
    col1.empty.button
    criterion.item
    tensor_map
    minitorch.conv2d
    prob.log.sum
    minitorch.SGD.step
    self.sig
    interface.plots.plot_out
    ImageTrain
    streamlit.error
    self.linear2
    weight.tuple
    train.model.named_parameters
    model.mid.to_numpy
    cls.data
    streamlit.selectbox.select_fn
    SentimentCNN
    torch.optim.Adam
    input.permute
    self._tensor.get
    mnist.MNIST
    Xe.extend
    NotImplementedError
    encode_sentiment_data
    encode_sentences
    interface.streamlit_utils.render_function
    HIDDEN.TorchTrain.train
    type
    make_scatters.append
    a.zeros
    list.append
    y.shape.prob.sum.view
    i.self.weights.append
    col1.selectbox
    self.conv1
    minitorch.Scalar
    grad_central_difference
    construct_whole_box
    show_tensor.tensor_figure.update_layout
    streamlit.header
    Graph
    run_sentiment.SentenceSentimentTrain
    plotly.graph_objects.Figure.show
    format
    hypothesis.strategies.permutations
    self._modules.items
    x.view
    data.N.loss.sum.view
    datasets.simple
    streamlit.progress
    tensor.Tensor.make
    self.contiguous._tensor._storage.reshape
    list.reverse
    streamlit.warning
    IndexingError
    plotly.graph_objects.Scatter
    loss.sum.view.backward
    col2.number_input
    train_accuracy.append
    minitorch.tensor
    get_accuracy
    minitorch.dropout
    b.contiguous.view
    shape_broadcast
    streamlit.graphviz_chart
    tensor_data.TensorData
    networkx.MultiDiGraph
    zeros
    layout.append
    BATCH.x.view.self.linear1.relu
    h.relu
    setuptools.setup
    minitorch.conv1d
    self.weights.value.view
    math.log
    inv_map
    plotly.graph_objects.Surface
    f
    tmap
    self.backend.Inv.apply
    x._tensor.sample
    v.get_data
    x.self.conv2.relu
    hypothesis.strategies.floats
    self.get_name
    pred.y.sum.item
    _addindent
    

    @developer Could you please help me check this issue? May I open a pull request to fix it? Thank you very much.

    opened by PyDeps 0
  • Installing numba using pip in a conda environment throws a segmentation fault

    The numba build breaks when numba is installed with pip install numba inside a conda environment. This is a known bug in numba: https://github.com/numba/numba/issues/4515

    The setup instructions should mention that, when using a conda environment, numba must be re-installed with conda.

    opened by sshkhr 0