DNN Training, from theory to practice
This repository complements the deep learning training lesson given at Mines ParisTech on March 11th, 2022, as part of the Large Scale Machine Learning class.
You can find the slides of the class here.
Requirements
To get started, clone this repository and prepare a new virtual environment:
git clone https://github.com/adefossez/dnn_theo_practice
cd dnn_theo_practice
python3 -m venv env
source env/bin/activate
python3 -m pip install -r requirements.txt
Note: it can be safer to install PyTorch through a conda environment, to make sure the proper versions of the CUDA-related libraries are installed and used. We use pip here for simplicity.
Basic training pipeline
To get started, you can run
python -m basic.train
You can tweak some hyper-parameters:
python -m basic.train --lr 0.1 --epochs 30 --model mobilenet_v2
This basic pipeline provides all the essential tools for training a neural network:
- automatic experiment naming,
- logging and metric dumping,
- checkpointing with automatic resume.
Looking at basic/train.py, you will see that 90% of the code is not deep learning but pure engineering. Frameworks like PyTorch Lightning can save you some of this trouble, at the cost of losing some control over, and understanding of, what happens. In any case, it is good to have an idea of how things work under the hood!
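As an illustration, here is a minimal sketch of the kind of checkpointing-with-automatic-resume logic such a pipeline needs. The file name, model and training loop below are hypothetical placeholders, not the actual code of basic/train.py:

```python
# Minimal sketch of checkpointing with automatic resume.
# `checkpoint.pt` and the model/optimizer below are placeholders.
from pathlib import Path

import torch
from torch import nn

checkpoint_path = Path("checkpoint.pt")
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
start_epoch = 0

# Resume automatically if a checkpoint exists.
if checkpoint_path.exists():
    state = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_epoch = state["epoch"]

for epoch in range(start_epoch, 10):
    # ... run one epoch of training here ...
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "epoch": epoch + 1,
    }, checkpoint_path)
```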
PyTorch-Lightning training pipeline
Inside the pl_hydra folder, I provide the same pipeline, but using PyTorch-Lightning along with Hydra as an alternative to argparse. Have a look at pl_hydra/train.py to see the differences with the previous implementation.
python -m pl_hydra.train optim.lr=0.1 model=mobilenet_v2
Using existing frameworks
At this point, it is a good time to introduce a few frameworks you might want to use for your projects.
Hydra
Hydra handles things like logging, configuration parsing (based on YAML files, which is a bit nicer than argparse, especially for large projects), and also has support for grid search scheduling with a dedicated language. It also supports meta-optimizers like Nevergrad (see below).
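For instance, a Hydra entry point can look roughly like this; the conf/config.yaml file and its optim.lr entry are assumptions made for the sake of the example:

```python
# Rough sketch of a Hydra entry point; assumes a `conf/config.yaml`
# containing an `optim: {lr: ...}` section.
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="conf", config_name="config")
def main(cfg: DictConfig):
    # Any value can be overridden from the command line,
    # e.g. `python train.py optim.lr=0.1`.
    print(cfg.optim.lr)


if __name__ == "__main__":
    main()
```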
Nevergrad
Nevergrad is a framework for gradient-free optimization. It can be used to automatically tune your model or optimization hyper-parameters with smart random search.
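As a rough illustration, a hyper-parameter search with Nevergrad can look like this; train_and_eval is a hypothetical stand-in for an actual training run:

```python
# Sketch of gradient-free hyper-parameter search with Nevergrad.
import nevergrad as ng


def train_and_eval(lr: float, momentum: float) -> float:
    # Hypothetical placeholder: train a model and return a validation loss.
    return (lr - 0.05) ** 2 + (momentum - 0.9) ** 2


parametrization = ng.p.Instrumentation(
    lr=ng.p.Log(lower=1e-4, upper=1.0),
    momentum=ng.p.Scalar(lower=0.0, upper=1.0),
)
optimizer = ng.optimizers.NGOpt(parametrization=parametrization, budget=50)
recommendation = optimizer.minimize(train_and_eval)
print(recommendation.kwargs)  # best hyper-parameters found within the budget
```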
PyTorch-Lightning
PyTorch Lightning takes care of logging, distributed training, checkpointing and many more boilerplate parts of a deep learning research project. It is powerful but also quite complex, and you will lose some control over the training pipeline.
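To give a flavor, a LightningModule boils down to something like the sketch below; the linear model and hyper-parameters are placeholders, not the pipeline of this repository:

```python
# Minimal PyTorch-Lightning sketch; the model and data are placeholders.
import pytorch_lightning as pl
import torch
from torch import nn
from torch.nn import functional as F


class LitClassifier(pl.LightningModule):
    def __init__(self, lr: float = 0.1):
        super().__init__()
        self.model = nn.Linear(28 * 28, 10)  # placeholder model
        self.lr = lr

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.model(x.flatten(1)), y)
        self.log("train_loss", loss)  # logging is handled for you
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.lr)


# Checkpointing, distributed training, etc. are then handled by the Trainer:
# trainer = pl.Trainer(max_epochs=10)
# trainer.fit(LitClassifier(), train_dataloaders=my_dataloader)
```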
Dora
Dora is an experiment management framework:
- Grid searches are expressed as pure Python.
- Each experiment is assigned an automatic signature based on its arguments.
- Experiments defined in grid files are kept in sync with those running on the cluster.
- Basic terminal-based reporting of job states, metrics, etc.
Dora allows you to scale up to hundreds of experiments without losing your sanity.
Plotting and monitoring utilities
While it is always good to have basic metric reporting inside logs, it can be more convenient to track experimental progress through a web browser. TensorBoard, initially developed for TensorFlow, provides just that. A fully hosted alternative is Wandb. Finally, HiPlot is a lightweight package to easily make sense of the impact of hyper-parameters on the metrics of interest.
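For example, logging scalars to TensorBoard from PyTorch only takes a few lines; the log directory and metric values below are made up:

```python
# Sketch of metric logging with TensorBoard's PyTorch writer.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/my_experiment")  # hypothetical log dir
for step in range(100):
    fake_loss = 1.0 / (step + 1)  # placeholder metric
    writer.add_scalar("train/loss", fake_loss, step)
writer.close()
# Then run `tensorboard --logdir runs` and open the printed URL in a browser.
```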
Unix tools
It is a good idea to learn to master the standard Unix/Linux tools! For large scale machine learning, you will often have to run experiments on a remote cluster, with only SSH access. tmux is a must have, as well as knowing at least one terminal-based file editor (nano is the simplest, emacs or vim are more complex but quite powerful). Take some time to learn about tuning your bashrc, setting up aliases for often used commands, etc.
You will probably need tools like grep, less, find or ack. I personally really enjoy fd, an alternative to find with a more intuitive interface. Similarly, ag is a nice way to quickly look through a codebase in the terminal. If you need to go through a lot of logs, you will enjoy ripgrep.
License
The code in this repository is released into the public domain. You can freely reuse any part of it, and you don't even need to say where you found it! See the LICENSE file for more information.
The slides are released under Creative Commons CC-BY-NC.