Accelerate Neural Net Training by Progressively Freezing Layers

Andy Brock

Last update: Jun 19, 2022

Related tags

Deep Learning machine-learning deep-learning memes pytorch neural-networks densenet vgg16 wide-residual-networks

Overview

FreezeOut

A simple technique to accelerate neural net training by progressively freezing layers.

This repository contains code for the extended abstract "FreezeOut."

FreezeOut directly accelerates training by annealing layer-wise learning rates to zero on a set schedule, and excluding layers from the backward pass once their learning rate bottoms out.
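
A minimal, framework-free sketch of the two mechanisms in the paragraph above: per-layer annealing to zero, and dropping frozen layers from the backward pass. It assumes each layer i has a freeze point t_i, given as a fraction of total training. It also assumes a cosine-shaped anneal, which matches the (1+np.cos(...)) term quoted in the issue comments further down. The function names are hypothetical and not taken from the repository. In a PyTorch model, one way to keep the frozen prefix out of the backward pass is to call .detach() on the activations leaving the last frozen layer. That is an assumption about implementation, not a description of this repo's code.

```python
import math
from typing import Sequence


def layer_lr(initial_lr: float, t: float, t_i: float) -> float:
    """Learning rate of one layer at training progress t (a fraction, 0..1).

    Cosine-anneals from initial_lr at t = 0 down to zero at the layer's
    freeze point t_i; from t_i onward the layer is frozen.
    """
    if t >= t_i:
        return 0.0  # frozen: this layer receives no further updates
    return 0.5 * initial_lr * (1.0 + math.cos(math.pi * t / t_i))


def first_active_layer(t: float, freeze_points: Sequence[float]) -> int:
    """Index of the shallowest layer still training at progress t.

    Freeze points are assumed to increase with depth, so every layer
    before this index is frozen and the backward pass can stop here.
    """
    for i, t_i in enumerate(freeze_points):
        if t < t_i:
            return i
    return len(freeze_points)  # every layer frozen (end of training)


if __name__ == "__main__":
    freeze_points = [0.5, 0.75, 1.0]  # illustrative values, three layers
    for t in (0.0, 0.25, 0.6, 0.9):
        lrs = [round(layer_lr(0.1, t, t_i), 4) for t_i in freeze_points]
        print(f"t={t:.2f}  lrs={lrs}  backward stops at layer "
              f"{first_active_layer(t, freeze_points)}")
```

For example, at t = 0.6 the first layer (t_i = 0.5) is frozen with a learning rate of 0, so backpropagation only needs to reach layer 1.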

I had this idea while replying to a Reddit comment at 4 AM. I threw it in an experiment, and it just worked out of the box (with linear scaling and t_0=0.5), so I went on a 96-hour SCIENCE binge, and now, here we are.

The exact speedup you get depends on how much error you can tolerate: higher speedups appear to come at the cost of increased error, but speedups below 20% should stay within a 3% relative error envelope, and speedups around 10% seem to incur no error cost for the Scaled Cubic and Unscaled Linear strategies.

Installation

To run this script, you will need PyTorch and a CUDA-capable GPU. If you wish to run it on CPU, just remove all the .cuda() calls.

Running

To run with default parameters, simply call

python train.py

By default, this will download CIFAR-100, split it into training, validation, and test sets, then train a k=12, L=76 DenseNet-BC using SGD with Nesterov momentum.

The script accepts command-line arguments for a variety of parameters; the FreezeOut-specific ones are:

how_scale selects which annealing strategy to use, among linear, squared, and cubic. Cubic by default.
scale_lr determines whether to scale initial learning rates based on t_i. True by default.
t_0 is a float between 0 and 1 that decides how far into training to freeze the first layer. 0.8 by default, given pre-cubed (with the default cubic strategy, the first layer freezes at 0.8^3 = 0.512 of training).
const_time is an experimental setting that increases the number of epochs based on the estimated speedup, in order to match the total training time of a non-FreezeOut baseline. I have not validated whether this is worthwhile.
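
The sketch below shows one way how_scale, scale_lr, and t_0 could combine into per-layer freeze points and initial learning rates. It assumes, rather than quotes from the code, that each layer's unscaled freeze time is spaced linearly from t_0 (first layer) to 1 (last layer) and then passed through the how_scale function, which is consistent with t_0 being described as pre-cubed. The base_lr=0.1 default and the function name are illustrative. One way to read scale_lr: under a cosine anneal, the area under layer i's schedule from 0 to t_i is α_i(0)·t_i/2, so dividing the initial rate by t_i gives every layer the same total of base_lr/2, however early it freezes.

```python
from typing import Callable, Dict, List, Tuple

# Annealing strategies selectable via how_scale (names from the README).
SCALE_FNS: Dict[str, Callable[[float], float]] = {
    "linear": lambda x: x,
    "squared": lambda x: x ** 2,
    "cubic": lambda x: x ** 3,
}


def freeze_schedule(
    n_layers: int,
    t_0: float = 0.8,
    how_scale: str = "cubic",
    scale_lr: bool = True,
    base_lr: float = 0.1,
) -> List[Tuple[float, float]]:
    """Return (t_i, initial_lr_i) for each layer, shallowest first."""
    if n_layers < 1:
        raise ValueError("need at least one layer")
    if not 0.0 < t_0 <= 1.0:
        raise ValueError("t_0 must be in (0, 1]")
    scale = SCALE_FNS[how_scale]
    schedule = []
    for i in range(n_layers):
        # Linear spacing from t_0 (first layer) to 1.0 (last layer)...
        frac = i / (n_layers - 1) if n_layers > 1 else 1.0
        # ...then warped by the chosen strategy, so t_0 is "pre-scaled".
        t_i = scale(t_0 + (1.0 - t_0) * frac)
        # Optionally raise the rate of layers that train for less time.
        lr_i = base_lr / t_i if scale_lr else base_lr
        schedule.append((t_i, lr_i))
    return schedule
```

With the defaults, freeze_schedule(2) gives the first layer t_i of about 0.512 and an initial rate of about 0.195, while the last layer keeps t_i = 1.0 and a rate of 0.1.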

You can also set the name of the weights and the metrics log, which model to use, how many epochs to train for, etc.

If you want to calculate an estimated speedup for a given strategy and t_0 value, use the calc_speedup() function in utils.py.
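
For actual estimates, use calc_speedup(). As a rough, hypothetical illustration of where the savings come from, the toy model below is not the formula in utils.py. It assumes every layer costs one unit forward and backward_cost units backward per iteration (2.0 is an assumed ratio). A layer frozen at t_i skips its backward cost for the remaining 1 - t_i of training but still runs forward.

```python
from typing import Sequence


def toy_speedup(freeze_points: Sequence[float], backward_cost: float = 2.0) -> float:
    """Fraction of training compute saved under a simple cost model.

    Hypothetical model (not utils.calc_speedup): each layer costs 1 unit
    forward and `backward_cost` units backward per iteration. A layer
    frozen at fraction t_i of training pays its backward cost only for
    that fraction, but always pays its forward cost.
    """
    n = len(freeze_points)
    if n == 0:
        raise ValueError("need at least one layer")
    baseline = n * (1.0 + backward_cost)
    freezeout = n * 1.0 + backward_cost * sum(freeze_points)
    return 1.0 - freezeout / baseline


if __name__ == "__main__":
    # Linear spacing from t_0 = 0.5 over three layers (illustrative).
    print(f"{toy_speedup([0.5, 0.75, 1.0]):.1%}")  # 16.7%
```

Under the same toy model, matching a baseline's wall-clock budget (the goal of const_time) would mean scaling the epoch count by roughly 1 / (1 - speedup). How const_time actually computes this is not documented here.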

Notes

If you know how to implement this in a static-graph framework (specifically TensorFlow or Caffe2), shoot me an email! It's really easy to do with dynamic graphs, but I believe it to be possible with some simple conditionals in a static graph.

There's (at least) one typo in the paper where it defines the learning rate schedule: there should be a 1/2 in front of alpha.
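
With the 1/2 restored, the schedule takes the cosine-annealing form below. The form agrees with the (1+np.cos(np.pi*self.j/m.max_j)) term quoted in the issue comments. The notation is assumed rather than copied from the paper: α_i(0) is layer i's initial learning rate, t the training iteration, and t_i the iteration at which layer i freezes. The factor is what makes the rate start at exactly α_i(0); without it, layer i would start at 2α_i(0).

```latex
\alpha_i(t) = \frac{1}{2}\,\alpha_i(0)\left(1 + \cos\!\left(\frac{\pi t}{t_i}\right)\right),
\qquad 0 \le t \le t_i,
\qquad \alpha_i(0) \to \alpha_i(0),\ \ \alpha_i(t_i) = 0
```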

Acknowledgments

DenseNet code stolen in a daring midnight heist from Brandon Amos: https://github.com/bamos/densenet.pytorch
Training and Progress code acquired in a drunken game of SpearPong with Jan Schlüter: https://github.com/Lasagne/Recipes/tree/master/papers/densenet
Metrics Logging code extracted from ancient diary of Daniel Maturana: https://github.com/dimatura/voxnet
WideResNet code summoned using an incantation from Xternalz: https://github.com/xternalz/WideResNet-pytorch

Comments

Just a note.

Not a bad idea at all. Don't you think the TensorFlow / PyTorch community would like to improve it with your idea? These days, if you can compress video 3% more, it has a huge impact on data storage centers. If you can train an AI 10% faster... boy, that's huge. Think of the impact on data centers and training time. You should draw some attention from NVIDIA / Intel.

opened by PGTBoos
Error in densenet lr update formula:

The 0.05 in the LR update formula overrides the per-layer initial LR (the 1e-1). The line that initializes the per-layer LR with the scale_lr option, m.lr = 1e-1 / m.lr_ratio if self.scale_lr else 1e-1, is overridden by the formula.

The formula with the change: self.optim.param_groups[i]['lr'] = (m.lr)*(1+np.cos(np.pi*self.j/m.max_j))

opened by nivdu
fashionMNIST results

This code is linked from the fashion-mnist repo, with very good results. Do you have a script somewhere I could use to reproduce those numbers?

Thanks, Ben

opened by bkj
error

Hi, I'm sorry to bother you again, but there are still some errors, like this:

File "train.py", line 283, in main: train_test(**vars(args))
FileNotFoundError: [Errno 2] No such file or directory: Path('logs/densenet_k12L76_ice80_cubicTrue_seed0_epochs100c100_log.jsonl')

Looking forward to your reply.

opened by jingdonglin
