Anderson Acceleration for Deep Learning

Overview

Anderson Accelerated Deep Learning (AADL)

AADL is a Python package that implements Anderson acceleration (AA) to speed up the training of deep learning (DL) models built with the PyTorch library.
AA is an extrapolation technique that can accelerate fixed-point iterations, such as those arising from the iterative training of DL models. However, training data are typically processed in sequential random batches, which introduces stochastic oscillations into the fixed-point iteration and hinders AA. AADL implements a moving average that damps these oscillations and yields a smoother sequence of gradient-descent updates, which enables the use of AA. AADL decides automatically whether the moving average is needed by monitoring whether the relative standard deviation between consecutive stochastic gradient updates exceeds a user-defined tolerance.
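
For orientation, a generic (type-II) Anderson update looks roughly like the sketch below. This is illustrative only; the function and variable names are not part of the AADL API.

import torch

def anderson_step(X, R, relaxation=0.5):
    # X: columns hold the last m+1 parameter iterates (flattened vectors)
    # R: columns hold the corresponding residuals g(x) - x
    DX = X[:, 1:] - X[:, :-1]   # differences of consecutive iterates
    DR = R[:, 1:] - R[:, :-1]   # differences of consecutive residuals
    # Least-squares coefficients that best cancel the latest residual
    gamma = torch.linalg.lstsq(DR, R[:, -1:]).solution
    # Extrapolated iterate (type-II Anderson update with relaxation)
    return (X[:, -1:] + relaxation * R[:, -1:] - (DX + relaxation * DR) @ gamma).squeeze(1)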

Requirements

Python 3.5 or greater
PyTorch (any version works)

Installation

AADL comes with a setuptools install script:

python3 setup.py install
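
Alternatively, since the install script is based on setuptools, installing from the repository root with pip should also work:

pip install .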

Usage

import torch
import torch.nn
import torch.optim
import AADL

# Creation of the DL model (neural network)
class Model(torch.nn.Module):
	...

# Instantiate the model
model = Model()

# Definition of the stochastic optimizer used to train the model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, nesterov=True)

# Parameters for Anderson acceleration
relaxation = 0.5
wait_iterations = 0
history_depth = 10
store_each_nth = 10
frequency = store_each_nth
reg_acc = 0.0
safeguard = True
average = True

# Over-write the torch.optim.step() method with the Anderson-accelerated version
AADL.accelerate(optimizer, "anderson", relaxation, wait_iterations, history_depth, store_each_nth, frequency, reg_acc, average)
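
After the call to AADL.accelerate, the optimizer is used exactly as in an ordinary PyTorch training loop, since only its step() method has been replaced. A minimal sketch follows; the loss function, data loader, and number of epochs below are placeholders, not part of AADL.

criterion = torch.nn.MSELoss()            # placeholder loss; use whatever fits the task

for epoch in range(num_epochs):           # num_epochs chosen by the user
    for inputs, targets in data_loader:   # any torch.utils.data.DataLoader
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()                  # runs the Anderson-accelerated step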

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

BSD-3-Clause

Citations

"AADL: Anderson Accelerated Deep Learning", Copyright ID#: 81927550 https://doi.org/10.11578/dc.20210723.1

Comments
  • Conserve gpu memory by storing history on cpu memory instead

    This patch offloads the AADL history to CPU memory instead of using valuable GPU memory.

    This incurs the performance cost of transferring the vectors to and from CPU memory, but it allows training with a smaller reduction in batch size than would otherwise be needed to avoid running out of memory.

    • This can probably be ameliorated by interleaving the memory transfers with the computation.

    This change also fixes a bug with torch.nn.utils.convert_parameters.vector_to_parameters where it does not preserve the memory_format of param.data.

    The history device is configurable, so users who prefer to keep the history in GPU memory can still do so (by calling accelerate(..., history_device="cuda")).

    For reference, without CPU memory offload I get the following error after about 90 iterations:

    RuntimeError: CUDA out of memory. Tried to allocate 2.55 GiB (GPU 0; 24.00 GiB total capacity; 16.60 GiB already allocated; 1.82 GiB free; 19.75 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
    

    With CPU memory offload I am able to run 300+ iterations (at the same batch size as in the failure scenario above).
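
    A rough, hypothetical sketch of the offloading idea (the helper names are illustrative only and this is not the actual patch):

    import torch

    def store_history(history, vec, history_device="cpu"):
        # Keep each stored vector in CPU memory so it does not consume GPU memory
        history.append(vec.detach().to(history_device))

    def gather_history(history, compute_device="cuda"):
        # Move the stored vectors back to the GPU only when the extrapolation is computed
        return torch.stack([h.to(compute_device, non_blocking=True) for h in history], dim=1)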

    opened by henrymai 34
  • Distributed and bug fixes

    This PR is mostly for discussion at this point; please don't merge it for now.

    • Critical changes:

      • added @torch.no_grad() decorators to the accelerated optimization steps in accelerate.py; this is absolutely necessary and was missing
      • fixed the size of gamma in anderson_acceleration.py; this bug caused incorrect broadcasting of vectors in extr = X[:,-2] + DX[:,-1] - (DX[:,:-1]+DR)@gamma
    • Additions:

      • added def distributed_accelerated_step in accelerate.py and a corresponding modification to def accelerate; the def averaged_* functions have not been changed yet but must be updated later
      • added a new CIFAR10_distributed example inspired by ImageNet1k; lines to pay attention to: 32-40, 120-134, 197-209, 258-267

    To run the new example locally: torchrun --standalone --nnodes=1 --nproc_per_node=10 main.py

    bug 
    opened by vreshniak 3
  • Typo in anderson_acceleration.py ?

    https://github.com/ORNL/AADL/blob/59c19f7c221ebda434359ddd7582399b925a3722/AADL/anderson_acceleration.py#L19

    File "[...] anderson_acceleration.py", line 19, in anderson_qr_factorization
      gamma = torch.linalg.lstsq(DR, R[:, -1]).solution
    NameError: name 'R' is not defined
    

    For that line, should R be DR instead?

    opened by henrymai 3
  • Remove `__pycache__` and other redundant files

    The changes proposed by this PR can be summarized as follows:

    • Remove autogenerated and redundant __pycache__ files (which serve no purpose).
    • Remove the .DS_Store file, which I guess was saved by mistake.
    opened by SauravMaheshkar 1
  • [Feature Request] Add `requirements.txt`

    The repository currently lacks a requirements.txt file. It is not possible to run the examples because they require additional packages such as docopt.

    Having a requirements.txt file would make reproducibility and the onboarding process much easier.

    opened by SauravMaheshkar 1
Owner

Oak Ridge National Laboratory