Implements pytorch code for the Accelerated SGD algorithm.

Last update: Jan 2, 2023

Related tags

Deep Learning AccSGD

Overview

AccSGD

This is the code associated with Accelerated SGD algorithm used in the paper On the insufficiency of existing momentum schemes for Stochastic Optimization, selected to appear at ICLR 2018.

Usage:

The code can be downloaded and placed in a given local directory. In a manner similar to using any usual optimizer from the pytorch toolkit, it is also possible to use the AccSGD optimizer with little effort. First, we require importing the optimizer through the following command:

from AccSGD import *

Next, an ASGD optimizer working with a given pytorch model can be invoked using the following command:

optimizer = AccSGD(model.parameters(), lr=0.1, kappa = 1000.0, xi = 10.0)

where, lr is the learning rate, kappa the long step parameter and xi is the statistical advantage parameter.

Guidelines on setting parameters/debugging:

The learning rate lr: lr is set in a manner similar to schemes such as vanilla Stochastic Gradient Descent (SGD)/Standard Momentum (Heavy Ball)/Nesterov's Acceleration. Note that lr is a function of batch size - a rigorous quantification of this phenomenon can be found in the following paper. Such a characterization has been observed in several empirical works.

Long Step kappa: As the networks grow deeper (e.g. with resnets) and when dealing with typically harder datasets such as CIFAR/ImageNet, employing kappa to be 10^4 or more helps. For shallow nets and easier datasets such as MNIST, a typical value of kappa can be set as 10^3 or even 10^2.

Statistical Advantage Parameter xi: xi lies between 1.0 and sqrt(kappa). When large batch sizes (nearly matching batch gradient descent) are used, it is advisable to use xi that is closer to sqrt(kappa). In general, as the batch size increases by a factor of k, increase xi by sqrt(k).

Effective ways to debug:

For Nets with ReLU/ELU type activations:

(--1--) Slower convergence: There are three reasons for this to happen:

This could be a result of setting the learning rate too low (similar to SGD/vanilla momentum/Nesterov's acceleration).
This could be as a result of setting kappa to be too high.
The other reason could be that xi has been set to a small value and needs to be increased.

(--2--) Oscillatory behavior/Divergence: There are two reasons for this to happen:

This could be a result of setting the learning rate to be too high (similar to SGD/vanilla momentum/Nesterov's acceleration).
The other reason is that xi has been set to a large value and needs to be decreased.

For nets with Sigmoid activations:

Slower convergence after an initial rapid decrease in error: This is a sign of an over aggressive setting of parameters and must be treated in a similar manner as the oscillatory/divergence behavior (--2--) encountered in the ReLU/ELU activation case.

Slow convergence right from the start: This is more likely related to slower convergence (--1--) encountered in the ReLU/ELU case.

Citation:

If AccSGD is used in your paper/experiments, please cite the following papers.

@inproceedings{Kidambi2018Insufficiency,
  title={On the insufficiency of existing momentum schemes for Stochastic Optimization},
  author={Kidambi, Rahul and Netrapalli, Praneeth and Jain, Prateek and Kakade, Sham},
  booktitle={International Conference on Learning Representations},
  year={2018}
}

@Article{Jain2017Accelerating,
  title={Accelerating Stochastic Gradient Descent},
  author={Jain, Prateek and Kakade, Sham and Kidambi, Rahul and Netrapalli, Praneeth and Sidford, Aaron},
  journal={CoRR},
  volume = {abs/1704.08227},
  year={2017}
}

You might also like...

3D ResNet Video Classification accelerated by TensorRT

Activity Recognition TensorRT Perform video classification using 3D ResNets trained on Kinetics-400 dataset and accelerated with TensorRT P.S Click on

39 Nov 21, 2022

Hardware accelerated, batchable and differentiable optimizers in JAX.

JAXopt Installation | Examples | References Hardware accelerated (GPU/TPU), batchable and differentiable optimizers in JAX. Installation JAXopt can be

621 Jan 8, 2023

Fast and scalable uncertainty quantification for neural molecular property prediction, accelerated optimization, and guided virtual screening.

Evidential Deep Learning for Guided Molecular Property Prediction and Discovery Ava Soleimany*, Alexander Amini*, Samuel Goldman*, Daniela Rus, Sangee

75 Dec 15, 2022

meProp: Sparsified Back Propagation for Accelerated Deep Learning

meProp The codes were used for the paper meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting (ICML 2017) [pdf]

107 Nov 18, 2022

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

NVIDIA Merlin NVIDIA Merlin is an open source library designed to accelerate recommender systems on NVIDIA’s GPUs. It enables data scientists, machine

419 Jan 3, 2023

meProp: Sparsified Back Propagation for Accelerated Deep Learning (ICML 2017)

Comments

Make pip installable

Changing the project layout, adding a setup.py file, and adding the project to PyPI might make it easier for the ML community to use AccSGD method in their projects. Should be easy to do and wouldn't mind updating the layout and adding the setup.py file

opened by InzamamRahaman 0

Implements pytorch code for the Accelerated SGD algorithm.

Related tags

Overview

AccSGD

Usage:

Guidelines on setting parameters/debugging:

Citation:

You might also like...

3D ResNet Video Classification accelerated by TensorRT

Hardware accelerated, batchable and differentiable optimizers in JAX.

Fast and scalable uncertainty quantification for neural molecular property prediction, accelerated optimization, and guided virtual screening.

meProp: Sparsified Back Propagation for Accelerated Deep Learning

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

meProp: Sparsified Back Propagation for Accelerated Deep Learning (ICML 2017)

(Py)TOD: Tensor-based Outlier Detection, A General GPU-Accelerated Framework

GPU Accelerated Non-rigid ICP for surface registration

Real-time pose estimation accelerated with NVIDIA TensorRT

Comments

Make pip installable

Owner

NFNets and Adaptive Gradient Clipping for SGD implemented in PyTorch

Keras implementation of Normalizer-Free Networks and SGD - Adaptive Gradient Clipping

auto-tuning momentum SGD optimizer

GPU-accelerated PyTorch implementation of Zero-shot User Intent Detection via Capsule Neural Networks

A mini lib that implements several useful functions binding to PyTorch in C++.

deep-table implements various state-of-the-art deep learning and self-supervised learning algorithms for tabular data using PyTorch.

Reinforcement learning library(framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...

Some code of the implements of Geological Modeling Using 3D Pixel-Adaptive and Deformable Convolutional Neural Network

GPU-Accelerated Deep Learning Library in Python

Accelerated deep learning R&D