An optimizer that trains as fast as Adam and as good as SGD.



PyPI - Version PyPI - Python Version PyPI - Wheel GitHub - LICENSE

An optimizer that trains as fast as Adam and as good as SGD, for developing state-of-the-art deep learning models on a wide variety of popular tasks in the field of CV, NLP, and etc.

Based on Luo et al. (2019). Adaptive Gradient Methods with Dynamic Bound of Learning Rate. In Proc. of ICLR 2019.

Quick Links


AdaBound requires Python 3.6.0 or later. We currently provide PyTorch version and AdaBound for TensorFlow is coming soon.

Installing via pip

The preferred way to install AdaBound is via pip with a virtual environment. Just run

pip install adabound

in your Python environment and you are ready to go!

Using source code

As AdaBound is a Python class with only 100+ lines, an alternative way is directly downloading and copying it to your project.


You can use AdaBound just like any other PyTorch optimizers.

optimizer = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)

As described in the paper, AdaBound is an optimizer that behaves like Adam at the beginning of training, and gradually transforms to SGD at the end. The final_lr parameter indicates AdaBound would transforms to an SGD with this learning rate. In common cases, a default final learning rate of 0.1 can achieve relatively good and stable results on unseen data. It is not very sensitive to its hyperparameters. See Appendix G of the paper for more details.

Despite of its robust performance, we still have to state that, there is no silver bullet. It does not mean that you will be free from tuning hyperparameters once using AdaBound. The performance of a model depends on so many things including the task, the model structure, the distribution of data, and etc. You still need to decide what hyperparameters to use based on your specific situation, but you may probably use much less time than before!


Thanks to the awesome work by the GitHub team and the Jupyter team, the Jupyter notebook (.ipynb) files can render directly on GitHub. We provide several notebooks (like this one) for better visualization. We hope to illustrate the robust performance of AdaBound through these examples.

For the full list of demos, please refer to this page.


If you use AdaBound in your research, please cite Adaptive Gradient Methods with Dynamic Bound of Learning Rate.

  author = {Luo, Liangchen and Xiong, Yuanhao and Liu, Yan and Sun, Xu},
  title = {Adaptive Gradient Methods with Dynamic Bound of Learning Rate},
  booktitle = {Proceedings of the 7th International Conference on Learning Representations},
  month = {May},
  year = {2019},
  address = {New Orleans, Louisiana}




Apache 2.0

  • What is up with Epoch 150

    What is up with Epoch 150

    I'm wondering what is happening at epoch 150 in all visualizations? I would like to introduce that into all my models ;-)

    opened by kootenpv 8
  • AdaBoundW


    An AdaBound version with decoupled weight decay, which has been implemented to the code as an additional class, as it has been discussed in the recent issue #13.

    opened by kayuksel 3
  • Question about the code

    Question about the code

    IIRC, because group['lr'] will never be changed, so finalr_lr will always be the same as group['final_lr']. Is this intended?

    opened by crcrpar 2
  • Don't work properly with higher lr

    Don't work properly with higher lr

    I'm new in deep learning and I found the project works well with SGD but turns to be sth wrong with adabound.

    When I start with lr=1e-3, it shows as below and break down: invalid argument 2: non-empty 3D or 4D (batch mode) tensor expected for input, but got: [1 x 64 x 0 x 27] at /pytorch/aten/src/THCUNN/generic/

    But seems to work right if I set lr to 1e-4 or lower. It confused me a lot. Any ideas?

    python=3.6 pytorch=1.0.1 / 0.4

    opened by Ocelot7777 0
  • When did the optimizer switch to SGD?

    When did the optimizer switch to SGD?

    I set the initial lr=0.0001, final_lr=0.1, but I still don't know when the optimizer will become SGD. Do I need to improve my learning rate to the final learning rate manually? thanks!

    opened by yunbujian 0
  • Pytorch 1.6 warning

    Pytorch 1.6 warning

    /home/xxxx/.local/lib/python3.7/site-packages/adabound/ UserWarning: This overload of add_ is deprecated:
            add_(Number alpha, Tensor other)
    Consider using one of the following signatures instead:
            add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
      exp_avg.mul_(beta1).add_(1 - beta1, grad)
    opened by MichaelMonashev 1
  • Learning rate changing

    Learning rate changing

    Hi, thanks a lot for sharing your excellent work.

    I wonder if I want to change learning rate with epoch increasing, how do I set parameter lr and final_lr in adamnboound ? Or is there any need changing learining rate with epoch increasing?

    Looking for your reply, thanks a lot.

    opened by EddieEduardo 0
  • LSTM hyparameters for language modeling

    LSTM hyparameters for language modeling


    Thanks for your great paper. I am wondering about the hyperparameters you used for language modeling experiments. Could you provide information about that?

    Thank you!

    opened by hoangcuong2011 0
  • Merge with Ranger or over9000

    Merge with Ranger or over9000

    I strongly believe that AdaBound would be better if it used RAdam instead of Adam. It could merge with Lookahead too and LAMB. Then we would have the best of both worlds and a beautiful example of scientific collaboration.

    opened by LifeIsStrange 0
  • v0.0.5(Mar 6, 2019)

    Bug Fixes

    • Fix wrong assertion of final_lr 02e11bae10c82f6b5365f7925c8cf71252adcd52
    • Fix .gitignore in CIFAR-10 demo to include the learning curve data 54ef9aa6c133caf0d9c82198d46979cfdbbb12f6
    Source code(tar.gz)
    Source code(zip)
A fool living in the amazing world.
Implements pytorch code for the Accelerated SGD algorithm.

AccSGD This is the code associated with Accelerated SGD algorithm used in the paper On the insufficiency of existing momentum schemes for Stochastic O

null 204 Jul 25, 2022
torch-optimizer -- collection of optimizers for Pytorch

torch-optimizer torch-optimizer -- collection of optimizers for PyTorch compatible with optim module. Simple example import torch_optimizer as optim

Nikolay Novik 2.5k Aug 15, 2022
Bunch of optimizer implementations in PyTorch

Bunch of optimizer implementations in PyTorch

Hyeongchan Kim 45 Jul 30, 2022
Over9000 optimizer

Optimizers and tests Every result is avg of 20 runs. Dataset LR Schedule Imagenette size 128, 5 epoch Imagewoof size 128, 5 epoch Adam - baseline OneC

Mikhail Grankin 399 Jul 15, 2022
PyTorch extensions for fast R&D prototyping and Kaggle farming

Pytorch-toolbelt A pytorch-toolbelt is a Python library with a set of bells and whistles for PyTorch for fast R&D prototyping and Kaggle farming: What

Eugene Khvedchenya 1.3k Aug 9, 2022
Fast, general, and tested differentiable structured prediction in PyTorch

Torch-Struct: Structured Prediction Library A library of tested, GPU implementations of core structured prediction algorithms for deep learning applic

HNLP 1k Aug 1, 2022
Training RNNs as Fast as CNNs (

News SRU++, a new SRU variant, is released. [tech report] [blog] The experimental code and SRU++ implementation are available on the dev branch which

ASAPP Research 2.1k Jul 30, 2022
Fast Discounted Cumulative Sums in PyTorch

TODO: update this README! Fast Discounted Cumulative Sums in PyTorch This repository implements an efficient parallel algorithm for the computation of

Daniel Povey 7 Feb 17, 2022
PyTorch framework A simple and complete framework for PyTorch, providing a variety of data loading and simple task solutions that are easy to extend and migrate

PyTorch framework A simple and complete framework for PyTorch, providing a variety of data loading and simple task solutions that are easy to extend and migrate

Cong Cai 12 Dec 19, 2021
Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.

PyTorch Implementation of Differentiable ODE Solvers This library provides ordinary differential equation (ODE) solvers implemented in PyTorch. Backpr

Ricky Chen 4.2k Aug 9, 2022
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

News March 3: v0.9.97 has various bug fixes and improvements: Bug fixes for NTXentLoss Efficiency improvement for AccuracyCalculator, by using torch i

Kevin Musgrave 4.7k Aug 11, 2022
A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch

Torchmeta A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch. Torchmeta contains popular meta-learning bench

Tristan Deleu 1.7k Aug 13, 2022
Pretrained EfficientNet, EfficientNet-Lite, MixNet, MobileNetV3 / V2, MNASNet A1 and B1, FBNet, Single-Path NAS

(Generic) EfficientNets for PyTorch A 'generic' implementation of EfficientNet, MixNet, MobileNetV3, etc. that covers most of the compute/parameter ef

Ross Wightman 1.5k Aug 8, 2022
Differentiable SDE solvers with GPU support and efficient sensitivity analysis.

PyTorch Implementation of Differentiable SDE Solvers This library provides stochastic differential equation (SDE) solvers with GPU support and efficie

Google Research 1k Aug 11, 2022
Tez is a super-simple and lightweight Trainer for PyTorch. It also comes with many utils that you can use to tackle over 90% of deep learning projects in PyTorch.

Tez: a simple pytorch trainer NOTE: Currently, we are not accepting any pull requests! All PRs will be closed. If you want a feature or something does

abhishek thakur 1.1k Aug 7, 2022
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

micrograd A tiny Autograd engine (with a bite! :)). Implements backpropagation (reverse-mode autodiff) over a dynamically built DAG and a small neural

Andrej 2.1k Aug 3, 2022
A simplified framework and utilities for PyTorch

Here is Poutyne. Poutyne is a simplified framework for PyTorch and handles much of the boilerplating code needed to train neural networks. Use Poutyne

GRAAL/GRAIL 529 Aug 7, 2022
pip install antialiased-cnns to improve stability and accuracy

Antialiased CNNs [Project Page] [Paper] [Talk] Making Convolutional Networks Shift-Invariant Again Richard Zhang. In ICML, 2019. Quick & easy start Ru

Adobe, Inc. 1.5k Aug 9, 2022
A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision

?? Accelerate was created for PyTorch users who like to write the training loop of PyTorch models but are reluctant to write and maintain the boilerplate code needed to use multi-GPUs/TPU/fp16.

Hugging Face 2.8k Aug 14, 2022