Over9000 optimizer

Overview

Optimizers and tests

Every result is the average of 20 runs.

Optimizer                              | LR Schedule     | Imagenette (size 128, 5 epochs) | Imagewoof (size 128, 5 epochs)
---------------------------------------|-----------------|---------------------------------|-------------------------------
Adam (baseline)                        | OneCycle        | 0.8493                          | 0.6125
RangerLars (RAdam + LARS + Lookahead)  | Flat and anneal | 0.8732                          | 0.6523
Ralamb (RAdam + LARS)                  | Flat and anneal | 0.8675                          | 0.6367
Ranger (RAdam + Lookahead)             | Flat and anneal | 0.8594                          | 0.5946
Novograd                               | Flat and anneal | 0.8711                          | 0.6126
RAdam                                  | Flat and anneal | 0.8444                          | 0.537
Lookahead                              | OneCycle        | 0.8578                          | 0.6106
Lamb                                   | OneCycle        | 0.8400                          | 0.5597
DiffGrad                               | OneCycle        | 0.8527                          | 0.5912
AdaMod                                 | OneCycle        | 0.8473                          | 0.6132
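
Most of these schedules are either OneCycle or "Flat and anneal": keep the learning rate constant for most of training, then cosine-anneal it towards zero. For readers who are not using fastai, the sketch below shows one way to express the flat-and-anneal idea with PyTorch's LambdaLR scheduler. The helper name and the 0.5 fraction are illustrative only (the repo's train.py exposes the annealing start point through an --ann_start flag); this is not the repo's exact implementation.

    # Minimal sketch of a "flat and anneal" LR schedule in plain PyTorch.
    # The helper name and the default fraction are illustrative, not the repo's API.
    import math
    import torch

    def flat_and_anneal(optimizer, total_steps, ann_start=0.5):
        """Keep the base LR flat for the first `ann_start` fraction of training,
        then cosine-anneal it towards zero over the remaining steps."""
        def lr_lambda(step):
            anneal_from = int(total_steps * ann_start)
            if step < anneal_from:
                return 1.0  # flat phase: a multiplier of 1 keeps the base LR
            progress = (step - anneal_from) / max(1, total_steps - anneal_from)
            return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
        return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

    # usage: create it once, then call scheduler.step() after every batch, e.g.
    # scheduler = flat_and_anneal(optimizer, total_steps=epochs * len(train_loader))
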
Comments
  • Failure to execute on Windows.

    Executing:

    python train.py --run 20 --woof 0 --size 128 --bs 64 --mixup 0 --sa 0 --epoch 5 --lr 1e-2 --gpu 0 --opt over9000 --sched_type flat_and_anneal --ann_start 0.5
    

    Would fail with:

    Traceback (most recent call last):
      File "train.py", line 149, in <module>
        ann_start: Param("Mixup", float)=-1.0,
      File "C:\Anaconda3\lib\site-packages\fastai\script.py", line 40, in call_parse
        func(**args.__dict__)
      File "train.py", line 154, in main
        for i in range(run)])
      File "train.py", line 154, in <listcomp>
        for i in range(run)])
      File "train.py", line 124, in train
        fit_with_annealing(learn, epochs, lr, ann_start)
      File "train.py", line 53, in fit_with_annealing
        learn.fit(num_epoch)
      File "C:\Anaconda3\lib\site-packages\fastai\basic_train.py", line 202, in fit
        fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
      File "C:\Anaconda3\lib\site-packages\fastai\basic_train.py", line 99, in fit
        for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
      File "C:\Anaconda3\lib\site-packages\fastprogress\fastprogress.py", line 72, in __iter__
        for i,o in enumerate(self._gen):
      File "C:\Anaconda3\lib\site-packages\fastai\basic_data.py", line 75, in __iter__
        for b in self.dl: yield self.proc_batch(b)
      File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 278, in __iter__
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 682, in __init__
        w.start()
      File "C:\Anaconda3\lib\multiprocessing\process.py", line 105, in start
        self._popen = self._Popen(self)
      File "C:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "C:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    AttributeError: Can't pickle local object 'compose.<locals>.compose_'
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
        exitcode = _main(fd)
      File "C:\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
        self = reduction.pickle.load(from_parent)
    EOFError: Ran out of input
    

    Any idea?

    wontfix 
    opened by redknightlois 12
  • How to use over9000 in common pytorch code?

    Thanks for the code. However, I'm not familiar with fast.ai, so I'm not sure how to replace other optimizers with over9000. Could you please provide an Optimizer interface? Thank you very much. (A minimal plain-PyTorch usage sketch is included after this issue list.)

    opened by askerlee 10
  • question about LAMB implementation.

    Hi Author,

    Thanks for your implementation of these different optimizers. This repository is great.

    By using the LAMB optimizer, I can achieve a higher accuracy than what you reported.

    I only updated 2 lines (changing LAMB v3 to LAMB v1).

    You can easily reproduce my results in this repository: https://github.com/fastalgo/over9000_lamb

    The log files are also in this repository.

    wontfix 
    opened by fastalgo 7
  • Port to fastai v2?

    Hi Mikhail, I wanted to thank you for developing this repo. I've been using RangerLars for some time now and have achieved pretty good results on my datasets. I wanted to ask if you have any plans to port it to fastai v2. Thanks again.

    opened by oguiza 6
  • AdaBound

    Hi, thanks for the great work. Would you also consider checking out the AdaBound optimizer? It claims to be

    as fast as Adam and as good as SGD

    It seems worth putting it to the test and combining it with other techniques. https://github.com/Luolc/AdaBound

    wontfix 
    opened by r1ckya 6
  • Recommendation on optimal LR

    Hi,

    I've been playing around with over9000 and I can confirm it works really well. Do you have any recommendations on how to find the optimal LR? I tried the LR finder we generally use for the OneCycleLR policy, but the recommended LR seems a little too high. The model makes rapid progress in the first couple of epochs, but the validation AUC oscillates a lot.

    Below is a comparison between over9000 using the LR recommended by the LR finder with 60% flat + cosine annealing (blue), and the LR I found manually by trial and error (orange). The recommended LR was always between 0.005 and 0.004, whereas the LR I found manually was 0.001.

    [image: validation AUC curves for the two learning rates]

    The 5 main drops in AUC are due to restarting the model; I'm running 5-fold cross-validation (10 epochs per fold).

    I've tried increasing and decreasing the annealing start from 40% to 80%, but when using such a high LR, 60% seems to work best.

    Have you found a good heuristic to set the initial LR that works well in practice?

    wontfix 
    opened by IamGianluca 3
  • Performance mismatch between notebook and readme

    Hi, I noticed that there's a performance mismatch between the readme and the notebook. The accuracy in the notebook (just as it was uploaded; I did not perform any re-training) is overall lower than in the readme. For Imagenette, for example, RangerLars in the notebook is "mean = 0.7506984382867813, std = 0.00592458289260687, ci-95 = (0.7478536156223473, 0.7535432609512153)", while the readme reports around 87%. Has anything been changed? I suspect the fastai dataset, the model, or the metrics may have changed. Thanks a lot.

    opened by juntang-zhuang 2
  • RangerLars

    @mgrankin Hi: your experimental results are excellent, but I have a question. I want to use your best optimizer, RangerLars, but I can't find it in the options below.
    Also, another author's work suggests that Ranger + GC (gradient centralization) gives the best results; could you add that experiment? https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

    if gpu is None: bs *= torch.cuda.device_count()
    if   opt=='adam' : opt_func = partial(optim.Adam, betas=(mom,alpha), eps=eps)
    elif opt=='radam' : opt_func = partial(RAdam, betas=(mom,alpha), eps=eps)
    elif opt=='novograd' : opt_func = partial(Novograd, betas=(mom,alpha), eps=eps)
    elif opt=='rms'  : opt_func = partial(optim.RMSprop, alpha=alpha, eps=eps)
    elif opt=='sgd'  : opt_func = partial(optim.SGD, momentum=mom)
    elif opt=='ranger'  : opt_func = partial(Ranger,  betas=(mom,alpha), eps=eps)
    elif opt=='ralamb'  : opt_func = partial(Ralamb,  betas=(mom,alpha), eps=eps)
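    # note: 'over9000' below is the RangerLars combination (RAdam + LARS + Lookahead) from the results table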
    elif opt=='over9000'  : opt_func = partial(Over9000,  betas=(mom,alpha), eps=eps)
    elif opt=='lookahead'  : opt_func = partial(LookaheadAdam, betas=(mom,alpha), eps=eps)
    elif opt=='lamb'  : opt_func = partial(Lamb, betas=(mom,alpha), eps=eps)
    elif opt=='diffgrad'  : opt_func = partial(DiffGrad, version=1, betas=(mom,alpha),eps=eps)
    elif opt=='adamod'  : opt_func = partial(AdaMod, betas=(mom,alpha), eps=eps, beta3=0.999)
    elif opt=='madam'  : opt_func = partial(Madam, p_scale=3.0, g_bound=10.0)
    
    wontfix 
    opened by sky186 2
  • LR Scheduler

    Referring to https://github.com/mgrankin/over9000/issues/1, how do I use the various LR schedules in PyTorch? Can you please provide an example?

    Using the LR schedulers that are available in PyTorch, how would I implement the "Flat and anneal" LR schedule, for example? (I have not used fastai, so any help will be appreciated; a minimal sketch of this schedule is given after the results table above.)

    Thank you.

    opened by kakumarabhishek 2
  • Fix bug in RAdam's buffer implementation

    Different learning rates for different parameter groups were ignored due to caching of a step_size factor; see https://github.com/LiyuanLucasLiu/RAdam/issues/24
    For an illustration, check https://nbviewer.jupyter.org/gist/sholderbach/a92e15fe8588d62f1804e9b2c508f0ce
    This was already fixed in the original repo with https://github.com/LiyuanLucasLiu/RAdam/commit/1d146e572c0e0f170cd7a7bf5995873ccc4768d0

    opened by sholderbach 2
  • Lookahead has no attribute 'state'

    When trying to save a model, such as in the mt-dnn module, I encountered the following traceback.

        File "~mt-dnn/mt_dnn/model.py", line 263, in save
            optimizer.state_dict(),
        File "~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/optim/optimizer.py", line 90, in state_dict
            for k, v in self.state.items()}
        AttributeError: 'Lookahead' object has no attribute 'state'
    

    It appears that the Lookahead class is missing a call to super in its __init__ method (as seen in the other optimizers): super(Lookahead, self).__init__(params, defaults). I am not sure, though, whether this will also save the state of the wrapped optimizer.

    opened by staylor-ds 2
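
A couple of the issues above ask how to use these optimizers outside fastai ("How to use over9000 in common pytorch code?"). Because the classes are built on the torch.optim.Optimizer interface, a plain PyTorch training loop should generally work unchanged. The sketch below is a minimal, hypothetical example: the import path, learning rate, and betas/eps values are assumptions based on the train.py snippet quoted above, not the repo's documented API.

    # Hypothetical sketch: Over9000 (RangerLars) as a drop-in replacement for Adam
    # in a plain PyTorch loop. The import path and constructor arguments are assumed
    # from the train.py snippet above; check the repo for the actual module name.
    import torch
    import torch.nn as nn
    from over9000 import Over9000  # assumed module/class location

    model = nn.Linear(128, 10)
    criterion = nn.CrossEntropyLoss()
    optimizer = Over9000(model.parameters(), lr=1e-3, betas=(0.9, 0.99), eps=1e-6)

    # one dummy batch to show the usual zero_grad / backward / step cycle
    xb, yb = torch.randn(32, 128), torch.randint(0, 10, (32,))
    optimizer.zero_grad()
    loss = criterion(model(xb), yb)
    loss.backward()
    optimizer.step()
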
Owner
Mikhail Grankin
PyTorch implementation DRO: Deep Recurrent Optimizer for Structure-from-Motion

DRO: Deep Recurrent Optimizer for Structure-from-Motion This is the official PyTorch implementation code for DRO-sfm. For technical details, please re

Alibaba Cloud 56 Dec 12, 2022
auto-tuning momentum SGD optimizer

YellowFin YellowFin is an auto-tuning optimizer based on momentum SGD which requires no manual specification of learning rate and momentum. It measure

Jian Zhang 288 Nov 19, 2022
Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase

Ranger-Deep-Learning-Optimizer Ranger - a synergistic optimizer combining RAdam (Rectified Adam) and LookAhead, and now GC (gradient centralization) i

Less Wright 1.1k Dec 21, 2022
Apollo optimizer in tensorflow

Apollo Optimizer in Tensorflow 2.x Notes: Warmup is important with Apollo optimizer, so be sure to pass in a learning rate schedule vs. a constant lea

Evan Walters 1 Nov 9, 2021
This is an implementation of Google's Yogi-Optimizer in Keras (tf.keras)

Yogi-Optimizer_Keras This is an implementation of Google's Yogi-Optimizer in Keras (tf.keras) The NeurIPS-Paper can be found here: http://papers.nips.c

null 14 Sep 13, 2022
AdamW optimizer and cosine learning rate annealing with restarts

AdamW optimizer and cosine learning rate annealing with restarts This repository contains an implementation of AdamW optimization algorithm and cosine

Maksym Pyrozhok 133 Dec 20, 2022
DeepOBS: A Deep Learning Optimizer Benchmark Suite

DeepOBS - A Deep Learning Optimizer Benchmark Suite DeepOBS is a benchmarking suite that drastically simplifies, automates and improves the evaluation

Aaron Bahde 7 May 12, 2020
An Implicit Function Theorem (IFT) optimizer for bi-level optimizations

iftopt An Implicit Function Theorem (IFT) optimizer for bi-level optimizations. Requirements Python 3.7+ PyTorch 1.x Installation $ pip install git+ht

The Money Shredder Lab 2 Dec 2, 2021
AdamW optimizer for bfloat16 models in pytorch.

Image source AdamW optimizer for bfloat16 models in pytorch. Bfloat16 is currently an optimal tradeoff between range and relative error for deep netwo

Alex Rogozhnikov 8 Nov 20, 2022
Storage-optimizer - Identify potential optimizations on the cloud storage accounts

Storage Optimizer Identify potential optimizations on the cloud storage accounts

Zaher Mousa 1 Feb 13, 2022
ESGD-M - A stochastic non-convex second order optimizer, suitable for training deep learning models, for PyTorch

ESGD-M - A stochastic non-convex second order optimizer, suitable for training deep learning models, for PyTorch

Katherine Crowson 53 Dec 29, 2022
lookahead optimizer (Lookahead Optimizer: k steps forward, 1 step back) for pytorch

lookahead optimizer for pytorch PyTorch implement of Lookahead Optimizer: k steps forward, 1 step back Usage: base_opt = torch.optim.Adam(model.parame

Liam 318 Dec 9, 2022
A mini library for Policy Gradients with Parameter-based Exploration, with reference implementation of the ClipUp optimizer from NNAISENSE.

PGPElib A mini library for Policy Gradients with Parameter-based Exploration [1] and friends. This library serves as a clean re-implementation of the

NNAISENSE 56 Jan 1, 2023
torch-optimizer -- collection of optimizers for Pytorch

torch-optimizer torch-optimizer -- collection of optimizers for PyTorch compatible with optim module. Simple example import torch_optimizer as optim

Nikolay Novik 2.6k Jan 3, 2023
An optimizer that trains as fast as Adam and as good as SGD.

AdaBound An optimizer that trains as fast as Adam and as good as SGD, for developing state-of-the-art deep learning models on a wide variety of popula

LoLo 2.9k Dec 27, 2022
Ranger deep learning optimizer rewrite to use newest components

Ranger21 - integrating the latest deep learning components into a single optimizer Ranger deep learning optimizer rewrite to use newest components Ran

Less Wright 266 Dec 28, 2022
Bunch of optimizer implementations in PyTorch

Bunch of optimizer implementations in PyTorch

Hyeongchan Kim 76 Jan 3, 2023
guapow is an on-demand and auto performance optimizer for Linux applications.

guapow is an on-demand and auto performance optimizer for Linux applications. This project's name is an abbreviation for Guarana powder (Guaraná is a fruit from the Amazon rainforest with a highly caffeinated seed).

Vinícius Moreira 19 Nov 18, 2022