Over9000 optimizer

Overview

Optimizers and tests

Every result is the average of 20 runs.

Optimizer                              | LR Schedule     | Imagenette (size 128, 5 epochs) | Imagewoof (size 128, 5 epochs)
---------------------------------------|-----------------|---------------------------------|-------------------------------
Adam (baseline)                        | OneCycle        | 0.8493                          | 0.6125
RangerLars (RAdam + LARS + Lookahead)  | Flat and anneal | 0.8732                          | 0.6523
Ralamb (RAdam + LARS)                  | Flat and anneal | 0.8675                          | 0.6367
Ranger (RAdam + Lookahead)             | Flat and anneal | 0.8594                          | 0.5946
Novograd                               | Flat and anneal | 0.8711                          | 0.6126
RAdam                                  | Flat and anneal | 0.8444                          | 0.537
Lookahead                              | OneCycle        | 0.8578                          | 0.6106
Lamb                                   | OneCycle        | 0.8400                          | 0.5597
DiffGrad                               | OneCycle        | 0.8527                          | 0.5912
AdaMod                                 | OneCycle        | 0.8473                          | 0.6132
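
Most of these schedules are either OneCycle or "Flat and anneal": keep the learning rate constant for most of training, then cosine-anneal it towards zero. For readers who are not using fastai, the sketch below shows one way to express the flat-and-anneal idea with PyTorch's LambdaLR scheduler. The helper name and the 0.5 fraction are illustrative only (the repo's train.py exposes the annealing start point through an --ann_start flag); this is not the repo's exact implementation.

    # Minimal sketch of a "flat and anneal" LR schedule in plain PyTorch.
    # The helper name and the default fraction are illustrative, not the repo's API.
    import math
    import torch

    def flat_and_anneal(optimizer, total_steps, ann_start=0.5):
        """Keep the base LR flat for the first `ann_start` fraction of training,
        then cosine-anneal it towards zero over the remaining steps."""
        def lr_lambda(step):
            anneal_from = int(total_steps * ann_start)
            if step < anneal_from:
                return 1.0  # flat phase: a multiplier of 1 keeps the base LR
            progress = (step - anneal_from) / max(1, total_steps - anneal_from)
            return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
        return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

    # usage: create it once, then call scheduler.step() after every batch, e.g.
    # scheduler = flat_and_anneal(optimizer, total_steps=epochs * len(train_loader))
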
Comments
  • Failure to execute on Windows.

    Executing:

    python train.py --run 20 --woof 0 --size 128 --bs 64 --mixup 0 --sa 0 --epoch 5 --lr 1e-2 --gpu 0 --opt over9000 --sched_type flat_and_anneal --ann_start 0.5
    

    Would fail with:

    Traceback (most recent call last):
      File "train.py", line 149, in <module>
        ann_start: Param("Mixup", float)=-1.0,
      File "C:\Anaconda3\lib\site-packages\fastai\script.py", line 40, in call_parse
        func(**args.__dict__)
      File "train.py", line 154, in main
        for i in range(run)])
      File "train.py", line 154, in <listcomp>
        for i in range(run)])
      File "train.py", line 124, in train
        fit_with_annealing(learn, epochs, lr, ann_start)
      File "train.py", line 53, in fit_with_annealing
        learn.fit(num_epoch)
      File "C:\Anaconda3\lib\site-packages\fastai\basic_train.py", line 202, in fit
        fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
      File "C:\Anaconda3\lib\site-packages\fastai\basic_train.py", line 99, in fit
        for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
      File "C:\Anaconda3\lib\site-packages\fastprogress\fastprogress.py", line 72, in __iter__
        for i,o in enumerate(self._gen):
      File "C:\Anaconda3\lib\site-packages\fastai\basic_data.py", line 75, in __iter__
        for b in self.dl: yield self.proc_batch(b)
      File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 278, in __iter__
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 682, in __init__
        w.start()
      File "C:\Anaconda3\lib\multiprocessing\process.py", line 105, in start
        self._popen = self._Popen(self)
      File "C:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "C:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    AttributeError: Can't pickle local object 'compose.<locals>.compose_'
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
        exitcode = _main(fd)
      File "C:\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
        self = reduction.pickle.load(from_parent)
    EOFError: Ran out of input
    

    Any idea?

    wontfix 
    opened by redknightlois 12
  • How to use over9000 in common pytorch code?

    Thanks for the code. However, I'm not familiar with fast.ai, so I'm not sure how to replace other optimizers with over9000. Could you please provide an Optimizer interface? Thank you very much. (A minimal plain-PyTorch usage sketch is included after this issue list.)

    opened by askerlee 10
  • question about LAMB implementation.

    Hi Author,

    Thanks for your implementation of these different optimizers. This repository is great.

    By using the LAMB optimizer, I can achieve a higher accuracy than what you reported.

    I only updated 2 lines (changing LAMB v3 to LAMB v1).

    You can easily reproduce my results in this repository: https://github.com/fastalgo/over9000_lamb

    The log files are also in this repository.

    wontfix 
    opened by fastalgo 7
  • Port to fastai v2?

    Hi Mikhail, I wanted to thank you for developing this repo. I've been using RangerLars for some time now and have achieved pretty good results on my datasets. I wanted to ask if you have any plans to port it to fastai v2. Thanks again.

    opened by oguiza 6
  • AdaBound

    Hi, thanks for the great work. Would you also consider checking out the AdaBound optimizer? It claims to be

    as fast as Adam and as good as SGD

    It seems worth putting it to the test and combining it with other techniques. https://github.com/Luolc/AdaBound

    wontfix 
    opened by r1ckya 6
  • Recommendation on optimal LR

    Hi,

    I've been playing around with over9000 and I can confirm it works really well. Do you have any recommendations on how to find the optimal LR? I tried the LR finder we generally use for the OneCycleLR policy, but the recommended LR seems a little too high. The model makes rapid progress in the first couple of epochs, but the validation AUC oscillates a lot.

    Below is a comparison between over9000 using the LR recommended by the LR finder with 60% flat + cosine annealing (blue), and the LR I found manually by trial and error (orange). The recommended LR was always between 0.005 and 0.004, whereas the LR I found manually was 0.001.

    [image: validation AUC curves for the two learning rates]

    The 5 main drops in AUC are due to restarting the model; I'm running 5-fold cross-validation (10 epochs per fold).

    I've tried increasing and decreasing the annealing start from 40% to 80%, but when using such a high LR, 60% seems to work best.

    Have you found a good heuristic to set the initial LR that works well in practice?

    wontfix 
    opened by IamGianluca 3
  • Performance mismatch between notebook and readme

    Hi, I noticed that there's a performance mismatch between the readme and the notebook. The accuracy in the notebook (just as it was uploaded; I did not perform any re-training) is overall lower than in the readme. For Imagenette, for example, RangerLars in the notebook is "mean = 0.7506984382867813, std = 0.00592458289260687, ci-95 = (0.7478536156223473, 0.7535432609512153)", while the readme reports around 87%. Has anything been changed? I suspect the fastai dataset, the model, or the metrics may have changed. Thanks a lot.

    opened by juntang-zhuang 2
  • RangerLars

    @mgrankin Hi: your experimental results are excellent, but I have a question. I want to use your best optimizer, RangerLars, but I can't find it in the options below.
    Also, another author's work suggests that Ranger + GC (gradient centralization) gives the best results; could you add that experiment? https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

    if gpu is None: bs *= torch.cuda.device_count()
    if   opt=='adam' : opt_func = partial(optim.Adam, betas=(mom,alpha), eps=eps)
    elif opt=='radam' : opt_func = partial(RAdam, betas=(mom,alpha), eps=eps)
    elif opt=='novograd' : opt_func = partial(Novograd, betas=(mom,alpha), eps=eps)
    elif opt=='rms'  : opt_func = partial(optim.RMSprop, alpha=alpha, eps=eps)
    elif opt=='sgd'  : opt_func = partial(optim.SGD, momentum=mom)
    elif opt=='ranger'  : opt_func = partial(Ranger,  betas=(mom,alpha), eps=eps)
    elif opt=='ralamb'  : opt_func = partial(Ralamb,  betas=(mom,alpha), eps=eps)
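    # note: 'over9000' below is the RangerLars combination (RAdam + LARS + Lookahead) from the results table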
    elif opt=='over9000'  : opt_func = partial(Over9000,  betas=(mom,alpha), eps=eps)
    elif opt=='lookahead'  : opt_func = partial(LookaheadAdam, betas=(mom,alpha), eps=eps)
    elif opt=='lamb'  : opt_func = partial(Lamb, betas=(mom,alpha), eps=eps)
    elif opt=='diffgrad'  : opt_func = partial(DiffGrad, version=1, betas=(mom,alpha),eps=eps)
    elif opt=='adamod'  : opt_func = partial(AdaMod, betas=(mom,alpha), eps=eps, beta3=0.999)
    elif opt=='madam'  : opt_func = partial(Madam, p_scale=3.0, g_bound=10.0)
    
    wontfix 
    opened by sky186 2
  • LR Scheduler

    Referring to https://github.com/mgrankin/over9000/issues/1, how do I use the various LR schedules in PyTorch? Can you please provide an example?

    Using the LR schedulers that are available in PyTorch, how would I implement the "Flat and anneal" LR schedule, for example? (I have not used fastai, so any help will be appreciated; a minimal sketch of this schedule is given after the results table above.)

    Thank you.

    opened by kakumarabhishek 2
  • Fix bug in RAdam's buffer implementation

    Different learning rates for different parameter groups were ignored due to caching of a step_size factor; see https://github.com/LiyuanLucasLiu/RAdam/issues/24
    For an illustration, check https://nbviewer.jupyter.org/gist/sholderbach/a92e15fe8588d62f1804e9b2c508f0ce
    This was already fixed in the original repo with https://github.com/LiyuanLucasLiu/RAdam/commit/1d146e572c0e0f170cd7a7bf5995873ccc4768d0

    opened by sholderbach 2
  • Lookahead has no attribute 'state'

    When trying to save a model, such as in the mt-dnn module, I encountered the following traceback.

        File "~mt-dnn/mt_dnn/model.py", line 263, in save
            optimizer.state_dict(),
        File "~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/optim/optimizer.py", line 90, in state_dict
            for k, v in self.state.items()}
        AttributeError: 'Lookahead' object has no attribute 'state'
    

    It appears that the Lookahead class is missing a call to super in its __init__ method (as seen in the other optimizers): super(Lookahead, self).__init__(params, defaults). I am not sure, though, whether this will also save the state of the wrapped optimizer.

    opened by staylor-ds 2
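
A couple of the issues above ask how to use these optimizers outside fastai ("How to use over9000 in common pytorch code?"). Because the classes are built on the torch.optim.Optimizer interface, a plain PyTorch training loop should generally work unchanged. The sketch below is a minimal, hypothetical example: the import path, learning rate, and betas/eps values are assumptions based on the train.py snippet quoted above, not the repo's documented API.

    # Hypothetical sketch: Over9000 (RangerLars) as a drop-in replacement for Adam
    # in a plain PyTorch loop. The import path and constructor arguments are assumed
    # from the train.py snippet above; check the repo for the actual module name.
    import torch
    import torch.nn as nn
    from over9000 import Over9000  # assumed module/class location

    model = nn.Linear(128, 10)
    criterion = nn.CrossEntropyLoss()
    optimizer = Over9000(model.parameters(), lr=1e-3, betas=(0.9, 0.99), eps=1e-6)

    # one dummy batch to show the usual zero_grad / backward / step cycle
    xb, yb = torch.randn(32, 128), torch.randint(0, 10, (32,))
    optimizer.zero_grad()
    loss = criterion(model(xb), yb)
    loss.backward()
    optimizer.step()
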
Owner
Mikhail Grankin
PyTorch implementation DRO: Deep Recurrent Optimizer for Structure-from-Motion

DRO: Deep Recurrent Optimizer for Structure-from-Motion This is the official PyTorch implementation code for DRO-sfm. For technical details, please re

Alibaba Cloud 56 Dec 12, 2022
auto-tuning momentum SGD optimizer

YellowFin YellowFin is an auto-tuning optimizer based on momentum SGD which requires no manual specification of learning rate and momentum. It measure

Jian Zhang 288 Nov 19, 2022
Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase

Ranger-Deep-Learning-Optimizer Ranger - a synergistic optimizer combining RAdam (Rectified Adam) and LookAhead, and now GC (gradient centralization) i

Less Wright 1.1k Dec 21, 2022
Apollo optimizer in tensorflow

Apollo Optimizer in Tensorflow 2.x Notes: Warmup is important with Apollo optimizer, so be sure to pass in a learning rate schedule vs. a constant lea

Evan Walters 1 Nov 9, 2021
This is an implementation of Google's Yogi-Optimizer in Keras (tf.keras)

Yogi-Optimizer_Keras This is an implementation of Google's Yogi-Optimizer in Keras (tf.keras) The NeurIPS-Paper can be found here: http://papers.nips.c

null 14 Sep 13, 2022
AdamW optimizer and cosine learning rate annealing with restarts

AdamW optimizer and cosine learning rate annealing with restarts This repository contains an implementation of AdamW optimization algorithm and cosine

Maksym Pyrozhok 133 Dec 20, 2022
DeepOBS: A Deep Learning Optimizer Benchmark Suite

DeepOBS - A Deep Learning Optimizer Benchmark Suite DeepOBS is a benchmarking suite that drastically simplifies, automates and improves the evaluation

Aaron Bahde 7 May 12, 2020
An Implicit Function Theorem (IFT) optimizer for bi-level optimizations

iftopt An Implicit Function Theorem (IFT) optimizer for bi-level optimizations. Requirements Python 3.7+ PyTorch 1.x Installation $ pip install git+ht

The Money Shredder Lab 2 Dec 2, 2021
AdamW optimizer for bfloat16 models in pytorch.

Image source AdamW optimizer for bfloat16 models in pytorch. Bfloat16 is currently an optimal tradeoff between range and relative error for deep netwo

Alex Rogozhnikov 8 Nov 20, 2022
Storage-optimizer - Identify potential optimizations on the cloud storage accounts

Storage Optimizer Identify potential optimizations on the cloud storage accounts

Zaher Mousa 1 Feb 13, 2022
ESGD-M - A stochastic non-convex second order optimizer, suitable for training deep learning models, for PyTorch

ESGD-M - A stochastic non-convex second order optimizer, suitable for training deep learning models, for PyTorch

Katherine Crowson 53 Dec 29, 2022
lookahead optimizer (Lookahead Optimizer: k steps forward, 1 step back) for pytorch

lookahead optimizer for pytorch PyTorch implement of Lookahead Optimizer: k steps forward, 1 step back Usage: base_opt = torch.optim.Adam(model.parame

Liam 318 Dec 9, 2022
A mini library for Policy Gradients with Parameter-based Exploration, with reference implementation of the ClipUp optimizer from NNAISENSE.

PGPElib A mini library for Policy Gradients with Parameter-based Exploration [1] and friends. This library serves as a clean re-implementation of the

NNAISENSE 56 Jan 1, 2023
torch-optimizer -- collection of optimizers for Pytorch

torch-optimizer torch-optimizer -- collection of optimizers for PyTorch compatible with optim module. Simple example import torch_optimizer as optim

Nikolay Novik 2.6k Jan 3, 2023
An optimizer that trains as fast as Adam and as good as SGD.

AdaBound An optimizer that trains as fast as Adam and as good as SGD, for developing state-of-the-art deep learning models on a wide variety of popula

LoLo 2.9k Dec 27, 2022
Ranger deep learning optimizer rewrite to use newest components

Ranger21 - integrating the latest deep learning components into a single optimizer Ranger deep learning optimizer rewrite to use newest components Ran

Less Wright 266 Dec 28, 2022
Bunch of optimizer implementations in PyTorch

Bunch of optimizer implementations in PyTorch

Hyeongchan Kim 76 Jan 3, 2023
guapow is an on-demand and auto performance optimizer for Linux applications.

guapow is an on-demand and auto performance optimizer for Linux applications. This project's name is an abbreviation for Guarana powder (Guaraná is a fruit from the Amazon rainforest with a highly caffeinated seed).

Vinícius Moreira 19 Nov 18, 2022