Over9000 optimizer


Optimizers and tests

Every result is the average of 20 runs.

| Optimizer | LR schedule | Imagenette (size 128, 5 epochs) | Imagewoof (size 128, 5 epochs) |
|---|---|---|---|
| Adam (baseline) | OneCycle | 0.8493 | 0.6125 |
| RangerLars (RAdam + LARS + Lookahead) | Flat and anneal | 0.8732 | 0.6523 |
| Ralamb (RAdam + LARS) | Flat and anneal | 0.8675 | 0.6367 |
| Ranger (RAdam + Lookahead) | Flat and anneal | 0.8594 | 0.5946 |
| Novograd | Flat and anneal | 0.8711 | 0.6126 |
| RAdam | Flat and anneal | 0.8444 | 0.537 |
| Lookahead | OneCycle | 0.8578 | 0.6106 |
| Lamb | OneCycle | 0.8400 | 0.5597 |
| DiffGrad | OneCycle | 0.8527 | 0.5912 |
| AdaMod | OneCycle | 0.8473 | 0.6132 |

  Failure to execute on Windows.


    python train.py --run 20 --woof 0 --size 128 --bs 64 --mixup 0 --sa 0 --epoch 5 --lr 1e-2 --gpu 0 --opt over9000 --sched_type flat_and_anneal --ann_start 0.5

    Would fail with:

    Traceback (most recent call last):
      File "train.py", line 149, in <module>
        ann_start: Param("Mixup", float)=-1.0,
      File "C:\Anaconda3\lib\site-packages\fastai\script.py", line 40, in call_parse
      File "train.py", line 154, in main
        for i in range(run)])
      File "train.py", line 154, in <listcomp>
        for i in range(run)])
      File "train.py", line 124, in train
        fit_with_annealing(learn, epochs, lr, ann_start)
      File "train.py", line 53, in fit_with_annealing
      File "C:\Anaconda3\lib\site-packages\fastai\basic_train.py", line 202, in fit
        fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
      File "C:\Anaconda3\lib\site-packages\fastai\basic_train.py", line 99, in fit
        for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
      File "C:\Anaconda3\lib\site-packages\fastprogress\fastprogress.py", line 72, in __iter__
        for i,o in enumerate(self._gen):
      File "C:\Anaconda3\lib\site-packages\fastai\basic_data.py", line 75, in __iter__
        for b in self.dl: yield self.proc_batch(b)
      File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 278, in __iter__
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 682, in __init__
      File "C:\Anaconda3\lib\multiprocessing\process.py", line 105, in start
        self._popen = self._Popen(self)
      File "C:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "C:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    AttributeError: Can't pickle local object 'compose.<locals>.compose_'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
        exitcode = _main(fd)
      File "C:\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
        self = reduction.pickle.load(from_parent)
    EOFError: Ran out of input

    Any idea?

    opened by redknightlois 12
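
The last line of the first traceback points at a pickling problem: on Windows, worker processes are started with spawn, so whatever is handed to a DataLoader worker has to be picklable, and a function defined inside another function is not. The sketch below reproduces that behaviour in plain Python. The names `compose`, `Compose` and `can_pickle` are illustrative and are not fastai's code. Replacing a closure with a module-level callable class is one common way around the limitation; setting the DataLoader's `num_workers` to 0 avoids worker processes altogether.

```python
import pickle


def compose(*fns):
    """Closure-based composition: the returned function is a local object."""
    def compose_(x):
        for f in fns:
            x = f(x)
        return x
    return compose_


class Compose:
    """Class-based composition: instances refer to a module-level class."""
    def __init__(self, *fns):
        self.fns = fns

    def __call__(self, x):
        for f in self.fns:
            x = f(x)
        return x


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


closure_version = compose(abs, float)   # cannot be pickled (local object)
class_version = Compose(abs, float)     # same behaviour, no local function
```

Here `can_pickle(closure_version)` returns `False`, which is the same failure the spawned worker hits. The class-based version computes the same result and should pickle as long as `Compose` lives at module level in an importable file.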
  How to use over9000 in common pytorch code?

    Thanks for the code. However, I'm not familiar with fast.ai, so I'm not sure how to replace other optimizers with over9000. Could you please provide an Optimizer interface? Thank you very much.

    opened by askerlee 10
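
For readers in the same position, here is a minimal sketch of a plain PyTorch training loop with a swappable optimizer. It assumes that the repository's optimizers follow the usual `torch.optim.Optimizer` calling convention (parameters first, then keyword arguments such as `lr`, `betas` and `eps`), which is how they are passed to `partial` in `train.py`. The import path in the comment is an assumption, and `optim.Adam` is used as a stand-in so the sketch runs without the repository. The data and hyperparameter values are illustrative only.

```python
import torch
from torch import nn, optim

# Assumption: with the repository on the import path, this stand-in would be
# replaced by something like `from over9000 import Over9000`.
make_optimizer = optim.Adam  # stand-in so the sketch is self-contained

torch.manual_seed(0)
model = nn.Linear(4, 1)
optimizer = make_optimizer(model.parameters(), lr=1e-2, betas=(0.9, 0.99), eps=1e-6)
loss_fn = nn.MSELoss()

# Toy regression data, only for illustration.
x = torch.randn(64, 4)
y = x.sum(dim=1, keepdim=True)

losses = []
for step in range(100):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # backward pass
    optimizer.step()             # parameter update
    losses.append(loss.item())
```

Nothing in the loop depends on which optimizer is chosen, so swapping the stand-in for another optimizer class should only require changing the `make_optimizer` line, provided that class accepts the same keyword arguments.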
  question about LAMB implementation.

    Hi Author,

    Thanks for your implementation on these different optimizers. This repository is great.

    By using the LAMB optimizer, I can achieve a higher accuracy than what you reported.

    I only updated 2 lines (changing LAMB v3 to LAMB v1).

    You can easily reproduce my results in this repository: https://github.com/fastalgo/over9000_lamb

    The log files are also in this repository.

    opened by fastalgo 7
  Port to fastai v2?

    Hi Mikhail, I wanted to thank you for developing this repo. I've been using RangerLars for some time now, and have achieved pretty good results on my datasets. I wanted to ask if you have any plans to port it to fastai v2. Thanks again.

    opened by oguiza 6
  AdaBound

    Hi, thanks for the great work. Would you also consider checking out the AdaBound optimizer? It claims to be "as fast as Adam and as good as SGD". It seems worth putting it to the test and combining it with other techniques: https://github.com/Luolc/AdaBound

    opened by r1ckya 6
  Recommendation on optimal LR

    I've been playing around with over9000 and I can confirm it works really well. Do you have any recommendations on how to find the optimal LR? I tried the LR finder we generally use for the OneCycleLR policy, but the recommended LR seems a little too high. The model makes rapid progress in the first couple of epochs, but validation AUC oscillates a lot.

    Below is a comparison between over9000 with the LR recommended by the LR finder, using 60% flat + cosine annealing (blue), and with the LR I found manually by trial and error (orange). The recommended LR was always between 0.004 and 0.005, whereas the LR I found manually was 0.001.


    The 5 main drops in AUC are due to restarting the model: I'm running 5-fold cross-validation (10 epochs per fold).

    I've tried moving the start of annealing between 40% and 80%, but with such a high LR, 60% seems to work best.

    Have you found a good heuristic to set the initial LR that works well in practice?

    opened by IamGianluca 3
  Performance mismatch between notebook and readme

    Hi, I noticed a performance mismatch between the README and the notebook. The accuracy in the notebook (as uploaded; I did not retrain anything) is lower overall than in the README. For Imagenette, for example, RangerLars in the notebook shows "mean = 0.7506984382867813, std = 0.00592458289260687, ci-95 = (0.7478536156223473, 0.7535432609512153)", while the README reports around 87%. Has anything changed? I suspect that the fastai dataset, the model, or the metrics may have changed. Thanks a lot.

    opened by juntang-zhuang 2
  RangerLars

    @mgrankin Hi, your experimental results are excellent, but I have a question: I want to use your best optimizer, RangerLars, but I can't find it.
    Another author's work, Ranger + GC, reports the best results. Could you add this experiment? https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

    if gpu is None: bs *= torch.cuda.device_count()
    if   opt == 'adam':      opt_func = partial(optim.Adam, betas=(mom, alpha), eps=eps)
    elif opt == 'radam':     opt_func = partial(RAdam, betas=(mom, alpha), eps=eps)
    elif opt == 'novograd':  opt_func = partial(Novograd, betas=(mom, alpha), eps=eps)
    elif opt == 'rms':       opt_func = partial(optim.RMSprop, alpha=alpha, eps=eps)
    elif opt == 'sgd':       opt_func = partial(optim.SGD, momentum=mom)
    elif opt == 'ranger':    opt_func = partial(Ranger, betas=(mom, alpha), eps=eps)
    elif opt == 'ralamb':    opt_func = partial(Ralamb, betas=(mom, alpha), eps=eps)
    elif opt == 'over9000':  opt_func = partial(Over9000, betas=(mom, alpha), eps=eps)
    elif opt == 'lookahead': opt_func = partial(LookaheadAdam, betas=(mom, alpha), eps=eps)
    elif opt == 'lamb':      opt_func = partial(Lamb, betas=(mom, alpha), eps=eps)
    elif opt == 'diffgrad':  opt_func = partial(DiffGrad, version=1, betas=(mom, alpha), eps=eps)
    elif opt == 'adamod':    opt_func = partial(AdaMod, betas=(mom, alpha), eps=eps, beta3=0.999)
    elif opt == 'madam':     opt_func = partial(Madam, p_scale=3.0, g_bound=10.0)

    opened by sky186 2
  LR Scheduler

    Referring to https://github.com/mgrankin/over9000/issues/1, how do I use the various LR schedules in PyTorch? Can you please provide an example?

    These are the LR schedulers that are available in PyTorch. For example, how would I implement the "Flat and anneal" LR schedule? (I have not used fastai, so any help will be appreciated).

    Thank you.

    opened by kakumarabhishek 2
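
One way to read "Flat and anneal" is as a learning-rate multiplier that stays at 1.0 for the first part of training and then follows a cosine curve. The sketch below implements that reading in plain Python. The function name `flat_and_anneal` is illustrative, `ann_start` mirrors the `--ann_start` flag from the command line shown earlier, and decaying all the way to zero is an assumption rather than something confirmed by the repository.

```python
import math


def flat_and_anneal(step, total_steps, ann_start=0.5):
    """Return an LR multiplier in [0, 1] for the given training step.

    The multiplier is 1.0 for the first `ann_start` fraction of training,
    then follows a half cosine from 1.0 down to 0.0 (assumed end value).
    """
    flat_steps = int(total_steps * ann_start)
    if step < flat_steps:
        return 1.0
    anneal_steps = max(1, total_steps - flat_steps)
    progress = min(1.0, (step - flat_steps) / anneal_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```

In PyTorch, a function like this can be passed as the `lr_lambda` argument of `torch.optim.lr_scheduler.LambdaLR`, for example `LambdaLR(optimizer, lr_lambda=lambda s: flat_and_anneal(s, total_steps, 0.5))`, with `scheduler.step()` called once per batch so that `step` counts batches rather than epochs.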
  Fix bug in RAdam's buffer implementation

    Different learning rates for different parameter groups were ignored due to caching of a step_size factor; see https://github.com/LiyuanLucasLiu/RAdam/issues/24. For an illustration, check https://nbviewer.jupyter.org/gist/sholderbach/a92e15fe8588d62f1804e9b2c508f0ce. This is already fixed in the original repo with https://github.com/LiyuanLucasLiu/RAdam/commit/1d146e572c0e0f170cd7a7bf5995873ccc4768d0

    opened by sholderbach 2
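
The effect described above can be shown with a toy model of the cache. This is a simplified sketch, not the RAdam code: it keeps only a bias-correction factor and drops the rectification term, and the function names are illustrative. It contrasts a cache that stores a value with the learning rate already multiplied in against a cache that stores only the lr-independent factor.

```python
def buggy_step_sizes(group_lrs, step, beta1=0.9):
    """Cache keyed on the step only, with the lr baked into the cached value."""
    cache = {}
    sizes = []
    for lr in group_lrs:
        if step not in cache:
            # The first group's lr is stored and silently reused by later groups.
            cache[step] = lr / (1 - beta1 ** step)
        sizes.append(cache[step])
    return sizes


def fixed_step_sizes(group_lrs, step, beta1=0.9):
    """Cache only the lr-independent factor; apply each group's lr afterwards."""
    cache = {}
    sizes = []
    for lr in group_lrs:
        if step not in cache:
            cache[step] = 1.0 / (1 - beta1 ** step)
        sizes.append(lr * cache[step])
    return sizes


buggy = buggy_step_sizes([1e-3, 1e-2], step=1)
fixed = fixed_step_sizes([1e-3, 1e-2], step=1)
```

With two groups at learning rates 1e-3 and 1e-2, the buggy variant returns the same step size for both, while the fixed variant keeps the tenfold ratio between them.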
  Lookahead has no attribute 'state'

    When trying to save a model, such as in the mt-dnn module, I encountered the following traceback.

        File "~mt-dnn/mt_dnn/model.py", line 263, in save
        File "~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/optim/optimizer.py", line 90, in state_dict
            for k, v in self.state.items()}
        AttributeError: 'Lookahead' object has no attribute 'state'

    It appears that the Lookahead class is missing a call to super in its __init__ method (as seen in the other Optimizers): super(Lookahead, self).__init__(params, defaults). However, I am not sure whether this will save the state of the wrapped optimizer.

    opened by staylor-ds 2
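
The mechanism behind this error can be reproduced without PyTorch. In the sketch below, `BaseOptimizer` is a hypothetical stand-in for `torch.optim.Optimizer`, and both wrapper classes are illustrative rather than the repository's code. The first wrapper skips the base-class initializer and therefore never gets a `state` attribute. The second shares the wrapped optimizer's `state` and `param_groups`, which is one possible way to make `state_dict()` reflect the inner optimizer, and is not necessarily the fix adopted in the repository.

```python
class BaseOptimizer:
    """Hypothetical stand-in for torch.optim.Optimizer."""
    def __init__(self, params, defaults):
        self.defaults = defaults
        self.param_groups = [{"params": list(params), **defaults}]
        self.state = {}

    def state_dict(self):
        return {"state": dict(self.state), "n_groups": len(self.param_groups)}


class LookaheadNoSuper(BaseOptimizer):
    """Wrapper that never initializes the base class, so `state` is missing."""
    def __init__(self, base, alpha=0.5, k=6):
        self.base, self.alpha, self.k = base, alpha, k


class LookaheadDelegating(BaseOptimizer):
    """Wrapper that shares the wrapped optimizer's bookkeeping attributes."""
    def __init__(self, base, alpha=0.5, k=6):
        self.base, self.alpha, self.k = base, alpha, k
        self.defaults = base.defaults
        self.param_groups = base.param_groups  # shared, not copied
        self.state = base.state                # shared, not copied


inner = BaseOptimizer(params=[1.0, 2.0], defaults={"lr": 1e-3})
inner.state["step"] = 7  # pretend the inner optimizer has taken some steps

try:
    LookaheadNoSuper(inner).state_dict()
    missing_state = False
except AttributeError:
    missing_state = True  # same failure mode as in the traceback above

saved = LookaheadDelegating(inner).state_dict()
```

Because the delegating wrapper shares the inner optimizer's dictionaries instead of creating empty ones, `saved["state"]` contains the inner optimizer's entries. Calling the base initializer alone would avoid the `AttributeError` but would start from an empty `state`.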
