torch-optimizer -- collection of optimizers for PyTorch

Nikolay Novik

Last update: Jan 3, 2023


Overview

torch-optimizer

torch-optimizer -- collection of optimizers for PyTorch compatible with optim module.

Simple example

import torch_optimizer as optim

# model = ...
optimizer = optim.DiffGrad(model.parameters(), lr=0.001)
optimizer.step()

Installation

Installation process is simple, just:

$ pip install torch_optimizer

Documentation

https://pytorch-optimizer.rtfd.io

Supported Optimizers

A2GradExp	https://arxiv.org/abs/1810.00553
A2GradInc	https://arxiv.org/abs/1810.00553
A2GradUni	https://arxiv.org/abs/1810.00553
AccSGD	https://arxiv.org/abs/1803.05591
AdaBelief	https://arxiv.org/abs/2010.07468
AdaBound	https://arxiv.org/abs/1902.09843
AdaMod	https://arxiv.org/abs/1910.12249
Adafactor	https://arxiv.org/abs/1804.04235
Adahessian	https://arxiv.org/abs/2006.00719
AdamP	https://arxiv.org/abs/2006.08217
AggMo	https://arxiv.org/abs/1804.00325
Apollo	https://arxiv.org/abs/2009.13586
DiffGrad	https://arxiv.org/abs/1909.11015
Lamb	https://arxiv.org/abs/1904.00962
Lookahead	https://arxiv.org/abs/1907.08610
NovoGrad	https://arxiv.org/abs/1905.11286
PID	https://www4.comp.polyu.edu.hk/~cslzhang/paper/CVPR18_PID.pdf
QHAdam	https://arxiv.org/abs/1810.06801
QHM	https://arxiv.org/abs/1810.06801
RAdam	https://arxiv.org/abs/1908.03265
Ranger	https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d
RangerQH	https://arxiv.org/abs/1810.06801
RangerVA	https://arxiv.org/abs/1908.00700v2
SGDP	https://arxiv.org/abs/2006.08217
SGDW	https://arxiv.org/abs/1608.03983
SWATS	https://arxiv.org/abs/1712.07628
Shampoo	https://arxiv.org/abs/1802.09568
Yogi	https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization

Visualizations

Visualizations help us see how different algorithms deal with simple situations such as saddle points, local minima, and valleys, and may provide interesting insights into the inner workings of an algorithm. The Rosenbrock and Rastrigin benchmark functions were selected because:

Rosenbrock (also known as the banana function) is a non-convex function that has one global minimum at (1.0, 1.0). The global minimum is inside a long, narrow, parabolic-shaped flat valley. Finding the valley is trivial. Converging to the global minimum, however, is difficult. Optimization algorithms might pay a lot of attention to one coordinate and have trouble following the valley, which is relatively flat.

Rastrigin is a non-convex function that has one global minimum at (0.0, 0.0). Finding the minimum of this function is a fairly difficult problem due to its large search space and its large number of local minima.
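For reference, both benchmark functions fit in a few lines. This is a minimal sketch using the standard two-dimensional textbook definitions. The function names and the constant a = 10 for Rastrigin are assumptions made here, and the exact form used in examples/viz_optimizers.py may differ.

```python
import math


def rosenbrock(x, y):
    # Standard 2-D Rosenbrock ("banana") function; global minimum 0 at (1, 1).
    return (1.0 - x) ** 2 + 100.0 * (y - x ** 2) ** 2


def rastrigin(x, y, a=10.0):
    # Standard 2-D Rastrigin function; global minimum 0 at (0, 0),
    # surrounded by a regular grid of local minima.
    return (
        2.0 * a
        + (x ** 2 - a * math.cos(2.0 * math.pi * x))
        + (y ** 2 - a * math.cos(2.0 * math.pi * y))
    )


print(rosenbrock(1.0, 1.0))  # 0.0
print(rastrigin(0.0, 0.0))   # 0.0
```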

Each optimizer performs 501 optimization steps. The learning rate is the best one found by a hyperparameter search algorithm; the remaining tuning parameters are left at their defaults. It is very easy to extend the script and tune other optimizer parameters.
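The procedure described here (a fixed budget of steps, with the learning rate chosen by a search) can be sketched in plain Python. This is only an illustration under stated assumptions: plain gradient descent stands in for a real optimizer, a small hand-picked grid stands in for the hyperparameter search algorithm, and the start point (-2.0, 2.0), the divergence threshold, and all helper names are hypothetical.

```python
import math


def rosenbrock(x, y):
    return (1.0 - x) ** 2 + 100.0 * (y - x * x) ** 2


def rosenbrock_grad(x, y):
    # Analytic gradient of the Rosenbrock function.
    dx = -2.0 * (1.0 - x) - 400.0 * x * (y - x * x)
    dy = 200.0 * (y - x * x)
    return dx, dy


def run_gradient_descent(lr, steps=501, start=(-2.0, 2.0)):
    # Plain gradient descent; returns the final loss, or inf if the run diverged.
    x, y = start
    for _ in range(steps):
        dx, dy = rosenbrock_grad(x, y)
        x, y = x - lr * dx, y - lr * dy
        diverged = not (math.isfinite(x) and math.isfinite(y))
        if diverged or abs(x) > 1e6 or abs(y) > 1e6:
            return math.inf
    return rosenbrock(x, y)


candidate_lrs = [1e-5, 1e-4, 1e-3, 1e-2]  # hypothetical search grid
results = {lr: run_gradient_descent(lr) for lr in candidate_lrs}
best_lr = min(results, key=results.get)
print(best_lr, results[best_lr])
```

A learning rate that is too large shows up as an infinite loss and is never selected, which is why the search result depends on the step budget as well as on the optimizer.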

python examples/viz_optimizers.py

Warning

Do not pick an optimizer based on visualizations. Optimization approaches have unique properties and may be tailored for different purposes or may require an explicit learning rate schedule, etc. The best way to find out is to try one on your particular problem and see if it improves scores.

If you do not know which optimizer to use, start with the built-in SGD/Adam. Once the training logic is ready and baseline scores are established, swap the optimizer and see if there is any improvement.

A2GradExp

import torch_optimizer as optim

# model = ...
optimizer = optim.A2GradExp(
    model.parameters(),
    kappa=1000.0,
    beta=10.0,
    lips=10.0,
    rho=0.5,
)
optimizer.step()

Paper: Optimal Adaptive and Accelerated Stochastic Gradient Descent (2018) [https://arxiv.org/abs/1810.00553]

Reference Code: https://github.com/severilov/A2Grad_optimizer

A2GradInc

import torch_optimizer as optim

# model = ...
optimizer = optim.A2GradInc(
    model.parameters(),
    kappa=1000.0,
    beta=10.0,
    lips=10.0,
)
optimizer.step()

Paper: Optimal Adaptive and Accelerated Stochastic Gradient Descent (2018) [https://arxiv.org/abs/1810.00553]

Reference Code: https://github.com/severilov/A2Grad_optimizer

A2GradUni

import torch_optimizer as optim

# model = ...
optimizer = optim.A2GradUni(
    model.parameters(),
    kappa=1000.0,
    beta=10.0,
    lips=10.0,
)
optimizer.step()

Paper: Optimal Adaptive and Accelerated Stochastic Gradient Descent (2018) [https://arxiv.org/abs/1810.00553]

Reference Code: https://github.com/severilov/A2Grad_optimizer

AccSGD

import torch_optimizer as optim

# model = ...
optimizer = optim.AccSGD(
    model.parameters(),
    lr=1e-3,
    kappa=1000.0,
    xi=10.0,
    small_const=0.7,
    weight_decay=0
)
optimizer.step()

Paper: On the insufficiency of existing momentum schemes for Stochastic Optimization (2019) [https://arxiv.org/abs/1803.05591]

Reference Code: https://github.com/rahulkidambi/AccSGD

AdaBelief

import torch_optimizer as optim

# model = ...
optimizer = optim.AdaBelief(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-3,
    weight_decay=0,
    amsgrad=False,
    weight_decouple=False,
    fixed_decay=False,
    rectify=False,
)
optimizer.step()

Paper: AdaBelief Optimizer, adapting stepsizes by the belief in observed gradients (2020) [https://arxiv.org/abs/2010.07468]

Reference Code: https://github.com/juntang-zhuang/Adabelief-Optimizer

AdaBound

import torch_optimizer as optim

# model = ...
optimizer = optim.AdaBound(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    final_lr=0.1,
    gamma=1e-3,
    eps=1e-8,
    weight_decay=0,
    amsbound=False,
)
optimizer.step()

Paper: Adaptive Gradient Methods with Dynamic Bound of Learning Rate (2019) [https://arxiv.org/abs/1902.09843]

Reference Code: https://github.com/Luolc/AdaBound

AdaMod

The AdaMod method restricts the adaptive learning rates with adaptive and momental upper bounds. The dynamic learning rate bounds are based on the exponential moving averages of the adaptive learning rates themselves, which smooth out unexpectedly large learning rates and stabilize the training of deep neural networks.
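The bounding mechanism can be illustrated for a single scalar parameter. The following is a minimal sketch based on the description above and the update rule in the paper, not the library's implementation; the function name adamod_step and the dictionary-based state are assumptions made for illustration.

```python
import math


def adamod_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999), beta3=0.999, eps=1e-8):
    # One AdaMod-style update for a single scalar parameter.
    beta1, beta2 = betas
    state["step"] += 1
    t = state["step"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    # Adam's per-parameter (adaptive) learning rate.
    adaptive_lr = lr / (math.sqrt(v_hat) + eps)
    # Exponential moving average of the adaptive learning rates.
    state["s"] = beta3 * state["s"] + (1 - beta3) * adaptive_lr
    # The moving average acts as an upper bound on the rate actually applied.
    bounded_lr = min(adaptive_lr, state["s"])
    return param - bounded_lr * m_hat, adaptive_lr, bounded_lr


state = {"step": 0, "m": 0.0, "v": 0.0, "s": 0.0}
param = 1.0
for grad in [0.5, 0.4, 0.1, 0.3]:
    param, adaptive_lr, bounded_lr = adamod_step(param, grad, state)
    print(f"adaptive={adaptive_lr:.6f} bounded={bounded_lr:.6f}")
```

Because the moving average starts at zero in this sketch, the applied rate is much smaller than Adam's rate during the first steps and grows as the average warms up.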

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_AdaMod.png

import torch_optimizer as optim

# model = ...
optimizer = optim.AdaMod(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    beta3=0.999,
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: An Adaptive and Momental Bound Method for Stochastic Learning. (2019) [https://arxiv.org/abs/1910.12249]

Reference Code: https://github.com/lancopku/AdaMod

Adafactor

import torch_optimizer as optim

# model = ...
optimizer = optim.Adafactor(
    model.parameters(),
    lr=1e-3,
    eps2=(1e-30, 1e-3),
    clip_threshold=1.0,
    decay_rate=-0.8,
    beta1=None,
    weight_decay=0.0,
    scale_parameter=True,
    relative_step=True,
    warmup_init=False,
)
optimizer.step()

Paper: Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. (2018) [https://arxiv.org/abs/1804.04235]

Reference Code: https://github.com/pytorch/fairseq/blob/master/fairseq/optim/adafactor.py

Adahessian

import torch_optimizer as optim

# model = ...
optimizer = optim.Adahessian(
    model.parameters(),
    lr=1.0,
    betas=(0.9, 0.999),
    eps=1e-4,
    weight_decay=0.0,
    hessian_power=1.0,
)
# create_graph=True is necessary for the Hessian calculation
loss_fn(model(input), target).backward(create_graph=True)
optimizer.step()

Paper: ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning (2020) [https://arxiv.org/abs/2006.00719]

Reference Code: https://github.com/amirgholami/adahessian

AdamP

AdamP proposes a simple and effective solution: at each iteration of the Adam optimizer applied to scale-invariant weights (e.g., Conv weights preceding a BN layer), AdamP removes the radial component (i.e., the component parallel to the weight vector) from the update vector. Intuitively, this operation prevents the unnecessary update along the radial direction that only increases the weight norm without contributing to the loss minimization.
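The projection itself is a small piece of linear algebra. Below is a minimal sketch on plain Python lists that shows only the removal of the radial component; the function name is hypothetical, and the actual optimizer also decides when the projection should be applied, which is not modeled here.

```python
def remove_radial_component(update, weight):
    # Project `update` onto the plane orthogonal to `weight`:
    # update - (<update, weight> / <weight, weight>) * weight
    dot_uw = sum(u * w for u, w in zip(update, weight))
    dot_ww = sum(w * w for w in weight)
    if dot_ww == 0.0:
        return list(update)  # zero weight vector: nothing to project out
    scale = dot_uw / dot_ww
    return [u - scale * w for u, w in zip(update, weight)]


weight = [3.0, 4.0]
update = [1.0, 2.0]
tangential = remove_radial_component(update, weight)
print(tangential)
# The remaining update is orthogonal to the weight vector (up to rounding).
print(sum(t * w for t, w in zip(tangential, weight)))
```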

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_AdamP.png

import torch_optimizer as optim

# model = ...
optimizer = optim.AdamP(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
    delta=0.1,
    wd_ratio=0.1,
)
optimizer.step()

Paper: Slowing Down the Weight Norm Increase in Momentum-based Optimizers. (2020) [https://arxiv.org/abs/2006.08217]

Reference Code: https://github.com/clovaai/AdamP

AggMo

import torch_optimizer as optim

# model = ...
optimizer = optim.AggMo(
    model.parameters(),
    lr=1e-3,
    betas=(0.0, 0.9, 0.99),
    weight_decay=0,
)
optimizer.step()

Paper: Aggregated Momentum: Stability Through Passive Damping. (2019) [https://arxiv.org/abs/1804.00325]

Reference Code: https://github.com/AtheMathmo/AggMo

Apollo

import torch_optimizer as optim

# model = ...
optimizer = optim.Apollo(
    model.parameters(),
    lr=1e-2,
    beta=0.9,
    eps=1e-4,
    warmup=0,
    init_lr=0.01,
    weight_decay=0,
)
optimizer.step()

Paper: Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization. (2020) [https://arxiv.org/abs/2009.13586]

Reference Code: https://github.com/XuezheMax/apollo

DiffGrad

An optimizer based on the difference between the present and the immediate past gradient. The step size is adjusted for each parameter so that parameters whose gradients change quickly get a larger step size and parameters whose gradients change slowly get a smaller step size.
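In the paper, this adjustment is a per-parameter "friction" coefficient: a sigmoid of the absolute gradient change that multiplies the usual Adam-style step. A minimal scalar sketch follows; the function name is an assumption, and the library implementation works on tensors rather than floats.

```python
import math


def diffgrad_friction(prev_grad, grad):
    # Sigmoid of the absolute gradient change; lies between 0.5 and 1.
    return 1.0 / (1.0 + math.exp(-abs(prev_grad - grad)))


print(diffgrad_friction(0.30, 0.30))  # 0.5: gradient unchanged, so the step is halved
print(diffgrad_friction(0.30, 5.00))  # close to 1: gradient changing fast, nearly full step
```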

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_DiffGrad.png

import torch_optimizer as optim

# model = ...
optimizer = optim.DiffGrad(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: diffGrad: An Optimization Method for Convolutional Neural Networks. (2019) [https://arxiv.org/abs/1909.11015]

Reference Code: https://github.com/shivram1987/diffGrad

Lamb

import torch_optimizer as optim

# model = ...
optimizer = optim.Lamb(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (2019) [https://arxiv.org/abs/1904.00962]

Reference Code: https://github.com/cybertronai/pytorch-lamb

Lookahead

import torch_optimizer as optim

# model = ...
# base optimizer, any other optimizer can be used like Adam or DiffGrad
yogi = optim.Yogi(
    model.parameters(),
    lr=1e-2,
    betas=(0.9, 0.999),
    eps=1e-3,
    initial_accumulator=1e-6,
    weight_decay=0,
)

optimizer = optim.Lookahead(yogi, k=5, alpha=0.5)
optimizer.step()

Paper: Lookahead Optimizer: k steps forward, 1 step back (2019) [https://arxiv.org/abs/1907.08610]

Reference Code: https://github.com/alphadl/lookahead.pytorch
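To show what k and alpha control, here is a minimal scalar sketch of the "k steps forward, 1 step back" rule from the paper title. Plain gradient descent stands in for the wrapped base optimizer, and the function name and the toy objective are assumptions made for illustration.

```python
def lookahead_minimize(grad_fn, start, lr=0.1, k=5, alpha=0.5, outer_steps=20):
    # Scalar Lookahead around plain gradient descent as the inner optimizer.
    slow = start
    for _ in range(outer_steps):
        fast = slow                    # fast weights restart from the slow weights
        for _ in range(k):             # k steps forward with the fast weights
            fast -= lr * grad_fn(fast)
        slow += alpha * (fast - slow)  # 1 step back: interpolate the slow weights
    return slow


# Minimize f(w) = (w - 3)**2, whose gradient is 2 * (w - 3).
print(lookahead_minimize(lambda w: 2.0 * (w - 3.0), start=0.0))
```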

NovoGrad

import torch_optimizer as optim

# model = ...
optimizer = optim.NovoGrad(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
    grad_averaging=False,
    amsgrad=False,
)
optimizer.step()

Paper: Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks (2019) [https://arxiv.org/abs/1905.11286]

Reference Code: https://github.com/NVIDIA/DeepLearningExamples/

PID

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_PID.png

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_PID.png

import torch_optimizer as optim

# model = ...
optimizer = optim.PID(
    model.parameters(),
    lr=1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    integral=5.0,
    derivative=10.0,
)
optimizer.step()

Paper: A PID Controller Approach for Stochastic Optimization of Deep Networks (2018) [http://www4.comp.polyu.edu.hk/~cslzhang/paper/CVPR18_PID.pdf]

Reference Code: https://github.com/tensorboy/PIDOptimizer

QHAdam

import torch_optimizer as optim

# model = ...
optimizer = optim.QHAdam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    nus=(1.0, 1.0),
    weight_decay=0,
    decouple_weight_decay=False,
    eps=1e-8,
)
optimizer.step()

Paper: Quasi-hyperbolic momentum and Adam for deep learning (2019) [https://arxiv.org/abs/1810.06801]

Reference Code: https://github.com/facebookresearch/qhoptim

QHM

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_QHM.png

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_QHM.png

import torch_optimizer as optim

# model = ...
optimizer = optim.QHM(
    model.parameters(),
    lr=1e-3,
    momentum=0,
    nu=0.7,
    weight_decay=1e-2,
    weight_decay_type='grad',
)
optimizer.step()

Paper: Quasi-hyperbolic momentum and Adam for deep learning (2019) [https://arxiv.org/abs/1810.06801]

Reference Code: https://github.com/facebookresearch/qhoptim

RAdam

import torch_optimizer as optim

# model = ...
optimizer = optim.RAdam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: On the Variance of the Adaptive Learning Rate and Beyond (2019) [https://arxiv.org/abs/1908.03265]

Reference Code: https://github.com/LiyuanLucasLiu/RAdam

Ranger

import torch_optimizer as optim

# model = ...
optimizer = optim.Ranger(
    model.parameters(),
    lr=1e-3,
    alpha=0.5,
    k=6,
    N_sma_threshhold=5,
    betas=(.95, 0.999),
    eps=1e-5,
    weight_decay=0
)
optimizer.step()

Paper: New Deep Learning Optimizer, Ranger: Synergistic combination of RAdam + LookAhead for the best of both (2019) [https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d]

Reference Code: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

RangerQH

import torch_optimizer as optim

# model = ...
optimizer = optim.RangerQH(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    nus=(.7, 1.0),
    weight_decay=0.0,
    k=6,
    alpha=.5,
    decouple_weight_decay=False,
    eps=1e-8,
)
optimizer.step()

Paper: Quasi-hyperbolic momentum and Adam for deep learning (2018) [https://arxiv.org/abs/1810.06801]

Reference Code: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

RangerVA

import torch_optimizer as optim

# model = ...
optimizer = optim.RangerVA(
    model.parameters(),
    lr=1e-3,
    alpha=0.5,
    k=6,
    n_sma_threshhold=5,
    betas=(.95, 0.999),
    eps=1e-5,
    weight_decay=0,
    amsgrad=True,
    transformer='softplus',
    smooth=50,
    grad_transformer='square'
)
optimizer.step()

Paper: Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM (2019) [https://arxiv.org/abs/1908.00700v2]

Reference Code: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

SGDP

import torch_optimizer as optim

# model = ...
optimizer = optim.SGDP(
    model.parameters(),
    lr=1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    nesterov=False,
    delta=0.1,
    wd_ratio=0.1,
)
optimizer.step()

Paper: Slowing Down the Weight Norm Increase in Momentum-based Optimizers. (2020) [https://arxiv.org/abs/2006.08217]

Reference Code: https://github.com/clovaai/AdamP

SGDW

import torch_optimizer as optim

# model = ...
optimizer = optim.SGDW(
    model.parameters(),
    lr=1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    nesterov=False,
)
optimizer.step()

Paper: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) [https://arxiv.org/abs/1608.03983]

Reference Code: https://github.com/pytorch/pytorch/pull/22466

SWATS

import torch_optimizer as optim

# model = ...
optimizer = optim.SWATS(
    model.parameters(),
    lr=1e-1,
    betas=(0.9, 0.999),
    eps=1e-3,
    weight_decay=0.0,
    amsgrad=False,
    nesterov=False,
)
optimizer.step()

Paper: Improving Generalization Performance by Switching from Adam to SGD (2017) [https://arxiv.org/abs/1712.07628]

Reference Code: https://github.com/Mrpatekful/swats

Shampoo

import torch_optimizer as optim

# model = ...
optimizer = optim.Shampoo(
    model.parameters(),
    lr=1e-1,
    momentum=0.0,
    weight_decay=0.0,
    epsilon=1e-4,
    update_freq=1,
)
optimizer.step()

Paper: Shampoo: Preconditioned Stochastic Tensor Optimization (2018) [https://arxiv.org/abs/1802.09568]

Reference Code: https://github.com/moskomule/shampoo.pytorch

Yogi

Yogi is an optimization algorithm based on Adam with more fine-grained effective learning rate control, and it has theoretical convergence guarantees similar to Adam's.
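The difference from Adam lies in how the second-moment estimate is updated. The scalar sketch below contrasts the two rules as given in the Yogi paper; the function names are assumptions, and bias correction and the parameter update itself are left out.

```python
def adam_second_moment(v, grad, beta2=0.999):
    # Adam: exponential moving average, so v moves toward grad**2
    # by an amount proportional to the gap (v - grad**2).
    return beta2 * v + (1 - beta2) * grad * grad


def yogi_second_moment(v, grad, beta2=0.999):
    # Yogi: additive update whose size depends only on grad**2;
    # the sign term picks the direction.
    g2 = grad * grad
    sign = (v > g2) - (v < g2)
    return v - (1 - beta2) * sign * g2


v, grad = 1.0, 0.01  # large accumulator, small new gradient
print(v - adam_second_moment(v, grad))  # drop of about 1e-3
print(v - yogi_second_moment(v, grad))  # drop of about 1e-7
```

With a large accumulator and a small gradient, Adam's estimate shrinks quickly (so its effective learning rate grows quickly), while Yogi's estimate changes far more gradually.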

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_Yogi.png

import torch_optimizer as optim

# model = ...
optimizer = optim.Yogi(
    model.parameters(),
    lr=1e-2,
    betas=(0.9, 0.999),
    eps=1e-3,
    initial_accumulator=1e-6,
    weight_decay=0,
)
optimizer.step()

Paper: Adaptive Methods for Nonconvex Optimization (2018) [https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization]

Reference Code: https://github.com/4rtemi5/Yogi-Optimizer_Keras

Adam (PyTorch built-in)

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_Adam.png

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_Adam.png

SGD (PyTorch built-in)

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_SGD.png

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_SGD.png

Comments

Add Ranger optimizer

Hello

paper: https://arxiv.org/abs/1908.00700v2
implementation: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

Also: the clamp value for weight_norm (10) is hardcoded in LAMB. Can you add a new parameter to customize it?

weight_norm = p.data.pow(2).sum().sqrt().clamp(0, 10)

You can use torch.norm(.) to compute the norm.

Thank you.

opened by tkon3 8
Using GPU to train the model

Hello, I really appreciate your work. I wonder how to use a GPU to train the model; I always get errors when I use the CUDA device. Thanks a lot.

device = torch.device('cuda' if use_cuda else 'cpu')

opened by penny9287 7
Add adahessian

Hello,

I'm a big fan of this project. Recently a new optimizer has been proposed that promises SOTA for many tasks.

https://github.com/amirgholami/adahessian

The possibility of using it here is very good!

opened by bratao 6

Lamb optimizer warning in pytorch 1.6

Hi I'm getting this deprecated warning in pytorch 1.6 for Lamb:


2020-06-25T00:58:41 - WARNING - /opt/conda/envs/py36/lib/python3.6/site-packages/torch_optimizer/lamb.py:120: UserWarning: This overload of add_ is deprecated:
    add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
    add_(Tensor other, *, Number alpha) (Triggered internally at /opt/conda/conda-bld/pytorch_1592982553767/work/torch/csrc/utils/python_arg_parser.cpp:766.)
  exp_avg.mul_(beta1).add_(1 - beta1, grad)

opened by Laksh1997 6

Unfair comparison in Visualizations

Hi,

Thanks a lot for this great repo. For the comparison in the Visualizations example, I found that for each config, you run 100 updates. I am concerned that 100 is too small so that it would favor optimizers that have fast convergence in the first few updates.

For other optimizers that the convergence is relatively slow at beginning, it would select large lr. This could lead to unstable convergence for these optimizers.

Moreover, for hyper-parameter search, the objective is the distance between the last step point and the minimum. I think the function value of the last step point may be a better objective.

At last, some optimizers implicitly implement learning rate decay (such as AdaBound and RAdam), but some not. But in your comparison, no explicit learning rate schedule is used.
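The two search objectives discussed above can rank the same end points differently. The sketch below makes that concrete on the Rosenbrock function; the helper names and the two sample points are hypothetical and chosen only to show the disagreement.

```python
import math


def rosenbrock(x, y):
    return (1.0 - x) ** 2 + 100.0 * (y - x ** 2) ** 2


def distance_objective(point, minimum=(1.0, 1.0)):
    # Objective described in the issue: distance from the last point to the minimum.
    return math.dist(point, minimum)


def value_objective(point):
    # Alternative suggested in the issue: function value at the last point.
    return rosenbrock(*point)


in_valley = (0.0, 0.0)   # on the valley floor, far from the minimum
off_valley = (1.0, 1.2)  # close to the minimum, but up the valley wall

for point in (in_valley, off_valley):
    print(point, distance_objective(point), value_objective(point))
```

The distance objective prefers the second point, while the function value prefers the first, so the choice of objective can change which learning rate the search selects.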

opened by XuezheMax 5

'Yogi' object has no attribute 'Yogi'

Hi, calling Yogi from pytorch-optimizer hits a bug (runtime error: 'Yogi' object has no attribute 'Yogi'), so at the moment I am calling yogi.py directly.

# import torch_optimizer as optim  # in the second iteration of the for loop, this raises the error
from yogi import Yogi  # import from the yogi.py file (includes the types.py definitions)

for fold, (train_idx, val_idx) in enumerate(...):
    model = Net(...)
    # optim = optim.Yogi(model.parameters(), lr=1e-2, betas=(0.9, 0.999), eps=1e-3, initial_accumulator=1e-6, weight_decay=0)
    optim = Yogi(model.parameters(), lr=1e-2, betas=(0.9, 0.999), eps=1e-3, initial_accumulator=1e-6, weight_decay=0)
    ...

opened by sailfish009 5

Add torch_optimizer.get() method
The get method

How would you feel about a function like this?

It makes it very easy to change between optimizer to test some of them, for example

import argparse
import torch_optimizer as optim

parser = argparse.ArgumentParser()
parser.add_argument('--optimizer', default='AccSGD')
parser.add_argument('--lr', type=float, default=1e-3)

if __name__ == '__main__':
    args = parser.parse_args()
    opt_class = optim.get(args.optimizer)
    optimizer = opt_class(model.parameters(), lr=args.lr, **kwargs)

It can be improved and restricted as well. If it were up to me, I'd make aliases, and probably search globals only for things in __all__ and their aliases, and make the search case-insensitive.
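One way such a lookup could work is sketched below. This is not an existing torch_optimizer function: get_optimizer_class, its aliases argument, and the stand-in namespaces are all hypothetical, and the stand-ins are used so the sketch runs without PyTorch installed.

```python
import types


def get_optimizer_class(name, namespaces, aliases=None):
    # Resolve `name` case-insensitively against the public names
    # (`__all__` if present) of each namespace, in order.
    key = (aliases or {}).get(name.lower(), name).lower()
    for namespace in namespaces:
        public = getattr(namespace, "__all__", None)
        if public is None:
            public = [n for n in dir(namespace) if not n.startswith("_")]
        for candidate in public:
            if candidate.lower() == key:
                return getattr(namespace, candidate)
    raise ValueError(f"Unknown optimizer: {name!r}")


# Stand-in classes and namespaces (hypothetical, for illustration only).
class AccSGD:
    pass


class Adam:
    pass


fake_torch_optimizer = types.SimpleNamespace(AccSGD=AccSGD, __all__=["AccSGD"])
fake_torch_optim = types.SimpleNamespace(Adam=Adam)

namespaces = [fake_torch_optimizer, fake_torch_optim]
print(get_optimizer_class("accsgd", namespaces).__name__)  # AccSGD
print(get_optimizer_class("ADAM", namespaces).__name__)    # Adam
```

With the real libraries, the namespaces would presumably be torch_optimizer followed by torch.optim; that combination has not been tested here.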

Tell me if you'd be interested.

Drop-in replacement for torch.optim

I would also import all optimizers from torch.optim directly, so that Adam could also be imported from here. If both these things are adopted, it would be easier than ever to compare Adam, RAdam, SGD and AccSGD, for example. As simple as

python train.py --optimizer adam
python train.py --optimizer radam
python train.py --optimizer sgd
python train.py --optimizer accsgd

With only one import (torch_optimizer) and no if statements.
opened by mpariente 5
Add Rangers

As discussed in #64, just added the three Rangers as a dependency.

For now, the params can't be tested because the error messages are not the same. Also, I could not make the beale test work for Ranger.

This is just a draft, feel free to push to my fork if you want to change the PR.

Regarding the docstrings and types, I wonder if it wouldn't be easier to subclass the optimizers here so that we can use the object in types.py. Or should I copy them there?

opened by mpariente 5
YOGI Initialization

exp_avg_sq Initialization

"Thus, for YOGI, we propose to initialize the vt based on gradient square evaluated at the initial point averaged over a (reasonably large) mini-batch."

The initial exp_avg_sq should be initialized to the gradient square.

exp_avg Initialization

The YOGI optimizer exp_avg should be initialized to zero instead of initial_accumulator based on m0 above.
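One reading of the proposed initialization, written for a flat list of gradients, is sketched below. The function name is hypothetical, and the sketch assumes the initial gradients have already been averaged over a mini-batch as the quoted passage suggests.

```python
def init_yogi_state(initial_grads):
    # Initialization proposed in the issue: the first moment starts at zero,
    # the second moment starts at the element-wise squared initial gradient.
    exp_avg = [0.0 for _ in initial_grads]
    exp_avg_sq = [g * g for g in initial_grads]
    return exp_avg, exp_avg_sq


print(init_yogi_state([0.5, -2.0]))  # ([0.0, 0.0], [0.25, 4.0])
```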

opened by PetrochukM 5
Add baseline visualizations

I love the illustrations, but I find the absence of any kind of baselines a shame. It'd be nice to see how Adam or SGD do on the example functions and compare them with some of the more fancy optimizers.

Would this be possible?

I can probably run the required experiments myself, if there are no problems.

opened by slckl 5
Add batch invariant optimizer AdaScale?

AdaScale was introduced in this paper: https://openreview.net/forum?id=rygxdA4YPS

The paper showed that the proposed optimizer is able to get the same results across five tasks regardless of the batch size (testing from 32 to 32.8k). Here's the graphic:

It'd be nice to have a batch invariant optimizer included.

opened by PetrochukM 5
Bump sphinx from 4.2.0 to 6.1.2
Bumps sphinx from 4.2.0 to 6.1.2.

Release notes

Sourced from sphinx's releases.

v6.1.2, v6.1.1, v6.1.0, v6.0.1, v6.0.0, v6.0.0b2, v6.0.0b1, v5.3.0, v5.2.3, v5.2.2, v5.2.1, v5.2.0, v5.1.1, v5.1.0, v5.0.2, v5.0.1

Changelog: https://www.sphinx-doc.org/en/master/changes.html

v5.0.0

No release notes provided.

... (truncated)

Changelog

Sourced from sphinx's changelog.

Release 6.1.2 (released Jan 07, 2023)

Bugs fixed

#11101: LaTeX: div.topic_padding key of sphinxsetup documented at 5.1.0 was implemented with name topic_padding

#11099: LaTeX: shadowrule key of sphinxsetup causes PDF build to crash since Sphinx 5.1.0

#11096: LaTeX: shadowsize key of sphinxsetup causes PDF build to crash since Sphinx 5.1.0

#11095: LaTeX: shadow of :dudir:topic and contents_ boxes not in page margin since Sphinx 5.1.0

.. _contents: https://docutils.sourceforge.io/docs/ref/rst/directives.html#table-of-contents

#11100: Fix copying images when running under parallel mode.

Release 6.1.1 (released Jan 05, 2023)

Bugs fixed

#11091: Fix util.nodes.apply_source_workaround for literal_block nodes with no source information in the node or the node's parents.

Release 6.1.0 (released Jan 05, 2023)

Dependencies

Adopted the Ruff_ code linter.

.. _Ruff: https://github.com/charliermarsh/ruff

Incompatible changes

#10979: gettext: Removed support for pluralisation in get_translation. This was unused and complicated other changes to sphinx.locale.

Deprecated

sphinx.util functions:

Renamed sphinx.util.typing.stringify() to sphinx.util.typing.stringify_annotation()

... (truncated)

Commits

393b408 Bump to 6.1.2 final

d8a5dd8 Add note to CHANGES for PR 11100

a1cd19e Fix copying images under parallel execution (#11100)

5008291 Ignore more checks in Ruff 0.0.213

6259c2b Markup typo in docs

7945aeb LaTeX: fix 5.1.0 bugs related to topic and contents boxes (#11102)

77aaa86 Bump to 6.1.1 final

476c115 Suppress ValueError in apply_source_workaround (#11092)

c80d656 Bump version

4e1004a Bump to 6.1.0 final

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 0
Bump numpy from 1.21.3 to 1.24.1
Bumps numpy from 1.21.3 to 1.24.1.

Release notes

Sourced from numpy's releases.

v1.24.1

NumPy 1.24.1 Release Notes

NumPy 1.24.1 is a maintenance release that fixes bugs and regressions discovered after the 1.24.0 release. The Python versions supported by this release are 3.8-3.11.

Contributors

A total of 12 people contributed to this release. People with a "+" by their names contributed a patch for the first time.

Andrew Nelson

Ben Greiner +

Charles Harris

Clément Robert

Matteo Raso

Matti Picus

Melissa Weber Mendonça

Miles Cranmer

Ralf Gommers

Rohit Goswami

Sayed Adel

Sebastian Berg

Pull requests merged

A total of 18 pull requests were merged for this release.

#22820: BLD: add workaround in setup.py for newer setuptools

#22830: BLD: CIRRUS_TAG redux

#22831: DOC: fix a couple typos in 1.23 notes

#22832: BUG: Fix refcounting errors found using pytest-leaks

#22834: BUG, SIMD: Fix invalid value encountered in several ufuncs

#22837: TST: ignore more np.distutils.log imports

#22839: BUG: Do not use getdata() in np.ma.masked_invalid

#22847: BUG: Ensure correct behavior for rows ending in delimiter in...

#22848: BUG, SIMD: Fix the bitmask of the boolean comparison

#22857: BLD: Help raspian arm + clang 13 about __builtin_mul_overflow

#22858: API: Ensure a full mask is returned for masked_invalid

#22866: BUG: Polynomials now copy properly (#22669)

#22867: BUG, SIMD: Fix memory overlap in ufunc comparison loops

#22868: BUG: Fortify string casts against floating point warnings

#22875: TST: Ignore nan-warnings in randomized out tests

#22883: MAINT: restore npymath implementations needed for freebsd

#22884: BUG: Fix integer overflow in in1d for mixed integer dtypes #22877

#22887: BUG: Use whole file for encoding checks with charset_normalizer.

Checksums

... (truncated)

Commits

a28f4f2 Merge pull request #22888 from charris/prepare-1.24.1-release

f8fea39 REL: Prepare for the NumPY 1.24.1 release.

6f491e0 Merge pull request #22887 from charris/backport-22872

48f5fe4 BUG: Use whole file for encoding checks with charset_normalizer [f2py] (#22...

0f3484a Merge pull request #22883 from charris/backport-22882

002c60d Merge pull request #22884 from charris/backport-22878

38ef9ce BUG: Fix integer overflow in in1d for mixed integer dtypes #22877 (#22878)

bb00c68 MAINT: restore npymath implementations needed for freebsd

64e09c3 Merge pull request #22875 from charris/backport-22869

dc7bac6 TST: Ignore nan-warnings in randomized out tests

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 0
Bump wheel from 0.37.0 to 0.38.4
Bumps wheel from 0.37.0 to 0.38.4.

Changelog

Sourced from wheel's changelog.

Release Notes

UNRELEASED

Updated vendored packaging to 22.0

0.38.4 (2022-11-09)

Fixed PKG-INFO conversion in bdist_wheel mangling UTF-8 header values in METADATA (PR by Anderson Bravalheri)

0.38.3 (2022-11-08)

Fixed install failure when used with --no-binary, reported on Ubuntu 20.04, by removing setup_requires from setup.cfg

0.38.2 (2022-11-05)

Fixed regression introduced in v0.38.1 which broke parsing of wheel file names with multiple platform tags
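
As an illustration of what "multiple platform tags" means, a wheel file name may carry a compressed tag set in which several platform tags are joined with dots. The sketch below is not wheel's own parser (wheel matches file names with a regular expression); `parse_wheel_filename` is a hypothetical helper that assumes the standard `name-version[-build]-python-abi-platform.whl` layout.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its components (illustrative only)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        # Each tag field may be a dot-separated (compressed) set of tags.
        "python_tags": python_tag.split("."),
        "abi_tags": abi_tag.split("."),
        "platform_tags": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "example-1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info["platform_tags"])
```

The file name used here is made up for the example; only the dot-separated platform field matters.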

0.38.1 (2022-11-04)

Removed install dependency on setuptools

The future-proof fix in 0.36.0 for converting PyPy's SOABI into a abi tag was faulty. Fixed so that future changes in the SOABI will not change the tag.

0.38.0 (2022-10-21)

Dropped support for Python < 3.7

Updated vendored packaging to 21.3

Replaced all uses of distutils with setuptools

The handling of license_files (including glob patterns and default values) is now delegated to setuptools>=57.0.0 (#466). The package dependencies were updated to reflect this change.

Fixed potential DoS attack via the WHEEL_INFO_RE regular expression

Fixed ValueError: ZIP does not support timestamps before 1980 when using SOURCE_DATE_EPOCH=0 or when on-disk timestamps are earlier than 1980-01-01. Such timestamps are now changed to the minimum value before packaging.
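
The timestamp fix above can be pictured with a small sketch. This is not wheel's implementation; `zip_date_time` and `MINIMUM_ZIP_TIMESTAMP` are hypothetical names, and the only assumption is the behaviour of the standard library's `zipfile.ZipInfo`, which rejects dates before 1980.

```python
import time
import zipfile

# 1980-01-01 00:00:00 UTC, the earliest date a ZIP entry can record.
MINIMUM_ZIP_TIMESTAMP = 315532800


def zip_date_time(timestamp):
    """Clamp a Unix timestamp to the ZIP minimum and return a date_time tuple."""
    clamped = max(int(timestamp), MINIMUM_ZIP_TIMESTAMP)
    return tuple(time.gmtime(clamped)[:6])


# A timestamp of 0 (for example SOURCE_DATE_EPOCH=0) would otherwise map to 1970.
info = zipfile.ZipInfo("example.txt", date_time=zip_date_time(0))
print(info.date_time)
```

Without the clamp, passing the 1970 date directly to `zipfile.ZipInfo` raises the `ValueError` quoted in the changelog entry.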

0.37.1 (2021-12-22)

Fixed wheel pack duplicating the WHEEL contents when the build number has changed (#415)

Fixed parsing of file names containing commas in RECORD (PR by Hood Chatham)

0.37.0 (2021-08-09)

Added official Python 3.10 support

Updated vendored packaging library to v20.9

... (truncated)

Commits

814c2ef Created a new release

10a422d Added news item about PR #489

3f8bdf1 Allow METADATA file to contain UTF-8 chars (#489)

daeb157 Created a new release

7a633c9 Removed setup_requires

4419390 Fixed parsing of wheel file names with multiple platform tags

6f1608d Created a new release

cf8f5ef Moved news item from PR #484 to its proper place

9ec2016 Removed install dependency on setuptools (#483)

747e1f6 Fixed PyPy SOABI parsing (#484)

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 0
Bump isort from 5.9.3 to 5.11.4
Bumps isort from 5.9.3 to 5.11.4.

Release notes

Sourced from isort's releases.

5.11.4

Changes

Remove obsolete toml import from the test suite (#1978) @mgorny

Stop installing documentation files to top-level site-packages (#2057) @mgorny

Only run release workflows for upstream (#2052) @hugovk

:package: Dependencies

Bump Poetry 1.3.1 (#2058) @staticdev

5.11.3

Changes

Renable portray (#2043) @timothycrosley

chore(ci): add minimum GitHub token permissions for workflows (#1969) @varunsh-coder

:beetle: Fixes

Fix packaging pypoetry (#2042) @staticdev

Fix settings for py3.11 (#2040) @staticdev

:construction_worker: Continuous Integration

General CI improvements (#2041) @staticdev

Add release workflow (#2026) @staticdev

v5.11.3

Changes

Renable portray (#2043) @timothycrosley

chore(ci): add minimum GitHub token permissions for workflows (#1969) @varunsh-coder

:beetle: Fixes

Fix packaging pypoetry (#2042) @staticdev

Fix settings for py3.11 (#2040) @staticdev

:construction_worker: Continuous Integration

General CI improvements (#2041) @staticdev

Add release workflow (#2026) @staticdev

5.11.2

Changes

Hotfix for --version. (#2035) @felixxm

5.11.1

Changes December 12 2022

... (truncated)

Changelog

Sourced from isort's changelog.

5.11.4 December 21 2022

Fixed #2038 (again): stop installing documentation files to top-level site-packages (#2057) @mgorny

CI: only run release workflows for upstream (#2052) @hugovk

Tests: remove obsolete toml import from the test suite (#1978) @mgorny

CI: bump Poetry 1.3.1 (#2058) @staticdev

5.11.3 December 16 2022

Fixed #2007: settings for py3.11 (#2040) @staticdev

Fixed #2038: packaging pypoetry (#2042) @staticdev

Docs: renable portray (#2043) @timothycrosley

Ci: add minimum GitHub token permissions for workflows (#1969) @varunsh-coder

Ci: general CI improvements (#2041) @staticdev

Ci: add release workflow (#2026) @staticdev

5.11.2 December 12 2022

Hotfix #2034: isort --version is not accurate on 5.11.x releases (#2034) @gschaffner

5.11.1 December 12 2022

Hotfix #2031: only call colorama.init if colorama is available (#2032) @tomaarsen

5.11.0 December 12 2022

Added official support for Python 3.11 (#1996, #2008, #2011) @staticdev

Dropped support for Python 3.6 (#2019) @barrelful

Fixed problematic tests (#2021, #2022) @staticdev

Fixed #1960: Rich compatibility (#1961) @ofek

Fixed #1945, #1986: Python 4.0 upper bound dependency resolving issues @staticdev

Fixed Pyodide CDN URL (#1991) @andersk

Docs: clarify description of use_parentheses (#1941) @mgedmin

Fixed #1976: black compatibility for .pyi files @XuehaiPan

Implemented #1683: magic trailing comma option (#1876) @legau

Add missing space in unrecoverable exception message (#1933) @andersk

Fixed #1895: skip-gitignore: use allow list, not deny list @bmalehorn

Fixed #1917: infinite loop for unmatched parenthesis (#1919) @anirudnits

Docs: shared profiles (#1896) @matthewhughes934

Fixed build-backend values in the example plugins (#1892) @mgorny

Remove reference to jamescurtin/isort-action (#1885) @AndrewLane

Split long cython import lines (#1931) @davidcollins001

Update plone profile: copy of black, plus three settings. (#1926) @mauritsvanrees

Fixed #1815, #1862: Add a command-line flag to sort all re-exports (#1863) @parafoxia

Fixed #1854: lines_before_imports appending lines after comments (#1861) @legau

Remove redundant multi_line_output = 3 from "Compatibility with black" (#1858) @jdufresne

Add tox config example (#1856) @umonaca

Docs: add examples for frozenset and tuple settings (#1822) @sgaist

Docs: add multiple config documentation (#1850) @anirudnits

... (truncated)

Commits

98390f5 Merge pull request #2059 from PyCQA/version/5.11.4

df69a05 Bump version 5.11.4

f9add58 Merge pull request #2058 from PyCQA/deps/poetry-1.3.1

36caa91 Bump Poetry 1.3.1

3c2e2d0 Merge pull request #1978 from mgorny/toml-test

45d6abd Remove obsolete toml import from the test suite

3020e0b Merge pull request #2057 from mgorny/poetry-install

a6fdbfd Stop installing documentation files to top-level site-packages

ff306f8 Fix tag template to match old standard

227c4ae Merge pull request #2052 from hugovk/main

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 0
Bump black from 21.9b0 to 23.1a1
Bumps black from 21.9b0 to 23.1a1.

Release notes

Sourced from black's releases.

23.1a1

This release provides a preview of Black's 2023 stable style. Black's default formatting style includes the following changes:

Enforce empty lines before classes and functions with sticky leading comments (#3302) (22.12.0)

Reformat empty and whitespace-only files as either an empty file (if no newline is present) or as a single newline character (if a newline is present) (#3348) (22.12.0)

Implicitly concatenated strings used as function args are now wrapped inside parentheses (#3307) (22.12.0)

Correctly handle trailing commas that are inside a line's leading non-nested parens (#3370) (22.12.0)

--skip-string-normalization / -S now prevents docstring prefixes from being normalized as expected (#3168) (since 22.8.0)

When using --skip-magic-trailing-comma or -C, trailing commas are stripped from subscript expressions with more than 1 element (#3209) (22.8.0)

Implicitly concatenated strings inside a list, set, or tuple are now wrapped inside parentheses (#3162) (22.8.0)

Fix a string merging/split issue when a comment is present in the middle of implicitly concatenated strings on its own line (#3227) (22.8.0)

Docstring quotes are no longer moved if it would violate the line length limit (#3044, #3430) (22.6.0)

Parentheses around return annotations are now managed (#2990) (22.6.0)

Remove unnecessary parentheses around awaited objects (#2991) (22.6.0)

Remove unnecessary parentheses in with statements (#2926) (22.6.0)

Remove trailing newlines after code block open (#3035) (22.6.0)

Code cell separators #%% are now standardised to # %% (#2919) (22.3.0)

Remove unnecessary parentheses from except statements (#2939) (22.3.0)

Remove unnecessary parentheses from tuple unpacking in for loops (#2945) (22.3.0)

Avoid magic-trailing-comma in single-element subscripts (#2942) (22.3.0)

Please try it out and give feedback here: psf/black#3407

A stable 23.1.0 release will follow in January 2023.

22.12.0

Preview style

Enforce empty lines before classes and functions with sticky leading comments (#3302)

Reformat empty and whitespace-only files as either an empty file (if no newline is present) or as a single newline character (if a newline is present) (#3348)

Implicitly concatenated strings used as function args are now wrapped inside parentheses (#3307)

Correctly handle trailing commas that are inside a line's leading non-nested parens (#3370)

Configuration

... (truncated)

Changelog

Sourced from black's changelog.

Change Log

Unreleased

Highlights

Stable style

Fix a crash when a colon line is marked between # fmt: off and # fmt: on (#3439)

Preview style

Fix a crash in preview style with assert + parenthesized string (#3415)

Fix crashes in preview style with walrus operators used in function return annotations and except clauses (#3423)

Do not put the closing quotes in a docstring on a separate line, even if the line is too long (#3430)

Long values in dict literals are now wrapped in parentheses; correspondingly unnecessary parentheses around short values in dict literals are now removed; long string lambda values are now wrapped in parentheses (#3440)

Configuration

Packaging

Upgrade mypyc from 0.971 to 0.991 so mypycified Black can be built on armv7 (#3380)

Drop specific support for the tomli requirement on 3.11 alpha releases, working around a bug that would cause the requirement not to be installed on any non-final Python releases (#3448)

Parser

Performance

Output

... (truncated)

Commits

See full diff in compare view


dependencies
opened by dependabot[bot] 0

Bump torch from 1.10.0 to 1.13.1

Bumps torch from 1.10.0 to 1.13.1.

Release notes

Sourced from torch's releases.

PyTorch 1.13.1 Release, small bug fix release

This release is meant to fix the following issues (regressions / silent correctness):

RuntimeError by torch.nn.modules.activation.MultiheadAttention with bias=False and batch_first=True #88669

Installation via pip on Amazon Linux 2, regression #88869

Installation using poetry on Mac M1, failure #88049

Missing masked tensor documentation #89734

torch.jit.annotations.parse_type_line is not safe (command injection) #88868

Use the Python frame safely in _pythonCallstack #88993

Double-backward with full_backward_hook causes RuntimeError #88312

Fix logical error in get_default_qat_qconfig #88876

Fix cuda/cpu check on NoneType and unit test #88854 and #88970

Onnx ATen Fallback for BUILD_CAFFE2=0 for ONNX-only ops #88504

Onnx operator_export_type on the new registry #87735

torchrun AttributeError caused by file_based_local_timer on Windows #85427

The release tracker should contain all relevant pull requests related to this release as well as links to related issues

PyTorch 1.13: beta versions of functorch and improved support for Apple’s new M1 chips are now available

Pytorch 1.13 Release Notes

Highlights

Backwards Incompatible Changes

New Features

Improvements

Performance

Documentation

Developers

Highlights

We are excited to announce the release of PyTorch 1.13! This includes stable versions of BetterTransformer. We deprecated CUDA 10.2 and 11.3 and completed migration of CUDA 11.6 and 11.7. Beta includes improved support for Apple M1 chips and functorch, a library that offers composable vmap (vectorization) and autodiff transforms, being included in-tree with the PyTorch release. This release is composed of over 3,749 commits and 467 contributors since 1.12.1. We want to sincerely thank our dedicated community for your contributions.

Summary:

The BetterTransformer feature set supports fastpath execution for common Transformer models during Inference out-of-the-box, without the need to modify the model. Additional improvements include accelerated add+matmul linear algebra kernels for sizes commonly used in Transformer models and Nested Tensors is now enabled by default.

Timely deprecating older CUDA versions allows us to proceed with introducing the latest CUDA version as they are introduced by Nvidia®, and hence allows support for C++17 in PyTorch and new NVIDIA Open GPU Kernel Modules.

Previously, functorch was released out-of-tree in a separate package. After installing PyTorch, a user will be able to import functorch and use functorch without needing to install another package.

PyTorch is offering native builds for Apple® silicon machines that use Apple's new M1 chip as a beta feature, providing improved support across PyTorch's APIs.

Stable	Beta	Prototype
Better Transformer; CUDA 10.2 and 11.3 CI/CD Deprecation	Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs; Extend NNC to support channels last and bf16; Functorch now in PyTorch Core Library; Beta Support for M1 devices	Arm® Compute Library backend support for AWS Graviton; CUDA Sanitizer

You can check the blogpost that shows the new features here.

Backwards Incompatible changes


... (truncated)

Changelog

Sourced from torch's changelog.

Releasing PyTorch

General Overview

Cutting a release branch preparations

Cutting release branches

pytorch/pytorch

pytorch/builder / PyTorch domain libraries

Making release branch specific changes for PyTorch

Making release branch specific changes for domain libraries

Drafting RCs (Release Candidates) for PyTorch and domain libraries

Release Candidate Storage

Release Candidate health validation

Cherry Picking Fixes

Promoting RCs to Stable

Additional Steps to prepare for release day

Modify release matrix

Open Google Colab issue

Patch Releases

Patch Release Criteria

Patch Release Process

Triage

Issue Tracker for Patch releases

Building a release schedule / cherry picking

Building Binaries / Promotion to Stable

Hardware / Software Support in Binary Build Matrix

Python

TL;DR

Accelerator Software

Special support cases

Special Topics

Updating submodules for a release

General Overview

Releasing a new version of PyTorch generally entails 3 major steps:

Cutting a release branch preparations

Cutting a release branch and making release branch specific changes

Drafting RCs (Release Candidates), and merging cherry picks

Promoting RCs to stable and performing release day tasks

Cutting a release branch preparations

The following requirements need to be met prior to the final RC cut:

Resolve all outstanding issues in the milestones (for example 1.11.0) before the first RC cut is completed. After the RC cut is completed, the following script should be executed from the builder repo in order to validate the presence of the fixes in the release branch:

... (truncated)

Commits

49444c3 [BE] Do not package caffe2 in wheel (#87986) (#90433)
56de8a3 Add manual cuda deps search logic (#90411) (#90426)
a4d16e0 Fix ATen Fallback for BUILD_CAFFE2=0 for ONNX-only ops (#88504) (#90104)
80abad3 Handle Tensor.deepcopy via clone(), on IPU (#89129) (#89999)
73a852a [Release only change] Fix rocm5.1.1 docker image (#90321)
029ec16 Add platform markers for linux only extra_install_requires (#88826) (#89924)
197c5c0 Fix cuda/cpu check on NoneType (#88854) (#90068)
aadbeb7 Make TorchElastic timer importable on Windows (#88522) (#90045)
aa94433 Mark IPU device as not supports_as_strided (#89130) (#89998)
59b4f3b Use the Python frame safely in _pythonCallstack (#89997)
Additional commits viewable in compare view


dependencies

opened by dependabot[bot] 0

Releases(v0.3.0)

v0.3.0(Oct 31, 2021)
Changes

Revert the removal of the RAdam optimizer.

Source code(tar.gz)
Source code(zip)
v0.2.0(Oct 26, 2021)
Changes

Drop RAdam optimizer since it is included in pytorch.

Do not include tests as installable package.

Preserve memory layout where possible.

Add MADGRAD optimizer.

v0.1.0(Jan 1, 2021)
Changes

Initial release.

Added support for A2GradExp, A2GradInc, A2GradUni, AccSGD, AdaBelief, AdaBound, AdaMod, Adafactor, Adahessian, AdamP, AggMo, Apollo, DiffGrad, Lamb, Lookahead, NovoGrad, PID, QHAdam, QHM, RAdam, Ranger, RangerQH, RangerVA, SGDP, SGDW, SWATS, Shampoo, Yogi.

v0.0.1a17(Nov 27, 2020)

Changes

4357dcf Better deploy workflow
9dfd2fc Add deploy workflow (#230)
31f60c9 Move to GitHub actions (#228)
7276b69 More test coverage for params validation. Add weight decay validaiotn for SWATS (#225)
2e013a0 Better test of lr value validation.
ccc920d Bump sphinx from 3.3.0 to 3.3.1
788d1e8 Bump matplotlib from 3.3.2 to 3.3.3
3adf578 Add optimizer selection warning (#222)
5b2b59e Bump numpy from 1.19.3 to 1.19.4
9123897 Bump sphinx from 3.2.1 to 3.3.0
cc8acc8 Add apollo optimizer to the readme (#218)
1dcec1d Add apollo optimizer test
03e6dad Add apollo optimizer implemenation
15cc6da (add-apollo-optimizer) Merge branch 'master' of github.com:jettify/pytorch-optimizer
ce7b51c Bump mypy from 0.782 to 0.790
3da1672 Bump numpy from 1.19.2 to 1.19.3 (#216)
6a1f550 Bump pytest from 6.1.1 to 6.1.2
3327ebb Bump torchvision from 0.7.0 to 0.8.1
40a7ba5 Bump torch from 1.6.0 to 1.7.0
a61c8ad Correct links to the A2Grad paper (#211)
8ce32cf bump version
v0.0.1a16(Oct 20, 2020)

Changes

efeea8f Add adabelief to the readme. (#210)
9c72aa0 Add adabelief optimizer (#209)
0d94e4e Update CONTRIBUTING.rst (#207)
3a4abcd Bump sphinx-autodoc-typehints from 1.11.0 to 1.11.1
f08f793 Update readme with a2grad optimizers (#204)
9003a68 Add A2GradInc and A2GradExp optimizers. (#203)
d221899 Bump hyperopt from 0.2.4 to 0.2.5
35f14f6 Bump ipdb from 0.13.3 to 0.13.4
1005b6b Bump flake8 from 3.8.3 to 3.8.4
9ad5102 Bump pytest from 6.1.0 to 6.1.1
bd71c05 Add a2grad optimizer (#199)
ba60ddb Bump pytest from 6.0.2 to 6.1.0 (#197)
1e142d4 Bump matplotlib from 3.3.1 to 3.3.2
02f0d90 Bump pytest from 6.0.1 to 6.0.2
1ba47a8 Bump numpy from 1.19.1 to 1.19.2
be69ae7 Add adafactor test case. (#192)
db002c6 Merge pull request #191 from matech96/patch-1
7edf138 yogi doc fix
a2905c5 Merge pull request #190 from jettify/update-dock-with-new-optimizers
d31dbc9 (origin/update-dock-with-new-optimizers, update-dock-with-new-optimizers) Update docs with new optimizers.
2308555 Merge pull request #189 from jettify/add-adafactor-optimizer
ae0118b (origin/add-adafactor-optimizer, add-adafactor-optimizer) Add adafactor to the readme.
f7237ee Add adafactor tests
dcaa63f Add adafactor implementation
b6fdd4e Merge pull request #186 from jettify/fix-warning-in-sgdp-and-adamp
e1c2d2d (origin/fix-warning-in-sgdp-and-adamp, fix-warning-in-sgdp-and-adamp) Fix warnign in adamp
7eaa2f3 Fix warnings in sgdp
75d625a Fix warnings in adabound (#184)
30ef850 Bump black from 19.10b0 to 20.8b1
a149682 Fix warnings in qhadam (#182)
8787b6b Add swats to the readme (#181)
c7bb0fd Add SWATS optimizer (#178)
5835261 Bump pytest-cov from 2.10.0 to 2.10.1
519473d Bump sphinx from 3.2.0 to 3.2.1
1b705d0 Fix missed warnings in case weight_decay specified (#177)
17c754e Fix warnings in shampoo (#176)
13db710 Fix warnings in accsgd (#175)
51ffdc6 Bump matplotlib from 3.3.0 to 3.3.1
913543a Fix warnings in yogi optimizer. (#173)
ce54a95 Fix warnings in adamod (#172)
a144f4b bump version
v0.0.1a15(Aug 11, 2020)

Changes

279d420 Bump sphinx from 3.1.2 to 3.2.0
e345295 Fix warnings in sgdw (#168)
077c72d Fix warnings in qhm (#167)
b6d87b2 Fix warning in radam (#166)
03ee2a4 Fix warning in novograd optimizer. (#165)
19026f5 Resolve warnings in pid optimizer (#164)
92ac42e Fix warnings in lamb optimizer (#163)
0df1686 Fix pytorch 1.6.0 compatibility (#162)
6a74cee Reformat code and sort imports (#160)
33a8bbe Add aggmo to the readme (#159)
b4cc233 Bump pytest from 6.0.0 to 6.0.1
a48e955 More robust setup.py and bump numpy version (#157)
a2749e2 Bump torchvision from 0.6.1 to 0.7.0
b20d9b3 Bump pytest from 5.4.3 to 6.0.0
a545011 Add aggmo optimizer (#153)
1552465 Bump matplotlib from 3.2.2 to 3.3.0
72d4e35 Bump dev version
v0.0.1a14(Jul 13, 2020)

e90a185 Bump version
e21b422 Bump sphinx from 3.1.1 to 3.1.2
1b70e4e Fix numpy version (#148)
9a9d233 Add SGDP optimizer (#145)
8452433 Bump numpy from 1.18.5 to 1.19.0
40c0723 Bump ipython from 7.15.0 to 7.16.1
a891371 Bump ipdb from 0.13.2 to 0.13.3
8f5c382 Add AdamP optimizer (#133)
cae9bde Bump mypy from 0.781 to 0.782
17da984 Bump sphinx-autodoc-typehints from 1.10.3 to 1.11.0
387581d Bump mypy from 0.780 to 0.781
a291360 Bump torchvision from 0.6.0 to 0.6.1
fd5badc Bump torch from 1.5.0 to 1.5.1
144e72e Bump matplotlib from 3.2.1 to 3.2.2
v0.0.1a13(Jun 17, 2020)

Changes

3c0dd77 Add documentation reference to the README.
4e5548e Bump pytest-cov from 2.9.0 to 2.10.0
861aa98 Bump sphinx from 3.1.0 to 3.1.1
219ae8a Bump flake8 from 3.8.2 to 3.8.3
81107ab Bump sphinx from 3.0.4 to 3.1.0
e3b059c Add lookagead optimizer to the README. (#128)
8ab662a Prepare lookahead optimizer for release.
29bd821 Bump ipython from 7.14.0 to 7.15.0
5f7e4e8 Bump mypy from 0.770 to 0.780
982a052 Bump numpy from 1.18.4 to 1.18.5
1e41844 Bump pytest from 5.4.2 to 5.4.3
96f7257 Add doc string and reformat code.
da9e0f3 Fix linter.
8fda937 Add project URLs
ba33395 Expose shampoo properly
6d194c1 Bump sphinx from 3.0.3 to 3.0.4
b3adae7 Refresh docs (#122)
bfa8358 Bump flake8 from 3.8.1 to 3.8.2
2fd60a8 Bump pytest-cov from 2.8.1 to 2.9.0
542dd46 Add read the docs badge
c77f808 Make tests less flaky (#119)
18742bf Bump flake8-quotes from 3.0.0 to 3.2.0
267b8c5 Add python3.8 to setup.py
d14a85c Bump flake8 from 3.7.9 to 3.8.1
2d9da19 Update diffgrad.py (#118)
ed84071 Bump pytest from 5.4.1 to 5.4.2
690ad22 Add Shampoo optimizer to the readme. (#114)
590b288 Bump ipython from 7.13.0 to 7.14.0
096b021 Bump numpy from 1.18.3 to 1.18.4
7875299 Add shampoo optimizer. (#110)
fff339d Bump sphinx from 3.0.2 to 3.0.3
86664f5 Try py3.8 (#109)
87d9ccf Bump dev version
v0.0.1a12(Apr 26, 2020)

Changes

63393ae Bump hyperopt from 0.2.3 to 0.2.4
bfc46c4 Bump torchvision from 0.5.0 to 0.6.0
660d831 Bump torch from 1.4.0 to 1.5.0
f21c85a Bump numpy from 1.18.2 to 1.18.3
e8d4866 Bump sphinx from 3.0.1 to 3.0.2
2cfbf20 RAdam fix for issue #96. (#103)
df65965 Bump sphinx from 3.0.0 to 3.0.1
19408f5 Return type hint fixed for torch_optimizer.get (#101)
243509b Bump sphinx from 2.4.4 to 3.0.0
0320faa Raise exception if optimizer not found. Fix few mypy types. (#98)
05b2da5 Bump dev version
v0.0.1a11(Apr 5, 2020)

Changes

e2425ec Rework get optimizer function (#97)
4376be1 Add torch_optimizer.get() method (#95)
fd4f0a9 Add ranger optimizer to the readme (#94)
f0e1f7c Add Rangers (#93)
40b832c Add keywords to setup.py
af6d6e1 Add reference code links in doc strings
1bd385d Bump flake8-quotes from 2.1.1 to 3.0.0
4997fa3 Documentation update and README fixes.
a7e330e Minor doc string cleanup and fix constant for QHM
cb7bdb9 Add QHAdam to the readme (#91)
5155f0b Update docs with new optimizers. (#90)
3ef2b4f Bump matplotlib from 3.2.0 to 3.2.1
7aaaafa Better test coverage for QHAdam optimizer (#88)
7ab6b72 Add QHAdam optimizer (#87)
ad59d3f Bump pytest from 5.4.0 to 5.4.1
a5e0fdc Bump version
v0.0.1a10(Mar 15, 2020)

Changes

71e9bb4 Add Python3.5 support (#85)
071a73e apply black
3447833 Bump pytest from 5.3.5 to 5.4.0
b2e587a Bump mypy from 0.761 to 0.770
64e8e48 Add QHM optimizer to the readme. (#82)
86c2587 Better test coverage for QHM (#81)
a3609b4 Fix readme formatting
4b7d230 added Adam & SGD images to README and viz script (#80)
697fdb1 Add comments about yogi optimizer initialization #77 (#79)
95b0f91 Add deepsourde badge.
fd57585 Minor style changes.
5bcc51f Bump version and minor tweaks in setup.py
7003d6c Less error prone parameter validation. (#78)
abb84d3 Add QHM basic implementation. (#73)
ea1d413 Bump sphinx from 2.4.3 to 2.4.4
6a2e8f8 Tweak linter configuration and address one issue in setup.py (#74)
d6a52ae Bump matplotlib from 3.1.3 to 3.2.0
fa961b7 Add PID optimizer to the list of supported in README. (#70)
c84d1ea Add .deepsource.toml
76484ff Bump ipdb from 0.13.1 to 0.13.2
v0.0.1a9(Mar 4, 2020)

9b80b11 Better code coverage for PID optimizer (#68)
a3f2fdc Change default values for yogi optimizer (#62)
564584e Add grad_clip to the example and re-tune p. for all methods (#67)
3391056 Add clamp_value (float) and debias (bool) parameters to LAMB optimizer (#65)
92d6818 Add PID optimizer (#66)
v0.0.1a8(Mar 2, 2020)

v0.0.1a7(Feb 27, 2020)

v0.0.1a6(Feb 22, 2020)

v0.0.1a5(Feb 15, 2020)

v0.0.1a4(Feb 11, 2020)

v0.0.1a3(Feb 9, 2020)

v0.0.1a2(Feb 3, 2020)

v0.0.1a1(Jan 22, 2020)
