# torch-optimizer -- collection of optimizers for PyTorch

## torch-optimizer

torch-optimizer is a collection of optimizers for PyTorch, compatible with the `torch.optim` module.

### Simple example

```
import torch_optimizer as optim

# model = ...
optimizer = optim.DiffGrad(model.parameters(), lr=0.001)  # any optimizer from the collection works the same way
optimizer.step()
```

### Installation

Installation process is simple, just:

```
$ pip install torch_optimizer
```

### Documentation

https://pytorch-optimizer.rtfd.io

## Supported Optimizers

### Visualizations

Visualizations help us see how different algorithms deal with simple situations like saddle points, local minima, and valleys, and they may provide interesting insights into the inner workings of an algorithm. The Rosenbrock and Rastrigin benchmark functions were selected (both are defined in the sketch after this list) because:

• Rosenbrock (also known as the banana function) is a non-convex function with one global minimum at (1.0, 1.0). The global minimum lies inside a long, narrow, parabolic, flat valley. Finding the valley is trivial; converging to the global minimum, however, is difficult. Optimization algorithms may pay a lot of attention to one coordinate and struggle to follow the valley, which is relatively flat.
• The Rastrigin function is non-convex and has one global minimum at (0.0, 0.0). Finding this minimum is a fairly difficult problem due to the large search space and the large number of local minima.
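
For reference, both benchmark functions are one-liners; a minimal sketch using the standard textbook formulas (not code from this repo):

```
import numpy as np

def rosenbrock(x, y):
    # Banana-shaped valley; global minimum f(1, 1) = 0.
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def rastrigin(x, y, A=10):
    # Highly multimodal; global minimum f(0, 0) = 0.
    return (2 * A + x ** 2 - A * np.cos(2 * np.pi * x)
            + y ** 2 - A * np.cos(2 * np.pi * y))
```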

Each optimizer performs 501 optimization steps. The learning rate is the best one found by a hyperparameter search algorithm; the rest of the tuning parameters are defaults. It is very easy to extend the script and tune other optimizer parameters.

```
python examples/viz_optimizers.py
```

### Warning

Do not pick an optimizer based on visualizations alone: optimization approaches have unique properties and may be tailored to different purposes, or may require an explicit learning rate schedule, etc. The best way to find out is to try one on your particular problem and see if it improves your scores.

If you do not know which optimizer to use, start with the built-in SGD/Adam. Once the training logic is ready and baseline scores are established, swap the optimizer and see if there is any improvement, as in the sketch below.
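
For example, the swap is typically a one-line change (a toy sketch, not code from the repo; `loader` and friends are placeholders):

```
import torch
import torch_optimizer as optim

model = torch.nn.Linear(10, 1)
loss_fn = torch.nn.functional.mse_loss
loader = [(torch.randn(8, 10), torch.randn(8, 1))]  # placeholder data

# Baseline: built-in optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Candidate: the only line that changes when you swap.
# optimizer = optim.RAdam(model.parameters(), lr=1e-3)

for x, y in loader:
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
```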

### A2GradExp

```
import torch_optimizer as optim

# model = ...
optimizer = optim.A2GradExp(
    model.parameters(),
    kappa=1000.0,
    beta=10.0,
    lips=10.0,
    rho=0.5,
)
optimizer.step()
```

### A2GradInc

```
import torch_optimizer as optim

# model = ...
optimizer = optim.A2GradInc(
    model.parameters(),
    kappa=1000.0,
    beta=10.0,
    lips=10.0,
)
optimizer.step()
```

### A2GradUni

```
import torch_optimizer as optim

# model = ...
optimizer = optim.A2GradUni(
    model.parameters(),
    kappa=1000.0,
    beta=10.0,
    lips=10.0,
)
optimizer.step()
```

### AccSGD

```
import torch_optimizer as optim

# model = ...
optimizer = optim.AccSGD(
    model.parameters(),
    lr=1e-3,
    kappa=1000.0,
    xi=10.0,
    small_const=0.7,
    weight_decay=0,
)
optimizer.step()
```

Paper: On the insufficiency of existing momentum schemes for Stochastic Optimization (2019) [https://arxiv.org/abs/1803.05591]

Reference Code: https://github.com/rahulkidambi/AccSGD

### AdaBelief

```
import torch_optimizer as optim

# model = ...
optimizer = optim.AdaBelief(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-3,
    weight_decay=0,
    weight_decouple=False,
    fixed_decay=False,
    rectify=False,
)
optimizer.step()
```

### AdaBound

```
import torch_optimizer as optim

# model = ...
optimizer = optim.AdaBound(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    final_lr=0.1,
    gamma=1e-3,
    eps=1e-8,
    weight_decay=0,
    amsbound=False,
)
optimizer.step()
```

Paper: Adaptive Gradient Methods with Dynamic Bound of Learning Rate (2019) [https://arxiv.org/abs/1902.09843]

### AdaMod

AdaMod restricts the adaptive learning rates with adaptive and momental upper bounds. The dynamic learning rate bounds are based on exponential moving averages of the adaptive learning rates themselves, which smooths out unexpectedly large learning rates and stabilizes the training of deep neural networks.
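
In essence, the bound is an elementwise minimum against an exponential moving average of past step sizes; a sketch of the rule from the paper (not this repo's code):

```
import torch

beta3 = 0.999
eta_t = torch.full((3,), 1e-3)  # Adam's current per-parameter step sizes
s_prev = torch.zeros(3)         # running (momental) average of past step sizes

s_t = beta3 * s_prev + (1 - beta3) * eta_t  # EMA of step sizes
eta_hat = torch.min(eta_t, s_t)             # clip unexpectedly large steps elementwise
```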

```
import torch_optimizer as optim

# model = ...
optimizer = optim.AdaMod(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    beta3=0.999,
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
```

Paper: An Adaptive and Momental Bound Method for Stochastic Learning. (2019) [https://arxiv.org/abs/1910.12249]

### Adafactor

```
import torch_optimizer as optim

# model = ...
optimizer = optim.Adafactor(
    m.parameters(),
    lr=1e-3,
    eps2=(1e-30, 1e-3),
    clip_threshold=1.0,
    decay_rate=-0.8,
    beta1=None,
    weight_decay=0.0,
    scale_parameter=True,
    relative_step=True,
    warmup_init=False,
)
optimizer.step()
```

### Adahessian

```
import torch_optimizer as optim

# model = ...
optimizer = optim.Adahessian(
    m.parameters(),
    lr=1.0,
    betas=(0.9, 0.999),
    eps=1e-4,
    weight_decay=0.0,
    hessian_power=1.0,
)
loss_fn(m(input), target).backward(create_graph=True)  # create_graph=True is necessary for the Hessian calculation
optimizer.step()
```

Paper: ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning (2020) [https://arxiv.org/abs/2006.00719]

### AdamP

AdamP proposes a simple and effective solution: at each iteration of Adam applied to scale-invariant weights (e.g., conv weights preceding a BN layer), AdamP removes the radial component (i.e., the component parallel to the weight vector) from the update vector. Intuitively, this prevents unnecessary updates along the radial direction, which only increase the weight norm without contributing to loss minimization.
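
In code, removing the radial component is a simple projection; a sketch of the idea (not the library's actual implementation):

```
import torch

w = torch.randn(5)       # a scale-invariant weight vector
update = torch.randn(5)  # the raw Adam update for w

w_unit = w / w.norm()
radial = (update @ w_unit) * w_unit  # component of the update parallel to w
update = update - radial             # keep only the tangential component
```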

```
import torch_optimizer as optim

# model = ...
optimizer = optim.AdamP(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
    delta=0.1,
    wd_ratio=0.1,
)
optimizer.step()
```

Paper: Slowing Down the Weight Norm Increase in Momentum-based Optimizers. (2020) [https://arxiv.org/abs/2006.08217]

### AggMo

```
import torch_optimizer as optim

# model = ...
optimizer = optim.AggMo(
    m.parameters(),
    lr=1e-3,
    betas=(0.0, 0.9, 0.99),
    weight_decay=0,
)
optimizer.step()
```

Paper: Aggregated Momentum: Stability Through Passive Damping. (2019) [https://arxiv.org/abs/1804.00325]

Reference Code: https://github.com/AtheMathmo/AggMo

### Apollo

```
import torch_optimizer as optim

# model = ...
optimizer = optim.Apollo(
    m.parameters(),
    lr=1e-2,
    beta=0.9,
    eps=1e-4,
    warmup=0,
    init_lr=0.01,
    weight_decay=0,
)
optimizer.step()
```

Paper: Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization. (2020) [https://arxiv.org/abs/2009.13586]

Reference Code: https://github.com/XuezheMax/apollo

### DiffGrad

DiffGrad is an optimizer based on the difference between the present and the immediately preceding gradient: the step size is adjusted for each parameter so that parameters with rapidly changing gradients take larger steps and parameters with slowly changing gradients take smaller ones.
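
The adjustment is a sigmoid "friction" coefficient built from the gradient change; a sketch of the paper's rule (not this repo's code):

```
import torch

lr, eps = 1e-3, 1e-8
grad, prev_grad = torch.randn(3), torch.randn(3)
exp_avg, exp_avg_sq = torch.zeros(3), torch.ones(3)  # Adam's first/second moments

# xi -> 1 where the gradient changes fast, -> 0.5 where it is nearly flat.
xi = torch.sigmoid((prev_grad - grad).abs())
step = -lr * xi * exp_avg / (exp_avg_sq.sqrt() + eps)  # Adam-style step damped by xi
```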

```
import torch_optimizer as optim

# model = ...
optimizer = optim.DiffGrad(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
```

Paper: diffGrad: An Optimization Method for Convolutional Neural Networks. (2019) [https://arxiv.org/abs/1909.11015]

### Lamb

```
import torch_optimizer as optim

# model = ...
optimizer = optim.Lamb(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
```

Paper: Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (2019) [https://arxiv.org/abs/1904.00962]

Reference Code: https://github.com/cybertronai/pytorch-lamb

### Lookahead

```
import torch_optimizer as optim

# model = ...
# base optimizer, any other optimizer can be used, like Adam or DiffGrad
yogi = optim.Yogi(
    m.parameters(),
    lr=1e-2,
    betas=(0.9, 0.999),
    eps=1e-3,
    initial_accumulator=1e-6,
    weight_decay=0,
)

optimizer = optim.Lookahead(yogi, k=5, alpha=0.5)
optimizer.step()
```

Paper: Lookahead Optimizer: k steps forward, 1 step back (2019) [https://arxiv.org/abs/1907.08610]

### NovoGrad

```
import torch_optimizer as optim

# model = ...
optimizer = optim.NovoGrad(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
```

Paper: Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks (2019) [https://arxiv.org/abs/1905.11286]

Reference Code: https://github.com/NVIDIA/DeepLearningExamples/

### PID

```
import torch_optimizer as optim

# model = ...
optimizer = optim.PID(
    m.parameters(),
    lr=1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    integral=5.0,
    derivative=10.0,
)
optimizer.step()
```

Paper: A PID Controller Approach for Stochastic Optimization of Deep Networks (2018) [http://www4.comp.polyu.edu.hk/~cslzhang/paper/CVPR18_PID.pdf]

Reference Code: https://github.com/tensorboy/PIDOptimizer

### QHAdam

```
import torch_optimizer as optim

# model = ...
optimizer = optim.QHAdam(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    nus=(1.0, 1.0),
    weight_decay=0,
    decouple_weight_decay=False,
    eps=1e-8,
)
optimizer.step()
```

Paper: Quasi-hyperbolic momentum and Adam for deep learning (2019) [https://arxiv.org/abs/1810.06801]

### QHM

```
import torch_optimizer as optim

# model = ...
optimizer = optim.QHM(
    m.parameters(),
    lr=1e-3,
    momentum=0,
    nu=0.7,
    weight_decay=1e-2,
)
optimizer.step()
```

Paper: Quasi-hyperbolic momentum and Adam for deep learning (2019) [https://arxiv.org/abs/1810.06801]

### RAdam

```
import torch_optimizer as optim

# model = ...
optimizer = optim.RAdam(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()
```

Paper: On the Variance of the Adaptive Learning Rate and Beyond (2019) [https://arxiv.org/abs/1908.03265]

### Ranger

```
import torch_optimizer as optim

# model = ...
optimizer = optim.Ranger(
    m.parameters(),
    lr=1e-3,
    alpha=0.5,
    k=6,
    N_sma_threshhold=5,
    betas=(.95, 0.999),
    eps=1e-5,
    weight_decay=0,
)
optimizer.step()
```

Reference Code: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

### RangerQH

```
import torch_optimizer as optim

# model = ...
optimizer = optim.RangerQH(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    nus=(.7, 1.0),
    weight_decay=0.0,
    k=6,
    alpha=.5,
    decouple_weight_decay=False,
    eps=1e-8,
)
optimizer.step()
```

Paper: Quasi-hyperbolic momentum and Adam for deep learning (2018) [https://arxiv.org/abs/1810.06801]

Reference Code: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

### RangerVA

```
import torch_optimizer as optim

# model = ...
optimizer = optim.RangerVA(
    m.parameters(),
    lr=1e-3,
    alpha=0.5,
    k=6,
    n_sma_threshhold=5,
    betas=(.95, 0.999),
    eps=1e-5,
    weight_decay=0,
    transformer='softplus',
    smooth=50,
)
optimizer.step()
```

Paper: Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM (2019) [https://arxiv.org/abs/1908.00700v2]

Reference Code: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

### SGDP

```
import torch_optimizer as optim

# model = ...
optimizer = optim.SGDP(
    m.parameters(),
    lr=1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    nesterov=False,
    delta=0.1,
    wd_ratio=0.1,
)
optimizer.step()
```

Paper: Slowing Down the Weight Norm Increase in Momentum-based Optimizers. (2020) [https://arxiv.org/abs/2006.08217]

### SGDW

```
import torch_optimizer as optim

# model = ...
optimizer = optim.SGDW(
    m.parameters(),
    lr=1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    nesterov=False,
)
optimizer.step()
```

Paper: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) [https://arxiv.org/abs/1608.03983]

Reference Code: https://github.com/pytorch/pytorch/pull/22466

### SWATS

```
import torch_optimizer as optim

# model = ...
optimizer = optim.SWATS(
    model.parameters(),
    lr=1e-1,
    betas=(0.9, 0.999),
    eps=1e-3,
    weight_decay=0.0,
    nesterov=False,
)
optimizer.step()
```

Paper: Improving Generalization Performance by Switching from Adam to SGD (2017) [https://arxiv.org/abs/1712.07628]

Reference Code: https://github.com/Mrpatekful/swats

### Shampoo

```
import torch_optimizer as optim

# model = ...
optimizer = optim.Shampoo(
    m.parameters(),
    lr=1e-1,
    momentum=0.0,
    weight_decay=0.0,
    epsilon=1e-4,
    update_freq=1,
)
optimizer.step()
```

Paper: Shampoo: Preconditioned Stochastic Tensor Optimization (2018) [https://arxiv.org/abs/1802.09568]

Reference Code: https://github.com/moskomule/shampoo.pytorch

### Yogi

Yogi is an optimization algorithm based on Adam, with more fine-grained effective learning rate control; it has theoretical convergence guarantees similar to Adam's.
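
The key difference from Adam is the second-moment update, where a sign term makes the effective learning rate change additively and in a controlled way; a sketch of the rule from the paper:

```
import torch

beta2 = 0.999
grad = torch.randn(3)
v = torch.ones(3)  # second-moment estimate

# Adam:  v = v - (1 - beta2) * (v - grad**2)
# Yogi:  the sign keeps each adjustment bounded by (1 - beta2) * grad**2.
v = v - (1 - beta2) * torch.sign(v - grad ** 2) * grad ** 2
```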

```
import torch_optimizer as optim

# model = ...
optimizer = optim.Yogi(
    m.parameters(),
    lr=1e-2,
    betas=(0.9, 0.999),
    eps=1e-3,
    initial_accumulator=1e-6,
    weight_decay=0,
)
optimizer.step()
```

Reference Code: https://github.com/4rtemi5/Yogi-Optimizer_Keras

### SGD (PyTorch built-in)

SGD (like Adam) ships with PyTorch itself and is available from the `torch.optim` module, so it needs no extra install.

## Issues

Hello,

Paper: https://arxiv.org/abs/1908.00700v2. Implementation: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

Also: the clamp value for the weight norm (=10) is hardcoded in LAMB; can you add a new parameter to customize it? `weight_norm = p.data.pow(2).sum().sqrt().clamp(0, 10)` You can use `torch.norm(.)` to compute the norm.
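
For illustration, the requested change might look like this (a sketch; `clamp_value` is a hypothetical argument name, though the library did later add such a parameter to LAMB):

```
import torch

def clamped_weight_norm(p: torch.Tensor, clamp_value: float = 10.0) -> torch.Tensor:
    # Same as p.pow(2).sum().sqrt().clamp(0, 10), but using torch.norm and a
    # configurable bound instead of the hardcoded 10.
    return torch.norm(p).clamp(0, clamp_value)
```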

Thank you.

opened by tkon3 8
• #### Using GPU to train the model

Hello, I really appreciate your work, but I wonder how to use a GPU to train the model. There are always errors when I use the CUDA device. Thanks a lot. `device = torch.device('cuda' if use_cuda else 'cpu')`
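
A common pattern that avoids most of these errors (a sketch, not from this repo): move the model to the device before constructing the optimizer, and move every batch as well:

```
import torch
import torch_optimizer as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.nn.Linear(10, 2).to(device)   # move the model first...
optimizer = optim.Yogi(model.parameters())  # ...then build the optimizer over the CUDA params

x = torch.randn(8, 10).to(device)           # move each batch to the same device
y = torch.randint(0, 2, (8,)).to(device)
torch.nn.functional.cross_entropy(model(x), y).backward()
optimizer.step()
```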

opened by penny9287 7

Hello,

I'm a big fan of this project. Recently a new optimizer has been proposed that promises SOTA results on many tasks.

Being able to use it here would be great!

opened by bratao 6
• #### Lamb optimizer warning in pytorch 1.6

Hi, I'm getting this deprecation warning in PyTorch 1.6 for Lamb:

```
2020-06-25T00:58:41 - WARNING - /opt/conda/envs/py36/lib/python3.6/site-packages/torch_optimizer/lamb.py:120: UserWarning: This overload of add_ is deprecated:
    add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
    add_(Tensor other, *, Number alpha) (Triggered internally at /opt/conda/conda-bld/pytorch_1592982553767/work/torch/csrc/utils/python_arg_parser.cpp:766.)
```
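
For context, the deprecated overload passes `alpha` positionally first; the usual fix is a one-line change (a sketch, not necessarily the exact line in lamb.py):

```
import torch

beta1 = 0.9
exp_avg, grad = torch.zeros(3), torch.ones(3)

# Deprecated overload (emits the UserWarning):
#   exp_avg.mul_(beta1).add_(1 - beta1, grad)
# Current overload -- same result, warning-free:
exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
```
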
opened by Laksh1997 6
• #### Unfair comparison in Visualizations

Hi,

Thanks a lot for this great repo. Regarding the comparison in the Visualizations example, I found that for each config you run 100 updates. I am concerned that 100 is too few, so it would favor optimizers that converge quickly in the first few updates.

For optimizers whose convergence is relatively slow at the beginning, the search would select a large lr. This could lead to unstable convergence for those optimizers.

Moreover, for the hyper-parameter search, the objective is the distance between the last-step point and the minimum. I think the function value at the last-step point may be a better objective.

Finally, some optimizers implicitly implement learning rate decay (such as AdaBound and RAdam), but some do not; yet in your comparison no explicit learning rate schedule is used.

opened by XuezheMax 5
• #### 'Yogi' object has no attribute 'Yogi'

Hi, calling Yogi from torch_optimizer hits a runtime error (`'Yogi' object has no attribute 'Yogi'`), so at the moment I am calling yogi.py directly.

```
# import torch_optimizer as optim  # fails in the second loop iteration: the name
#                                  # `optim` below shadows the imported module
from yogi import Yogi              # import from the yogi.py file (includes the types.py definitions)

for fold, (train_idx, val_idx) in enumerate(...):
    model = Net(...)
    # optim = optim.Yogi(model.parameters(), lr=1e-2, betas=(0.9, 0.999), eps=1e-3, initial_accumulator=1e-6, weight_decay=0)
    optim = Yogi(model.parameters(), lr=1e-2, betas=(0.9, 0.999), eps=1e-3, initial_accumulator=1e-6, weight_decay=0)
    ...
```
opened by sailfish009 5

### The get method

How would you feel about a function like this?

It makes it very easy to switch between optimizers to test several of them, for example:

```
import argparse
import torch_optimizer as optim

parser = argparse.ArgumentParser()

if __name__ == '__main__':
    args = parser.parse_args()
    opt_class = optim.get(args.optimizer)
    optimizer = opt_class(model.parameters(), lr=args.lr, **kwargs)
```

It can be improved and restricted as well. If it were me, I'd make aliases, probably search globals only for things in `__all__` and their aliases, and make the search case-insensitive.

Tell me if you'd be interested.
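
For illustration, a minimal case-insensitive lookup along these lines could look like this (a sketch assuming the package exposes `__all__`; the library did later ship a `torch_optimizer.get` helper):

```
import torch_optimizer

def get_optimizer(name: str):
    # Case-insensitive lookup of an optimizer class exported by the package.
    lookup = {cls_name.lower(): getattr(torch_optimizer, cls_name)
              for cls_name in torch_optimizer.__all__}
    try:
        return lookup[name.lower()]
    except KeyError:
        raise ValueError(f'Unknown optimizer: {name!r}')
```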

### Drop-in replacement for `torch.optim`

I would also import all optimizers from `torch.optim` directly, so that `Adam` could also be imported from here. If both of these are adopted, it would be easier than ever to compare `Adam`, `RAdam`, `SGD`, and `AccSGD`, for example. As simple as:

```
python train.py --optimizer adam
python train.py --optimizer sgd
python train.py --optimizer accsgd
```

With only one import (`torch_optimizer`) and no if statements.

opened by mpariente 5

As discussed in #64, this just adds the three Rangers as a dependency.

For now, the params can't be tested because the error messages are not the same. Also, I could not make the `beale` test work for `Ranger`.

This is just a draft, feel free to push to my fork if you want to change the PR.

Regarding the docstrings and types, I wonder if it wouldn't be easier to subclass the optimizers here so that we can use the objects in `types.py`. Or should I copy them there?

opened by mpariente 5

### `exp_avg_sq` Initialization

"Thus, for YOGI, we propose to initialize the vt based on gradient square evaluated at the initial point averaged over a (reasonably large) mini-batch."

The initial `exp_avg_sq` should be initialized to the gradient square.

### `exp_avg` Initialization

Based on `m0` above, the YOGI optimizer's `exp_avg` should be initialized to zero instead of to `initial_accumulator`.
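
A sketch of computing those initial values on one (reasonably large) mini-batch, per the paper (hypothetical model/data names; not the library's current behavior):

```
import torch

model = torch.nn.Linear(10, 1)
x, y = torch.randn(1024, 10), torch.randn(1024, 1)  # one large mini-batch

torch.nn.functional.mse_loss(model(x), y).backward()
v0 = {n: p.grad.detach() ** 2 for n, p in model.named_parameters()}  # v0 ~ E[g^2]
m0 = {n: torch.zeros_like(p) for n, p in model.named_parameters()}   # m0 = 0
```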

opened by PetrochukM 5

I love the illustrations, but I find the absence of any kind of baselines a shame. It'd be nice to see how Adam or SGD do on the example functions and compare them with some of the more fancy optimizers.

Would this be possible?

I can probably run the required experiments myself, if there are no problems.

opened by slckl 5

AdaScale was introduced in this paper: https://openreview.net/forum?id=rygxdA4YPS

The paper showed that the proposed optimizer is able to get the same results across five tasks regardless of the batch size (tested from 32 to 32.8k).

It'd be nice to have a batch-invariant optimizer included.

opened by PetrochukM 5
