Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase

Overview

Ranger-Deep-Learning-Optimizer


Ranger - a synergistic optimizer combining RAdam (Rectified Adam), LookAhead, and now GC (Gradient Centralization) in one optimizer.

Quick note: Ranger21 is now in beta; it is Ranger plus a host of new improvements.

We recommend comparing your results with Ranger21: https://github.com/lessw2020/Ranger21

Latest version 20.9.4 - updates Gradient Centralization to GC2 (thanks to the GC developer) and removes addcmul_ deprecation warnings in PyTorch 1.6.0.



*The latest version is in ranger2020.py - a few other additions are under consideration before it is integrated into the main ranger.py.

What is Gradient Centralization? "GC can be viewed as a projected gradient descent method with a constrained loss function. The Lipschitzness of the constrained loss function and its gradient is better so that the training process becomes more efficient and stable." Source paper: https://arxiv.org/abs/2004.01461v2
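In code terms, GC simply re-centers each gradient tensor before the update. A minimal sketch of the operation (the same line appears in the RangerVA code quoted in the comments below; it is applied only to tensors with dim > 1, i.e. conv and fc weights):

# subtract the mean over all dims except dim 0, re-centering each
# output channel's gradient around zero (conv and fc weights only)
grad = grad - grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True)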
Ranger now uses Gradient Centralization by default, and applies it to all conv and fc layers. However, everything is customizable so you can test with and without GC on your own datasets (toggle it via the "use_gc" flag at init).
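For example (a sketch; use_gc and gc_conv_only are the init flags used in the optimizer code quoted in the comments below):

optimizer = Ranger(model.parameters(), use_gc=False)                    # GC off
optimizer = Ranger(model.parameters(), use_gc=True, gc_conv_only=True)  # GC on conv layers only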

Best training results: use a flat lr for the first 75% of training, then step down and run a lower lr for the final 25%, or cosine-descend over the final 25% (see the schedule sketch after the next paragraph).


Per extensive testing, it's important to note that simply running one learning rate the entire time will not produce optimal results.
Ranger effectively ends up 'hovering' around the optimal zone, but can't descend into it unless it gets some additional run time at a lower rate to drop down into the optimal valley.
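A minimal sketch of such a flat-then-cosine schedule with stock PyTorch (the LambdaLR wrapper and the exact step counts are illustrative assumptions, not part of Ranger):

import math
from torch.optim.lr_scheduler import LambdaLR

total_steps = 10_000                  # assumption: total optimizer steps in the run
flat_steps = int(0.75 * total_steps)  # flat lr for the first 75%

def flat_then_cosine(step):
    # returns a multiplier on the base lr
    if step < flat_steps:
        return 1.0
    progress = (step - flat_steps) / max(1, total_steps - flat_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine descent over the last 25%

scheduler = LambdaLR(optimizer, lr_lambda=flat_then_cosine)
# call scheduler.step() once per optimizer step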

Full customization at init:
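A sketch of the available knobs, with names and defaults taken from the RangerVA variant quoted in the comments below (ranger.py capitalizes N_sma_threshhold); treat this as illustrative rather than the authoritative signature:

optimizer = Ranger(
    model.parameters(),
    lr=1e-3,                 # learning rate
    alpha=0.5, k=6,          # LookAhead interpolation factor and update interval
    N_sma_threshhold=5,      # RAdam variance-rectification threshold
    betas=(0.95, 0.999),     # beta1=0.95 per tip 3 below
    eps=1e-5,
    weight_decay=0,
    use_gc=True,             # Gradient Centralization on/off
    gc_conv_only=False,      # GC on conv layers only, or on conv + fc
)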


Ranger will now print out its ID and GC settings at init so you can confirm the optimizer settings at train time:
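Based on the print statements in the optimizer code quoted in the comments below, that output looks roughly like this:

Ranger optimizer loaded.
Gradient Centralization usage = True
GC applied to both conv and fc layers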


Medium article with more info:
https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d

Multiple updates: 1 - Ranger is the optimizer we used to beat the high scores for 12 different categories on the FastAI leaderboards! (The previous records were all held with the AdamW optimizer.)

2 - We highly recommend combining Ranger with the Mish activation function and a flat + cosine anneal training curve.

3 - Based on that testing, we also found 0.95 works better than 0.90 for the beta1 (momentum) param (i.e. betas=(0.95, 0.999)).

Fixes: 1 - Differential group learning rates are now supported. This was fixed in RAdam and ported here thanks to @sholderbach (see the sketch below). 2 - Saving and then loading could leave first-run weights stranded in memory, slowing down future runs; this is now fixed.
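A sketch of differential group learning rates, using the standard torch.optim parameter-group syntax (the backbone/head split is a hypothetical model layout):

optimizer = Ranger([
    {"params": model.backbone.parameters(), "lr": 1e-4},  # lower lr for a pretrained backbone
    {"params": model.head.parameters(), "lr": 1e-3},      # higher lr for a fresh head
], betas=(0.95, 0.999))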

Installation

Clone the repo, cd into it, and install it in editable mode (the -e option). That way, there is no need to re-install the package after modifications.

git clone https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
cd Ranger-Deep-Learning-Optimizer
pip install -e . 

Usage

from ranger import Ranger  # this is from ranger.py
from ranger import RangerVA  # this is from ranger913A.py
from ranger import RangerQH  # this is from rangerqh.py

# Define your model
model = ...
# Ranger, RangerVA, and RangerQH each take different parameters.
optimizer = Ranger(model.parameters(), **kwargs)
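Beyond construction, Ranger is a drop-in torch.optim-style optimizer; a minimal training-step sketch (model, loss_fn, and dataloader are assumed to exist):

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()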

A usage example and a test notebook are available here: https://github.com/lessw2020/Ranger-Mish-ImageWoof-5

Citing this work

We recommend you use the following to cite Ranger in your publications:

@misc{Ranger,
  author = {Wright, Less},
  title = {Ranger - a synergistic optimizer.},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer}}
}
Comments
  • BUG: Module not added to package; not importable

    ranger currently cannot be used from a pip install because the ranger module was not added to the package. The package is entirely empty, resulting in the following error:

    $ pip install git+https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer.git@73811db2eb55e1e3e3b736177cafaebe4807d669
    [...]
    Installing collected packages: ranger
    Successfully installed ranger-0.0.1
    $ python
    >>> import ranger
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'ranger'
    >>> from ranger import Ranger
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'ranger'
    >>> import ranger.ranger
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'ranger'
    

    This is resolved by using setuptools.find_packages() to add the ranger module to the package.

    I also added the README.md contents as long_description in setup.py, and incremented the version number to 0.1.dev0. Using dev or dev0 as the patch number indicates that the version is unstable, i.e. there is a one-to-many mapping from the version number, 0.1.dev0, to the state of the codebase in the repository.
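    A sketch of what the described setup.py change amounts to (field values beyond those named above are assumptions):

        from setuptools import setup, find_packages

        with open("README.md") as f:
            long_description = f.read()

        setup(
            name="ranger",
            version="0.1.dev0",
            packages=find_packages(),  # picks up the ranger/ module
            long_description=long_description,
            long_description_content_type="text/markdown",
        )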

    I can split this out into multiple PRs if you'd prefer.

    opened by scottclowe 8
  • How to cite Ranger in a paper?

    In my recent paper I used Ranger. I wish to give all the credit the author(s) deserve, but I'm not sure how to cite it properly. Currently I cite the Medium article. Should I cite this GitHub repo instead? Thanks.

    opened by askerlee 6
  • step_counter not set

    Hi, thanks for your work.

    I just plugged it into my model and found that step_counter was not set for all param_groups.

    I fixed it with this hack:

            # LookAhead tracking: make sure every param group has a step counter
            for group, slow_weights in zip(self.param_groups, self.slow_weights):
                if 'step_counter' not in group:
                    group['step_counter'] = 0

    but I suspect it's not optimal... this would mean that self.param_groups changed between the constructor and step(), but I have no idea why. Have you seen something similar before?

    Thanks

    opened by m-toman 5
  • Making it a python package

    Would you like to make this a python package that could be installed with pip? It would be more practical.

    I'd like to include it in my repo asteroid and give you proper credit for it.

    One way is to make it an installable python package (I can make a PR for that); the other would be to copy-paste some of the code and point to the license file. Which way would you prefer?

    opened by mpariente 4
  • Do we need some kind of Learning rate decay with Ranger?

    For AdamW, people usually add some sort of learning rate decay: linear, cosine, triangular, etc. Warm-up steps are also popular.

    Do we need all of these with Ranger or just use a fixed learning rate?

    opened by avostryakov 3
  • Is there a publication of Ranger?

    I want to cite Ranger in a Medium article, and I would like to know if there is an arXiv publication of Ranger or a peer-reviewed paper published at some conference or in a journal.

    I saw you linked a paper in the README.md, but it does not seem to be about Ranger, as the very word does not appear in any part of it. I know the RAdam and LookAhead papers, but the Ranger one is missing from my library. Thanks

    opened by nuzrub 2
  • Make Ranger a python package

    As discussed in #20, it would be really practical to have these optimizers in a package. This PR makes that possible: ranger can be installed and imported in all python projects, with no need to copy-paste the ranger optimizer anymore. It also makes it possible to give you proper credit where it is used (I'll add it to the requirements in asteroid, for example).

    I also updated the README with install and usage instructions. You can clone the repo, then install it with pip in editable mode (pretty practical for research) or not.

    Note: In the __init__.py I imported the three optimizers so that they can be imported from the package directly (from ranger import RangerQH instead of from ranger.rangerqh import RangerQH). Both ranger.py and ranger913A.py used the same class name, so I renamed the class in ranger913A.py to RangerVA (for version A).

    I'd like to hear from you; it would be very practical to have this as a package.

    opened by mpariente 2
  • N_sma_threshhold should be instance variable

    Thank you for the great implementation. I think I found a small part to modify at ranger.py line 116.

    original code: if N_sma > N_sma_threshhold:

    corrected code: if N_sma > self.N_sma_threshhold:

    opened by ohmorimori 2
  • N_sma_threshhold

    You first have if N_sma > self.N_sma_threshhold:

    and then you have if N_sma > 4:

    Is it right that the second one is a constant, or should it also use the N_sma_threshhold parameter?

    opened by kayuksel 2
  • Let's revolutionize the AI research field

    Hi, I have a dream and I'll try to share it with you.

    But before explaining further, I'll need your brain to analyze this input and output me what you think about it!

    Small rant on the inertia of AI research

    First of all, thank you for advancing progress in deep learning.

    I'm just a random guy who wants to implement an AGI (lol) and, like many NLP engineers, I need HIGHLY accurate neural networks for fundamental NLP tasks (e.g. POS tagging, NER, dependency parsing, coreference resolution, WSD, etc.). They are all not very accurate (often sub-95% F1 score) and their errors add up.

    Such limitations make NLP not yet suitable for many things. This is why improving the state of the art (which can be observed on paperswithcode.com) is a crucial priority for academics.

    Effectively, many researchers have smart ideas to improve the state of the art, and often slightly improve it by taking a "standard neural network" for the task and mixing their new fancy idea into it.

    I speak from knowledge: I've read most papers on the state-of-the-art leaderboards for most fundamental NLP tasks. Almost always they have this common baseline + one idea, theirs. The common baseline sometimes slowly evolves (e.g. now it's often a pre-trained model (say BERT) + fine-tuning + their idea).

    Sorry to say, but "this" seems absurd to me, where "this" means the fact that, by far, most researchers work in isolation, not integrating others' ideas (or only with great inertia). I would have wished that the state of the art in one NLP task would be a combination of, e.g., 50 innovative and complementary ideas from researchers. You are researchers: do you have an idea why this is the case? If someone actually tried to merge all the good, complementary, and compatible ideas, would they have the best, unmatchable state of the art? Why don't facebookresearch, Microsoft, and Google try this low-hanging fruit and, in addition to producing X new shiny ideas per month, actually try to merge them in a coherent, synergetic manner? I would like you to tell me what you think of this major issue that slows AI progress.

    As an example of such inertia, let's talk about Swish, Mish, or RAdam: these things are incredibly easy to try, just to see "hey, does it give my neural network free accuracy gains?" Yet no paper on the state-of-the-art leaderboards has tried Swish, Mish, or RAdam, despite them being so simple to try (you don't need to change the neural network) - not even the pre-trained models that so many papers depend on (I opened issues for each of them).

    Once I know what you think about this research inertia, I'll explain my vision of what needs to be done to fix it.

    opened by LifeIsStrange 2
  • Not working using cuda

    The self.slow_weights variables are always on the CPU. You can easily fix this by adding a .to() method to the Ranger class, like so:

    def to(self, device):
        # move the LookAhead slow weights to the requested device
        # (string comparison should use ==, not `is`)
        if device == "cuda":
            for i in range(len(self.slow_weights)):
                for j, w in enumerate(self.slow_weights[i]):
                    self.slow_weights[i][j] = w.cuda()
        elif device == "cpu":
            for i in range(len(self.slow_weights)):
                for j, w in enumerate(self.slow_weights[i]):
                    self.slow_weights[i][j] = w.cpu()
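    (Note: w.to(device) would handle both branches, and any device, in a single line.)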
    
    opened by Fable67 2
  • Collate pip package so that it picks up from main repo.

    Actually, there is a pip package, but it is based on a fork of this repo. I think it would make sense to collate this effort into the main repo.

    Originally posted by @sarthakpati in https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer/issues/33#issuecomment-821314754

    opened by sarthakpati 2
  • Please note in the documentation (or in the constructor) that closures must be enabled

    Hi,

    After upgrading my PyTorch Lightning installation, I had a relatively long debug session today because training_step wasn't being called.

    It finally turned out that the problem was that the "closure" argument is not used in the step function (it is commented out, as also noted in the source code).

    However, as closures are apparently required by some libraries and are also recommended by the official PyTorch guidelines, it would be great to document more clearly that people might need to enable these lines.

    Thanks in advance.

    opened by ABotond 0
  • This overload of addcmul_ is deprecated: addcmul_(Number value, Tensor tensor1, Tensor tensor2)

    I get the following warning when using Ranger with PyTorch 1.6.0:

    /path/Ranger-Deep-Learning-Optimizer/ranger/ranger.py:138: UserWarning: This overload of addcmul_ is deprecated:
            addcmul_(Number value, Tensor tensor1, Tensor tensor2)
    Consider using one of the following signatures instead:
            addcmul_(Tensor tensor1, Tensor tensor2, *, Number value) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
      exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
    
    opened by neuronflow 5
  • RangerVA with GC

    Hello,

    Thank you for your work on these optimizers, by the way. I was testing a couple out and originally got quite good results with RangerVA. Then, when your gradient centralization was added, I got further improvements, but it also seemed to overfit the training set more easily despite using the same parameters. So I tried combining gradient centralization into the RangerVA algorithm, and so far it seems to perform quite well, and faster, since it seems I can use larger batch sizes. Whenever you have some free time, I was wondering if you could quickly check whether I implemented it correctly in the code below, since you know this optimizer so well.

    Best

    import math
    import torch
    from torch.optim.optimizer import Optimizer

    class RangerVA(Optimizer):

    def __init__(self, params, lr=1e-3, 
                 alpha=0.5, k=6, n_sma_threshhold=5, betas=(.95,0.999), 
                 eps=1e-5, weight_decay=0, amsgrad=True, transformer='softplus', smooth=50,
                 grad_transformer='square',use_gc=True, gc_conv_only=False):
        #parameter checks
        if not 0.0 <= alpha <= 1.0:
            raise ValueError(f'Invalid slow update rate: {alpha}')
        if not 1 <= k:
            raise ValueError(f'Invalid lookahead steps: {k}')
        if not lr > 0:
            raise ValueError(f'Invalid Learning Rate: {lr}')
        if not eps > 0:
            raise ValueError(f'Invalid eps: {eps}')
    
        #prep defaults and init torch.optim base
        defaults = dict(lr=lr, alpha=alpha, k=k, step_counter=0, betas=betas, 
                        n_sma_threshhold=n_sma_threshhold, eps=eps, weight_decay=weight_decay,
                        smooth=smooth, transformer=transformer, grad_transformer=grad_transformer,
                       amsgrad=amsgrad,use_gc=use_gc, gc_conv_only=gc_conv_only )
        super().__init__(params,defaults)
    
        #adjustable threshold
        self.n_sma_threshhold = n_sma_threshhold   
    
        #look ahead params
        self.alpha = alpha
        self.k = k 
    
        #radam buffer for state
        self.radam_buffer = [[None,None,None] for ind in range(10)]
        
        #gc on or off
        self.use_gc=use_gc
        #level of gradient centralization
        self.gc_gradient_threshold = 3 if gc_conv_only else 1
        print(f"Ranger optimizer loaded. \nGradient Centralization usage = {self.use_gc}")
        if (self.use_gc and self.gc_gradient_threshold==1):
            print(f"GC applied to both conv and fc layers")
        elif (self.use_gc and self.gc_gradient_threshold==3):
            print(f"GC applied to conv layers only")
    
    
    def __setstate__(self, state):
        print("set state called")
        super(RangerVA, self).__setstate__(state)
    
    
    def step(self, closure=None):
        loss = None
        #Evaluate averages and grad, update param tensors
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data.double()
                if grad.is_sparse:
                    raise RuntimeError('Ranger optimizer does not support sparse gradients')
                
                amsgrad = group['amsgrad']
                smooth = group['smooth']
                grad_transformer = group['grad_transformer']
    
                p_data_fp32 = p.data.double()
    
                state = self.state[p]  #get state dict for this param
    
                if len(state) == 0:   
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(p_data_fp32)
                    state['exp_avg_sq'] = torch.zeros_like(p_data_fp32)
                    if amsgrad:
                        # Maintains max of all exp. moving avg. of sq. grad. values
                        state['max_exp_avg_sq'] = torch.zeros_like(p.data)                    
    
                    #look ahead weight storage now in state dict 
                    state['slow_buffer'] = torch.empty_like(p.data)
                    state['slow_buffer'].copy_(p.data)
    
                else:
                    state['exp_avg'] = state['exp_avg'].type_as(p_data_fp32)
                    state['exp_avg_sq'] = state['exp_avg_sq'].type_as(p_data_fp32)
                                      
    
                #begin computations 
                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                beta1, beta2 = group['betas']
                if amsgrad:
                    max_exp_avg_sq = state['max_exp_avg_sq']  
                    # Maintains the maximum of all 2nd moment running avg. till now
                    torch.max(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq)
                    # Use the max. for normalizing running avg. of gradient
                    denomc = max_exp_avg_sq.clone()
                else:
                    denomc = exp_avg_sq.clone()
                #GC operation for Conv layers and FC layers       
                if grad.dim() > self.gc_gradient_threshold:                    
                    grad.add_(-grad.mean(dim = tuple(range(1,grad.dim())), keepdim = True))
    
                state['step'] += 1              
    
                #compute variance mov avg
                exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
                #compute mean moving avg
                exp_avg.mul_(beta1).add_(1 - beta1, grad)
                buffered = self.radam_buffer[int(state['step'] % 10)]
                if state['step'] == buffered[0]:
                    N_sma, step_size = buffered[1], buffered[2]
                else:
                    buffered[0] = state['step']
                    beta2_t = beta2 ** state['step']
                    N_sma_max = 2 / (1 - beta2) - 1
                    N_sma = N_sma_max - 2 * state['step'] * beta2_t / (1 - beta2_t)
                    buffered[1] = N_sma
                    if N_sma > self.n_sma_threshhold:
                        step_size = math.sqrt((1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4) * (N_sma - 2) / N_sma * N_sma_max / (N_sma_max - 2)) / (1 - beta1 ** state['step'])
                    else:
                        step_size = 1.0 / (1 - beta1 ** state['step'])
                    buffered[2] = step_size
    
                
                ##transformer
                if grad_transformer == 'square':
                    grad_tmp = grad**2
                    denomc.sqrt_() 
                elif grad_transformer == 'abs':
                    grad_tmp = grad.abs()
    
    
                exp_avg_sq.mul_(beta2).add_((1 - beta2)*grad_tmp)
    
                if group['weight_decay'] != 0:
                    p_data_fp32.add_(-group['weight_decay'] * group['lr'], p_data_fp32)
                bias_correction1 = 1 - beta1 ** state['step']
                bias_correction2 = 1 - beta2 ** state['step']
                step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1                
    
                
                # ...let's use calibrated alr 
                if N_sma > self.n_sma_threshhold:
                    if  group['transformer'] =='softplus':
                        sp = torch.nn.Softplus( smooth)
                        denomf = sp( denomc)
                        p_data_fp32.addcdiv_(-step_size, exp_avg, denomf )
                    else:
                        denom = exp_avg_sq.sqrt().add_(group['eps'])
                        p_data_fp32.addcdiv_(-step_size * group['lr'], exp_avg, denom)
                else:
                    p_data_fp32.add_(-step_size * group['lr'], exp_avg)
                p.data.copy_(p_data_fp32)
    
                #integrated look ahead...
                #we do it at the param level instead of group level
                if state['step'] % group['k'] == 0:
                    slow_p = state['slow_buffer'] #get access to slow param tensor
                    slow_p.add_(self.alpha, p.data - slow_p)  #(fast weights - slow weights) * alpha
                    p.data.copy_(slow_p)  #copy interpolated weights to RAdam param tensor
    
        return loss
    
    opened by ryancinsight 0
Owner

Less Wright
Principal Software Engineer at Audere; previously PM/Test/Dev at Microsoft and Software Architect at X10 Wireless.