Riemannian Adaptive Optimization Methods with PyTorch optim

Overview

geoopt

Manifold-aware torch.optim.

Unofficial implementation of "Riemannian Adaptive Optimization Methods" (ICLR 2019) and more.

Installation

Make sure you have pytorch>=1.9.0 installed.

There are two ways to install geoopt:

  1. GitHub (preferred so far due to active development):
pip install git+https://github.com/geoopt/geoopt.git
  2. PyPI (this might be significantly behind the master branch):
pip install geoopt

The preferred way to install geoopt will change once the project reaches a stable stage. For now, PyPI lags behind master because we actively develop and implement new features.

PyTorch Support

Geoopt officially supports the two latest stable versions of upstream PyTorch (currently 1.9.0) or the latest major release. We also aim to test against the nightly build (TODO: there were complications with GitHub workflows; help is welcome), but we cannot guarantee full compatibility. Older PyTorch versions may work, but use them at your own risk and do not forget to run the tests.

What is done so far

Work is in progress, but the library is already usable. Note that the API may change in future releases.

Tensors

  • geoopt.ManifoldTensor – same as torch.Tensor, but with an additional manifold keyword argument.
  • geoopt.ManifoldParameter – same as above; correctly subclassed so that it is recognized by torch.nn.Module.parameters.

Both containers provide special methods for working with them as points on a given manifold (see the sketch after this list):

  • .proj_() – in-place projection onto the manifold.
  • .proju(u) – project vector u onto the tangent space; you need to project all vectors before passing them to the methods below.
  • .egrad2rgrad(u) – convert the Euclidean gradient u into a Riemannian gradient.
  • .inner(u, v=None) – inner product between two tangent vectors at this point; the passed vectors are not projected for you and are assumed to be already projected.
  • .retr(u) – retraction map along vector u.
  • .expmap(u) – exponential map along vector u (if expmap is not available in closed form, the best approximation is used).
  • .transp(v, u) – transport vector v along direction u.
  • .retr_transp(v, u) – retract the point and transport vector v (and possibly more vectors) along direction u (the returns are plain tensors).
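
A minimal sketch of these methods in action, assuming the signatures listed above (exact arguments may differ between geoopt versions):

import torch
import geoopt

# Start from a random ambient point and project it onto the sphere.
sphere = geoopt.Sphere()
x = geoopt.ManifoldTensor(torch.randn(10), manifold=sphere)
x.proj_()  # in-place projection onto the manifold

u = x.proju(torch.randn(10))  # project an ambient vector onto the tangent space at x
v = x.proju(torch.randn(10))
print(x.inner(u, v))  # Riemannian inner product of two tangent vectors at x
y = x.retr(u)         # retraction along u: a new point on the sphere
z = x.expmap(u)       # exponential map along u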

Manifolds

  • geoopt.Euclidean – unconstrained manifold in R^n with the Euclidean metric
  • geoopt.Stiefel – Stiefel manifold of matrices A in R^{n x p} : A^T A = I, n >= p
  • geoopt.Sphere – Sphere manifold ||x|| = 1
  • geoopt.BirkhoffPolytope – manifold of doubly stochastic matrices
  • geoopt.Stereographic – constant-curvature stereographic projection model
  • geoopt.SphereProjection – sphere stereographic projection model
  • geoopt.PoincareBall – Poincaré ball model
  • geoopt.Lorentz – hyperboloid model
  • geoopt.ProductManifold – product manifold constructor
  • geoopt.Scaled – scaled version of a manifold; combined with ProductManifold, this resembles Learning Mixed-Curvature Representations in Product Spaces
  • geoopt.SymmetricPositiveDefinite – SPD matrix manifold
  • geoopt.UpperHalf – Siegel upper half-space manifold; supports Riemannian and Finsler metrics, as in Symmetric Spaces for Graph Embeddings: A Finsler-Riemannian Approach
  • geoopt.BoundedDomain – Siegel bounded domain manifold; supports Riemannian and Finsler metrics

All manifolds implement the methods needed to manipulate points on the manifold and their tangent vectors for general-purpose use. See the documentation for more.
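
A minimal construction sketch, assuming the constructors listed above (dimensions, curvature, and scale values are illustrative):

import geoopt

ball = geoopt.PoincareBall(c=1.0)        # Poincaré ball with c = 1
sphere = geoopt.Sphere()
# Product of a 2-dimensional ball component and a 3-dimensional sphere component;
# points are packed into the last tensor dimension (here of size 5).
product = geoopt.ProductManifold((ball, 2), (sphere, 3))
scaled = geoopt.Scaled(ball, scale=2.0)  # the same ball with a scaled metric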

Optimizers

  • geoopt.optim.RiemannianSGD – a subclass of torch.optim.SGD with the same API
  • geoopt.optim.RiemannianAdam – a subclass of torch.optim.Adam
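
A minimal usage sketch, assuming the optimizers above (learning rate, target point, and iteration count are illustrative):

import torch
import geoopt

# Optimize a point on the Poincaré ball to approach a fixed target point.
ball = geoopt.PoincareBall()
p = geoopt.ManifoldParameter(torch.randn(2) * 1e-2, manifold=ball)
target = torch.tensor([0.3, 0.1])  # a point inside the unit ball

opt = geoopt.optim.RiemannianAdam([p], lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = ball.dist(p, target) ** 2  # squared Riemannian distance
    loss.backward()
    opt.step()  # the update stays on the manifold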

Samplers

  • geoopt.samplers.RSGLD – Riemannian Stochastic Gradient Langevin Dynamics
  • geoopt.samplers.RHMC – Riemannian Hamiltonian Monte-Carlo
  • geoopt.samplers.SGRHMC – Stochastic Gradient Riemannian Hamiltonian Monte-Carlo

Citing Geoopt

If you find this project useful in your research, please add this BibTeX entry to your references and cite it.

@misc{geoopt2020kochurov,
    title={Geoopt: Riemannian Optimization in PyTorch},
    author={Max Kochurov and Rasul Karimov and Serge Kozlukov},
    year={2020},
    eprint={2005.02819},
    archivePrefix={arXiv},
    primaryClass={cs.CG}
}
Comments
  • Line search

    I made a Riemannian line search optimizer with strong Wolfe conditions.

    It's not yet perfect. I think it makes some redundant calls to the closure during stepping, and close to a local minimum it suffers from numerical errors and can sometimes take strange steps.
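
    For context, a hedged usage sketch of a closure-based line-search step (the optimizer landed as geoopt.optim.RiemannianLineSearch in v0.3.0; the objective here is illustrative):

    import torch
    import geoopt

    sphere = geoopt.Sphere()
    x = geoopt.ManifoldParameter(torch.randn(10), manifold=sphere)
    with torch.no_grad():
        x.proj_()

    opt = geoopt.optim.RiemannianLineSearch([x])

    def closure():
        opt.zero_grad()
        loss = (x - 1.0).pow(2).sum()
        loss.backward()
        return loss

    for _ in range(20):
        opt.step(closure)  # the line search may evaluate the closure several times per step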

    opened by RikVoorhaar 26
  • Poincare ball model

    Hurray, we are ready to start.

    Interesting reading I've done so far:

    • Hyperbolic Networks
    • Hyperbolic Entailment Cones for Learning Hierarchical Embeddings
    • Poincaré GloVe: Hyperbolic Word Embeddings

    Some implementation takeaways (mostly from here):

    • Project the results of all operations into the ball of radius 1 − eps, where eps = 10^{-5}
    • Numerical errors also appear when hyperbolic vectors get close to 0; perturb them with eps = 10^{-15}
    • Pass input clipped to [-15, 15] to tanh, and clip the argument of tanh^{-1} to [-1 + eps, 1 - eps]
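
    A minimal sketch of these clamping tricks (the constants follow the list above; the helper names are illustrative, not geoopt's internals):

    import torch

    BALL_EPS = 1e-5    # keep points inside radius 1 - eps
    MIN_NORM = 1e-15   # guard against near-zero norms
    TANH_CLAMP = 15.0

    def project(x: torch.Tensor) -> torch.Tensor:
        # Rescale points whose norm exceeds 1 - eps back into the ball.
        norm = x.norm(dim=-1, keepdim=True).clamp_min(MIN_NORM)
        maxnorm = 1.0 - BALL_EPS
        return torch.where(norm > maxnorm, x / norm * maxnorm, x)

    def safe_tanh(x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(x.clamp(-TANH_CLAMP, TANH_CLAMP))

    def safe_artanh(x: torch.Tensor) -> torch.Tensor:
        # The exact eps used here is illustrative.
        return torch.atanh(x.clamp(-1 + 1e-5, 1 - 1e-5))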

    CC @leuchine!

    opened by ferrine 23
  • StereographicProductManifold to use gyrovector space functions in product manifolds

    Hi,

    I subclassed ProductManifold to create StereographicProductManifold. Here I added the functions dist2plane, expmap0, and mobius_add, which call the respective functions of the underlying Stereographic manifolds, and I added some test functions for them. Additionally, I added wrapped_normal to Stereographic as an alternative random function, and I added scipy to the requirements in setup.py as mentioned in #161.
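
    A hedged sketch of the added API (component shapes and curvatures are illustrative; both components must be Stereographic-based, and only methods named above are used):

    import torch
    import geoopt

    pm = geoopt.StereographicProductManifold(
        (geoopt.PoincareBall(), 2),
        (geoopt.SphereProjection(), 2),
    )
    u = torch.randn(4) * 0.1       # tangent vector at the origin, packed into R^4
    x = pm.expmap0(u)              # a point on the product manifold
    y = pm.expmap0(torch.randn(4) * 0.1)
    z = pm.mobius_add(x, y)        # component-wise Möbius addition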

    opened by gatoniel 20
  • $c$ counter-intuitively stands for **negative** curvature

    Currently the c parameter in PoincareBall denotes negative curvature, even though it is most natural to read c as the curvature itself. That is unconventional, misleading, and likely to cause inconsistencies later.
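
    To make the convention concrete, a one-line illustration of the current API:

    import geoopt

    # Here c is the *negative* of the sectional curvature: c=1.0 gives curvature -1.
    ball = geoopt.PoincareBall(c=1.0)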

    opened by SomeoneSerge 17
  • How to properly use ManifoldParameter

    I currently use the following piece of code:

    if self.init_id:
        init = torch.eye(input_dims[0]).view((input_dims[0], input_dims[0], 1, 1))
    else:
        init = torch.randn(input_dims[0], input_dims[0])# + input_dims[0]*torch.eye(input_dims[0])
    self.orth_w = geoopt.ManifoldParameter(init, manifold=stiefel_man)
    

    but if init is random, it won't be on the Stiefel manifold! So I tried self.orth_w.proj_(), but it tells me:

    RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

    I can guard the .proj_() call with torch.no_grad(), but I don't know whether this is the right approach.

    I just want to do a straightforward matrix multiplication in the forward pass and want geoopt to keep the parameter on the Stiefel manifold during training.
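
    For reference, a minimal sketch of the torch.no_grad() guard mentioned above (sizes are illustrative; whether this is the officially recommended initialization is not settled here):

    import torch
    import geoopt

    stiefel = geoopt.Stiefel()
    init = torch.randn(5, 5)
    orth_w = geoopt.ManifoldParameter(init, manifold=stiefel)
    with torch.no_grad():
        orth_w.proj_()  # project the random init without recording the op in autograd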

    opened by LeanderK 12
  • Hyperboloid

    There are some numerical issues, and I'm starting to think there is no easy solution at this point. Maybe something like this paper should be investigated further.

    • [x] Add curvature
    • [x] Add tests
    • [x] Add an example with hgnn/attention [example with basic usage]
    • [x] Add docs

    An example with optimization should be added in a separate PR.

    opened by rrkarim 11
  • Not working properly with CUDA

    I am working with a model similar to the example below:

    import torch.nn as nn
    import geoopt as gt

    class Model(nn.Module):
        def __init__(self, word2vec):
            super(Model, self).__init__()
            self.word_lut = gt.ManifoldParameter(word2vec, manifold=gt.PoincareBall())

    When I move the model to CUDA, and run it there is a problem during the optimization (See full traceback below [1]): RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'other'

    The reason for the error is that when I move the model to CUDA, the tensor in the ManifoldParameter is moved, but its manifold is not, so the manifold's tensor attribute self.c is still allocated on the CPU.

    I solved it doing this:

    import torch
    from geoopt import ManifoldParameter, ManifoldTensor

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    for p in model.parameters():
        if isinstance(p, (ManifoldParameter, ManifoldTensor)):
            p.manifold.to(device)

    I don't know whether issue #49 was an attempt to solve this and I simply didn't move the model to CUDA the proper way, or whether this is not completely documented. If an explanation of how to do this properly exists and I couldn't find it, my apologies for creating this issue.

    [1] Full traceback

    Traceback (most recent call last):
      File "./train.py", line 122, in <module>
        main()
      File "./train.py", line 113, in main
        coach.train()
      File "/home/lopezfo/projects/hyfi/hyfi/Coach.py", line 43, in train
        train_loss = self.train_epoch(epoch)
      File "/home/lopezfo/projects/hyfi/hyfi/Coach.py", line 99, in train_epoch
        self.optim.step()
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/optim/radam.py", line 145, in step
        state["exp_avg_sq"],
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/optim/tracing.py", line 34, in partial
        step(manifold, *args, **kwargs)
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/optim/radam.py", line 191, in perform_step
        point, -step_size * direction, exp_avg
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/manifolds/poincare/__init__.py", line 110, in retr_transp
        y = self.retr(x, u)
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/manifolds/poincare/__init__.py", line 69, in retr
        return math.project(approx, c=self.c)
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/manifolds/poincare/math.py", line 78, in project
        return _project(x, c, dim, eps)
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/manifolds/poincare/math.py", line 86, in _project
        cond = norm > maxnorm
    RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'other'
    
    opened by fedelopez77 11
  • implementation of SPD manifolds

    I'm trying to implement the SPD manifolds, which are widely used in many papers.

    The current commits implement all the abstract methods of base.Manifold, with more method implementations on the way. Some testing has been done locally, and construction of the testing module is also underway.
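
    A hedged usage sketch of the manifold this PR introduces (shapes are illustrative; the method names follow geoopt's base.Manifold interface):

    import torch
    import geoopt

    spd = geoopt.SymmetricPositiveDefinite()
    a = torch.randn(3, 3, dtype=torch.float64)
    x = spd.projx(a @ a.transpose(-1, -2) + torch.eye(3, dtype=torch.float64))  # an SPD point
    u = spd.proju(x, torch.randn(3, 3, dtype=torch.float64))  # tangent vector at x
    print(spd.inner(x, u, u, keepdim=True))  # the keepdim functionality from the checklist below
    y = spd.retr(x, u)  # retraction keeps the result symmetric positive definite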

    Progress

    • [x] move symmetric matrix operations to batch_linalg
    • [x] keepdims functionality for inner and _stein_metric.
    • [x] mention in documentation about SPD manifolds.
    • [x] implementation for random and origin
    • [x] manifold test module.
      • [x] shape case for a symmetric positive-definite matrix.
      • [x] test basic operation of SPD manifolds.
      • [x] simple optimization problem on SPD manifolds.
    • [ ] mention the PR in the CHANGELOG.

    Reference Implementation

    Some papers using SPD manifolds

    • Computationally Tractable Riemannian Manifolds for Graph Embeddings [arxiv]
    • A Riemannian Network for SPD Matrix Learning [arxiv]

    All suggestions are welcome. I hope this pull request can be merged once all the work is done.

    enhancement 
    opened by tao-harald 10
  • geoopt.optim.RiemannianSGD does not work with Distributed Data Parallel

    First of all, thank you for this library!

    Description of the bug

    When training with Distributed Data Parallel (DDP), the gradient is not correctly synchronized across devices when using RiemannianSGD (or RiemannianAdam). Replacing it with a standard torch.optim.SGD works well. Note that with DDP the gradient is synchronized during .backward() (see this link).

    To Reproduce

    Simple code training on ImageNet:

    import os
    
    import geoopt
    import torch
    import torch.distributed
    import torch.multiprocessing as mp
    import torchvision
    import torchvision.models as models
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils import data
    from torchvision import transforms
    
    
    def process_ddp(master_port, local_rank, world_size):
    
        os.environ['MASTER_ADDR'] = 'localhost'
        os.environ['MASTER_PORT'] = str(master_port)
        torch.cuda.set_device(local_rank)
        device = torch.device("cuda", local_rank)
        torch.distributed.init_process_group("nccl", rank=local_rank, world_size=world_size, init_method='env://')
        assert world_size == torch.distributed.get_world_size()
    
        return device
    
    
    def main(local_rank, world_size):
    
        path_dataset = '/path/to/ImageNet'  # Any other dataset should result in a similar behavior
        master_port = 9999
    
        device = process_ddp(master_port, local_rank, world_size)
    
        model = models.resnet18()
        model = model.to(device)
    
        # optimizer = geoopt.optim.RiemannianSGD(model.parameters(), lr=0.1, stabilize=10)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    
        # Data parallelization
        model = DDP(model, device_ids=[local_rank], output_device=local_rank)
    
        # Prepare dataset
        transform = transforms.Compose([
            transforms.CenterCrop(size=256),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ])
        dataset = torchvision.datasets.ImageNet(split='train', root=path_dataset, transform=transform)
        sampler = torch.utils.data.distributed.DistributedSampler(dataset, shuffle=True)
        data_loader = torch.utils.data.DataLoader(dataset, batch_size=4, sampler=sampler, shuffle=False, num_workers=8)
    
        # Train part of the first epoch
        model.train()
    
        for idx, (images, labels) in enumerate(data_loader):
            if idx >= 10:
                break
            images = images.to(device)
            labels = labels.to(device)
    
            with torch.set_grad_enabled(True):
                features = model(images)
                loss = torch.nn.functional.cross_entropy(features, labels)
    
            loss.backward()
            print(f'grad iteration {idx} on gpu {device}: {model.module.conv1.weight.grad.mean()}', flush=True)
            optimizer.step()
            optimizer.zero_grad()
            print(f'weight iteration {idx} on gpu {device}: {model.module.conv1.weight.mean()}', flush=True)
    
        # cleanup
        torch.distributed.destroy_process_group()
    
    
    if __name__ == '__main__':
        world_size_main = torch.cuda.device_count()
        mp.spawn(main,
                 args=(world_size_main,),
                 nprocs=world_size_main,
                 join=True)
    

    In order to run, use: CUDA_VISIBLE_DEVICES=0,1 python run.py

    Expected behavior

    The expected behavior is the one obtained when the line optimizer = torch.optim.SGD(model.parameters(), lr=0.1) is uncommented and the line optimizer = geoopt.optim.RiemannianSGD(model.parameters(), lr=0.1, stabilize=10) is commented out. In that case, the output is:

    grad iteration 0 on gpu cuda:1: 0.011769304051995277
    grad iteration 0 on gpu cuda:0: 0.011769304051995277
    weight iteration 0 on gpu cuda:1: -0.001249525579623878
    weight iteration 0 on gpu cuda:0: -0.001249525579623878
    grad iteration 1 on gpu cuda:0: -0.015764284878969193
    grad iteration 1 on gpu cuda:1: -0.015764284878969193
    weight iteration 1 on gpu cuda:1: 0.0003269027511123568
    weight iteration 1 on gpu cuda:0: 0.0003269027511123568
    grad iteration 2 on gpu cuda:1: -0.006310341879725456
    grad iteration 2 on gpu cuda:0: -0.006310341879725456
    weight iteration 2 on gpu cuda:1: 0.000957937038037926
    weight iteration 2 on gpu cuda:0: 0.000957937038037926
    grad iteration 3 on gpu cuda:0: 0.0021547293290495872
    grad iteration 3 on gpu cuda:1: 0.0021547293290495872
    weight iteration 3 on gpu cuda:1: 0.000742464151699096
    weight iteration 3 on gpu cuda:0: 0.000742464151699096
    grad iteration 4 on gpu cuda:1: -0.002606849418953061
    grad iteration 4 on gpu cuda:0: -0.002606849418953061
    weight iteration 4 on gpu cuda:1: 0.001003148965537548
    weight iteration 4 on gpu cuda:0: 0.001003148965537548
    grad iteration 5 on gpu cuda:1: 0.00043087091762572527
    grad iteration 5 on gpu cuda:0: 0.00043087091762572527
    weight iteration 5 on gpu cuda:1: 0.0009600619086995721
    weight iteration 5 on gpu cuda:0: 0.0009600619086995721
    grad iteration 6 on gpu cuda:0: 0.00014396056940313429
    grad iteration 6 on gpu cuda:1: 0.00014396056940313429
    weight iteration 6 on gpu cuda:1: 0.0009456658735871315
    weight iteration 6 on gpu cuda:0: 0.0009456658735871315
    grad iteration 7 on gpu cuda:1: -0.002603260101750493
    grad iteration 7 on gpu cuda:0: -0.002603260101750493
    weight iteration 7 on gpu cuda:1: 0.001205991953611374
    weight iteration 7 on gpu cuda:0: 0.001205991953611374
    grad iteration 8 on gpu cuda:0: 0.000458348571555689
    grad iteration 8 on gpu cuda:1: 0.000458348571555689
    weight iteration 8 on gpu cuda:0: 0.0011601571459323168
    weight iteration 8 on gpu cuda:1: 0.0011601571459323168
    grad iteration 9 on gpu cuda:1: -0.0004215179360471666
    grad iteration 9 on gpu cuda:0: -0.0004215179360471666
    weight iteration 9 on gpu cuda:1: 0.0012023089220747352
    weight iteration 9 on gpu cuda:0: 0.0012023089220747352
    

    The gradients in the two GPUs are correctly synchronized. However, when using RiemannianSGD, the output is:

    grad iteration 0 on gpu cuda:1: 0.0035285688936710358
    grad iteration 0 on gpu cuda:0: 0.0035285688936710358
    weight iteration 0 on gpu cuda:0: -7.928906597953755e-06
    weight iteration 0 on gpu cuda:1: -7.928906597953755e-06
    grad iteration 1 on gpu cuda:0: -0.04444637894630432
    grad iteration 1 on gpu cuda:1: 0.002550020581111312
    weight iteration 1 on gpu cuda:0: 0.00018905977776739746
    weight iteration 1 on gpu cuda:1: -2.1470928913913667e-05
    grad iteration 2 on gpu cuda:0: 0.009863540530204773
    grad iteration 2 on gpu cuda:1: 0.0026304360944777727
    weight iteration 2 on gpu cuda:0: 0.00013691956701222807
    weight iteration 2 on gpu cuda:1: -3.7374120438471437e-05
    grad iteration 3 on gpu cuda:0: -0.0161017756909132
    grad iteration 3 on gpu cuda:1: -0.0023103044368326664
    weight iteration 3 on gpu cuda:0: 0.0002405263076070696
    weight iteration 3 on gpu cuda:1: -2.4383840354857966e-05
    grad iteration 4 on gpu cuda:0: 0.010763526894152164
    grad iteration 4 on gpu cuda:1: -0.017034146934747696
    weight iteration 4 on gpu cuda:0: 0.00017218326684087515
    weight iteration 4 on gpu cuda:1: 7.665574958082289e-05
    grad iteration 5 on gpu cuda:0: 0.008465449325740337
    weight iteration 5 on gpu cuda:0: 0.00012061965389875695
    grad iteration 5 on gpu cuda:1: 0.0011690922547131777
    weight iteration 5 on gpu cuda:1: 6.942617619642988e-05
    grad iteration 6 on gpu cuda:0: 0.0013559082290157676
    weight iteration 6 on gpu cuda:0: 0.00011242596519878134
    grad iteration 6 on gpu cuda:1: 0.0008932517375797033
    weight iteration 6 on gpu cuda:1: 6.43157254671678e-05
    grad iteration 7 on gpu cuda:0: 0.02651313878595829
    weight iteration 7 on gpu cuda:0: -1.233588955074083e-05
    grad iteration 7 on gpu cuda:1: -0.007853103801608086
    weight iteration 7 on gpu cuda:1: 0.00010782096069306135
    grad iteration 8 on gpu cuda:0: 0.009321866557002068
    weight iteration 8 on gpu cuda:0: -7.130965968826786e-05
    grad iteration 8 on gpu cuda:1: -0.0039948648773133755
    grad iteration 9 on gpu cuda:0: -0.0119229881092906
    weight iteration 8 on gpu cuda:1: 0.00013168319128453732
    weight iteration 9 on gpu cuda:0: -1.202533780997328e-06
    grad iteration 9 on gpu cuda:1: 0.002446404891088605
    weight iteration 9 on gpu cuda:1: 0.00011651107342913747
    

    There is some problem with the gradient synchronization, which causes the weights on the two devices to diverge.

    Library version information:

    • python -c 'import torch;print("torch:", torch.version.__version__, end=" ");print("cuda:", torch.version.cuda)' → torch: 1.8.1, cuda: 11.1

    • The way you installed geoopt: pip

    • OS: Ubuntu 18.04.5 LTS

    EDIT: I simplified the code a little by removing mixed precision.

    bug 
    opened by surisdi 9
  • Regression to points in the Poincaré Disk Model

    Hi,

    I'm building a neural network in PyTorch that has to learn a regression from vectors in Euclidean space to vectors in the Poincaré disk model, so I think RiemannianSGD could be a good choice of optimizer.

    I'm trying to use the library, but I have some questions:

    • When and how do I have to cast tensors? Right now I wrap the target tensors with Y = ManifoldTensor(Y, manifold=geoopt.manifolds.PoincareBall()). I think the prediction also has to be a ManifoldTensor, but if I wrap the prediction outside the model, the ManifoldTensor does not retain the grad value.

    • Do I have to build each layer of the network with ManifoldTensor, or can torch.nn.Linear be used?

    • Do I have to tell the RiemannianSGD optimizer which manifold to use? From the documentation, I think I don't have to.

    Anyway, thanks for your work on this project ;-)

    opened by NooneBug 8
  • Error on import geoopt

    Hi,

    I recently updated the PyTorch library in my environment, so I reinstalled geoopt via: pip install git+https://github.com/geoopt/geoopt.git

    When I try to import the library, this is the error that is reported:

    
      import geoopt
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/geoopt/__init__.py", line 1, in <module>
        from . import manifolds
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/geoopt/manifolds/__init__.py", line 3, in <module>
        from .stiefel import Stiefel, EuclideanStiefel, CanonicalStiefel, EuclideanStiefelExact
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/geoopt/manifolds/stiefel.py", line 4, in <module>
        from .. import linalg
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/geoopt/linalg/__init__.py", line 1, in <module>
        from .batch_linalg import svd, qr, sym, extract_diag, matrix_rank, expm, block_matrix
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/geoopt/linalg/batch_linalg.py", line 3, in <module>
        from . import _expm
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/geoopt/linalg/_expm.py", line 8, in <module>
        @torch.jit.script
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/torch/jit/__init__.py", line 364, in script
        graph = _script_graph(fn, _frames_up=_frames_up + 1)
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/torch/jit/__init__.py", line 360, in _script_graph
        return _jit_script_compile(ast, rcb)
    RuntimeError: 
    builtin cannot be used as a value:
            33522128640.0,
            1323241920.0,
            40840800.0,
            960960.0,
            16380.0,
            182.0,
            1.0,
        )
    
        ident = torch.eye(A.shape[1], dtype=A.dtype, device=A.device)
                          ~~~~~~~ <--- HERE
        A2 = torch.matmul(A, A)
        A4 = torch.matmul(A2, A2)
        A6 = torch.matmul(A4, A2)
        U = torch.matmul(
            A,
            torch.matmul(A6, b13 * A6 + b11 * A4 + b9 * A2)
            + b7 * A6
            + b5 * A4
            + b3 * A2
    

    Can you help me?

    opened by NooneBug 7
  • `_dist2plane` triggers codegen Warning

    Describe the bug / To reproduce: the following warning appears when I use Distance2PoincareHyperplanes as the classifier for torchvision.models.efficientnet_v2_s.

    ~/miniconda3/envs/a5000/lib/python3.10/site-packages/geoopt/manifolds/stereographic/math.py:1562: UserWarning: operator() profile_node %301 : int = prim::profile_ivalue(%299)
     does not have profile information (Triggered internally at  /opt/conda/conda-bld/pytorch_1656352645774/work/torch/csrc/jit/codegen/cuda/graph_fuser.cpp:104.)
      return _dist2plane(
    ~/miniconda3/envs/a5000/lib/python3.10/site-packages/geoopt/manifolds/stereographic/math.py:1562: UserWarning: FALLBACK path has been taken inside: compileCudaFusionGroup. This is an indication that codegen Failed for some reason.
    To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback`
    To report the issue, try enable logging via setting the envvariable ` export PYTORCH_JIT_LOG_LEVEL=manager.cpp`
     (Triggered internally at  /opt/conda/conda-bld/pytorch_1656352645774/work/torch/csrc/jit/codegen/cuda/manager.cpp:237.)
      return _dist2plane(
    

    Expected behavior: the warning appears during model training.

    Please complete the following information:

    • torch: 1.12.0, cuda: 11.3, geoopt 0.4.1
    • The way you installed geoopt: pip
    • OS: Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-109-generic x86_64)

    Additional context: none.

    bug 
    opened by jin-zhe 2
  • Manifold projection fails

    Describe the bug: with certain matrices, the result of projx under the Lorentz manifold fails _check_point_on_manifold.

    Steps to reproduce the behavior:

    • I provided an example problem tensor for this case
    • Call projx on the tensor with a k of 5.0
    • Check if the result is on the manifold

    Expected behavior: with float64 and a defined error tolerance, the on-manifold check should return True.
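
    A hedged reproduction sketch (the actual problem tensor from the report is not included here, so a random matrix stands in; tolerances are the defaults):

    import torch
    import geoopt

    manifold = geoopt.Lorentz(k=torch.tensor(5.0, dtype=torch.float64))
    x = torch.randn(4, 5, dtype=torch.float64)
    y = manifold.projx(x)
    ok, reason = manifold.check_point_on_manifold(y, explain=True)
    print(ok, reason)  # expected: True, None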


    bug 
    opened by inboxedshoe 4
  • Add mobius methods for Lorentz model

    Thanks for your useful library. However, I found that the Lorentz model does not implement any Möbius methods such as mobius_add() or mobius_matvec(). I hope you will add these methods to the library soon.

    enhancement 
    opened by IceIce1ce 1
  • Extended Lorentz

    following #142

    • [ ] mobius_add
    • [ ] mobius_matvec
    • [ ] mobius_scalar_mul
    • [ ] mobius_pointwise_mul
    • [ ] mobius_fn_apply
    • [ ] test new functions
    • [ ] re-review Rasul's implementation

    help wanted wip 
    opened by ferrine 1
  • Add mobius_add() and mobius_matvec() methods for Lorentz manifold

    Hi,

    Thanks for the useful tool you've developed; I really appreciate it. When using geoopt, I found that the mobius_add() and mobius_matvec() methods for the Lorentz manifold are missing, and there's an implementation at https://www.github.com/HazyResearch/hgcn. Could those methods be added to the package soon? Thanks a lot!

    enhancement 
    opened by martinwhl 1
Releases (v.0.5.1)
  • v.0.5.1(Nov 28, 2022)

    What's Changed

    • Update testing.yml by @ferrine in https://github.com/geoopt/geoopt/pull/198

    Full Changelog: https://github.com/geoopt/geoopt/compare/v0.5.0...v.0.5.1

  • v0.5.0(Jun 29, 2022)

    What's Changed

    • fix typos by @ferrine in https://github.com/geoopt/geoopt/pull/190
    • StereographicProductManifold to use gyrovector space functions in product manifolds by @gatoniel in https://github.com/geoopt/geoopt/pull/163
    • Seminar by @ferrine in https://github.com/geoopt/geoopt/pull/192

    New Contributors

    • @gatoniel made their first contribution in https://github.com/geoopt/geoopt/pull/163

    Full Changelog: https://github.com/geoopt/geoopt/compare/v0.4.1...v0.5.0

  • v0.4.1(Mar 15, 2022)

    What's Changed

    • add tests for pytorch 1.10.0 by @ferrine in https://github.com/geoopt/geoopt/pull/186
    • add a test, fix deepcopy and copy by @ferrine in https://github.com/geoopt/geoopt/pull/189

    Full Changelog: https://github.com/geoopt/geoopt/compare/v0.4.0...v0.4.1

  • v0.4.0(Sep 2, 2021)

    geoopt (0.4.0)

    New Features

    • new Symmetric Positive Definite manifold (#153)
    • new Siegel manifolds: Upper half model and Bounded domain model, with support for Riemannian and Finsler metrics (#179)

    Maintenance

    • create pull request templates (#154)
    • update tests for pytorch 1.9.0

    Bug Fixes

    • fix step increments in optimizers (#165)
  • v0.4.0rc1(Jul 1, 2021)

    geoopt (0.4.0rc1)

    New Features

    • new Symmetric Positive Definite manifold (#153)
    • new Siegel manifolds: Upper half model and Bounded domain model, with support for Riemannian and Finsler metrics (#179)

    Maintenance

    • create pull request templates (#154)
    • update tests for pytorch 1.9.0

    Bug Fixes

    • fix step increments in optimizers (#165)
  • v0.3.1(Oct 29, 2020)

  • v0.3.0(Oct 7, 2020)

    New Features

    • Riemannian Line Search (#140)
    • Per group stabilization (#149)

    Maintenance

    • Fix API warnings (mentioned in #148)
    • support torch >= 1.4.0
  • v0.2.0(Jun 12, 2020)

    geoopt v0.2.0

    New Features

    • BirkhoffPolytope (#125)
    • Lorentz Manifold (#121)
    • kappa-Stereographic model (#126)
    • Sparse optimizers (#130)

    Maintenance

    • Tests for pytorch>=1.4, cpuonly (#133)
  • v0.1.2(Nov 30, 2019)

    Bug Fixes

    • Fix scaling issues with random methods
    • Fix poincare methods cosub and norm that were not working properly
    • Fix Sphere distance for small values