Proximal Backpropagation - a neural network training algorithm that takes implicit instead of explicit gradient steps

Overview

Proximal Backpropagation

Proximal Backpropagation (ProxProp) is a neural network training algorithm that takes implicit instead of explicit gradient steps to update the network parameters. We have analyzed this algorithm in our ICLR 2018 paper:

Proximal Backpropagation (Thomas Frerix, Thomas Möllenhoff, Michael Moeller, Daniel Cremers; ICLR 2018) [https://arxiv.org/abs/1706.04638]

tl;dr

  • We provide a PyTorch implementation of ProxProp for Python 3 and PyTorch 1.0.1.
  • The results of our paper can be reproduced by executing the script paper_experiments.sh.
  • ProxProp is implemented as a torch.nn.Module (a 'layer') and can be combined with any other layer and first-order optimizer. While a ProxPropConv2d and a ProxPropLinear layer already exist, you can generate a ProxProp layer for your favorite linear layer with one line of code.

Installation

  1. Make sure you have a running Python 3 (tested with Python 3.7) ecosytem. We recommend that you use a conda install, as this is also the recommended option to get the latest PyTorch running. For this README and for the scripts, we assume that you have conda running with Python 3.7.
  2. Clone this repository and switch to the directory.
  3. Install the dependencies via conda install --file conda_requirements.txt and pip install -r pip_requirements.txt.
  4. Install PyTorch with magma support. We have tested our code with PyTorch 1.0.1 and CUDA 10.0. You can install this setup via
    conda install -c pytorch magma-cuda100
    conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
    
  5. (optional, but necessary to reproduce paper experiments) Download the CIFAR-10 dataset by executing get_data.sh

Training neural networks with ProxProp

ProxProp is implemented as a custom linear layer (torch.nn.Module) with its own backward pass to take implicit gradient steps on the network parameters. With this design choice it can be combined with any other layer, for which one takes explicit gradient steps. Furthermore, the resulting update direction can be used with any first-order optimizer that expects a suitable update direction in parameter space. In our paper we prove that ProxProp generates a descent direction and show experiments with Nesterov SGD and Adam.

You can use our pre-defined layers ProxPropConv2d and ProxPropLinear, corresponding to nn.Conv2d and nn.Linear, by importing

from ProxProp import ProxPropConv2d, ProxPropLinear

Besides the usual layer parameters, as detailed in the PyTorch docs, you can provide:

  • tau_prox: step size for a proximal step; default is tau_prox=1
  • optimization_mode: can be one of 'prox_exact', 'prox_cg{N}', 'gradient' for an exact proximal step, an approximate proximal step with N conjugate gradient steps and an explicit gradient step, respectively; default is optimization_mode='prox_cg1'. The 'gradient' mode is for a fair comparison with SGD, as it incurs the same overhead as the other methods in exploiting a generic implementation with the provided PyTorch API.

If you want to use ProxProp to optimize your favorite linear layer, you can generate the respective module with one line of code. As an example for the the Conv3d layer:

from ProxProp import proxprop_module_generator
ProxPropConv3d = proxprop_module_generator(torch.nn.Conv3d)

This gives you a default implementation for the approximate conjugate gradient solver, which treats all parameters as a stacked vector. If you want to use the exact solver or want to use the conjugate gradient solver more efficiently, you have to provide the respective reshaping methods to proxprop_module_generator, as this requires specific knowledge of the layer's structure and cannot be implemented generically. As a template, take a look at the ProxProp.py file, where we have done this for the ProxPropLinear layer.

By reusing the forward/backward implementations of existing PyTorch modules, ProxProp becomes readily accessible. However, we pay an overhead associated with generically constructing the backward pass using the PyTorch API. We have intentionally sided with genericity over speed.

Reproduce paper experiments

To reproduce the paper experiments execute the script paper_experiments.sh. This will run our paper's experiments, store the results in the directory paper_experiments/ and subsequently compile the results into the file paper_plots.pdf. We use an NVIDIA Titan X GPU; executing the script takes roughly 3 hours.

Acknowledgement

We want to thank Soumith Chintala for helping us track down a mysterious bug and the whole PyTorch dev team for their continued development effort and great support to the community.

Publication

If you use ProxProp, please acknowledge our paper by citing

@article{Frerix-et-al-18,
    title = {Proximal Backpropagation},
    author={Thomas Frerix, Thomas Möllenhoff, Michael Moeller, Daniel Cremers},
    journal={International Conference on Learning Representations},
    year={2018},
    url = {https://arxiv.org/abs/1706.04638}
}
You might also like...
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

codes for
codes for "Scheduled Sampling Based on Decoding Steps for Neural Machine Translation" (long paper of EMNLP-2022)

Scheduled Sampling Based on Decoding Steps for Neural Machine Translation (EMNLP-2021 main conference) Contents Overview Background Quick to Use Furth

pyhsmm - library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs), focusing on the Bayesian Nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.
A library for hidden semi-Markov models with explicit durations

hsmmlearn hsmmlearn is a library for unsupervised learning of hidden semi-Markov models with explicit durations. It is a port of the hsmm package for

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction.

TalkNet 2 [WIP] TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Predictio

Caffe-like explicit model constructor. C(onfig)Model

cmodel Caffe-like explicit model constructor. C(onfig)Model Installation pip install git+https://github.com/bonlime/cmodel Usage In order to allow usi

Using NumPy to solve the equations of fluid mechanics together with Finite Differences, explicit time stepping and Chorin's Projection methods
Using NumPy to solve the equations of fluid mechanics together with Finite Differences, explicit time stepping and Chorin's Projection methods

Computational Fluid Dynamics in Python Using NumPy to solve the equations of fluid mechanics 🌊 🌊 🌊 together with Finite Differences, explicit time

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

Deep Daze mist over green hills shattered plates on the grass cosmic love and attention a time traveler in the crowd life during the plague meditative

The first machine learning framework that encourages learning ML concepts instead of memorizing class functions.
The first machine learning framework that encourages learning ML concepts instead of memorizing class functions.

SeaLion is designed to teach today's aspiring ml-engineers the popular machine learning concepts of today in a way that gives both intuition and ways of application. We do this through concise algorithms that do the job in the least jargon possible and examples to guide you through every step of the way.

Owner
Thomas Frerix
Thomas Frerix
ppo_pytorch_cpp - an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

PPO Pytorch C++ This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment t

Martin Huber 59 Dec 9, 2022
An implementation of the proximal policy optimization algorithm

PPO Pytorch C++ This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment t

Martin Huber 59 Dec 9, 2022
Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)

scikit-opt Swarm Intelligence in Python (Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm, Immune Algorithm,A

郭飞 3.7k Jan 3, 2023
(SIGIR2020) “Asymmetric Tri-training for Debiasing Missing-Not-At-Random Explicit Feedback’’

Asymmetric Tri-training for Debiasing Missing-Not-At-Random Explicit Feedback About This repository accompanies the real-world experiments conducted i

yuta-saito 19 Dec 1, 2022
A PyTorch implementation for Unsupervised Domain Adaptation by Backpropagation(DANN), support Office-31 and Office-Home dataset

DANN A PyTorch implementation for Unsupervised Domain Adaptation by Backpropagation Prerequisites Linux or OSX NVIDIA GPU + CUDA (may CuDNN) and corre

null 8 Apr 16, 2022
Meta Learning Backpropagation And Improving It (VSML)

Meta Learning Backpropagation And Improving It (VSML) This is research code for the NeurIPS 2021 publication Kirsch & Schmidhuber 2021. Many concepts

Louis Kirsch 22 Dec 21, 2022
A PyTorch implementation of Learning to learn by gradient descent by gradient descent

Intro PyTorch implementation of Learning to learn by gradient descent by gradient descent. Run python main.py TODO Initial implementation Toy data LST

Ilya Kostrikov 300 Dec 11, 2022
Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286

Pytorch-DPPO Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286 Using PPO with clip loss (from https

Alexis David Jacq 163 Dec 26, 2022
ProMP: Proximal Meta-Policy Search

ProMP: Proximal Meta-Policy Search Implementations corresponding to ProMP (Rothfuss et al., 2018). Overall this repository consists of two branches: m

Jonas Rothfuss 212 Dec 20, 2022