PEPit: Performance Estimation in Python
This open source Python library provides a generic way to use PEP framework in Python. Performance estimation problems were introduced in 2014 by Yoel Drori and Marc Teboulle, see [1]. PEPit is mainly based on the formalism and developments from [2, 3] by a subset of the authors of this toolbox. A friendly informal introduction to this formalism is available in this blog post and a corresponding Matlab library is presented in [4] (PESTO).
Website and documentation of PEPit: https://pepit.readthedocs.io/
Source Code (MIT): https://github.com/bgoujaud/PEPit
Using and citing the toolbox
This code comes jointly with the following reference
:
B. Goujaud, C. Moucer, F. Glineur, J. Hendrickx, A. Taylor, A. Dieuleveut (2022).
"PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python."
When using the toolbox in a project, please refer to this note via this Bibtex entry:
@article{pepit2022,
title={{PEPit}: computer-assisted worst-case analyses of first-order optimization methods in {P}ython},
author={Goujaud, Baptiste and Moucer, C\'eline and Glineur, Fran\c{c}ois and Hendrickx, Julien and Taylor, Adrien and Dieuleveut, Aymeric},
journal={arXiv preprint arXiv:2201.04040},
year={2022}
}
Demo
This notebook provides a demonstration of how to use PEPit to obtain a worst-case guarantee on a simple algorithm (gradient descent), and a more advanced analysis of three other examples.
Installation
The library has been tested on Linux and MacOSX. It relies on the following Python modules:
- Numpy
- Scipy
- Cvxpy
- Matplotlib (for the demo notebook)
Pip installation
You can install the toolbox through PyPI with:
pip install pepit
or get the very latest version by running:
pip install -U https://github.com/bgoujaud/PEPit/archive/master.zip # with --user for user install (no root)
Post installation check
After a correct installation, you should be able to import the module without errors:
import PEPit
Online environment
You can also try the package in this Binder repository.
Example
The folder Examples contains numerous introductory examples to the toolbox.
Among the other examples, the following code (see GradientMethod
) generates a worst-case scenario for iterations of the gradient method, applied to the minimization of a smooth (possibly strongly) convex function f(x). More precisely, this code snippet allows computing the worst-case value of when is generated by gradient descent, and when .
from PEPit import PEP
from PEPit.functions import SmoothStronglyConvexFunction
def wc_gradient_descent(L, gamma, n, verbose=True):
"""
Consider the convex minimization problem
.. math:: f_\\star \\triangleq \\min_x f(x),
where :math:`f` is :math:`L`-smooth and convex.
This code computes a worst-case guarantee for **gradient descent** with fixed step-size :math:`\\gamma`.
That is, it computes the smallest possible :math:`\\tau(n, L, \\gamma)` such that the guarantee
.. math:: f(x_n) - f_\\star \\leqslant \\tau(n, L, \\gamma) || x_0 - x_\\star ||^2
is valid, where :math:`x_n` is the output of gradient descent with fixed step-size :math:`\\gamma`, and
where :math:`x_\\star` is a minimizer of :math:`f`.
In short, for given values of :math:`n`, :math:`L`, and :math:`\\gamma`, :math:`\\tau(n, L, \\gamma)` is computed as the worst-case
value of :math:`f(x_n)-f_\\star` when :math:`||x_0 - x_\\star||^2 \\leqslant 1`.
**Algorithm**:
Gradient descent is described by
.. math:: x_{t+1} = x_t - \\gamma \\nabla f(x_t),
where :math:`\\gamma` is a step-size.
**Theoretical guarantee**:
When :math:`\\gamma \\leqslant \\frac{1}{L}`, the **tight** theoretical guarantee can be found in [1, Theorem 1]:
.. math:: f(x_n)-f_\\star \\leqslant \\frac{L||x_0-x_\\star||^2}{4nL\\gamma+2},
which is tight on some Huber loss functions.
**References**:
`[1] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel
approach. Mathematical Programming 145(1–2), 451–482.
<https://arxiv.org/pdf/1206.3209.pdf>`_
Args:
L (float): the smoothness parameter.
gamma (float): step-size.
n (int): number of iterations.
verbose (bool): if True, print conclusion
Returns:
pepit_tau (float): worst-case value
theoretical_tau (float): theoretical value
Example:
>>> L = 3
>>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=1 / L, n=4, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
function 1 : 30 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: MOSEK); optimal value: 0.16666666497937685
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
PEPit guarantee: f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
Theoretical guarantee: f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
"""
# Instantiate PEP
problem = PEP()
# Declare a strongly convex smooth function
func = problem.declare_function(SmoothStronglyConvexFunction, param={'mu': 0, 'L': L})
# Start by defining its unique optimal point xs = x_* and corresponding function value fs = f_*
xs = func.stationary_point()
fs = func.value(xs)
# Then define the starting point x0 of the algorithm
x0 = problem.set_initial_point()
# Set the initial constraint that is the distance between x0 and x^*
problem.set_initial_condition((x0 - xs) ** 2 <= 1)
# Run n steps of the GD method
x = x0
for _ in range(n):
x = x - gamma * func.gradient(x)
# Set the performance metric to the function values accuracy
problem.set_performance_metric(func.value(x) - fs)
# Solve the PEP
pepit_tau = problem.solve(verbose=verbose)
# Compute theoretical guarantee (for comparison)
theoretical_tau = L / (2 * (2 * n * L * gamma + 1))
# Print conclusion if required
if verbose:
print('*** Example file: worst-case performance of gradient descent with fixed step-sizes ***')
print('\tPEPit guarantee:\t\t f(x_n)-f_* <= {:.6} ||x_0 - x_*||^2'.format(pepit_tau))
print('\tTheoretical guarantee:\t f(x_n)-f_* <= {:.6} ||x_0 - x_*||^2'.format(theoretical_tau))
# Return the worst-case guarantee of the evaluated method (and the reference theoretical value)
return pepit_tau, theoretical_tau
if __name__ == "__main__":
L = 3
pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=1 / L, n=4, verbose=True)
Included tools
A lot of common optimization methods can be studied through this framework, using numerous steps and under a large variety of function / operator classes.
PEPit provides the following steps (often referred to as "oracles"):
- Inexact gradient step
- Exact line-search step
- Proximal step
- Inexact proximal step
- Bregman gradient step
- Bregman proximal step
- Linear optimization step
PEPit provides the following function classes CNIs:
- Convex
- Strongly convex
- Smooth
- Convex and smooth
- Strongly convex and smooth
- Convex and Lipschitz continuous
- Convex indicator
PEPit provides the following operator classes CNIs:
- Monotone
- Strongly monotone
- Lipschitz continuous
- Strongly monotone and Lipschitz continuous
- Cocoercive
Authors
This toolbox has been created by
- Baptiste Goujaud (main contributor #1)
- Céline Moucer (main contributor #2)
- Julien Hendrickx (project supervision)
- François Glineur (project supervision)
- Adrien Taylor (contributor & main project supervision)
- Aymeric Dieuleveut (contributor & main project supervision)
Acknowledgments
The authors would like to thank Rémi Flamary for his feedbacks on preliminary versions of the toolbox, as well as for support regarding the continuous integration.
Contributions
All external contributions are welcome. Please read the contribution guidelines.
References
[1] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming 145(1–2), 451–482.
[2] A. Taylor, J. Hendrickx, F. Glineur (2017). Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1-2), 307-345.
[3] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.
[4] A. Taylor, J. Hendrickx, F. Glineur (2017). Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In 56th IEEE Conference on Decision and Control (CDC).
[5] A. d’Aspremont, D. Scieur, A. Taylor (2021). Acceleration Methods. Foundations and Trends in Optimization: Vol. 5, No. 1-2.
[6] O. Güler (1992). New proximal point algorithms for convex minimization. SIAM Journal on Optimization, 2(4):649–664.
[7] Y. Drori (2017). The exact information-based complexity of smooth convex minimization. Journal of Complexity, 39, 1-16.
[8] E. De Klerk, F. Glineur, A. Taylor (2017). On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optimization Letters, 11(7), 1185-1199.
[9] B.T. Polyak (1964). Some methods of speeding up the convergence of iteration method. URSS Computational Mathematics and Mathematical Physics.
[10] E. Ghadimi, H. R. Feyzmahdavian, M. Johansson (2015). Global convergence of the Heavy-ball method for convex optimization. European Control Conference (ECC).
[11] E. De Klerk, F. Glineur, A. Taylor (2020). Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM Journal on Optimization, 30(3), 2053-2082.
[12] O. Gannot (2021). A frequency-domain analysis of inexact gradient methods. Mathematical Programming.
[13] D. Kim, J. Fessler (2016). Optimized first-order methods for smooth convex minimization. Mathematical Programming 159.1-2: 81-107.
[14] S. Cyrus, B. Hu, B. Van Scoy, L. Lessard (2018). A robust accelerated optimization algorithm for strongly convex functions. American Control Conference (ACC).
[15] Y. Nesterov (2003). Introductory lectures on convex optimization: A basic course. Springer Science & Business Media.
[16] S. Boyd, L. Xiao, A. Mutapcic (2003). Subgradient Methods (lecture notes).
[17] Y. Drori, M. Teboulle (2016). An optimal variant of Kelley's cutting-plane method. Mathematical Programming, 160(1), 321-351.
[18] Van Scoy, B., Freeman, R. A., Lynch, K. M. (2018). The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Letters, 2(1), 49-54.
[19] P. Patrinos, L. Stella, A. Bemporad (2014). Douglas-Rachford splitting: Complexity estimates and accelerated variants. In 53rd IEEE Conference on Decision and Control (CDC).
[20] Y. Censor, S.A. Zenios (1992). Proximal minimization algorithm with D-functions. Journal of Optimization Theory and Applications, 73(3), 451-464.
[21] E. Ryu, S. Boyd (2016). A primer on monotone operator methods. Applied and Computational Mathematics 15(1), 3-43.
[22] E. Ryu, A. Taylor, C. Bergeling, P. Giselsson (2020). Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3), 2251-2271.
[23] P. Giselsson, and S. Boyd (2016). Linear convergence and metric selection in Douglas-Rachford splitting and ADMM. IEEE Transactions on Automatic Control, 62(2), 532-544.
[24] M .Frank, P. Wolfe (1956). An algorithm for quadratic programming. Naval research logistics quarterly, 3(1-2), 95-110.
[25] M. Jaggi (2013). Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In 30th International Conference on Machine Learning (ICML).
[26] A. Auslender, M. Teboulle (2006). Interior gradient and proximal methods for convex and conic optimization. SIAM Journal on Optimization 16.3 (2006): 697-725.
[27] H.H. Bauschke, J. Bolte, M. Teboulle (2017). A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications. Mathematics of Operations Research, 2017, vol. 42, no 2, p. 330-348
[28] R. Dragomir, A. Taylor, A. d’Aspremont, J. Bolte (2021). Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 1-43.
[29] A. Taylor, J. Hendrickx, F. Glineur (2018). Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. Journal of Optimization Theory and Applications, 178(2), 455-476.
[30] B. Polyak (1987). Introduction to Optimization. Optimization Software New York.
[31] L. Lessard, B. Recht, A. Packard (2016). Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization 26(1), 57–95.
[32] D. Davis, W. Yin (2017). A three-operator splitting scheme and its optimization applications. Set-valued and variational analysis, 25(4), 829-858.
[33] Taylor, A. B. (2017). Convex interpolation and performance estimation of first-order methods for convex optimization. PhD Thesis, UCLouvain.
[34] H. Abbaszadehpeivasti, E. de Klerk, M. Zamani (2021). The exact worst-case convergence rate of the gradient method with fixed step lengths for L-smooth functions. arXiv 2104.05468.
[35] J. Bolte, S. Sabach, M. Teboulle, Y. Vaisbourd (2018). First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM Journal on Optimization, 28(3), 2131-2151.
[36] A. Defazio (2016). A simple practical accelerated method for finite sums. Advances in Neural Information Processing Systems (NIPS), 29, 676-684.
[37] A. Defazio, F. Bach, S. Lacoste-Julien (2014). SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems (NIPS).
[38] B. Hu, P. Seiler, L. Lessard (2020). Analysis of biased stochastic gradient descent using sequential semidefinite programs. Mathematical programming (to appear).
[39] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).
[40] D. Kim (2021). Accelerated proximal point method for maximally monotone operators. Mathematical Programming, 1-31.
[41] W. Moursi, L. Vandenberghe (2019). Douglas–Rachford Splitting for the Sum of a Lipschitz Continuous and a Strongly Monotone Operator. Journal of Optimization Theory and Applications 183, 179–198.
[42] G. Gu, J. Yang (2020). Tight sublinear convergence rate of the proximal point algorithm for maximal monotone inclusion problem. SIAM Journal on Optimization, 30(3), 1905-1921.
[43] F. Lieder (2021). On the convergence rate of the Halpern-iteration. Optimization Letters, 15(2), 405-418.
[44] F. Lieder (2018). Projection Based Methods for Conic Linear Programming Optimal First Order Complexities and Norm Constrained Quasi Newton Methods. PhD thesis, HHU Düsseldorf.
[45] Y. Nesterov (1983). A method for solving the convex programming problem with convergence rate :math:O(1/k^2)
. In Dokl. akad. nauk Sssr (Vol. 269, pp. 543-547).
[46] N. Bansal, A. Gupta (2019). Potential-function proofs for gradient methods. Theory of Computing, 15(1), 1-32.
[47] M. Barre, A. Taylor, F. Bach (2021). A note on approximate accelerated forward-backward methods with absolute and relative errors, and possibly strongly convex objectives. arXiv:2106.15536v2.
[48] J. Eckstein and W. Yao (2018). Relative-error approximate versions of Douglas–Rachford splitting and special cases of the ADMM. Mathematical Programming, 170(2), 417-444.
[49] M. Barré, A. Taylor, A. d’Aspremont (2020). Complexity guarantees for Polyak steps with momentum. In Conference on Learning Theory (COLT).
[50] D. Kim, J. Fessler (2017). On the convergence analysis of the optimized gradient method. Journal of Optimization Theory and Applications, 172(1), 187-205.
[51] Steven Diamond and Stephen Boyd (2016). CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research (JMLR) 17.83.1--5 (2016).
[52] Agrawal, Akshay and Verschueren, Robin and Diamond, Steven and Boyd, Stephen (2018). A rewriting system for convex optimization problems. Journal of Control and Decision (JCD) 5.1.42--60 (2018).
[53] Adrien Taylor, Bryan Van Scoy, Laurent Lessard (2018). Lyapunov Functions for First-Order Methods: Tight Automated Convergence Guarantees. International Conference on Machine Learning (ICML).
[54] B. Goujaud, C. Moucer, F. Glineur, J. Hendrickx, A. Taylor, A. Dieuleveut (2022). PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python.