Network-Distributed Algorithm Experiments
--
This repository contains a set of optimization algorithms and objective functions, together with all the code needed to reproduce the experiments in:
- "DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization" [PDF] (code is in this file [link])
- "Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction" [PDF] (code is in the previous version of this repo [link])
Due to the random data generation procedure, the resulting figures may differ slightly from those that appear in the papers, but the conclusions remain the same.
If you find this code useful, please cite our papers:
@article{li2021destress,
  title={DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization},
  author={Li, Boyue and Li, Zhize and Chi, Yuejie},
  journal={arXiv preprint arXiv:2110.01165},
  year={2021}
}

@article{li2020communication,
  title={Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction},
  author={Li, Boyue and Cen, Shicong and Chen, Yuxin and Chi, Yuejie},
  journal={Journal of Machine Learning Research},
  volume={21},
  pages={1--51},
  year={2020}
}
Implemented objective functions
The gradient implementations of all objective functions are checked numerically.
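For illustration, such a check can be done with central finite differences. The sketch below is not the repository's actual test code; `check_gradient` and its arguments are hypothetical names.

```python
import numpy as np

def check_gradient(f, grad, w, eps=1e-6, tol=1e-4):
    """Compare an analytic gradient `grad` of `f` against central finite differences."""
    g_analytic = grad(w)
    g_numeric = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = eps
        g_numeric[j] = (f(w + e) - f(w - e)) / (2.0 * eps)
    rel_err = np.linalg.norm(g_analytic - g_numeric) / max(np.linalg.norm(g_numeric), 1e-12)
    return rel_err < tol

# Quick self-test on a quadratic with a known gradient
w0 = np.random.default_rng(0).standard_normal(5)
assert check_gradient(lambda v: 0.5 * np.sum(v ** 2), lambda v: v, w0)
```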
Linear regression
Linear regression with randomly generated data. The objective function is $f(w) = \frac{1}{N} \sum_i (y_i - x_i^\top w)^2$.
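A minimal NumPy sketch of this objective and its gradient on randomly generated data; the names `linreg_loss` and `linreg_grad` are illustrative, not the repository's API.

```python
import numpy as np

def linreg_loss(w, X, y):
    """f(w) = (1/N) * sum_i (y_i - x_i^T w)^2 on data (X, y)."""
    return np.mean((y - X @ w) ** 2)

def linreg_grad(w, X, y):
    """Gradient of linreg_loss: -(2/N) * X^T (y - X w)."""
    return -2.0 * X.T @ (y - X @ w) / len(y)

# Randomly generated data, roughly in the spirit of the experiments
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(100)
print(linreg_loss(np.zeros(10), X, y), linreg_grad(np.zeros(10), X, y).shape)
```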
Logistic regression
Logistic regression with $\ell_2$ or nonconvex regularization, on randomly generated data, the Gisette dataset, or datasets from libsvmtools.
The objective function is $$ f(w) = - \frac{1}{N} \sum_i \Big( y_i \log \frac{1}{1 + \exp(w^\top x_i)} + (1 - y_i) \log \frac{\exp(w^\top x_i)}{1 + \exp(w^\top x_i)} \Big) + \frac{\lambda}{2} \| w \|_2^2 + \alpha \sum_j \frac{w_j^2}{1 + w_j^2} $$
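A minimal NumPy sketch of this objective and its gradient under the label convention $y_i \in \{0, 1\}$ used above; the function names and default regularization weights are illustrative, not the repository's API.

```python
import numpy as np

def logreg_loss(w, X, y, lam=1e-3, alpha=1e-3):
    """Objective above with labels y in {0, 1}; lam (l2) and alpha (nonconvex) are illustrative."""
    z = X @ w
    log1pexp = np.logaddexp(0.0, z)                       # log(1 + exp(w^T x_i)), numerically stable
    loglik = y * (-log1pexp) + (1.0 - y) * (z - log1pexp)
    reg = 0.5 * lam * (w @ w) + alpha * np.sum(w ** 2 / (1.0 + w ** 2))
    return -np.mean(loglik) + reg

def logreg_grad(w, X, y, lam=1e-3, alpha=1e-3):
    """Gradient of logreg_loss; it can be verified with the numerical check above."""
    sigma = 1.0 / (1.0 + np.exp(-(X @ w)))                # sigmoid(w^T x_i)
    return (X.T @ (sigma + y - 1.0) / len(y)
            + lam * w + 2.0 * alpha * w / (1.0 + w ** 2) ** 2)
```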
One-hidden-layer fully-connected neural network
One-hidden-layer fully-connected neural network with softmax loss on the MNIST dataset.
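A minimal sketch of the corresponding forward pass and softmax loss, assuming a ReLU hidden layer and illustrative parameter shapes; the repository's actual network and activation may differ.

```python
import numpy as np

def one_hidden_layer_loss(params, X, labels):
    """Softmax cross-entropy of a one-hidden-layer fully-connected network.
    params = (W1, b1, W2, b2); the ReLU activation is an assumption."""
    W1, b1, W2, b2 = params
    h = np.maximum(X @ W1 + b1, 0.0)                      # hidden layer
    logits = h @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

# MNIST-style shapes: 784 features, 10 classes; the hidden width is chosen arbitrarily
rng = np.random.default_rng(0)
params = (rng.standard_normal((784, 64)) * 0.01, np.zeros(64),
          rng.standard_normal((64, 10)) * 0.01, np.zeros(10))
print(one_hidden_layer_loss(params, rng.standard_normal((32, 784)), rng.integers(0, 10, 32)))
```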
Implemented optimization algorithms
Centralized optimization algorithms
- Gradient descent
- Stochastic gradient descent
- Nesterov's accelerated gradient descent
- SVRG (a minimal sketch follows this list)
- SARAH
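As a concrete example from this list, here is a minimal SVRG sketch; the interface, built around a per-component gradient oracle `grad_i`, is an assumption for illustration and not the repository's API.

```python
import numpy as np

def svrg(grad_i, w0, n, step=0.1, epochs=10, inner=None, seed=0):
    """Minimal SVRG loop: grad_i(w, i) returns the gradient of the i-th component function."""
    rng = np.random.default_rng(seed)
    inner = n if inner is None else inner
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        mu = sum(grad_i(w_snap, i) for i in range(n)) / n   # full gradient at the snapshot
        for _ in range(inner):
            i = rng.integers(n)
            v = grad_i(w, i) - grad_i(w_snap, i) + mu       # variance-reduced stochastic gradient
            w = w - step * v
    return w
```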
Distributed optimization algorithms (i.e., with a parameter server)
- ADMM
- DANE
Decentralized optimization algorithms
- Decentralized gradient descent
- Decentralized stochastic gradient descent
- Decentralized gradient descent with gradient tracking (a sketch follows this list)
- EXTRA
- NIDS
- Network-DANE/SARAH/SVRG
- GT-SARAH
- DESTRESS
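As referenced above, here is a minimal sketch of decentralized gradient descent with gradient tracking: each agent mixes its iterate with its neighbors through a doubly stochastic matrix `W` and descends along a tracker `s` that estimates the network-average gradient. The names and interface are assumptions, not the repository's API.

```python
import numpy as np

def gradient_tracking(grads, x0, W, step=0.05, iters=200):
    """Decentralized GD with gradient tracking.
    grads: per-agent gradient functions; W: doubly stochastic mixing matrix;
    x0: (n_agents, dim) array of initial local iterates."""
    x = x0.copy()
    g = np.stack([grads[i](x[i]) for i in range(len(grads))])   # local gradients
    s = g.copy()                                                # trackers of the average gradient
    for _ in range(iters):
        x = W @ x - step * s                                    # mix with neighbors, then descend
        g_new = np.stack([grads[i](x[i]) for i in range(len(grads))])
        s = W @ s + g_new - g                                   # gradient-tracking update
        g = g_new
    return x.mean(axis=0)                                       # average of the local iterates
```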