Network-Distributed Algorithm Experiments

Experiments for distributed optimization algorithms

This repository contains a set of optimization algorithms and objective functions, and all code needed to reproduce experiments in:

  1. "DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization" [PDF]. (code is in this file [link])

  2. "Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction" [PDF]. (code is in the previous version of this repo [link])

Due to the random data generation procedure, the resulting plots may differ slightly from those that appear in the papers, but the conclusions remain the same.

If you find this code useful, please cite our papers:

@article{li2021destress,
  title={DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization},
  author={Li, Boyue and Li, Zhize and Chi, Yuejie},
  journal={arXiv preprint arXiv:2110.01165},
  year={2021}
}
@article{li2020communication,
  title={Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction},
  author={Li, Boyue and Cen, Shicong and Chen, Yuxin and Chi, Yuejie},
  journal={Journal of Machine Learning Research},
  volume={21},
  pages={1--51},
  year={2020}
}

Implemented objective functions

The gradient implementations of all objective functions are checked numerically.
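
For reference, a minimal sketch of such a check using central finite differences (illustrative only, not the repo's actual test code):

import numpy as np

def check_gradient(f, grad, w, eps=1e-6):
    """Compare an analytic gradient against central finite differences."""
    g_num = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = eps
        g_num[j] = (f(w + e) - f(w - e)) / (2 * eps)
    return np.max(np.abs(g_num - grad(w)))

# Example on the quadratic f(w) = ||w||^2 / 2, whose gradient is w.
w0 = np.random.randn(5)
print(check_gradient(lambda w: 0.5 * w @ w, lambda w: w, w0))  # close to machine precision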

Linear regression

Linear regression with randomly generated data. The objective function is $f(w) = \frac{1}{N} \sum_i (y_i - x_i^\top w)^2$.
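
A direct NumPy transcription of this objective and its gradient, for reference (variable names are illustrative, not taken from the repo):

import numpy as np

def linreg_loss(w, X, Y):
    # f(w) = (1/N) * sum_i (y_i - x_i^T w)^2
    return np.mean((Y - X @ w) ** 2)

def linreg_grad(w, X, Y):
    # grad f(w) = -(2/N) * X^T (Y - X w)
    return -2 * X.T @ (Y - X @ w) / len(Y)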

Logistic regression

Logistic regression with $\ell_2$ or nonconvex regularization, on randomly generated data, the Gisette dataset, or datasets from libsvmtools. The objective function is $$ f(w) = - \frac{1}{N} \Big(\sum_i y_i \log \frac{1}{1 + \exp(w^\top x_i)} + (1 - y_i) \log \frac{\exp(w^\top x_i)}{1 + \exp(w^\top x_i)} \Big) + \frac{\lambda}{2} \|w\|_2^2 + \alpha \sum_j \frac{w_j^2}{1 + w_j^2} $$
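
A sketch of this objective in NumPy, for reference (illustrative only; the repo's implementation may differ, e.g. in how it handles numerical stability):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logreg_loss(w, X, Y, lam=1e-3, alpha=0.0):
    # Follows the objective above: labels Y in {0, 1}, logits t_i = w^T x_i.
    t = X @ w
    ce = -np.mean(Y * np.log(sigmoid(-t)) + (1 - Y) * np.log(sigmoid(t)))
    l2 = 0.5 * lam * w @ w                   # convex l2 regularizer
    nc = alpha * np.sum(w**2 / (1 + w**2))   # nonconvex regularizer
    return ce + l2 + nc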

One-hidden-layer fully-connected neural network

One-hidden-layer fully-connected neural network with softmax loss on the MNIST dataset.
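
A schematic forward pass for such a model with a softmax cross-entropy loss (the sigmoid hidden activation and the absence of bias terms are assumptions here, not necessarily what the repo uses):

import numpy as np

def one_hidden_layer_loss(W1, W2, X, Y_onehot):
    # Hidden layer, then a linear output layer followed by softmax cross-entropy.
    H = 1.0 / (1.0 + np.exp(-X @ W1))            # (N, hidden)
    logits = H @ W2                              # (N, classes)
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(Y_onehot * log_probs, axis=1))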

Implemented optimization algorithms

Centralized optimization algorithms

  • Gradient descent
  • Stochastic gradient descent
  • Nesterov's accelerated gradient descent
  • SVRG
  • SARAH
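
For reference, a minimal single-machine sketch of the SARAH-style recursive gradient estimator that several of the methods in this document build on (a textbook version, not the repo's implementation):

import numpy as np

def sarah(w, grad_full, grad_sample, n_samples, lr=0.1, inner_steps=100,
          rng=np.random.default_rng(0)):
    """One outer iteration of SARAH: recursive stochastic gradient estimator."""
    v = grad_full(w)                  # full gradient at the anchor point
    w_prev = w.copy()
    w = w - lr * v
    for _ in range(inner_steps):
        i = rng.integers(n_samples)
        # Recursive update: v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1}
        v = grad_sample(w, i) - grad_sample(w_prev, i) + v
        w_prev, w = w, w - lr * v
    return w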

Distributed optimization algorithms (i.e., with a parameter server)

  • ADMM
  • DANE

Decentralized optimization algorithms

  • Decentralized gradient descent
  • Decentralized stochastic gradient descent
  • Decentralized gradient descent with gradient tracking
  • EXTRA
  • NIDS
  • Network-DANE/SARAH/SVRG
  • GT-SARAH
  • DESTRESS
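
For reference, a schematic sketch of the gradient-tracking update used by several of the decentralized methods above, written for n agents that share a doubly stochastic mixing matrix W (a textbook version; the repo's step sizes, initialization, and stochastic variants may differ):

import numpy as np

def decentralized_gradient_tracking(x, grads, W, lr=0.05, n_iters=100):
    """Decentralized gradient descent with gradient tracking.

    x:     (n_agents, dim) array of local iterates
    grads: callable mapping (n_agents, dim) -> (n_agents, dim) local gradients
    W:     (n_agents, n_agents) doubly stochastic mixing matrix
    """
    y = grads(x)                              # tracking variable, initialized to local gradients
    for _ in range(n_iters):
        x_new = W @ x - lr * y                # mix with neighbors, step along tracked gradient
        y = W @ y + grads(x_new) - grads(x)   # track the average gradient
        x = x_new
    return x
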
Comments
  • The algorithms (e.g. SGD, DSGD) always diverge

    Maybe there is some subtle bug in this code? I've also found this error:

    Process Process-8:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
        self.run()
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "/Users/true_nobility/Desktop/Network-Distributed-Algorithm-master/nda/experiment_utils/utils.py", line 21, in multi_process_helper
        log.info(f'{opt.get_name()} started')
    AttributeError: 'str' object has no attribute 'get_name'

    How could I solve this problem?
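
    The traceback suggests that a plain string (for example an optimizer's name) is reaching multi_process_helper where an optimizer object is expected, so opt.get_name() fails. A minimal sketch of the distinction, with a hypothetical Optimizer class standing in for the repo's optimizers:

    class Optimizer:
        """Hypothetical stand-in for the optimizer objects used in this repo."""
        def __init__(self, name):
            self.name = name

        def get_name(self):
            return self.name

    def multi_process_helper(opt):
        # Works only if `opt` is an optimizer instance, not its name.
        print(f'{opt.get_name()} started')

    multi_process_helper(Optimizer('DSGD'))  # OK
    multi_process_helper('DSGD')             # AttributeError: 'str' object has no attribute 'get_name'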

    opened by Forshining 9
  • cuda driver compatibility issue

    Hi,

    I encountered the following issue when running the code:

    (Randy) randy233@Randy-Desktop:/mnt/c/Users/Randy666/Documents/projects/ICON_lab/Network-Distributed-Algorithm/experiments/convex$ python logistic_regression.py
    INFO 10:47:30.9716 5965 utils.py:160] NumExpr defaulting to 6 threads.
    INFO 10:47:31.1224 5965 logistic_regression.py:63] Initializing using GPU
    Process Process-1:
    Traceback (most recent call last):
      File "/home/randy233/anaconda3/envs/Randy/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
        self.run()
      File "/home/randy233/anaconda3/envs/Randy/lib/python3.10/multiprocessing/process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "/mnt/c/Users/Randy666/Documents/projects/ICON_lab/Network-Distributed-Algorithm/nda/problems/logistic_regression.py", line 85, in _init
        self.cuda()
      File "/mnt/c/Users/Randy666/Documents/projects/ICON_lab/Network-Distributed-Algorithm/nda/problems/problem.py", line 93, in cuda
        self.__dict__[k] = xp.array(self.__dict__[k])
      File "/home/randy233/anaconda3/envs/Randy/lib/python3.10/site-packages/cupy/_creation/from_data.py", line 46, in array
        return _core.array(obj, dtype, copy, order, subok, ndmin)
      File "cupy/_core/core.pyx", line 2357, in cupy._core.core.array
      File "cupy/_core/core.pyx", line 2381, in cupy._core.core.array
      File "cupy/_core/core.pyx", line 2513, in cupy._core.core._array_default
      File "cupy/_core/core.pyx", line 136, in cupy._core.core.ndarray.__new__
      File "cupy/_core/core.pyx", line 224, in cupy._core.core._ndarray_base._init
      File "cupy/cuda/memory.pyx", line 742, in cupy.cuda.memory.alloc
      File "cupy/cuda/memory.pyx", line 1419, in cupy.cuda.memory.MemoryPool.malloc
      File "cupy/cuda/memory.pyx", line 1439, in cupy.cuda.memory.MemoryPool.malloc
      File "cupy/cuda/device.pyx", line 48, in cupy.cuda.device.get_device_id
      File "cupy_backends/cuda/api/runtime.pyx", line 178, in cupy_backends.cuda.api.runtime.getDevice
      File "cupy_backends/cuda/api/runtime.pyx", line 143, in cupy_backends.cuda.api.runtime.check_status
    cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInsufficientDriver: CUDA driver version is insufficient for CUDA runtime version
    ^CTraceback (most recent call last):
      File "/mnt/c/Users/Randy666/Documents/projects/ICON_lab/Network-Distributed-Algorithm/experiments/convex/logistic_regression.py", line 26, in <module>
        p = LogisticRegression(n_agent=n_agent, m=m, dim=dim, noise_ratio=0.05, graph_type='er', kappa=kappa, graph_params=0.3)
      File "/mnt/c/Users/Randy666/Documents/projects/ICON_lab/Network-Distributed-Algorithm/nda/problems/logistic_regression.py", line 68, in __init__
        norm = q.get()
      File "/home/randy233/anaconda3/envs/Randy/lib/python3.10/multiprocessing/queues.py", line 103, in get
        res = self._recv_bytes()
      File "/home/randy233/anaconda3/envs/Randy/lib/python3.10/multiprocessing/connection.py", line 221, in recv_bytes
        buf = self._recv_bytes(maxlength)
      File "/home/randy233/anaconda3/envs/Randy/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
        buf = self._recv(4)
      File "/home/randy233/anaconda3/envs/Randy/lib/python3.10/multiprocessing/connection.py", line 384, in _recv
        chunk = read(handle, remaining)
    KeyboardInterrupt
    

    I wonder how I should check my current CUDA version, and whether I should download the latest CUDA driver?
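
    For reference, cudaErrorInsufficientDriver generally means the installed NVIDIA driver is older than what the CUDA runtime bundled with CuPy requires, so updating the driver (or installing a CuPy build that matches the existing driver) is the usual fix; `nvidia-smi` also reports the driver version. A quick way to compare the versions CuPy sees (a generic diagnostic, not part of this repo):

    import cupy as cp

    # Both return integers such as 11070 for CUDA 11.7.
    print('driver version :', cp.cuda.runtime.driverGetVersion())
    print('runtime version:', cp.cuda.runtime.runtimeGetVersion())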

    opened by RandyChen233 3
  • how is the training data generated randomly in the base `Problem` class?

    Hi,

    I am a little confused about how the dataset in the base Problem class is randomly generated. In the code, I cannot find the definition of the following function, generate_data():

    
    if dataset == 'random':
        self.m_total = m * n_agent  # Total number of data samples of all agents
        self.generate_data()
    else:
    

    I wonder if this is a function provided by numpy or some other package?

    Thanks
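
    For reference, generate_data is presumably a method defined on the concrete problem subclasses (e.g. LinearRegression or LogisticRegression) rather than a NumPy function. A minimal sketch of what random data generation for logistic regression might look like; all names and details here are hypothetical:

    import numpy as np

    def generate_data(m_total, dim, noise_ratio=0.05, seed=0):
        # Hypothetical sketch: draw features, pick a ground-truth weight vector,
        # label by the sign of the logits, then flip a fraction of the labels.
        rng = np.random.default_rng(seed)
        X = rng.standard_normal((m_total, dim))
        w_star = rng.standard_normal(dim)
        Y = (X @ w_star > 0).astype(float)
        flip = rng.random(m_total) < noise_ratio
        Y[flip] = 1 - Y[flip]
        return X, Y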

    opened by RandyChen233 2
  • a question about the warning messages after running `linear_regression.py`

    Hi,

    The following warnings show up when linear_regression.py finishes running, and I wonder what they're referring to?

    
    WARNING 22:56:49.4068 22008 backend_ps.py:303] The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.
    WARNING 22:56:49.6691 22008 backend_ps.py:303] The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.
    WARNING 22:56:50.0759 22008 backend_ps.py:303] The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.
    WARNING 22:56:50.4188 22008 backend_ps.py:303] The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.
    WARNING 22:56:50.7460 22008 backend_ps.py:303] The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.
    WARNING 22:56:51.0289 22008 backend_ps.py:303] The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.
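
    For what it's worth, this is a standard Matplotlib message rather than something specific to this repo: the PostScript/EPS backend cannot render alpha transparency, so semi-transparent artists are flattened to opaque when figures are saved as .eps. Saving in a format that supports transparency avoids the warning, e.g.:

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(range(10), alpha=0.5, label='semi-transparent line')
    ax.legend()

    fig.savefig('figure.pdf')  # PDF and PNG keep transparency
    fig.savefig('figure.eps')  # EPS triggers the warning and draws the line opaque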
    
    opened by RandyChen233 1
  • bugs

    I found that when setting the batch_size of the DSGD algorithm to all samples, the convergence results are the same as those of DGD_tracking, but I don't know what is causing this. In theory, shouldn't DGD_tracking be faster than DSGD?

    opened by Plutosss 1