FedML: A Research Library and Benchmark for Federated Machine Learning

FedML-AI

Last update: Jan 8, 2023

Overview

📄 https://arxiv.org/abs/2007.13518

News

2021-02-01 (Award): #NeurIPS 2020# FedML won Best Paper Award at NeurIPS Federated Learning workshop 2020

2020-12-12 (Conference Presentation): #NeurIPS 2020# We gave a contributed talk at NeurIPS 2020. Please check the video here: https://www.youtube.com/watch?v=93SETZGZMyI

The slides of this presentation are also open source (yes, we open source everything of FedML!): https://docs.google.com/presentation/d/1ykAQ_GpzEoRVJeMr1hXUHlJSBpzW2P6Wuzud8RTF0oo/edit#slide=id.p

2020-11-27 (System): FedML architecture has evolved into an ecosystem including multiple GitHub repositories. With FedML at its core, we can support more advanced FL applications and platforms.
FedML: https://github.com/FedML-AI/FedML

FedNLP: https://github.com/FedML-AI/FedNLP (We plan to release at the end of March 2021. Please stay tuned!)

FedML-IoT: https://github.com/FedML-AI/FedML-IoT

FedML-Server: https://github.com/FedML-AI/FedML-Server

FedML-Mobile: https://github.com/FedML-AI/FedML-Mobile

2020-11-24 (Publication): We are thrilled to share that the short version of our FedML white paper has been accepted to a NeurIPS 2020 workshop. Thanks to the NeurIPS reviewers for supporting our presentation there.

2020-11-05 (System): Do you want to run federated learning on IoT devices? FedML's architecture design lets the distributed computing code be ported smoothly to IoT platforms. FedML supports edge training on two IoT devices: Raspberry Pi 4 and NVIDIA Jetson Nano!!! Please check it out here: https://github.com/FedML-AI/FedML/blob/master/fedml_iot/README.md

2020-10-28 (Algorithms): We released more advanced federated optimization algorithms, more than just FedAvg! http://doc.fedml.ai/#/algorithm-reference-implementation

2020-10-26 (Publication): V2 of our white paper is released. Please check it out here: https://arxiv.org/pdf/2007.13518.pdf

2020-10-07 (Model and Dataset): Datasets + Models ALL IN ONE!!! FedML supports comprehensive research-oriented FL datasets and models:

cross-device CV: Federated EMNIST + CNN (2 conv layers)
cross-device CV: CIFAR100 + ResNet18 (Group Normalization)
cross-device NLP: shakespeare + RNN (bi-LSTM)
cross-device NLP: stackoverflow (NWP) + RNN (bi-LSTM)
cross-silo CV: CIFAR10, CIFAR100, CINIC10 + ResNet
cross-silo CV: CIFAR10, CIFAR100, CINIC10 + MobileNet
linear: MNIST + Logistic Regression

Please check create_model(args, model_name, output_dim) and load_data(args, dataset_name) at fedml_experiments/distributed/fedavg/main_fedavg.py for details.
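
A function like create_model(args, model_name, output_dim) can be pictured as a lookup from a model name to a constructor. The sketch below only illustrates that pattern and is not the code in main_fedavg.py: the registry, the build_* helpers, and the returned dictionaries are hypothetical stand-ins for real model classes.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for model constructors; real code would return
# framework model objects instead of plain dictionaries.
def build_logistic_regression(output_dim: int) -> dict:
    return {"name": "lr", "output_dim": output_dim}


def build_cnn(output_dim: int) -> dict:
    return {"name": "cnn", "conv_layers": 2, "output_dim": output_dim}


MODEL_REGISTRY: Dict[str, Callable[[int], dict]] = {
    "lr": build_logistic_regression,
    "cnn": build_cnn,
}


def create_model_sketch(model_name: str, output_dim: int) -> dict:
    """Look up a constructor by name and fail loudly on unknown names."""
    try:
        builder = MODEL_REGISTRY[model_name]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"unknown model {model_name!r}; expected one of: {known}")
    return builder(output_dim)


model = create_model_sketch("cnn", output_dim=62)
```

Keeping the mapping in one table means a new model only needs one extra entry, and a mistyped name produces a clear error instead of a silent fallback.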

We will support more advanced models and datasets, please stay tuned!

2020-09-30 (Publication): We maintain a comprehensive publication list on Federated Learning here: https://github.com/chaoyanghe/Awesome-Federated-Learning

2020-09-28 (Publication): Authors of FedML (https://fedml.ai) have 7 papers that got accepted to NeurIPS 2020. Big congratulations!!! Here is the publication list: https://github.com/FedML-AI/FedML/blob/master/publications.md. Highlighted ones are related to large-scale distributed learning and federated learning.

What is Federated Learning?

Please read this long vision paper Advances and Open Problems in Federated Learning.

This publication list is also helpful: https://github.com/chaoyanghe/Awesome-Federated-Learning

Introduction

Federated learning is a rapidly growing research field in the machine learning domain. Although considerable research efforts have been made, existing libraries cannot adequately support diverse algorithmic development (e.g., diverse topology and flexible message exchange), and inconsistent dataset and model usage in experiments make fair comparisons difficult. In this work, we introduce FedML, an open research library and benchmark that facilitates the development of new federated learning algorithms and fair performance comparisons. FedML supports three computing paradigms (distributed training, mobile on-device training, and standalone simulation) for users to conduct experiments in different system environments. FedML also promotes diverse algorithmic research with flexible and generic API design and reference baseline implementations. A curated and comprehensive benchmark dataset for the non-I.I.D setting aims at making a fair comparison. We believe FedML can provide an efficient and reproducible means of developing and evaluating algorithms for the federated learning research community. We maintain the source code, documents, and user community at https://FedML.ai.

For more details, please read our full paper: https://arxiv.org/abs/2007.13518

Usage

Research on FL algorithms or systems
Teaching in an ML course
System prototyping for industrial production
Self-study of FL: understanding code-level details of FL algorithms

Architecture

The functionality of each package is as follows:

fedml_core: The FedML low-level API package. This package implements distributed computing on top of a communication backend such as MPI and also supports topology management. Other low-level APIs related to security and privacy are also supported.

fedml_api: The FedML high-level API package. This package supports different federated learning algorithms with only one line of code. All algorithms are built on the "fedml_core" package. Users can modify this package to add more advanced algorithms.

fedml_experiments: This package is used to test algorithms in the "fedml" package by calling high-level APIs.

fedml_mobile: This package is used to support on-device training using Android/iOS smartphones.

fedml_IoT: This package is used to support on-device training using IoT devices.

applications: This package is a collection of applications based on FedML.

benchmark: This package is used to run benchmark experiments.
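
To make concrete what an algorithm in the high-level API has to do, the sketch below shows the aggregation step of FedAvg: the server averages client weights, weighting each client by its number of training samples. It is a minimal pure-Python illustration under stated assumptions, not FedML's implementation; fedavg_aggregate and the dictionary-of-lists weight format are hypothetical.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]  # parameter name -> flat list of values


def fedavg_aggregate(updates: List[Tuple[int, Weights]]) -> Weights:
    """Average client weights, weighting each client by its sample count."""
    total_samples = sum(num_samples for num_samples, _ in updates)
    if total_samples == 0:
        raise ValueError("no training samples reported by clients")
    first_weights = updates[0][1]
    aggregated: Weights = {}
    for name, values in first_weights.items():
        acc = [0.0] * len(values)
        for num_samples, weights in updates:
            share = num_samples / total_samples
            for i, v in enumerate(weights[name]):
                acc[i] += share * v
        aggregated[name] = acc
    return aggregated


# Two clients: the second holds three times as much data as the first.
client_updates = [
    (10, {"w": [1.0, 2.0], "b": [0.0]}),
    (30, {"w": [3.0, 6.0], "b": [4.0]}),
]
global_weights = fedavg_aggregate(client_updates)
```

In this example the second client contributes 75% of each averaged value because it reports 30 of the 40 samples.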

Join our Community

Please join our community. We will post updated features and answer questions on Slack.

Join fedml.slack.com (this is a link that never expires)

Citation

Please cite FedML in your publications if it helps your research:

@article{chaoyanghe2020fedml,
  Author = {He, Chaoyang and Li, Songze and So, Jinhyun and Zhang, Mi and Wang, Hongyi and Wang, Xiaoyang and Vepakomma, Praneeth and Singh, Abhishek and Qiu, Hang and Shen, Li and Zhao, Peilin and Kang, Yan and Liu, Yang and Raskar, Ramesh and Yang, Qiang and Annavaram, Murali and Avestimehr, Salman},
  Journal = {arXiv preprint arXiv:2007.13518},
  Title = {FedML: A Research Library and Benchmark for Federated Machine Learning},
  Year = {2020}
}

Contacts

The corresponding author is:

Chaoyang He
[email protected]
http://chaoyanghe.com

Comments

FedAvg accuracy stuck under 50%

I am training FedAvg to reproduce the benchmark accuracy with the given parameters, but the accuracy is stuck under 50%.

Here is all my code:

!git clone https://github.com/FedML-AI/FedML

cd /content/FedML/fedml_experiments/standalone/fedavg

!python main_fedavg.py --model mobilenet --dataset cifar10 --data_dir ./../../../data/cifar10 --partition_method hetero --comm_round 100 --epochs 20 --batch_size 64 --lr 0.001

I expected to get at least 80% accuracy according to these benchmark results.

https://wandb.ai/automl/fedml/runs/390hdz0e
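
For context, the --partition_method hetero flag in the command above asks for a non-I.I.D. split of the training data across clients. How FedML implements it is not shown in this thread; a common approach, assumed in this sketch, is to draw per-class client proportions from a Dirichlet distribution, where a smaller alpha gives more skewed clients. The function name and parameters are hypothetical.

```python
import random
from typing import Dict, List


def dirichlet_partition(labels: List[int], num_clients: int, alpha: float,
                        seed: int = 0) -> Dict[int, List[int]]:
    """Assign sample indices to clients with per-class Dirichlet proportions."""
    rng = random.Random(seed)
    clients: Dict[int, List[int]] = {c: [] for c in range(num_clients)}
    for cls in sorted(set(labels)):
        idxs = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalized.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        start = 0
        cumulative = 0.0
        for c in range(num_clients):
            cumulative += draws[c] / total
            # The last client takes whatever remains so no index is dropped.
            end = len(idxs) if c == num_clients - 1 else round(cumulative * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


labels = [i % 10 for i in range(1000)]  # toy labels: 10 balanced classes
parts = dirichlet_partition(labels, num_clients=4, alpha=0.5)
sizes = {c: len(ix) for c, ix in parts.items()}
```

With a skewed split like this, some clients see only a few classes, which is one reason accuracy under a heterogeneous partition can differ noticeably from the homogeneous case.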

opened by AbdulMoqeet 39

About the args in FedML parrot examples.

From step 3 in the docs https://doc.fedml.ai/simulation/examples/sp_fedavg_mnist_lr_example.html, I am told to execute the following command to run the example code:

python torch_fedavg_mnist_lr_one_line_example.py --cf fedml_config.yaml

However, when I modified the args in this YAML file (e.g., set using_gpu to true), I found that at runtime the training still ran on the CPU. So I checked the code in fedml/lib/python3.7/site-packages/fedml/arguments.py, line 63, and found the following snippet:

path_current_file = path.abspath(path.dirname(__file__))
if training_type == "simulation" and comm_backend == "single_process":
    config_file = path.join(path_current_file, "config/simulation_sp/fedml_config.yaml")
    cmd_args.yaml_config_file = config_file
elif training_type == "simulation" and comm_backend == "MPI":
    config_file = path.join(
        path_current_file, "config/simulaton_mpi/fedml_config.yaml"
    )
    cmd_args.yaml_config_file = config_file
elif training_type == "cross_silo":
    pass
elif training_type == "cross_device":
    pass
else:
    pass

It seems that during simulation, no matter how you set the YAML file, the default one is loaded.
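
The quoted snippet assigns cmd_args.yaml_config_file in both simulation branches regardless of what was passed with --cf. One way to let a user-supplied path win is to fall back to the packaged default only when no path was given. The sketch below illustrates that idea and is not FedML's code: resolve_config_path, DEFAULT_CONFIGS, and the /opt/fedml base directory are hypothetical, and the relative paths are copied from the snippet as quoted.

```python
import argparse
from os import path
from typing import Optional

# Default locations copied from the snippet quoted in the issue.
DEFAULT_CONFIGS = {
    ("simulation", "single_process"): "config/simulation_sp/fedml_config.yaml",
    ("simulation", "MPI"): "config/simulaton_mpi/fedml_config.yaml",
}


def resolve_config_path(user_path: Optional[str], training_type: str,
                        comm_backend: str, base_dir: str) -> Optional[str]:
    """Prefer the path given on the command line; otherwise use a default."""
    if user_path:
        return user_path
    default = DEFAULT_CONFIGS.get((training_type, comm_backend))
    return path.join(base_dir, default) if default else None


parser = argparse.ArgumentParser()
parser.add_argument("--cf", dest="yaml_config_file", default="")
cmd_args = parser.parse_args(["--cf", "fedml_config.yaml"])

# base_dir is a placeholder for the directory of the installed package.
chosen = resolve_config_path(cmd_args.yaml_config_file, "simulation",
                             "single_process", base_dir="/opt/fedml")
```

Here chosen is the user's fedml_config.yaml; the packaged default is used only when --cf is omitted.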
opened by JLU-Neal 12

Running error with mpi_torch_fedavg_mnist_lr_example

When running the following code

#!/usr/bin/env bash

WORKER_NUM=$1

PROCESS_NUM=`expr $WORKER_NUM + 1`
echo $PROCESS_NUM

hostname > mpi_host_file

$(which mpirun) -np $PROCESS_NUM \
-hostfile mpi_host_file \
python torch_fedavg_mnist_lr_one_line_example.py --cf config/fedml_config.yaml

I encounter this error:

(base) PS C:\Users\doubl\Desktop\script\fedml> bash run_step_by_step_example.sh 4                                       
5                                                                                                                       
run_step_by_step_example.sh: line 10: -np: command not found

The code was executed following the instructions at https://doc.fedml.ai/simulation/examples/mpi_torch_fedavg_mnist_lr_example.html, and all dependencies are installed.
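
The message "-np: command not found" is what a shell prints when $(which mpirun) expands to nothing, so that the command line starts with -np. That suggests, as an inference from the log rather than a confirmed diagnosis, that mpirun is not on the PATH of the bash session being used. A launcher can check for this before starting. The Python sketch below shows one way to do so; build_mpi_command is a hypothetical helper, and the script name and flags are taken from the shell script above.

```python
import shutil
from typing import List, Optional


def build_mpi_command(worker_num: int, mpirun_path: Optional[str]) -> List[str]:
    """Build the mpirun argument list, or fail clearly if mpirun is missing."""
    if not mpirun_path:
        raise FileNotFoundError(
            "mpirun was not found on PATH; install an MPI runtime or fix PATH"
        )
    # Workers plus one, as in the shell script (presumably for the server).
    process_num = worker_num + 1
    return [
        mpirun_path, "-np", str(process_num),
        "-hostfile", "mpi_host_file",
        "python", "torch_fedavg_mnist_lr_one_line_example.py",
        "--cf", "config/fedml_config.yaml",
    ]


mpirun = shutil.which("mpirun")  # None when mpirun is not installed
try:
    command = build_mpi_command(4, mpirun)
    print(" ".join(command))
except FileNotFoundError as err:
    print(f"cannot launch: {err}")
```

Failing early with an explicit message avoids the confusing shell error and points directly at the missing MPI runtime.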

opened by mh-lan 10

Installation encountered multiple errors

The first error encountered is :

Building wheels for collected packages: fedml
  Building wheel for fedml (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [1027 lines of output]

The second error :

error: could not create 'build\bdist.win-amd64\wheel\fedml-0.7.305.data\purelib\examples\cross_silo\mqtt_s3_fedavg_hierarchical_manual_mnist_lr_example\one_line\main_fedml_cross_silo_hi.py': No such file or directory
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for fedml
  Running setup.py clean for fedml
Failed to build fedml

The last message displayed is :

DEPRECATION: fedml was installed using the legacy 'setup.py install' method, because a wheel could not be built for it. A possible replacement is to fix the wheel build issue reported above. Discussion can be found at https://github.com/pypa/pip/issues/8368

I followed all the installation steps in the FEDML INSTALLATION ON WINDOWS OS guide and also tried other versions of Python (e.g., 3.7, 3.9), but still got the same problem.

opened by MING-LI-JIANG 9

ModuleNotFoundError: No module named 'fedml.data.fednlp'

When running the TEXT CLASSIFICATION example in fednlp with

bash run_simulation.sh 1

I encounter an error. Is this caused by fedml or something else? All dependencies are installed. Link: https://github.com/FedML-AI/FedML/tree/master/python/app/fednlp

opened by Luoyang144 8

standalone Benchmark running the MNIST and Shakespeare

I cannot reproduce the benchmark results when I run the command:

sh run_fedavg_standalone_pytorch.sh 0 10 10 10 shakespeare ./../../../data/shakespeare rnn hetero 100 1 0.8 sgd 0

My result is: https://wandb.ai/pilgrim_cz/fedml/runs/5jqpw1i4?workspace=user-pilgrim_cz
The benchmark result is: https://wandb.ai/automl/fedml/runs/144ey9w6

I have the same problem when I run the MNIST dataset. Is there anything wrong? I did not change any code.

opened by sjtu-cz 8

Using your specified gpus' list by customizing the function "init_training_device"!

I ran FedAvg with this configuration: 20 rounds, 10 epochs, 2 clients, the CIFAR-10 dataset, and ResNet-56, but the program always crashed. The errors are as follows:

I don't know the reason for the errors.

opened by weiyikang 8

How to run cross-silo with MPI?

How do I run the base framework example under fedml/python/examples/simulation/MPI locally? I ran the example code by following readme.md, and the error in the figure appears. I have deployed an MQTT server locally and configured comm_args.mqtt_config_path in the configuration file, but it doesn't work.

How do I remove mlops? Setting using_mlops: false has no effect.

opened by JessKXWL 7

Question about FedGKT

Hi, I tried to run the FedGKT algorithm, but the code got stuck in this part. FedAvg worked well. In my setting, I ran the code on the CPU only. Can anyone help me with this? Thank you! P.S. I ran with 8 clients.

opened by Agent2H 7

After a few communication rounds, the client and server lose the connection (fedcv/object_detection)

I was using the FedML platform to test distributed training for object detection between a single server and multiple clients. Unfortunately, MQTT has an issue where the server or a client sometimes cannot receive a packet properly and shows only:

receive_message. msg_type = 0, sender_id = 0, receiver_id = 0
receive_message. msg_type = 0, sender_id = 0, receiver_id = 0

where msg_type = 0 means the received message has been corrupted. I am using one server with multiple GPUs to simulate server-and-client distributed training in CROSS_SILO_HORIZONTAL mode, and I cannot understand why this is happening. With bigger data in particular, the problem appears in very early communication rounds.

If anyone has a solution to this problem, please share your experience.

Just in case, I am attaching output of server and clients screenshots.

SERVER OUTPUT

CLIENT-1 OUTPUT

CLIENT-2 OUTPUT
opened by Adeelbek 6

Training and test mAP@50 and mAP@50:95 show very strange results in fedcv object detection

I am running training in Cross-Silo Horizontal distributed mode with the following configuration settings:

common_args:
  training_type: "cross_silo"
  random_seed: 0
  scenario: "horizontal"
  using_mlops: false
  config_version: release
  name: "exp" # yolo
  project: "runs/train" # yolo
  exist_ok: true # yolo

environment_args:
  bootstrap: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/bootstrap.sh

data_args:
  dataset: "bdd"
  data_cache_dir: ~/fedcv_data
  partition_method: "homo"
  partition_alpha: 0.5
  data_conf: "/home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/data/bdd.yaml" # yolo
  img_size: [640, 640] # [640, 640]

model_args:
  model: "yolov5" # "yolov5"
  class_num: 13
  yolo_cfg: "/home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/model/yolov5/models/yolov5s.yaml" # "./model/yolov6/configs/yolov6s.py" # yolo
  yolo_hyp: "/home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/hyps/hyp.scratch.yaml" # yolo
  weights: "none" # "best.pt" # yolo
  single_cls: false # yolo
  conf_thres: 0.001 # yolo
  iou_thres: 0.6 # for yolo NMS
  yolo_verbose: true # yolo

train_args:
  federated_optimizer: "FedAvg"
  client_id_list:
  client_num_in_total: 2
  client_num_per_round: 2
  comm_round: 10
  epochs: 4
  batch_size: 64
  client_optimizer: sgd
  lr: 0.01
  weight_decay: 0.001
  checkpoint_interval: 1
  server_checkpoint_interval: 1

validation_args:
  frequency_of_the_test: 2

device_args:
  worker_num: 2
  using_gpu: true
  gpu_mapping_file: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/gpu_mapping.yaml
  gpu_mapping_key: mapping_config5_2
  gpu_ids: [0,1,2,3,4,5,6,7]

comm_args:
  backend: "MQTT_S3"
  mqtt_config_path: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/mqtt_config.yaml
  s3_config_path: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/s3_config.yaml

tracking_args:
  log_file_dir: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/log
  enable_wandb: true
  wandb_key: ee0b5f53d949c84cee7decbe7a6
  wandb_project: fedml
  wandb_name: fedml_torch_object_detection

During training I got very high mAP@50 and mAP@50:95 for almost every epoch, from beginning to end. Normally, mAP should be small in early training epochs and grow slowly in later epochs. But in my case it just fluctuates in the range of 0.985 ~ 0.9885 for both clients. I have checked the metric calculation functions borrowed from the original YOLOv5 PyTorch platform, and they work fine. IF ANYBODY CAN SHARE THEIR PRELIMINARY RESULTS FOR DISTRIBUTED OBJECT DETECTION ON ANY DATASET (COCO or PASCAL VOC), I would like to compare my results with theirs. For standalone YOLOv5s training, mAP is much smaller and grows epoch by epoch.

Any clue from the authors would be very much appreciated.

P.S. For the mAP calculation, I used the default val(train_data, device, args) function inside the YOLOv5Trainer class in yolov5_trainer.py.
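
As a reference point when sanity-checking such numbers: mAP@50 counts a prediction as a match when its intersection over union (IoU) with a ground-truth box is at least 0.5, and mAP@50:95 averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below covers only the IoU part. It is generic, not taken from yolov5_trainer.py, and the (x1, y1, x2, y2) box format is an assumption.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds used by mAP@50:95: 0.50, 0.55, ..., 0.95
THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]

pred, truth = (0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 15.0, 10.0)
overlap = iou(pred, truth)  # intersection 50, union 150
matches = [overlap >= t for t in THRESHOLDS]
```

A prediction that overlaps only half of the ground-truth box, as in this example, fails even the loosest threshold, which is why values near 0.99 at every threshold from the first epochs are unusual.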

opened by Adeelbek 6

where is the test_arm_android_64.sh in fedmlsdk MobileNN?

Hello, I have recently been trying to reproduce the fedmlsdk demo on an Android device. I followed the tutorial at https://github.com/FedML-AI/FedML/tree/master/android/fedmlsdk/MobileNN/ and saw the line below:

run ./test_arm_android_64.sh, which will push demo.out to your android device and execute it.

However, I cannot find test_arm_android_64.sh in the repo. Could you please look into this?

opened by T122D002L 0

(fednlp) When executing run_simulation.sh in text_classification, MPI has a problem

I have already installed mpi4py 3.1.4 and libopenmpi-dev, and the OS is Ubuntu 18. I searched a lot of posts on the Internet, but I couldn't figure it out.

opened by sxc225 0
How to run hierarchical_fedavg_mnist_lr_example on Raspberry Pi 4

https://doc.fedml.ai/cross-silo/examples/mqtt_s3_fedavg_hierarchical_mnist_lr_example.html

Hi @chaoyanghe, I am following the URL above. The server runs Ubuntu 22.04 LTS with an RTX 2080 Ti. Both silo1 and the server run on Ubuntu.

silo2 runs on a Raspberry Pi 4 with a 64-bit OS. The Raspberry Pi has no GPU, so the GPU mapping has been set to false. What else needs to be modified?

Thank you for your reply

opened by LCH517 0
custom criteria support; custom trainer support; better warning info.

Allow the client trainer to use custom criteria for classification tasks; allow a custom client trainer for single-process simulation; provide better exception info and hints for these two conditions.

opened by chengza 1
How do different algorithms work?

Hello, I would like to ask for help. In the "main_fedml_image_segmentation.py" file of the fedcv module, "SegmentationTrainer.py" and "SegmentationAggregator.py" under the "image_segmentation/trainer" module are imported during training. Suppose I switch between algorithms in "fedml_config.yaml", such as FedAvg, FedOpt, and FedProx, but still use the same SegmentationTrainer and SegmentationAggregator: is there no difference? How should the different algorithms be implemented? In fedcv/segmentation, how do the algorithms in fedml/simulation/mpi work? Are they all the same?
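
As background to this question, the three algorithms differ in well-defined places in their published forms: FedProx adds a proximal term (mu / 2) * ||w - w_global||^2 to each client's local loss, and FedOpt treats the difference between the global model and the averaged client model as a pseudo-gradient for a server-side optimizer. The sketch below shows both pieces in plain Python. It does not describe how fedcv wires them, which this thread does not state; the function names are hypothetical and the server optimizer is plain SGD for simplicity.

```python
from typing import List


def fedprox_penalty(local_w: List[float], global_w: List[float], mu: float) -> float:
    """Proximal term (mu / 2) * ||w - w_global||^2 added to the local loss."""
    return 0.5 * mu * sum((w - g) ** 2 for w, g in zip(local_w, global_w))


def fedopt_server_step(global_w: List[float], avg_client_w: List[float],
                       server_lr: float) -> List[float]:
    """Server update with plain SGD on the pseudo-gradient (global - average)."""
    pseudo_grad = [g - a for g, a in zip(global_w, avg_client_w)]
    return [g - server_lr * d for g, d in zip(global_w, pseudo_grad)]


global_w = [1.0, 1.0]
local_w = [1.5, 0.0]
penalty = fedprox_penalty(local_w, global_w, mu=0.1)  # 0.05 * (0.25 + 1.0)

# With server SGD and learning rate 1.0, the step lands on the client average,
# which is the FedAvg update.
stepped = fedopt_server_step(global_w, [2.0, 0.0], server_lr=1.0)
```

So a trainer that never adds the proximal term and an aggregator that only averages would behave like FedAvg whatever name is set in the config; whether that is the case in fedcv would need to be checked in the code.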

opened by caozhantao 1
Does fedgraphnn only support MPI?

Hello, I tried to change the common args from 'backend:MPI' to 'backend:sp' in ego_network_node_clf, and it failed. So I wonder whether fedgraphnn only supports MPI. If not, how can I adjust the config parameters? Thanks!

opened by MaoPopovich 2

Releases(fedml_v0.6_before_fundraising)

fedml_v0.6_before_fundraising(Apr 30, 2022)


Owner

FedML-AI

Federated Learning - Including common test models for federated learning, like CNN, Resnet18 and lstm, controlled by different parser

Plato: A New Framework for Federated Learning Research

FEDn is an open-source, modular and ML-framework agnostic framework for Federated Machine Learning

FedJAX is a library for developing custom Federated Learning (FL) algorithms in JAX.

GradAttack is a Python library for easy evaluation of privacy risks in public gradients in Federated Learning

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

modelvshuman is a Python library to benchmark the gap between human and machine vision

Everything you want about DP-Based Federated Learning, including Papers and Code. (Mechanism: Laplace or Gaussian, Dataset: femnist, shakespeare, mnist, cifar-10 and fashion-mnist. )

A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models.

FedScale: Benchmarking Model and System Performance of Federated Learning

Breaching - Breaching privacy in federated learning scenarios for vision and text

Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with pretrained models.

An open framework for Federated Learning.

Official code implementation for "Personalized Federated Learning using Hypernetworks"

[ICLR'21] FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

[CVPR'21] FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space