FedML: A Research Library and Benchmark for Federated Machine Learning

Overview

📄 https://arxiv.org/abs/2007.13518

News

2021-02-01 (Award): #NeurIPS 2020# FedML won the Best Paper Award at the NeurIPS 2020 Federated Learning workshop.

2020-12-12 (Conference Presentation): #NeurIPS 2020# We gave a contributed talk at NeurIPS 2020. Please check the video here: https://www.youtube.com/watch?v=93SETZGZMyI

The slides of this presentation are also open source (yes, we open source everything in FedML!): https://docs.google.com/presentation/d/1ykAQ_GpzEoRVJeMr1hXUHlJSBpzW2P6Wuzud8RTF0oo/edit#slide=id.p

2020-11-27 (System): The FedML architecture has evolved into an ecosystem spanning multiple GitHub repositories. With FedML at its core, we can support more advanced FL applications and platforms.
FedML: https://github.com/FedML-AI/FedML

FedNLP: https://github.com/FedML-AI/FedNLP (We plan to release at the end of March 2021. Please stay tuned!)

FedML-IoT: https://github.com/FedML-AI/FedML-IoT

FedML-Server: https://github.com/FedML-AI/FedML-Server

FedML-Mobile: https://github.com/FedML-AI/FedML-Mobile

2020-11-24 (Publication): We are thrilled to share that the short version of our FedML white paper has been accepted to a NeurIPS 2020 workshop. Thanks to the NeurIPS reviewers for supporting our presentation there.

2020-11-05 (System): Do you want to run federated learning on IoT devices? FedML's architecture lets you port the distributed computing code to IoT platforms smoothly. FedML supports edge training on two IoT devices: Raspberry Pi 4 and NVIDIA Jetson Nano! Please check it out here: https://github.com/FedML-AI/FedML/blob/master/fedml_iot/README.md

2020-10-28 (Algorithms): We released more advanced federated optimization algorithms, not just FedAvg! http://doc.fedml.ai/#/algorithm-reference-implementation

2020-10-26 (Publication): V2 of our white paper is released. Please check it out here: https://arxiv.org/pdf/2007.13518.pdf

2020-10-07 (Model and Dataset): Datasets + Models ALL IN ONE! FedML supports comprehensive research-oriented FL datasets and models:

  • cross-device CV: Federated EMNIST + CNN (2 conv layers)
  • cross-device CV: CIFAR100 + ResNet18 (Group Normalization)
  • cross-device NLP: shakespeare + RNN (bi-LSTM)
  • cross-device NLP: stackoverflow (NWP) + RNN (bi-LSTM)
  • cross-silo CV: CIFAR10, CIFAR100, CINIC10 + ResNet
  • cross-silo CV: CIFAR10, CIFAR100, CINIC10 + MobileNet
  • linear: MNIST + Logistic Regression

Please check create_model(args, model_name, output_dim) and load_data(args, dataset_name) at fedml_experiments/distributed/fedavg/main_fedavg.py for details.

We will support more advanced models and datasets. Please stay tuned!


2020-09-30 (Publication): We maintain a comprehensive publication list of Federated Learning here: https://github.com/chaoyanghe/Awesome-Federated-Learning


2020-09-28 (Publication): Authors of FedML (https://fedml.ai) had 7 papers accepted to NeurIPS 2020. Big congratulations! Here is the publication list: https://github.com/FedML-AI/FedML/blob/master/publications.md. The highlighted ones are related to large-scale distributed learning and federated learning.

What is Federated Learning?

Please read this long vision paper: "Advances and Open Problems in Federated Learning".

This publication list is also helpful: https://github.com/chaoyanghe/Awesome-Federated-Learning

Introduction

Federated learning is a rapidly growing research field in the machine learning domain. Although considerable research efforts have been made, existing libraries cannot adequately support diverse algorithmic development (e.g., diverse topology and flexible message exchange), and inconsistent dataset and model usage in experiments make fair comparisons difficult. In this work, we introduce FedML, an open research library and benchmark that facilitates the development of new federated learning algorithms and fair performance comparisons. FedML supports three computing paradigms (distributed training, mobile on-device training, and standalone simulation) for users to conduct experiments in different system environments. FedML also promotes diverse algorithmic research with flexible and generic API design and reference baseline implementations. A curated and comprehensive benchmark dataset for the non-I.I.D setting aims at making a fair comparison. We believe FedML can provide an efficient and reproducible means of developing and evaluating algorithms for the federated learning research community. We maintain the source code, documents, and user community at https://FedML.ai.

For more details, please read our full paper: https://arxiv.org/abs/2007.13518

Usage

  1. Research on FL algorithms or systems.
  2. Teaching in an ML course.
  3. System prototyping for industrial production.
  4. Self-study of FL: understanding the code-level details of FL algorithms.

Architecture

The functionality of each package is as follows:

fedml_core: The FedML low-level API package. This package implements distributed computing on top of communication backends such as MPI and also supports topology management. Other low-level APIs related to security and privacy are also supported.

fedml_api: The FedML high-level API package. This package supports different federated learning algorithms with only one line of code. All algorithms are built on the "fedml_core" package. Users can modify this package to add more advanced algorithms.

fedml_experiments: This package is used to test the algorithms in the "fedml" package by calling the high-level APIs.

fedml_mobile: This package is used to support on-device training using Android/iOS smartphones.

fedml_IoT: This package is used to support on-device training using IoT devices.

applications: This package is a collection of applications based on FedML.

benchmark: This package is used to run benchmark experiments.
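
To make the role of the algorithm layer more concrete, here is a minimal, library-independent sketch of the aggregation step in FedAvg, the baseline algorithm mentioned in the news above. It is not FedML's implementation: the function name fedavg_aggregate and the plain-dict weight format are assumptions made purely for illustration. Each client's parameters are weighted by the number of local samples it trained on.

```python
from typing import Dict, List, Tuple

# Hypothetical weight format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def fedavg_aggregate(client_updates: List[Tuple[int, Weights]]) -> Weights:
    """Average client parameters, weighted by each client's sample count.

    client_updates holds (num_local_samples, weights) pairs, one per client.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    total_samples = sum(num for num, _ in client_updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    reference = client_updates[0][1]
    aggregated: Weights = {}
    for name, values in reference.items():
        aggregated[name] = [
            sum(num * weights[name][i] for num, weights in client_updates)
            / total_samples
            for i in range(len(values))
        ]
    return aggregated


if __name__ == "__main__":
    updates = [
        (10, {"w": [1.0, 2.0]}),  # client with 10 samples
        (30, {"w": [3.0, 6.0]}),  # client with 30 samples
    ]
    print(fedavg_aggregate(updates))  # {'w': [2.5, 5.0]}
```

The client with more samples pulls the average toward its own parameters, which is the main difference from a plain unweighted mean.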

Join our Community

Please join our community. We will post updated features and answer questions on Slack.

Join fedml.slack.com (this is a link that never expires)

Citation

Please cite FedML in your publications if it helps your research:

@article{chaoyanghe2020fedml,
  Author = {He, Chaoyang and Li, Songze and So, Jinhyun and Zhang, Mi and Wang, Hongyi and Wang, Xiaoyang and Vepakomma, Praneeth and Singh, Abhishek and Qiu, Hang and Shen, Li and Zhao, Peilin and Kang, Yan and Liu, Yang and Raskar, Ramesh and Yang, Qiang and Annavaram, Murali and Avestimehr, Salman},
  Journal = {arXiv preprint arXiv:2007.13518},
  Title = {FedML: A Research Library and Benchmark for Federated Machine Learning},
  Year = {2020}
}

Contacts

The corresponding author is:

Chaoyang He
http://chaoyanghe.com

Comments
  • FedAvg accuracy stuck under 50

    I am training FedAvg with the given parameters to reproduce the benchmark accuracy, but the accuracy is stuck under 50.

    Here is all my code:

    !git clone https://github.com/FedML-AI/FedML

    cd /content/FedML/fedml_experiments/standalone/fedavg

    !python main_fedavg.py --model mobilenet --dataset cifar10 --data_dir ./../../../data/cifar10 --partition_method hetero --comm_round 100 --epochs 20 --batch_size 64 --lr 0.001

    I expected to get over 80% accuracy, according to these benchmark results:

    https://wandb.ai/automl/fedml/runs/390hdz0e

    opened by AbdulMoqeet 39
  • About the args in FedML parrot examples.

    In step 3 of the docs (https://doc.fedml.ai/simulation/examples/sp_fedavg_mnist_lr_example.html), I am told to run the example with the following command: python torch_fedavg_mnist_lr_one_line_example.py --cf fedml_config.yaml

    However, when I modified the args in this YAML file (e.g., set using_gpu to true), training still ran on the CPU at runtime. So I checked the code in fedml/lib/python3.7/site-packages/fedml/arguments.py, line 63, and found the following snippet:

        path_current_file = path.abspath(path.dirname(__file__))
        if training_type == "simulation" and comm_backend == "single_process":
            config_file = path.join(path_current_file, "config/simulation_sp/fedml_config.yaml")
            cmd_args.yaml_config_file = config_file
        elif training_type == "simulation" and comm_backend == "MPI":
            config_file = path.join(
                path_current_file, "config/simulaton_mpi/fedml_config.yaml"
            )
            cmd_args.yaml_config_file = config_file
        elif training_type == "cross_silo":
            pass
        elif training_type == "cross_device":
            pass
        else:
            pass
    

    It seems that during simulation the default YAML file is loaded no matter what you set in your own file.

    opened by JLU-Neal 12
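
For the config-loading issue above, one way to get the behavior the reporter expects is to fall back to a packaged default only when no --cf path was given. The following is a hypothetical sketch, not the actual code or fix in fedml/arguments.py: resolve_config_path, parse_cf, and DEFAULT_CONFIGS are illustrative names, and the default paths are copied verbatim from the snippet quoted in the issue.

```python
import argparse
from os import path
from typing import List, Optional

# Assumed defaults, spelled exactly as in the snippet quoted in the issue.
DEFAULT_CONFIGS = {
    ("simulation", "single_process"): "config/simulation_sp/fedml_config.yaml",
    ("simulation", "MPI"): "config/simulaton_mpi/fedml_config.yaml",
}


def parse_cf(argv: List[str]) -> Optional[str]:
    """Return the path passed via --cf, or None if the flag is absent."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cf", dest="yaml_config_file", default=None)
    args, _unknown = parser.parse_known_args(argv)
    return args.yaml_config_file


def resolve_config_path(
    user_path: Optional[str],
    training_type: str,
    comm_backend: str,
    base_dir: str,
) -> Optional[str]:
    """Prefer the user's config; use a packaged default only as a fallback."""
    if user_path:
        return user_path
    default = DEFAULT_CONFIGS.get((training_type, comm_backend))
    return path.join(base_dir, default) if default else None


if __name__ == "__main__":
    chosen = resolve_config_path(
        parse_cf(["--cf", "fedml_config.yaml"]),
        "simulation",
        "single_process",
        base_dir="/pkg",
    )
    print(chosen)  # fedml_config.yaml
```

With this ordering, a setting such as using_gpu in the user's own YAML file would be the one that takes effect whenever --cf is supplied.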
  • Running error with mpi_torch_fedavg_mnist_lr_example

    When running the following code

    #!/usr/bin/env bash
    
    WORKER_NUM=$1
    
    PROCESS_NUM=`expr $WORKER_NUM + 1`
    echo $PROCESS_NUM
    
    hostname > mpi_host_file
    
    $(which mpirun) -np $PROCESS_NUM \
    -hostfile mpi_host_file \
    python torch_fedavg_mnist_lr_one_line_example.py --cf config/fedml_config.yaml
    

    I encounter the following error:

    (base) PS C:\Users\doubl\Desktop\script\fedml> bash run_step_by_step_example.sh 4                                       
    5                                                                                                                       
    run_step_by_step_example.sh: line 10: -np: command not found   
    

    The code was executed following the instructions at https://doc.fedml.ai/simulation/examples/mpi_torch_fedavg_mnist_lr_example.html, and all the dependencies are installed.

    opened by mh-lan 10
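
The message "-np: command not found" in the issue above is consistent with $(which mpirun) expanding to an empty string, which happens when mpirun is not on the PATH; the shell then tries to run -np as the command. A small Python launcher can surface this earlier with a clearer message. This is a sketch under that assumption: build_mpirun_command is a hypothetical helper, not part of FedML, and it only mirrors the arguments used in the script quoted in the issue.

```python
import shutil
from typing import List, Optional


def build_mpirun_command(
    worker_num: int,
    script: str,
    config: str,
    launcher: Optional[str] = None,
) -> List[str]:
    """Build the mpirun invocation, failing early if mpirun is missing."""
    if launcher is None:
        # shutil.which returns None when the executable is not on PATH.
        launcher = shutil.which("mpirun")
    if not launcher:
        raise FileNotFoundError(
            "mpirun was not found on PATH; install an MPI runtime "
            "or pass its full path explicitly"
        )
    process_num = worker_num + 1  # one extra process, as in the shell script
    return [
        launcher, "-np", str(process_num),
        "-hostfile", "mpi_host_file",
        "python", script, "--cf", config,
    ]


if __name__ == "__main__":
    try:
        cmd = build_mpirun_command(
            4,
            "torch_fedavg_mnist_lr_one_line_example.py",
            "config/fedml_config.yaml",
        )
        print(" ".join(cmd))
    except FileNotFoundError as err:
        print(err)
```

The returned list can be handed to subprocess.run once the hostfile has been written; passing a list avoids the empty-expansion problem because the launcher path is checked before anything is executed.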
  • Installation encountered multiple errors

    The first error encountered is :

    Building wheels for collected packages: fedml
      Building wheel for fedml (setup.py) ... error
      error: subprocess-exited-with-error
    
      × python setup.py bdist_wheel did not run successfully.
      │ exit code: 1
      ╰─> [1027 lines of output]
    

    The second error :

    error: could not create 'build\bdist.win-amd64\wheel\fedml-0.7.305.data\purelib\examples\cross_silo\mqtt_s3_fedavg_hierarchical_manual_mnist_lr_example\one_line\main_fedml_cross_silo_hi.py': No such file or directory
          [end of output]
    
      note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for fedml
      Running setup.py clean for fedml
    Failed to build fedml
    

    The last message displayed is :

    DEPRECATION: fedml was installed using the legacy 'setup.py install' method, because a wheel could not be built for it. A possible replacement is to fix the wheel build issue reported above. Discussion can be found at https://github.com/pypa/pip/issues/8368
    

    I followed every step of the "FEDML INSTALLATION ON WINDOWS OS" guide and also tried older versions of Python (e.g., 3.7, 3.9), but still got the same problem.

    opened by MING-LI-JIANG 9
  • ModuleNotFoundError: No module named 'fedml.data.fednlp'

    When running the TEXT CLASSIFICATION example in fednlp with bash run_simulation.sh 1, I encounter an error [image]. Is this caused by fedml or something else? All dependencies are installed. Link: https://github.com/FedML-AI/FedML/tree/master/python/app/fednlp

    opened by Luoyang144 8
  • standalone Benchmark running the MNIST and Shakespeare

    I cannot get the same results when I run the command: sh run_fedavg_standalone_pytorch.sh 0 10 10 10 shakespeare ./../../../data/shakespeare rnn hetero 100 1 0.8 sgd 0

    My results: https://wandb.ai/pilgrim_cz/fedml/runs/5jqpw1i4?workspace=user-pilgrim_cz

    The benchmark results: https://wandb.ai/automl/fedml/runs/144ey9w6

    I have the same problem when I run the MNIST dataset. Is there anything wrong? I did not change any code.

    opened by sjtu-cz 8
  • Using your specified gpus' list by customizing the function "init_training_device"!

    I am running FedAvg with this configuration: 20 rounds, 10 epochs, 2 clients, the CIFAR-10 dataset, and resnet56, but the program always crashes. The errors are as follows:

    [image: screenshot of the error]

    I don't know the reason for the errors.

    good first issue 
    opened by weiyikang 8
  • How to run cross-silo with MPI?

    How do I run the base_framework example code under fedml /python/examples/simulation/MPI locally? I followed readme.md, and the error in the figure appears. I have deployed the MQTT server locally and configured comm_args.mqtt_config_path in the configuration file, but it doesn't work.

    image

    How can I remove mlops? Setting using_mlops: false has no effect.

    opened by JessKXWL 7
  • Question about FedGKT

    Hi, I tried to run the FedGKT algorithm, but the code got stuck at the part shown in the screenshots. FedAvg worked well. In my setting, I ran the code on the CPU only. Can anyone help me with this? Thank you! P.S. I ran with 8 clients. fedgtkprob2 fedgtkprob1

    opened by Agent2H 7
  • After a few communication rounds, the client and server lose the connection (fedcv/object_detection)

    I was using the FedML platform to test distributed training for object detection between a single server and multiple clients. Unfortunately, MQTT has an issue where the server or a client sometimes cannot receive a packet properly and shows only:

    receive_message. msg_type = 0, sender_id = 0, receiver_id = 0
    receive_message. msg_type = 0, sender_id = 0, receiver_id = 0
    

    where msg_type = 0 means the received message has been corrupted. Although I am using one server with multiple GPUs to simulate server- and client-based distributed training in CROSS_SILO_HORIZONTAL mode, I cannot understand why this is happening. In particular, when I use larger data, the problem appears in very early communication rounds.

    So, if anyone has a solution to this problem, please share your experience.

    Just in case, I am attaching screenshots of the server and client output.

    SERVER OUTPUT server_output

    CLIENT-1 OUTPUT client_output

    CLIENT-2 OUTPUT client2_output

    opened by Adeelbek 6
  • Training and Test mAP@50 and mAP@50:95 are showing very strange results in fedcv object detection

    I am training in Cross-Silo Horizontal distributed mode with the following configuration settings:

    common_args:
      training_type: "cross_silo"
      random_seed: 0
      scenario: "horizontal"
      using_mlops: false
      config_version: release
      name: "exp" # yolo
      project: "runs/train" # yolo
      exist_ok: true # yolo
    
    environment_args:
      bootstrap: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/bootstrap.sh
    
    data_args:
      dataset: "bdd"
      data_cache_dir: ~/fedcv_data
      partition_method: "homo"
      partition_alpha: 0.5
      data_conf: "/home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/data/bdd.yaml" # yolo
      img_size: [640, 640] # [640, 640]
    
    model_args:
      model: "yolov5" # "yolov5"
      class_num: 13
      yolo_cfg: "/home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/model/yolov5/models/yolov5s.yaml" # "./model/yolov6/configs/yolov6s.py" # yolo
      yolo_hyp: "/home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/hyps/hyp.scratch.yaml" # yolo
      weights: "none" # "best.pt" # yolo
      single_cls: false # yolo
      conf_thres: 0.001 # yolo
      iou_thres: 0.6 # for yolo NMS
      yolo_verbose: true # yolo
    
    train_args:
      federated_optimizer: "FedAvg"
      client_id_list:
      client_num_in_total: 2
      client_num_per_round: 2
      comm_round: 10
      epochs: 4
      batch_size: 64
      client_optimizer: sgd
      lr: 0.01
      weight_decay: 0.001
      checkpoint_interval: 1
      server_checkpoint_interval: 1
    
    validation_args:
      frequency_of_the_test: 2
    
    device_args:
      worker_num: 2
      using_gpu: true
      gpu_mapping_file: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/gpu_mapping.yaml
      gpu_mapping_key: mapping_config5_2
      gpu_ids: [0,1,2,3,4,5,6,7]
    
    comm_args:
      backend: "MQTT_S3"
      mqtt_config_path: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/mqtt_config.yaml
      s3_config_path: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/s3_config.yaml
    
    tracking_args:
      log_file_dir: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/log
      enable_wandb: true
      wandb_key: ee0b5f53d949c84cee7decbe7a6
      wandb_project: fedml
      wandb_name: fedml_torch_object_detection
    
    

    During training I got very high mAP@50 and mAP@50:95 for almost every epoch, from beginning to end. Normally, mAP should be small in early training epochs and grow slowly in later ones. But in my case it just fluctuates in the range 0.985 ~ 0.9885 for both clients. I have checked the metric calculation functions borrowed from the original YOLOv5 PyTorch platform; they work fine. If anybody can share preliminary results for distributed object detection on any dataset (COCO or PASCAL VOC), I would like to verify my results against theirs. For standalone YOLOv5s training, mAP is much smaller and grows epoch by epoch.

    Any clue from the authors would be very much appreciated.

    P.S. For the mAP calculation, I used the default val(train_data, device, args) function inside the YOLOv5Trainer class in yolov5_trainer.py.

    opened by Adeelbek 6
  • where is the test_arm_android_64.sh in fedmlsdk MobileNN?

    Hello, I recently tried to reproduce the fedmlsdk demo on an Android device. I followed the tutorial at https://github.com/FedML-AI/FedML/tree/master/android/fedmlsdk/MobileNN/ and saw the line below:

    run ./test_arm_android_64.sh, which will push demo.out to your android device and execute it.

    However, I cannot find test_arm_android_64.sh in the repo. Could you please check this issue?

    opened by T122D002L 0
  • (fednlp) When executing run_simulation.sh in text_classification, MPI has a problem

    image

    I have already installed mpi4py 3.1.4 and libopenmpi-dev, and the OS is Ubuntu 18. I searched a lot of posts on the Internet, but I couldn't figure it out.

    opened by sxc225 0
  • How to run hierarchical_fedavg_mnist_lr_example on Raspberry Pi 4

    https://doc.fedml.ai/cross-silo/examples/mqtt_s3_fedavg_hierarchical_mnist_lr_example.html

    Hi @chaoyanghe, I am following the URL above. The server uses Ubuntu 22.04 LTS + RTX 2080 Ti. Both silo1 and the server run on Ubuntu.

    silo2 runs on a Raspberry Pi 4 with a 64-bit OS. The Raspberry Pi has no GPU, and the GPU mapping setting has been changed to false. What else needs to be modified?

    Thank you for your reply.

    opened by LCH517 0
  • custom criteria support; custom trainer support; better warning info.

    Allow custom criteria in the client trainer for classification tasks; allow a custom client trainer for single-process simulation; provide better exception messages and hints for these two cases.

    opened by chengza 1
  • How do different algorithms work?

    Hello, I would like to ask for help. In the "main_fedml_image_segmentation.py" file of the fedcv module, "SegmentationTrainer.py" and "SegmentationAggregator.py" under the "image_segmentation/trainer" module are imported during training. Suppose I switch between algorithms in "fedml_config.yaml", such as FedAvg, FedOpt, and FedProx, but still use the same SegmentationTrainer and SegmentationAggregator. Is there no difference? How should different algorithms be implemented? In fedcv/segmentation, how do the algorithms in fedml/simulation/mpi work? Are they all the same?

    opened by caozhantao 1
  • Does fedgraphnn only support MPI?

    Hello, I tried to change the common args from 'backend:MPI' to 'backend:sp' in ego_network_node_clf, and it failed. So I wonder whether fedgraphnn only supports MPI. If not, how can I adjust the config parameters? Thanks!

    opened by MaoPopovich 2
Releases(fedml_v0.6_before_fundraising)
Owner
FedML-AI
TianyuQi 10 Dec 11, 2022
Plato: A New Framework for Federated Learning Research

a new software framework to facilitate scalable federated learning research.

System Group@Theory Lab 192 Jan 5, 2023
FEDn is an open-source, modular and ML-framework agnostic framework for Federated Machine Learning

FEDn is an open-source, modular and ML-framework agnostic framework for Federated Machine Learning (FedML) developed and maintained by Scaleout Systems. FEDn enables highly scalable cross-silo and cross-device use-cases over FEDn networks.

Scaleout 75 Nov 9, 2022
FedJAX is a library for developing custom Federated Learning (FL) algorithms in JAX.

FedJAX: Federated learning with JAX What is FedJAX? FedJAX is a library for developing custom Federated Learning (FL) algorithms in JAX. FedJAX priori

Google 208 Dec 14, 2022
GradAttack is a Python library for easy evaluation of privacy risks in public gradients in Federated Learning

GradAttack is a Python library for easy evaluation of privacy risks in public gradients in Federated Learning, as well as corresponding mitigation strategies.

null 129 Dec 30, 2022
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

Detectron is deprecated. Please see detectron2, a ground-up rewrite of Detectron in PyTorch. Detectron Detectron is Facebook AI Research's software sy

Facebook Research 25.5k Jan 7, 2023
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

MMF is a modular framework for vision and language multimodal research from Facebook AI Research. MMF contains reference implementations of state-of-t

Facebook Research 5.1k Jan 4, 2023
modelvshuman is a Python library to benchmark the gap between human and machine vision

modelvshuman is a Python library to benchmark the gap between human and machine vision. Using this library, both PyTorch and TensorFlow models can be evaluated on 17 out-of-distribution datasets with high-quality human comparison data.

Bethge Lab 244 Jan 3, 2023
Everything you want about DP-Based Federated Learning, including Papers and Code. (Mechanism: Laplace or Gaussian, Dataset: femnist, shakespeare, mnist, cifar-10 and fashion-mnist. )

Differential Privacy (DP) Based Federated Learning (FL) Everything about DP-based FL you need is here. Code Tip: the code o

wenzhu 83 Dec 24, 2022
A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models.

WILDS is a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, from tumor identification to wildlife monitoring to poverty mapping.

P-Lambda 437 Dec 30, 2022
FedScale: Benchmarking Model and System Performance of Federated Learning

FedScale: Benchmarking Model and System Performance of Federated Learning (Paper) This repository contains scripts and instructions of building FedSca

null 268 Jan 1, 2023
Breaching - Breaching privacy in federated learning scenarios for vision and text

Breaching - A Framework for Attacks against Privacy in Federated Learning This P

Jonas Geiping 139 Jan 3, 2023
Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with pretrained models.

This repository is a toolkit to do machine learning for programming languages. It implements tokenization, dataset preprocessing, model training and m

Facebook Research 408 Jan 1, 2023
An open framework for Federated Learning.

Welcome to Intel® Open Federated Learning Federated learning is a distributed machine learning approach that enables organizations to collaborate on m

Intel Corporation 397 Dec 27, 2022
Official code implementation for "Personalized Federated Learning using Hypernetworks"

Personalized Federated Learning using Hypernetworks This is an official implementation of Personalized Federated Learning using Hypernetworks paper. [

Aviv Shamsian 121 Dec 25, 2022
[ICLR'21] FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

FedBN: Federated Learning on Non-IID Features via Local Batch Normalization This is the PyTorch implementation of our paper FedBN: Federated Learning on

Med-AIR@CUHK 156 Dec 15, 2022
[CVPR'21] FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space by Quande Liu, Cheng Chen, Ji

Quande Liu 178 Jan 6, 2023