An easy-to-use federated learning platform

Overview


Website Playground Contributing

FederatedScope is a comprehensive federated learning platform that provides convenient usage and flexible customization for various federated learning tasks in both academia and industry. Based on an event-driven architecture, FederatedScope integrates rich collections of functionalities to satisfy the burgeoning demands of federated learning, and aims to build an easy-to-use platform for promoting federated learning safely and effectively.

A detailed tutorial is provided on our website.

News

  • [06-17-2022] We release pFL-Bench, a comprehensive benchmark for personalized Federated Learning (pFL), containing 10+ datasets and 20+ baselines. [code, pdf]
  • [06-17-2022] We release FedHPO-B, a benchmark suite for studying federated hyperparameter optimization. [code, pdf]
  • [06-17-2022] We release B-FHTL, a benchmark suite for studying federated hetero-task learning. [code, pdf]
  • [06-13-2022] Our project came under attack; the issue has been resolved. More details.
  • [05-25-2022] Our paper FederatedScope-GNN has been accepted by KDD'2022!
  • [05-06-2022] We release FederatedScope v0.1.0!

Quick Start

We provide an end-to-end example for users to start running a standard FL course with FederatedScope.

Step 1. Installation

First of all, users need to clone the source code and install the required packages (we suggest python version >= 3.9).

git clone https://github.com/alibaba/FederatedScope.git
cd FederatedScope

You can install the dependencies from the requirements file:

# For minimal version
conda install --file enviroment/requirements-torch1.10.txt -c pytorch -c conda-forge -c nvidia

# For application version
conda install --file enviroment/requirements-torch1.10-application.txt -c pytorch -c conda-forge -c nvidia -c pyg

or build docker image and run with docker env (cuda 11 and torch 1.10):

docker build -f enviroment/docker_files/federatedscope-torch1.10.Dockerfile -t alibaba/federatedscope:base-env-torch1.10 .
docker run --gpus device=all --rm -it --name "fedscope" -w $(pwd) alibaba/federatedscope:base-env-torch1.10 /bin/bash

If you need to run downstream tasks such as graph FL, replace the requirements/docker file names with their application counterparts when executing the above commands:

# enviroment/requirements-torch1.10.txt -> 
enviroment/requirements-torch1.10-application.txt

# enviroment/docker_files/federatedscope-torch1.10.Dockerfile ->
enviroment/docker_files/federatedscope-torch1.10-application.Dockerfile

Note: You can choose cuda 10 and torch 1.8 by changing torch1.10 to torch1.8 in the file names. The docker images are based on nvidia-docker. Please pre-install the NVIDIA drivers and nvidia-docker2 on the host machine. See more details here.

Finally, after all the dependencies are installed, run:

python setup.py install

# Or (for dev mode)
pip install -e .
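
After installation, a quick sanity check can confirm that the package is importable. Below is a minimal sketch; global_cfg is the same config entry point used later in this README, while the __version__ attribute is an assumption and may be absent in some releases.

# sanity_check.py -- a minimal sketch to verify the installation
import federatedscope
from federatedscope.core.configs.config import global_cfg

# The version attribute is an assumption; fall back gracefully if missing.
print(getattr(federatedscope, '__version__', 'version attribute not found'))
# cfg.federate.client_num is a config key used elsewhere in this README.
print(global_cfg.federate.client_num)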

Step 2. Prepare datasets

To run an FL task, users should prepare a dataset. The DataZoo provided in FederatedScope can help to automatically download and preprocess widely-used public datasets for various FL applications, including CV, NLP, graph learning, recommendation, etc. Users can directly specify cfg.data.type = DATASET_NAME in the configuration. For example,

cfg.data.type = 'femnist'

To use a customized dataset, you need to prepare it following a certain format and register it (a rough sketch is shown below). Please refer to Customized Datasets for more details.
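
For illustration only, a registration might look roughly like the following sketch. The key 'mydata', the helper build_my_client_data, and the exact callback contract are assumptions; the Customized Datasets tutorial is authoritative.

# A hedged sketch of registering a customized dataset (not verbatim API docs).
from federatedscope.register import register_data

def load_my_data(config):
    # Only respond when our registered type is requested; otherwise let the
    # built-in DataZoo builders handle it.
    if config.data.type.lower() != 'mydata':
        return None
    # Hypothetical helper returning {client_id: {'train': ..., 'val': ..., 'test': ...}}
    data = build_my_client_data(config)
    return data, config

register_data('mydata', load_my_data)

Setting cfg.data.type = 'mydata' would then route data loading to the registered function.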

Step 3. Prepare models

Then, users should specify the model architecture that will be trained in the FL course. FederatedScope provides a ModelZoo that contains the implementation of widely adopted model architectures for various FL applications. Users can set up cfg.model.type = MODEL_NAME to apply a specific model architecture in FL tasks. For example,

cfg.model.type = 'convnet2'

FederatedScope also allows users to use customized models via registration (see the sketch below). Please refer to Customized Models for more details about how to customize a model architecture.
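
As a rough illustration, a registered model might look like the sketch below. The callback signature call_my_net(model_config, local_data) and the key 'mynet' are assumptions; the Customized Models tutorial is authoritative.

# A hedged sketch of registering a customized model (not verbatim API docs).
import torch
from federatedscope.register import register_model

class MyNet(torch.nn.Module):
    def __init__(self, in_channels=1, n_classes=62):
        super().__init__()
        self.flatten = torch.nn.Flatten()
        self.fc = torch.nn.Linear(in_channels * 28 * 28, n_classes)

    def forward(self, x):
        return self.fc(self.flatten(x))

def call_my_net(model_config, local_data):
    # Only respond when our registered type is requested.
    if model_config.type.lower() == 'mynet':
        return MyNet()

register_model('mynet', call_my_net)

Setting cfg.model.type = 'mynet' would then build this model for each participant.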

Step 4. Start running an FL task

Note that FederatedScope provides a unified interface for both standalone mode and distributed mode, and allows users to switch between them via configuration.

Standalone mode

The standalone mode in FederatedScope means to simulate multiple participants (servers and clients) in a single device, while participants' data are isolated from each other and their models might be shared via message passing.

Here we demonstrate how to run a standard FL task with FederatedScope, setting cfg.data.type = 'FEMNIST' and cfg.model.type = 'ConvNet2' to run vanilla FedAvg for an image classification task. Users can customize training configurations, such as cfg.federate.total_round_num, cfg.data.batch_size, and cfg.optimizer.lr, in the configuration (a .yaml file), and run a standard FL task as:

# Run with default configurations
python federatedscope/main.py --cfg federatedscope/example_configs/femnist.yaml
# Or with custom configurations
python federatedscope/main.py --cfg federatedscope/example_configs/femnist.yaml federate.total_round_num 50 data.batch_size 128
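
The same run can also be configured programmatically. The following is a rough sketch assembled from snippets elsewhere in this README; the worker_builder import path is an assumption and may differ across versions.

# A rough programmatic equivalent of the command above (a sketch, not a reference).
from federatedscope.core.configs.config import global_cfg
from federatedscope.core.auxiliaries.data_builder import get_data
from federatedscope.core.auxiliaries.worker_builder import get_server_cls, get_client_cls
from federatedscope.core.fed_runner import FedRunner

cfg = global_cfg.clone()
cfg.merge_from_file('federatedscope/example_configs/femnist.yaml')
cfg.federate.total_round_num = 50   # same overrides as on the command line
cfg.data.batch_size = 128

data, modified_cfg = get_data(cfg.clone())
cfg.merge_from_other_cfg(modified_cfg)

runner = FedRunner(data=data,
                   server_class=get_server_cls(cfg),
                   client_class=get_client_cls(cfg),
                   config=cfg.clone())
runner.run()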

Then you can observe some monitored metrics during the training process as:

INFO: Server #0 has been set up ...
INFO: Model meta-info: <class 'federatedscope.cv.model.cnn.ConvNet2'>.
... ...
INFO: Client has been set up ...
INFO: Model meta-info: <class 'federatedscope.cv.model.cnn.ConvNet2'>.
... ...
INFO: {'Role': 'Client #5', 'Round': 0, 'Results_raw': {'train_loss': 207.6341676712036, 'train_acc': 0.02, 'train_total': 50, 'train_loss_regular': 0.0, 'train_avg_loss': 4.152683353424072}}
INFO: {'Role': 'Client #1', 'Round': 0, 'Results_raw': {'train_loss': 209.0940284729004, 'train_acc': 0.02, 'train_total': 50, 'train_loss_regular': 0.0, 'train_avg_loss': 4.1818805694580075}}
INFO: {'Role': 'Client #8', 'Round': 0, 'Results_raw': {'train_loss': 202.24929332733154, 'train_acc': 0.04, 'train_total': 50, 'train_loss_regular': 0.0, 'train_avg_loss': 4.0449858665466305}}
INFO: {'Role': 'Client #6', 'Round': 0, 'Results_raw': {'train_loss': 209.43883895874023, 'train_acc': 0.06, 'train_total': 50, 'train_loss_regular': 0.0, 'train_avg_loss': 4.1887767791748045}}
INFO: {'Role': 'Client #9', 'Round': 0, 'Results_raw': {'train_loss': 208.83140087127686, 'train_acc': 0.0, 'train_total': 50, 'train_loss_regular': 0.0, 'train_avg_loss': 4.1766280174255375}}
INFO: ----------- Starting a new training round (Round #1) -------------
... ...
INFO: Server #0: Training is finished! Starting evaluation.
INFO: Client #1: (Evaluation (test set) at Round #20) test_loss is 163.029045
... ...
INFO: Server #0: Final evaluation is finished! Starting merging results.
... ...

Distributed mode

The distributed mode in FederatedScope denotes running multiple processes to build up an FL course, where each process acts as a participant (server or client) that instantiates its model and loads its data. The communication between participants is provided by the communication module of FederatedScope.

To run with distributed mode, you only need to:

  • Prepare an isolated data file and set up cfg.distribute.data_file = PATH/TO/DATA for each participant;
  • Change cfg.federate.mode = 'distributed', and specify the role of each participant by cfg.distribute.role = 'server'/'client';
  • Set up a valid address by cfg.distribute.host = x.x.x.x and cfg.distribute.port = xxxx. (Note that for a server, you need to set up server_host/server_port for listening for messages, while for a client, you need to set up client_host/client_port for listening and server_host/server_port for sending join-in applications when building up an FL course; see the sketch after this list.)
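
Put together, the relevant settings for a server might look like the following sketch (values are placeholders; in practice they usually live in the .yaml file or are passed on the command line as in the examples below):

# A sketch of distributed-mode settings for a server (placeholder values).
cfg.federate.mode = 'distributed'
cfg.distribute.role = 'server'
cfg.distribute.server_host = '127.0.0.1'
cfg.distribute.server_port = 50051
cfg.distribute.data_file = 'toy_data/server_data'

# A client additionally listens on its own address and points to the server:
# cfg.distribute.role = 'client'
# cfg.distribute.client_host = '127.0.0.1'
# cfg.distribute.client_port = 50052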

We prepare a synthetic example for running with distributed mode:

# For server
python main.py --cfg federatedscope/example_configs/distributed_server.yaml distribute.data_file 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx

# For clients
python main.py --cfg federatedscope/example_configs/distributed_client_1.yaml distribute.data_file 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx distribute.client_host x.x.x.x distribute.client_port xxxx
python main.py --cfg federatedscope/example_configs/distributed_client_2.yaml distribute.data_file 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx distribute.client_host x.x.x.x distribute.client_port xxxx
python main.py --cfg federatedscope/example_configs/distributed_client_3.yaml distribute.data_file 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx distribute.client_host x.x.x.x distribute.client_port xxxx

An executable example with generated toy data can be run with:

# Generate the toy data
python scripts/gen_data.py

# Firstly start the server that is waiting for clients to join in
python federatedscope/main.py --cfg federatedscope/example_configs/distributed_server.yaml distribute.data_file toy_data/server_data distribute.server_host 127.0.0.1 distribute.server_port 50051

# Start the client #1 (with another process)
python federatedscope/main.py --cfg federatedscope/example_configs/distributed_client_1.yaml distribute.data_file toy_data/client_1_data distribute.server_host 127.0.0.1 distribute.server_port 50051 distribute.client_host 127.0.0.1 distribute.client_port 50052
# Start the client #2 (with another process)
python federatedscope/main.py --cfg federatedscope/example_configs/distributed_client_2.yaml distribute.data_file toy_data/client_2_data distribute.server_host 127.0.0.1 distribute.server_port 50051 distribute.client_host 127.0.0.1 distribute.client_port 50053
# Start the client #3 (with another process)
python federatedscope/main.py --cfg federatedscope/example_configs/distributed_client_3.yaml distribute.data_file toy_data/client_3_data distribute.server_host 127.0.0.1 distribute.server_port 50051 distribute.client_host 127.0.0.1 distribute.client_port 50054

And you can observe the results as (the IP addresses are anonymized with 'x.x.x.x'):

INFO: Server #0: Listen to x.x.x.x:xxxx...
INFO: Server #0 has been set up ...
Model meta-info: <class 'federatedscope.core.lr.LogisticRegression'>.
... ...
INFO: Client: Listen to x.x.x.x:xxxx...
INFO: Client (address x.x.x.x:xxxx) has been set up ...
Client (address x.x.x.x:xxxx) is assigned with #1.
INFO: Model meta-info: <class 'federatedscope.core.lr.LogisticRegression'>.
... ...
{'Role': 'Client #2', 'Round': 0, 'Results_raw': {'train_avg_loss': 5.215108394622803, 'train_loss': 333.7669372558594, 'train_total': 64}}
{'Role': 'Client #1', 'Round': 0, 'Results_raw': {'train_total': 64, 'train_loss': 290.9668884277344, 'train_avg_loss': 4.54635763168335}}
----------- Starting a new training round (Round #1) -------------
... ...
INFO: Server #0: Training is finished! Starting evaluation.
INFO: Client #1: (Evaluation (test set) at Round #20) test_loss is 30.387419
... ...
INFO: Server #0: Final evaluation is finished! Starting merging results.
... ...

Advanced

As a comprehensive FL platform, FederatedScope provides fundamental implementations to support the requirements of various FL applications and frontier studies, towards both convenient usage and flexible extension, including:

  • Personalized Federated Learning: Client-specific model architectures and training configurations are applied to handle the non-IID issues caused by the diverse data distributions and heterogeneous system resources.
  • Federated Hyperparameter Optimization: When hyperparameter optimization (HPO) comes to Federated Learning, each attempt is extremely costly due to multiple rounds of communication across participants. It is worth noting that HPO under the FL setting is unique, and techniques such as low-fidelity HPO deserve further exploration.
  • Privacy Attacker: Privacy attack algorithms are important and convenient tools to verify the privacy protection strength of the designed FL systems and algorithms, and this area is growing along with Federated Learning.
  • Graph Federated Learning: Working on the ubiquitous graph data, Graph Federated Learning aims to exploit isolated sub-graph data to learn a global model, and has attracted increasing popularity.
  • Recommendation: As a number of laws and regulations go into effect all over the world, more and more people are aware of the importance of privacy protection, which urges recommender systems to learn from user data in a privacy-preserving manner.
  • Differential Privacy: Different from encryption algorithms that require a large amount of computation resources, differential privacy is an economical yet flexible technique to protect privacy, which has achieved great success in databases and is increasingly adopted in federated learning.
  • ...

More features are coming soon! We have prepared a tutorial to provide more details about how to utilize FederatedScope to enjoy your journey of Federated Learning!

Materials on related topics are constantly being updated; please refer to FL-Recommendation, Federated-HPO, Personalized FL, Federated Graph Learning, FL-NLP, FL-privacy-attacker, and so on.

Documentation

The classes and methods of FederatedScope have been well documented so that users can generate the API references by:

pip install -r requirements-doc.txt
make html

We put the API references on our website.

License

FederatedScope is released under Apache License 2.0.

Publications

If you find FederatedScope useful for your research or development, please cite the following paper:

@article{federatedscope,
  title = {FederatedScope: A Flexible Federated Learning Platform for Heterogeneity},
  author = {Xie, Yuexiang and Wang, Zhen and Chen, Daoyuan and Gao, Dawei and Yao, Liuyi and Kuang, Weirui and Li, Yaliang and Ding, Bolin and Zhou, Jingren},
  journal={arXiv preprint arXiv:2204.05011},
  year = {2022},
}

More publications can be found on the Publications page.

Contributing

We greatly appreciate any contribution to FederatedScope! You can refer to Contributing to FederatedScope for more details.

You are welcome to join our Slack channel or DingDing group (please scan the following QR code) for discussion.

[QR code image]

Comments
  • Support optimizers with different parameters

    Support optimizers with different parameters

    • This PR is to solve the issue #91
    • Solution
      • Specify the parameters of the local optimizer by adding new parameters under the configs cfg.optimizer and cfg.fedopt.optimizer.
      • get_optimizer is then called as follows:
        optimizer = get_optimizer(model=model, **cfg.optimizer)
    
    • Example:
      • Taking cfg.optimizer as an example, the original config file is as follows
        # ------------------------------------------------------------------------ #
        # Optimizer related options
        # ------------------------------------------------------------------------ #
        cfg.optimizer = CN(new_allowed=True)
    
        cfg.optimizer.type = 'SGD'
        cfg.optimizer.lr = 0.1
    
    • By setting new_allowed=True in cfg.optimizer, we allow users to add new parameters according to the type of their optimizers. For example, if I want to use an optimizer registered as myoptimizer, together with its new parameters mylr and mynorm, I just need to write the yaml file as follows, and the new parameters will be added automatically (a hedged sketch of the registration side follows the yaml).
    optimizer:
        type: myoptimizer
        mylr: 0.1
        mynorm: 1
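
    A hedged sketch of the registration side (the register_optimizer entry point and the callback signature are assumptions inferred from this description, not verbatim API):

    # Assumed registration hook; names are illustrative only.
    import torch
    from federatedscope.register import register_optimizer

    class MyOptimizer(torch.optim.SGD):
        def __init__(self, params, mylr=0.1, mynorm=1.0, **kwargs):
            self.mynorm = mynorm          # extra hyperparameter picked up from yaml
            kwargs.pop('lr', None)        # prefer the custom mylr if both appear
            super().__init__(params, lr=mylr, **kwargs)

    def call_my_optimizer(model, type, **kwargs):
        if type == 'myoptimizer':
            return MyOptimizer(model.parameters(), **kwargs)

    register_optimizer('myoptimizer', call_my_optimizer)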
    
    bug 
    opened by DavdGao 7
  • Is there script/config file example for FL training on celeba, toy and nlp task?

    Is there script/config file example for FL training on celeba, toy and nlp task?

    Is your feature request related to a problem? Please describe. I am having trouble modifying the example to train on celeba and toy data for vision tasks, as well as modifying the existing script (01_quick_start.py) to run nlp tasks.

    Describe the solution you'd like: Is there a script/config file that I can follow? A simple one for vanilla fedavg will do.

    Describe alternatives you've considered: simply changing cfg.data.type = 'femnist' to cfg.data.type = 'celeba', cfg.data.type = 'CIFAR10@torchvision', or cfg.data.type = 'toy' causes errors. It is not clear to me which part of the script I should modify to run nlp tasks.

    Error when setting cfg.data.type = 'celeba':

    Traceback (most recent call last):
      File "C:\Users\Zhang Ze Yu\OneDrive - National University of Singapore\Documents\GitHub\FederatedScope\01_quick_start.py", line 36, in <module>
        data, modified_cfg = get_data(cfg.clone())
      File "C:\Users\Zhang Ze Yu\OneDrive - National University of Singapore\Documents\GitHub\FederatedScope\federatedscope\core\auxiliaries\data_builder.py", line 629, in get_data
        data, modified_config = load_cv_dataset(config)
      File "C:\Users\Zhang Ze Yu\OneDrive - National University of Singapore\Documents\GitHub\FederatedScope\federatedscope\cv\dataloader\dataloader.py", line 25, in load_cv_dataset
        dataset = LEAF_CV(root=path,
      File "C:\Users\Zhang Ze Yu\OneDrive - National University of Singapore\Documents\GitHub\FederatedScope\federatedscope\cv\dataset\leaf_cv.py", line 54, in __init__
        super(LEAF_CV, self).__init__(root, name, transform, target_transform)
      File "C:\Users\Zhang Ze Yu\OneDrive - National University of Singapore\Documents\GitHub\FederatedScope\federatedscope\cv\dataset\leaf.py", line 38, in __init__
        self.process_file()
      File "C:\Users\Zhang Ze Yu\OneDrive - National University of Singapore\Documents\GitHub\FederatedScope\federatedscope\cv\dataset\leaf.py", line 85, in process_file
        self.extract()
      File "C:\Users\Zhang Ze Yu\OneDrive - National University of Singapore\Documents\GitHub\FederatedScope\federatedscope\cv\dataset\leaf.py", line 76, in extract
        with zipfile.ZipFile(osp.join(self.raw_dir, name), 'r') as f:
      File "C:\Users\Zhang Ze Yu\anaconda3\lib\zipfile.py", line 1266, in __init__
        self._RealGetContents()
      File "C:\Users\Zhang Ze Yu\anaconda3\lib\zipfile.py", line 1333, in _RealGetContents
        raise BadZipFile("File is not a zip file")
    zipfile.BadZipFile: File is not a zip file
    

    Error when setting cfg.data.type = 'CIFAR10@torchvision':

    Files already downloaded and verified
    Files already downloaded and verified
    2022-12-02 01:24:58,821 (splitter_builder:43)WARNING: Splitter  not found.
    Traceback (most recent call last):
      File "C:\Users\Zhang Ze Yu\OneDrive - National University of Singapore\Documents\GitHub\FederatedScope\01_quick_start.py", line 36, in <module>
        data, modified_cfg = get_data(cfg.clone())
      File "C:\Users\Zhang Ze Yu\OneDrive - National University of Singapore\Documents\GitHub\FederatedScope\federatedscope\core\auxiliaries\data_builder.py", line 661, in get_data
        data, modified_config = load_external_data(config)
      File "C:\Users\Zhang Ze Yu\OneDrive - National University of Singapore\Documents\GitHub\FederatedScope\federatedscope\core\auxiliaries\data_builder.py", line 583, in load_external_data
        splitter(dataset[split], prior=train_label_distribution)):
    TypeError: 'NoneType' object is not callable
    
    opened by zyzhang1130 6
  • Suspected bug in merge_data

    Suspected bug in merge_data

    Describe the bug

    federatedscope.core.auxiliaries.data_builder.merge_data causes all_data[1] == merged_data. Furthermore, it appears the size of the merged data is not consistent with the original dataset.

    To Reproduce

    from federatedscope.core.configs.config import global_cfg
    
    cfg = global_cfg.clone()
    
    
    
    from federatedscope.core.auxiliaries.data_builder import get_data, merge_data
    
    cfg.data.type = 'femnist'
    cfg.data.splits = [0.6, 0.2, 0.2]
    cfg.data.batch_size = 10
    cfg.data.subsample = 0.05
    cfg.data.transform = [['ToTensor'], ['Normalize', {'mean': [0.1307], 'std': [0.3081]}]]
    cfg.federate.client_num = 16
    
    data, modified_cfg = get_data(cfg.clone())
    cfg.merge_from_other_cfg(modified_cfg)
    
    merged_data = merge_data(data,5)
    
    train_count = 0
    test_count = 0
    val_count = 0
    for i in range(1,5):
        train_count += len(data[i]['train'])
        test_count += len(data[i]['test'])
        val_count += len(data[i]['val'])
    

    Expected behavior

    1. data[1] == merged_data should be false (there is no obvious reason why it should be true, especially when the size of the data remains the same).
    2. train_count, test_count and val_count should match merged_data.


    Desktop (please complete the following information):

    • OS: Windows11
    opened by zyzhang1130 6
  • Some error about "How to start with customized function?"

    Some error about "How to start with customized function?"

    When I follow "https://federatedscope.io/docs/graph/", "Start with customized function", I get an error: "ValueError: Data my_cora not found." It seems like register_data doesn't register the my_cora data. My data file is mydata.py and the data's feature is "mycora". My model file is mygcn.py and my model feature is "gnn_mygcb". But it doesn't work.

    How to start with customized function?

    wontfix 
    opened by SakuraXiaMF 6
  • Redundancy in the log files

    Redundancy in the log files

    A FedAvg trial on 5% of FEMNIST will produce a 500 kb log each round: about 80% are eval logs like 2022-04-13 16:33:24,901 (client:264) INFO: Client #1: (Evaluation (test set) at Round #26) test_loss is 79.352451, 10% are server results, and 10% is training information.

    If the round is 500, 1000, or much larger, the log files will take up too much space with a lot of redundancy. @yxdyc

    enhancement 
    opened by rayrayraykk 6
  • FedGlobalContrast and FedSimCLR baseline

    FedGlobalContrast and FedSimCLR baseline

    Current problem: calculating the global contrastive loss at epoch granularity is slow because of the epoch data size, but calculating it at batch granularity incurs more communication cost.

    Feature 
    opened by xkxxfyf 5
  • In the Link Prediction task, the server-side evaluation metric client_best_individual is incorrect

    In the Link Prediction task, the server-side evaluation metric client_best_individual is incorrect

    client_best_individual is supposed to take the evaluation results of the best-performing client. However, I found that for the test_acc metric it instead takes the test_acc of the worst-performing (lowest) client, while all other metrics are taken from the best-performing client. In other words, client_best_individual contains: the best client's loss, the best client's test_total, ..., the worst client's test_acc, ... I think there is a small problem here.

    opened by YoungOGtiger 5
  • report cuda error when trying to launch up the demo case

    report cuda error when trying to launch up the demo case

    Hi, when I am trying to launch the demo case, a CUDA-related error is reported as below:

    I am using conda to manage the environment. In other envs I have PyTorch working on CUDA without any problem. I think this could be an installation issue: I did not install anything by myself, totally following your guidance. My cuda version: NVIDIA-SMI 510.47.03, Driver Version: 510.47.03, CUDA Version: 11.6,
    and my torch version: 1.10.1

    (fedscope) liangma@lMa-X1:~/prj/FederatedScope$ python federatedscope/main.py --cfg federatedscope/example_configs/femnist.yaml
    
    ...
    2022-05-13 22:06:09,249 (server:520) INFO: ----------- Starting training (Round #0) -------------
    Traceback (most recent call last):
     File "/home/liangma/prj/FederatedScope/federatedscope/main.py", line 41, in <module>
       _ = runner.run()
     File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/fed_runner.py", line 136, in run
       self._handle_msg(msg)
     File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/fed_runner.py", line 254, in _handle_msg
       self.client[each_receiver].msg_handlers[msg.msg_type](msg)
     File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/worker/client.py", line 202, in callback_funcs_for_model_para
       sample_size, model_para_all, results = self.trainer.train()
     File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/trainers/trainer.py", line 374, in train
       self._run_routine("train", hooks_set, target_data_split_name)
     File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/trainers/trainer.py", line 208, in _run_routine
       hook(self.ctx)
     File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/trainers/trainer.py", line 474, in _hook_on_fit_start_init
       ctx.model.to(ctx.device)
     File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/nn/modules/module.py", line 899, in to
       return self._apply(convert)
     File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/nn/modules/module.py", line 570, in _apply
       module._apply(fn)
     File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/nn/modules/module.py", line 593, in _apply
       param_applied = fn(param)
     File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/nn/modules/module.py", line 897, in convert
       return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
     File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/cuda/__init__.py", line 208, in _lazy_init
       raise AssertionError("Torch not compiled with CUDA enabled")
    AssertionError: Torch not compiled with CUDA enabled
    
    
    opened by lmaxeniro 5
  •  The size of tensor a (10) must match the size of tensor b (11) at non-singleton dimension

    The size of tensor a (10) must match the size of tensor b (11) at non-singleton dimension

    # For server
    python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_femnist_server.yaml
    
    # For clients
    python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_femnist_client_1.yaml
    
    python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_femnist_client_2.yaml
    
    python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_femnist_client_3.yaml
    

    I ran the above commands in the terminals of the four containers respectively. When the mode is synchronous, it runs normally; when the mode is asynchronous, it reports the following errors:

    2022-09-06 15:19:54,209 (server:786)INFO: ----------- Starting training (Round #0) -------------
    Traceback (most recent call last):
      File "/workspace/workfile/FederatedScope/federatedscope/main.py", line 62, in <module>
        _ = runner.run()
      File "/workspace/workfile/FederatedScope/federatedscope/core/fed_runner.py", line 207, in run
        self.server.run()
      File "/workspace/workfile/FederatedScope/federatedscope/core/workers/server.py", line 247, in run
        move_on_flag = self.msg_handlers[msg.msg_type](msg)
      File "/workspace/workfile/FederatedScope/federatedscope/core/workers/server.py", line 909, in callback_funcs_model_para
        move_on_flag = self.check_and_move_on()
      File "/workspace/workfile/FederatedScope/federatedscope/core/workers/server.py", line 316, in check_and_move_on
        aggregated_num = self._perform_federated_aggregation()
      File "/workspace/workfile/FederatedScope/federatedscope/core/workers/server.py", line 455, in _perform_federated_aggregation
        result = aggregator.aggregate(agg_info)
      File "/workspace/workfile/FederatedScope/federatedscope/core/aggregators/asyn_clients_avg_aggregator.py", line 28, in aggregate
        avg_model = self._para_weighted_avg(models,
      File "/workspace/workfile/FederatedScope/federatedscope/core/aggregators/asyn_clients_avg_aggregator.py", line 77, in _para_weighted_avg
        avg_model[key] += local_model[key] * weight
    RuntimeError: The size of tensor a (10) must match the size of tensor b (11) at non-singleton dimension

    opened by young-chao 4
  • Running a distributed example across four containers fails

    Running a distributed example across four containers fails

    Data was prepared on each of the four containers:

    python scripts/distributed_scripts/gen_data.py

    The server container runs:

    python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_server.yaml distribute.data_file toy_data/server_data distribute.server_host 172.17.0.8 distribute.server_port 50051

    Client 1 runs:

    python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_1.yaml distribute.data_file toy_data/client_1_data distribute.server_host 172.17.0.8 distribute.server_port 50051 distribute.client_host 172.17.0.9 distribute.client_port 50052

    Client 2 runs:

    python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_2.yaml distribute.data_file toy_data/client_2_data distribute.server_host 172.17.0.8 distribute.server_port 50051 distribute.client_host 172.17.0.10 distribute.client_port 50053

    Client 3 runs:

    python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_3.yaml distribute.data_file toy_data/client_3_data distribute.server_host 172.17.0.8 distribute.server_port 50051 distribute.client_host 172.17.0.11 distribute.client_port 50054

    The terminal output always stays in the following state:

    Server:

    2022-09-02 11:53:52,076 (server:185)INFO: Server: Listen to 172.17.0.8:50051... 2022-09-02 11:53:52,076 (fed_runner:348)INFO: Server has been set up ...

    Client 1:

    2022-09-02 11:51:55,487 (client:124)INFO: Client: Listen to 172.17.0.9:50052... 2022-09-02 11:51:55,487 (fed_runner:401)INFO: Client (address 172.17.0.9:50052) has been set up ...

    Client 2:

    2022-09-02 11:53:59,381 (client:124)INFO: Client: Listen to 172.17.0.1:50053... 2022-09-02 11:53:59,381 (fed_runner:401)INFO: Client (address 172.17.0.1:50053) has been set up ...

    Client 3:

    2022-09-02 11:53:59,381 (client:124)INFO: Client: Listen to 172.17.0.1:50053... 2022-09-02 11:53:59,381 (fed_runner:401)INFO: Client (address 172.17.0.1:50053) has been set up ...

    I'm quite confused. Can this example only run on a single machine? The network among the four containers is connected, but nothing further happens.

    opened by young-chao 4
  • yacs error occurred when run command

    yacs error occurred when run command

    Describe the bug

    When running the command: python federatedscope/main.py --cfg federatedscope/gfl/baseline/isolated_gin_minibatch_on_cikmcup.yaml --client_cfg federatedscope/gfl/baseline/isolated_gin_minibatch_on_cikmcup_per_client.yaml

    yacs error: AssertionError: a (cur type <class 'NoneType'>) must be an instance of <class 'yacs.config.CfgNode'>

    Desktop (please complete the following information):

    • OS: [MACOS]
    • Version [13.0]

    Additional context

    2022-08-13 13:25:21,112 (fed_runner:302)INFO: Client 13 has been set up ...
    Traceback (most recent call last):
      File "/Users/panmin/Downloads/FederatedScope/federatedscope/main.py", line 46, in <module>
        runner = FedRunner(data=data,
      File "/Users/panmin/opt/anaconda3/envs/fs/lib/python3.9/site-packages/federatedscope-0.1.9-py3.9.egg/federatedscope/core/fed_runner.py", line 58, in __init__
        self._setup_for_standalone()
      File "/Users/panmin/opt/anaconda3/envs/fs/lib/python3.9/site-packages/federatedscope-0.1.9-py3.9.egg/federatedscope/core/fed_runner.py", line 115, in _setup_for_standalone
        self.client[client_id] = self._setup_client(
      File "/Users/panmin/opt/anaconda3/envs/fs/lib/python3.9/site-packages/federatedscope-0.1.9-py3.9.egg/federatedscope/core/fed_runner.py", line 278, in _setup_client
        client_specific_config.merge_from_other_cfg(
      File "/Users/panmin/opt/anaconda3/envs/fs/lib/python3.9/site-packages/federatedscope-0.1.9-py3.9.egg/federatedscope/core/configs/config.py", line 153, in merge_from_other_cfg
        merge_dict_a_into_b(cfg_other, self, self, [])
      File "/Users/panmin/opt/anaconda3/envs/fs/lib/python3.9/site-packages/federatedscope-0.1.9-py3.9.egg/federatedscope/core/configs/config.py", line 70, in merge_dict_a_into_b
        _assert_with_logging(
      File "/Users/panmin/opt/anaconda3/envs/fs/lib/python3.9/site-packages/yacs-0.1.8-py3.9.egg/yacs/config.py", line 545, in _assert_with_logging
        assert cond, msg
    AssertionError: a (cur type <class 'NoneType'>) must be an instance of <class 'yacs.config.CfgNode'>

    opened by Bio-Gas 4
  • Cannot run fl training on Femnist in distribute mode

    Cannot run fl training on Femnist in distribute mode

    opened by DavdGao 1
  • Issue with starting a FL process with a trained model

    Issue with starting a FL process with a trained model

    So I was trying to continue training a model with vanilla FL by copying the model into the server and client models as follows:

    Fed_runner = FedRunner(data=data,
                           server_class=get_server_cls(cfg),
                           client_class=get_client_cls(cfg),
                           config=cfg.clone())

    Fed_runner.server.model = copy.deepcopy(pretrained_model)
    for i in range(cfg.federate.client_num):
        Fed_runner.client[i+1].model = copy.deepcopy(pretrained_model)
    Fed_runner.run()
    

    I was expecting the training accuracy to continue to rise, as pretrained_model was trained on the same data but not until convergence, but the training accuracy only fluctuated around the same level. Is this because I used the wrong way to copy the model weights?

    opened by zyzhang1130 2
  • [Discussion] Refactor worker class

    [Discussion] Refactor worker class

        It is ok to inject those feature engineering procedures into an FL course in this way, but we shall change the instantiation of workers to a better (more general and unified) way. Actually, it is quite confusing to wrap a worker by feature engineering-related wrapper. Feature engineering is just a tiny step in an FL course, which doesn't change a worker significantly. One usual way to instantiate a worker from a collection of such pluggable behaviors is to use factory pattern I guess.
    

    Originally posted by @joneswong in https://github.com/alibaba/FederatedScope/pull/426#discussion_r1046631649

    opened by joneswong 0
Releases(v0.2.0)
  • v0.2.0(Jul 30, 2022)

    Summarization

    The improvements included in this release (FederatedScope v0.2.0) are summarized as follows:

    • FederatedScope allows users to apply asynchronous training strategies in federated learning with the event-driven architecture, including different aggregation conditions, staleness tolerance, broadcasting manners, etc. We also support efficient standalone simulation of cross-device FL with a large number of participants.
    • We add three benchmarks for Federated HPO, Personalized FL, and Hetero-Task FL to promote the application of federated learning in a wide range of scenarios.
    • We ease the installation, setup, and continuous integration (CI), making them more friendly for users to get started and customize. Useful visualization functionalities are also added to FederatedScope for users to monitor the training process and evaluation results.
    • We add paper lists of related topics, including FL-Recommendation, Federated-HPO, Personalized FL, Federated Graph Learning, FL-NLP, FL-Attacker, FL-Incentive-Mechanism, and so on. These materials are constantly being updated.
    • Several novel features are also included in this release, such as performance attacks, organizer, unseen clients generalization, splitter, client sampler, and so on, which enhance FederatedScope's robustness and comprehensiveness.

    Commits

    🚀 Enhancements & Features

    • Add backdoor attack @Alan-Qin (#267)
    • Add organizer to FederatedScope @rayrayraykk (#265, #257)
    • Monitoring the client-wise and global wandb info @yxdyc (#260, #226, #206, #176, #90)
    • More friendly guidance of installation, setup and contribution @rayrayraykk (#255, #192)
    • Add learning rate scheduler in FS @DavdGao (#248)
    • Support different types of keys when communicating via grpc @xieyxclack (#239)
    • Support constructing FL course when server does not have data @xieyxclack (#236)
    • Enabled unseen clients case to check the participation generalization gap @yxdyc (#238, #100)
    • Support more robust type conversion in yaml file @yxdyc (#229)
    • Asynchronous Federated Learning @xieyxclack (#225)
    • Support both pre- and post-merging data for the "global" baseline @yxdyc (#220)
    • Format the code by flake8 @rayrayraykk (#211, #207)
    • Add paper list of FL-Attacker and FL-Incentive-Mechanism @Osier-Yi (#203, #202, #201)
    • Add client samplers @xieyxclack (#200)
    • Modify the log for hooks_in_train/test @DavdGao (#181)
    • Modification of the finetune mechanism @DavdGao (#177)
    • Add FedHPO-B, a benchmark suite for federated hyperparameter optimization @rayrayraykk @joneswong (#173, #146, #127)
    • Add pFL-Bench, a comprehensive benchmark for personalized Federated Learning @yxdyc (#169, #149)
    • Add B-FHTL, a benchmark suite for studying federated hetero-task learning @DavdGao (#167, #150)
    • Update splitter for consistent label distribution @xieyxclack (#154)
    • Improve SHA wrapper @joneswong (#145)
    • Add slack & DingDing group @xieyxclack (#142)
    • Add FedEx @joneswong @rayrayraykk (#141, #137, #120)
    • Enable single thread HPO @joneswong (#140)
    • Refactor autotune module @joneswong (#133)
    • Add paper list of federated database @DavdGao (#129)
    • A quadratic objective function-based experiment @joneswong (#111)
    • Support optimizers with different parameters @DavdGao (#96)
    • Demo how to use SMAC for FedHPO @joneswong (#88)
    • FLIT for federated graph classification/regression @wanghh7 (#87)
    • Add momentum for the optimizer in server @DavdGao (#86)
    • Add an example for distributed mode @xieyxclack (#85)
    • Add readme for vFL @xieyxclack (#83)
    • Add paper list of FL-NLP @cheneydon (#81)
    • Add more models and datasets from external packages. @rayrayraykk (#79, #42)
    • Add pFL paper list @yxdyc (#73, #72)
    • Add paper list of FedRec @xieyxclack (#68)
    • Add paper list of FedHPO @joneswong (#67)
    • Add paper list of federated graph learning. @rayrayraykk (#65)

    🐛 Bug Fixes

    • Fix ditto trainer @yxdyc (#271)
    • Fix personalization when module has lazy load hooks @rayrayraykk (#269)
    • Fix the wrongly early_stopper.track_and_check calling in client @yxdyc (#237)
    • Fix type conversion error and invalid logging in distributed mode @rayrayraykk (#232, #223)
    • Fix the cpu and memory wastage problems caused by multiprocess @yxdyc (#212)
    • Fix for invalid sample_client_num in some situation @yxdyc (#210)
    • Fix the url of GFL dataset @rayrayraykk (#196)
    • Fix twitter dataset @rayrayraykk (#187)
    • BugFix for monitor and logger @rayrayraykk (#188, #175, #109)
    • Fix download url @Osier-Yi @rayrayraykk @xieyxclack (#101, #95, #92, #76)
    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(May 6, 2022)
