Plato: A New Framework for Federated Learning Research
Welcome to Plato, a new software framework to facilitate scalable federated learning research.
Installing Plato with PyTorch
To install Plato, first clone this repository to the desired directory.
The Plato developers recommend using Miniconda to manage Python packages. Before using Plato, first install Miniconda, update conda itself, and then create a new conda environment with Python 3.8 using the commands:
$ conda update conda
$ conda create -n federated python=3.8
$ conda activate federated
where federated is the preferred name of your new environment. Update any packages, if necessary, by typing y to proceed.
The next step is to install the required Python packages. PyTorch should be installed following the instructions on its official Get Started page. On Linux with CUDA GPU support, for example, the typical command would be:
$ conda install pytorch torchvision cudatoolkit=11.1 -c pytorch
The CUDA version used in the command above can be obtained on Ubuntu Linux systems with the command:
$ nvidia-smi
On macOS (without GPU support), the typical command would be:
$ conda install pytorch torchvision -c pytorch
We will need to install several packages using pip as well:
$ pip install -r requirements.txt
If you use Visual Studio Code, it is possible to use yapf to reformat the code every time it is saved by adding the following settings to .vscode/settings.json:
"python.formatting.provider": "yapf",
"editor.formatOnSave": true
In general, the following is the recommended starting point for .vscode/settings.json:
"python.linting.enabled": true,
"python.linting.pylintEnabled": true,
"python.formatting.provider": "yapf",
"editor.formatOnSave": true,
"python.linting.pylintArgs": [
"--init-hook",
"import sys; sys.path.append('/absolute/path/to/project/home/directory')"
],
"workbench.editor.enablePreview": false
Naturally, /absolute/path/to/project/home/directory should be replaced with the actual path in your development environment.
Tip: When working in Visual Studio Code as the development environment, one of the project developers' favourite colour themes is Bluloco; both its light and dark variants are excellent and thoughtfully designed. The Pylance extension, Microsoft's modern language server for Python, is also strongly recommended.
Running Plato in a Docker container
Most of the codebase in Plato is designed to be framework-agnostic, so it is relatively straightforward to use Plato with a variety of deep learning frameworks beyond PyTorch, which is its default framework. One example of a deep learning framework that Plato currently supports is MindSpore. Because a number of setup details must be handled correctly when running Plato without Docker, it is strongly recommended to run Plato in a Docker container, on either a CPU-only or a GPU-enabled server.
To build such a Docker image, use the provided Dockerfile for PyTorch and Dockerfile_MindSpore for MindSpore:
docker build -t plato -f Dockerfile .
or:
docker build -t plato -f Dockerfile_MindSpore .
To run the Docker image that was just built, use the command:
./dockerrun.sh
Or if GPUs are available, use the command:
./dockerrun_gpu.sh
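For reference, a GPU-enabled container can also be started manually with a command along the following lines; this is only a sketch, and the provided dockerrun_gpu.sh script may use additional options such as volume mounts. The --gpus all flag requires the NVIDIA Container Toolkit to be installed on the host:
docker run -it --gpus all plato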
To remove all the containers after they are run, use the command:
docker rm $(docker ps -a -q)
To remove the plato Docker image, use the command:
docker rmi plato
On Ubuntu Linux, you may need to add sudo before these docker commands.
The provided Dockerfile builds a Docker image running Ubuntu 20.04, with a virtual environment called federated pre-configured to support PyTorch 1.8.1 and Python 3.8. If MindSpore support is needed, the provided Dockerfile_MindSpore contains a pre-configured environment, also called federated, that supports MindSpore 1.1.1 and Python 3.7.5 (the Python version that MindSpore requires). Both Dockerfiles have GPU support enabled. Once an image is built and a Docker container is running, one can use Visual Studio Code to connect to it and start development within the container.
Running Plato
To start a federated learning training workload, run the run script from the repository's root directory. For example:
./run --config=configs/MNIST/fedavg_lenet5.yml
--config (-c): the path to the configuration file to be used. The default is config.yml in the project's home directory.
--log (-l): the level of logging information to be written to the console. Possible values are critical, error, warn, info, and debug; the default is info.
Plato uses the YAML format for its configuration files to manage the runtime configuration parameters. Example configuration files have been provided in the configs directory.
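For orientation, a configuration file is organized into named sections, each grouping the parameters for one part of the system. The sketch below is purely illustrative; the section and key names shown here are assumptions for the sake of the example, so consult the files in the configs directory for the actual schema:

clients:
    total_clients: 10
    per_round: 2

server:
    address: localhost
    port: 8000

trainer:
    rounds: 5
    model_name: lenet5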
Plato uses wandb to produce and collect logs in the cloud. If this is not needed, run the command wandb offline before running Plato.
If issues in the code prevented a previous run from completing, there may be leftover processes from that run. Use the command pkill python to terminate them so that they do not cause CUDA errors in the next run.
Installing YOLOv5 as a Python package
If object detection using the YOLOv5 model and any of the COCO datasets is needed, YOLOv5 must first be installed as a Python package:
cd packages/yolov5
pip install .
Plotting Runtime Results
If the configuration file contains a results section, the selected performance metrics, such as accuracy, will be saved in a .csv file in the results/ directory. By default, the results/ directory is located under the same path as the configuration file in use, but it can easily be changed by modifying Config.result_dir in config.py.
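As a hypothetical sketch only (the exact keys recognized by Plato may differ; see the provided example configuration files), such a section might look like:

results:
    types: accuracy, training_time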
As .csv files, these results can be used however one wishes; an example Python program, called plot.py, plots the necessary figures and saves them as PDF files. To run this program:
python plot.py --config=config.yml
--config (-c): the path to the configuration file to be used. The default is config.yml in the project's home directory.
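Since the results are plain .csv files, they can also be processed directly. Below is a minimal Python sketch, assuming a hypothetical results file results/accuracy.csv with round and accuracy columns; the actual file name and columns depend on the results section of the configuration file:

import pandas as pd

# Load a results file produced by a previous run (hypothetical path and column names).
results = pd.read_csv('results/accuracy.csv')

# Report the best accuracy achieved across all rounds.
print(results['accuracy'].max())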
Running Unit Tests
All unit tests are in the tests/ directory. These tests are designed to be standalone and executed separately. For example, the command python lr_schedule_tests.py runs the unit tests for learning rate schedules.
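A new test file can follow the same standalone pattern. The sketch below is a hypothetical example; the file, class, and test names are illustrative and not part of the existing test suite:

import unittest


class ExampleTests(unittest.TestCase):
    """A hypothetical standalone test case in the style of the tests/ directory."""

    def test_addition(self):
        # A trivial assertion to illustrate the standalone structure.
        self.assertEqual(1 + 1, 2)


if __name__ == '__main__':
    unittest.main()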
Installing Plato with MindSpore
Though we provide Dockerfile_MindSpore for building a Docker image that supports MindSpore 1.1, in rare cases it may still be necessary to install Plato with MindSpore on a GPU server running Ubuntu Linux 18.04 (which MindSpore requires). Similar to a PyTorch installation, we need to first create a new environment with Python 3.7.5 (which MindSpore 1.1 requires), and then install the required packages:
conda create -n mindspore python=3.7.5
conda activate mindspore
pip install -r requirements.txt
We should now install MindSpore 1.1 with the following command:
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.1.1/MindSpore/gpu/ubuntu_x86/cuda-10.1/mindspore_gpu-1.1.1-cp37-cp37m-linux_x86_64.whl
MindSpore may require additional system packages, which should be installed if they are not already present:
sudo apt-get install libssl-dev
sudo apt-get install build-essential
If CuDNN has not yet been installed, it needs to be installed with the following commands:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get install libcudnn8=8.0.5.39-1+cuda10.1
To check the current CuDNN version, the following commands are helpful:
function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep $1; }
function check() { lib_installed $1 && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; }
check libcudnn
To check if MindSpore is correctly installed on the GPU server, try to import mindspore with a Python interpreter.
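For example, the following one-liner should run without errors and print the installed version (assuming the package exposes __version__, as MindSpore releases do):

python -c "import mindspore; print(mindspore.__version__)"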
Finally, to use trainers and servers based on MindSpore, assign true to use_mindspore in the trainer section of the configuration file. This variable is unassigned by default, in which case Plato uses PyTorch as its default framework.
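For example, the relevant fragment of the configuration file would be (any other keys in the trainer section are omitted here):

trainer:
    use_mindspore: true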
Uninstalling Plato
Remove the conda environment used to run Plato first, and then remove the directory containing Plato's git repository:
conda-env remove -n federated
rm -rf plato/
where federated (or mindspore) is the name of the conda environment that Plato runs in.
For more specific documentation on how Plato can be run on GPU cluster environments such as Lambda Labs' GPU cloud or Compute Canada, refer to docs/Running.md.
Technical support
Technical support questions should be directed to the maintainer of this software framework: Baochun Li ([email protected]).