FedGS: Data Heterogeneity-Robust Federated Learning via Group Client Selection in Industrial IoT
Preparation
-
For instructions on generating data, please go to the folder of the corresponding dataset. For FEMNIST, please refer to femnist.
-
NVIDIA-Docker is required.
-
NVIDIA CUDA version 10.1 and higher is required.
How to run FedGS
Build a docker image
Enter the scripts folder and build a docker image named fedgs
.
sudo docker build -f build-env.dockerfile -t fedgs .
Modify /home/lizh/fedgs
to your actual project path in scripts/run.sh. Then run scripts/run.sh, which will create a container named fedgs.0
if CONTAINER_RANK
is set to 0 and starts the task.
chmod a+x run.sh && ./run.sh
The output logs and models will be stored in a logs
folder created automatically. For example, outputs of the FEMNIST task with container rank 0 will be stored in logs/femnist/0/
.
Hyperparameters
We categorize hyperparameters into default settings and custom settings, and we will introduce them separately.
Default Hyperparameters
These hyperparameters are included in utils/args.py. We list them in the table below (except for custom hyperparameters), but in general, we do not need to pay attention to them.
Variable Name | Default Value | Optional Values | Description |
---|---|---|---|
--seed | 0 | integer | Seed for client selection and batch splitting. |
--metrics-name | "metrics" | string | Name for metrics file. |
--metrics-dir | "metrics" | string | Folder name for metrics files. |
--log-dir | "logs" | string | Folder name for log files. |
--use-val-set | None | None | Set this option to use the validation set, otherwise the test set is used. (NOT TESTED) |
Custom Hyperparameters
These hyperparameters are included in scripts/run.sh. We list them below.
Environment Variable | Default Value | Description |
---|---|---|
CONTAINER_RANK | 0 | This identify the container (e.g., fedgs.0 ) and log files (e.g., logs/femnist/0/output.0 ). |
BATCH_SIZE | 32 | Number of training samples in each batch. |
LEARNING_RATE | 0.01 | Learning rate for local optimizers. |
NUM_GROUPS | 10 | Number of groups. |
CLIENTS_PER_GROUP | 10 | Number of clients selected in each group. |
SAMPLER | gbp-cs | Sampler to be used, can be random, brute, bayesian, probability, ga and gbp-cs. |
NUM_SYNCS | 50 | Number of internal synchronizations in each round. |
NUM_ROUNDS | 500 | Total rounds of external synchronizations. |
DATASET | femnist | Dataset to be used, only FEMNIST is supported currently. |
MODEL | cnn | Neural network model to be used. |
EVAL_EVERY | 1 | Interval rounds for model evaluation. |
NUM_GPU_AVAILABLE | 2 | Number of GPUs available. |
NUM_GPU_BEGIN | 0 | Index of the first available GPU. |
IMAGE_NAME | fedgs | Experimental image to be used. |
NOTE: If you wish to specify a GPU device (e.g., GPU0), please set
NUM_GPU_AVAILABLE=1
andNUM_GPU_BEGIN=0
.
NOTE: This script will mount project files
/home/lizh/fedgs
from the host into the container/root
, so please check carefully whether your file path is correct.
Visualization
The visualizer metrics/visualize.py reads metrics logs (e.g., metrics/metrics_stat_0.csv
and metrics/metrics_sys_0.csv
) and draws curves of accuracy, loss and so on.
Reference
-
This demo is implemented on LEAF-MX, which is a MXNET implementation of the well-known federated learning framework LEAF.
-
Li, Zonghang, Yihong He, Hongfang Yu, et al. "Data Heterogeneity-Robust Federated Learning via Group Client Selection in Industrial IoT." Submitted to IEEE Internet of Things Journal, (2021).
-
If you get trouble using this repository, please kindly contact us. Our email: [email protected]