Deploy recommendation engines with Edge Computing

Overview

RecoEdge: Bringing Recommendations to the Edge

A one-stop solution to build your recommendation models, train them, and deploy them in a privacy-preserving manner, right on the users' devices.

RecoEdge integrates the phenomenal work of OpenMined and FedML, letting you easily explore new federated learning algorithms and deploy them into production.

The steps to building an awesome recommendation system:

  1. 🔩 Standard ML training: Pick up any ML model and benchmark it using BaseTrainer
  2. 🎮 Federated Learning Simulation: Once you are satisfied with your model, explore a host of FL algorithms with FederatedWorker
  3. 🏭 Industrial Deployment: After all the testing and simulation, deploy easily using PySyft from OpenMined
  4. 🚀 Edge Computing: Integrate with NimbleEdge to improve FL training times by over 100x.

QuickStart

Let's train Facebook AI's DLRM on the edge. DLRM has been a standard baseline for neural-network-based recommendation models.

Clone this repo and point the datafile argument in configs/dlrm.yml at your local copy of the Criteo dataset:

git clone https://github.com/NimbleEdge/RecoEdge

model :
  name : 'dlrm'
  ...
  preproc :
    datafile : "<Path to Criteo>/criteo/train.txt"

Install the dependencies with conda or pip

conda env create --name recoedge --file environment.yml
conda activate recoedge

Run data preprocessing with preprocess_data.py, supplying the config file. This generates per-day splits from the entire dataset as well as a processed data file:

python preprocess_data.py --config configs/dlrm.yml --logdir $HOME/logs/kaggle_criteo/exp_1

Begin Training

python train.py --config configs/dlrm.yml --logdir $HOME/logs/kaggle_criteo/exp_3 --num_eval_batches 1000 --devices 0

Run tensorboard to view training loss and validation metrics at localhost:8888

tensorboard --logdir $HOME/logs/kaggle_criteo --port 8888

Federated Training

This section is still a work in progress. Reach out to us directly if you need help with FL deployment.

Now we will simulate DLRM in a federated setting. Create a data split to mimic your users; we use Dirichlet sampling to create non-IID datasets for the model, as sketched below.
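
A minimal sketch of such a split, assuming the dataset exposes per-sample labels (the function and parameter names here are illustrative, not part of the library):

# Illustrative sketch of a Dirichlet-based non-IID split; not the
# library's actual splitter.
import numpy as np


def dirichlet_split(labels, num_clients, alpha=0.5, seed=0):
    """Partition sample indices so each class is spread across clients
    according to proportions drawn from Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        # fraction of class-c samples each client receives;
        # smaller alpha -> more skewed (more non-IID) splits
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return client_indices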


Adjust the parameters for distributed training, such as MPI, in the config file. (Following FedML's GPU-mapping convention, each entry in a host's list below gives the number of worker processes placed on the corresponding GPU of that host.)

communications:
  gpu_map:
    host1: [0, 2]
    host2: [1, 0, 1]
    host3: [1, 1, 0, 1]
    host4: [0, 1, 0, 0, 0, 1, 0, 2]

Implement your own federated learning algorithm. In the demo we use Federated Averaging: subclass FederatedWorker and implement the run() method.

# logging and numpy are needed by the methods below; registry,
# FederatedWorker, InvalidStateError and RandomContext come from the library.
import logging

import numpy as np


@registry.load('fl_algo', 'fed_avg')
class FedAvgWorker(FederatedWorker):
    def __init__(self, ...):
        super().__init__(...)

    async def run(self):
        '''
        `run` updates the local model.
        Implement this method to determine how the roles interact with each
        other to produce the final updated model. For example, a worker that
        has both the `aggregator` and `trainer` roles might first train
        locally and then run a discounted `aggregate()` to get the final
        updated model.

        In the following example,
        1. The aggregator requests models from the trainers before
           aggregating and updating its model.
        2. A trainer responds to the aggregators' requests after updating
           its own model by local training.

        Since standard FL forces updates from the central entity before each
        cycle, trainers always start from the global (aggregator's) model.
        '''
        for role in self.roles:
            if role == 'aggregator':
                neighbours = await self.request_models_suspendable(
                    self.sample_neighbours())
                weighted_params = self.aggregate(neighbours)
                self.update_model(weighted_params)
            elif role == 'trainer':
                # central server in this case
                aggregators = list(self.out_neighbours.values())
                global_models = await self.request_models_suspendable(
                    aggregators)
                self.update_model(global_models[0])
                await self.train(model_dir=self.persistent_storage)
            else:
                raise InvalidStateError("unknown role for worker")
        self.round_idx += 1

    # Your aggregation strategy
    def aggregate(self, neighbour_ids):
        model_list = [
            (self.in_neighbours[id].sample_num, self.in_neighbours[id].model)
            for id in neighbour_ids
        ]
        # total sample count across the selected neighbours, used to
        # weight each local update
        training_num = sum(sample_num for sample_num, _ in model_list)

        (_, averaged_params) = model_list[0]
        for k in averaged_params.keys():
            for i, (local_sample_number, local_model_params) in enumerate(
                    model_list):
                w = local_sample_number / training_num
                if i == 0:
                    averaged_params[k] = local_model_params[k] * w
                else:
                    averaged_params[k] += local_model_params[k] * w

        return averaged_params

    # Your sampling strategy
    def sample_neighbours(self, round_idx, client_num_per_round):
        num_neighbours = len(self.in_neighbours)
        if num_neighbours <= client_num_per_round:
            selected_neighbours = list(self.in_neighbours)
        else:
            # seed the sampler with the round index so that every worker
            # draws the same subset in a given round
            with RandomContext(round_idx):
                selected_neighbours = np.random.choice(
                    list(self.in_neighbours), client_num_per_round,
                    replace=False)
        logging.info("worker_indexes = %s", selected_neighbours)
        return selected_neighbours

Begin the FL simulation (here 20 MPI processes simulate 1000 federated workers):

mpirun -np 20 python -m mpi4py.futures train_fl.py --num_workers 1000

Deploy with PySyft

Customization

Training Configuration

There are two ways to adjust training hyper-parameters:

  • Set values in config/*.yml: persistent settings that are necessary for reproducibility, e.g. the randomization seed
  • Pass them as CLI arguments: good for non-persistent and dynamic settings like the GPU device

In case of conflict, the CLI argument supersedes the config file parameter; see the example below. For further reference, check out the training config flags.
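
For instance, reusing the flags from the training command above, the seed can stay pinned in configs/dlrm.yml while the device is chosen per run; by the precedence rule, the flag wins if the config sets a device too:

python train.py --config configs/dlrm.yml --logdir $HOME/logs/kaggle_criteo/exp_3 --devices 1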

Model Architecture

Adjusting DLRM model params

Any parameter needed to instantiate the PyTorch module can be supplied by simply creating a key-value pair in the config file.

For example, DLRM requires arch_sparse_feature_size, arch_mlp_bot, etc.:

model: 
  name : 'dlrm'
  arch_sparse_feature_size : 16
  arch_mlp_bot : [13, 512, 256, 64]
  arch_mlp_top : [367, 256, 1]
  arch_interaction_op : "dot"
  arch_interaction_itself : False
  sigmoid_bot : "relu"
  sigmoid_top : "sigmoid"
  loss_function: "mse"

Adding new models

Model architecture can only be changed via configs/*.yml files. Every model declaration is tagged with an appropriate name and loaded into the registry.

@registry.load('model', '<model_name>')
class My_Model(torch.nn.Module):
    def __init__(self, num):
        super().__init__()
        ...

You can define your own modules and add them under fedrec/modules. Finally, set the name flag of the model tag in the config file:

model : 
  name : "<model name>"
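
As a concrete sketch, a hypothetical custom model could look like the following. The class, its constructor parameters, and the registry import path are illustrative assumptions; only the @registry.load('model', ...) pattern comes from the snippet above.

# Hypothetical example: names and the import path are assumptions,
# not part of the library's documented API.
import torch

from fedrec.utilities import registry  # assumed import path


@registry.load('model', 'two_tower')
class TwoTower(torch.nn.Module):
    def __init__(self, arch_emb_size, num_items):
        super().__init__()
        self.user_tower = torch.nn.Linear(arch_emb_size, arch_emb_size)
        self.item_emb = torch.nn.Embedding(num_items, arch_emb_size)

    def forward(self, user_feats, item_ids):
        # score a (user, item) pair with a dot product of the two towers
        u = self.user_tower(user_feats)
        v = self.item_emb(item_ids)
        return (u * v).sum(-1)

The constructor arguments (arch_emb_size, num_items) would then be supplied as key-value pairs under the model tag, exactly as in the DLRM config above.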

Contribute

  1. Star, fork, and clone the repo.
  2. Do your work.
  3. Push to your fork.
  4. Submit a PR to NimbleEdge/RecoEdge

We welcome you to join our Discord for queries related to the library and contributing in general.
