CoSA: Scheduling by Constrained Optimization for Spatial Accelerators

Overview

CoSA is a scheduler for spatial DNN accelerators that generates high-performance schedules in one shot using mixed integer programming (MIP). For more details, please refer to:

CoSA leverages the regularities in DNN operators and hardware to formulate the DNN scheduling space as a MIP problem with algorithmic and architectural constraints. Solving this problem automatically generates a highly efficient schedule in one shot.
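The idea of scheduling as constrained optimization can be seen on a toy problem. The sketch below is not CoSA's formulation. It enumerates a tiny search space by brute force instead of calling a MIP solver such as Gurobi, and its single capacity constraint and objective are hypothetical stand-ins. The helpers prime_factors and best_tiling are illustrative names. The sketch factors one loop bound into primes, assigns each prime factor to an inner or outer memory level, and keeps the largest inner tile that fits.

```python
from itertools import product
from math import prod


def prime_factors(n):
    """Return the prime factors of n with multiplicity, e.g. 12 -> [2, 2, 3]."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def best_tiling(bound, inner_capacity):
    """Toy search: assign each prime factor of `bound` to level 0 (inner) or 1 (outer).

    Constraint (hypothetical): the inner tile, i.e. the product of the factors
    placed at level 0, must not exceed `inner_capacity`.
    Objective (hypothetical proxy): minimize the number of outer iterations.
    Returns (outer_iterations, inner_tile, level_per_prime_factor).
    """
    primes = prime_factors(bound)
    best = None
    for levels in product((0, 1), repeat=len(primes)):
        inner = prod(p for p, lvl in zip(primes, levels) if lvl == 0)
        if inner > inner_capacity:
            continue  # violates the capacity constraint
        outer = bound // inner
        if best is None or outer < best[0]:
            best = (outer, inner, levels)
    return best


# 12 = 2 * 2 * 3; with room for 5 elements the largest feasible inner tile is 4.
print(best_tiling(12, 5))
```

Brute force grows exponentially with the number of prime factors and levels. A MIP solver can instead search a much larger space of factor allocations under many simultaneous constraints, which is the setting the paragraph above describes.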

Installation

  1. CoSA: Download the CoSA source code:
     git clone git@github.com:ucb-bar/cosa.git
  2. Gurobi: Please follow the instructions in Gurobi for Academics and Researchers to install the Gurobi optimizer and obtain an academic license.
  3. Timeloop: Please refer to the instructions in the Timeloop Tutorial to install Timeloop with Docker. To install from source code, please follow the instructions in the Timeloop GitHub repository. The specific Timeloop version used for the CoSA evaluation is commit 11920be.
  4. Python3: Install the Python3 packages with:
     pip install numpy==1.19.0 PyYAML==5.3.1 yamlordereddictloader==0.4.0 seaborn==0.10.1 matplotlib==3.2.2 pandas==1.0.5
  5. Environment Setup: Update env.sh with the paths to COSA_DIR, TIMELOOP_DIR, GUROBI_HOME, and GRB_LICENSE_FILE, then source the environment:
     source env.sh
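Before running CoSA, you may want to confirm that env.sh was sourced. The helper below is a hypothetical convenience and is not part of CoSA. It only checks that the four variables named in the environment setup step are set and non-empty. It does not verify that the paths exist or that the Gurobi license is valid.

```python
import os

# Variables that the environment setup step asks you to define in env.sh.
REQUIRED_VARS = ("COSA_DIR", "TIMELOOP_DIR", "GUROBI_HOME", "GRB_LICENSE_FILE")


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

For example, calling missing_env_vars() right after source env.sh should return an empty list. Any names it returns point to paths that still need to be filled in.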

Run CoSA

To run one scheduling example:

python3 src/cosa.py

CoSA Inputs and Outputs

CoSA takes the problem dimensions, architecture constraints, and relation-encoding constants as inputs. It returns a mapping whose tiling, temporal/spatial, and permutation decisions are solved to optimize the user-defined objective.
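The returned mapping is expressed through per-prime-factor decisions. Each prime factor of a layer dimension is placed at a memory level and marked temporal or spatial, and the loop bounds at each level follow by multiplying the factors. The flat tuple encoding and the names tile_sizes and loop_bound below are hypothetical. They do not reproduce the layout of CoSA's factor_config or spatial_config arrays and only illustrate how such decisions compose into a tiling.

```python
from collections import defaultdict
from math import prod


def tile_sizes(assignments, num_levels):
    """Group prime-factor decisions into per-level loop bounds.

    `assignments` is a hypothetical encoding: one (dim, prime, level, spatial)
    tuple per prime factor, where `spatial` is True if the factor is unrolled
    across processing elements rather than iterated in time.
    Returns one {"temporal": {...}, "spatial": {...}} dict per memory level.
    """
    levels = [
        {"temporal": defaultdict(lambda: 1), "spatial": defaultdict(lambda: 1)}
        for _ in range(num_levels)
    ]
    for dim, prime, level, spatial in assignments:
        kind = "spatial" if spatial else "temporal"
        levels[level][kind][dim] *= prime
    # Convert the defaultdicts to plain dicts for readability.
    return [{kind: dict(bounds) for kind, bounds in lvl.items()} for lvl in levels]


def loop_bound(levels, dim):
    """Multiply a dimension's factors over all levels; this should equal the layer bound."""
    return prod(lvl[kind].get(dim, 1) for lvl in levels for kind in ("temporal", "spatial"))


# K = 8 = 2 * 2 * 2 and P = 6 = 2 * 3, split across two levels.
example = [
    ("K", 2, 0, True),
    ("K", 2, 0, True),
    ("K", 2, 1, False),
    ("P", 2, 0, False),
    ("P", 3, 1, False),
]
mapping = tile_sizes(example, num_levels=2)
```

Here level 0 iterates P by 2 in time and unrolls K by 4 in space, while level 1 iterates K by 2 and P by 3. The loop_bound check confirms that every prime factor was placed exactly once.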

def cosa(prob, arch, A, B, part_ratios, global_buf_idx, Z=None):
    """Run CoSA to generate a mapping with tiling, temporal/spatial, and permutation determined.
    We currently assume there is a global buffer.

    Args:
        prob: An object that defines the layer dimensions.
        arch: An object that defines the hardware architecture dimensions.
        A: A 2d binary constant matrix that encodes the layer dimension to data tensor relationship.
            1 means related, 0 means unrelated.
            Note that the R,S to input tensor relation is specially handled in the formulation
            and is set to 0.
        B: A 2d binary constant matrix that encodes the data tensor to memory level mapping.
            It can be derived from the mapspace bypass pattern in Timeloop.
            Note that it is intended for even mapping of different data tensors to different memory levels.
        part_ratios: A 2d array representing the partition ratios of different data tensors in different memory buffers.
        global_buf_idx: An index pointing to the global buffer.
        Z: Similar to B, but intended for uneven mapping of different data tensors to different memory levels.
            It is a 3d binary constant matrix that encodes the data tensor to memory level mapping.

    Returns:
        factor_config: A 2d array specifying the allocation decision for each prime factor.
        spatial_config: A 2d array specifying the temporal/spatial decision for each prime factor.
        perm_config: A 2d array specifying the ordering of R,S,P,Q,C,K,N factors at each level.
    """
    ...  # Implementation omitted from this excerpt.

Even and Uneven Mapping

CoSA can support both even mapping (using matrix B to encode the Timeloop bypassing scheme) and uneven mapping (using matrix Z to encode the rank-to-memory mapping for different data tensors, as in ZigZag).
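For the even case, B records whether each data tensor is kept at, or bypasses, each memory level. The sketch assumes a hypothetical three-level hierarchy (RegFile, GlobalBuffer, DRAM), a tensor-by-level orientation, and a hypothetical helper build_B. The per-level keep lists loosely mirror the keep/bypass lists in Timeloop's mapspace constraints. This is not CoSA's parser, and it does not cover the 3d Z matrix used for uneven mapping.

```python
TENSORS = ["Weights", "Inputs", "Outputs"]


def build_B(levels, keep):
    """Encode a bypass pattern as a len(TENSORS) x len(levels) 0/1 matrix.

    `keep` maps each memory level to the tensors stored there; any tensor not
    listed for a level is treated as bypassing that level.
    """
    B = [[0] * len(levels) for _ in TENSORS]
    for j, level in enumerate(levels):
        for tensor in keep.get(level, ()):
            B[TENSORS.index(tensor)][j] = 1
    return B


# Hypothetical hierarchy: only Weights stay in the register file, the global
# buffer holds Inputs and Outputs, and DRAM backs every tensor.
levels = ["RegFile", "GlobalBuffer", "DRAM"]
keep = {
    "RegFile": ["Weights"],
    "GlobalBuffer": ["Inputs", "Outputs"],
    "DRAM": ["Weights", "Inputs", "Outputs"],
}
B = build_B(levels, keep)
# Rows: Weights, Inputs, Outputs; columns: RegFile, GlobalBuffer, DRAM.
```

Because each tensor is either entirely kept or entirely bypassed at a level, a 2d matrix is enough here. The uneven case needs the extra dimension of Z.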


Comments
  • Timeloop command line and configuration used in the paper

    Hi, I was wondering if you could post the timeloop-mapper YAML and command line you used for the paper. I installed Timeloop as suggested in the README and am using the Simba arch in the repository along with the ResNet workloads in configs. I am adding a mapper config file to explicitly set num-threads but leaving everything else at the defaults. However, the mapper only uses 1 processor in many cases. I was wondering if you could help with this.

    opened by vmiheer 3
  • Simba accelerator architecture.

    Hi, the Simba accelerator architecture in 'simba.yaml' in your source code seems inconsistent with the one in Table V of your paper. Could you please explain this? Thanks!

    opened by FuyuWang 1
  • Make CoSA packagable

    CoSA is now packageable. To build it, install Poetry, cd into the CoSA directory, and run poetry build. This generates a .whl file in dist/ that can be installed with one command:

    pip install dist/cosa_scheduler-0.1.0-py3-none-any.whl
    
    opened by gdinh 0
  • Sample usage on actual model and NoC simulator

    According to the paper, CoSA was evaluated on AlexNet, ResNet-50, and ResNeXt-50 using the NoC simulator. It would be great to see samples of it being used on these networks!

    opened by raikonenfnu 2
Owner
UC Berkeley Architecture Research