An energy estimator for eyeriss-like DNN hardware accelerator

HEXIN BAO

Last update: Mar 26, 2022

Related tags

Deep Learning Energy-Estimator-for-Eyeriss-like-Architecture-

Overview

Energy-Estimator-for-Eyeriss-like-Architecture-

An energy estimator for eyeriss-like DNN hardware accelerator

This is an energy estimator for eyeriss-like architecture utilizing Row-Stationary dataflow which is a DNN hardware accelerator created by works from Vivienne Sze’s group in MIT. You can refer to their original works in github, Y. N. Wu, V. Sze, J. S. Emer, “An Architecture-Level Energy and Area Estimator for Processing-In-Memory Accelerator Designs,” IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2020, http://eyeriss.mit.edu/, etc. Thanks to their contribution in DNN accelerator and energy efficient design.

Eyeriss-like architecture utilizes row-stationary dataflow in order to fully explore data reuse including convolutional reuse, ifmap reuse and filter reuse. In general, the energy breakdown in each DNN layer can be separated in terms of computation and memory access (or data transfer).

Computation Energy : Performing MAC operations. Data Energy : The number of bits accessed at each memory level is calculated based on the dataflow and scaled by the hardware energy cost of accessing one bit at that memory level. The data energy is the summation of each memory hierarchy (DRAM, NoC, Global Buffer, RF) or each data type (ifmap, weight, partial sum).

Quantization ↔ Bitwidth Energy scaling in computation : linear for single operand scaling. Quadratic for two operands scaling. Energy scaling in data access : Linear scaling for any data type in any memory hierarchy.
Pruning on filters (weights) Energy scaling in computation : Skip MAC operations according to pruning ratio. (Linear scaling) Energy scaling in data access : Linear scaling for weight access.

Assumptions: Initial image input and weights in each layer should be first read from DRAM (external off-chip memory). Global Buffer is big enough to store any amount of datum and intermediate numbers. NoC has high-performance and high throughput with non-blocking broadcasting and inter-PE forwarding capability which supports multiple information transactions simultaneously. No data compression technique is considered in estimator design. Each PE is able to recognize information transferred among NoCs so that only those in need could receive data. Sparsity of weights and activations aren’t considered. Register File inside each PE only has the capacity to store one row of weights, one row of ifmap and one partial sum which means that we won’t take the capacity of RF into account. (A pity in this energy estimator) Ifmap and ofmap of each layer should be read from or written back into DRAM for external read operations. Once a data value is read from one memory level and then written into another memory level, the energy consumption of this transaction is always decided by the higher-cost level and only regarded as a single operation. Data transfer could happen directly between any 2 memory levels. This estimator is only applied to 2D systolic PE arrays. Partial sum and ofmap of one layer have the same bitwidth as activations. Maxpooling, Relu and LRN are not taken into account with respect to energy estimation. (little impact on total estimation) In order to make full use of data reuse (convolutional reuse and ifmap reuse), apart from row-stationary dataflow, scheduling algorithm will try to avoid reading ifmaps as much as possible. Once a channel of ifmap is kept inside the RF, the computation will be executed across the corresponding channel of entire filters in each layer.

Example analysis : Hardware Architecture : Eyeriss PE size : 12*14, 2D Dataflow : Row Stationary DNN Model : AlexNet (5 conv layers, 3 FC layers) Initial Input : single image from ImageNet Additional Attributes : Pruning and Quantization (You can revise your own pruning ratio and bitwidth of weight and activation in my source code) Everything is not hard-coded !

A pity ! (future works to do) 3D PE arrays. Memory size is considered in scheduling algorithm to accommodate more intermediate datum in low-cost level without writing back to high-cost level. Possible I/O data compression. (encoder, decoder) Possible sparsity optimization. (zero-gated MAC) Elaborate operation with specific arguments like random read, repeated write, constant read, etc. The impact of memory size, spatial distribution, location can be taken into account when we try to improve precision of our energy estimator. For example, the spatial distribution between two PEs can be characterized by Manhattan distance which can be used to scale the energy consumption of data forwarding in NoC.

If you have any questions or troubles please contact me. I'd also like to listen to your advice and opinions!

Comments

energy

Hello: In the energy consumption of partial summaries section，why Buffer_Psum = ofmap_size2 * 2 * (self.channel - 1) * self.kernel , not Buffer_Psum = ofmap_size2 * self.kernel

opened by sys-syj 1

Hand gesture recognition based whiteboard that allows you to write on live webcam. This is the first version and has features like 4 different colors, eraser and a recording option that records your session and saves it in a "recordings" folder. Use index finger to draw and two or more fingers to move around and select items. Future version will contain more functionalities like changeable thickness, color palette, integration with zoom and google meet etc.

hand-write Hand gesture recognition based whiteboard that allows you to write on live webcam. This is the first version and has features like 4 differ

27 Dec 16, 2022

💡 Learnergy is a Python library for energy-based machine learning models.

Learnergy: Energy-based Machine Learners Welcome to Learnergy. Did you ever reach a bottleneck in your computational experiments? Are you tired of imp

57 Nov 17, 2022

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware

DISCONTINUATION OF PROJECT. This project will no longer be maintained by Intel. Intel will not provide or guarantee development of or support for this

3.9k Dec 20, 2022

3.9k Feb 9, 2021

[ICLR 2021] HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark Accepted as a spotlight paper at ICLR 2021. Table of content File structure Prerequi

72 Jan 3, 2023

PyTorch implementation of Algorithm 1 of "On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models"

Code for On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models This repository will reproduce the main results from our pape

32 Nov 25, 2022

An energy estimator for eyeriss-like DNN hardware accelerator

Related tags

Overview

Energy-Estimator-for-Eyeriss-like-Architecture-

You might also like...

💡 Learnergy is a Python library for energy-based machine learning models.

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware

[ICLR 2021] HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

PyTorch implementation of Algorithm 1 of "On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models"

SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems

Learning Energy-Based Models by Diffusion Recovery Likelihood

A project which aims to protect your privacy using inexpensive hardware and easily modifiable software

Comments

energy

Owner

HEXIN BAO

FrankMocap: A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

JumpDiff: Non-parametric estimator for Jump-diffusion processes for Python

Exposure Time Calculator (ETC) and radial velocity precision estimator for the Near InfraRed Planet Searcher (NIRPS) spectrograph

SweiNet is an uncertainty-quantifying shear wave speed (SWS) estimator for ultrasound shear wave elasticity (SWE) imaging.

MicRank is a Learning to Rank neural channel selection framework where a DNN is trained to rank microphone channels.

A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.

a dnn ai project to classify which food people are eating on audio recordings

CUP-DNN is a deep neural network model used to predict tissues of origin for cancers of unknown of primary.

ResNEsts and DenseNEsts: Block-based DNN Models with Improved Representation Guarantees

PyTorchMemTracer - Depict GPU memory footprint during DNN training of PyTorch