Memory-efficient optimum einsum using opt_einsum planning and PyTorch kernels.

Haoyan Huo

Last update: Nov 18, 2022

Related tags

Overview

opt-einsum-torch

There have been many implementations of Einstein's summation. numpy's numpy.einsum is the least efficient one as it only runs in single thread on CPU. PyTorch's torch.einsum works for both CPU and CUDA tensors. However, since there is no virtual CUDA memory, torch.einsum will run out of CUDA memory for large tensors.

This code aims at implementing a memory-efficient einsum function using PyTorch as the backend. This code also uses the opt_einsum package to optimizes the contraction path to achieve the minimal FLOPS.

Usage

from opt_einsum_torch import EinsumPlanner
import torch

# Some huge tensors
arr1, arr2 = ..., ...
ee = EinsumPlanner(torch.device('cuda:0'), cuda_mem_limit=0.9)
result = ee.einsum('ijk,jkl->il', arr1, arr2)

The resulting tensor result will be a PyTorch CPU tensor. You could convert it into numpy array by simply calling result.numpy().

Future works

Support multiple GPUs.
Memory efficient einsum kernels.
CUDA data transfer profilers.

You might also like...

Implementation of "Efficient Regional Memory Network for Video Object Segmentation" (Xie et al., CVPR 2021).

RMNet This repository contains the source code for the paper Efficient Regional Memory Network for Video Object Segmentation. Cite this work @inprocee

76 Dec 14, 2022

Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

STCN Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang [a

456 Dec 12, 2022

A memory-efficient implementation of DenseNets

efficient_densenet_pytorch A PyTorch =1.0 implementation of DenseNets, optimized to save GPU memory. Recent updates Now works on PyTorch 1.0! It uses

1.4k Dec 25, 2022

InvTorch: memory-efficient models with invertible functions

InvTorch: Memory-Efficient Invertible Functions This module extends the functionality of torch.utils.checkpoint.checkpoint to work with invertible fun

12 May 12, 2022

Implementation of Memory-Efficient Neural Networks with Multi-Level Generation, ICCV 2021

Memory-Efficient Multi-Level In-Situ Generation (MLG) By Jiaqi Gu, Hanqing Zhu, Chenghao Feng, Mingjie Liu, Zixuan Jiang, Ray T. Chen and David Z. Pan

2 Jan 4, 2022

This is the official repo for TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations at CVPR'21. According to some product reasons, we are not planning to release the training/testing codes and models. However, we will release the dataset and the scripts to prepare the dataset.

TransFill-Reference-Inpainting This is the official repo for TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transf

80 Dec 8, 2022

Efficient-GlobalPointer - Pytorch Efficient GlobalPointer

Releases(0.1.0)

0.1.0(Dec 30, 2021)

Initial release of the package.
Source code(tar.gz)
Source code(zip)

Memory-efficient optimum einsum using opt_einsum planning and PyTorch kernels.

Related tags

Overview

opt-einsum-torch

Usage

Future works

You might also like...

Implementation of "Efficient Regional Memory Network for Video Object Segmentation" (Xie et al., CVPR 2021).

Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

A memory-efficient implementation of DenseNets

InvTorch: memory-efficient models with invertible functions

Implementation of Memory-Efficient Neural Networks with Multi-Level Generation, ICCV 2021

Efficient-GlobalPointer - Pytorch Efficient GlobalPointer

Official repository for "PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long Text Generation"

GNPy: Optical Route Planning and DWDM Network Optimization

Releases(0.1.0)

0.1.0(Dec 30, 2021)

Owner

Haoyan Huo

The Dual Memory is build from a simple CNN for the deep memory and Linear Regression fro the fast Memory

🌳 A Python-inspired implementation of the Optimum-Path Forest classifier.

Segcache: a memory-efficient and scalable in-memory key-value cache for small objects

Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"

Code for the USENIX 2017 paper: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

PyTorch Code of "Memory In Memory: A Predictive Neural Network for Learning Higher-Order Non-Stationarity from Spatiotemporal Dynamics"

Episodic-memory - Ego4D Episodic Memory Benchmark

Memory Efficient Attention (O(sqrt(n)) for Jax and PyTorch

Official and maintained implementation of the paper "OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data" [BMVC 2021].

Lowest memory consumption and second shortest runtime in NTIRE 2022 challenge on Efficient Super-Resolution