A low-impact profiler to figure out how much memory each task in Dask is using

Itamar Turner-Trauring

Last update: Dec 9, 2022

Related tags

Overview

dask-memusage

If you're using Dask with tasks that use a lot of memory, RAM is your bottleneck for parallelism. That means you want to know how much memory each task uses:

So you can set the highest parallelism level (process or threads) for each machine, given available to RAM.
In order to know where to focus memory optimization efforts.

dask-memusage is an MIT-licensed statistical memory profiler for Dask's Distributed scheduler that can help you with both these problems.

dask-memusage polls your processes for memory usage and records the minimum and maximum usage for each task in the Dask execution graph in a CSV:

task_key,min_memory_mb,max_memory_mb
"('from_sequence-map-sum-part-e15703211a549e75b11c63e0054b53e5', 0)",44.84765625,96.98046875
"('from_sequence-map-sum-part-e15703211a549e75b11c63e0054b53e5', 1)",47.015625,97.015625
"('sum-part-e15703211a549e75b11c63e0054b53e5', 0)",0,0
"('sum-part-e15703211a549e75b11c63e0054b53e5', 1)",0,0
sum-aggregate-apply-no_allocate-4c30eb545d4c778f0320d973d9fc8ea6,0,0
apply-no_allocate-4c30eb545d4c778f0320d973d9fc8ea6,47.265625,47.265625
task_key,min_memory_mb,max_memory_mb
"('from_sequence-map-sum-part-e15703211a549e75b11c63e0054b53e5', 0)",44.84765625,96.98046875
"('from_sequence-map-sum-part-e15703211a549e75b11c63e0054b53e5', 1)",47.015625,97.015625
"('sum-part-e15703211a549e75b11c63e0054b53e5', 0)",0,0
"('sum-part-e15703211a549e75b11c63e0054b53e5', 1)",0,0
sum-aggregate-apply-no_allocate-4c30eb545d4c778f0320d973d9fc8ea6,0,0
apply-no_allocate-4c30eb545d4c778f0320d973d9fc8ea6,47.265625,47.265625

You may also find the Fil memory profiler useful in tracking down which specific parts of your code are responsible for peak memory allocations.

Example

Here's a working standalone program using dask-memusage; notice you just need to add two lines of code:

from time import sleep
import numpy as np
from dask.bag import from_sequence
from dask import compute
from dask.distributed import Client, LocalCluster

from dask_memusage import install  # <-- IMPORT

def allocate_50mb(x):
    """Allocate 50MB of RAM."""
    sleep(1)
    arr = np.ones((50, 1024, 1024), dtype=np.uint8)
    sleep(1)
    return x * 2

def no_allocate(y):
    """Don't allocate any memory."""
    return y * 2

def make_bag():
    """Create a bag."""
    return from_sequence(
        [1, 2], npartitions=2
    ).map(allocate_50mb).sum().apply(no_allocate)

def main():
    cluster = LocalCluster(n_workers=2, threads_per_worker=1,
                           memory_limit=None)
    install(cluster.scheduler, "memusage.csv")  # <-- INSTALL
    client = Client(cluster)
    compute(make_bag())

if __name__ == '__main__':
    main()

Usage

Important: Make sure your workers only have a single thread! Otherwise the results will be wrong.

Installation

On the machine where you are running the Distributed scheduler, run:

$ pip install dask_memusage

Or if you're using Conda:

$ conda install -c conda-forge dask-memusage

API usage

# Add to your Scheduler object, which is e.g. your LocalCluster's scheduler
# attribute:
from dask_memoryusage import install
install(scheduler, "/tmp/memusage.csv")

CLI usage

$ dask-scheduler --preload dask_memusage --memusage.csv /tmp/memusage.csv

Limitations

Again, make sure you only have one thread per worker process.
This is statistical profiling, running every 10ms. Tasks that take less than that won't have accurate information.

Help

Need help? File a ticket at https://github.com/itamarst/dask-memusage/issues/new

You might also like...

Inkscape extensions for figure resizing and editing

Academic-Inkscape: Extensions for figure resizing and editing This repository contains several Inkscape extensions designed for editing plots. Scale P

192 Dec 26, 2022

Make a Turtlebot3 follow a figure 8 trajectory and create a robot arm and make it follow a trajectory

HW2 - ME 495 Overview Part 1: Makes the robot move in a figure 8 shape. The robot starts moving when launched on a real turtlebot3 and can be paused a

0 Oct 21, 2022

Simple function to plot multiple barplots in the same figure.

Simple function to plot multiple barplots in the same figure. Supports padding and custom color.

2 Feb 21, 2022

A simple code for plotting figure, colorbar, and cropping with python

Python Plotting Tools This repository provides a python code to generate figures (e.g., curves and barcharts) that can be used in the paper to show th

134 Jan 2, 2023

IPE is a simple tool for analyzing IP addresses. With IPE you can find out the server region, city, country, longitude and latitude and much more in seconds.

0 Jun 11, 2022

A task scheduler with task scheduling, timing and task completion time tracking functions

A task scheduler with task scheduling, timing and task completion time tracking functions. Could be helpful for time management in daily life.

0 Jan 15, 2022

InfraGenie is allows you to split out your infrastructure project into separate independent pieces, each with its own terraform state.

🧞 InfraGenie InfraGenie is allows you to split out your infrastructure project into separate independent pieces, each with its own terraform state. T

53 Nov 23, 2022

Read and write rasters in parallel using Rasterio and Dask

dask-rasterio dask-rasterio provides some methods for reading and writing rasters in parallel using Rasterio and Dask arrays. Usage Read a multiband r

85 Aug 30, 2022

Demo of using DataLoader to prevent out of memory

3 Jun 25, 2022

Code for HLA-Face: Joint High-Low Adaptation for Low Light Face Detection (CVPR21)

HLA-Face: Joint High-Low Adaptation for Low Light Face Detection The official PyTorch implementation for HLA-Face: Joint High-Low Adaptation for Low L

77 Dec 8, 2022

Official code of "R2RNet: Low-light Image Enhancement via Real-low to Real-normal Network."

R2RNet Official code of "R2RNet: Low-light Image Enhancement via Real-low to Real-normal Network." Jiang Hai, Zhu Xuan, Ren Yang, Yutong Hao, Fengzhu

77 Dec 24, 2022

A library for low-memory inferencing in PyTorch.

Pylomin Pylomin (PYtorch LOw-Memory INference) is a library for low-memory inferencing in PyTorch. Installation ... Usage For example, the following c

3 Oct 26, 2022

:truck: Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

To launch a live notebook server to test optimus using binder or Colab, click on one of the following badges: Optimus is the missing framework to prof

1.3k Dec 30, 2022

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l

Distributed (Deep) Machine Learning Community

23.6k Dec 31, 2022

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l

20.6k Feb 13, 2021

A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews

hvPlot A high-level plotting API for the PyData ecosystem built on HoloViews. Build Status Coverage Latest dev release Latest release Docs What is it?

697 Jan 6, 2023

A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews

hvPlot A high-level plotting API for the PyData ecosystem built on HoloViews. Build Status Coverage Latest dev release Latest release Docs What is it?

349 Feb 15, 2021

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l

23.6k Jan 3, 2023

Turn a STAC catalog into a dask-based xarray

StackSTAC Turn a list of STAC items into a 4D xarray DataArray (dims: time, band, y, x), including reprojection to a common grid. The array is a lazy

148 Dec 19, 2022

Comments

send_recv_from_rpc() takes 0 positional arguments but 1 was given
Hi,

I am following the instructions on the github site. First, I installed dask-memusage with pip install dask_memusage. I then create my dask cluster with cluster = LocalCluster(n_workers=3, threads_per_worker=1); client = Client(cluster). When I use install(client.scheduler, "/path/to/csv"), i get the following error:

Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/torresba/.pyenv/versions/3.8.4/lib/python3.8/site-packages/dask_memusage.py", line 123, in install scheduler.add_plugin(plugin) TypeError: send_recv_from_rpc() takes 0 positional arguments but 1 was given

Am I doing something wrong here?

P.S: Also, I think there is a typo in the github site. Instead of from dask_memoryusage import install, I had to use from dask_memusage import install

Thanks
opened by kristiantorres 5

Typo in front page doc CLI usage line.

Change

$ dask-scheduler --preload dask_memusage --memusage.csv /tmp/memusage.csv

$ dask-scheduler --preload dask_memusage --memusage-csv /tmp/memusage.csv

opened by y-he2 1

Add dask_memusage.install introduce "ValueError: Inputs contain futures that were created by another client."

Thank you for the wonderful tool!

I would like to profile peak memory of my dask application. I can run it successfully without dask_memusage. However, after I add memusage.install, it causes "ValueError: Inputs contain futures that were created by another client." I use dask-memusage v1.1, dask-core v2021.3.0.

Attached my chunk of code here:

import dask_memusage
import gc
from utility import get_batch_index
from dask.distributed import Client, LocalCluster
from sklearn.neighbors import NearestNeighbors


CLUSTER_KWARGS = {
    'n_workers': 4,
    'threads_per_worker': 1,
    'processes': False,
    'memory_limit': '8GB',
}

cluster = LocalCluster(**CLUSTER_KWARGS)
dask_memusage.install(cluster.scheduler, 'memory_stats/memusage.csv')

def kNN_graph(X, key_index, ref_index, n_neighbors=10):
    gc.collect()
    nbrs = NearestNeighbors(n_neighbors=n_neighbors).fit(X[ref_index[0]:ref_index[1], :])
    distance, indices = nbrs.kneighbors(X[key_index[0]:key_index[1], :])
    return (distance, indices)


contamination = 0.1  # percentage of outliers
n_train = args.n_train  # number of training points
n_test = 1000  # number of testing points
n_features = args.dim
    
# Generate sample data
X_train, y_train, X_test, y_test = \
    generate_data(n_train=n_train,
                  n_test=n_test,
                  n_features=n_features,
                  contamination=contamination,
                  random_state=42)

k = 5
batch_size = 5000
n_samples = n_train


start = time.time()
batch_index = get_batch_index(n_samples=n_samples, batch_size=batch_size)
n_batches = len(batch_index)

# save the intermediate results
full_list = []

# scatter the data
future_X = client.scatter(X_train)

delayed_knn =  delayed(kNN_graph)

for i, index_A in enumerate(batch_index):
    for j, index_B in enumerate(batch_index):
        full_list.append(delayed_knn(future_X, index_A, index_B, k))
        
full_list = dask.compute(full_list)

opened by CAROLZXYZXY 2

Explain requirement for Distributed, and how to use LocalCluster

If you want 8 worker processes:

from dask.distributed import Client, LocalCluster
cluster = LocalCluster(n_workers=8, threads_per_worker=1, memory_limit=None)
Client(cluster)

opened by itamarst 0

Releases(v1.1)

v1.1(Jan 18, 2020)

Source code(tar.gz)
Source code(zip)
v1.0(Sep 28, 2018)

First release.
Source code(tar.gz)
Source code(zip)

Owner

Itamar Turner-Trauring

Helping software teams using Python to ship features faster.

GitHub

Rip Raw - a small tool to analyse the memory of compromised Linux systems

Rip Raw Rip Raw is a small tool to analyse the memory of compromised Linux systems. It is similar in purpose to Bulk Extractor, but particularly focus

127 Oct 28, 2022

Pyccel stands for Python extension language using accelerators.

242 Jan 2, 2023

Django query profiler - one profiler to rule them all. Shows queries, detects N+1 and gives recommendations on how to resolve them

Django Query Profiler This is a query profiler for Django applications, for helping developers answer the question "My Django code/page/API is slow, H

116 Dec 15, 2022

Django query profiler - one profiler to rule them all. Shows queries, detects N+1 and gives recommendations on how to resolve them

Django Query Profiler This is a query profiler for Django applications, for helping developers answer the question "My Django code/page/API is slow, H

116 Dec 15, 2022

A Python module that tries to figure out what your local timezone is

tzlocal This Python module returns a tzinfo object with the local timezone information under Unix and Windows. It requires either Python 3.9+ or the b

159 Dec 16, 2022

The Dual Memory is build from a simple CNN for the deep memory and Linear Regression fro the fast Memory

Simple-DMA a simple Dual Memory Architecture for classifications. based on the paper Dual-Memory Deep Learning Architectures for Lifelong Learning of

1 Jan 27, 2022

Scalene: a high-performance, high-precision CPU and memory profiler for Python

scalene: a high-performance CPU and memory profiler for Python by Emery Berger 中文版本 (Chinese version) About Scalene % pip install -U scalene Scalen

138 Dec 30, 2022

Lazy Profiler is a simple utility to collect CPU, GPU, RAM and GPU Memory stats while the program is running.

lazyprofiler Lazy Profiler is a simple utility to collect CPU, GPU, RAM and GPU Memory stats while the program is running. Installation Use the packag

28 Dec 9, 2022

Ffxiv-blended-job-icons - All action icons for each class/job are blended together to create new backgrounds for each job/class icon!

ffxiv-blended-job-icons All action icons for each class/job are blended together to create new backgrounds for each job/class icon! I used python to c

2 Jul 7, 2022

This a secret santa game organizer that assigns secret santa randomly to each participant and then sends an automated mail to each santa with details of his/her secret santa child.

Before executing the script, make sure to turn on 'Less Secure App access' option from your gmail ID that will be used to send out the mails to all participants of the game. To do so, get going with the following steps:

10 Dec 6, 2022

A low-impact profiler to figure out how much memory each task in Dask is using

Related tags

Overview

dask-memusage

Example

Usage

Installation

API usage

CLI usage

Limitations

Help

You might also like...

Inkscape extensions for figure resizing and editing

Make a Turtlebot3 follow a figure 8 trajectory and create a robot arm and make it follow a trajectory

Simple function to plot multiple barplots in the same figure.

A simple code for plotting figure, colorbar, and cropping with python

IPE is a simple tool for analyzing IP addresses. With IPE you can find out the server region, city, country, longitude and latitude and much more in seconds.

A task scheduler with task scheduling, timing and task completion time tracking functions

InfraGenie is allows you to split out your infrastructure project into separate independent pieces, each with its own terraform state.

Read and write rasters in parallel using Rasterio and Dask

Demo of using DataLoader to prevent out of memory

Code for HLA-Face: Joint High-Low Adaptation for Low Light Face Detection (CVPR21)

Official code of "R2RNet: Low-light Image Enhancement via Real-low to Real-normal Network."

A library for low-memory inferencing in PyTorch.

:truck: Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews

A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

Turn a STAC catalog into a dask-based xarray

Comments

send_recv_from_rpc() takes 0 positional arguments but 1 was given

Typo in front page doc CLI usage line.

Add dask_memusage.install introduce "ValueError: Inputs contain futures that were created by another client."

Explain requirement for Distributed, and how to use LocalCluster

Releases(v1.1)

v1.1(Jan 18, 2020)

v1.0(Sep 28, 2018)

Owner

Itamar Turner-Trauring

Rip Raw - a small tool to analyse the memory of compromised Linux systems

Pyccel stands for Python extension language using accelerators.

Django query profiler - one profiler to rule them all. Shows queries, detects N+1 and gives recommendations on how to resolve them

Django query profiler - one profiler to rule them all. Shows queries, detects N+1 and gives recommendations on how to resolve them

A Python module that tries to figure out what your local timezone is

The Dual Memory is build from a simple CNN for the deep memory and Linear Regression fro the fast Memory

Scalene: a high-performance, high-precision CPU and memory profiler for Python

Lazy Profiler is a simple utility to collect CPU, GPU, RAM and GPU Memory stats while the program is running.

Ffxiv-blended-job-icons - All action icons for each class/job are blended together to create new backgrounds for each job/class icon!

This a secret santa game organizer that assigns secret santa randomly to each participant and then sends an automated mail to each santa with details of his/her secret santa child.