Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python

PLASMA @ UMass

Last update: Dec 30, 2022

Overview

Scalene: a high-performance CPU, GPU and memory profiler for Python

by Emery Berger, Sam Stern, and Juan Altmayer Pizzorno.

Scalene community Slack

About Scalene

Scalene is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than many other profilers while delivering far more detailed information.

Quick Start

Installing Scalene:

pip install -U scalene

Using Scalene:

Commonly used options:

scalene your_prog.py                             # full profile (prints to console)
python3 -m scalene your_prog.py                  # equivalent alternative
scalene --cpu-only your_prog.py                  # only CPU/GPU
scalene --reduced-profile your_prog.py           # only profile lines with significant usage
scalene --html --outfile prof.html your_prog.py  # output HTML profile to 'prof.html'
scalene --profile-interval 5.0 your_prog.py      # output a new profile every five seconds
scalene --help                                   # lists all options

To turn profiling on and off from within your code, run your program with scalene as above and add the following:

from scalene import scalene_profiler

# Turn profiling on
scalene_profiler.start()

# Turn profiling off
scalene_profiler.stop()

Scalene Overview

Scalene talk (PyCon US 2021)

This talk presented at PyCon 2021 walks through Scalene's advantages and how to use it to debug the performance of an application (and provides some technical details on its internals). We highly recommend watching this video!

Fast and Precise

Scalene is fast. It uses sampling instead of instrumentation or relying on Python's tracing facilities. Its overhead is typically no more than 10-20% (and often less).
Scalene performs profiling at the line level and per function, pointing to the functions and the specific lines of code responsible for the execution time in your program.
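
To make the sampling idea concrete, here is a minimal sketch of a line-level sampling profiler built only on the standard library. It is not Scalene's implementation: it swaps in a background thread that periodically inspects the calling thread's current frame, and every name in it (`sample_thread`, `run_with_sampling`, `busy_work`) is hypothetical.

```python
import collections
import sys
import threading
import time


def sample_thread(target_id, stop_event, counts, interval=0.001):
    """Periodically record which function and line the target thread is executing."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_id)
        if frame is not None:
            counts[(frame.f_code.co_name, frame.f_lineno)] += 1
        time.sleep(interval)


def run_with_sampling(func):
    """Run func() in the calling thread while a sampler thread observes it."""
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_thread,
        args=(threading.get_ident(), stop_event, counts),
    )
    sampler.start()
    try:
        func()
    finally:
        stop_event.set()
        sampler.join()
    return counts


def busy_work(duration=0.3):
    """CPU-bound loop that the sampler should observe many times."""
    deadline = time.perf_counter() + duration
    total = 0
    while time.perf_counter() < deadline:
        total += sum(range(100))
    return total


if __name__ == "__main__":
    for (name, lineno), hits in run_with_sampling(busy_work).most_common(3):
        print(f"{name}:{lineno} sampled {hits} times")
```

Because the profiled code is never instrumented, the cost is bounded by the sampling interval rather than by the number of lines executed, which is the general reason sampling profilers tend to have low overhead.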

CPU profiling

Scalene separates out time spent in Python from time in native code (including libraries). Most Python programmers aren't going to optimize the performance of native code (which is usually either in the Python implementation or external libraries), so this helps developers focus their optimization efforts on the code they can actually improve.
Scalene highlights hotspots (code accounting for significant percentages of CPU time or memory allocation) in red, making them even easier to spot.
Scalene also separates out system time, making it easy to find I/O bottlenecks.
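
As a rough illustration of why separating out waiting time helps, the sketch below compares wall-clock time with process CPU time for two functions. This is only a standard-library approximation with hypothetical function names, not how Scalene computes its "system" column.

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) consumed while running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def mostly_waiting():
    time.sleep(0.2)  # stands in for blocking I/O


def mostly_computing():
    sum(i * i for i in range(200_000))


for func in (mostly_waiting, mostly_computing):
    wall, cpu = measure(func)
    waiting = max(wall - cpu, 0.0)
    print(f"{func.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s waiting={waiting:.3f}s")
```

A large gap between wall-clock and CPU time suggests the code is blocked on I/O or sleeping rather than computing.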

GPU profiling

Scalene reports GPU time (currently limited to NVIDIA-based systems).

Memory profiling

Scalene profiles memory usage. In addition to tracking CPU usage, Scalene also points to the specific lines of code responsible for memory growth. It accomplishes this via an included specialized memory allocator.
Scalene separates out the percentage of memory consumed by Python code vs. native code.
Scalene produces per-line memory profiles.
Scalene identifies lines with likely memory leaks.
Scalene profiles copying volume, making it easy to spot inadvertent copying, especially due to crossing Python/library boundaries (e.g., accidentally converting numpy arrays into Python arrays, and vice versa).
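
To show what a per-line memory profile conveys, the following sketch attributes allocation growth to source lines using the standard library's tracemalloc module. This is a stand-in for illustration only: Scalene uses its own allocator, and `build_table` is a hypothetical function.

```python
import tracemalloc


def build_table(n):
    squares = [i * i for i in range(n)]        # allocates a large list
    labels = [str(i) for i in range(n // 10)]  # allocates a smaller one
    return squares, labels


tracemalloc.start()
before = tracemalloc.take_snapshot()
data = build_table(200_000)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocation growth by source line, largest change first.
growth = after.compare_to(before, "lineno")
for stat in growth[:3]:
    frame = stat.traceback[0]
    print(f"{frame.filename}:{frame.lineno} {stat.size_diff / 1e6:+.2f} MB")
```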

Other features

Scalene can produce reduced profiles (via --reduced-profile) that only report lines that consume more than 1% of CPU or perform at least 100 allocations.
Scalene supports @profile decorators to profile only specific functions.
When Scalene is profiling a program launched in the background (via &), you can suspend and resume profiling.
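
A minimal sketch of the @profile pattern follows. It assumes the decorator is supplied by the profiler at run time (as a builtin), so the script defines a no-op fallback to remain runnable under plain Python; the function names are hypothetical.

```python
import builtins

# Assumption: the profiler makes `profile` available as a builtin at run time.
# Under plain Python, fall back to a decorator that does nothing.
if not hasattr(builtins, "profile"):
    def profile(func):
        return func


@profile
def hot_function(n):
    """The only function we want to see in the report."""
    return sum(i * i for i in range(n))


def setup():
    """Not decorated, so a @profile-restricted report should leave it out."""
    return list(range(10))


setup()
print(hot_function(1000))
```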

Comparison to Other Profilers

Performance and Features

Below is a table comparing the performance and features of various profilers to Scalene.

Slowdown: the slowdown when running a benchmark from the Pyperformance suite. Green means less than 2x overhead. Scalene's overhead is just a 20% slowdown.

Scalene has all of the following features, many of which only Scalene supports:

Lines or functions: does the profiler report information only for entire functions, or for every line -- Scalene does both.
Unmodified Code: works on unmodified code.
Threads: supports Python threads.
Multiprocessing: supports use of the multiprocessing library -- Scalene only
Python vs. C time: breaks out time spent in Python vs. native code (e.g., libraries) -- Scalene only
System time: breaks out system time (e.g., sleeping or performing I/O) -- Scalene only
Profiles memory: reports memory consumption per line / function
GPU: reports time spent on an NVIDIA GPU (if present) -- Scalene only
Memory trends: reports memory use over time per line / function -- Scalene only
Copy volume: reports megabytes being copied per second -- Scalene only
Detects leaks: automatically pinpoints lines responsible for likely memory leaks -- Scalene only

Output

Scalene prints annotated source code for the program being profiled (as text, JSON (--json), or HTML (--html)) and any modules it uses in the same directory or subdirectories (you can optionally have it --profile-all and only include files with at least a --cpu-percent-threshold of time). Here is a snippet from pystone.py.

Memory usage at the top: Visualized by "sparklines", memory consumption over the runtime of the profiled code.
"Time Python": How much time was spent in Python code.
"native": How much time was spent in non-Python code (e.g., libraries written in C/C++).
"system": How much time was spent in the system (e.g., I/O).
"GPU": (not shown here) How much time was spent on the GPU, if your system has an NVIDIA GPU installed.
"Memory Python": How much of the memory allocation happened on the Python side of the code, as opposed to in non-Python code (e.g., libraries written in C/C++).
"net": Positive net memory numbers indicate total memory allocation in megabytes; negative net memory numbers indicate memory reclamation.
"timeline / %": Visualized by "sparklines", memory consumption generated by this line over the program runtime, and the percentages of total memory activity this line represents.
"Copy (MB/s)": The number of megabytes being copied per second (see "About Scalene").

Using Scalene

The following command runs Scalene on a provided example program.

scalene test/testme.py
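
If you do not have the repository checked out, a small target program of your own works just as well. The sketch below is a hypothetical file (call it demo_prog.py; it is not part of the Scalene repository) with one interpreter-bound function, one that spends its time in C code, and one that allocates, so that the Python, native, and memory columns should each have something to report.

```python
import random


def python_heavy(n):
    """Interpreter-bound loop: time here should show up as Python time."""
    total = 0
    for i in range(n):
        total += i % 7
    return total


def native_heavy(values):
    """sorted() runs in C, so most of this time should count as native."""
    return sorted(values)


def allocation_heavy(n):
    """Builds a large list so memory growth is attributed to this line."""
    return [float(i) for i in range(n)]


if __name__ == "__main__":
    random.seed(0)
    values = [random.random() for _ in range(500_000)]
    print(python_heavy(1_000_000))
    print(len(native_heavy(values)))
    print(len(allocation_heavy(1_000_000)))
```

Running scalene demo_prog.py on this file then follows the same pattern as the command above.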

All of Scalene's options (available by running with --help):

    % scalene --help
     usage: scalene [-h] [--outfile OUTFILE] [--html] [--reduced-profile]
                    [--profile-interval PROFILE_INTERVAL] [--cpu-only]
                    [--profile-all] [--profile-only PROFILE_ONLY]
                    [--use-virtual-time]
                    [--cpu-percent-threshold CPU_PERCENT_THRESHOLD]
                    [--cpu-sampling-rate CPU_SAMPLING_RATE]
                    [--malloc-threshold MALLOC_THRESHOLD]
     
     Scalene: a high-precision CPU and memory profiler.
     https://github.com/plasma-umass/scalene
     
     command-line:
        % scalene [options] yourprogram.py
     or
        % python3 -m scalene [options] yourprogram.py
     
     in Jupyter, line mode:
        %scrun [options] statement
     
     in Jupyter, cell mode:
        %%scalene [options]
        code...
        code...
     
     optional arguments:
       -h, --help            show this help message and exit
       --outfile OUTFILE     file to hold profiler output (default: stdout)
       --html                output as HTML (default: text)
       --reduced-profile     generate a reduced profile, with non-zero lines only (default: False)
       --profile-interval PROFILE_INTERVAL
                             output profiles every so many seconds (default: inf)
       --cpu-only            only profile CPU time (default: profile CPU, memory, and copying)
       --profile-all         profile all executed code, not just the target program (default: only the target program)
       --profile-only PROFILE_ONLY
                             profile only code in filenames that contain the given strings, separated by commas (default: no restrictions)
       --use-virtual-time    measure only CPU time, not time spent in I/O or blocking (default: False)
       --cpu-percent-threshold CPU_PERCENT_THRESHOLD
                             only report profiles with at least this percent of CPU time (default: 1%)
       --cpu-sampling-rate CPU_SAMPLING_RATE
                             CPU sampling rate (default: every 0.01s)
       --malloc-threshold MALLOC_THRESHOLD
                             only report profiles with at least this many allocations (default: 100)
     
     When running Scalene in the background, you can suspend/resume profiling
     for the process ID that Scalene reports. For example:
     
        % python3 -m scalene [options] yourprogram.py &
      Scalene now profiling process 12345
        to suspend profiling: python3 -m scalene.profile --off --pid 12345
        to resume profiling:  python3 -m scalene.profile --on  --pid 12345

Scalene with Jupyter

Instructions for installing and using Scalene with Jupyter notebooks

This notebook illustrates the use of Scalene in Jupyter.

Installation:

!pip install scalene
%load_ext scalene

Line mode:

%scrun [options] statement

Cell mode:

%%scalene [options]
code...
code...

Installation

Using pip (Mac OS X, Linux, Windows, and WSL2)

Scalene is distributed as a pip package and works on Mac OS X, Linux (including Ubuntu in Windows WSL2) and (with limitations) Windows platforms. (Note: the Windows port isn't complete yet and should be considered early alpha; it requires Python 3.8 or later.)

You can install it as follows:

  % pip install -U scalene

  % python3 -m pip install -U scalene

You may need to install some packages first.

See https://stackoverflow.com/a/19344978/4954434 for full instructions for all Linux flavors.

For Ubuntu/Debian:

  # Ubuntu 20
  % sudo apt install git python3-all-dev

  # Ubuntu 18
  % sudo apt install git python3-all-dev

Using Homebrew (Mac OS X)

As an alternative to pip, you can use Homebrew to install the current version of Scalene from this repository:

  % brew tap plasma-umass/scalene
  % brew install --head plasma-umass/scalene/scalene

On ArchLinux

You can install Scalene on Arch Linux via the AUR package. Use your favorite AUR helper, or manually download the PKGBUILD and run makepkg -cirs to build. Note that this will place libscalene.so in /usr/lib; modify the below usage instructions accordingly.

Frequently Asked Questions

Q: Is there any way to get shorter profiles or do more targeted profiling?

A: Yes! There are several options:

Use --reduced-profile to include only lines and files with memory/CPU/GPU activity.
Use --profile-only to include only filenames containing specific strings (as in, --profile-only foo,bar,baz).
Decorate functions of interest with @profile to have Scalene report only those functions.
Turn profiling on and off programmatically by importing the profiler (from scalene import scalene_profiler) and then calling scalene_profiler.start() and scalene_profiler.stop(). By default, Scalene runs with profiling on, so to delay profiling until desired, use the --off command-line option (python3 -m scalene --off yourprogram.py).
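
The filename matching described for --profile-only can be sketched as a simple substring test. This illustrates the documented behavior (comma-separated strings, no restrictions by default) and is not Scalene's own code; `matches_profile_only` is a hypothetical name.

```python
def matches_profile_only(filename, profile_only):
    """Return True if filename contains any of the comma-separated substrings.

    An empty specification means "no restrictions".
    """
    patterns = [p for p in profile_only.split(",") if p]
    if not patterns:
        return True
    return any(p in filename for p in patterns)


for name in ("/src/foo/util.py", "/src/qux.py"):
    print(name, matches_profile_only(name, "foo,bar,baz"))
```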

Q: How do I run Scalene in PyCharm?

A: In PyCharm, you can run Scalene at the command line by opening the terminal at the bottom of the IDE and running a Scalene command (e.g., python -m scalene). Use the options --html and --outfile to generate an HTML file that you can then view in the IDE.

Q: How do I use Scalene with Django?

A: Pass in the --noreload option (see https://github.com/plasma-umass/scalene/issues/178).

Q: How do I use Scalene with PyTorch on the Mac?

A: Scalene works with PyTorch version 1.5.1 on Mac OS X. There's a bug in newer versions of PyTorch (https://github.com/pytorch/pytorch/issues/57185) that interferes with Scalene (discussion here: https://github.com/plasma-umass/scalene/issues/110), but only on Macs.

Technical Information

For technical details on Scalene, please see the following paper: Scalene: Scripting-Language Aware Profiling for Python (arXiv link).

Success Stories

If you use Scalene to successfully debug a performance problem, please add a comment to this issue!

Acknowledgements

Logo created by Sophia Berger.

This material is based upon work supported by the National Science Foundation under Grant No. 1955610. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Comments

`Scalene error: received signal SIGABRT`

Hello everyone,

I got the message Scalene error: received signal SIGABRT. What does it mean, and what should I do?

Thank you in advance for your answers

opened by thissasbnk 23
libscalene.so included in scalene v1.1.1 requires more recent glibc when running on CentOS 7.8
When trying scalene v1.1.1 (installed from source with pip install ., but I don't think that matters since libscalene.so is included in the source tarball), I'm getting this:

python3: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /.../lib/python3.6/site-packages/scalene/libscalene.so)

The issue here boils down to libscalene.so being compiled on a Linux OS that has a more recent glibc than is used in CentOS 7.8 (which is glibc 2.17).

Is there a specific reason why libscalene.so is packaged into the source tarball, as opposed to compiling it from source when scalene is being installed from source?

I'm happy to try and compile libscalene.so from source myself, but there doesn't seem to be any documentation available on that. I found heaplayers-make.mk which suggests that clang++ is needed (is that a strict requirement, or would a sufficiently recent g++ also work?), and that some source code from https://github.com/emeryberger/Heap-Layers is required as well.
opened by boegel 20

Scalene: internal error: unable to find Python allocator functions

Describe the bug: When starting scalene (with poetry run python -m scalene my_script.py), I get the error Scalene: internal error: unable to find Python allocator functions. This happens both with poetry run python -m scalene my_script.py and when started within the venv (python -m scalene my_script.py or just scalene my_script.py).

A few seconds after this output, I get a long traceback (reported below).

I don't really think this is something about my program, since it always worked without scalene, but I think it is something related to Trio.

To Reproduce Steps to reproduce the behavior:

Snippet that causes the error:

#asd.py

import time
import trio
time.sleep(3)

and then poetry run python -m scalene asd.py. This gives this traceback:

Error in program being profiled:
 string index out of range
Traceback (most recent call last):
  File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/scalene/scalene_profiler.py", line 1326, in profile_code
    exec(code, the_globals, the_locals)
  File "asd.py", line 2, in <module>
    import trio
  File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/trio/__init__.py", line 48, in <module>
    from ._sync import (
  File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/trio/_sync.py", line 475, in <module>
    @attr.s(eq=False, hash=False, repr=False)
  File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/attr/_make.py", line 1312, in wrap
    field_transformer,
  File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/attr/_make.py", line 610, in __init__
    field_transformer,
  File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/attr/_make.py", line 503, in _transform_attrs
    AttrsClass = _make_attr_tuple_class(cls.__name__, attr_names)
  File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/attr/_make.py", line 299, in _make_attr_tuple_class
    eval(compile("\n".join(attr_class_template), "", "exec"), globs)
  File "", line 1, in <module>
  File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/scalene/scalene_profiler.py", line 231, in invalidate_lines
    if f.f_code.co_filename[0] == "<" or "scalene" in f.f_code.co_filename:
IndexError: string index out of range

Commenting out the line import trio makes everything work.

However, the line Scalene: internal error: unable to find Python allocator functions is always present.
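
The traceback above ends at f.f_code.co_filename[0], and the attrs call shown in it compiles code with an empty filename (compile(..., "", "exec")), so indexing position 0 raises IndexError. A minimal sketch of a check that tolerates empty filenames, using a hypothetical helper name rather than Scalene's actual fix:

```python
def should_skip_filename(filename):
    """Hypothetical helper mirroring the condition shown in the traceback.

    str.startswith returns False for an empty string instead of raising,
    unlike filename[0], which raises IndexError when filename == "".
    """
    return filename.startswith("<") or "scalene" in filename


# compile() accepts an empty filename, as in the attrs call in the traceback.
code = compile("x = 1", "", "exec")
print(repr(code.co_filename))
print(should_skip_filename(code.co_filename))
```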

Desktop (please complete the following information):

OS: WSL2
Python version: 3.7.10, installed with pyenv
Scalene version: 1.3.15

opened by didimelli 19

How to explain these memory behaviors when using numpy?

When running a very simple script based on numpy functions, we can get the following results:

test2.py: % of CPU time = 100.00% out of   3.59s.
  	 |     CPU % |     CPU % | Avg memory  | Memory      |
  Line	 |  (Python) |  (native) | growth (MB) | usage (%)   | [test2.py]
--------------------------------------------------------------------------------
     1	 |           |           |             |             | import numpy as np
     2	 |           |           |             |             |
     3	 |     0.30% |    48.40% |         -80 |       1.03% | x = np.array(range(10**7))
     4	 |     0.59% |    50.72% |           0 |      98.97% | np.array(np.random.uniform(0, 100, size=10**8))
     5	 |           |           |             |             |

How can we get:

A negative memory growth for the first line?
A null memory growth on the second line?

System info

Platform : Mac OS X
Python: 3.7 (brew)
Numpy: 1.16.4

opened by Phylliade 18

scalene breaks when importing pyproj
Describe the bug I am trying to profile code where we import the pyproj library, and scalene fails with Scalene error: received signal SIGSEGV

To Reproduce

Install the pyproj library

create a file with the following line from pyproj import Proj

run scalene my_file.py

Scalene will fail with the error above

Expected behaviour: Scalene does not fail, and I can profile as with any other library.
opened by aymondebroglie 17
pprofile style context profiling
I'm usually not interested in profiling an entire program, I'm more interested in profiling some hotspot of code, or just some new piece of code.

pprofile allows for just profiling a specific region of code (from pprofile's main page):

import pprofile

def someOtherHotSpotCallable():
    # Statistic profiler
    prof = pprofile.StatisticalProfile()
    with prof(
        period=0.001,  # Sample every 1ms
        single=True,   # Only sample current thread
    ):
        pass  # Code to profile
    prof.print_stats()

It would be nice if scalene allowed for such granularity.
opened by spott 17
Reimplemented settrace callback in C

Because the use of sys.settrace created a large amount of overhead and was partially incorrect, we decided to reimplement the functionality in C. This is a relatively naive translation from Python to C. In addition, the __last_profiled field was set to a list so it could be cached in the C code, and some caching was added to should_trace

opened by sternj 15
Memory usage not measured every time
Hi! Thanks for making this profiler, I think it's awesome.

Describe the bug The profiler sometimes doesn't measure the memory usage.

To Reproduce

Define prof_test.py with the following:

import numpy as np

class A:
    def __init__(self, n):
        self.arr = np.random.rand(n)
        self.lst = [1] * n

if __name__ == '__main__':
    a = A(10_000_000)

Run scalene from the terminal

scalene prof_test.py --reduced-profile

Expected behavior Show the two allocations for the attributes arr and lst (approx 75 MB each).

Screenshots This is the expected behaviour:

Sometimes only the time is measured:

Other times only one of the allocations is measured:

Desktop (please complete the following information):

OS: Ubuntu 20.10

Version: 1.3.0
opened by jose-moralez 14
Scalene error: received signal SIGSEGV
Getting Scalene error: received signal SIGSEGV when trying to profile a python program.

I believe one of the imports at the top of my code breaks the profiler.

test.py:

#!/usr/bin/env python
# coding: utf-8
import obspy

profiling

scalene test.py

output: Scalene error: received signal SIGSEGV

Desktop:

OS: macOS-10.16-x86_64
opened by shaharkadmiel 14
Possible to track code run via PyEval_CallObject?

PyEval_CallObject can be used to invoke python code from C. Seems scalene currently can't profile code run this way. Is this a fundamental limit, or can this functionality be added relatively easily? The use case I'm looking at is profiling tf.py_func calls in tensorflow.

opened by guoshimin 14
Profiling hangs. Way to see internal progress?

Thanks for this great resource with amazing potential.

I have a script that runs in about 30 seconds (or up to 2h with different input files), but when I try to profile it, it just hangs on one step (for up to 8 hours now), and I can't tell what the problem is.

I've tried the --profile-interval option, but it never outputs anything.

Is there something in the /tmp/scalene######/ folder or elsewhere that could give an idea?

Should I try setting the cpu-sampling-rate slower?

I am currently running it on Ubuntu 18 with python 3.8 and these options: --reduced-profile --cpu-only --outfile pppscalene.txt --cpu-percent-threshold 3

top reports that the associated python process is taking only 1% cpu

I have gotten it to successfully profile the benchmark.py program, although it seemed to also fail on that if I didn't specify --cpu-only

Does it have issues with multiprocessing? I have it set to one thread at the moment, but this script does use the multiprocessing library.

Suggestions on where to look?

opened by beroe 13
Profiled submodule shows only CPU usage -- no values in Memory column
Describe the bug I have a script main.py that imports another module, utils.py.

main.py

from utils import func1
...

I want to profile all of the functions (not just func1) in utils.py after calling main. But I don't see any output in the Memory column.

Expected behavior I expect the rows profiled in utils.py to show values in the Memory column. I tried --profile-only 'utils', but I still don't see values in that column.

Desktop (please complete the following information):

macOS Monterey

Firefox 108.0.1

Scalene version 1.5.16 (2022.12.08)
opened by alecstein 5

bug: python-isal fails to import when being profiled by scalene

Describe the bug When I attempt to use scalene to profile a program that uses python-isal via xopen, I get an error from an internal import within the python-isal package.

Specifically:

OSError: b'Traceback (most recent call last):
  File "/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/scalene/scalene_profiler.py", line 1776, in profile_code
    exec(code, the_globals, the_locals)
  File "/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/isal/igzip.py", line 41, in <module>
    from . import igzip_lib, isal_zlib
ImportError: cannot import name \'igzip_lib\' from \'scalene\' (/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/scalene/__init__.py)
' (exit code 1)

Originating stack trace, just coming from xopen's error handling code:

Traceback (most recent call last):
  File "/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/scalene/scalene_profiler.py", line 1776, in profile_code
    exec(code, the_globals, the_locals)
  File "minimal-reproduction.py", line 3, in <module>
    with xopen('some_file.txt.gz') as infile:
  File "/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/xopen/__init__.py", line 157, in __exit__
    self.close()
  File "/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/xopen/__init__.py", line 386, in close
    self._raise_if_error(check_allowed_code_and_message, stderr_message)
  File "/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/xopen/__init__.py", line 452, in _raise_if_error
    raise OSError("{!r} (exit code {})".format(stderr_message, retcode))

The problem is coming from this relative import in python-isal: https://github.com/pycompression/python-isal/blob/develop/src/isal/igzip.py#L41

from . import igzip_lib, isal_zlib

This code works when I run the python program outside of scalene, as a script, or as a module. Only with scalene do I get this import failure.

To Reproduce

from xopen import xopen

with xopen('some_file.txt.gz') as infile:
  for line in infile:
    continue

Expected behavior I expect scalene to profile this code without the error

Desktop (please complete the following information):

OS: Ubuntu 20.04.5 LTS
Python 3.11.0

Additional context This does not seem to reproduce unless doing long reads. The first minimal reproduction case I tried was a small file and a single read instead of a loop, which somehow did not reproduce this problem for me.

Full pip freeze of my environment:

cloudpickle==2.2.0
commonmark==0.9.1
isal==1.1.0
Jinja2==3.1.2
MarkupSafe==2.1.1
numpy @ file:///home/conda/feedstock_root/build_artifacts/numpy_1668919096335/work
pandas==1.5.2
Pygments==2.13.0
pynvml==11.4.1
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work
pytz @ file:///home/conda/feedstock_root/build_artifacts/pytz_1667391478166/work
rich==12.6.0
scalene==1.5.16
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
xopen==1.7.0

P.S. Thank you so much for Scalene! It's an awesome tool. This may well be a problem with python-isal, but I've used that package for a long time and never seen failures like this.

opened by pettyalex 0

Start/Stop on demand

AWS and GCP provide a Serverless service where a server in a container starts on demand. This container has a default timeout that, usually, allows Scalene to profile the current job and not raise a job was too fast error. If there are no requests to the service the container stops.

Given that almost all server frameworks (flask, sanic etc) have on_server_start and on_server_stop custom functions, it would be nice to profile in production by starting and stopping Scalene in these custom functions. (--profile-interval < container default timeout)

I've tested using scalene in GCP Cloud Run with CMD python -m scalene --profile-interval 2.0 --cli --html --outfile scalene.html app.py, but the container failed to start, even though the CPU is always allocated (not allocated only during request processing).

#483 #432 #35

opened by pom11 0
Add PEP 517 support (closes #503)
NOTE: This changeset should be merged with care because of the changes it makes to scalene's build workflows. At a minimum it should be tested on MacOS and Windows before merging!

This PR introduces a pyproject.toml to bring scalene into the world of PEP 517/518. setuptools still does most of the lifting, but most of the configuration is now declared in this more modern file. One big advantage of this change is that build-time dependencies are clearly separated from runtime dependencies, so end-users should not need to worry about installing packages like Cython or wheel in order to build the project. I think this means requirements.txt can be removed from the project, as it is no longer necessary in the GitHub workflow that depends on it, but I would appreciate maintainer insight on this.

Pre-merge checklist

This checklist may be incomplete, please add to it if anything is missing.

[x] Test build/install on Linux

I can install (with pip install .) and build (with make bdist and make sdist) with a fresh 3.8.12 venv on Ubuntu 20.04.3

I also checked that DEV_BUILD is respected when performing a build, adding a .devN tag to the resulting package version

[ ] Test build/install on MacOS

The MacOS smoketests passed, so pip install looks good. It would be a good idea to check the Makefile-driven workflow too (unfortunately, I don't have a Mac to test with)

[ ] Test build/install on Windows

I was able to pip install the repository on Windows 10 in a fresh Python 3.8.10 environment, but the Makefile-driven workflow should be checked too
opened by SnoopJ 3
Run scalene inside Autodesk Maya or any other DCC
Hello,

this request comes from this discussion: https://github.com/plasma-umass/scalene/discussions/505

I was trying to run scalene inside Autodesk Maya 2023 from the script editor, but I got this warning:

I've seen other answers about this issue, but they rely on launching scalene like scalene foo.py, and I'm not quite sure how I can work around this from inside the live script editor.

For context, I used to perform profiling using cProfile inside Maya using this:

I also tried to run scalene as a module as @emeryberger suggested (using runpy), but the warning keeps appearing:

import runpy

d = runpy.run_module('scalene')

def foo():
    from maya import cmds
    for _ in range(50):
        cmds.createNode('locator')

# Turn profiling on
d['scalene_profiler'].start()

foo()

# Turn profiling off
d['scalene_profiler'].stop()

Thanks!
opened by rchals 0
Support PEP 517/518

Is your feature request related to a problem? Please describe. Not directly, but I am filing this issue after a short chat on the Boston Python Slack about how to update one of the project's dependencies.

Describe the solution you'd like To satisfy this request, scalene should support build systems that follow PEP 517, which mostly comes down to following PEP 518 by introducing a pyproject.toml file to the project's source tree.

The biggest advantage of this approach is that build-time dependencies (namely Cython and possibly crdp as well) can be declared as such in the modern style, so end-user installation should be slightly smoother.

Describe alternatives you've considered N/A

Additional context I sat down for an hour or so tonight and saw how much of the project's setup.py could be migrated to a pyproject.toml. The biggest snag I ran into was the DEV_BUILD functionality that appends a .devN suffix to builds. And of course all the dynamic stuff related to available compilers and extension modules still needs to be expressed in setup.py

The upshot here is that it's pretty easy to start small with a pyproject.toml; it is not a wholesale replacement for setup.py (or its declarative counterpart setup.cfg), as described in the setuptools documentation.

opened by SnoopJ 0

Releases(v1.5.18)

v1.5.18(Jan 2, 2023)

Changed case of Cython in dependency to work with case-sensitive filesystems. Also includes minor UI tweaks.
Source code(tar.gz)
Source code(zip)
v1.5.17(Jan 1, 2023)
What's Changed

Enhancements

Scalene now incorporates AI-powered optimization suggestions. To enable these, you need to enter an OpenAI key:

Once a valid key is entered, click on the lightning bolt (⚡) beside any line to generate a proposed optimization.

You can click as many times as you like on the lightning bolt, and it will generate different suggested optimizations. Your mileage may vary, but in some cases, the suggestions are quite impressive. While this is currently limited to optimizing a single line, we anticipate broadening this to groups of lines or even functions in the near future. To our knowledge, this is the first integration of AI into profilers. It's a brave new world.

The web UI now incorporates collapsible profiles (https://github.com/plasma-umass/scalene/pull/527). You can now toggle the displayed lines of code (all or reduced), and show/hide individual files.

Bug fixes

Improved logic for filtering out Python libraries from results.

Added sorting to memory sample intake, made JSON work with subprocesses by @sternj in https://github.com/plasma-umass/scalene/pull/513

Fix C should_trace to support OS-specific path separators by @emeryberger in https://github.com/plasma-umass/scalene/pull/521

Eliminates binary file reading/writing and dependency on linecache by @emeryberger in https://github.com/plasma-umass/scalene/pull/522

Disable Windows switching of on/off status of profiling for background processes by @emeryberger in https://github.com/plasma-umass/scalene/pull/523

Fixed some Win32 issues by @emeryberger in https://github.com/plasma-umass/scalene/pull/524

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.16...v1.5.17
v1.5.16(Dec 8, 2022)
What's Changed

Enhancements

Memory profiling is now sometimes much faster: reimplemented settrace callback in C by @sternj (https://github.com/plasma-umass/scalene/pull/479, https://github.com/plasma-umass/scalene/pull/492, https://github.com/plasma-umass/scalene/pull/494)

Incorporates the Ramer-Douglas-Peucker (RDP) algorithm to compress visualizations without sacrificing overall shape by @emeryberger

Uses local fork of crdp to include its dependency on Cython

Improves time and space reporting logic for units (e.g., ms, s, m) by @emeryberger (https://github.com/plasma-umass/scalene/commit/2041b10cfaf63363fe4792c05470e06c9d3ebe81)
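As a rough illustration of the RDP idea mentioned above, the following is a minimal pure-Python sketch of Ramer-Douglas-Peucker line simplification. It is not Scalene's or crdp's implementation; the function names and the epsilon tolerance parameter are chosen here for illustration only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the line through a and b (or to a if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def rdp(points: List[Point], epsilon: float) -> List[Point]:
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first -> last.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _perpendicular_distance(points[i], first, last)
        if d > max_dist:
            max_dist, index = d, i
    # Everything is close enough to the chord: keep only the endpoints.
    if max_dist <= epsilon:
        return [first, last]
    # Otherwise keep the farthest point and simplify both halves recursively.
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


timeline = [(0, 0), (1, 0.01), (2, 0), (3, 5), (4, 0)]
simplified = rdp(timeline, epsilon=0.1)
```

In this example the small wobble at x = 1 is dropped while the spike at x = 3 survives, which is the sense in which the overall shape of a memory timeline would be preserved.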

Bug fixes

Fixes an install issue that required pre-installation of Cython by @emeryberger, thanks to help from @SnoopJ (https://github.com/plasma-umass/scalene/commit/a830fd2fbc9b36acb4c762265e7fb4175f41f1b9)

Fixes an issue bringing up the web browser on some platforms by @emeryberger

Fixes an issue running Scalene on Windows by @emeryberger (esp. https://github.com/plasma-umass/scalene/commit/8877e221dd42b6d57d61a69370340b5071c67847)

Fixes an issue running Scalene with older GPUs by @emeryberger (https://github.com/plasma-umass/scalene/commit/c1aa3cb47e2650ac6d482013acc9ec5107ac631e)

Other

Now builds wheel for Python 3.11 by @jaltmayerpizzorno (https://github.com/plasma-umass/scalene/commit/52247f7fc05e68da9c5ee9dec85056ffcb962c9c, https://github.com/plasma-umass/scalene/commit/df26dc2f84e85c233f3971bfa136e08ec9c42237)

Removed install-time dependency on wheel by @emeryberger (https://github.com/plasma-umass/scalene/commit/ad4da28e659911dc0e64e904d26138341a57aef1)

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.15...v1.5.16
v1.5.15(Nov 16, 2022)
What's Changed

Now generates HTML instead of using a web server by @emeryberger in https://github.com/plasma-umass/scalene/pull/477

Clearer command-line parameters (--cpu, --gpu, and --memory) by @emeryberger in https://github.com/plasma-umass/scalene/pull/477

Improved multiprocessing and module support by @RuRo in https://github.com/plasma-umass/scalene/pull/484

Fixes several issues running Scalene on Windows by @emeryberger

New Contributors

@RuRo made their first contribution in https://github.com/plasma-umass/scalene/pull/484

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.14...v1.5.15
v1.5.14(Nov 4, 2022)
Changes in this release:

Bug fixes:

Disabled GPU profiling for very old NVIDIA platforms that don't support profiling utilization and/or memory consumption; this used to lead to failures

More graceful handling of non-source files (which could lead to failures)

Uses correct path regardless of changes to the working directory

UI:

forced profiles in Jupyter cells to consume 100% of the available width

minor usability fix for Jupyter, fixing an issue with %scrun

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.13...v1.5.14
v1.5.13(Sep 24, 2022)
Changes in this release:

Disabled Apple GPU profiling for now since it is unreliable (the Apple GPU does not record activity on a per-process basis).

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.12...v1.5.13
v1.5.12(Sep 24, 2022)
Changes in this release:

Corrected some memory attribution errors (primarily relating to calculating average memory per line/function, as well as a race condition).

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.11...v1.5.12
v1.5.11(Sep 12, 2022)
Changes in this release:

Fixed pre-built MacOS Universal distributions not including M1 support;

Fixed Scalene's assumption it would remain in the same directory, which caused problems when the profiled program did a chdir.

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.10...v1.5.11
v1.5.10(Aug 18, 2022)
Changes in this release:

Fixed a reference counting issue that could lead to failure (https://github.com/plasma-umass/scalene/commit/73c848b62d33ac47f4e8b7464e75a9c8edafd68e).

Increased an internal buffer size to ensure safe handling when Scalene is accessing very long directory / pathnames.

Added support for profiling applications that themselves use LD_PRELOAD (fixing https://github.com/plasma-umass/scalene/issues/418).

Improved warning message for the current lack of support for multiprocessing on Windows (addressing https://github.com/plasma-umass/scalene/issues/416).

Other changes to enable building on Conda (https://github.com/conda-forge/staged-recipes/pull/18747).

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.9.1...v1.5.10
v1.5.9.1(Jul 24, 2022)
Changes in this release:

increased accuracy of time attribution to specific lines for CPU & GPU profiling (also reduces memory consumption)

increased accuracy of memory attribution to specific lines

added per-process GPU accounting for NVIDIA, which can dramatically increase accuracy when profiling on shared GPUs

added support for Python 3.11

documented the command-line option to force Scalene to ignore options after that point (---)
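To make the separator behavior concrete, here is a small sketch of how an argument list could be split at `---` so that everything after it is left untouched by the profiler's own option parsing. This is an illustration under assumptions, not Scalene's actual argument-handling code; the helper name `split_profiler_args` is hypothetical.

```python
from typing import List, Tuple

SEPARATOR = "---"


def split_profiler_args(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Split argv at the first '---'.

    Returns (options the profiler should parse, everything it should ignore).
    If no separator is present, the second list is empty.
    """
    if SEPARATOR in argv:
        i = argv.index(SEPARATOR)
        return argv[:i], argv[i + 1 :]
    return list(argv), []


# Hypothetical command line: '--html' after the separator is not
# interpreted as a profiler option.
profiler_opts, passthrough = split_profiler_args(
    ["--cpu-only", "---", "your_prog.py", "--html"]
)
```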

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.8...v1.5.9.1
v1.5.9(Jul 24, 2022)
Changes in this release:

increased accuracy of time attribution to specific lines for CPU & GPU profiling (also reduces memory consumption)

increased accuracy of memory attribution to specific lines

added per-process GPU accounting for NVIDIA, which can dramatically increase accuracy when profiling on shared GPUs

added support for Python 3.11

documented the command-line option to force Scalene to ignore options after that point (---)

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.8...v1.5.9
v1.5.8(Apr 29, 2022)
Changes in this release:

fixed missing GUI files from Linux wheels;

fixed some issues launching browser to display GUI results;

v1.5.7(Apr 24, 2022)
What's Changed

UI improvements:

Memory activity now shown as pies instead of numbers

Compatibility:

Working towards conda builds.

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.6...v1.5.7
v1.5.6(Apr 5, 2022)
What's Changed

Improved functionality and accuracy:

Fixed Python memory attribution for large requests.

Fixed an issue with the multiprocessing library.

UI improvements:

Fixed reporting of the Python fraction of memory allocated.

Compatibility:

Removed nvidia-ml-py dependency, which was causing a reported issue with Dask (https://github.com/plasma-umass/scalene/issues/378).

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.5...v1.5.6
v1.5.5(Mar 12, 2022)
What's Changed

Improved functionality and accuracy:

Fixed occasional segfaults caused by unaligned memory allocations.

Corrected an issue with attribution of CPU time with threads.

Leak detection enabled by default.

UI improvements:

Hovering over memory timelines now shows amount of memory consumed, and when.

Memory timelines are compressed, reducing the size of profiles and reducing the memory consumption of the UI.

Suspected leaks are now highlighted.

Compatibility:

Moved to Python 3.8.

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.4...v1.5.5
v1.5.4(Feb 11, 2022)
What's Changed

Improved functionality and accuracy:

Fixed memcpy attribution (now on specific lines, just like allocations).

GPU profiling now enabled on Apple systems

UI improvements:

Memory timelines are now zoomable.

Fatter bars, zoomable timelines, fixed sorting for code line separators

Added explanations when hovering over column headers.

Omit function summaries if no functions.

Added GPU memory profiling.

Report peak instead of average GPU memory.

GPU utilization is now shown as pies.

Compatibility:

Fixed Windows support

Added support for profiling programs that use signals

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.3...v1.5.4
v1.5.3(Feb 4, 2022)
What's Changed

Adds exception handling to work around a virtualized GPU issue (https://github.com/plasma-umass/scalene/issues/323).

Added average memory consumption calculation to function summaries.

Fixes a missing argument issue in output (https://github.com/plasma-umass/scalene/issues/344)

Fixes an issue with Jupyter notebooks when they don't have access to a web browser.

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.2...v1.5.3
v1.5.2(Feb 3, 2022)
What's Changed

Scalene's web-based GUI is now integrated into Jupyter notebooks

When using --cpu-only or profiling in Jupyter, columns for memory profiling (which would all be empty) are now hidden

The local webserver now exits after 5 seconds.

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.1...v1.5.2
v1.5.1(Feb 1, 2022)
What's Changed

Scalene now launches its web-based GUI locally by default. After profiling, it opens a browser tab to a local webserver and automatically brings up the most recent profile. (The old behavior is still available by using --cli on the command line.)

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.0...v1.5.1
v1.5.0(Jan 30, 2022)
What's Changed

Scalene now supports a new web-based GUI. Invoke using --web; this opens a browser tab (http://plasma-umass.org/scalene-gui/) and prompts to upload the generated profile.json file in the current working directory.

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.4.2...v1.5.0
v1.4.2(Jan 25, 2022)
What's Changed

Fixed scalene looping infinitely in some functions by @sternj in https://github.com/plasma-umass/scalene/pull/335

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.4.1...v1.4.2
v1.4.1(Jan 20, 2022)
What's Changed

Update README.md by @barseghyanartur in https://github.com/plasma-umass/scalene/pull/324

Fixed double-counting newlines by @sternj in https://github.com/plasma-umass/scalene/pull/328

Added --allocation-sampling-window; fixed reporting of peak function summary by @emeryberger in https://github.com/plasma-umass/scalene/pull/329

Added in shim for get_context by @sternj in https://github.com/plasma-umass/scalene/pull/320

Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.4.0...v1.4.1
v1.4.0(Jan 12, 2022)
New features:

adds --profile-exclude flag to exclude from profiles any filenames containing the given strings (comma-separated)

adds experimental memory leak detection (--memory-leak-detector)
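The substring-based exclusion described for --profile-exclude can be sketched as follows. This is only an illustration of the matching rule stated above (comma-separated strings, matched anywhere in the filename), not Scalene's code; both helper names are hypothetical.

```python
from typing import List


def parse_exclude_spec(spec: str) -> List[str]:
    """Turn a comma-separated spec such as 'tests,vendor' into a list of substrings."""
    return [part.strip() for part in spec.split(",") if part.strip()]


def should_profile(filename: str, exclude_spec: str) -> bool:
    """Return False if the filename contains any of the excluded substrings."""
    return not any(sub in filename for sub in parse_exclude_spec(exclude_spec))


# Hypothetical filenames and exclusion spec.
keep = should_profile("app/main.py", "tests,vendor")
skip = should_profile("app/tests/test_main.py", "tests,vendor")
```

Because the match is on substrings rather than path components, a spec like 'test' would also exclude a file named 'latest.py'; choosing more specific strings avoids that.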

Enhancements:

provides more accurate memory accounting for small objects

higher resolution tracking of system vs. user time, per line, on Linux and Mac

new sampling approach, using “intervals” and per-line triggers, to ensure consistent accounting of per-line peak and average memory consumption

Bug fixes:

fixes build on Windows

adds -arm64e target to enable building on Apple Silicon (M1)

fixed exit signal propagation for failed scripts

ensures correct build on old Xcode + Mac OS combinations

distribution includes wheels for Windows

v1.3.16(Oct 21, 2021)
Added wheels for Python 3.10;

Improved granularity of memory recording;

Fixed "unable to find Python allocator functions" issue (#278);

Performed various cleanups;

v1.3.15(Oct 4, 2021)
Overhauled memory attribution logic:

uses Python's custom memory management APIs to efficiently disambiguate native vs. Python memory allocations, supplanting the prior approach that employed periodic call stack sampling.

performs immediate lookup of the location in source code responsible for allocation/deallocation, reducing the "smearing" effect in attributions previously caused by delayed attribution.

computes average memory consumption (rather than total) for each line of code (using the novel technique of "one-shot" tracing); lines executed many times no longer appear to have consumed large amounts of memory.

no longer reports negative memory growth in its output (caused by lines freeing more than they allocate), which has been a source of confusion for some users.

this release also resolves a memory leak.

Overhauled internal signal handling:

uses signal actors, an approach based on actors that decouples signal handling logic from the main thread, avoiding the risk of races and deadlocks and simplifying logic

Bug fixes:

fixed missing handling of pynvml.NVMLError_NotSupported exception (issue #262);

fixed issue cleaning up after profiling multiprocessor and multithreaded programs;

fixed issue not accounting for elapsed time when zero frames were recorded (issue #269).

New features:

added JSON output option (--json);

added programmatic profile control (scalene_profiler.start() and scalene_profiler.stop()).

Miscellaneous:

improved documentation.

Note: this release is for MacOS and Linux only.
v1.3.12(Jul 16, 2021)

Fixes a Windows-specific bug introduced in 1.3.11 that led to empty outputs. With this release, Scalene on Windows now requires Python 3.8 or newer.
v1.3.11(Jul 15, 2021)

Fixed inadvertent use of signal.raise_signal, which isn't available in Python 3.7.
v1.3.10(Jul 12, 2021)

This release adds a Windows wheel to PyPI, making a C/C++ development environment unnecessary for installation.
v1.3.9(Jul 11, 2021)
Fixes pip install for Windows

Fixes @profile functionality

v1.3.8(Jun 29, 2021)
fixed AttributeError bug running with --cpu-only;
