Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python

Overview

by Emery Berger, Sam Stern, and Juan Altmayer Pizzorno.

About Scalene

Scalene is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while delivering far more detailed information.

Quick Start

Installing Scalene:

pip install -U scalene

Using Scalene:

Commonly used options:

scalene your_prog.py                             # full profile (prints to console)
python3 -m scalene your_prog.py                  # equivalent alternative
scalene --cpu-only your_prog.py                  # only CPU/GPU
scalene --reduced-profile your_prog.py           # only profile lines with significant usage
scalene --html --outfile prof.html your_prog.py  # output HTML profile to 'prof.html'
scalene --profile-interval 5.0 your_prog.py      # output a new profile every five seconds
scalene --help                                   # lists all options

To use Scalene programmatically in your code, run Scalene as above and then:

from scalene import scalene_profiler

# Turn profiling on
scalene_profiler.start()

# Turn profiling off
scalene_profiler.stop()

Scalene Overview

Scalene talk (PyCon US 2021)

This talk, presented at PyCon US 2021, walks through Scalene's advantages and how to use it to debug the performance of an application (it also provides some technical details on Scalene's internals). We highly recommend watching this video!

Fast and Precise

  • Scalene is fast. It uses sampling instead of instrumentation or relying on Python's tracing facilities. Its overhead is typically no more than 10-20% (and often less).
  • Scalene performs profiling at the line level and per function, pointing to the functions and the specific lines of code responsible for the execution time in your program.

CPU profiling

  • Scalene separates out time spent in Python from time in native code (including libraries). Most Python programmers aren't going to optimize the performance of native code (which is usually either in the Python implementation or external libraries), so this helps developers focus their optimization efforts on the code they can actually improve.
  • Scalene highlights hotspots (code accounting for significant percentages of CPU time or memory allocation) in red, making them even easier to spot.
  • Scalene also separates out system time, making it easy to find I/O bottlenecks.
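
As a hypothetical illustration of the Python/native split (this example is not from the Scalene documentation), consider summing the same list two ways. Under Scalene, the interpreted loop would show up in the Python CPU column, while the built-in sum, implemented in C, would be attributed to native time:

```python
data = list(range(1_000_000))

def python_sum(xs):
    # Interpreted bytecode: attributed to "Python" CPU time.
    total = 0
    for x in xs:
        total += x
    return total

# The built-in sum runs in C, so its time is attributed to "native".
assert python_sum(data) == sum(data)
```

Only the pure-Python loop is something most developers would rewrite; the native half is already fast.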

GPU profiling

  • Scalene reports GPU time (currently limited to NVIDIA-based systems).

Memory profiling

  • Scalene profiles memory usage. In addition to tracking CPU usage, Scalene also points to the specific lines of code responsible for memory growth. It accomplishes this via an included specialized memory allocator.
  • Scalene separates out the percentage of memory consumed by Python code vs. native code.
  • Scalene produces per-line memory profiles.
  • Scalene identifies lines with likely memory leaks.
  • Scalene profiles copying volume, making it easy to spot inadvertent copying, especially due to crossing Python/library boundaries (e.g., accidentally converting numpy arrays into Python arrays, and vice versa).
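
As a rough sketch of the kind of copying the copy-volume column surfaces (an illustrative example, not taken from the Scalene documentation): converting a large native buffer into Python objects and back copies every element across the Python/library boundary twice:

```python
# A 1 MB native buffer (stored as raw bytes, not Python objects).
buf = bytearray(1_000_000)

# Each conversion copies every element across the Python/native boundary;
# on large data, Scalene's "Copy (MB/s)" column would make this churn visible.
as_list = list(buf)        # bytes -> Python int objects
back = bytearray(as_list)  # and back into a native buffer
```

The same pattern arises when accidentally converting numpy arrays to Python lists and back.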

Other features

  • Scalene can produce reduced profiles (via --reduced-profile) that only report lines that consume more than 1% of CPU or perform at least 100 allocations.
  • Scalene supports @profile decorators to profile only specific functions.
  • When Scalene is profiling a program launched in the background (via &), you can suspend and resume profiling.
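
A minimal sketch of the @profile pattern: when run under Scalene, only decorated functions are profiled. The no-op fallback below is a common convention (not Scalene-specific API) that lets the same script run outside any profiler:

```python
try:
    profile  # defined for you when running under Scalene
except NameError:
    def profile(func):
        # No-op stand-in so the script also runs without a profiler.
        return func

@profile
def hot_loop():
    return sum(i * i for i in range(100_000))

result = hot_loop()
```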

Comparison to Other Profilers

Performance and Features

Below is a table comparing the performance and features of various profilers to Scalene.

Performance and feature comparison

  • Slowdown: the slowdown when running a benchmark from the Pyperformance suite. Green means less than 2x overhead. Scalene's overhead is just a 20% slowdown.

Scalene has all of the following features, many of which only Scalene supports:

  • Lines or functions: does the profiler report information only for entire functions, or for every line -- Scalene does both.
  • Unmodified Code: works on unmodified code.
  • Threads: supports Python threads.
  • Multiprocessing: supports use of the multiprocessing library -- Scalene only.
  • Python vs. C time: breaks out time spent in Python vs. native code (e.g., libraries) -- Scalene only.
  • System time: breaks out system time (e.g., sleeping or performing I/O) -- Scalene only.
  • Profiles memory: reports memory consumption per line / function.
  • GPU: reports time spent on an NVIDIA GPU (if present) -- Scalene only.
  • Memory trends: reports memory use over time per line / function -- Scalene only.
  • Copy volume: reports megabytes being copied per second -- Scalene only.
  • Detects leaks: automatically pinpoints lines responsible for likely memory leaks -- Scalene only.

Output

Scalene prints annotated source code for the program being profiled (as text, JSON (--json), or HTML (--html)), along with any modules it uses in the same directory or subdirectories. (You can optionally profile all executed code with --profile-all, and include only files that account for at least --cpu-percent-threshold of the time.) Here is a snippet from pystone.py.

Example profile

  • Memory usage at the top: memory consumption over the runtime of the profiled code, visualized with "sparklines".
  • "Time Python": How much time was spent in Python code.
  • "native": How much time was spent in non-Python code (e.g., libraries written in C/C++).
  • "system": How much time was spent in the system (e.g., performing I/O).
  • "GPU": (not shown here) How much time was spent on the GPU, if your system has an NVIDIA GPU installed.
  • "Memory Python": How much of the memory allocation happened on the Python side of the code, as opposed to in non-Python code (e.g., libraries written in C/C++).
  • "net": Positive net memory numbers indicate total memory allocation in megabytes; negative net memory numbers indicate memory reclamation.
  • "timeline / %": memory consumption generated by this line over the program runtime (visualized with "sparklines"), and the percentage of total memory activity this line represents.
  • "Copy (MB/s)": The number of megabytes being copied per second (see "About Scalene").

Using Scalene

The following command runs Scalene on a provided example program.

scalene test/testme.py

All of Scalene's options (available by running with --help):

    % scalene --help
     usage: scalene [-h] [--outfile OUTFILE] [--html] [--reduced-profile]
                    [--profile-interval PROFILE_INTERVAL] [--cpu-only]
                    [--profile-all] [--profile-only PROFILE_ONLY]
                    [--use-virtual-time]
                    [--cpu-percent-threshold CPU_PERCENT_THRESHOLD]
                    [--cpu-sampling-rate CPU_SAMPLING_RATE]
                    [--malloc-threshold MALLOC_THRESHOLD]
     
     Scalene: a high-precision CPU and memory profiler.
     https://github.com/plasma-umass/scalene
     
     command-line:
        % scalene [options] yourprogram.py
     or
        % python3 -m scalene [options] yourprogram.py
     
     in Jupyter, line mode:
        %scrun [options] statement
     
     in Jupyter, cell mode:
        %%scalene [options]
        code...
        code...
     
     optional arguments:
       -h, --help            show this help message and exit
       --outfile OUTFILE     file to hold profiler output (default: stdout)
       --html                output as HTML (default: text)
       --reduced-profile     generate a reduced profile, with non-zero lines only (default: False)
       --profile-interval PROFILE_INTERVAL
                             output profiles every so many seconds (default: inf)
       --cpu-only            only profile CPU time (default: profile CPU, memory, and copying)
       --profile-all         profile all executed code, not just the target program (default: only the target program)
       --profile-only PROFILE_ONLY
                             profile only code in filenames that contain the given strings, separated by commas (default: no restrictions)
       --use-virtual-time    measure only CPU time, not time spent in I/O or blocking (default: False)
       --cpu-percent-threshold CPU_PERCENT_THRESHOLD
                             only report profiles with at least this percent of CPU time (default: 1%)
       --cpu-sampling-rate CPU_SAMPLING_RATE
                             CPU sampling rate (default: every 0.01s)
       --malloc-threshold MALLOC_THRESHOLD
                             only report profiles with at least this many allocations (default: 100)
     
     When running Scalene in the background, you can suspend/resume profiling
     for the process ID that Scalene reports. For example:
     
        % python3 -m scalene [options] yourprogram.py &
      Scalene now profiling process 12345
        to suspend profiling: python3 -m scalene.profile --off --pid 12345
        to resume profiling:  python3 -m scalene.profile --on  --pid 12345

Scalene with Jupyter

Instructions for installing and using Scalene with Jupyter notebooks

This notebook illustrates the use of Scalene in Jupyter.

Installation:

!pip install scalene
%load_ext scalene

Line mode:

%scrun [options] statement

Cell mode:

%%scalene [options]
code...
code...

Installation

Using pip (Mac OS X, Linux, Windows, and WSL2)

Scalene is distributed as a pip package and works on Mac OS X, Linux (including Ubuntu in Windows WSL2) and (with limitations) Windows platforms. (Note: the Windows port isn't complete yet and should be considered early alpha; it requires Python 3.8 or later.)

You can install it as follows:

  % pip install -U scalene

or

  % python3 -m pip install -U scalene

You may need to install some packages first.

See https://stackoverflow.com/a/19344978/4954434 for full instructions for all Linux flavors.

For Ubuntu/Debian:

  # Ubuntu 20
  % sudo apt install git python3-all-dev

  # Ubuntu 18
  % sudo apt install git python3-all-dev
Using Homebrew (Mac OS X)

As an alternative to pip, you can use Homebrew to install the current version of Scalene from this repository:

  % brew tap plasma-umass/scalene
  % brew install --head plasma-umass/scalene/scalene
On ArchLinux

You can install Scalene on Arch Linux via the AUR package. Use your favorite AUR helper, or manually download the PKGBUILD and run makepkg -cirs to build. Note that this will place libscalene.so in /usr/lib; modify the below usage instructions accordingly.

Frequently Asked Questions

Q: Is there any way to get shorter profiles or do more targeted profiling?

A: Yes! There are several options:

  1. Use --reduced-profile to include only lines and files with memory/CPU/GPU activity.
  2. Use --profile-only to include only filenames containing specific strings (as in, --profile-only foo,bar,baz).
  3. Decorate functions of interest with @profile to have Scalene report only those functions.
  4. Turn profiling on and off programmatically by importing Scalene (import scalene) and then turning profiling on and off via scalene_profiler.start() and scalene_profiler.stop(). By default, Scalene runs with profiling on, so to delay profiling until desired, use the --off command-line option (python3 -m scalene --off yourprogram.py).
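
The delayed-profiling workflow in option 4 can be sketched as follows, assuming the program is launched with python3 -m scalene --off yourprogram.py. (The ImportError stub is an illustrative convenience, not part of Scalene's API; it lets the same script run when Scalene isn't installed.)

```python
try:
    from scalene import scalene_profiler
except ImportError:
    class scalene_profiler:  # no-op stand-in when Scalene is absent
        @staticmethod
        def start(): pass
        @staticmethod
        def stop(): pass

data = list(range(1_000_000))  # setup work, not profiled (profiler starts off)

scalene_profiler.start()       # profiling begins here
result = sum(data)
scalene_profiler.stop()        # and ends here
```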

Q: How do I run Scalene in PyCharm?

A: In PyCharm, you can run Scalene at the command line by opening the terminal at the bottom of the IDE and running a Scalene command (e.g., python -m scalene yourprogram.py). Use the options --html and --outfile to generate an HTML file that you can then view in the IDE.

Q: How do I use Scalene with Django?

A: Pass in the --noreload option (see https://github.com/plasma-umass/scalene/issues/178).

Q: How do I use Scalene with PyTorch on the Mac?

A: Scalene works with PyTorch version 1.5.1 on Mac OS X. There's a bug in newer versions of PyTorch (https://github.com/pytorch/pytorch/issues/57185) that interferes with Scalene (discussion here: https://github.com/plasma-umass/scalene/issues/110), but only on Macs.

Technical Information

For technical details on Scalene, please see the following paper: Scalene: Scripting-Language Aware Profiling for Python (arXiv link).

Success Stories

If you use Scalene to successfully debug a performance problem, please add a comment to this issue!

Acknowledgements

Logo created by Sophia Berger.

This material is based upon work supported by the National Science Foundation under Grant No. 1955610. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Comments
  • `Scalene error: received signal SIGABRT`

    Hello everyone,

    I got this message Scalene error: received signal SIGABRT what does it mean and what should i do ?

    Thank you in advance for your answers

    bug 
    opened by thissasbnk 23
  • libscalene.so included in scalene v1.1.1 requires more recent glibc when running on CentOS 7.8

    When trying scalene v1.1.1 (installed from source with pip install ., but I don't think that matters since libscalene.so is included in the source tarball), I'm getting this:

    python3: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /.../lib/python3.6/site-packages/scalene/libscalene.so)
    

    The issue here boils down to libscalene.so being compiled on an Linux OS that has a more recent glibc than is used in CentOS 7.8 (which is glibc 2.17).

    Is there a specific reason why libscalene.so is packaged into the source tarball, as opposed to compiling it from source when scalene is being installed from source?

    I'm happy to try and compile libscalene.so from source myself, but there doesn't seem to be any documentation available on that. I found heaplayers-make.mk which suggests that clang++ is needed (is that a strict requirement, or would a sufficiently recent g++ also work?), and that some source code from https://github.com/emeryberger/Heap-Layers is required as well.

    opened by boegel 20
  • Scalene: internal error: unable to find Python allocator functions

    Describe the bug When starting scalene (with poetry run python -m scalene my_script.py), I get the error Scalene: internal error: unable to find Python allocator functions. This happens both with poetry run python -m scalene my_script.py or if started within the venv (python -m scalene my_script.py or just scalene my_script.py).

    Some seconds after this output, i get a long traceback (reported afterwards).

    I don't really think this is something about my program, since it always worked without scalene, but I think it is something related to Trio.

    To Reproduce Steps to reproduce the behavior:

    Snippet that cause the error:

    #asd.py
    
    import time
    import trio
    time.sleep(3)
    

    and then poetry run python -m scalene asd.py. This gives this traceback:

    Error in program being profiled:
     string index out of range
    Traceback (most recent call last):
      File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/scalene/scalene_profiler.py", line 1326, in profile_code
        exec(code, the_globals, the_locals)
      File "asd.py", line 2, in <module>
        import trio
      File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/trio/__init__.py", line 48, in <module>
        from ._sync import (
      File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/trio/_sync.py", line 475, in <module>
        @attr.s(eq=False, hash=False, repr=False)
      File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/attr/_make.py", line 1312, in wrap
        field_transformer,
      File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/attr/_make.py", line 610, in __init__
        field_transformer,
      File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/attr/_make.py", line 503, in _transform_attrs
        AttrsClass = _make_attr_tuple_class(cls.__name__, attr_names)
      File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/attr/_make.py", line 299, in _make_attr_tuple_class
        eval(compile("\n".join(attr_class_template), "", "exec"), globs)
      File "", line 1, in <module>
      File "/home/didi/leaf/leaf-radio/.venv/lib/python3.7/site-packages/scalene/scalene_profiler.py", line 231, in invalidate_lines
        if f.f_code.co_filename[0] == "<" or "scalene" in f.f_code.co_filename:
    IndexError: string index out of range
    

    Commenting out the line import trio, makes everything works.

    However, the line Scalene: internal error: unable to find Python allocator functions is always present.

    Desktop (please complete the following information):

    • OS: WSL2
    • Python version: 3.7.10, installed with pyenv
    • Scalene version: 1.3.15
    opened by didimelli 19
  • How to explain these memory behaviors when using numpy?

    When running a very simple script based on numpy functions, we can get the following results:

    test2.py: % of CPU time = 100.00% out of   3.59s.
      	 |     CPU % |     CPU % | Avg memory  | Memory      |
      Line	 |  (Python) |  (native) | growth (MB) | usage (%)   | [test2.py]
    --------------------------------------------------------------------------------
         1	 |           |           |             |             | import numpy as np
         2	 |           |           |             |             |
         3	 |     0.30% |    48.40% |         -80 |       1.03% | x = np.array(range(10**7))
         4	 |     0.59% |    50.72% |           0 |      98.97% | np.array(np.random.uniform(0, 100, size=10**8))
         5	 |           |           |             |             |
    

    How can we get:

    • A negative memory growth for the first line?
    • A null memory growth on the second line?

    System info

    • Platform : Mac OS X
    • Python: 3.7 (brew)
    • Numpy: 1.16.4
    opened by Phylliade 18
  • scalene breaks when importing pyproj

    Describe the bug I am trying to profile code where we import the pyproj library, and scalene fails with Scalene error: received signal SIGSEGV

    To Reproduce

    • Install the pyproj library
    • create a file with the following line from pyproj import Proj
    • run scalene my_file.py

    Scalene will fail with the error above

    Expected behaviour Scalene does not fail and I can profile like for any library !

    opened by aymondebroglie 17
  • pprofile style context profiling

    I'm usually not interested in profiling an entire program, I'm more interested in profiling some hotspot of code, or just some new piece of code.

    pprofile allows for just profiling a specific region of code (from pprofile's main page):

    def someOtherHotSpotCallable():
        # Statistic profiler
        prof = pprofile.StatisticalProfile()
        with prof(
            period=0.001, # Sample every 1ms
            single=True, # Only sample current thread
        ):
            # Code to profile
        prof.print_stats()
    

    It would be nice if scalene allowed for such granularity.

    opened by spott 17
  • Reimplemented settrace callback in C

    Because the use of sys.settrace created a large amount of overhead and was partially incorrect, we decided to reimplement the functionality in C. This is a relatively naive translation from Python to C. In addition, the __last_profiled field was set to a list so it could be cached in the C code, and some caching was added to should_trace

    opened by sternj 15
  • Memory usage not measured everytime

    Hi! Thanks for making this profiler, I think it's awesome.

    Describe the bug The profiler sometimes doesn't measure the memory usage.

    To Reproduce

    1. Define prof_test.py with the following:
    import numpy as np
    
    class A:
    
        def __init__(self, n):
            self.arr = np.random.rand(n)
            self.lst = [1] * n
    
    if __name__ == '__main__':
        a = A(10_000_000)
    
    1. Run scalene from the terminal
    scalene prof_test.py --reduced-profile
    

    Expected behavior Show the two allocations for the attributes arr and lst (approx 75 MB each).

    Screenshots This is the expected behaviour: image

    Sometimes only the time is measured: image

    Other times only one of the allocations is measured: image

    Desktop (please complete the following information):

    • OS: Ubuntu 20.10
    • Version: 1.3.0
    opened by jose-moralez 14
  • Scalene error: received signal SIGSEGV

    Getting Scalene error: received signal SIGSEGV when trying to profile a python program.

    I believe one of the imports at the top of my code breaks the profiler.

    test.py:

    #!/usr/bin/env python
    # coding: utf-8
    
    import obspy
    

    profiling

    scalene test.py
    

    output: Scalene error: received signal SIGSEGV

    Desktop:

    • OS: macOS-10.16-x86_64
    opened by shaharkadmiel 14
  • Possible to track code run via PyEval_CallObject?

    PyEval_CallObject can be used to invoke python code from C. Seems scalene currently can't profile code run this way. Is this a fundamental limit, or can this functionality be added relatively easily? The use case I'm looking at is profiling tf.py_func calls in tensorflow.

    opened by guoshimin 14
  • Profiling hangs. Way to see internal progress?

    Thanks for this great resource with amazing potential.

    I have a script that runs in about 30 seconds (or up to 2h with different input files), but when I try to profile it, it just hangs on one step (for up to 8 hours now) , and I can't tell what the problem is.

    I've tried the --profile-interval option, but it never outputs anything.

    Is there something in the /tmp/scalene######/ folder or elsewhere that could give an idea?

    Should I try setting the cpu-sampling-rate slower?

    I am currently running it on Ubuntu 18 with python 3.8 and these options: --reduced-profile --cpu-only --outfile pppscalene.txt --cpu-percent-threshold 3

    top reports that the associated python process is taking only 1% cpu

    I have gotten it to successfully profile the benchmark.py program, although it seemed to also fail on that if I didn't specify --cpu-only

    Does it have issues with multiprocessing? I have it set to one thread at the moment, but this script does use the multiprocessing library.

    Suggestions on where to look?

    opened by beroe 13
  • Profiled submodule shows only CPU usage -- no values in Memory column

    Describe the bug I have a script main.py that imports another module, utils.py.

    main.py

    from utils import func1
    ...
    

    I want to profile all of the functions (not just func1) in utils.py after calling main. But I don't see any output in the Memory column.

    Expected behavior I expect the rows profiled in utils.py to show values in the Memory column. I tried --profile-only 'utils', but I still don't see values in that column.

    Screenshots image

    Desktop (please complete the following information):

    • macOS Monterey
    • Firefox 108.0.1
    • Scalene version 1.5.16 (2022.12.08)
    opened by alecstein 5
  • bug: python-isal fails to import when being profiled by scalene

    Describe the bug When I attempt to use scalene to profile a program that uses python-isal via xopen, I get an error from an internal import within the python-isal package.

    Specifically:

    OSError: b'Traceback (most recent call last):
      File "/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/scalene/scalene_profiler.py", line 1776, in profile_code
        exec(code, the_globals, the_locals)
      File "/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/isal/igzip.py", line 41, in <module>
        from . import igzip_lib, isal_zlib
    ImportError: cannot import name \'igzip_lib\' from \'scalene\' (/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/scalene/__init__.py)
    ' (exit code 1)
    

    Originating stack trace, just coming from xopen's error handling code:

    Traceback (most recent call last):
      File "/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/scalene/scalene_profiler.py", line 1776, in profile_code
        exec(code, the_globals, the_locals)
      File "minimal-reproduction.py", line 3, in <module>
        with xopen('some_file.txt.gz') as infile:
      File "/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/xopen/__init__.py", line 157, in __exit__
        self.close()
      File "/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/xopen/__init__.py", line 386, in close
        self._raise_if_error(check_allowed_code_and_message, stderr_message)
      File "/home/alex/.conda/envs/alex/envs/alyssa-profile/lib/python3.11/site-packages/xopen/__init__.py", line 452, in _raise_if_error
        raise OSError("{!r} (exit code {})".format(stderr_message, retcode))
    
    

    The problem is coming from this relative import in python-isal: https://github.com/pycompression/python-isal/blob/develop/src/isal/igzip.py#L41

    from . import igzip_lib, isal_zlib
    

    This code works when I run the python program outside of scalene, as a script, or as a module. Only with scalene do I get this import failure.

    To Reproduce

    from xopen import xopen
    
    with xopen('some_file.txt.gz') as infile:
      for line in infile:
        continue
    

    Expected behavior I expect scalene to profile this code without the error

    Desktop (please complete the following information):

    • OS: Ubuntu 20.04.5 LTS
    • Python 3.11.0

    Additional context This does not seem to reproduce unless doing long reads. The first minimal reproduction case I tried was a small file and a single read instead of a loop, which somehow did not reproduce this problem for me.

    Full pip freeze of my environment:

    cloudpickle==2.2.0
    commonmark==0.9.1
    isal==1.1.0
    Jinja2==3.1.2
    MarkupSafe==2.1.1
    numpy @ file:///home/conda/feedstock_root/build_artifacts/numpy_1668919096335/work
    pandas==1.5.2
    Pygments==2.13.0
    pynvml==11.4.1
    python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work
    pytz @ file:///home/conda/feedstock_root/build_artifacts/pytz_1667391478166/work
    rich==12.6.0
    scalene==1.5.16
    six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
    xopen==1.7.0
    
    

    P.S. Thank you so much for Scalene! It's an awesome tool. This may well be a problem with python-isal, but I've used that package for a long time and never seen failures like this.

    opened by pettyalex 0
  • Start/Stop on demand

    AWS and GCP provide a Serverless service where a server in a container starts on demand. This container has a default timeout that, usually, allows Scalene to profile the current job and not raise a job was too fast error. If there are no requests to the service the container stops.

    Given that mostly all server frameworks (flask, sanic etc) have on_server_start and on_server_stop custom functions, would be nice to profile in production starting/stopping Scalene in these custom functions. (--profile-interval < container default timeout)

    I've tested using scalene in GCP Cloud Run with CMD python -m scalene --profile-interval 2.0 --cli --html --outfile scalene.html app.py and failed to start the container even tho the CPU is always allocated (not allocated only during request processing).

    #483 #432 #35

    opened by pom11 0
  • Add PEP 517 support (closes #503)

    NOTE: This changeset should be merged with care because of the changes it makes to scalene's build workflows. At a minimum it should be tested on MacOS and Windows before merging!

    This PR introduces a pyproject.toml to bring scalene into the world of PEP 517/518. setuptools still does most of the lifting, but most of the configuration is now declared in this more modern file. One big advantage of this change is that build-time dependencies are clearly separated from runtime dependencies, so end-users should not need to worry about installing packages like Cython or wheel in order to build the project. I think this means requirements.txt can be removed from the project, as it is no longer necessary in the GitHub workflow that depends on it, but I would appreciate maintainer insight on this.

    Pre-merge checklist

    This checklist may be incomplete, please add to it if anything is missing.

    • [x] Test build/install on Linux
      • I can install (with pip install .) and build (with make bdist and make sdist) with a fresh 3.8.12 venv on Ubuntu 20.04.3
      • I also checked that DEV_BUILD is respected when performing a build, adding a .devN tag to the resulting package version
    • [ ] Test build/install on MacOS
      • The MacOS smoketests passed, so pip install looks good. It would be a good idea to check the Makefile-driven workflow too (unfortunately, I don't have a Mac to test with)
    • [ ] Test build/install on Windows
      • I was able to pip install the repository on Windows 10 in a fresh Python 3.8.10 environment, but the Makefile-driven workflow should be checked too
    opened by SnoopJ 3
  • Run scalene inside Autodesk Maya or any other DCC

    Hello,

    this request comes from this discussion: https://github.com/plasma-umass/scalene/discussions/505

    I was trying to run scalene inside Autodesk Maya 2023 from the script editor, but I got this warning: image

    I've seen other answers about this issue, but they rely on launching scalene like scalene foo.py, and I'm not quite sure how I can workaround this from inside the live script editor.

    For context, I used to perform profiling using cProfile inside Maya using this: image

    I also tried to run scalene as a module as @emeryberger suggested (using runpy), but the warning keeps appearing:

    import runpy
    d = runpy.run_module('scalene')
    
    def foo():
        from maya import cmds
        for _ in range(50):
            cmds.createNode('locator')
    
    # Turn profiling on
    d['scalene_profiler'].start()
    
    foo()
    
    # Turn profiling off
    d['scalene_profiler'].stop()
    

    Thanks!

    opened by rchals 0
  • Support PEP 517/518

    Is your feature request related to a problem? Please describe. Not directly, but I am filing this issue after a short chat on the Boston Python Slack about how to update one of the project's dependencies.

    Describe the solution you'd like To satisfy this request, scalene should support build systems that follow PEP 517, which mostly comes down to following PEP 518 by introducing a pyproject.toml file to the project's source tree.

    The biggest advantage of this approach is that build-time dependencies (namely Cython and possibly crdp as well) can be declared as such in the modern style, so end-user installation should be slightly smoother.
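    As an illustrative sketch only (the package names and backend below are assumptions, not the project's actual configuration), such a pyproject.toml could declare the build backend and build-time dependencies like this:

    ```toml
    # Hypothetical sketch -- field values are assumptions, not Scalene's real config.
    [build-system]
    requires = ["setuptools", "wheel", "Cython"]
    build-backend = "setuptools.build_meta"
    ```

    With a table like this in place, pip would install the build-time dependencies into an isolated build environment automatically, per PEP 518.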

    Describe alternatives you've considered: N/A

    Additional context: I sat down for an hour or so tonight and saw how much of the project's setup.py could be migrated to a pyproject.toml. The biggest snag I ran into was the DEV_BUILD functionality that appends a .devN suffix to builds. And of course, all the dynamic stuff related to available compilers and extension modules still needs to be expressed in setup.py.

    The upshot here is that while it's pretty easy to start small with a pyproject.toml, it is not a wholesale replacement for setup.py (or its declarative counterpart setup.cfg), as described in the setuptools documentation.

    opened by SnoopJ 0
Releases (latest: v1.5.18)
  • v1.5.18(Jan 2, 2023)

  • v1.5.17(Jan 1, 2023)

    What's Changed

    Enhancements

    • Scalene now incorporates AI-powered optimization suggestions. To enable these, you need to enter an OpenAI key.

    Once a valid key is entered, click on the lightning bolt (⚡) beside any line to generate a proposed optimization.


    You can click as many times as you like on the lightning bolt, and it will generate different suggested optimizations. Your mileage may vary, but in some cases, the suggestions are quite impressive. While this is currently limited to optimizing a single line, we anticipate broadening this to groups of lines or even functions in the near future. To our knowledge, this is the first integration of AI into profilers. It's a brave new world.

    • The web UI now incorporates collapsible profiles (https://github.com/plasma-umass/scalene/pull/527). You can now toggle the displayed lines of code (all or reduced), and show/hide individual files.

    Bug fixes

    • Improved logic for filtering out Python libraries from results.
    • Added sorting to memory sample intake, made JSON work with subprocesses by @sternj in https://github.com/plasma-umass/scalene/pull/513
    • Fix C should_trace to support OS-specific path separators by @emeryberger in https://github.com/plasma-umass/scalene/pull/521
    • Eliminates binary file reading/writing and dependency on linecache. by @emeryberger in https://github.com/plasma-umass/scalene/pull/522
    • Disable Windows switching of on/off status of profiling for background processes by @emeryberger in https://github.com/plasma-umass/scalene/pull/523
    • Fixed some Win32 issues. by @emeryberger in https://github.com/plasma-umass/scalene/pull/524

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.16...v1.5.17

    Source code(tar.gz)
    Source code(zip)
  • v1.5.16(Dec 8, 2022)

    What's Changed

    Enhancements

    • Memory profiling is now often much faster: reimplemented the settrace callback in C by @sternj (https://github.com/plasma-umass/scalene/pull/479, https://github.com/plasma-umass/scalene/pull/492, https://github.com/plasma-umass/scalene/pull/494)
    • Incorporates the Ramer-Douglas-Peucker (RDP) algorithm to compress visualizations without sacrificing their overall shape by @emeryberger
      • Uses a local fork of crdp to include its dependency on Cython
    • Improves time and space reporting logic for units (e.g., ms, s, m) by @emeryberger (https://github.com/plasma-umass/scalene/commit/2041b10cfaf63363fe4792c05470e06c9d3ebe81)
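    For reference, the RDP algorithm keeps only the points needed to preserve a polyline's shape within a given tolerance. A minimal, self-contained sketch (illustrative only; not Scalene's or crdp's actual implementation):

    ```python
    import math

    def rdp(points, epsilon):
        """Ramer-Douglas-Peucker: simplify a polyline while keeping its shape.

        points: list of (x, y) tuples; epsilon: max allowed perpendicular deviation.
        """
        if len(points) < 3:
            return list(points)
        (x1, y1), (x2, y2) = points[0], points[-1]
        seg_len = math.hypot(x2 - x1, y2 - y1)

        def dist(p):
            # Perpendicular distance from p to the chord through the endpoints.
            if seg_len == 0:
                return math.hypot(p[0] - x1, p[1] - y1)
            return abs((x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)) / seg_len

        # Find the interior point that deviates most from the chord.
        idx, dmax = max(((i, dist(p)) for i, p in enumerate(points[1:-1], 1)),
                        key=lambda t: t[1])
        if dmax <= epsilon:
            return [points[0], points[-1]]  # everything in between is within tolerance
        # Otherwise, split at the farthest point and recurse on both halves.
        left = rdp(points[:idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return left[:-1] + right
    ```

    Applied to a memory timeline, this can reduce thousands of samples to a handful of visually significant points.
    
    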

    Bug fixes

    • Fixes an install issue that required pre-installation of Cython by @emeryberger, thanks to help from @SnoopJ (https://github.com/plasma-umass/scalene/commit/a830fd2fbc9b36acb4c762265e7fb4175f41f1b9)
    • Fixes an issue bringing up the web browser on some platforms by @emeryberger
    • Fixes an issue running Scalene on Windows by @emeryberger (esp. https://github.com/plasma-umass/scalene/commit/8877e221dd42b6d57d61a69370340b5071c67847)
    • Fixes an issue running Scalene with older GPUs by @emeryberger (https://github.com/plasma-umass/scalene/commit/c1aa3cb47e2650ac6d482013acc9ec5107ac631e)

    Other

    • Now builds wheel for Python 3.11 by @jaltmayerpizzorno (https://github.com/plasma-umass/scalene/commit/52247f7fc05e68da9c5ee9dec85056ffcb962c9c, https://github.com/plasma-umass/scalene/commit/df26dc2f84e85c233f3971bfa136e08ec9c42237)
    • Removed install-time dependency on wheel by @emeryberger (https://github.com/plasma-umass/scalene/commit/ad4da28e659911dc0e64e904d26138341a57aef1)

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.15...v1.5.16

  • v1.5.15(Nov 16, 2022)

    What's Changed

    • Now generates HTML instead of using a web server by @emeryberger in https://github.com/plasma-umass/scalene/pull/477
    • Clearer command-line parameters (--cpu, --gpu, and --memory) by @emeryberger in https://github.com/plasma-umass/scalene/pull/477
    • Improved multiprocessing and module support by @RuRo in https://github.com/plasma-umass/scalene/pull/484
    • Fixes several issues running Scalene on Windows by @emeryberger

    New Contributors

    • @RuRo made their first contribution in https://github.com/plasma-umass/scalene/pull/484

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.14...v1.5.15

  • v1.5.14(Nov 4, 2022)

    Changes in this release:

    Bug fixes:

    • Disabled GPU profiling for very old NVIDIA platforms that don't support profiling utilization and/or memory consumption; this used to lead to failures
    • More graceful handling of non-source files (which could lead to failures)
    • Uses correct path regardless of changes to the working directory

    UI:

    • forced profiles in Jupyter cells to consume 100% of the available width
    • minor usability fix for Jupyter, fixing an issue with %scrun

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.13...v1.5.14

  • v1.5.13(Sep 24, 2022)

    Changes in this release:

    • Disabled Apple GPU profiling for now since it is unreliable (the Apple GPU does not record activity on a per-process basis).

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.12...v1.5.13

  • v1.5.12(Sep 24, 2022)

    Changes in this release:

    • Corrected some memory attribution errors (primarily relating to calculating average memory per line/function, as well as a race condition).

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.11...v1.5.12

  • v1.5.11(Sep 12, 2022)

    Changes in this release:

    • Fixed pre-built macOS Universal distributions not including M1 support;
    • Fixed Scalene's assumption it would remain in the same directory, which caused problems when the profiled program did a chdir.

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.10...v1.5.11

  • v1.5.10(Aug 18, 2022)

    Changes in this release:

    • Fixed a reference counting issue that could lead to failure (https://github.com/plasma-umass/scalene/commit/73c848b62d33ac47f4e8b7464e75a9c8edafd68e).
    • Increased an internal buffer size to ensure safe handling when Scalene is accessing very long directory / pathnames.
    • Added support for profiling applications that themselves use LD_PRELOAD (fixing https://github.com/plasma-umass/scalene/issues/418).
    • Improved warning message for the current lack of support for multiprocessing on Windows (addressing https://github.com/plasma-umass/scalene/issues/416).
    • Other changes to enable building on Conda (https://github.com/conda-forge/staged-recipes/pull/18747).

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.9.1...v1.5.10

  • v1.5.9.1(Jul 24, 2022)

    Changes in this release:

    • increased accuracy of time attribution to specific lines for CPU & GPU profiling (also reduces memory consumption)
    • increased accuracy of memory attribution to specific lines
    • added per-process GPU accounting for NVIDIA, which can dramatically increase accuracy when profiling on shared GPUs
    • added support for Python 3.11
    • documented the command-line option to force Scalene to ignore options after that point (---)

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.8...v1.5.9.1

  • v1.5.9(Jul 24, 2022)

    Changes in this release:

    • increased accuracy of time attribution to specific lines for CPU & GPU profiling (also reduces memory consumption)
    • increased accuracy of memory attribution to specific lines
    • added per-process GPU accounting for NVIDIA, which can dramatically increase accuracy when profiling on shared GPUs
    • added support for Python 3.11
    • documented the command-line option to force Scalene to ignore options after that point (---)

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.8...v1.5.9

  • v1.5.8(Apr 29, 2022)

  • v1.5.7(Apr 24, 2022)

    What's Changed

    UI improvements:

    • Memory activity is now shown as pie charts instead of numbers

    Compatibility:

    • Working towards conda builds.

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.6...v1.5.7

  • v1.5.6(Apr 5, 2022)

    What's Changed

    Improved functionality and accuracy:

    • Fixed Python memory attribution for large requests.
    • Fixed an issue with the multiprocessing library.

    UI improvements:

    • Fixed reporting of the Python fraction of memory allocated.

    Compatibility:

    • Removed nvidia-ml-py dependency, which was causing a reported issue with Dask (https://github.com/plasma-umass/scalene/issues/378).

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.5...v1.5.6

  • v1.5.5(Mar 12, 2022)

    What's Changed

    Improved functionality and accuracy:

    • Fixed occasional segfaults caused by unaligned memory allocations.
    • Corrected an issue with attribution of CPU time with threads.
    • Leak detection enabled by default.

    UI improvements:

    • Hovering over memory timelines now shows amount of memory consumed, and when.
    • Memory timelines are compressed, reducing the size of profiles and reducing the memory consumption of the UI.
    • Suspected leaks are now highlighted.

    Compatibility:

    • Moved to Python 3.8.

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.4...v1.5.5

  • v1.5.3(Feb 4, 2022)

    What's Changed

    • Adds exception handling to workaround a virtualized GPU issue (https://github.com/plasma-umass/scalene/issues/323).
    • Added average memory consumption calculation to function summaries.
    • Fixes a missing argument issue in output (https://github.com/plasma-umass/scalene/issues/344)
    • Fixes an issue with Jupyter notebooks when they don't have access to a web browser.

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.2...v1.5.3

  • v1.5.2(Feb 3, 2022)

    What's Changed

    • Scalene's web-based GUI is now integrated into Jupyter notebooks
    • When using --cpu-only or profiling in Jupyter, columns for memory profiling (which would all be empty) are now hidden
    • The local webserver now exits after 5 seconds.

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.1...v1.5.2

  • v1.5.1(Feb 1, 2022)

    What's Changed

    • Scalene now launches its web-based GUI locally by default. After profiling, it opens a browser tab to a local webserver and automatically brings up the most recent profile. (The old behavior is still available by using --cli on the command line.)

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.5.0...v1.5.1

  • v1.5.0(Jan 30, 2022)

    What's Changed

    • Scalene now supports a new web-based GUI. Invoke using --web; this opens a browser tab (http://plasma-umass.org/scalene-gui/) and prompts to upload the generated profile.json file in the current working directory.

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.4.2...v1.5.0

  • v1.4.2(Jan 25, 2022)

    What's Changed

    • Fixed scalene looping infinitely in some functions by @sternj in https://github.com/plasma-umass/scalene/pull/335

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.4.1...v1.4.2

  • v1.4.1(Jan 20, 2022)

    What's Changed

    • Update README.md by @barseghyanartur in https://github.com/plasma-umass/scalene/pull/324
    • Fixed double-counting newlines by @sternj in https://github.com/plasma-umass/scalene/pull/328
    • Added --allocation-sampling-window; fixed reporting of peak function summary by @emeryberger in https://github.com/plasma-umass/scalene/pull/329
    • Added in shim for get_context by @sternj in https://github.com/plasma-umass/scalene/pull/320

    Full Changelog: https://github.com/plasma-umass/scalene/compare/v1.4.0...v1.4.1

  • v1.4.0(Jan 12, 2022)

    New features:

    • adds --profile-exclude flag to exclude from profiles any filenames containing the given strings (comma-separated)
    • adds experimental memory leak detection (--memory-leak-detector)

    Enhancements:

    • provides more accurate memory accounting for small objects
    • higher resolution tracking of system vs. user time, per line, on Linux and Mac
    • new sampling approach, using “intervals” and per-line triggers, to ensure consistent accounting of per-line peak and average memory consumption

    Bug fixes:

    • fixes build on Windows
    • adds -arm64e target to enable building on Apple Silicon (M1)
    • fixed exit signal propagation for failed scripts
    • ensures correct build on old Xcode + Mac OS combinations
    • distribution includes wheels for Windows
  • v1.3.16(Oct 21, 2021)

    • Added wheels for Python 3.10;
    • Improved granularity of memory recording;
    • Fixed "unable to find Python allocator functions" issue (#278);
    • Performed various cleanups;
  • v1.3.15(Oct 4, 2021)

    Overhauled memory attribution logic:

    • uses Python's custom memory management APIs to efficiently disambiguate native vs. Python memory allocations, supplanting the prior approach that employed periodic call stack sampling.
    • performs immediate lookup of the location in source code responsible for allocation/deallocation, reducing the "smearing" effect in attributions previously caused by delayed attribution.
    • computes average memory consumption (rather than total) for each line of code (using the novel technique of "one-shot" tracing); lines executed many times no longer appear to have consumed large amounts of memory.
    • no longer reports negative memory growth in output (caused by lines freeing more memory than they allocate), which had been a source of confusion for some users.
    • this release also resolves a memory leak.
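    To illustrate the average-versus-total distinction (a toy example, not Scalene's implementation): under total-based accounting, a line that allocates ~8 MB but executes 1000 times appears to consume ~8 GB, rather than its true ~8 MB per-execution footprint:

    ```python
    # Toy illustration (not Scalene's code): per-line memory samples for a line
    # inside a loop -- one ~8 MB allocation per execution, executed 1000 times.
    allocations_mb = [8.0] * 1000

    total_mb = sum(allocations_mb)               # 8000.0 MB -- misleadingly large
    average_mb = total_mb / len(allocations_mb)  # 8.0 MB -- the per-execution cost
    print(f"total={total_mb} MB, average={average_mb} MB")
    ```
    
    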

    Overhauled internal signal handling:

    • uses "signal actors," an actor-based approach that decouples signal-handling logic from the main thread, avoiding the risk of races and deadlocks and simplifying the logic

    Bug fixes:

    • fixed missing handling of pynvml.NVMLError_NotSupported exception (issue #262);
    • fixed an issue cleaning up after profiling multi-process and multithreaded programs;
    • fixed issue not accounting for elapsed time when zero frames were recorded (issue #269).

    New features:

    • added JSON output option (--json);
    • added programmatic profile control (scalene_profiler.start() and scalene_profiler.stop()).

    Miscellaneous:

    • improved documentation.

    Note: this release is for macOS and Linux only.

  • v1.3.12(Jul 16, 2021)

    Fixes a Windows-specific bug introduced in 1.3.11 that led to empty outputs. With this release, Scalene on Windows requires Python 3.8 or newer.

  • v1.3.11(Jul 15, 2021)

  • v1.3.10(Jul 12, 2021)

  • v1.3.9(Jul 11, 2021)

  • v1.3.8(Jun 29, 2021)

Owner
PLASMA @ UMass