jupyter/ipython experiment containers for GPU and general RAM re-use

Overview


ipyexperiments

jupyter/ipython experiment containers and utils for profiling and reclaiming GPU and general RAM, and detecting memory leaks.

About

This module's main purpose is to help calibrate hyperparameters in deep learning notebooks to fit the available GPU and general RAM, but, of course, it can be useful for any other situation where memory limits are a constant issue. It is also useful for detecting memory leaks in your code. And over time other goodies that help with running machine learning experiments have been added.

This package is slowly evolving into a suite of helper modules designed to help diagnose memory leaks and make debugging them easy.

Currently the package contains several modules:

  1. IpyExperiments - a smart container for ipython/jupyter experiments (documentation / demo)
  2. CellLogger - per cell memory profiler and more features (documentation / demo)
  3. ipython utils - workarounds for ipython memory leakage on exception (documentation)
  4. memory debugging and profiling utils (documentation)

Using this framework you can run multiple consecutive experiments without needing to restart the kernel all the time, especially when you run out of GPU memory with the all-too-familiar "CUDA: out of memory" error. When this happens you just go back to the notebook cell where you started the experiment, change the hyperparameters, and re-run the updated experiment until it fits the available memory. This is much more efficient and less error-prone than constantly restarting the kernel and re-running the whole notebook.

As an extra bonus you get access to the memory consumption data, so you can use it to automate the discovery of hyperparameters that suit your hardware's unique memory limits.
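The automation this enables can be sketched as a simple search. Everything here is illustrative: fits() is a hypothetical stand-in for "run the experiment and read the memory data the container collected", with memory use modeled as linear in batch size:

```python
def fits(batch_size, mem_limit_mb=11_000):
    # Hypothetical stand-in for "run the experiment and read the memory
    # data the container collected"; memory use is modeled as linear in
    # batch size purely for illustration.
    per_sample_mb = 150
    return batch_size * per_sample_mb <= mem_limit_mb

def largest_fitting_batch(lo=1, hi=1024):
    # binary search for the largest batch size that still fits in RAM
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

print(largest_fitting_batch())  # 73 with the modeled numbers
```

In a real notebook you would replace fits() with an actual experiment run and read the measured memory consumption instead of the modeled one.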

The idea behind this module is very simple: it provides the equivalent of a python function's scope, where local variables get destroyed at the end of the run, giving the memory back, except it works across multiple jupyter notebook cells (or ipython statements). In addition it runs gc.collect() to immediately release badly behaved variables with circular references, and reclaim general and GPU RAM. It also helps to discover memory leaks, and performs various other useful things behind the scenes.
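The mechanism can be sketched in plain python (a toy illustration only, not the real ipyexperiments implementation):

```python
import gc

class ExperimentScope:
    """Toy sketch of the idea (not the real ipyexperiments implementation):
    remember which names exist in a namespace when the experiment starts,
    delete any newly created ones when it finishes, then gc.collect() to
    also reclaim objects held alive by circular references."""
    def __init__(self, namespace):
        self.ns = namespace
        self.before = set(namespace)   # names that pre-date the experiment

    def finish(self):
        for name in set(self.ns) - self.before:
            del self.ns[name]          # destroy the experiment's "locals"
        gc.collect()                   # break circular references, free RAM

ns = {"keep_me": 1}
exp = ExperimentScope(ns)
ns["big"] = list(range(1_000_000))     # variable created inside the experiment
exp.finish()
assert "big" not in ns and "keep_me" in ns
```

The real package operates on the notebook's global namespace and also reports GPU/CPU memory deltas, but the scoping idea is the same.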

If you need more fine-grained memory profiling, the CellLogger sub-system reports RAM usage at the cell level in jupyter, or per line of code in ipython. You get the resource usage report automatically as soon as a command or a cell finishes executing. It includes other features, such as resetting the RNG seed in python/numpy/pytorch if you need a reproducible result when re-running the whole notebook or just one cell.
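The seed-resetting idea can be sketched like this (a minimal illustration, not the actual CellLogger code; numpy/pytorch are seeded only if installed):

```python
import random

def reset_seed(seed=42):
    # Re-seed every RNG the notebook might use, so a re-run reproduces
    # the same results. numpy/pytorch are seeded only if installed.
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass

reset_seed(42)
first = random.random()
reset_seed(42)
assert random.random() == first  # same seed, same sequence
```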

Currently this sub-system logs GPU RAM, general RAM and execution time, but it can be expanded to track other important metrics. While there are various similar loggers out there, the main focus of this implementation is to help track GPU usage, where the scarce resource is GPU RAM.
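A stdlib-only sketch of such a per-block logger (illustrative only; the real CellLogger additionally tracks GPU RAM, which would need a backend such as pynvml):

```python
import time
import tracemalloc

class MiniCellLogger:
    """Toy per-block logger (stdlib only) in the spirit of CellLogger:
    reports peak RAM delta and wall time for the wrapped code."""
    def __enter__(self):
        tracemalloc.start()
        self.t0 = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.elapsed = time.perf_counter() - self.t0
        _, self.peak = tracemalloc.get_traced_memory()  # (current, peak)
        tracemalloc.stop()
        print(f"peak RAM delta: {self.peak/2**20:.1f} MB, "
              f"time: {self.elapsed:.3f}s")

with MiniCellLogger() as log:
    data = [0] * 1_000_000   # allocate roughly 8 MB of list storage
```

In jupyter the real CellLogger hooks into ipython's cell-execution events instead, so no explicit with-block is needed.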


Installation

  • pypi:

    pip install ipyexperiments
    
  • conda:

    conda install -c conda-forge -c stason ipyexperiments
    
  • dev:

    pip install git+https://github.com/stas00/ipyexperiments.git
    

Usage

Here is an example using code from the fastai v1 library, spread across 8 jupyter notebook cells:

# cell 1
exp1 = IPyExperimentsPytorch() # new experiment
# cell 2
learn1 = language_model_learner(data_lm, bptt=60, drop_mult=0.25, pretrained_model=URLs.WT103)
# cell 3
learn1.lr_find()
# cell 4
del exp1
# cell 5
exp2 = IPyExperimentsPytorch() # new experiment
# cell 6
learn2 = language_model_learner(data_lm, bptt=70, drop_mult=0.3, pretrained_model=URLs.WT103)
# cell 7
learn2.lr_find()
# cell 8
del exp2

Demo

See this demo notebook to see how this system works.

Documentation

  1. IPyExperiments
  2. CellLogger sub-system
  3. ipython utils
  4. memory debug/profiling utils

Contributing and Testing

Please see CONTRIBUTING.md.

Caveats

Google Colab

As of this writing, Colab runs a really old version of ipython (5.5.0), which doesn't support the modern ipython events API.

To solve this problem automatically, so you never have to think about it again, always add this cell as the very first one in each colab notebook:

# This magic cell should be put first in your colab notebook.
# It'll automatically upgrade colab's really antique ipython/ipykernel to their
# latest versions which are required for packages like ipyexperiments
from packaging import version
import IPython, ipykernel
if version.parse(IPython.__version__) <= version.parse("5.5.0"):
    !pip install -q --upgrade ipython
    !pip install -q --upgrade ipykernel

    import os
    import signal
    os.kill(os.getpid(), signal.SIGTERM)
print(f"ipykernel=={ipykernel.__version__}")
print(f"IPython=={IPython.__version__}")

If you're on the default old ipykernel/ipython, this cell will update them and then crash the current session. After the crash, restart the execution and the code will work normally.

History

A detailed history of changes can be found here.

Related Projects

(If you know of a related pytorch gpu memory profiler please send a PR to add the link. Thank you!)

Comments
  • Not using the same GPU as pytorch because pytorch device id doesn't match nvidia-smi id without setting environment variable. What is a good way to select gpu_id for experiments?

    Hi there. I think IPyExperiments is pretty cool, but I'm finding that IPyExperiment is not grabbing the same GPU as pytorch.

    For more context, I have two GPUs but I only use one since I'm doing some neural style transfer via optimization and don't have multiple batches so it doesn't look like nn.DataParallel would help me.

    Originally: I have a 1060 and a 1080ti, and pytorch grabs my 1080ti and calls it gpu id 0. However, nvidia-smi calls my 1080ti gpu 1. See attached screenshots.


    AlbanD's answer and SO explained why the ID mismatch: (CUDA order by default is sorting by computing power rather than bus-id which is what nvidia-smi shows) https://discuss.pytorch.org/t/gpu-devices-nvidia-smi-and-cuda-get-device-name-output-appear-inconsistent/13150 https://stackoverflow.com/questions/52815708/order-of-cuda-devices

    After setting the bus-id environment variable, torch.cuda.current_device() now shows 0 and matches the nvidia-smi ordering.

    Currently, line 34 of ipyexperiments.py always returns GPU id 0 from a lambda function, and that global appears to be used by the pytorch backend, so I'm assuming that's how the device id is obtained. Perhaps it should be changed to accept a torch.device and use torch.device.index, e.g.:

        device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
        with IPyExperiments(device): ...

    Or maybe something like:

        with IPyExperiments(gpu_id=1): ...

    What do you think?

    opened by jmhsi 7
  • Suggestion: Enable in script

    Hi @stas00 :

    This works wonderfully in Jupyter/Notebook environment.

    Can we enable this (take snapshot of variables at T1, log changes in variables, and cleanup/finish on demand at T2) in scripts as well?

    This will be useful in long running scripts with user interactions.

    Currently, the code fails (by design, not a bug) as ipython.kernel is null when the code runs in a script.

    Would be happy to test if something like this gets enabled.

    opened by rahulraj80 5
  • fix: possibly wrong variable name of ipython_tb_clear_frames()

    https://github.com/stas00/ipyexperiments/blob/0.1.26/ipyexperiments/utils/ipython.py#L47

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not IS_IN_IPYTHON:
                return func(*args, **kwargs)
    
            try:
                return func(*args, **kwargs)
            except:
                type, val, tb = sys.exc_info()
                traceback.clear_frames(exc_tb)  # here
    
    opened by tianjianjiang 3
  • Fails on Colab with Error in callback <bound method CellLogger.post_run_cell of <ipyexperiments.cell_logger.CellLogger object >>

    Unable to even load it once. Minimal example to replicate the issue:

    [IN:1]

    !pip install -q ipyexperiments
    import ipyexperiments as ipye
    print(ipye.__version__)
    

    [OUT:1]

    0.1.17
    

    [IN:2]

    exp1 = ipye.IPyExperimentsPytorch()
    

    [OUT:2]

    *** Experiment started with the Pytorch backend
    Device: ID 0, Tesla K80 (11441 RAM)
    
    
    *** Current state:
    RAM:    Used    Free   Total       Util
    CPU:   1,681  11,122  13,021 MB  12.92% 
    GPU:     312  11,128  11,441 MB   2.73% 
    
    
    Error in callback <bound method CellLogger.post_run_cell of <ipyexperiments.cell_logger.CellLogger object at 0x7fcd473018d0>> (for post_run_cell):
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    TypeError: post_run_cell() missing 1 required positional argument: 'result'
    

    Similar failure with ipye.IPyExperimentsCPU()

    Have a look here: https://colab.research.google.com/gist/rahulraj80/0612b885c6fdf79b828fd38bfeb0e1f9/untitled1.ipynb

    opened by rahulraj80 2
  • stuck on lock?

    after almost a whole day and probably some hibernations...

    I created an experiment and right after that tried to delete it...

    I ended up hitting the stop button at the top of the ipynb once.

    opened by tyoc213 1
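The environment-variable workaround described in the first comment above can be sketched as follows (the device index is specific to that reporter's two-GPU setup):

```python
import os

# Must happen before the first CUDA initialization, i.e. before
# importing torch in the notebook.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # enumerate GPUs like nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "1"        # e.g. expose only the 1080 Ti

# import torch
# torch.cuda.current_device()  # now id 0 maps to nvidia-smi's GPU 1
```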
Owner
Stas Bekman
Solving Natural Language Processing/Machine Learning problems one problem at a time.