nvitop, an interactive NVIDIA-GPU process viewer, the one-stop solution for GPU process management

Overview

nvitop


nvitop, an interactive NVIDIA-GPU process viewer, the one-stop solution for GPU process management. (screenshots)

Screenshot Monitor


This project is inspired by nvidia-htop and nvtop for monitoring, and gpustat for application integration.

nvidia-htop is a tool for enriching the output of nvidia-smi. It uses regular expressions to read the output of nvidia-smi from a subprocess, which is inefficient. Meanwhile, there is a powerful interactive GPU monitoring tool called nvtop, but nvtop is written in C, which makes it less portable, and, more inconveniently, it has to be compiled during installation. Therefore, I made this repo. I got a lot of help from reading the source code of ranger, the console file manager. Some files in this repo are copied and modified from ranger under the GPLv3 License.

So far, nvitop is in the beta phase, and most features have been tested on Linux. If you are using Windows with NVIDIA-GPUs, please submit feedback on the issue page, thank you very much!

If this repo is useful to you, please star ⭐️ it to let more people know 🤗.

Comparison with nvidia-smi:

Screenshot Comparison

Features

  • Informative and fancy output: show more information than nvidia-smi with colorized fancy box drawing.
  • Monitor mode: can run as a resource monitor, rather than printing the results only once. (vs. nvidia-htop, which has only limited support via watch -c)
  • Interactive: responsive for user input in monitor mode. (vs. gpustat & py3nvml)
  • Efficient:
    • query device status using NVML Python bindings directly, instead of parsing the output of nvidia-smi (vs. nvidia-htop; see the sketch after this list).
    • cache results with ttl_cache from cachetools. (vs. gpustat)
    • display information using the curses library rather than print with ANSI escape codes. (vs. py3nvml)
    • asynchronously gather information using multithreading and respond to user input much faster. (vs. nvtop)
  • Portable: work on both Linux and Windows.
    • get host process information using the cross-platform library psutil instead of calling ps -p in a subprocess. (vs. nvidia-htop & py3nvml)
    • written in pure Python, easy to install with pip. (vs. nvtop)
  • Integrable: easy to integrate into other applications, more than monitoring. (vs. nvidia-htop & nvtop)
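
For reference, "querying via NVML Python bindings" in the first Efficient point means calling the library directly instead of spawning and parsing nvidia-smi. A minimal sketch using the pynvml module shipped in nvidia-ml-py (error handling omitted; nvitop wraps such calls with caching and N/A handling):

import pynvml  # provided by the nvidia-ml-py package

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    memory = pynvml.nvmlDeviceGetMemoryInfo(handle)              # .total / .used / .free in bytes
    utilization = pynvml.nvmlDeviceGetUtilizationRates(handle)   # .gpu / .memory in percent
    print('used={} gpu-util={}%'.format(memory.used, utilization.gpu))
finally:
    pynvml.nvmlShutdown()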

Requirements

  • Python 3.5+
  • NVIDIA Management Library (NVML)
  • nvidia-ml-py
  • psutil
  • cachetools
  • curses
  • termcolor

NOTE: The NVIDIA Management Library (NVML) is a C-based programmatic interface for monitoring and managing various states within NVIDIA GPUs. The runtime version of the NVML library ships with the NVIDIA display driver (available at Download Drivers | NVIDIA), or it can be downloaded as part of the NVIDIA CUDA Toolkit (available at CUDA Toolkit | NVIDIA Developer). The lists of OS platforms and NVIDIA GPUs supported by the NVML library can be found in the NVML API Reference.

Installation

Install from PyPI:

pip3 install --upgrade nvitop

Install the latest version from GitHub:

pip3 install git+https://github.com/XuehaiPan/nvitop.git#egg=nvitop

Or, clone this repo and install manually:

git clone --depth=1 https://github.com/XuehaiPan/nvitop.git
cd nvitop
pip3 install .

IMPORTANT: pip will install nvidia-ml-py==11.450.51 as a dependency for nvitop. Please verify whether the nvidia-ml-py package is compatible with your NVIDIA driver version. Since nvidia-ml-py >= 11.450.129, the definition of nvmlProcessInfo_t has introduced two new fields, gpuInstanceId and computeInstanceId (GI ID and CI ID in newer nvidia-smi), which are incompatible with some old NVIDIA drivers. nvitop may not display the processes correctly due to this incompatibility. You can check the release history of nvidia-ml-py at nvidia-ml-py's Release History and install a compatible version manually.

Usage

Device and Process Status

Query the device and process status. The output is similar to nvidia-smi, but has been enriched and colorized.

# Query status of all devices
$ nvitop

# Specify query devices
$ nvitop -o 0 1  # only show GPU 0 and GPU 1

# Only show devices in `CUDA_VISIBLE_DEVICES`
$ nvitop -ov

NOTE: nvitop uses only one character to indicate the type of processes: C stands for compute processes, G for graphics processes, and X for processes with both contexts (i.e. mi(x)ed; shown as C+G in nvidia-smi).

Resource Monitor

Run as a resource monitor:

# Automatically configure the display mode according to the terminal size
$ nvitop -m

# Always display in `full` mode
$ nvitop -m full

# Always display in `compact` mode
$ nvitop -m compact

# Specify query devices
$ nvitop -m -o 0 1  # only show GPU 0 and GPU 1

# Only show devices in `CUDA_VISIBLE_DEVICES`
$ nvitop -m -ov

Press q to return to the terminal.

For Docker Users

Build and run the Docker image using nvidia-docker:

docker build --tag nvitop:latest .
docker run --interactive --tty --rm --runtime=nvidia --gpus all --pid=host nvitop:latest -m

NOTE: Don't forget to add --pid=host option when running the container.

For SSH Users

Run nvitop directly on the SSH session instead of a login shell:

ssh user@host -t nvitop -m                 # installed by `sudo pip3 install ...`
ssh user@host -t '~/.local/bin/nvitop' -m  # installed by `pip3 install --user ...`

NOTE: Users need to add the -t option to allocate a pseudo-terminal over the SSH session for monitor mode.

Type nvitop --help for more information:

usage: nvitop [--help] [--version] [--monitor [{auto,full,compact}]]
              [--only idx [idx ...]] [--only-visible]
              [--gpu-util-thresh th1 th2] [--mem-util-thresh th1 th2]
              [--ascii]

An interactive NVIDIA-GPU process viewer.

optional arguments:
  --help, -h            show this help message and exit
  --version             show program's version number and exit
  --monitor [{auto,full,compact}], -m [{auto,full,compact}]
                        Run as a resource monitor. Continuously report query data,
                        rather than the default of just once.
                        If no argument is given, the default mode `auto` is used.
  --only idx [idx ...], -o idx [idx ...]
                        Only show the specified devices, suppress option `--only-visible`.
  --only-visible, -ov   Only show devices in environment variable `CUDA_VISIBLE_DEVICES`.
  --gpu-util-thresh th1 th2
                        Thresholds of GPU utilization to distinguish load intensity.
                        Coloring rules: light < th1 % <= moderate < th2 % <= heavy.
                        ( 1 <= th1 < th2 <= 99, defaults: 10 75 )
  --mem-util-thresh th1 th2
                        Thresholds of GPU memory utilization to distinguish load intensity.
                        Coloring rules: light < th1 % <= moderate < th2 % <= heavy.
                        ( 1 <= th1 < th2 <= 99, defaults: 10 80 )
  --ascii               Use ASCII characters only, which is useful for terminals without Unicode support.

Keybindings for Monitor Mode

Key                                Binding
q                                  Quit and return to the terminal.
h                                  Go to the help screen.
a / f / c                          Change the display mode to auto / full / compact.
<Left> / <Right> / [ / ]           Scroll the host information of processes.
<C-a> / ^                          Scroll left to the beginning of the process entry (i.e. beginning of line).
<C-e> / $                          Scroll right to the end of the process entry (i.e. end of line).
<Up> / <Down> / <Tab> / <S-Tab>    Select and highlight a process.
<Home>                             Select the first process.
<End>                              Select the last process.
<Esc>                              Clear process selection.
I                                  Send signal.SIGINT to the selected process (interrupt).
T                                  Send signal.SIGTERM to the selected process (terminate).
K                                  Send signal.SIGKILL to the selected process (kill).
, / .                              Select the sort column.
/                                  Reverse the sort order.
on (oN)                            Sort processes in the natural order, i.e., in ascending (descending) order of GPU index.
ou (oU)                            Sort processes by USER in ascending (descending) order.
op (oP)                            Sort processes by PID in descending (ascending) order.
og (oG)                            Sort processes by GPU-MEM in descending (ascending) order.
os (oS)                            Sort processes by %SM in descending (ascending) order.
oc (oC)                            Sort processes by %CPU in descending (ascending) order.
om (oM)                            Sort processes by %MEM in descending (ascending) order.
ot (oT)                            Sort processes by TIME in descending (ascending) order.

NOTE: Press the CTRL key to multiply the mouse wheel events by 5.

More than Monitoring

nvitop can be easily integrated into other applications.

Device

In [1]: from nvitop import host, Device, HostProcess, GpuProcess, NA

In [2]: Device.driver_version()
Out[2]: '430.64'

In [3]: Device.cuda_version()
Out[3]: '10.1'

In [4]: Device.count()
Out[4]: 10

In [5]: all_devices = Device.all()
   ...: all_devices
Out[5]: [
    Device(index=0, name="GeForce RTX 2080 Ti", total_memory=11019MiB),
    Device(index=1, name="GeForce RTX 2080 Ti", total_memory=11019MiB),
    Device(index=2, name="GeForce RTX 2080 Ti", total_memory=11019MiB),
    Device(index=3, name="GeForce RTX 2080 Ti", total_memory=11019MiB),
    Device(index=4, name="GeForce RTX 2080 Ti", total_memory=11019MiB),
    Device(index=5, name="GeForce RTX 2080 Ti", total_memory=11019MiB),
    Device(index=6, name="GeForce RTX 2080 Ti", total_memory=11019MiB),
    Device(index=7, name="GeForce RTX 2080 Ti", total_memory=11019MiB),
    Device(index=8, name="GeForce RTX 2080 Ti", total_memory=11019MiB),
    Device(index=9, name="GeForce RTX 2080 Ti", total_memory=11019MiB)
]

In [6]: nvidia0 = Device(0)  # from device index
   ...: nvidia0
Out[6]: Device(index=0, name="GeForce RTX 2080 Ti", total_memory=11019MiB)

In [7]: nvidia0.memory_used()  # in bytes
Out[7]: 9293398016

In [8]: nvidia0.memory_used_human()
Out[8]: '8862MiB'

In [9]: nvidia0.gpu_utilization()  # in percentage
Out[9]: 5

In [10]: nvidia0.processes()
Out[10]: {
    52059: GpuProcess(pid=52059, gpu_memory=7885MiB, type=C, device=Device(index=0, name="GeForce RTX 2080 Ti", total_memory=11019MiB), host=HostProcess(pid=52059, name='ipython3', status='sleeping', started='14:31:22')),
    53002: GpuProcess(pid=53002, gpu_memory=967MiB, type=C, device=Device(index=0, name="GeForce RTX 2080 Ti", total_memory=11019MiB), host=HostProcess(pid=53002, name='python', status='running', started='14:31:59'))
}

In [11]: nvidia1 = Device(bus_id='00000000:05:00.0')  # from PCI bus ID
    ...: nvidia1
Out[11]: Device(index=1, name="GeForce RTX 2080 Ti", total_memory=11019MiB)

In [12]: nvidia1_snapshot = nvidia1.as_snapshot()
    ...: nvidia1_snapshot
Out[12]: DeviceSnapshot(
    real=Device(index=1, name="GeForce RTX 2080 Ti", total_memory=11019MiB),
    bus_id='00000000:05:00.0',
    compute_mode='Default',
    display_active='Off',
    ecc_errors='N/A',
    fan_speed=22,                       # in percentage
    fan_speed_string='22%',             # in percentage
    gpu_utilization=17,                 # in percentage
    gpu_utilization_string='17%',       # in percentage
    index=1,
    memory_free=10462232576,            # in bytes
    memory_free_human='9977MiB',
    memory_total=11554717696,           # in bytes
    memory_total_human='11019MiB',
    memory_usage='1041MiB / 11019MiB',
    memory_used=1092485120,             # in bytes
    memory_used_human='1041MiB',
    memory_utilization=9.5,             # in percentage
    memory_utilization_string='9.5%',   # in percentage
    name='GeForce RTX 2080 Ti',
    performance_state='P2',
    persistence_mode='Off',
    power_limit=250000,                 # in milliwatts (mW)
    power_status='66W / 250W',          # in watts (W)
    power_usage=66051,                  # in milliwatts (mW)
    temperature=39,                     # in Celsius
    temperature_string='39C'            # in Celsius
)

In [13]: nvidia1_snapshot.memory_utilization_string  # snapshot uses properties instead of function calls
Out[13]: '9.5%'

In [14]: nvidia1_snapshot.encoder_utilization  # the snapshot automatically retrieves missing attributes from `real`
Out[14]: [0, 1000000]

In [15]: nvidia1_snapshot
Out[15]: DeviceSnapshot(
    real=Device(index=1, name="GeForce RTX 2080 Ti", total_memory=11019MiB),
    bus_id='00000000:05:00.0',
    compute_mode='Default',
    display_active='Off',
    ecc_errors='N/A',
    encoder_utilization=[0, 1000000],   ##### <-- new entry #####
    fan_speed=22,                       # in percentage
    fan_speed_string='22%',             # in percentage
    gpu_utilization=17,                 # in percentage
    gpu_utilization_string='17%',       # in percentage
    index=1,
    memory_free=10462232576,            # in bytes
    memory_free_human='9977MiB',
    memory_total=11554717696,           # in bytes
    memory_total_human='11019MiB',
    memory_usage='1041MiB / 11019MiB',
    memory_used=1092485120,             # in bytes
    memory_used_human='1041MiB',
    memory_utilization=9.5,             # in percentage
    memory_utilization_string='9.5%',   # in percentage
    name='GeForce RTX 2080 Ti',
    performance_state='P2',
    persistence_mode='Off',
    power_limit=250000,                 # in milliwatts (mW)
    power_status='66W / 250W',          # in watts (W)
    power_usage=66051,                  # in milliwatts (mW)
    temperature=39,                     # in Celsius
    temperature_string='39C'            # in Celsius
)

NOTE: The entry values may be 'N/A' (type: NaType) when the corresponding resources are not applicable. You can add `if entry != 'N/A'` checks to avoid exceptions. It is also safe to call float(entry) on numeric entries: 'N/A' will be converted to math.nan. For example:

memory_used: Union[int, NaType] = device.memory_used()            # memory usage in bytes or `N/A`
memory_used_in_mib: float       = float(memory_used) / (1 << 20)  # memory usage in Mebibytes (MiB) or `math.nan`
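
Building on the note above, a small guard sketch (nvidia0 is the Device created in the examples above; the 75% threshold is arbitrary):

import math

memory_used = nvidia0.memory_used()               # int in bytes, or 'N/A' (NaType)
if memory_used != 'N/A':
    print('Used memory: {:.0f}MiB'.format(memory_used / (1 << 20)))

utilization = float(nvidia0.gpu_utilization())    # float, or math.nan if the value was 'N/A'
if not math.isnan(utilization) and utilization > 75:
    print('GPU is under heavy load')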

Process

In [16]: processes = nvidia1.processes()  # type: Dict[int, GpuProcess]
    ...: processes
Out[16]: {
    23266: GpuProcess(pid=23266, gpu_memory=1031MiB, type=C, device=Device(index=1, name="GeForce RTX 2080 Ti", total_memory=11019MiB), host=HostProcess(pid=23266, name='python3', status='running', started='2021-05-10 21:02:40'))
}

In [17]: process = processes[23266]
    ...: process
Out[17]: GpuProcess(pid=23266, gpu_memory=1031MiB, type=C, device=Device(index=1, name="GeForce RTX 2080 Ti", total_memory=11019MiB), host=HostProcess(pid=23266, name='python3', status='running', started='2021-05-10 21:02:40'))

In [18]: process.status()
Out[18]: 'running'

In [19]: process.cmdline()  # type: List[str]
Out[19]: ['python3', 'rllib_train.py']

In [20]: process.command()  # type: str
Out[20]: 'python3 rllib_train.py'

In [21]: process.cwd()
Out[21]: '/home/xxxxxx/Projects/xxxxxx'

In [22]: process.gpu_memory_human()
Out[22]: '1031MiB'

In [23]: process.as_snapshot()
Out[23]: GpuProcessSnapshot(
    real=GpuProcess(pid=23266, gpu_memory=1031MiB, type=C, device=Device(index=1, name="GeForce RTX 2080 Ti", total_memory=11019MiB), host=HostProcess(pid=23266, name='python3', status='running', started='2021-05-10 21:02:40')),
    cmdline=['python3', 'rllib_train.py'],
    command='python3 rllib_train.py',
    cpu_percent=98.5,                      # in percentage
    cpu_percent_string='98.5%',            # in percentage
    device=Device(index=1, name="GeForce RTX 2080 Ti", total_memory=11019MiB),
    gpu_encoder_utilization=0,             # in percentage
    gpu_encoder_utilization_string='0%',   # in percentage
    gpu_decoder_utilization=0,             # in percentage
    gpu_decoder_utilization_string='0%',   # in percentage
    gpu_memory=1081081856,                 # in bytes
    gpu_memory_human='1031MiB',
    gpu_memory_utilization=9.4,            # in percentage
    gpu_memory_utilization_string='9.4%',  # in percentage
    gpu_sm_utilization=0,                  # in percentage
    gpu_sm_utilization_string='0%',        # in percentage
    identity=(23266, 1620651760.15, 1),
    is_running=True,
    memory_percent=1.6849018430285683,     # in percentage
    memory_percent_string='1.7%',          # in percentage
    name='python3',
    pid=23266,
    running_time=datetime.timedelta(days=1, seconds=80013, microseconds=470024),
    running_time_human='46:13:33',
    type='C',                             # 'C' for Compute / 'G' for Graphics / 'C+G' for Both
    username='panxuehai'
)

In [24]: process.kill()

In [25]: list(map(Device.processes, all_devices))  # all processes
Out[25]: [
    {
        52059: GpuProcess(pid=52059, gpu_memory=7885MiB, type=C, device=Device(index=0, name="GeForce RTX 2080 Ti", total_memory=11019MiB), host=HostProcess(pid=52059, name='ipython3', status='sleeping', started='14:31:22')),
        53002: GpuProcess(pid=53002, gpu_memory=967MiB, type=C, device=Device(index=0, name="GeForce RTX 2080 Ti", total_memory=11019MiB), host=HostProcess(pid=53002, name='python', status='running', started='14:31:59'))
    },
    {},
    {},
    {},
    {},
    {},
    {},
    {},
    {
        84748: GpuProcess(pid=84748, gpu_memory=8975MiB, type=C, device=Device(index=8, name="GeForce RTX 2080 Ti", total_memory=11019MiB), host=HostProcess(pid=84748, name='python', status='running', started='11:13:38'))
    },
    {
        84748: GpuProcess(pid=84748, gpu_memory=8341MiB, type=C, device=Device(index=9, name="GeForce RTX 2080 Ti", total_memory=11019MiB), host=HostProcess(pid=84748, name='python', status='running', started='11:13:38'))
    }
]

In [26]: import os
    ...: this = HostProcess(os.getpid())
    ...: this
Out[26]: HostProcess(pid=35783, name='python', status='running', started='19:19:00')

In [27]: this.cmdline()  # type: List[str]
Out[27]: ['python', '-c', 'import IPython; IPython.terminal.ipapp.launch_new_instance()']

In [27]: this.command()  # not simply `' '.join(cmdline)` but quotes are added
Out[27]: 'python -c "import IPython; IPython.terminal.ipapp.launch_new_instance()"'

In [28]: this.memory_info()
Out[28]: pmem(rss=83988480, vms=343543808, shared=12079104, text=8192, lib=0, data=297435136, dirty=0)

In [29]: import cupy as cp
    ...: x = cp.zeros((10000, 1000))
    ...: this = GpuProcess(os.getpid(), nvidia0)  # construct from `GpuProcess(pid, device)` explicitly rather than calling `device.processes()`
    ...: this
Out[29]: GpuProcess(pid=35783, gpu_memory=N/A, type=N/A, device=Device(index=0, name="GeForce RTX 2080 Ti", total_memory=11019MiB), host=HostProcess(pid=35783, name='python', status='running', started='19:19:00'))

In [30]: this.update_gpu_status()  # update used GPU memory from new driver queries
Out[30]: 267386880

In [31]: this
Out[31]: GpuProcess(pid=35783, gpu_memory=255MiB, type=C, device=Device(index=0, name="GeForce RTX 2080 Ti", total_memory=11019MiB), host=HostProcess(pid=35783, name='python', status='running', started='19:19:00'))

In [32]: id(this) == id(GpuProcess(os.getpid(), nvidia0))  # IMPORTANT: the instance will be reused while the process is running
Out[32]: True

Host (inherited from psutil)

In [33]: host.cpu_count()
Out[33]: 88

In [34]: host.cpu_percent()
Out[34]: 18.5

In [35]: host.cpu_times()
Out[35]: scputimes(user=2346377.62, nice=53321.44, system=579177.52, idle=10323719.85, iowait=28750.22, irq=0.0, softirq=11566.87, steal=0.0, guest=0.0, guest_nice=0.0)

In [36]: host.load_average()
Out[36]: (14.88, 17.8, 19.91)

In [37]: host.virtual_memory()
Out[37]: svmem(total=270352478208, available=192275968000, percent=28.9, used=53350518784, free=88924037120, active=125081112576, inactive=44803993600, buffers=37006450688, cached=91071471616, shared=23820632064, slab=8200687616)

In [38]: host.swap_memory()
Out[38]: sswap(total=65534947328, used=475136, free=65534472192, percent=0.0, sin=2404139008, sout=4259434496)
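
Putting the pieces together, a minimal periodic-logging sketch built only on the calls shown above (the CSV layout, file name, interval, and sample count are illustrative and not part of nvitop):

import csv
import time

from nvitop import Device, host

devices = Device.all()

with open('gpu-metrics.csv', 'w', newline='') as csv_file:  # hypothetical output file
    writer = csv.writer(csv_file)
    writer.writerow(['timestamp', 'device', 'gpu_utilization', 'memory_used', 'host_cpu_percent'])
    for _ in range(60):                       # sample once per second for one minute
        cpu_percent = host.cpu_percent()
        for index, device in enumerate(devices):
            writer.writerow([
                time.time(),
                index,
                device.gpu_utilization(),     # percentage, or 'N/A'
                device.memory_used(),         # bytes, or 'N/A'
                cpu_percent,
            ])
        time.sleep(1.0)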

Screenshots

Screen Recording

Example output of nvitop:

Screenshot

Example output of nvitop -m:

Full Compact
Full Compact

License

nvitop is released under the GNU General Public License, version 3 (GPLv3).

NOTE: Please feel free to use nvitop as a package or dependency for your own projects. However, if you want to add or modify some features of nvitop, or copy some source code of nvitop into your own code, the source code should also be released under the GPLv3 License (as nvitop contains some modified source code from ranger under the GPLv3 License).

Comments
  • MIG device support

    MIG device support

    Issue Type

    • Improvement/feature implementation

    Runtime Environment

    • Operating system and version: Ubuntu 20.04 LTS
    • Terminal emulator and version: GNOME Terminal 3.36.2
    • Python version: 3.5+
    • NVML version (driver version): 430.64 / 460.84 / 470.82.00
    • nvitop version or commit: WIP
    • nvidia-ml-py version: 11.450.51 / 11.450.129 / 11.495.46
    • Locale: C / C.UTF-8 / en_US.UTF-8

    Description

    Add MIG device support to nvitop.

    • core/device: Add class MigDevice and update CUDA_VISIBLE_DEVICES handling for MIG devices.
    • gui: Update nvitop's UI for MIG enabled setup.

    Motivation and Context

    Add MIG device support to nvitop. Resolves #5.

    Testing

    Help wanted, see https://github.com/XuehaiPan/nvitop/pull/8#issuecomment-1155241507.

    enhancement core cli / gui 
    opened by XuehaiPan 16
  • [Bug] gpu memory-usage not show right in driver 510 version

    [Bug] gpu memory-usage not show right in driver 510 version

    Runtime Environment

    • Operating system and version: Ubuntu 20.04 LTS
    • Terminal emulator and version: GNOME Terminal 3.36.2
    • Python version: 3.8.10
    • NVML version (driver version): 510.47.03
    • nvitop version or commit: 0.5.3
    • nvidia-ml-py version: 11.450.51
    • Locale: zh_CN.UTF-8

    Current Behavior

    After upgrading the NVIDIA driver to the latest version 510.47.03, the GPU memory usage is not shown correctly on my workstation, both for the 1080 Ti and the A100. It shows more memory usage than the actual value, which does not match the nvidia-smi command.

    nvitop

    image

    nvidia-smi

    image

    It seems the nvtop command also makes mistakes.

    nvtop

    image

    Expected Behavior

    The GPU memory usage should match nvidia-smi.

    bug upstream pynvml 
    opened by jue-jue-zi 14
  • [Feature Request] MIG device support (e.g. A100 GPUs)

    [Feature Request] MIG device support (e.g. A100 GPUs)

    Hello!

    Firstly, thanks for creating and maintaining such an excellent library.

    Runtime Environment

    • Operating system and version: Ubuntu 20.04 LTS
    • Terminal emulator and version: GNOME Terminal 3.36.2
    • Python version: 3.7
    • NVML version (driver version): 450.0
    • nvitop version or commit: main@b669fa3
    • python-ml-py version: 11.450.51
    • Locale: en_US.UTF-8

    Current Behavior

    When running nvitop on a MIG-enabled A100 GPU, nvitop fails to detect the running GPU processes and their GPU memory consumption, which can otherwise be viewed by running the nvidia-smi command.

    Expected Behavior

    The A100 MiG GPU should be visible in the GUI.

    Context

    So far we can only view CPU usage metrics, which are really handy but it would also be nice to have GPU usage as designed.

    Possible Solutions

    I think that the MiG naming convention is different from regular naming conventions, and looks something like this: MIG 7g.80gb Device 0: rather than just Device 0: as is currently set-up in the nvitop repo.

    Steps to reproduce

    • Run A100 in Mig mode
    • start nvitop watch -n 0.5 nvitop
    enhancement core cli / gui 
    opened by ki-arie 12
  • [Feature Request] Collect metrics in a fixed interval for the lifespan of a training job

    [Feature Request] Collect metrics in a fixed interval for the lifespan of a training job

    Hi @XuehaiPan,

    In your examples of collecting metrics using ResourceMetricCollector inside a training loop, collector.collect() takes a snapshot at each epoch/batch iteration, which misses the entire period between the previous and the current iteration. If a loop takes 5 minutes, we get metrics at a 5-minute interval.

    I wonder if there is a way to run a background process that collects the metrics at a certain interval, say 5 seconds, for the lifespan of a training job?

    Then, if the entire job takes 1 hour, the 5-second interval would yield 720 snapshots.

    Thanks

    enhancement core 
    opened by classicboyir 8
  • [Enhancement] Skip error gpus and show normal infos automatically

    [Enhancement] Skip error gpus and show normal infos automatically

    Runtime Environment

    • Operating system and version: Ubuntu 20.04 LTS
    • Terminal emulator and version: SSH
    • Python version: 3.8.10
    • NVML version (driver version): 515.65.01
    • nvitop version or commit: 0.10.0
    • nvidia-ml-py version: 11.515.75

    Current Behavior

    There are four GPUs on our server, and one of them overheated for some reason, so that GPU can no longer be recognized. If the nvidia-smi command is run without any arguments to query all the GPUs, the error Unable to determine the device handle for GPU 0000:0C:00.0: Unknown Error is shown and no info is displayed for the remaining healthy GPUs. But if the command is restricted to the healthy GPUs (nvidia-smi -i 0,1,3), all of their info is shown directly.

    image image

    And if I use the nvitop command to show the GPUs' info, nvidia-ml-py throws exceptions like those below,

    image image

    Expected Behavior

    I hope that the nvitop command can automatically skip the GPUs with errors and show the healthy GPUs' info. If possible, the failing GPUs could be listed as notes below the normal output, in red for emphasis.

    bug enhancement 
    opened by jue-jue-zi 6
  • [Bug] display issue when running inside tmux

    [Bug] display issue when running inside tmux

    Runtime Environment

    • Operating system and version: Ubuntu 16.04 LTS
    • Terminal emulator and version: iTerm2 3.4.15
    • Python version: 3.9.7
    • NVML version (driver version): 470.57
    • nvitop version: main@latest
    • Locale: en_US.UTF-8

    Current Behavior

    When running nvitop inside tmux, the rendered display will become messed up, as shown in the screenshots. This behavior is not present when not using tmux.

    Steps to Reproduce

    1. open a tmux session
    2. run nvitop

    Images / Videos

    scrnsht

    opened by Cveinnt 6
  • [Feature Request] torch_geometric support

    [Feature Request] torch_geometric support

    First of all, thank you for the excellent nvitop.

    I want to know if you have plans to add an integration with PyTorch Geometric (pyg)? It is a really great library for GNNs. I don't know if it's helpful at all, but it also has some profiling functions in the torch_geometric.profile module. Since PyTorch Lightning doesn't give you granular control over your models (sometimes required in research), I haven't seen anyone use it. On the flip side, PyTorch Geometric is probably the most popular library for GNNs.

    Hope you consider this!

    enhancement core 
    opened by plutonium-239 5
  • NVML ERROR: RM has detected an NVML/RM version mismatch.

    NVML ERROR: RM has detected an NVML/RM version mismatch.

    I installed the nvitop via pip3 as described and it worked fine.

    Then I installed nvcc via:

    sudo apt install nvidia-cuda-toolkit

    Then nvitop stopped working with the error:

    NVML ERROR: RM has detected an NVML/RM version mismatch.

    How to make both work?

    opened by bounlu 4
  • feat(core/libnvml): add compatibility layers for NVML Python bindings

    feat(core/libnvml): add compatibility layers for NVML Python bindings

    Issue Type

    • Improvement/feature implementation

    Runtime Environment

    • Operating system and version: Ubuntu 20.04 LTS
    • Terminal emulator and version: GNOME Terminal 3.36.2
    • Python version: 3.9.13
    • NVML version (driver version): 470.129.06
    • nvitop version or commit: v0.7.1
    • python-ml-py version: 11.450.51
    • Locale: en_US.UTF-8

    Description

    Automatically patch the pynvml module when the first call to a versioned API fails. Now we support a much broader range of versions of the PyPI package nvidia-ml-py.
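
    The fallback idea, as a rough sketch (illustrative only; not the actual implementation in nvitop's core/libnvml module):

    import pynvml

    def get_compute_running_processes(handle):
        # Try the newest versioned API first, then fall back to older ones.
        for name in ('nvmlDeviceGetComputeRunningProcesses_v3',
                     'nvmlDeviceGetComputeRunningProcesses_v2',
                     'nvmlDeviceGetComputeRunningProcesses'):
            func = getattr(pynvml, name, None)
            if func is None:
                continue  # this version of the bindings does not define the symbol
            try:
                return func(handle)
            except pynvml.NVMLError_FunctionNotFound:
                continue  # the installed driver does not implement this version
        raise pynvml.NVMLError_FunctionNotFound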

    Motivation and Context

    See #29 for more details.

    Resolves #29 Closes #13

    Testing

    Using nvidia-ml-py == 11.515.48 with the NVIDIA R430 driver (CUDA 10.x):

    $ pip3 install --ignore-installed .
    Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
    Processing /home/panxuehai/Projects/nvitop
      Installing build dependencies ... done
      Getting requirements to build wheel ... done
      Installing backend dependencies ... done
      Preparing metadata (pyproject.toml) ... done
    Collecting psutil>=5.6.6
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/62/1f/f14225bda76417ab9bd808ff21d5cd59d5435a9796ca09b34d4cb0edcd88/psutil-5.9.1-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (281 kB)
    Collecting cachetools>=1.0.1
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/68/aa/5fc646cae6e997c3adf3b0a7e257cda75cff21fcba15354dffd67789b7bb/cachetools-5.2.0-py3-none-any.whl (9.3 kB)
    Collecting nvidia-ml-py<11.516.0a0,>=11.450.51
      Using cached https://pypi.tuna.tsinghua.edu.cn/packages/7c/b6/738d9c68f8abcdedf8901c4abf00df74e8f281626de67b5185dcc443e693/nvidia_ml_py-11.515.48-py3-none-any.whl (28 kB)
    Collecting termcolor>=1.0.0
      Using cached termcolor-1.1.0-py3-none-any.whl
    Building wheels for collected packages: nvitop
      Building wheel for nvitop (pyproject.toml) ... done
      Created wheel for nvitop: filename=nvitop-0.7.1+6.g0feed99-py3-none-any.whl size=154871 sha256=da07a27d8579e1cc38a3bd3d537f0d885d592df0c3293ba585b831fa236f100e
      Stored in directory: /tmp/pip-ephem-wheel-cache-3qzopv_e/wheels/9a/17/84/86d7a108dc1c0d7a25e96628d476e19df73a27353725b35779
    Successfully built nvitop
    Installing collected packages: termcolor, nvidia-ml-py, psutil, cachetools, nvitop
    Successfully installed cachetools-5.2.0 nvidia-ml-py-11.515.48 nvitop-0.7.1+6.g84f43f5 psutil-5.9.1 termcolor-1.1.0
    

    Result:

    The v3 API nvmlDeviceGetComputeRunningProcesses_v3 falls back to the v2 API nvmlDeviceGetComputeRunningProcesses_v2 (which could not be found either), and then falls back to the v1 API nvmlDeviceGetComputeRunningProcesses.

    $ LOGLEVEL=DEBUG ./nvitop.py -1
    Patching NVML function pointer `nvmlDeviceGetComputeRunningProcesses_v3`
        Map NVML function `nvmlDeviceGetComputeRunningProcesses_v3` to `nvmlDeviceGetComputeRunningProcesses_v2`
        Map NVML function `nvmlDeviceGetGraphicsRunningProcesses_v3` to `nvmlDeviceGetGraphicsRunningProcesses_v2`
        Map NVML function `nvmlDeviceGetMPSComputeRunningProcesses_v3` to `nvmlDeviceGetMPSComputeRunningProcesses_v2`
        Patch NVML struct `c_nvmlProcessInfo_t` to `c_nvmlProcessInfo_v2_t`
    Patching NVML function pointer `nvmlDeviceGetComputeRunningProcesses_v2`
        Map NVML function `nvmlDeviceGetComputeRunningProcesses_v2` to `nvmlDeviceGetComputeRunningProcesses`
        Map NVML function `nvmlDeviceGetGraphicsRunningProcesses_v2` to `nvmlDeviceGetGraphicsRunningProcesses`
        Map NVML function `nvmlDeviceGetMPSComputeRunningProcesses_v2` to `nvmlDeviceGetMPSComputeRunningProcesses`
        Map NVML function `nvmlDeviceGetComputeRunningProcesses_v3` to `nvmlDeviceGetComputeRunningProcesses`
        Map NVML function `nvmlDeviceGetGraphicsRunningProcesses_v3` to `nvmlDeviceGetGraphicsRunningProcesses`
        Map NVML function `nvmlDeviceGetMPSComputeRunningProcesses_v3` to `nvmlDeviceGetMPSComputeRunningProcesses`
        Patch NVML struct `c_nvmlProcessInfo_t` to `c_nvmlProcessInfo_v1_t`
    Sun Jul 24 19:32:24 2022
    ╒═════════════════════════════════════════════════════════════════════════════╕
    │ NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     │
    ├───────────────────────────────┬──────────────────────┬──────────────────────┤
    │ GPU  Name        Persistence-M│ Bus-Id        Disp.A │ Volatile Uncorr. ECC │
    │ Fan  Temp  Perf  Pwr:Usage/Cap│         Memory-Usage │ GPU-Util  Compute M. │
    ╞═══════════════════════════════╪══════════════════════╪══════════════════════╪══════════════════════════════════════════════════════════════════════════════════╕
    │   0  TITAN Xp            Off  │ 00000000:05:00.0 Off │                  N/A │ MEM: ▏ 0.2%                                                                        │
    │ 24%   43C    P8    19W / 250W │     19MiB / 12194MiB │      0%      Default │ UTL: ▏ 0%                                                                          │
    ├───────────────────────────────┼──────────────────────┼──────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
    │   1  TITAN Xp            Off  │ 00000000:06:00.0 Off │                  N/A │ MEM: ▏ 0.0%                                                                        │
    │ 23%   36C    P8    10W / 250W │      2MiB / 12196MiB │      0%      Default │ UTL: ▏ 0%                                                                          │
    ├───────────────────────────────┼──────────────────────┼──────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
    │   2  ..orce GTX TITAN X  Off  │ 00000000:09:00.0 Off │                  N/A │ MEM: ▏ 0.0%                                                                        │
    │ 22%   34C    P8    17W / 250W │      2MiB / 12213MiB │      0%      Default │ UTL: ▏ 0%                                                                          │
    ╘═══════════════════════════════╧══════════════════════╧══════════════════════╧════════════════════════════════════════════════════════════════════════════════════╛
    [ CPU: █████▉ 5.3%                                                                                                          ]  ( Load Average:  0.89  0.61  0.39 )
    [ MEM: ███▋ 3.2%                                                                                                            ]  [ SWP: ▏ 0.0%                     ]

    ╒═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
    │ Processes:                                                                                                                                    panxuehai@ubuntu │
    │ GPU     PID      USER  GPU-MEM %SM  %CPU  %MEM      TIME  COMMAND                                                                                              │
    ╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
    │   0    2122 G    root    17MiB   0   0.0   0.0  4.2 days  /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch │
    ╘═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╛
    
    enhancement pynvml core 
    opened by XuehaiPan 4
  • nvidia-ml-py version conflicts with other packages (e.g., gpustat)

    nvidia-ml-py version conflicts with other packages (e.g., gpustat)

    Context: https://github.com/wookayin/gpustat/pull/107 trying to use nvidia-ml-py, Related issues: #4

    Hello @XuehaiPan,

    I just realized that nvitop requires nvidia-ml-py to be pinned at 11.450.51 due to the incompatible API, as discussed in wookayin/gpustat#107. My solution (in gpustat) to this bothersome library is to use pynvml greater than 11.450.129, but this would create some nuisance problems for normal users who may have both nvitop and gpustat>=1.0 installed.

    From nvitop's README:

    IMPORTANT: pip will install nvidia-ml-py==11.450.51 as a dependency for nvitop. Please verify whether the nvidia-ml-py package is compatible with your NVIDIA driver version. You can check the release history of nvidia-ml-py at nvidia-ml-py's Release History, and install the compatible version manually by:

    Since nvidia-ml-py>=11.450.129, the definition of nvmlProcessInfo_t has introduced two new fields gpuInstanceId and computeInstanceId (GI ID and CI ID in newer nvidia-smi) which are incompatible with some old NVIDIA drivers. nvitop may not display the processes correctly due to this incompatibility.

    Is having pynvml version NOT pinned at the specific version an option for you? More specifically, nvmlDeviceGetComputeRunningProcesses_v2 exists since 11.450.129+. In my opinion, pinning nvidia-ml-py at too old and too specific version isn't a great idea, although I also admit that the solution I accepted isn't ideal at all.

    We could discuss and coordinate together to avoid any package conflict issues, because in the current situation gpustat and nvitop would not be compatible with each other due to the nvidia-ml-py version.

    enhancement pynvml 
    opened by wookayin 4
  • [Question] Can nvitop keep a log/record of GPU-Utilization and store in a CSV?

    [Question] Can nvitop keep a log/record of GPU-Utilization and store in a CSV?

    I'm trying to record GPU utilization, the users, and what programs are running. Is there a way to log and save this information, like into a CSV file or database?

    Sorry if I missed something from the readme

    enhancement question core 
    opened by FelixMildon 4
  • [BUG] Cannot gather information of the `/XWayland` process in WSLg

    [BUG] Cannot gather information of the `/XWayland` process in WSLg

    Required prerequisites

    • [X] I have read the documentation https://nvitop.readthedocs.io.
    • [X] I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)
    • [X] I have tried the latest version of nvitop in a new isolated virtual environment.

    What version of nvitop are you using?

    0.11.0

    Operating system and version

    Windows 10 build 10.0.19045.0

    NVIDIA driver version

    526.98

    NVIDIA-SMI

    Sat Dec 10 20:36:09 2022
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 525.60.02    Driver Version: 526.98       CUDA Version: 12.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  On   | 00000000:09:00.0  On |                  N/A |
    |  0%   56C    P3    34W / 240W |   2880MiB /  8192MiB |      7%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A        23      G   /Xwayland                       N/A      |
    +-----------------------------------------------------------------------------+
    

    Python environment

    $ python3 -m pip freeze | python3 -c 'import sys; print(sys.version, sys.platform); print("".join(filter(lambda s: any(word in s.lower() for word in ("nvi", "cuda", "nvml", "gpu")), sys.stdin)))'
    3.10.8 (main, Oct 11 2022, 11:35:05) [GCC 11.2.0] linux
    nvidia-ml-py==11.515.75
    nvitop==0.11.0
    

    Problem description

    The XWayland process in WSLg uses the NVIDIA GPU in the WSL instance. However, WSL does not expose the process in the /proc directory, so psutil fails to gather process information by reading the files under /proc/23.

    Steps to Reproduce

    Command lines:

    $ wsl.exe --shutdown
    $ wsl.exe --update
    $ wsl.exe
    user@WSL $ nvitop
    

    Traceback

    No response

    Logs

    $ nvitop -1
    Sat Dec 10 12:35:33 2022
    ╒═════════════════════════════════════════════════════════════════════════════╕
    │ NVITOP 0.11.0        Driver Version: 526.98       CUDA Driver Version: 12.0 │
    ├───────────────────────────────┬──────────────────────┬──────────────────────┤
    │ GPU  Name        Persistence-M│ Bus-Id        Disp.A │ Volatile Uncorr. ECC │
    │ Fan  Temp  Perf  Pwr:Usage/Cap│         Memory-Usage │ GPU-Util  Compute M. │
    ╞═══════════════════════════════╪══════════════════════╪══════════════════════╪════════════════════╕
    │   0  GeForce RTX 3070    On   │ 00000000:09:00.0  On │                  N/A │ MEM: ███▍ 34.7%    │
    │  0%   55C    P3    30W / 240W │    2844MiB / 8192MiB │     49%      Default │ UTL: ████▍ 49%     │
    ╘═══════════════════════════════╧══════════════════════╧══════════════════════╧════════════════════╛
    [ CPU: █▌ 3.1%                                                ]  ( Load Average:  0.08  0.02  0.01 )
    [ MEM: ██▎ 4.5%                                               ]  [ SWP: ▏ 0.0%                     ]

    ╒══════════════════════════════════════════════════════════════════════════════════════════════════╕
    │ Processes:                                                       PanXuehai@BIGAI-PanXuehai (WSL) │
    │ GPU     PID      USER  GPU-MEM %SM  %CPU  %MEM  TIME  COMMAND                                    │
    ╞══════════════════════════════════════════════════════════════════════════════════════════════════╡
    │   0      23 G     N/A WDDM:N/A N/A   N/A   N/A   N/A  No Such Process                            │
    ╘══════════════════════════════════════════════════════════════════════════════════════════════════╛
    

    Expected behavior

    Show the process information rather than N/A and No Such Process.

    Additional context

    I have raised an issue in microsoft/wslg#919.

    • microsoft/wslg#919
    bug upstream core cli / gui 
    opened by XuehaiPan 0
Releases(v0.11.0)
  • v0.11.0(Dec 4, 2022)

  • v0.10.2(Nov 18, 2022)

  • v0.10.1(Oct 22, 2022)

    • Add warning messages for corrupted dependencies (Fixes #44).
    • Handle "NVML Unknown Error" when failing to get the device handles (Fixes #45).
  • v0.10.0(Oct 17, 2022)

    The last beta version of nvitop. We are waiting on several months of compatibility checks against the NVIDIA driver and the nvidia-ml-py package. The v1.0 stable release will come soon if everything goes well. Feedback is welcome.
