An open source AutoML toolkit that automates the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

Overview


NNI Doc | Simplified Chinese

NNI (Neural Network Intelligence) is a lightweight but powerful toolkit to help users automate Feature Engineering, Neural Architecture Search, Hyperparameter Tuning and Model Compression.

The tool manages automated machine learning (AutoML) experiments, dispatching and running the trial jobs generated by tuning algorithms to search for the best neural architecture and/or hyper-parameters in different training environments, such as Local Machine, Remote Servers, OpenPAI, Kubeflow, FrameworkController on K8s (AKS etc.), DLWorkspace (aka DLTS), AML (Azure Machine Learning), AdaptDL (aka ADL), other cloud options, and even Hybrid mode.
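The tuner–trial loop that these experiments automate can be illustrated with a minimal, self-contained sketch (pure Python, no NNI required; the search space and the scoring function are made up for illustration):

```python
import random

# Illustrative search space: each hyperparameter maps to candidate values.
SEARCH_SPACE = {
    "lr": [0.1, 0.01, 0.001],
    "batch_size": [16, 32, 64],
}

def generate_parameters():
    """What a tuning algorithm does: propose one configuration."""
    return {name: random.choice(values) for name, values in SEARCH_SPACE.items()}

def run_trial(params):
    """Stand-in for a dispatched trial job; returns a toy metric."""
    return 1.0 / (1.0 + params["lr"]) + params["batch_size"] / 1000.0

# The experiment loop: dispatch trials, collect metrics, keep the best.
best = max(run_trial(generate_parameters()) for _ in range(20))
print(f"best metric over 20 trials: {best:.4f}")
```

In a real NNI trial the two functions above are replaced by the training script and the tuning algorithm, and NNI handles the dispatching across the environments listed above.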

Who should consider using NNI

  • Those who want to try different AutoML algorithms in their training code/model.
  • Those who want to run AutoML trial jobs in different environments to speed up search.
  • Researchers and data scientists who want to easily implement and experiment with new AutoML algorithms, be it a hyperparameter tuning algorithm, a neural architecture search algorithm, or a model compression algorithm.
  • ML Platform owners who want to support AutoML in their platform.

What's NEW!  

NNI capabilities at a glance

NNI provides a CommandLine Tool as well as a user-friendly WebUI to manage training experiments. With the extensible API, you can customize your own AutoML algorithms and training services. To make it easy for new users, NNI also provides a set of built-in state-of-the-art AutoML algorithms and out-of-the-box support for popular training platforms.
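Customizing an AutoML algorithm essentially means implementing two callbacks: propose parameters for a new trial, and consume the result of a finished one. The sketch below mirrors the shape of that interface in pure Python (it does not import NNI; the class and search space are illustrative):

```python
import random

class RandomSearchTuner:
    """Toy tuner mirroring the two-callback shape of a tuner interface."""

    def __init__(self, search_space):
        self.search_space = search_space
        self.history = []  # (parameter_id, parameters, result) tuples

    def generate_parameters(self, parameter_id):
        # Called by the framework when a new trial is about to start.
        return {name: random.choice(values)
                for name, values in self.search_space.items()}

    def receive_trial_result(self, parameter_id, parameters, value):
        # Called by the framework when a trial reports its final metric.
        self.history.append((parameter_id, parameters, value))

tuner = RandomSearchTuner({"lr": [0.1, 0.01], "momentum": [0.8, 0.9]})
params = tuner.generate_parameters(parameter_id=0)
tuner.receive_trial_result(0, params, value=0.93)
print(len(tuner.history))  # 1 completed trial recorded
```

A smarter tuner (e.g. TPE or an evolutionary strategy) differs only in how `generate_parameters` uses `history`.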

The following table summarizes the current NNI capabilities; we are gradually adding new ones and would love to have your contribution.

Frameworks & Libraries (built-in)
  • Supported Frameworks
    • PyTorch
    • Keras
    • TensorFlow
    • MXNet
    • Caffe2
    • More...
  • Supported Libraries
    • Scikit-learn
    • XGBoost
    • LightGBM
    • More...

Algorithms (built-in)
  • Hyperparameter Tuning
  • Neural Architecture Search (Retiarii)
  • Model Compression
  • Feature Engineering (Beta)
  • Early Stop Algorithms

Training Services
  • Local Machine
  • Remote Servers
  • OpenPAI
  • Kubeflow
  • FrameworkController on K8s
  • DLWorkspace (DLTS)
  • AML (Azure Machine Learning)
  • AdaptDL (ADL)
  • Hybrid mode

Installation

Install

NNI supports and is tested on Ubuntu >= 16.04, macOS >= 10.14.1, and Windows 10 >= 1809. Simply run the following pip install command in an environment that has 64-bit Python >= 3.6.

Linux or macOS

python3 -m pip install --upgrade nni

Windows

python -m pip install --upgrade nni

If you want to try the latest code, please install NNI from source.

For detailed system requirements of NNI, please refer to here for Linux & macOS, and here for Windows.

Note:

  • If there is any privilege issue, add --user to install NNI in the user directory.
  • Currently, NNI on Windows supports local, remote and pai modes. Anaconda or Miniconda is highly recommended for installing NNI on Windows.
  • If there is any error like Segmentation fault, please refer to FAQ. For FAQ on Windows, please refer to NNI on Windows.

Verify installation

  • Download the examples by cloning the source code.

    git clone -b v2.5 https://github.com/Microsoft/nni.git
  • Run the MNIST example.

    Linux or macOS

    nnictl create --config nni/examples/trials/mnist-pytorch/config.yml

    Windows

    nnictl create --config nni\examples\trials\mnist-pytorch\config_windows.yml
  • Wait for the message INFO: Successfully started experiment! in the command line. This message indicates that your experiment has been started successfully. You can explore the experiment using the Web UI URL.

INFO: Starting restful server...
INFO: Successfully started Restful server!
INFO: Setting local config...
INFO: Successfully set local config!
INFO: Starting experiment...
INFO: Successfully started experiment!
-----------------------------------------------------------------------
The experiment id is egchD4qy
The Web UI urls are: http://223.255.255.1:8080   http://127.0.0.1:8080
-----------------------------------------------------------------------

You can use these commands to get more information about the experiment
-----------------------------------------------------------------------
         commands                       description
1. nnictl experiment show        show the information of experiments
2. nnictl trial ls               list all of trial jobs
3. nnictl top                    monitor the status of running experiments
4. nnictl log stderr             show stderr log content
5. nnictl log stdout             show stdout log content
6. nnictl stop                   stop an experiment
7. nnictl trial kill             kill a trial job by id
8. nnictl --help                 get help information about nnictl
-----------------------------------------------------------------------
  • Open the Web UI URL in your browser to view detailed information about the experiment and all the submitted trial jobs, as shown below. Here are more Web UI pages.

webui

Releases and Contributing

NNI has a monthly release cycle (major releases). Please let us know if you encounter a bug by filing an issue.

We appreciate all contributions. If you are planning to contribute bug-fixes, please do so without further discussion.

If you plan to contribute new features, new tuners, new training services, etc., please first open an issue or reuse an existing issue, and discuss the feature with us. We will respond on the issue in a timely manner, or set up conference calls if needed.

To learn more about making a contribution to NNI, please refer to our How-to contribution page.

We appreciate all contributions and thank all the contributors!

Feedback

Join IM discussion groups:

Gitter WeChat

Test status

Essentials

Type Status
Fast test Build Status
Full linux Build Status
Full windows Build Status

Training services

Type Status
Remote - linux to linux Build Status
Remote - linux to windows Build Status
Remote - windows to linux Build Status
OpenPAI Build Status
Frameworkcontroller Build Status
Kubeflow Build Status
Hybrid Build Status
AzureML Build Status

Related Projects

Targeting openness and advancing state-of-the-art technology, Microsoft Research (MSR) has also released a few other open source projects.

  • OpenPAI : an open source platform that provides complete AI model training and resource management capabilities; it is easy to extend and supports on-premise, cloud, and hybrid environments at various scales.
  • FrameworkController : an open source general-purpose Kubernetes Pod Controller that orchestrates all kinds of applications on Kubernetes with a single controller.
  • MMdnn : A comprehensive, cross-framework solution to convert, visualize and diagnose deep neural network models. The "MM" in MMdnn stands for model management and "dnn" is an acronym for deep neural network.
  • SPTAG : Space Partition Tree And Graph (SPTAG) is an open source library for large-scale approximate nearest neighbor search over vectors.
  • nn-Meter : An accurate inference latency predictor for DNN models on diverse edge devices.

We encourage researchers and students to leverage these projects to accelerate AI development and research.

License

The entire codebase is under the MIT license.

Comments
  • The software breaks down at example code after running for 10s normally.


    Describe the issue: I am trying your PyTorch example from https://nni.readthedocs.io/zh/stable/tutorials/hpo_quickstart_pytorch/model.html. Everything goes well: installation succeeds and the web page looks nice, but after about 10 seconds the program crashes.


    [2023-01-02 01:56:47] Creating experiment, Experiment ID: 6j50nacv
    [2023-01-02 01:56:47] Starting web server...
    [2023-01-02 01:56:48] Setting up...
    [2023-01-02 01:56:48] Web portal URLs: http://127.0.0.1:8080 http://172.18.36.113:8080
    node:internal/fs/watchers:252
        throw error;
        ^
    
    Error: ENOSPC: System limit for number of file watchers reached, watch '/home/linux_username/nni-experiments/6j50nacv/trials/gnav8/.nni/metrics'
        at FSWatcher.<computed> (node:internal/fs/watchers:244:19)
        at Object.watch (node:fs:2251:34)
        at TailStream.waitForMoreData (/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni_node/node_modules/tail-stream/index.js:123:31)
        at TailStream.<anonymous> (/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni_node/node_modules/tail-stream/index.js:275:22)
        at FSReqCallback.wrapper [as oncomplete] (node:fs:660:5) {
      errno: -28,
      syscall: 'watch',
      code: 'ENOSPC',
      path: '/home/linux_username/nni-experiments/6j50nacv/trials/gnav8/.nni/metrics',
      filename: '/home/linux_username/nni-experiments/6j50nacv/trials/gnav8/.nni/metrics'
    }
    Thrown at:
        at __node_internal_captureLargerStackTrace (node:internal/errors:464:5)
        at __node_internal_uvException (node:internal/errors:521:10)
        at FSWatcher.<computed> (node:internal/fs/watchers:244:19)
        at watch (node:fs:2251:34)
        at TailStream.waitForMoreData (/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni_node/node_modules/tail-stream/index.js:123:31)
        at /home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni_node/node_modules/tail-stream/index.js:275:22
        at wrapper (node:fs:660:5)
    Traceback (most recent call last):
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/__main__.py", line 85, in <module>
        main()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/__main__.py", line 61, in main
        dispatcher.run()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/runtime/msg_dispatcher_base.py", line 69, in run
        command, data = self._channel._receive()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/runtime/tuner_command_channel/channel.py", line 94, in _receive
        command = self._retry_receive()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/runtime/tuner_command_channel/channel.py", line 104, in _retry_receive
        self._channel.connect()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/runtime/tuner_command_channel/websocket.py", line 62, in connect
        self._ws = _wait(_connect_async(self._url))
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/runtime/tuner_command_channel/websocket.py", line 111, in _wait
        return future.result()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/concurrent/futures/_base.py", line 446, in result
        return self.__get_result()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
        raise self._exception
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/runtime/tuner_command_channel/websocket.py", line 125, in _connect_async
        return await websockets.connect(url, max_size=None)  # type: ignore
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/websockets/legacy/client.py", line 659, in __await_impl_timeout__
        return await asyncio.wait_for(self.__await_impl__(), self.open_timeout)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/asyncio/tasks.py", line 479, in wait_for
        return fut.result()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/websockets/legacy/client.py", line 663, in __await_impl__
        _transport, _protocol = await self._create_connection()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/asyncio/base_events.py", line 1065, in create_connection
        raise exceptions[0]
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/asyncio/base_events.py", line 1050, in create_connection
        sock = await self._connect_sock(
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/asyncio/base_events.py", line 961, in _connect_sock
        await self.sock_connect(sock, address)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/asyncio/selector_events.py", line 500, in sock_connect
        return await fut
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/asyncio/selector_events.py", line 535, in _sock_connect_cb
        raise OSError(err, f'Connect call failed {address}')
    ConnectionRefusedError: [Errno 111] Connect call failed ('127.0.0.1', 8080)
    Traceback (most recent call last):
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
        conn = connection.create_connection(
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/util/connection.py", line 95, in create_connection
        raise err
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
        sock.connect(sa)
    ConnectionRefusedError: [Errno 111] Connection refused
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
        httplib_response = self._make_request(
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 398, in _make_request
        conn.request(method, url, **httplib_request_kw)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connection.py", line 239, in request
        super(HTTPConnection, self).request(method, url, body=body, headers=headers)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/http/client.py", line 1285, in request
        self._send_request(method, url, body, headers, encode_chunked)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/http/client.py", line 1331, in _send_request
        self.endheaders(body, encode_chunked=encode_chunked)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/http/client.py", line 1280, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/http/client.py", line 1040, in _send_output
        self.send(msg)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/http/client.py", line 980, in send
        self.connect()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connection.py", line 205, in connect
        conn = self._new_conn()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connection.py", line 186, in _new_conn
        raise NewConnectionError(
    urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f9d7c6042e0>: Failed to establish a new connection: [Errno 111] Connection refused
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/requests/adapters.py", line 489, in send
        resp = conn.urlopen(
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 787, in urlopen
        retries = retries.increment(
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /api/v1/nni/check-status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9d7c6042e0>: Failed to establish a new connection: [Errno 111] Connection refused'))
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/home/linux_username/yecanming/repo/new_things_exploring/tunning_parameter/nni/nni_main.py", line 23, in <module>
        experiment.run(8080)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/experiment/experiment.py", line 183, in run
        self._wait_completion()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/experiment/experiment.py", line 163, in _wait_completion
        status = self.get_status()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/experiment/experiment.py", line 283, in get_status
        resp = rest.get(self.port, '/check-status', self.url_prefix)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/experiment/rest.py", line 43, in get
        return request('get', port, api, prefix=prefix)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/experiment/rest.py", line 31, in request
        resp = requests.request(method, url, timeout=timeout)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/requests/api.py", line 59, in request
        return session.request(method=method, url=url, **kwargs)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/requests/sessions.py", line 587, in request
        resp = self.send(prep, **send_kwargs)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/requests/sessions.py", line 701, in send
        r = adapter.send(request, **kwargs)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/requests/adapters.py", line 565, in send
        raise ConnectionError(e, request=request)
    requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /api/v1/nni/check-status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9d7c6042e0>: Failed to establish a new connection: [Errno 111] Connection refused'))
    [2023-01-02 01:57:08] Stopping experiment, please wait...
    [2023-01-02 01:57:08] ERROR: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /api/v1/nni/experiment (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9d929a5c40>: Failed to establish a new connection: [Errno 111] Connection refused'))
    Traceback (most recent call last):
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
        conn = connection.create_connection(
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/util/connection.py", line 95, in create_connection
        raise err
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
        sock.connect(sa)
    ConnectionRefusedError: [Errno 111] Connection refused
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
        httplib_response = self._make_request(
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 398, in _make_request
        conn.request(method, url, **httplib_request_kw)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connection.py", line 239, in request
        super(HTTPConnection, self).request(method, url, body=body, headers=headers)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/http/client.py", line 1285, in request
        self._send_request(method, url, body, headers, encode_chunked)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/http/client.py", line 1331, in _send_request
        self.endheaders(body, encode_chunked=encode_chunked)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/http/client.py", line 1280, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/http/client.py", line 1040, in _send_output
        self.send(msg)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/http/client.py", line 980, in send
        self.connect()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connection.py", line 205, in connect
        conn = self._new_conn()
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connection.py", line 186, in _new_conn
        raise NewConnectionError(
    urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f9d929a5c40>: Failed to establish a new connection: [Errno 111] Connection refused
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/requests/adapters.py", line 489, in send
        resp = conn.urlopen(
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 787, in urlopen
        retries = retries.increment(
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /api/v1/nni/experiment (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9d929a5c40>: Failed to establish a new connection: [Errno 111] Connection refused'))
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/experiment/experiment.py", line 143, in _stop_impl
        rest.delete(self.port, '/experiment', self.url_prefix)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/experiment/rest.py", line 52, in delete
        request('delete', port, api, prefix=prefix)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/nni/experiment/rest.py", line 31, in request
        resp = requests.request(method, url, timeout=timeout)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/requests/api.py", line 59, in request
        return session.request(method=method, url=url, **kwargs)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/requests/sessions.py", line 587, in request
        resp = self.send(prep, **send_kwargs)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/requests/sessions.py", line 701, in send
        r = adapter.send(request, **kwargs)
      File "/home/linux_username/anaconda3/envs/torch/lib/python3.9/site-packages/requests/adapters.py", line 565, in send
        raise ConnectionError(e, request=request)
    requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /api/v1/nni/experiment (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9d929a5c40>: Failed to establish a new connection: [Errno 111] Connection refused'))
    [2023-01-02 01:57:08] WARNING: Cannot gracefully stop experiment, killing NNI process...
    [2023-01-02 01:57:08] Experiment stopped
    


    Environment:

    • NNI version: 2.10
    • Training service (local|remote|pai|aml|etc): local
    • Client OS: Linux qaz 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
    • Server OS (for remote mode only):
    • Python version: Python 3.9.13
    • PyTorch/TensorFlow version: '1.13.0+cu116'
    • Is conda/virtualenv/venv used?: conda is used
    • Is running in Docker?: no
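The ENOSPC: System limit for number of file watchers reached error in the log above is a Linux inotify limit rather than an NNI bug: NNI's node process tails each trial's metrics file with a filesystem watcher. One common workaround, assuming a sysctl-managed Linux host (524288 is just a typical value, not an NNI requirement), is to raise the limit:

```shell
# Check the current inotify watch limit (often 8192 by default).
cat /proc/sys/fs/inotify/max_user_watches

# Raise it for the running system (requires root privileges).
sudo sysctl fs.inotify.max_user_watches=524288

# Persist the setting across reboots.
echo 'fs.inotify.max_user_watches=524288' | sudo tee -a /etc/sysctl.conf
```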

    Configuration:

    • Experiment config (remember to remove secrets!):
    experiment.config.trial_command = 'python run_nn.py'
    experiment.config.trial_code_directory = '.'
    
    experiment.config.search_space = search_space
    
    experiment.config.tuner.name = 'TPE'
    experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
    
    experiment.config.max_trial_number = 10
    experiment.config.trial_concurrency = 2
    
    experiment.config.max_experiment_duration = '1h'
    
    • Search space:

    Log message:

    • nnimanager.log: cannot find your tutorial at "https://github.com/microsoft/nni/blob/master/docs/en_US/Tutorial/HowToDebug.md#experiment-root-director"
    • dispatcher.log:
    • nnictl stdout and stderr:

    How to reproduce it?:

    opened by 2catycm 0
  • Prune Problem


    Describe the issue: The code runs normally before adding the prune code (L2filterPruner), but there is an error after adding it. My project code is from: https://github.com/minar09/cp-vton-plus. Its error: (screenshot) Error part: (screenshot) My code: (screenshot)

    Environment:

    • NNI version:2.10
    • Training service (local|remote|pai|aml|etc): local
    • Client OS:
    • Server OS (for remote mode only):
    • Python version:3.10
    • PyTorch/TensorFlow version:1.12.1
    • Is conda/virtualenv/venv used?: yes
    • Is running in Docker?:

    Configuration:

    • Experiment config (remember to remove secrets!):
    • Search space:

    Log message:

    • nnimanager.log:
    • dispatcher.log:
    • nnictl stdout and stderr:

    How to reproduce it?:

    opened by ShuYangXie 0
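For context on the issue above, the ranking idea behind an L2-norm filter pruner can be sketched without NNI or PyTorch: score each convolution filter by the L2 norm of its weights and mask out the smallest ones. This is an illustrative numpy sketch (the shapes and the `sparsity` value are made up), not NNI's actual implementation:

```python
import numpy as np

def l2_filter_mask(weight, sparsity):
    """weight: (out_channels, in_channels, kh, kw) conv kernel.
    Returns a 0/1 mask that zeroes the `sparsity` fraction of
    filters with the smallest L2 norm."""
    scores = np.sqrt((weight ** 2).sum(axis=(1, 2, 3)))  # one score per filter
    n_prune = int(sparsity * weight.shape[0])
    pruned = np.argsort(scores)[:n_prune]                # smallest-norm filters
    mask = np.ones(weight.shape[0], dtype=weight.dtype)
    mask[pruned] = 0.0
    return mask.reshape(-1, 1, 1, 1)                     # broadcastable over the kernel

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))            # toy conv layer with 8 filters
mask = l2_filter_mask(w, sparsity=0.3)
print(int(mask.size - mask.sum()))           # number of pruned filters
```

The masking step is only half the job; actually shrinking the network (as `ModelSpeedup` does) additionally requires propagating the removed channels through downstream layers.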
  • yolov5prune error


    Describe the issue: I'm trying to prune the pre-trained model yolov5n-0.5 from Yolov5-face. Here is the code I used:

    import torch, torchvision
    from nni.algorithms.compression.v2.pytorch.pruning import L1NormPruner, L2NormPruner,FPGMPruner,ActivationAPoZRankPruner
    from nni.compression.pytorch.speedup import ModelSpeedup
    from rich import print
    from utils.general import check_img_size
    from models.common import Conv
    from models.experimental import attempt_load
    from models.yolo import Detect
    from utils.activations import SiLU
    import torch.nn as nn
    from nni.compression.pytorch.utils.counter import count_flops_params
    
    class SiLU(nn.Module):  # export-friendly version of nn.SiLU()
        @staticmethod
        def forward(x):
            return x * torch.sigmoid(x)
    
    device = torch.device("cuda:1")
    model = attempt_load('/data03/hezhenhui/project/helmet/yolov5-6.0/runs/train/helmet6/weights/best.pt', map_location=device, inplace=True, fuse=True) # load FP32 model
    model.eval()
    
    for k, m in model.named_modules():
        if isinstance(m, Conv): # assign export-friendly activations
            if isinstance(m.act, nn.SiLU):
                m.act = SiLU()
        elif isinstance(m, Detect):
            m.inplace = False
        m.onnx_dynamic = False
        if hasattr(m, 'forward_export'):
            m.forward = m.forward_export # assign custom forward (optional)
    
    
    imgsz = (640, 640)
    imgsz *= 2 if len(imgsz) == 1 else 1 # expand
    
    gs = int(max(model.stride)) # grid size (max stride)
    imgsz = [check_img_size(x, gs) for x in imgsz] # verify img_size are gs-multiples
    im = torch.zeros(1, 3, *imgsz).to(device) # image size(1,3,320,192) BCHW iDetection
    dummy_input = im
    
    cfg_list = [{
    'sparsity': 0.3, 'op_types': ['Conv2d'],'op_names': [
        'model.0.conv',
        'model.1.conv',
        'model.2.cv1.conv',
        'model.2.cv2.conv',
        'model.2.cv3.conv',
        'model.2.m.0.cv1.conv',
        'model.2.m.0.cv2.conv',
        'model.2.m.1.cv1.conv',
        'model.2.m.1.cv2.conv',
        'model.2.m.2.cv1.conv',
        'model.2.m.2.cv2.conv',
        'model.2.m.3.cv1.conv',
        'model.2.m.3.cv2.conv',
        'model.3.conv',
        'model.4.cv1.conv',
        'model.4.cv2.conv',
        'model.4.cv3.conv',
        'model.4.m.0.cv1.conv',
        'model.4.m.0.cv2.conv',
        'model.4.m.1.cv1.conv',
        'model.4.m.1.cv2.conv',
        'model.4.m.2.cv1.conv',
        'model.4.m.2.cv2.conv',
        'model.4.m.3.cv1.conv',
        'model.4.m.3.cv2.conv',
        'model.4.m.4.cv1.conv',
        'model.4.m.4.cv2.conv',
        'model.4.m.5.cv1.conv',
        'model.4.m.5.cv2.conv',
        'model.4.m.6.cv1.conv',
        'model.4.m.6.cv2.conv',
        'model.4.m.7.cv1.conv',
        'model.4.m.7.cv2.conv',
        'model.5.conv',
        'model.6.cv1.conv',
        'model.6.cv2.conv',
        'model.6.cv3.conv',
        'model.6.m.0.cv1.conv',
        'model.6.m.0.cv2.conv',
        'model.6.m.1.cv1.conv',
        'model.6.m.1.cv2.conv',
        'model.6.m.2.cv1.conv',
        'model.6.m.2.cv2.conv',
        'model.6.m.3.cv1.conv',
        'model.6.m.3.cv2.conv',
        'model.6.m.4.cv1.conv',
        'model.6.m.4.cv2.conv',
        'model.6.m.5.cv1.conv',
        'model.6.m.5.cv2.conv',
        'model.6.m.6.cv1.conv',
        'model.6.m.6.cv2.conv',
        'model.6.m.7.cv1.conv',
        'model.6.m.7.cv2.conv',
        'model.6.m.8.cv1.conv',
        'model.6.m.8.cv2.conv',
        'model.6.m.9.cv1.conv',
        'model.6.m.9.cv2.conv',
        'model.6.m.10.cv1.conv',
        'model.6.m.10.cv2.conv',
        'model.6.m.11.cv1.conv',
        'model.6.m.11.cv2.conv',
        'model.7.conv',
        'model.8.cv1.conv',
        'model.8.cv2.conv',
        'model.8.cv3.conv',
        'model.8.m.0.cv1.conv',
        'model.8.m.0.cv2.conv',
        'model.8.m.1.cv1.conv',
        'model.8.m.1.cv2.conv',
        'model.8.m.2.cv1.conv',
        'model.8.m.2.cv2.conv',
        'model.8.m.3.cv1.conv',
        'model.8.m.3.cv2.conv',
        'model.9.cv1.conv',
        'model.9.cv2.conv',
        'model.10.conv',
        'model.13.cv1.conv',
        'model.13.cv2.conv',
        'model.13.cv3.conv',
        'model.13.m.0.cv1.conv',
        'model.13.m.0.cv2.conv',
        'model.13.m.1.cv1.conv',
        'model.13.m.1.cv2.conv',
        'model.13.m.2.cv1.conv',
        'model.13.m.2.cv2.conv',
        'model.13.m.3.cv1.conv',
        'model.13.m.3.cv2.conv',
        'model.14.conv',
        'model.17.cv1.conv',
        'model.17.cv2.conv',
        'model.17.cv3.conv',
        'model.17.m.0.cv1.conv',
        'model.17.m.0.cv2.conv',
        'model.17.m.1.cv1.conv',
        'model.17.m.1.cv2.conv',
        'model.17.m.2.cv1.conv',
        'model.17.m.2.cv2.conv',
        'model.17.m.3.cv1.conv',
        'model.17.m.3.cv2.conv',
        'model.18.conv',
        'model.20.cv1.conv',
        'model.20.cv2.conv',
        'model.20.cv3.conv',
        'model.20.m.0.cv1.conv',
        'model.20.m.0.cv2.conv',
        'model.20.m.1.cv1.conv',
        'model.20.m.1.cv2.conv',
        'model.20.m.2.cv1.conv',
        'model.20.m.2.cv2.conv',
        'model.20.m.3.cv1.conv',
        'model.20.m.3.cv2.conv',
        'model.21.conv',
        'model.23.cv1.conv',
        'model.23.cv2.conv',
        'model.23.cv3.conv',
        'model.23.m.0.cv1.conv',
        'model.23.m.0.cv2.conv',
        'model.23.m.1.cv1.conv',
        'model.23.m.1.cv2.conv',
        'model.23.m.2.cv1.conv',
        'model.23.m.2.cv2.conv',
        'model.23.m.3.cv1.conv',
        'model.23.m.3.cv2.conv'
        ]
    },
    {
        'op_names': ['model.24.m.0', 'model.24.m.1', 'model.24.m.2'],
        'exclude': True
    }
    ]
    
    
    pruner = L1NormPruner(model, cfg_list)
    _, masks = pruner.compress()  # generate pruning masks for the configured layers
    # print(masks)
    pruner.export_model(model_path='helmet_yolov5s.pt', mask_path='helmet_mask.pt')
    pruner.show_pruned_weights()
    pruner._unwrap_model()  # restore the original modules before speedup
    
    print("im.shape:", dummy_input.shape)
    

    But it always throws this error:

    ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
            Node:
                    %864 : Tensor = prim::Constant[value={2}](), scope: __module.model.24 # /data03/hezhenhui/project/helmet/yolov5-6.0/models/yolo.py:66:0
            Source Location:
                    /data03/hezhenhui/project/helmet/yolov5-6.0/models/yolo.py(66): forward
                    /data03/hezhenhui/.conda/envs/tdn/lib/python3.8/site-packages/torch/nn/modules/module.py(709): _slow_forward
                    /data03/hezhenhui/.conda/envs/tdn/lib/python3.8/site-packages/torch/nn/modules/module.py(725): _call_impl
                    /data03/hezhenhui/project/helmet/yolov5-6.0/models/yolo.py(149): _forward_once
                    /data03/hezhenhui/project/helmet/yolov5-6.0/models/yolo.py(126): forward
                    /data03/hezhenhui/.conda/envs/tdn/lib/python3.8/site-packages/torch/nn/modules/module.py(709): _slow_forward
                    /data03/hezhenhui/.conda/envs/tdn/lib/python3.8/site-packages/torch/nn/modules/module.py(725): _call_impl
                    /data03/hezhenhui/.conda/envs/tdn/lib/python3.8/site-packages/torch/jit/_trace.py(934): trace_module
                    /data03/hezhenhui/.conda/envs/tdn/lib/python3.8/site-packages/torch/jit/_trace.py(733): trace
                    /data03/hezhenhui/.conda/envs/tdn/lib/python3.8/site-packages/nni/common/graph_utils.py(91): _trace
                    /data03/hezhenhui/.conda/envs/tdn/lib/python3.8/site-packages/nni/common/graph_utils.py(67): __init__
                    /data03/hezhenhui/.conda/envs/tdn/lib/python3.8/site-packages/nni/common/graph_utils.py(265): __init__
                    /data03/hezhenhui/.conda/envs/tdn/lib/python3.8/site-packages/nni/common/graph_utils.py(25): build_module_graph
                    /data03/hezhenhui/.conda/envs/tdn/lib/python3.8/site-packages/nni/compression/pytorch/speedup/compressor.py(73): __init__
                    prune_nni.py(242): <module>
            Comparison exception:   expand(torch.cuda.FloatTensor{[1, 3, 40, 40, 2]}, size=[]): the number of sizes provided (0) must be greater or equal to the number of dimensions in the tensor (5)
    

    I can't find a solution to this problem. Can you give me some advice?

    Environment:

    • NNI version: 2.10
    • CentOS version: Linux version 3.10.0-957.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)) #1 SMP Thu Oct 4 20:48:51 UTC 2018
    • Python version: 3.8.13
    • PyTorch version: 1.7.1
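For context on what the pruner in the snippet above does: L1NormPruner ranks convolution filters by the L1 norm of their weights and masks out the smallest ones until the configured sparsity is reached. A minimal pure-Python sketch of that ranking (illustrative only, not NNI's actual implementation):

```python
def l1_prune_mask(filters, sparsity):
    """Drop the `sparsity` fraction of filters with the smallest L1 norm.

    `filters` is a list of weight lists; returns a 0/1 keep-mask.
    """
    norms = [sum(abs(w) for w in f) for f in filters]
    n_drop = int(len(filters) * sparsity)
    # indices of the n_drop filters with the smallest L1 norms
    drop = set(sorted(range(len(filters)), key=norms.__getitem__)[:n_drop])
    return [0 if i in drop else 1 for i in range(len(filters))]

# Four filters at 50% sparsity: filters 0 and 2 have the smallest norms.
print(l1_prune_mask([[0.1, -0.1], [1.0, 2.0], [0.05, 0.0], [3.0, -1.0]], 0.5))
# -> [0, 1, 0, 1]
```

The real pruner operates on `torch.nn.Conv2d` weight tensors and respects the `op_names`/`exclude` entries of the config list, but the ranking idea is the same.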
    opened by Turing77 0
  • NetWork Error

    NetWork Error

    Describe the issue: When I use NNI, within at most 10 minutes I get a Network Error reminder and then the port connection is disconnected. I want to know what the problem is. By the way, I am using Windows 10.

    Environment:

    • NNI version: 2.10
    • Training service (local|remote|pai|aml|etc): local
    • Client OS: Windows 10
    • Server OS (for remote mode only):
    • Python version: 3.7
    • PyTorch/TensorFlow version: torch==1.8.1
    • Is conda/virtualenv/venv used?: conda
    • Is running in Docker?: no

    Configuration:

    • Experiment config (remember to remove secrets!):
    • Search space: websockets.exceptions.InvalidMessage: did not receive a valid HTTP response

    Log message:

    • nnimanager.log:
    • dispatcher.log:

    [2022-12-26 10:35:12] INFO (nni.tuner.tpe/MainThread) Using random seed 175818551
    [2022-12-26 10:35:12] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher started
    [2022-12-26 10:41:26] WARNING (nni.runtime.tuner_command_channel.channel/MainThread) Exception on receiving: ConnectionClosedError(None, None, None)
    [2022-12-26 10:41:26] WARNING (nni.runtime.tuner_command_channel.channel/MainThread) Connection lost. Trying to reconnect...
    [2022-12-26 10:41:26] INFO (nni.runtime.tuner_command_channel.channel/MainThread) Attempt #0, wait 0 seconds...
    [2022-12-26 10:41:26] INFO (nni.runtime.msg_dispatcher_base/MainThread) Report error to NNI manager:
    Traceback (most recent call last):
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\websockets\legacy\client.py", line 138, in read_http_response
        status_code, reason, headers = await read_response(self.reader)
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\websockets\legacy\http.py", line 120, in read_response
        status_line = await read_line(stream)
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\websockets\legacy\http.py", line 194, in read_line
        line = await stream.readline()
      File "E:\Anaconda\install\envs\pytorch\lib\asyncio\streams.py", line 496, in readline
        line = await self.readuntil(sep)
      File "E:\Anaconda\install\envs\pytorch\lib\asyncio\streams.py", line 588, in readuntil
        await self._wait_for_data('readuntil')
      File "E:\Anaconda\install\envs\pytorch\lib\asyncio\streams.py", line 473, in _wait_for_data
        await self._waiter
      File "E:\Anaconda\install\envs\pytorch\lib\asyncio\selector_events.py", line 814, in _read_ready__data_received
        data = self._sock.recv(self.max_size)
    ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\nni\__main__.py", line 61, in main
        dispatcher.run()
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\nni\runtime\msg_dispatcher_base.py", line 69, in run
        command, data = self._channel._receive()
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\nni\runtime\tuner_command_channel\channel.py", line 94, in _receive
        command = self._retry_receive()
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\nni\runtime\tuner_command_channel\channel.py", line 104, in _retry_receive
        self._channel.connect()
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\nni\runtime\tuner_command_channel\websocket.py", line 62, in connect
        self._ws = _wait(_connect_async(self._url))
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\nni\runtime\tuner_command_channel\websocket.py", line 111, in _wait
        return future.result()
      File "E:\Anaconda\install\envs\pytorch\lib\concurrent\futures\_base.py", line 435, in result
        return self.__get_result()
      File "E:\Anaconda\install\envs\pytorch\lib\concurrent\futures\_base.py", line 384, in __get_result
        raise self._exception
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\nni\runtime\tuner_command_channel\websocket.py", line 125, in _connect_async
        return await websockets.connect(url, max_size=None)  # type: ignore
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\websockets\legacy\client.py", line 659, in __await_impl_timeout__
        return await asyncio.wait_for(self.__await_impl__(), self.open_timeout)
      File "E:\Anaconda\install\envs\pytorch\lib\asyncio\tasks.py", line 442, in wait_for
        return fut.result()
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\websockets\legacy\client.py", line 671, in __await_impl__
        extra_headers=protocol.extra_headers,
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\websockets\legacy\client.py", line 326, in handshake
        status_code, response_headers = await self.read_http_response()
      File "E:\Anaconda\install\envs\pytorch\lib\site-packages\websockets\legacy\client.py", line 144, in read_http_response
        raise InvalidMessage("did not receive a valid HTTP response") from exc
    websockets.exceptions.InvalidMessage: did not receive a valid HTTP response

    • nnictl stdout and stderr:

    How to reproduce it?:

    opened by accelerator1737 0
  • How to solve

    How to solve "UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed"

    Describe the issue:

    execute: CUDA_VISIBLE_DEVICES=0 python taylorfo_lightning_evaluator.py

    some warning:

    [2022-12-27 10:01:54] Update the indirect sparsity for the model.classifier.3
    /home/user/miniconda3/envs/prune/lib/python3.8/site-packages/nni/compression/pytorch/speedup/infer_mask.py:275: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:480.)
      if isinstance(self.output, torch.Tensor) and self.output.grad is not None:

    [2022-12-27 10:01:54] Update the indirect sparsity for the model.classifier.2
    /home/user/miniconda3/envs/prune/lib/python3.8/site-packages/nni/compression/pytorch/speedup/compressor.py:305: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:480.)
      if last_output.grad is not None and tin.grad is not None:
    /home/user/miniconda3/envs/prune/lib/python3.8/site-packages/nni/compression/pytorch/speedup/compressor.py:307: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:480.)
      elif last_output.grad is None:

    Environment: Ubuntu 20.04

    • NNI version: 2.10
    • Training service (local|remote|pai|aml|etc): local
    • Client OS: Ubuntu 20.04
    • Server OS (for remote mode only):
    • Python version: 3.8.15
    • PyTorch/TensorFlow version: PyTorch 1.13.1+cu116
    • Is conda/virtualenv/venv used?: conda
    • Is running in Docker?: no
    • pytorch-lightning: 1.8.6

    Configuration:

    • Experiment config (remember to remove secrets!):
    • Search space:

    Log message:

    • nnimanager.log:
    • dispatcher.log:
    • nnictl stdout and stderr:

    How to reproduce it?: execute: python taylorfo_lightning_evaluator.py

    opened by skylaugher 0
  • cannot fix the mask of the interdependent layers

    cannot fix the mask of the interdependent layers

    Describe the issue: When I pruned the segmentation model and saved the mask.pth, the mask could not fit the new architecture of the model during speedup.

    Environment:

    • NNI version: 2.0
    • Training service (local|remote|pai|aml|etc): local
    • Client OS: Ubuntu
    • Server OS (for remote mode only):
    • Python version: 3.7
    • PyTorch/TensorFlow version: PyTorch
    • Is conda/virtualenv/venv used?: yes
    • Is running in Docker?: no

    Configuration:

    • Experiment config (remember to remove secrets!):
    • Search space:

    Log message:

    • nnimanager.log:
    • dispatcher.log:
    • nnictl stdout and stderr: mask_conflict.py, line 195, in fix_mask_conflict: assert shape[0] % group == 0 (AssertionError)

    When I print shape[0] and group, I get: 32 32 32 32 16 1 96 96 96 96 24 320. The group == 320 is bigger than shape[0] == 24. How can I fix this problem?
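The assertion that fails here enforces the constraint behind grouped convolutions: the remaining output channels must divide evenly into the number of groups. A tiny sketch of that check (hypothetical helper, not NNI's code):

```python
def group_mask_is_valid(out_channels, groups):
    """A grouped conv needs its output channels split evenly across groups."""
    return out_channels % groups == 0

# The values printed in the report: most layers pass, but a layer with
# 24 output channels cannot be split into 320 groups.
print(group_mask_is_valid(32, 32))    # True
print(group_mask_is_valid(24, 320))   # False
```

When masks are generated per-layer without dependency awareness, interdependent layers (like grouped or depth-wise convolutions) can end up with channel counts that violate this constraint, which is what fix_mask_conflict guards against.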

    How to reproduce it?:

    opened by sungh66 5
Releases(v2.10)
  • v2.10(Nov 14, 2022)

    Neural Architecture Search

    • Added trial deduplication for evolutionary search.
    • Fixed the racing issue in RL strategy on submitting models.
    • Fixed an issue introduced by the trial recovery feature.
    • Fixed import error of PyTorch Lightning in NAS.

    Compression

    • Supported parsing schema by replacing torch._C.parse_schema in pytorch 1.8.0 in ModelSpeedup.
    • Fixed the bug that speedup rand_like_with_shape is easy to overflow when dtype=torch.int8.
    • Fixed the propagation error with view tensors in speedup.

    Hyper-parameter optimization

    • Supported rerunning trials that were interrupted by the termination of an NNI experiment when that experiment is resumed.
    • Fixed a dependency issue of Anneal tuner by changing Anneal tuner dependency to optional.
    • Fixed a bug that tuner might lose connection in long experiments.

    Training service

    • Fixed a bug that trial code directory cannot have non-English characters.

    Web portal

    • Fixed an error of columns in HPO experiment hyper-parameters page by using localStorage.
    • Fixed a link error in About menu on WebUI.

    Known issues

    • ModelSpeedup does not support non-tensor intermediate variables.
    Source code(tar.gz)
    Source code(zip)
  • v2.9(Sep 7, 2022)

    Neural Architecture Search

    • New tutorial of model space hub and one-shot strategy. (tutorial)
    • Add pretrained checkpoints to AutoFormer. (doc)
    • Support loading checkpoint of a trained supernet in a subnet. (doc)
    • Support view and resume of NAS experiment. (doc)

    Enhancements

    • Support fit_kwargs in lightning evaluator. (doc)
    • Support drop_path and auxiliary_loss in NASNet. (doc)
    • Support gradient clipping in DARTS. (doc)
    • Add export_probs to monitor the architecture weights.
    • Rewrite configure_optimizers, the functions that step optimizers/schedulers, and other hooks for simplicity and for compatibility with the latest lightning (v1.7).
    • Align implementation of DifferentiableCell with DARTS official repo.
    • Re-implementation of ProxylessNAS.
    • Move nni.retiarii code-base to nni.nas.

    Bug fixes

    • Fix a performance issue caused by tensor formatting in weighted_sum.
    • Fix a misuse of lambda expression in NAS-Bench-201 search space.
    • Fix the gumbel temperature schedule in Gumbel DARTS.
    • Fix the architecture weight sharing when sharing labels in differentiable strategies.
    • Fix the memo reusing in exporting differentiable cell.

    Compression

    • New tutorial of pruning transformer model. (tutorial)
    • Add TorchEvaluator, LightningEvaluator, TransformersEvaluator to ease the expression of training logic in pruner. (doc, API)

    Enhancements

    • Promote all pruner APIs to use Evaluator; the old API is deprecated and will be removed in v3.0. (doc)
    • Greatly enlarge the set of supported operators in pruning speedup via automatic operator conversion.
    • Support lr_scheduler in pruning by using Evaluator.
    • Support pruning NLP task in ActivationAPoZRankPruner and ActivationMeanRankPruner.
    • Add training_steps, regular_scale, movement_mode, sparse_granularity for MovementPruner. (doc)
    • Add GroupNorm replacement in pruning speedup. Thanks external contributor @cin-xing .
    • Optimize balance mode performance in LevelPruner.

    Bug fixes

    • Fix the invalid dependency_aware mode in scheduled pruners.
    • Fix the bug where bias mask cannot be generated.
    • Fix the bug where max_sparsity_per_layer has no effect.
    • Fix Linear and LayerNorm speedup replacement in NLP task.
    • Fix tracing LightningModule failed in pytorch_lightning >= 1.7.0.

    Hyper-parameter optimization

    • Fix the bug that weights are not defined correctly in adaptive_parzen_normal of TPE.

    Training service

    • Fix trialConcurrency bug in K8S training service: use ${envId}_run.sh to replace run.sh.
    • Fix upload dir bug in K8S training service: use a separate working directory for each experiment. Thanks external contributor @amznero .

    Web portal

    • Support dict keys in Default metric chart in the detail page.
    • Show experiment error message with small popup windows in the bottom right of the page.
    • Upgrade React router to v6 to fix index router issue.
    • Fix the issue of details page crashing due to choices containing None.
    • Fix the issue of missing dict intermediate dropdown in comparing trials dialog.

    Known issues

    • Activation based pruner can not support [batch, seq, hidden].
    • Failed trials are NOT auto-submitted when experiment is resumed (#4931 is reverted due to its pitfalls).
    Source code(tar.gz)
    Source code(zip)
  • v2.8(Jun 22, 2022)

    Neural Architecture Search

    • Align user experience of one-shot NAS with multi-trial NAS, i.e., users can use one-shot NAS by specifying the corresponding strategy (doc)
    • Support multi-GPU training of one-shot NAS
    • Preview Support load/retrain the pre-searched model of some search spaces, i.e., 18 models in 4 different search spaces (doc)
    • Support AutoFormer search space in search space hub, thanks our collaborators @nbl97 and @penghouwen
    • One-shot NAS supports the NAS API repeat and cell
    • Refactor of RetiariiExperiment to share the common implementation with HPO experiment
    • CGO supports pytorch-lightning 1.6

    Model Compression

    • Preview Refactor and improvement of automatic model compression with a new CompressionExperiment
    • Support customizing the module replacement function for unsupported modules in model speedup (doc)
    • Support the module replacement function for some user mentioned modules
    • Support output_padding for convtranspose2d in model speedup, thanks external contributor @haoshuai-orka

    Hyper-Parameter Optimization

    • Make config.tuner.name case insensitive
    • Allow writing configurations of advisor in tuner format, i.e., aligning the configuration of advisor and tuner

    Experiment

    • Support launching multiple HPO experiments in one process

    • Internal refactors and improvements

      • Refactor of the logging mechanism in NNI
      • Refactor of NNI manager globals for flexible and high extensibility
      • Migrate dispatcher IPC to WebSocket
      • Decouple lock stuffs from experiments manager logic
      • Use launcher's sys.executable to detect Python interpreter

    WebUI

    • Improve user experience of trial ordering in the overview page
    • Fix the update issue in the trial detail page

    Documentation

    • A new translation framework for document
    • Add a new quantization demo (doc)

    Notable Bugfixes

    • Fix TPE import issue for old metrics
    • Fix the issue in TPE nested search space
    • Support RecursiveScriptModule in speedup
    • Fix the issue of failed "implicit type cast" in merge_parameter()
    Source code(tar.gz)
    Source code(zip)
  • v2.7(Apr 18, 2022)

    Documentation

    A full-size upgrade of the documentation, with the following significant improvements in the reading experience, practical tutorials, and examples:

    Hyper-Parameter Optimization

    • [Improvement] TPE and random tuners will not generate duplicate hyperparameters anymore.
    • [Improvement] Most Python APIs now have type annotations.
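The deduplication improvement above can be sketched with a simple seen-set over sampled parameter dicts (illustrative only, not NNI's implementation; real tuners also handle continuous spaces and exhaustion):

```python
import random

def sample_without_duplicates(space, n, max_tries=1000):
    """Randomly sample up to n distinct configurations from a discrete space."""
    seen, out = set(), []
    for _ in range(max_tries):
        if len(out) == n:
            break
        cfg = {k: random.choice(v) for k, v in space.items()}
        key = tuple(sorted(cfg.items()))
        if key not in seen:  # skip parameters that were already dispatched
            seen.add(key)
            out.append(cfg)
    return out

space = {"lr": [0.1, 0.01], "batch": [16, 32]}
configs = sample_without_duplicates(space, 4)
# every returned configuration is distinct
assert len({tuple(sorted(c.items())) for c in configs}) == len(configs)
```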

    Neural Architecture Search

    • Jointly search for architecture and hyper-parameters: ValueChoice in evaluator. (doc)
    • Support composition (transformation) of one or several value choices. (doc)
    • Enhanced Cell API (merge_op, preprocessor, postprocessor). (doc)
    • The argument depth in the Repeat API allows ValueChoice. (doc)
    • Support loading state_dict between sub-net and super-net. (doc, example in spos)
    • Support BN fine-tuning and evaluation in SPOS example. (doc)
    • Experimental Model hyper-parameter choice. (doc)
    • Preview Lightning implementation for Retiarii including DARTS, ENAS, ProxylessNAS and RandomNAS. (example usage)
    • Preview A search space hub that contains 10 search spaces. (code)

    Model Compression

    • Pruning V2 is promoted as the default pruning framework; the old pruning is legacy and will be kept for a few releases. (doc)
    • A new pruning mode balance is supported in LevelPruner. (doc)
    • Support coarse-grained pruning in ADMMPruner. (doc)
    • [Improvement] Support more operation types in pruning speedup.
    • [Improvement] Optimize performance of some pruners.

    Experiment

    • [Improvement] Experiment.run() no longer stops web portal on return.

    Notable Bugfixes

    • Fixed: experiment list could not open experiment with prefix.
    • Fixed: serializer for complex kinds of arguments.
    • Fixed: some typos in code. (thanks @a1trl9 @mrshu)
    • Fixed: dependency issue across layer in pruning speedup.
    • Fixed: unchecking a trial doesn't work in the detail table.
    • Fixed: filter name | id bug in the experiment management page.
    Source code(tar.gz)
    Source code(zip)
  • v2.6.1(Feb 18, 2022)

  • v2.6(Jan 19, 2022)

    NOTE: NNI v2.6 is the last version that supports Python 3.6. From next release NNI will require Python 3.7+.

    Hyper-Parameter Optimization

    Experiment

    • The legacy experiment config format is now deprecated. (doc of new config)
      • If you are still using legacy format, nnictl will show equivalent new config on start. Please save it to replace the old one.
    • nnictl now uses nni.experiment.Experiment APIs as backend. The output messages of the create, resume, and view commands have changed.
    • Added Kubeflow and Frameworkcontroller support to hybrid mode. (doc)
    • The hidden tuner manifest file has been updated. This should be transparent to users, but if you encounter issues like failed to find tuner, please try to remove ~/.config/nni.

    Algorithms

    • Random tuner now supports classArgs seed. (doc)
    • TPE tuner is refactored: (doc)
      • Support classArgs seed.
      • Support classArgs tpe_args for expert users to customize algorithm behavior.
      • Parallel optimization has been turned on by default. To turn it off set tpe_args.constant_liar_type to null (or None in Python).
      • parallel_optimize and constant_liar_type have been removed. If you are using them, please update your config to use tpe_args.constant_liar_type instead.
    • Grid search tuner now supports all search space types, including uniform, normal, and nested choice. (doc)
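As a sketch of how these options fit together, here is a hypothetical excerpt of an NNI v2 experiment config; the hyper-parameter names and values are made up, so consult the NNI config reference for the exact schema:

```yaml
searchSpace:
  lr:
    _type: loguniform
    _value: [0.0001, 0.1]      # grid search now also handles continuous types
  momentum:
    _type: normal
    _value: [0.9, 0.05]        # [mu, sigma]
tuner:
  name: TPE                    # case-insensitive per the notes above
  classArgs:
    optimize_mode: maximize
    seed: 42                   # reproducible tuning
    tpe_args:
      constant_liar_type: null # disable parallel optimization
```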

    Neural Architecture Search

    • Enhancement to serialization utilities (doc) and changes to recommended practice of customizing evaluators. (doc)
    • Support latency constraint on edge device for ProxylessNAS based on nn-Meter. (doc)
    • Trial parameters are shown in a more friendly way in Retiarii experiments.
    • Refactor NAS examples of ProxylessNAS and SPOS.

    Model Compression

    • New Pruner Supported in Pruning V2
      • Auto-Compress Pruner (doc)
      • AMC Pruner (doc)
      • Movement Pruning Pruner (doc)
    • Support nni.trace wrapped Optimizer in Pruning V2. The optimizer's input parameters are traced while affecting the user experience as little as possible. (doc)
    • Optimize the memory usage of Taylor Pruner, APoZ Activation Pruner, and Mean Activation Pruner in V2.
    • Add more examples for Pruning V2.
    • Add document for pruning config list. (doc)
    • Parameter masks_file of ModelSpeedup now accepts pathlib.Path object. (Thanks to @dosemeion) (doc)
    • Bug Fix
      • Fix Slim Pruner in V2 not sparsifying the BN weight.
      • Fix Simulated Annealing Task Generator generating configs that ignore 0 sparsity.

    Documentation

    • Supported GitHub feature "Cite this repository".
    • Updated index page of readthedocs.
    • Updated Chinese documentation.
      • From now on NNI only maintains translation for the most important docs and ensures they are up to date.
    • Reorganized HPO tuners' doc.

    Bugfixes

    • Fixed a bug where numpy array is used as a truth value. (Thanks to @khituras)
    • Fixed a bug in updating search space.
    • Fixed a bug that HPO search space file does not support scientific notation and tab indent.
      • For now NNI does not support mixing scientific notation and YAML features. We are waiting for PyYAML to update.
    • Fixed a bug that causes DARTS 2nd order to crash.
    • Fixed a bug that causes deep copy of mutation primitives (e.g., LayerChoice) to crash.
    • Removed blank at bottom in Web UI overview page.
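The scientific-notation limitation above comes from PyYAML itself: its YAML 1.1 float resolver only recognizes numbers written with a decimal point and a signed exponent, so a bare 1e-5 in a YAML search-space file is read as a string. A quick illustration (assuming PyYAML is installed):

```python
import yaml

# PyYAML only resolves floats like "1.0e-5" (decimal point + signed
# exponent). A bare "1e-5" falls through to a plain string.
broken = yaml.safe_load("lr: 1e-5")
works = yaml.safe_load("lr: 1.0e-5")

print(type(broken["lr"]))  # <class 'str'>   -- not what a search space expects
print(type(works["lr"]))   # <class 'float'>
```

Writing values as 0.00001 or 1.0e-5 sidesteps the issue until PyYAML updates.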
    Source code(tar.gz)
    Source code(zip)
  • v2.5(Nov 4, 2021)

    Model Compression

    • New major version of pruning framework (doc)
      • Iterative pruning is more automated, users can use less code to implement iterative pruning.
      • Support exporting intermediate models in the iterative pruning process.
      • The implementation of the pruning algorithm is closer to the paper.
      • Users can easily customize their own iterative pruning by using PruningScheduler.
      • Optimized the basic pruners' underlying mask generation logic, making it easier to extend with new functions.
      • Optimized the memory usage of the pruners.
    • MobileNetV2 end-to-end example (notebook)
    • Improved QAT quantizer (doc)
      • Support dtype and scheme customization
      • Support dp multi-gpu training
      • Support load_calibration_config
    • Model speed-up now supports directly loading the mask (doc)
    • Support speed-up depth-wise convolution
    • Support bn-folding for LSQ quantizer
    • Support QAT and LSQ resume from PTQ
    • Added doc for observer quantizer (doc)

    Neural Architecture Search

    • NAS benchmark (doc)
      • Support benchmark table lookup in experiments
      • New data preparation approach
    • Improved quick start doc
    • Experimental CGO execution engine (doc)

    Hyper-Parameter Optimization

    • New training platform: Alibaba DSW+DLC (doc)
    • Support passing ConfigSpace definition directly to BOHB (doc) (thanks to @khituras)
    • Reformatted experiment config doc
    • Added example config files for Windows (thanks to @politecat314)
    • FrameworkController now supports reuse mode

    Fixed Bugs

    • Experiment cannot start due to platform timestamp format (issue #4077 #4083)
    • Cannot use 1e-5 in search space (issue #4080)
    • Dependency version conflict caused by ConfigSpace (issue #3909) (thanks to @jexxers)
    • Hardware-aware SPOS example does not work (issue #4198)
    • Web UI show wrong remaining time when duration exceeds limit (issue #4015)
    • cudnn.deterministic is always set in AMC pruner (#4117) thanks to @mstczuo

    And...

    New emoticons!

    Install from pypi

    Source code(tar.gz)
    Source code(zip)
  • v2.4(Aug 12, 2021)

    Major Updates

    Neural Architecture Search

    • NAS visualization: visualize model graph through Netron (#3878)
    • Support NAS bench 101/201 on Retiarii framework (#3871 #3920)
    • Support hypermodule AutoActivation (#3868)
    • Support PyTorch v1.8/v1.9 (#3937)
    • Support Hardware-aware NAS with nn-Meter (#3938)
    • Enable fixed_arch on Retiarii (#3972)

    Model Compression

    • Refactor of ModelSpeedup: auto shape/mask inference (#3462)
    • Added more examples for ModelSpeedup (#3880)
    • Support global sort for Taylor pruning (#3896)
    • Support TransformerHeadPruner (#3884)
    • Support batch normalization folding in QAT quantizer (#3911, thanks the external contributor @chenbohua3)
    • Support post-training observer quantizer (#3915, thanks the external contributor @chenbohua3)
    • Support ModelSpeedup for Slim Pruner (#4008)
    • Support TensorRT 8.0.0 in ModelSpeedup (#3866)

    Hyper-parameter Tuning

    • Improve HPO benchmarks (#3925)
    • Improve type validation of user defined search space (#3975)

    Training service & nnictl

    • Support JupyterLab (#3668 #3954)
    • Support viewing experiment from experiment folder (#3870)
    • Support kubeflow in training service reuse framework (#3919)
    • Support viewing trial log on WebUI for an experiment launched in view mode (#3872)

    Minor Updates & Bug Fixes

    • Fix the failure of the exit of Retiarii experiment (#3899)
    • Fix exclude not supported in some config_list cases (#3815)
    • Fix bug in remote training service on reuse mode (#3941)
    • Improve IP address detection in a modern way (#3860)
    • Fix bug of the search box on WebUI (#3935)
    • Fix bug in url_prefix of WebUI (#4051)
    • Support dict format of intermediate on WebUI (#3895)
    • Fix bug in openpai training service induced by experiment config v2 (#4027 #4057)
    • Improved doc (#3861 #3885 #3966 #4004 #3955)
    • Improved the API export_model in model compression (#3968)
    • Supported UnSqueeze in ModelSpeedup (#3960)
    • Thanks other external contributors: @Markus92 (#3936), @thomasschmied (#3963), @twmht (#3842)
    Source code(tar.gz)
    Source code(zip)
  • v2.3(Jun 15, 2021)

    Major Updates

    Neural Architecture Search

    • Retiarii Framework (NNI NAS 2.0) Beta Release with new features:

      • Support new high-level APIs: Repeat and Cell (#3481)
      • Support pure-python execution engine (#3605)
      • Support policy-based RL strategy (#3650)
      • Support nested ModuleList (#3652)
      • Improve documentation (#3785)

      Note: there are more exciting features of Retiarii planned in the future releases, please refer to Retiarii Roadmap for more information.

    • Add new NAS algorithm: Blockwise DNAS FBNet (#3532, thanks the external contributor @alibaba-yiwuyao)

    Model Compression

    • Support Auto Compression Framework (#3631)
    • Support slim pruner in Tensorflow (#3614)
    • Support LSQ quantizer (#3503, thanks the external contributor @chenbohua3)
    • Improve APIs for iterative pruners (#3507 #3688)

    Training service & Rest

    • Support 3rd-party training service (#3662 #3726)
    • Support setting prefix URL (#3625 #3674 #3672 #3643)
    • Improve NNI manager logging (#3624)
    • Remove outdated TensorBoard code on nnictl (#3613)

    Hyper-Parameter Optimization

    • Add new tuner: DNGO (#3479 #3707)
    • Add benchmark for tuners (#3644 #3720 #3689)

    WebUI

    • Improve search parameters on trial detail page (#3651 #3723 #3715)
    • Make selected trials consistent after auto-refresh in detail table (#3597)
    • Add trial stdout button on local mode (#3653 #3690)

    Examples & Documentation

    • Convert all trial examples from config v1 to config v2 (#3721 #3733 #3711 #3600)
    • Add new jupyter notebook examples (#3599 #3700)

    Dev Excellent

    • Upgrade dependencies in Dockerfile (#3713 #3722)
    • Substitute PyYAML for ruamel.yaml (#3702)
    • Add pipelines for AML and hybrid training service and experiment config V2 (#3477 #3648)
    • Add pipeline badge in README (#3589)
    • Update issue bug report template (#3501)

    Bug Fixes & Minor Updates

    • Fix syntax error on Windows (#3634)
    • Fix a logging related bug (#3705)
    • Fix a bug in GPU indices (#3721)
    • Fix a bug in FrameworkController (#3730)
    • Fix a bug in export_data_url format (#3665)
    • Report version check failure as a warning (#3654)
    • Fix bugs and lints in nnictl (#3712)
    • Fix bug of optimize_mode on WebUI (#3731)
    • Fix bug of useActiveGpu in AML v2 config (#3655)
    • Fix bug of experiment_working_directory in Retiarii config (#3607)
    • Fix a bug in mask conflict (#3629, thanks the external contributor @Davidxswang)
    • Fix a bug in model speedup shape inference (#3588, thanks the external contributor @Davidxswang)
    • Fix a bug in multithread on Windows (#3604, thanks the external contributor @Ivanfangsc)
    • Delete redundant code in training service (#3526, thanks the external contributor @maxsuren)
    • Fix typo in DoReFa compression doc (#3693, thanks the external contributor @Erfandarzi)
    • Update docstring in model compression (#3647, thanks the external contributor @ichejun)
    • Fix a bug when using Kubernetes container (#3719, thanks the external contributor @rmfan)
  • v2.2(Apr 26, 2021)

    Major updates

    Neural Architecture Search

    • Improve NAS 2.0 (Retiarii) Framework (Alpha Release)

      • Support local debug mode (#3476)
      • Support nesting ValueChoice in LayerChoice (#3508)
      • Support dict/list type in ValueChoice (#3508)
      • Improve the format of export architectures (#3464)
      • Refactor of NAS examples (#3513)
      • Refer to https://github.com/microsoft/nni/issues/3301 for the Retiarii roadmap

    Model Compression

    • Support speedup for mixed precision quantization model (Experimental) (#3488 #3512)
    • Support model export for quantization algorithm (#3458 #3473)
    • Support model export in model compression for TensorFlow (#3487)
    • Improve documentation (#3482)

    nnictl & nni.experiment

    • Add native support for experiment config V2 (#3466 #3540 #3552)
    • Add resume and view mode in Python API nni.experiment (#3490 #3524 #3545)

    Training Service

    • Support umount for shared storage in remote training service (#3456)
    • Support Windows as the remote training service in reuse mode (#3500)
    • Remove duplicated env folder in remote training service (#3472)
    • Add log information for GPU metric collector (#3506)
    • Enable optional Pod Spec for FrameworkController platform (#3379, thanks the external contributor @mbu93)

    WebUI

    • Support launching TensorBoard on WebUI (#3454 #3361 #3531)
    • Upgrade echarts-for-react to v5 (#3457)
    • Add wrap for dispatcher/nnimanager log monaco editor (#3461)

    Bug Fixes

    • Fix bug of FLOPs counter (#3497)
    • Fix conflict between the hyper-parameter Add/Remove axes buttons and the table Add/Remove columns buttons (#3491)
    • Fix bug that monaco editor search text is not displayed completely (#3492)
    • Fix bug of Cream NAS (#3498, thanks the external contributor @AliCloud-PAI)
    • Fix typos in docs (#3448, thanks the external contributor @OliverShang)
    • Fix typo in NAS 1.0 (#3538, thanks the external contributor @ankitaggarwal23)
  • v2.1(Mar 10, 2021)

    Major updates

    Neural architecture search

    • Improve NAS 2.0 (Retiarii) Framework (Improved Experimental)

      • Improve the robustness of graph generation and code generation for PyTorch models (#3365)
      • Support the inline mutation API ValueChoice (#3349 #3382)
      • Improve the design and implementation of Model Evaluator (#3359 #3404)
      • Support Random/Grid/Evolution exploration strategies (i.e., search algorithms) (#3377)
      • Refer to here for Retiarii Roadmap

    Training service

    • Support shared storage for reuse mode (#3354)
    • Support Windows as the local training service in hybrid mode (#3353)
    • Remove PAIYarn training service (#3327)
    • Add "recently-idle" scheduling algorithm (#3375)
    • Deprecate preCommand and enable pythonPath for remote training service (#3284 #3410)
    • Refactor reuse mode temp folder (#3374)

    nnictl & nni.experiment

    • Migrate nnicli to new Python API nni.experiment (#3334)
    • Refactor the way of specifying tuner in experiment Python API (nni.experiment), more aligned with nnictl (#3419)

    WebUI

    • Support showing the assigned training service of each trial in hybrid mode on WebUI (#3261 #3391)
    • Support multiple selection for filter status in experiments management page (#3351)
    • Improve overview page (#3316 #3317 #3352)
    • Support copy trial id in the table (#3378)

    Documentation

    • Improve model compression examples and documentation (#3326 #3371)
    • Add Python API examples and documentation (#3396)
    • Add SECURITY doc (#3358)
    • Add 'What's NEW!' section in README (#3395)
    • Update English contributing doc (#3398, thanks external contributor @Yongxuanzhang)

    Bug fixes

    • Fix AML outputs path and python process not killed (#3321)
    • Fix bug that an experiment launched from Python cannot be resumed by nnictl (#3309)
    • Fix import path of network morphism example (#3333)
    • Fix bug in the tuple unpack (#3340)
    • Fix bug of security for arbitrary code execution (#3311, thanks external contributor @huntr-helper)
    • Fix NoneType error on jupyter notebook (#3337, thanks external contributor @tczhangzhi)
    • Fix bugs in Retiarii (#3339 #3341 #3357, thanks external contributor @tczhangzhi)
    • Fix bug in AdaptDL mode example (#3381, thanks external contributor @ZeyaWang)
    • Fix the spelling mistake of assessor (#3416, thanks external contributor @ByronCHAO)
    • Fix bug in ruamel import (#3430, thanks external contributor @rushtehrani)
  • v2.0(Jan 14, 2021)

    Major updates

    Neural architecture search

    Training service

    • Support hybrid training service (#3097 #3251 #3252)
    • Support AdlTrainingService, a new training service based on Kubernetes (#3022, thanks external contributors Petuum @pw2393)

    Model compression

    • Support pruning schedule for fpgm pruning algorithm (#3110)
    • ModelSpeedup improvement: support torch v1.7 (updated graph_utils.py) (#3076)
    • Improve model compression utility: model flops counter (#3048 #3265)

    WebUI & nnictl

    • Support experiments management on WebUI, add a web page for it (#3081 #3127)
    • Improve the layout of overview page (#3046 #3123)
    • Add navigation bar on the right for logs and configs; add expanded icons for table (#3069 #3103)

    Others

    • Support launching an experiment from Python code (#3111 #3210 #3263)
    • Refactor builtin/customized tuner installation (#3134)
    • Support new experiment configuration V2 (#3138 #3248 #3251)
    • Reorganize source code directory hierarchy (#2962 #2987 #3037)
    • Change SIGKILL to SIGTERM in local mode when cancelling trial jobs (#3173)
    • Refactor hyperband (#3040)
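    The new experiment configuration V2 mentioned above is a YAML file; a minimal sketch might look like the following (field names follow the V2 schema, but the experiment name, file names, and values are illustrative):

```yaml
# Hypothetical minimal experiment config in the V2 schema
experimentName: mnist_demo          # illustrative name
searchSpaceFile: search_space.json  # path to the search-space definition
trialCommand: python3 mnist.py      # command that runs one trial
trialCodeDirectory: .
trialConcurrency: 2
maxTrialNumber: 20
tuner:
  name: TPE
  classArgs:
    optimize_mode: maximize
trainingService:
  platform: local
```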

    Documentation

    • Port markdown docs to reStructuredText docs and introduce githublink (#3107)
    • List related research and publications in doc (#3150)
    • Add tutorial of saving and loading quantized model (#3192)
    • Remove paiYarn doc and add description of reuse config in remote mode (#3253)
    • Update EfficientNet doc to clarify repo versions (#3158, thanks external contributor @ahundt)

    Bug fixes

    • Fix exp-duration pause timing under NO_MORE_TRIAL status (#3043)
    • Fix bug in NAS SPOS trainer, apply_fixed_architecture (#3051, thanks external contributor @HeekangPark)
    • Fix _compute_hessian bug in NAS DARTS (PyTorch version) (#3058, thanks external contributor @hroken)
    • Fix bug of conv1d in the cdarts utils (#3073, thanks external contributor @athaker)
    • Fix the handling of unknown trials when resuming an experiment (#3096)
    • Fix bug of kill command under Windows (#3106)
    • Fix lazy logging (#3108, thanks external contributor @HarshCasper)
    • Fix checkpoint load and save issue in QAT quantizer (#3124, thanks external contributor @eedalong)
    • Fix quant grad function calculation error (#3160, thanks external contributor @eedalong)
    • Fix device assignment bug in quantization algorithm (#3212, thanks external contributor @eedalong)
    • Fix bug in ModelSpeedup and enhance UT for it (#3279)
    • and others
  • v1.9(Oct 22, 2020)

    Release 1.9 - 10/22/2020

    Major updates

    Neural architecture search

    • Support regularized evolution algorithm for NAS scenario (#2802)
    • Add NASBench201 in search space zoo (#2766)

    Model compression

    • AMC pruner improvement: support resnet, support reproduction of the experiments (default parameters in our example code) in AMC paper (#2876 #2906)
    • Support constraint-aware on some of our pruners to improve model compression efficiency (#2657)
    • Support "tf.keras.Sequential" in model compression for TensorFlow (#2887)
    • Support customized op in the model flops counter (#2795)
    • Support quantizing bias in QAT quantizer (#2914)

    Training service

    • Support configuring python environment using "preCommand" in remote mode (#2875)
    • Support AML training service in Windows (#2882)
    • Support reuse mode for remote training service (#2923)

    WebUI & nnictl

    • The "Overview" page on WebUI is redesigned with new layout (#2914)
    • Upgraded node, yarn and FabricUI, and enabled Eslint (#2894 #2873 #2744)
    • Add/Remove columns in hyper-parameter chart and trials table in "Trials detail" page (#2900)
    • JSON format utility beautify on WebUI (#2863)
    • Support nnictl command auto-completion (#2857)

    UT & IT

    • Add integration test for experiment import and export (#2878)
    • Add integration test for user installed builtin tuner (#2859)
    • Add unit test for nnictl (#2912)

    Documentation

    • Refactor of the document for model compression (#2919)

    Bug fixes

    • Bug fix of naïve evolution tuner, correctly deal with trial fails (#2695)
    • Resolve the warning "WARNING (nni.protocol) IPC pipeline not exists, maybe you are importing tuner/assessor from trial code?" (#2864)
    • Fix search space issue in experiment save/load (#2886)
    • Fix bug in experiment import data (#2878)
    • Fix annotation in remote mode (python 3.8 ast update issue) (#2881)
    • Support boolean type for "choice" hyper-parameter when customizing trial configuration on WebUI (#3003)
  • v1.8(Aug 28, 2020)

    Release 1.8 - 8/27/2020

    Major updates

    Training service

    • Access trial log directly on WebUI (local mode only) (#2718)
    • Add OpenPAI trial job detail link (#2703)
    • Support GPU scheduler in reusable environment (#2627) (#2769)
    • Add timeout for web_channel in trial_runner (#2710)
    • Show environment error message in AzureML mode (#2724)
    • Add more log information when copying data in OpenPAI mode (#2702)

    WebUI, nnictl and nnicli

    • Improve hyper-parameter parallel coordinates plot (#2691) (#2759)
    • Add pagination for trial job list (#2738) (#2773)
    • Enable panel close when clicking overlay region (#2734)
    • Remove support for Multiphase on WebUI (#2760)
    • Support save and restore experiments (#2750)
    • Add intermediate results in export result (#2706)
    • Add command to list trial results with highest/lowest metrics (#2747)
    • Improve the user experience of nnicli with examples (#2713)

    Neural architecture search

    Model compression

    Backward incompatible changes

    • Update the default experiment folder from $HOME/nni/experiments to $HOME/nni-experiments. If you want to view the experiments created by previous NNI releases, you can move the experiments folders from $HOME/nni/experiments to $HOME/nni-experiments manually. (#2686) (#2753)
    • Dropped support for Python 3.5 and scikit-learn 0.20 (#2778) (#2777) (#2783) (#2787) (#2788) (#2790)

    Others

    • Upgrade TensorFlow version in Docker image (#2732) (#2735) (#2720)

    Examples

    • Remove gpuNum in assessor examples (#2641)

    Documentation

    • Improve customized tuner documentation (#2628)
    • Fix several typos and grammar mistakes in documentation (#2637 #2638, thanks @tomzx)
    • Improve AzureML training service documentation (#2631)
    • Improve CI of Chinese translation (#2654)
    • Improve OpenPAI training service documentation (#2685)
    • Improve documentation of community sharing (#2640)
    • Add tutorial of Colab support (#2700)
    • Improve documentation structure for model compression (#2676)

    Bug fixes

    • Fix mkdir error in training service (#2673)
    • Fix bug when using chmod in remote training service (#2689)
    • Fix dependency issue by making _graph_utils imported inline (#2675)
    • Fix mask issue in SimulatedAnnealingPruner (#2736)
    • Fix intermediate graph zooming issue (#2738)
    • Fix issue when dict is unordered when querying NAS benchmark (#2728)
    • Fix import issue for gradient selector dataloader iterator (#2690)
    • Fix support of adding tens of machines in remote training service (#2725)
    • Fix several styling issues in WebUI (#2762 #2737)
    • Fix support of unusual types in metrics including NaN and Infinity (#2782)
    • Fix nnictl experiment delete (#2791)
  • v1.7.1(Jul 31, 2020)

    Release 1.7.1 - 8/1/2020

    Bug Fixes

    • Fix pai training service error handling #2692
    • Fix pai training service codeDir copying issue #2673
    • Upgrade training service to support latest pai restful API #2722
  • v1.7(Jul 8, 2020)

    Release 1.7 - 7/8/2020

    Major Features

    Training Service

    Neural Architecture Search (NAS)

    Model Compression

    Examples

    Built-in tuners/assessors/advisors

    WebUI

    • Support friendlier visualization of nested search spaces.
    • Show trial's dict keys in hyper-parameter graph.
    • Enhancements to trial duration display.

    Others

    • Provide utility function to merge parameters received from NNI
    • Support setting paiStorageConfigName in pai mode
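    The parameter-merging utility can be illustrated with a plain-Python sketch. The helper below is a hypothetical re-implementation of the idea, not NNI's actual code; in a real trial the override dict would come from the tuner via nni.get_next_parameter():

```python
import argparse

def merge_parameter(base: argparse.Namespace, override: dict) -> argparse.Namespace:
    """Overwrite fields of an argparse Namespace with values tuned by NNI.

    Hypothetical minimal sketch; it only checks that every tuned key
    corresponds to an existing command-line argument.
    """
    for key, value in override.items():
        if not hasattr(base, key):
            raise ValueError(f"unknown parameter: {key}")
        setattr(base, key, value)
    return base

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--batch_size", type=int, default=32)
args = parser.parse_args([])

# In a real trial this dict would be returned by the tuner.
tuned = {"lr": 0.001, "batch_size": 64}
args = merge_parameter(args, tuned)
print(args.lr, args.batch_size)  # prints: 0.001 64
```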

    Documentation

    Bug Fixes

    • Fix bug for model graph with shared nn.Module
    • Fix nodejs OOM when make build
    • Fix NASUI bugs
    • Fix duration and intermediate results pictures update issue.
    • Fix minor WebUI table style issues.
  • v1.6(May 26, 2020)

    Release 1.6 - 5/26/2020

    Major Features

    New Features and improvement

    • Support __version__ for SDK version
    • Support Windows dev install
    • Raise the IPC message size limit to 1,000,000
    • Improve code storage upload logic among trials on non-local platforms

    HPO Updates

    • Improve PBT on failure handling and support experiment resume for PBT

    NAS Updates

    • NAS support for TensorFlow 2.0 (preview) TF2.0 NAS examples
    • Use OrderedDict for LayerChoice
    • Prettify the format of export
    • Replace layer choice with the selected module after applying a fixed architecture

    Model Compression Updates

    • Model compression PyTorch 1.4 support

    Training Service Updates

    • Update PAI YAML merge logic
    • Support Windows as a remote machine in remote mode

    Web UI new supports or improvements

    • Show trial error message
    • Finalize homepage layout
    • Refactor overview's best trials module
    • Remove multiphase from WebUI
    • Add tooltip for trial concurrency on the overview page
    • Show top trials in the hyper-parameter graph

    Bug Fix

    • Fix dev install
    • Fix SPOS example crash when the checkpoints do not contain state_dict
    • Fix table sort issue when the experiment had a failed trial
    • Support multiple Python environments (conda, pyenv, etc.)
  • v1.5(Apr 13, 2020)

    New Features and Documentation

    Hyper-Parameter Optimizing

    Neural Architecture Search

    Model Compression

    • New Pruner: GradientRankFilterPruner
    • Compressors will validate configuration by default
    • Refactor: Adding optimizer as an input argument of pruner, for easy support of DataParallel and more efficient iterative pruning. This is a breaking change for the usage of iterative pruning algorithms.
    • Model compression examples are refactored and improved
    • Added documentation for implementing compressing algorithm

    Training Service

    • Kubeflow now supports pytorchjob crd v1 (thanks external contributor @jiapinai)
    • Experimental DLTS support

    Overall Documentation Improvement

    • Documentation is significantly improved on grammar, spelling, and wording (thanks external contributor @AHartNtkn)

    Fixed Bugs

    • ENAS cannot have more than one LSTM layer (thanks external contributor @marsggbo)
    • NNI manager's timers never unsubscribe (thanks external contributor @guilhermehn)
    • NNI manager may exhaust heap memory (thanks external contributor @Sundrops)
    • Batch tuner does not support customized trials (#2075)
    • Experiment cannot be killed if it failed on start (#2080)
    • Non-number type metrics break web UI (#2278)
    • A bug in lottery ticket pruner
    • Other minor glitches
  • v1.4(Feb 19, 2020)

    Release 1.4 - 2/19/2020

    Major Features

    Neural Architecture Search

    Model Compression

    • Support DataParallel for compressing models, and provide an example of using DataParallel
    • Support model speedup for compressed models, in Alpha version

    Training Service

    • Support complete PAI configurations by allowing users to specify PAI config file path
    • Add example config yaml files for the new PAI mode (i.e., paiK8S)
    • Support deleting experiments using sshkey in remote mode (thanks external contributor @tyusr)

    WebUI

    • WebUI refactor: adopt fabric framework

    Others

    • Support running NNI experiment at foreground, i.e., --foreground argument in nnictl create/resume/view
    • Support canceling the trials in UNKNOWN state
    • Support large search space whose size could be up to 50mb (thanks external contributor @Sundrops)

    Documentation

    Bug Fixes

    • Correctly support NaN in metric data, JSON compliant
    • Fix the out-of-range bug of randint type in search space
    • Fix the bug of wrong tensor device when exporting onnx model in model compression
    • Fix incorrect handling of nnimanagerIP in the new PAI mode (i.e., paiK8S)
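    The NaN fix concerns strict JSON, which has no NaN/Infinity literals. One common JSON-compliant encoding is to serialize non-finite floats as strings; the sketch below illustrates that convention (an assumption for illustration, not necessarily NNI's exact scheme):

```python
import json
import math

def dump_metric(value):
    """Serialize a metric value as strictly-compliant JSON.

    Non-finite floats are encoded as strings, since the JSON
    grammar has no NaN/Infinity literals (illustrative convention).
    """
    if isinstance(value, float) and not math.isfinite(value):
        value = str(value)  # "nan", "inf" or "-inf"
    return json.dumps(value, allow_nan=False)

print(dump_metric(0.95))          # prints: 0.95
print(dump_metric(float("nan")))  # prints: "nan"
```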
  • v1.3(Dec 31, 2019)

    Release 1.3 - 12/30/2019

    Major Features

    Neural Architecture Search Algorithms Support

    Model Compression Algorithms Support

    Training Service

    • NFS Support for PAI

      Instead of using HDFS as the default storage, OpenPAI can use NFS, AzureBlob, or other storage as the default since OpenPAI v0.11. In this release, NNI extended its support for this recent change made by OpenPAI, and can integrate with OpenPAI v0.11 or later with various default storage backends.

    • Kubeflow update adoption

      • Add support for zero gpuNum in Kubernetes (#1830, thanks to external contributor @skyser2003)
      • Adopt Kubeflow 0.7's new support for tf-operator (thanks to external contributor @skyser2003)

    Engineering (code and build automation)

    • Enforced ESLint on static code analysis.

    Small changes & Bug Fixes

    • Correctly recognize builtin tuners and customized tuners
    • Fix logging in dispatcher base
    • Fix the bug where a tuner/assessor failure sometimes kills the experiment
    • Fix the issue of using the local system as a remote machine
    • De-duplicate trial configurations in the SMAC tuner
  • v1.2(Dec 2, 2019)

    Release 1.2 - 12/2/2019

    Major Features

    Bug fix

    • Fix the table sort issue when failed trials have no metrics. -Issue #1764
    • Maintain selected status (Maximal/Minimal) when the page is switched. -PR #1710
    • Make the hyper-parameter graph's default metric yAxis more accurate. -PR #1736
    • Fix GPU script permission issue. -Issue #1665
  • v1.1(Oct 23, 2019)

    Release 1.1 - 10/23/2019

    Major Features

    Fixed Bugs

    • Multiphase job hangs when search space exhausted (issue #1204)
    • nnictl fails when log not available (issue #1548)
  • v1.0(Sep 2, 2019)

    Release 1.0 - 09/02/2019

    Major Features

    • Tuners and Assessors

      • Support Auto-Feature generator & selection -Issue#877 -PR #1387
      • Add a parallel algorithm to improve the performance of TPE with large concurrency. -PR #1052
      • Support multiphase for hyperband -PR #1257
    • Training Service

      • Support private docker registry -PR #755
    • Engineering Improvements

      • Python wrapper for rest api, support retrieve the values of the metrics in a programmatic way PR #1318
      • New python API : get_experiment_id(), get_trial_id() -PR #1353 -Issue #1331 & -Issue#1368
      • Optimized NAS Searchspace -PR #1393
        • Unify NAS search space with _type -- "mutable_type"
        • Update random search tuner
      • Set gpuNum as optional -Issue #1365
      • Remove outputDir and dataDir configuration in PAI mode -Issue #1342
      • When creating a trial in Kubeflow mode, codeDir will no longer be copied to logDir -Issue #1224
    • Web Portal & User Experience

      • Show the best metric curve during search progress in WebUI -Issue #1218
      • Show the current number of parameters list in multiphase experiment -Issue #1210 -PR #1348
      • Add "Intermediate count" option in AddColumn. -Issue #1210
      • Support search parameters value in WebUI -Issue #1208
      • Enable automatic scaling of axes for metric value in default metric graph -Issue #1360
      • Add a detailed documentation link to the nnictl command in the command prompt -Issue #1260
      • UX improvement for showing Error log -Issue #1173
    • Documentation

    Bug fix

    • (Bug fix) Fix the broken links in 0.9 release -Issue #1236
    • (Bug fix) Script for auto-complete
    • (Bug fix) Fix pipeline issue that it only checks the exit code of the last command in a script. -PR #1417
    • (Bug fix) Fix quniform for tuners -Issue #1377
    • (Bug fix) 'quniform' has different meaning between GridSearch and other tuners. -Issue #1335
    • (Bug fix) "nnictl experiment list" gives the status of a "RUNNING" experiment as "INITIALIZED" -PR #1388
    • (Bug fix) SMAC cannot be installed if nni is installed in dev mode -Issue #1376
    • (Bug fix) The filter button of the intermediate result cannot be clicked -Issue #1263
    • (Bug fix) API "/api/v1/nni/trial-jobs/xxx" doesn't show all of a trial's parameters in multiphase experiment -Issue #1258
    • (Bug fix) Succeeded trial doesn't have final result but WebUI shows ×××(FINAL) -Issue #1207
    • (Bug fix) IT for nnictl stop -Issue #1298
    • (Bug fix) Fix security warning
    • (Bug fix) Hyper-parameter page broken -Issue #1332
    • (Bug fix) Run flake8 tests to find Python syntax errors and undefined names -PR #1217
  • v0.9(Jul 1, 2019)

    Release 0.9 - 7/1/2019

    Major Features

    • General NAS programming interface

      • Add enas-mode and oneshot-mode for NAS interface: PR #1201
    • Gaussian Process Tuner with Matern kernel

    • Multiphase experiment supports

      • Added new training service support for multiphase experiment: PAI mode supports multiphase experiment since v0.9.
      • Added multiphase capability for the following builtin tuners:
        • TPE, Random Search, Anneal, Naïve Evolution, SMAC, Network Morphism, Metis Tuner.

      For details, please refer to Write a tuner that leverages multi-phase

    • Web Portal

    • Commandline Interface

      • nnictl experiment delete: delete one or all experiments, including their logs, results, environment information, and cache. Use it to remove useless experiment results or to save disk space.
      • nnictl platform clean: clean up disk on a target platform. The provided YAML file includes the information of the target platform and follows the same schema as the NNI configuration file.

    Bug fix and other changes

    • Tuner Installation Improvements: add sklearn to nni dependencies.
    • (Bug Fix) Failed to connect to PAI http code - Issue #1076
    • (Bug Fix) Validate file name for PAI platform - Issue #1164
    • (Bug Fix) Update GMM evaluation in Metis Tuner
    • (Bug Fix) Negative time number rendering in Web Portal - Issue #1182, Issue #1185
    • (Bug Fix) Hyper-parameter not shown correctly in WebUI when there is only one hyper parameter - Issue #1192
  • v0.8(Jun 5, 2019)

    Release 0.8 - 6/4/2019

    Major Features

    • Support NNI on Windows for PAI/Remote mode

      • NNI running on Windows for remote mode

      • NNI running on Windows for PAI mode

    • Advanced features for using GPU

      • Run multiple trial jobs on the same GPU for local and remote mode

      • Run trial jobs on the GPU running non-NNI jobs

    • Kubeflow v1beta2 operator

      • Support Kubeflow TFJob/PyTorchJob v1beta2
    • General NAS programming interface

      • Provide NAS programming interface for users to easily express their neural architecture search space through NNI annotation

      • Provide a new command nnictl trial codegen for debugging the NAS code

      • Tutorial of NAS programming interface, example of NAS on mnist, customized random tuner for NAS

    • Support resume tuner/advisor's state for experiment resume

      • For experiment resume, tuner/advisor will be resumed by replaying finished trial data
    • Web Portal

      • Improve the design of copying trial's parameters

      • Support 'randint' type in hyper-parameter graph

      • Use shouldComponentUpdate to avoid unnecessary renders

    Bug fix and other changes

  • v0.7(Apr 29, 2019)

    Release 0.7 - 4/29/2019

    Major Features

    • Support NNI on Windows
      • NNI running on Windows for local mode
    • New advisor: BOHB
      • Support a new advisor BOHB, a robust and efficient hyperparameter tuning algorithm that combines the advantages of Bayesian optimization and Hyperband
    • Support import and export experiment data through nnictl
      • Generate analysis results report after the experiment execution
      • Support import data to tuner and advisor for tuning
    • Designated gpu devices for NNI trial jobs
      • Specify GPU devices for NNI trial jobs with the gpuIndices configuration; if gpuIndices is set in the experiment configuration file, only the specified GPU devices are used for NNI trial jobs.
    • Web Portal enhancement
      • Decimal format of metrics other than default on the Web UI
      • Hints in WebUI about Multi-phase
      • Enable copy/paste for hyperparameters as python dict
      • Enable early stopped trials data for tuners.
    • NNICTL provide better error message
      • nnictl provides more meaningful error messages for YAML file format errors
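    The gpuIndices setting described in this release might appear in an experiment configuration file as in the following excerpt (a hypothetical sketch; the surrounding fields are illustrative and may differ by NNI version):

```yaml
# Hypothetical config excerpt: restrict trial jobs to GPUs 0 and 1
trialConcurrency: 2
localConfig:
  gpuIndices: "0,1"
```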

    Bug fix

    • Unable to kill all Python threads after nnictl stop in async dispatcher mode
    • nnictl --version does not work with make dev-install
    • All trial jobs stay in 'waiting' status for a long time on the PAI platform
  • v0.6(Apr 2, 2019)

    Release 0.6 - 4/2/2019

    Major Features

    • Version checking
      • Check whether the version is consistent between nniManager and trialKeeper
    • Report final metrics for early stop job
      • If includeIntermediateResults is true, the last intermediate result of a trial that is early-stopped by the assessor is sent to the tuner as the final result. The default value of includeIntermediateResults is false.
    • Separate Tuner/Assessor
      • Add two pipes to separate message receiving channels for tuner and assessor.
    • Make log collection feature configurable
    • Add intermediate result graph for all trials
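    For example, includeIntermediateResults would be enabled in the experiment configuration like this (a hypothetical excerpt; only the flag itself comes from the release note):

```yaml
# Hypothetical config excerpt: forward the last intermediate result of
# an early-stopped trial to the tuner as its final result
includeIntermediateResults: true
```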

    Bug fix

    • Add shmMB config key for PAI
    • Fix the bug that no result is shown if the metric is a dict
    • Fix the number calculation issue for float types in hyperband
    • Fix a bug in the search space conversion in SMAC tuner
    • Fix the WebUI issue when parsing experiment.json with illegal format
    • Fix cold start issue in Metis Tuner
  • v0.5.2.1(Mar 4, 2019)

  • v0.5.2(Mar 4, 2019)

    Release 0.5.2 - 3/4/2019

    Improvements

    • Curve fitting assessor performance improvement.

    Documentation

    • Chinese version document: https://nni.readthedocs.io/zh/latest/
    • Debuggability/serviceability document: https://nni.readthedocs.io/en/latest/Tutorial/HowToDebug.html
    • Tuner assessor reference: https://nni.readthedocs.io/en/latest/sdk_reference.html#tuner

    Bug Fixes and Other Changes

    • Fix a race condition bug that does not store trial job cancel status correctly.
    • Fix search space parsing error when using SMAC tuner.
    • Fix cifar10 example broken pipe issue.
    • Add unit test cases for nnimanager and local training service.
    • Add integration test azure pipelines for remote machine, PAI and kubeflow training services.
    • Support Pylon in PAI webhdfs client.
  • v0.5.1(Jan 31, 2019)

    Release 0.5.1 - 1/31/2019

    Improvements

    Documentation

    • Reorganized documentation & New Homepage Released: https://nni.readthedocs.io/en/latest/
    • Chinese users can learn NNI with the translated Chinese doc: https://github.com/microsoft/nni/blob/master/README_zh_CN.md. Dear contributors: we'd love to provide more language translations; contribute to NNI in more languages =)

    Bug Fixes and Other Changes

    • Fix the bug of installation in python virtualenv, and refactor the installation logic
    • Fix the bug of HDFS access failure on PAI mode after PAI is upgraded.
    • Fix the bug that sometimes in-place flushed stdout makes the experiment crash
Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
Model search is a framework that implements AutoML algorithms for model architecture search at scale

Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale. It aims to help researchers speed up their exploration process for finding the right model architecture for their classification problems (i.e., DNNs with different types of layers).

Google 3.2k Dec 31, 2022
MMRazor: a model compression toolkit for model slimming and AutoML

Documentation: https://mmrazor.readthedocs.io/ English | 简体中文 Introduction MMRazor is a model compression toolkit for model slimming and AutoML, which

OpenMMLab 899 Jan 2, 2023
An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise Weight Sharing) by Sensetime Research.

An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise Weight Sharing) by Sensetime Research.

null 45 Dec 8, 2022
Hyper-parameter optimization for sklearn

hyperopt-sklearn Hyperopt-sklearn is Hyperopt-based model selection among machine learning algorithms in scikit-learn. See how to use hyperopt-sklearn

null 1.4k Jan 1, 2023
NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

NVIDIA Merlin NVIDIA Merlin is an open source library designed to accelerate recommender systems on NVIDIA’s GPUs. It enables data scientists, machine

null 419 Jan 3, 2023
An image compression simulator that uses Source Extractor and Monte Carlo methods to examine the post-compression effects of different compression algorithms.

James Park 1 Dec 11, 2021
Code for the paper "Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?"

null 39 Dec 17, 2022
Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation

A Comprehensive Experimental Evaluation for Database Configuration Tuning. This is the source code for the paper "Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation".

DAIR Lab 9 Oct 29, 2022
PaddleRobotics is an open-source algorithm library for robots based on Paddle, including open-source parts such as human-robot interaction, complex motion control, environment perception, SLAM positioning, and navigation.

PaddleRobotics is an open-source algorithm library for robots based on Paddle, including open-source components for human-robot interaction, complex motion control, environment perception, and SLAM localization and navigation. Human-robot interaction: TFVT-HRI is a proactive multimodal interaction technique that drives the robot from vision, speech, and touch-sensor input.

null 185 Dec 26, 2022
Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

T-Few: this repository contains the official code for the paper "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning".

null 220 Dec 31, 2022
A "gym" style toolkit for building lightweight Neural Architecture Search systems

A "gym" style toolkit for building lightweight Neural Architecture Search systems

Jack Turner 12 Nov 5, 2022
Black-Box-Tuning - Black-Box Tuning for Language-Model-as-a-Service

Black-Box-Tuning: source code for the paper "Black-Box Tuning for Language-Model-as-a-Service".

Tianxiang Sun 149 Jan 4, 2023
Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme (NeurIPS2021)

Shaojie Li 34 Mar 31, 2022
The Power of Scale for Parameter-Efficient Prompt Tuning

Implementation of soft embeddings from https://arxiv.org/abs/2104.08691v1 using PyTorch.

Kip Parker 208 Dec 30, 2022
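The core idea behind the soft-embedding implementations above can be sketched in a few lines: a small set of learned "soft prompt" vectors is prepended to the input token embeddings, and only those vectors are trained while the language model stays frozen. A minimal pure-Python sketch (function and variable names are illustrative, not the repository's API):

```python
def prepend_soft_prompt(prompt_embeddings, token_embeddings):
    """Prepend trainable soft-prompt vectors to a sequence of token embeddings.

    prompt_embeddings: list of n_prompt vectors (each a list of floats, length d)
    token_embeddings:  list of seq_len vectors with the same dimensionality d
    Returns a sequence of length n_prompt + seq_len that is fed to the frozen model.
    """
    d = len(token_embeddings[0])
    assert all(len(v) == d for v in prompt_embeddings), "dimension mismatch"
    return prompt_embeddings + token_embeddings

# Two soft-prompt vectors prepended to a three-token input (d = 4).
prompt = [[0.1] * 4, [0.2] * 4]
tokens = [[1.0] * 4, [2.0] * 4, [3.0] * 4]
combined = prepend_soft_prompt(prompt, tokens)  # length 5 sequence
```

In the actual paper's setup, the prompt vectors would be framework tensors with gradients enabled, while the model's own embedding table is kept frozen; this sketch only shows the sequence construction step.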
Implementation of "The Power of Scale for Parameter-Efficient Prompt Tuning"

Prompt-Tuning: implementation of "The Power of Scale for Parameter-Efficient Prompt Tuning". Currently supports the following Hugging Face models: BART

Andrew Zeng 36 Dec 19, 2022
Densely Connected Search Space for More Flexible Neural Architecture Search (CVPR2020)

DenseNAS: the code of the CVPR 2020 paper "Densely Connected Search Space for More Flexible Neural Architecture Search".

Jamin Fong 291 Nov 18, 2022