ClearML - Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, MLOps and Data-Management

Overview



ClearML

Formerly known as Allegro Trains

ClearML is an ML/DL development and production suite. It contains three main modules:

  • Experiment Manager - Automagical experiment tracking, environments and results
  • ML-Ops - Automation, Pipelines & Orchestration solution for ML/DL jobs (K8s / Cloud / bare-metal)
  • Data-Management - Fully differentiable data management & version control solution on top of object-storage (S3/GS/Azure/NAS)
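
As a quick taste of the Data-Management module, here is a minimal sketch of its Python interface (the project/dataset names and the local data/ folder are illustrative, and credentials are assumed to be configured via clearml-init):

    from clearml import Dataset

    # create a new dataset version and stage local files
    ds = Dataset.create(dataset_project='examples', dataset_name='my dataset')
    ds.add_files('data/')   # illustrative local folder
    ds.upload()             # push files to the configured storage backend
    ds.finalize()           # close this dataset version

    # later, from any machine, fetch a cached local copy
    local_path = Dataset.get(dataset_project='examples', dataset_name='my dataset').get_local_copy()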

Instrumenting these components is the ClearML-server; see Self-Hosting & Free tier Hosting.


Sign up & Start using in under 2 minutes


ClearML Experiment Manager

Adding only 2 lines to your code gets you the following:

  • Complete experiment setup log
    • Full source control info including non-committed local changes
    • Execution environment (including specific packages & versions)
    • Hyper-parameters
      • ArgParser/Click for command line parameters with currently used values
      • Explicit parameters dictionary
      • Tensorflow Defines (absl-py)
      • Hydra configuration and overrides
    • Initial model weights file
  • Full experiment output automatic capture
    • stdout and stderr
    • Resource Monitoring (CPU/GPU utilization, temperature, IO, network, etc.)
    • Model snapshots (With optional automatic upload to central storage: Shared folder, S3, GS, Azure, Http)
    • Artifacts log & store (Shared folder, S3, GS, Azure, Http)
    • Tensorboard/TensorboardX scalars, metrics, histograms, images, audio and video samples
    • Matplotlib & Seaborn
    • ClearML Logger interface for complete flexibility (see the sketch after this list).
  • Extensive platform support and integrations

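For anything the automatic capture misses, the ClearML Logger interface noted in the list above supports explicit reporting. A minimal sketch (titles, series, and values are illustrative):

    from clearml import Task

    task = Task.init(project_name='examples', task_name='logger demo')
    logger = task.get_logger()

    # explicit scalar reporting, one point per iteration
    for i in range(10):
        logger.report_scalar(title='loss', series='train', value=1.0 / (i + 1), iteration=i)

    # free-form text reporting
    logger.report_text('training finished')
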
Start using ClearML

  1. Sign up for free to the ClearML Hosted Service (alternatively, you can set up your own server, see here).

    ClearML Demo Server: ClearML no longer uses the demo server by default. To enable the demo server, set the CLEARML_NO_DEFAULT_SERVER=0 environment variable. Credentials aren't needed, but experiments launched to the demo server are public, so make sure not to launch sensitive experiments if using the demo server.

  2. Install the clearml python package:

    pip install clearml
  3. Connect the ClearML SDK to the server by creating credentials, then execute the command below and follow the instructions:

    clearml-init
  4. Add two lines to your code:

    from clearml import Task
    task = Task.init(project_name='examples', task_name='hello world')

And you are done! Everything your process outputs is now automagically logged into ClearML.
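
Beyond the automatic capture, an explicit parameters dictionary can be connected to the task so it shows up, and can be overridden, in the UI. A minimal sketch (parameter names and values are illustrative):

    from clearml import Task

    task = Task.init(project_name='examples', task_name='hello world')

    # connect a parameters dictionary; on remote reruns the returned dict
    # reflects any values edited in the UI
    params = {'batch_size': 64, 'lr': 0.001}
    params = task.connect(params)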

Next step, automation! Learn more about ClearML's two-click automation here.

ClearML Architecture

The ClearML run-time components:

  • The ClearML Python Package for integrating ClearML into your existing scripts by adding just two lines of code, and optionally extending your experiments and other workflows with ClearML's powerful and versatile set of classes and methods.
  • The ClearML Server storing experiment, model, and workflow data, and supporting the Web UI experiment manager and ML-Ops automation for reproducibility and tuning. It is available as a hosted service, and as open source for you to deploy your own ClearML Server.
  • The ClearML Agent for ML-Ops orchestration, experiment and workflow reproducibility, and scalability.

[clearml-architecture diagram]

Additional Modules

  • clearml-session - Launch remote JupyterLab / VSCode-server inside any docker, on Cloud/On-Prem machines
  • clearml-task - Run any codebase on remote machines with full remote logging of Tensorboard, Matplotlib & Console outputs
  • clearml-data - CLI for managing and versioning your datasets, including creating / uploading / downloading of data from S3/GS/Azure/NAS
  • AWS Auto-Scaler - Automatically spin up EC2 instances based on your workload, with a preconfigured budget! No need for K8s!
  • Hyper-Parameter Optimization - Optimize any code with a black-box approach and state-of-the-art Bayesian optimization algorithms (see the sketch after this list)
  • Automation Pipeline - Build pipelines based on existing experiments / jobs, supports building pipelines of pipelines!
  • Slack Integration - Report experiments progress / failure directly to Slack (fully customizable!)
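
For the Hyper-Parameter Optimization module mentioned above, here is a minimal sketch of the Python interface. It assumes a finished experiment whose ID stands in for '<base_task_id>', an agent listening on the 'default' queue, and a base task reporting a 'validation'/'accuracy' scalar; the parameter name and range are illustrative:

    from clearml import Task
    from clearml.automation import HyperParameterOptimizer, UniformParameterRange
    from clearml.automation.optuna import OptimizerOptuna

    task = Task.init(project_name='examples', task_name='hpo controller',
                     task_type=Task.TaskTypes.optimizer)

    optimizer = HyperParameterOptimizer(
        base_task_id='<base_task_id>',  # the experiment to clone and mutate
        hyper_parameters=[
            UniformParameterRange('General/lr', min_value=1e-4, max_value=1e-1),
        ],
        objective_metric_title='validation',
        objective_metric_series='accuracy',
        objective_metric_sign='max',
        optimizer_class=OptimizerOptuna,
        execution_queue='default',
    )
    optimizer.start()
    optimizer.wait()   # block until the optimization completes
    optimizer.stop()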

Why ClearML?

ClearML is our solution to a problem we share with countless other researchers and developers in the machine learning/deep learning universe: Training production-grade deep learning models is a glorious but messy process. ClearML tracks and controls the process by associating code version control, research projects, performance metrics, and model provenance.

We designed ClearML specifically to require effortless integration so that teams can preserve their existing methods and practices.

  • Use it on a daily basis to boost collaboration and visibility in your team
  • Create a remote job from any experiment with a click of a button
  • Automate processes and create pipelines to collect your experimentation logs, outputs, and data
  • Store all your data on any object-storage solution, with the simplest interface possible
  • Make your data transparent by cataloging it all on the ClearML platform

We believe ClearML is ground-breaking. We wish to establish new standards of truly seamless integration between experiment management, ML-Ops, and data management.

Who We Are

ClearML is supported by the team behind allegro.ai, where we build deep learning pipelines and infrastructure for enterprise companies.

We built ClearML to track and control the glorious but messy process of training production-grade deep learning models. We are committed to vigorously supporting and expanding the capabilities of ClearML.

We promise to always be backward compatible, making sure all your logs, data, and pipelines will always upgrade with you.

License

Apache License, Version 2.0 (see the LICENSE for more information)

Documentation, Community & Support

More information in the official documentation and on YouTube.

For examples and use cases, check the examples folder and corresponding documentation.

If you have any questions: post on our Slack Channel, or tag your questions on stackoverflow with the 'clearml' tag (previously the 'trains' tag).

For feature requests or bug reports, please use GitHub issues.

Additionally, you can always find us at [email protected]

Contributing

PRs are always welcome ❤️ See more details in the ClearML Guidelines for Contributing.

May the force (and the goddess of learning rates) be with you!

Comments
  • trains causes ETA for epoch to be 3 times slower

    Hey, we are using trains with tf 2.3.0 and tf.keras. We notice that with trains, the ETA for a single epoch is about 7.5 hours:

    with trains: ETA: 7:46:59
    without trains: ETA: 2:14:54

    Any ideas/solution for this trains bottleneck?

    Thanks, Alon

    EDIT:

    trains server version is 0.13, trains package version (trains.__version__) is 0.16.1

    opened by oak-tree 28
  • Scrolling log problem when using tqdm as training progress bar

    I was using tqdm to show training progress in the command line, but when I tried to use Trains together with it, some scrolling problems occurred.

    Log in Trains: [screenshot]

    Log in the command line: [screenshot]

    opened by huihui-v 27
  • remote execution with hydra

    Hello trains team,

    I'm trying to use trains and Hydra. Hydra is nice for managing configuration and helps the user build one from the command line with auto-completion and preset features.

    Nevertheless, I'm struggling with remote execution. As explained here, Hydra changes the working dir, which defeats the trains script information detector. At runtime, Hydra creates a directory to compose the configuration and sets the working dir inside that untracked directory. If we try to remotely execute such a task, trains-agent complains that the working dir does not exist.

    I came up with two unsatisfactory partial solutions, demonstrated here.

    First, the defective script: the trains task is created inside Hydra's main; working_dir and entrypoint are wrong.

    First fix attempt: the trains task is created before entering Hydra's main function. The script info is indeed correct, but I'd like to modify the task name and project according to the application config. I did not find a way to set the project name once the task is created. Anyway, I don't find this solution elegant.

    Second fix attempt: the task is created inside Hydra's context and I try to update the working_dir and entrypoint. I tested this solution in my closed-source application. It looks very fragile, as the script info is filled by a trains thread and I didn't find a way to properly synchronize with it, although this minimal example seems to work.

    Any suggestions?

    opened by elinep 26
  • [Feature] Support individual scalar reporting without plots

    When using logger.report_scalar, a plot is automatically generated to accommodate a time-series axis. There is currently no way for the user to report a single named scalar (that is not a time series) and have it both aesthetically pleasing and visible enough in the WebUI. Using iteration=0 just wastes space by creating a scatter plot with a single datum.

    It would be great to have e.g. logger.report_scalar(name="MAE", value=mae), and have it visible as a table (or similar) with e.g.:

    | Scalar name        | value                       |
    |--------------------|-----------------------------|
    | MAE                | 0.123                       |
    | NRMSE              | 0.4                         |
    | My favorite scalar | and it's actually a string  |

    Even better, once these are in place, they can automatically be aligned between multiple experiments for comparison.
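
    For reference, the v1.5.0 release notes further down list "Add support for single value metric reporting (#400)". A minimal sketch of reporting single values through the Logger, assuming the report_single_value() call that shipped for this request (values are illustrative):

    from clearml import Task

    task = Task.init(project_name='examples', task_name='single values')
    logger = task.get_logger()
    logger.report_single_value(name='MAE', value=0.123)
    logger.report_single_value(name='NRMSE', value=0.4)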

    Feature Request 
    opened by idantene 23
  • Auto logging/scalar detection stopped working in new version

    I had previously been using clearml version 0.17.4 with pytorch and pytorch lightning, both at the latest release. I recently updated clearml to the latest version, 0.17.5. However, it appears that the auto logging/iteration detection does not work now. Before, the iteration and scalar detection (via the tensorboard file) would be picked up automatically. Now it only works once in a while or not at all, usually reporting no scalars and no iterations.

    My script does a lot of caching up front before iterations actually start (it takes maybe 3 minutes), so perhaps this is part of the issue. I can't share the code, but other than that it is standard pytorch/pytorch lightning.

    Reverting to 0.17.4 seems to fix the issue.

    opened by ndalton12 23
  • It seems *api_server* is misconfigured.

    After setting up the trains Docker image together with the trains package locally, I keep getting issues when trying to connect to the trains API server.

    Specifically, after running docker-compose, the following error message can be found in the console:

    trains-apiserver | raise ConnectionError(e, request=request)
    trains-apiserver | requests.exceptions.ConnectionError: HTTPSConnectionPool(host='updates.trains.allegro.ai', port=443): Max retries exceeded with url: /updates (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f390d528da0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

    When I import Task from trains there is no error, until I run this:

    task = Task.init(project_name="my project", task_name="my task")

    Resulting in the following error message:

    InsecureRequestWarning: Certificate verification is disabled! Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings

    Traceback (most recent call last):
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_api\session\session.py", line 545, in _do_refresh_token
        return resp["data"]["token"]
    KeyError: 'data'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\task.py", line 260, in init
        reuse_last_task_id,
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\task.py", line 1009, in _create_dev_task
        log_to_backend=True,
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\task.py", line 112, in __init__
        super(Task, self).__init__(**kwargs)
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_interface\task\task.py", line 81, in __init__
        super(Task, self).__init__(id=task_id, session=session, log=log)
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_interface\base.py", line 129, in __init__
        super(IdObjectBase, self).__init__(session, log, **kwargs)
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_interface\base.py", line 34, in __init__
        self._session = session or self._get_default_session()
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_interface\base.py", line 103, in _get_default_session
        secret_key=ENV_SECRET_KEY.get(),
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_api\session\session.py", line 144, in __init__
        self.refresh_token()
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_api\session\token_manager.py", line 95, in refresh_token
        self._set_token(self._do_refresh_token(self.__token, exp=self.req_token_expiration_sec))
      File "C:\ProgramData\Miniconda3\envs\jurica\lib\site-packages\trains\backend_api\session\session.py", line 552, in _do_refresh_token
        'Is this the TRAINS API server {} ?'.format(self.get_api_server_host()))
    ValueError: It seems api_server is misconfigured. Is this the TRAINS API server http://localhost:8008 ?

    I have set up the trains.conf file to contain the following:

    # TRAINS SDK configuration file
    api {
        # API server on port 8008
        api_server: "http://localhost:8008"

        # web_server on port 8080
        web_server: "http://localhost:8080"

        # file server on port 8081
        files_server: "http://localhost:8081"
        verify_certificate: False
    }

    As I am behind a corporate proxy, I set that up and it should be working fine. Is the issue on my side, maybe due to DNS, or is there another explanation? Much appreciated!

    opened by Zeko403 22
  • Sub process logger

    Hi,

    Here is the description of the problem:

    I have process A which creates a logger using this command:

    task = Task.init(project_name=args.clearml_proj_base + "/training",
                     task_name=args.clearml_task,
                     tags=[args.loss, 'patch size' + str(args.patch_size),
                           str(args.num_features) + '' + str(args.max_features) + '' + str(args.unet_depth),
                           'two channels', 'lr_default'],
                     continue_last_task=False)
    logger = task.get_logger()

    Then this process submits a new job/process B (for inference) in our cluster, and this job runs on a different computer. The new job creates a logger using:

    task = Task.init(project_name=project_name, task_name=task_name)

    or

    task = Task.init(project_name=project_name, task_name=task_name, continue_last_task=False, reuse_last_task_id=False)

    or

    task = Task.init(project_name=project_name, task_name=task_name, continue_last_task=False)

    with a different project_name and task_name.

    The problem is that all of B's logs are created in the A task. Also, there is no entry for the new project_name/task_name.

    Thanks, Ophir Azulai IBM Research AI

    opened by ophirazulai 21
  • Bug fix: Auto-scaler should not spin a new instance with task_id = None

    Hi, after a discussion in the clearml-community Slack channel, I figured out the solution and decided to make this PR. Thanks in advance for your feedback.

    opened by tienduccao 20
  • ClearML is not saving scalars/images when using Tensorflow Object detection API - TF2.2

    This issue is related to this thread: https://clearml.slack.com/archives/CTK20V944/p1610457717141400

    To reproduce:

    git clone https://github.com/glemarivero/raccoon_dataset.git

    Set up a virtualenv and run sh autoinstall.sh, then generate tfrecords:

    python generate_tfrecord.py --output_path images.tfrecord --csv_input raccoon_labels.csv --image_dir images

    Run training:

    python model_main_tf2.py --model_dir=models/ --pipeline_config_path=pipeline.config

    opened by glemarivero 20
  • Task.init() registers main process on all available GPUs

    The issue

    When I run experiments I set CUDA_VISIBLE_DEVICES to some integer to only make that device available to the main process (as is common). I can verify that this is in fact the case with sudo fuser -v /dev/nvidia* which shows that a single process has been created on the single device I chose.

    However, I observe that a subsequent call to Task.init() in the python script somehow overrides this and "registers" the main process on all GPU devices of the node. This can be seen by inspecting sudo fuser -v /dev/nvidia* after the call to Task.init(). The original process ID, registered on the device initially chosen with CUDA_VISIBLE_DEVICES, is now registered on all GPU devices on the node:

    /dev/nvidia0:        jdh        2448 F.... python
    /dev/nvidia1:        je          315 F...m python3
                         jdh        2448 F.... python
    /dev/nvidia2:        jdh        2448 F.... python
    

    I can only see this process on the other devices when I use sudo fuser, but not with gpustat or nvidia-smi. I also cannot see any memory being allocated on the other devices.

    I’ve verified that CUDA_VISIBLE_DEVICES doesn’t get changed during the Task.init call or anywhere else during the script.

    I'm running in manual mode and I only see one GPU tracked in the resource monitoring. I'm using trains 0.16.4.

    Reproducing

    You can reproduce simply by taking the ClearML PyTorch MNIST example https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py.

    To clearly see it happening, it's easiest if you get the GPU allocated before calling task = Task.init(…); to avoid crashing because you're missing the task variable, you can embed just before and after Task.init(…) using IPython. You also need the process ID of the main process to check against sudo fuser -v /dev/nvidia*.

    Summarizing, I move task = Task.init(…) to just before the for epoch in range(…) loop and replace it with

    import psutil
    current_process_pid = psutil.Process().pid
    print(current_process_pid)  # e.g. 12971
    import IPython; IPython.embed()
    task = Task.init(project_name='examples', task_name='pytorch mnist train')
    import IPython; IPython.embed()
    

    You can then run the example until it reaches the embed and check that the printed main process is only visible on your designated device. Then you can quit the embed to see Task.init cause the problem, after which you are waiting in the second embed. You can then quit that one to see training work fine.

    You can then try the whole thing again without Task.init, but you need to remove reporting in that case - otherwise you get

    Logger.current_logger().report_scalar(
    AttributeError: 'NoneType' object has no attribute 'report_scalar'
    

    I haven’t tested on any other versions than trains 0.16.4 so I don’t know if it happens in the new clearml package.

    opened by JakobHavtorn 20
  • Frozen worker with large num_workers

    Hi,

    I'm running a modified version of your example pytorch mnist train (attached below in a txt file). When I set the number of workers to a large number (e.g. 40) and save my model multiple times, the worker freezes after a random number of epochs, but always right after completing a model upload:

    2020-09-10 08:08:32    Test set: Average loss: 0.0268, Accuracy: 9915/10000 (99%)
    2020-09-10 08:08:33    2020-09-10 08:08:33,292 - trains.Task - INFO - Completed model upload to file:///data/trains/saved/examples/pytorch mnist train.20e2ad4c4f4247c89d7b7681a0c51d78/models/mnist_cnn.pt
    2020-09-10 08:08:34    Train Epoch: 13 [0/60000 (0%)] Loss: 0.019223
    

    It happens both when running from trains-agent and from the terminal.

    I'm using trains==0.16.1, trains-agent==0.16.0, and Python 3.6. I'm not using docker mode.

    Any idea why?

    trains_mnist.txt

    opened by nirraviv 20
  • Add support for Pipenv as requirements specification

    Proposal Summary

    Add support for Pipenv as requirements specification in addition to requirements.txt.

    Motivation

    Pipenv is an officially recommended tool for providing fully-specified application-specific dependencies.

    One of the main benefits of using Pipenv is that it creates a Pipfile and a Pipfile.lock, which allow you to specify all of your project's dependencies in a single place, rather than using separate requirements.txt files for different environments (e.g., production, staging, development). The Pipfile.lock file ensures that the exact versions of each package are installed, which can help prevent version conflicts. Especially the latter would be beneficial in a ClearML setup.

    opened by cajewsa 1
  • Azure storage uploading not working with SDK 1.9.0

    Describe the bug

    Azure storage upload does not work with 1.9.0, even though it works with 1.8.3.

    To reproduce

    Using AzureContainerConfig to set up output_uri:

    def get_storage_uri() -> str:
        # Set up logging to Azure
        bucket_config = AzureContainerConfig(
            account_name="💩",  # type: ignore
            account_key="💩",  # type: ignore
            container_name="💩",  # type: ignore
        )
        StorageHelper.add_azure_configuration(bucket_config)
        return StorageHelper.get_azure_storage_uri_from_config(bucket_config)
    
    
    def init_experiment_tracker(config: DictConfig) -> Task:
        """Initialize the experiment tracker and return the task and
        (possibly modified) configuration object."""
    
        # Initialize the task
        storage_uri = get_storage_uri()
        task: Task = Task.init(
            project_name=config.clearml.project_name,
            task_name=config.clearml.task_name,
            task_type=Task.TaskTypes.training,
            reuse_last_task_id=False,
            output_uri=storage_uri,
        )
        task.logger.set_default_upload_destination(storage_uri)
        return task
    

    Expected behaviour

    This should upload all output and artifacts to Azure, instead of default file server. It works flawlessly on my setup with clearml==1.8.* but when upgrading to clearml==1.9.* it fails with the error:

    Traceback (most recent call last):
      File "/home/azureuser/.local/share/virtualenvs/daily-representation-F_rNT0sY/lib/python3.10/site-packages/clearml/storage/helper.py", line 999, in check_write_permissions
        self.delete(path=dest_path)
      File "/home/azureuser/.local/share/virtualenvs/daily-representation-F_rNT0sY/lib/python3.10/site-packages/clearml/storage/helper.py", line 984, in delete
        return self._driver.delete_object(self.get_object(path))
      File "/home/azureuser/.local/share/virtualenvs/daily-representation-F_rNT0sY/lib/python3.10/site-packages/clearml/storage/helper.py", line 2230, in delete_object
        container = object.container
    AttributeError: 'NoneType' object has no attribute 'container'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/home/azureuser/.local/share/virtualenvs/daily-representation-F_rNT0sY/lib/python3.10/site-packages/clearml/binding/hydra_bind.py", line 173, in _patched_task_function
        return task_function(a_config, *a_args, **a_kwargs)
      File "/home/azureuser/daily-representation/src/train_vae.py", line 20, in main
        task = init_experiment_tracker(cfg)
      File "/home/azureuser/daily-representation/src/core/utils/hydra.py", line 48, in init_experiment_tracker
        task: Task = Task.init(
      File "/home/azureuser/.local/share/virtualenvs/daily-representation-F_rNT0sY/lib/python3.10/site-packages/clearml/task.py", line 620, in init
        task.output_uri = output_uri
      File "/home/azureuser/.local/share/virtualenvs/daily-representation-F_rNT0sY/lib/python3.10/site-packages/clearml/task.py", line 1135, in output_uri
        helper.check_write_permissions(value)
      File "/home/azureuser/.local/share/virtualenvs/daily-representation-F_rNT0sY/lib/python3.10/site-packages/clearml/storage/helper.py", line 1001, in check_write_permissions
        raise ValueError('Insufficient permissions (delete failed) for {}'.format(base_url))
    ValueError: Insufficient permissions (delete failed) for azure://💩.blob.core.windows.net/💩
    

    Environment

    • Server type: self hosted
    • ClearML SDK: 1.8.3 vs. 1.9.0
    • ClearML Server Version: 1.7.0-232
    • Python Version: 3.10.6
    • OS: Linux

    Related Discussion

    Likely related to https://github.com/allegroai/clearml/issues/647. Obviously, 💩 are not the real configuration values :-)

    bug 
    opened by cajewsa 0
  • CreateAndPopulate fails to recognize clearml in a requirements.txt if supplied with extras

    Describe the bug

    CreateAndPopulate fails to recognize clearml in a requirements.txt if supplied with extras. This leads to clearml overriding my "clearml" dependency, and thus throwing away the version requirements.

    To reproduce

    Make a requirements.txt with e.g. clearml[azure]==1.7.* and create a Task using Task.create. When creating the task, the "installed packages" section shows a new clearml entry added to the list:

    clearml[azure]==1.7.*
    clearml
    

    which will end up taking precedence, and ignoring the version specifier.

    Expected behaviour

    I expect no modifications of my requirements if clearml is specified, even with extras. The error lies within clearml.backend_interface.task.populate.CreateAndPopulate.create_task (line 245) (link), where a fix is likely one of:

    • if package.startswith("clearml"), or
    • package = reduce(lambda a, b: a.split(b)[0], "#;@=~<>", line).strip() adding [ and ] as splitters
    • matching the extras pattern (clearml[*]) more explicitly with a regex.

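    To illustrate the last option, a minimal sketch of such a regex; the exact pattern is a suggestion, not the actual fix:

    import re

    # match 'clearml' plus optional extras and an optional version specifier,
    # e.g. 'clearml' or 'clearml[azure]==1.7.*', but not 'clearml-agent'
    CLEARML_RE = re.compile(r'^clearml(\[[^\]]*\])?\s*([=~<>!;@#].*)?$')

    for line in ('clearml[azure]==1.7.*', 'clearml', 'clearml-agent'):
        print(line, bool(CLEARML_RE.match(line)))
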
    Environment

    • Server type: self hosted
    • ClearML SDK: 1.7.2
    • ClearML Server Version: 1.7.0-232
    • Python Version: 3.10.6
    • OS: Linux
    bug 
    opened by cajewsa 2
  • Task.get_task does not work with offline tasks when using project names

    Describe the bug

    Task.get_task() does not find offline tasks if one uses a project name (offline tasks can only be found by their task IDs).

    Expected behaviour

    Tasks should be found first online, if possible, then offline if not available online.

    bug 
    opened by idantene 0
  • StorageManager.list() with with_metadata does not return size on clearml==1.9.0

    Describe the bug

    Hi, it seems there is a bug: StorageManager.list() with with_metadata is not returning 'size'. I am using MinIO:

    ls_files = StorageManager.list(
                remote_url=remote_url,
                return_full_path=True,
                with_metadata=True
            )
    

    In version 1.8.4rc1 it returns the size, but with clearml==1.9.0 it returns None for 'size':

    version: 1.8.4rc1

    {
        'size': 55076,
        'name': 's3://minio_url:9000/x/x/x2edf173.jpg'
    }
    

    version: 1.9.0

    {
        'size': None,
        'name': 's3://minio_url:9000/x/x/x2edf173.jpg'
    }
    

    Environment

    • Server type (self hosted \ app.clear.ml)
    • ClearML SDK Version: 1.9.0
    • ClearML Server Version: WebApp: 1.8.0-254 • Server: 1.8.0-254 • API: 2.22
    • Python Version: 3.10.4
    • OS Linux

    Related Discussion

    https://clearml.slack.com/archives/CTK20V944/p1672745062907479

    bug 
    opened by muhammadAgfian96 1
  • Error when querying experiments from UI

    Describe the bug

    I'm using a self-hosted version of ClearML. When querying experiments from the UI, I get this error:

    Fetch experiments for selection failed
    Error 1 : Internal server error: err=multiplanner encountered a failure while selecting best plan :: caused by :: Sort exceeded memory limit of 196100200 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in., full error: {'ok': 0.0, 'errmsg': 'multiplanner encountered a failure while selecting best plan :: caused by :: Sort exceeded memory limit of 196100200 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.', 'code': 292, 'codeName': 'QueryExceededMemoryLimitNoDiskUseAllowed'}
    

    To reproduce

    Hard to say. In the beginning it worked correctly, but after tasks were registered in the system it started occurring. To select the tasks, click "select all" (the triangle to the left of EXPERIMENTS LIST).

    Expected behaviour

    Tasks should be selected without a problem.

    Environment

    • Server type: self hosted
    • ClearML Server Version: 1.7.0-232
    • OS (Windows \ Linux \ Macos)

    Related Discussion

    https://clearml.slack.com/archives/CTK20V944/p1672140983605599

    bug 
    opened by brooda 3
Releases
  • v1.9.0(Dec 23, 2022)

    New Features and Improvements

    • Add r prefix to re.match() strings (#837, thanks @daugihao!)
    • Add path_substitution to clearml.conf example file (#842)
    • Clarify deferred_init usage in Task.init() (#855)
    • Add pipeline decorator argument to control docker image (#856)
    • Add StorageManager.set_report_upload_chunk_size() and StorageManager.set_report_download_chunk_size() to set chunk size for upload and download
    • Add allow_archived argument in Task.get_tasks()
    • Support querying model metadata in Model.query_models()
    • Add Dataset.set_metadata() and Dataset.get_metadata()
    • Add delete_from_storage (default True) to Task.delete_artifacts()

    Bug Fixes

    • Fix jsonargparse and pytorch lightning integration broken for remote execution (#403)
    • Fix error when using TaskScheduler with 'limit_execution_time' (#648)
    • Fix dataset not synced if the changes are only modified files (#835, thanks @fjean!)
    • Fix StorageHelper.delete() does not respect path substitutions (#838)
    • Fix can't write more than 2 GB to a file
    • Fix StorageManager.get_file_size_bytes() returns ClientError instead of None for invalid S3 links
    • Fix Dataset lineage view is broken with multiple dataset dependencies
    • Fix tensorflow_macos support
    • Fix crash when calling task.flush(wait_for_uploads=True) while executing remotely
    • Fix None values get casted to empty strings when connecting a dictionary
    Source code(tar.gz)
    Source code(zip)
    clearml-1.9.0-py2.py3-none-any.whl(943.80 KB)
  • v1.8.3(Dec 4, 2022)

  • v1.8.2(Dec 1, 2022)

  • v1.8.1(Nov 21, 2022)

    New Features and Improvements

    • Raise error on failed uploads (#820, thanks @shpigi!)
    • Add hyperdataset examples (#823)
    • Change report_event_flush_threshold default to 100
    • Add ModelInfo.weights_object() for store callback access to the actual model object being stored (valid for both pre/post save calls, otherwise None)
    • Support num_workers in dataset operations
    • Support max connections setting for Azure storage using the sdk.azure.storage.max_connection configuration option

    Bug Fixes

    • Fix clearml logger default level cannot be changed (#741)
    • Fix Hydra does not use overridden information from ClearML (#751)
    • Fix StorageManager.list("s3://..", with_metadata=True) doesn't work
    • Fix ModelsList.keys() is missing
    • Fix CLEARML_DEFERRED_TASK_INIT=1 doesn't work
    • Fix default API method does not work when set in configuration
    Source code(tar.gz)
    Source code(zip)
    clearml-1.8.1-py2.py3-none-any.whl(935.82 KB)
  • v1.8.0(Nov 13, 2022)

    New Features and Improvements

    • Add tarfile member sanitization to extractall() (#803, thanks @TrellixVulnTeam!)
    • Add Task.delete_artifacts() with raise_on_errors argument (#806, thanks @frolovconst!)
    • Add CI/CD example (#815, thanks @thepycoder!)
    • Limit number of _serialize requests when adding list of links with add_external_files() (#813)
    • Add support for connecting Enum values as parameters
    • Improve CoLab integration (store entire colab, not history)
    • Add clearml.browser_login to authenticate browser online sessions such as CoLab, Jupyter Notebooks etc.
    • Remove import_bind from stack trace of import errors
    • Add sdk.development.worker.report_event_flush_threshold configuration option to control the number of events to trigger a report
    • Return stub object from Task.init() if no clearml.conf file is found
    • Improve manual model uploading example
    • Remove deprecated demo server

    Bug Fixes

    • Fix passing compression=ZIP_STORED (or 0) to Dataset.upload() uses ZIP_DEFLATED and overrides the user-supplied argument (#812, thanks @doronser!)
    • Fix unique_selector is not applied properly on batches after the first batch. Remove default selector value since it does not work for all event types (and we always specify it anyway)
    • Fix clearml-init colab detection
    • Fix cloning pipelines ran with start_locally() doesn't work
    • Fix if project has a default output uri there is no way to disable it in development mode (manual), allow passing output_uri=False to disable it
    • Fix git remote repository detection when remote is not "origin"
    • Fix reported images might not all be reported when waiting to complete the task
    • Fix Dataset.get_local_copy() deletes the source archive if it is stored locally
    • Fix too many parts will cause preview to inflate Task object beyond its 16MB limit - set a total limit of 320kbs
    • Fix media preview is created instead of a table preview
    • Fix task.update_output_model() should always upload local models to a remote server
    • Fix broken pip package might mess up requirements detection
    Source code(tar.gz)
    Source code(zip)
    clearml-1.8.0-py2.py3-none-any.whl(933.56 KB)
  • v1.7.2(Oct 23, 2022)

    New Features and Improvements

    • Support running jupyter notebook inside a git repository (repository will be referenced without uncommitted changes and the jupyter notebook will be stored as plain code in the uncommitted changes section)
    • Add jupyter notebook fail warning
    • Allow pipeline steps to return string paths without them being treated as a folder artifact and zipped (#780)
    • Remove future from Python 3 requirements

    Bug Fixes

    • Fix exception raised when using ThreadPool (#790)
    • Fix Pyplot/Matplotlib binding reports incorrect line labels and colors (#791)
    • Pipelines
      • Fix crash when running cloned pipeline that invokes a step twice (#770, related to #769, thanks @tonyd!)
      • Fix pipeline argument becomes None if default value is not set
      • Fix retry_on_failure callback does nothing when specified on PipelineController.add_step()
      • Fix pipeline clone logic
    • Jupyter Notebook
      • Fix support for multiple jupyter servers running on the same machine
      • Fix issue with old/new notebook packages installed
    • Fix local cache with access rules disabling partial local access
    • Fix Task.upload_artifact() fails uploading pandas DataFrame
    • Fix relative paths in examples (#787, thanks @mendrugory!)
    Source code(tar.gz)
    Source code(zip)
    clearml-1.7.2-py2.py3-none-any.whl(928.30 KB)
  • v1.7.1(Sep 30, 2022)

    New Features and Improvements

    • Add callback option for pipeline step retry

    Bug Fixes

    • Fix Python Fire binding
    • Fix Dataset failing to load helper packages should not crash
    • Fix Dataset.get_local_copy() is allowed for a non-finalized dataset
    • Fix Task.upload_artifact() does not upload empty lists/tuples
    • Fix pipeline retry mechanism interface
    • Fix Python <3.5 compatibility
    • Fix local cache warning (should be a debug message)
    Source code(tar.gz)
    Source code(zip)
    clearml-1.7.1-py2.py3-none-any.whl(923.79 KB)
  • v1.7.0(Sep 15, 2022)

    New Features and Improvements

    • ClearML Data: Support providing list of links
    • Upload artifacts with a custom serializer (#689)
    • Allow user to specify extension when using custom serializer functions (for artifacts)
    • Skip server URL verification in clearml-init wizard process
    • When calling Dataset.get() without the "alias" field, tell the user that an alias can be used to log it in the UI
    • Add mmcv support for logging models
    • Add support for Azure and GCP storage in Task.setup_upload()
    • Support pipeline retrying tasks which are failing on suspected non-stable failures
    • Better storage (AWS, GCP) internal load balancing and configurations
    • Add Task.register_abort_callback

    Bug Fixes

    • Allow getting datasets with non-semantic versioning (#776)
    • Fix interactive plots (instead of a generated png)
    • Fix Python 2.7 support
    • Fix clearml datasets list functionality
    • Fix Dataset.init() modifies task (moved to Dataset.create())
    • Fix failure with large files upload on HTTPS
    • Fix 3d plots created with plt showing as 2d plots on the task results page
    • Fix uploading files with project's default_upload_destination (#734)
    • Fix broken reporting of Matplotlib - Using logarithmic scale breaks reporting
    • Fix supporting of wildcards in clearml-data CLI
    • Fix report_histogram - does not show "horizontal" orientation (#699)
    • Fix table reporting 'series' arg does not appear on UI when using logger.report_table(title, series, iteration...) (#684)
    • Fix artifacts (and models) use task original name and not new name
    • Fix very long filenames from S3 can't be downloaded (with get_local_copy())
    • Fix overwrite of existing output models on pipeline task with monitor_models (#758)
    Source code(tar.gz)
    Source code(zip)
    clearml-1.7.0-py2.py3-none-any.whl(922.70 KB)
  • v1.6.4(Aug 10, 2022)

  • v1.6.3(Aug 9, 2022)

    New Features and Improvements

    • Add option to specify an endpoint URL when creating S3 resource service (#679, thanks @AndolsiZied!)
    • Add support for providing ExtraArgs to boto3 when uploading files using the sdk.aws.s3.extra_args configuration option
    • Add support for Server API 2.20
    • Add Task.get_num_enqueued_tasks() to get the number of tasks enqueued in a specific queue
    • Add support for updating model metadata using Model.set_metadata(), Model.get_metadata(), Model.get_all_metadata(), Model.get_all_metadata_casted() and Model.set_all_metadata()
    • Add Task.get_reported_single_value()
    • Add a retry mechanism for models and artifacts upload
    • Pipelines with an empty configuration take it from code
    • Add support for running pipeline steps on preemptible instances
    • Datasets
      • Add description to Datasets
      • Add wild-card support in clearml-data

    Bug Fixes

    • Fix dataset download (#713, thanks @dankirsdot!)
    • Fix lock is not released after dataset cache is downloaded (#708, thanks @mralgos!)
    • Fix deadlock might occur when using process pool large number processes (#674)
    • Fix 'series' not appearing on UI when using logger.report_table() (#684)
    • Fix Task.init() docstring to include behavior when executing remotely (#737, thanks @mmiller-max!)
    • Fix KeyError when running remotely and no params were passed to click (https://github.com/allegroai/clearml-agent/issues/111)
    • Fix full path is stored when uploading a single artifact file
    • Fix passing non-alphanumeric filename in sdk.development.detect_with_pip_freeze
    • Fix Python 3.6 and 3.10 support
    • Fix mimetype cannot be None when uploading to S3
    • Pipelines
      • Fix pipeline DAG
      • Add support for pipelines with spot instances
      • Fix pipeline proxy object is always resolved in main pipeline logic
      • Fix pipeline steps with empty configuration should try and take it from code
      • Fix wait for jobs based on local/remote pool frequency
      • Fix UniformIntegerParameterRange.to_list() ignores min value
      • Fix pipeline component returning a list of length 1
    • Datasets
      • Fix Dataset.get() does not respect auto_create
      • Fix getting datasets fails with new ClearML Server v1.6
      • Fix datasets can't be queried by project/name alone
      • Fix adding child dataset to older parent dataset without stats
    • Fix error when connecting an input model
    • Fix deadlocks, including:
      • Change thread Event/Lock to a process fork safe threading objects
      • Use file lock instead of process lock to avoid future deadlocks, since the python process lock is not process safe (killing a process holding a lock will not release the lock)
    • Fix StorageManager.list() on a local Windows path
    • Fix model not created in the current project
    • Fix keras_tuner_cifar example raises DeprecationWarning and ValueError
    Source code(tar.gz)
    Source code(zip)
    clearml-1.6.3-py2.py3-none-any.whl(903.47 KB)
  • v1.6.2(Jul 4, 2022)

  • v1.6.1(Jul 1, 2022)

  • v1.6(Jun 29, 2022)

    New Features and Improvements

    • New HyperParameter Optimization CLI clearml-param-search
    • Improvements to ClearML Data
      • Add support for a new ClearML Data UI in the ClearML WebApp
      • Add clearml-data new options set-description and rename
    • Add random seed control using Task.set_random_seed() allowing to set a new random seed for task initialization or to disable it
    • Improve error messages when failing to download an artifact
    • Improve error messages when testing for permissions

    Bug Fixes

    • Fix axis range settings when logging plots
    • Fix Task.get_project() to return more than 500 entries (#612)
    • Fix pipeline progress calculation
    • Fix StorageManager.upload_folder() returns None for both successful and unsuccessful uploads
    • Fix script path capturing stores a relative path and not an absolute path
    • Fix HTML debug samples are saved incorrectly on S3
    • Fix Hydra deprecation warning in examples
    • Fix missing requirement for tensorboardx example

    Known issues

    • When removing an image from a Dataset, its preview image won't be removed
    • Moving Datasets between projects still shows the Dataset in the old project
    Source code(tar.gz)
    Source code(zip)
    clearml-1.6.0-py2.py3-none-any.whl(793.49 KB)
  • v1.5.0(Jun 16, 2022)

    New Features and Improvements

    • Add support for single value metric reporting (#400)
    • Add support for specifying parameter sections in PipelineDecorator (#629)
    • Add support for parallel uploads and downloads (upload \ download and zip \ unzip of artifacts) (ClearML Slack)
    • Add support for specifying execution details (repository, branch, commit, packages, image) in PipelineDecorator
    • Bump PyJWT version due to "Key confusion through non-blocklisted public key formats" vulnerability
    • Add support for AWS Session Token (using boto3's aws_session_token argument)

    Bug Fixes

    • Fix Task.get_projects() retrieves only the first 500 results (#612)
    • Fix failure to delete artifacts stored in Azure (#660)
    • Fix Process Pool hangs at exit (#674)
    • Fix number of unpacked values when syncing a dataset (#682)
    • Fix FastAI DeprecationWarning (#683)
    • Fix StorageManager.download_folder() crash
    • Fix pipelines can't handle None return value
    • Fix pre-existing pipeline raises an exception
    • Fix deprecation warning in the image_reporting example
    • Fix patches are kept bound after Task.close() is called
    • Fix running pipeline code remotely without first running it locally (i.e. no configuration on the Task)
    • Fix local task execution with empty working directory
    • Fix permission check fails when using local storage folder that does not exist
    • Fix pipeline add_function_step breaks in remote execution
    • Fix wrong mimetype used for any file or folder uploaded to S3 using StorageManager
    • Add missing default default_cache_manager_size in configuration files
    Source code(tar.gz)
    Source code(zip)
    clearml-1.5.0-py2.py3-none-any.whl(780.89 KB)
  • v1.4.1(May 17, 2022)

  • v1.4.0(May 5, 2022)

    New Features

    • Add OpenMMLab example #655 (thanks @zhouzaida!)
    • Add support for saving artifacts with different formats #634
    • Add support for setting reported values for NaN and Inf #604
    • Support more than 500 results in Task.get_tasks() using the fetch_only_first_page argument #612
    • Support links in clearml-data #585
    • Support deferred task initialization using Task.init() argument deferred_init (beta feature)
    • Support resuming experiments when importing an Offline session
    • Add --import-offline-session command line option to clearml-task
    • Support automatically logging Tensorboard Hparams
    • Add wildcard support for model auto-logging, see Task.init() (ClearML Slack)
    • Add support for Lightning CLI
    • Support None values in Task.connect()
    • Add Model.project getter/setter
    • Add support for Task progress indication
    • Datasets
      • Improve Dataset version table
      • Add warning to Dataset creation on current Task
    • Examples and documentation
      • Add manual seaborn logging example #628
      • Change package author
      • Change pipeline example to run locally #642
      • Update Pytorch Lightning example for pytorch-lightning>=v1.6.0 #650

    Bug Fixes

    • Fix Keras model config serialization in PatchKerasModelIO #616 (thanks @bzamecnik!)
    • Fix task.get_parameters_as_dict(cast=True) casts False to True #622 (thanks @bewt85!)
    • Fix Fire integration is not compatible with typing library #610
    • Fix remote execution with argparse mutually exclusive groups raises "required" error even when no argument is required
    • Fix Hydra tasks never fail and are only set to completed (fix handling return code)
    • Fix clearml-data wildcard support
    • Fix HPO randomly aborts running tasks before the time limit
    • Fix matplotlib capture
    • Fix issue with accessing images in projects containing /
    • AutoScaler
      • Fix resource name with a prefix matching a resource type may cause the auto-scaler to avoid spinning down idle instances
      • Fix Idle workers should contain resource name and not instance type
    • Fix backwards compatibility issue when using abstractmethod
    • Matplotlib
      • Fix uploading 3D plots with matplotlib plt showing as 2D plots on the task results page
      • Fix wrong histogram plotting when using matplotlib
    • Fix PyTorch ScriptModule autobind
    • Fix PyTorch auto-magic logging torchscript models
    • Fix forked process will not call _at_exit and flush all outstanding reports
    • Fix matplotlib to plotly conversion fails on subplots (convert as image if figure has subplots)
    • Fix Windows subprocess might end up waiting forever for uploads to finish if the subprocess is very short-lived
    • Fix StorageManager.get_local_copy() returning None for a valid path in Windows
    • Fix Jupyter notebook cannot be detected
    • Fix PipelineController does not change node Task name, only pipeline step name
    • Fix Task.query_tasks() specifying page size or page number
    Source code(tar.gz)
    Source code(zip)
    clearml-1.4.0-py2.py3-none-any.whl(783 bytes)
  • v1.3.2(Mar 29, 2022)

    New Features and Improvements

    • Add support for setting reported values for NaN and Inf #604
    • Add reserved OS environments warning
    • Add git credentials to colab example #621 (thanks @thepycoder!)
    • Add jsonargparse support #403 (thanks @ajecc and @mauvilsa!)
    • Update autokeras example

    Bug Fixes

    • Fix sub-project separators are incorrectly quoted in generated URLs #584
    • Revert Optuna deprecation fix #613
    • Fix HPO randomly aborts running tasks before the time limit
    • Fix cloud driver overwrites agent.extra_docker_arguments
    • Fix Pipeline Controller auto-magic framework connect
    • Fix unused scroll is not cleared in Task.get_reported_plots()
    Source code(tar.gz)
    Source code(zip)
    clearml-1.3.2-py2.py3-none-any.whl(762.36 KB)
  • v1.3.1(Mar 16, 2022)

    Features

    • Add Python 3.10 support

    Bug Fixes

    • Update Slack SDK requirement #597 (thanks @mmiller-max!)
    • Fix fork after task.close() is called #605
    • Fix Azure storage upload #598
    • Fix offline mode crash
    • Fix task delete response not checked
    • Fix pipeline controller kwargs with list
    • Fix PipelineDecorator.debug_pipeline()
    • Fix PipelineDecorator example
    • Fix Python 3.10 issues
    • Fix handling of legacy fileserver (files.community.clear.ml)
    • Fix cloud driver may use None credentials
    • Fix APIClient worker raises exception when accessing .name attribute
    • Fix minimum/default API version setting
    Source code(tar.gz)
    Source code(zip)
    clearml-1.3.1-py2.py3-none-any.whl(761.12 KB)
  • v1.3.0(Mar 6, 2022)

    Features and Bug Fixes

    • Add new pipeline visualization support (requires ClearML Server v1.3)
    • Support IAM Instance Profile in AWS auto-scaler
    • Remove old server API versions support (pre-ClearML Server)
    • Restructure FastAI examples
    • Fix failed catboost bind on GPU (#592)
    • Fix Optuna n_jobs deprecation warning
    • Fix invalid method called on delete() error
    Source code(tar.gz)
    Source code(zip)
    clearml-1.3.0-py2.py3-none-any.whl(761.07 KB)
  • v1.2.1(Mar 2, 2022)

  • v1.2.0(Feb 26, 2022)

    Features

    • Add fastai v2 support (#571)
    • Add catboost support (#542)
    • Add Python Fire support (#550)
    • Add new Azure Storage driver support (#548)
    • Add requirements file support in Task.add_requirements (#575)
    • Allow overriding auto_delete_file in Task.update_output_model() (#554)
    • Support artifact_object empty string
    • Add skip_zero_size_check to StorageManager.download_folder()
    • Add support for extra HTTP retry codes (see here or use CLEARML_API_EXTRA_RETRY_CODES)
    • Add Task.get_parameters() cast back to original type
    • Add callback support to Task.delete()
    • Add autoscaler CPU-only support
    • Add AWS autoscaler IAM instance profile support
    • Update examples
      • Edit HTML reporting examples (#546)
      • Add model reporting examples (#553)

    Bug Fixes

    • Fix nargs="?" without type does not properly cast the default value (#531)
    • Fix using invalid configurations (#544)
    • Fix extra_layout not passed to report_matrix (#559)
    • Fix group arguments in click (#561)
    • Fix no warning when failing to patch argparse (#576)
    • Fix crash in Dataset.upload() when there is nothing to upload (#579)
    • Fix requirements, refactor and reformat examples (#567, #573, #582)
    • Auto-scaler
      • Change confusing log message
      • Fix AWS tags support
      • Fix instance startup script fails on any command (should only fail on the agent failing to launch)
      • Fix spin down stuck machine, ignore unknown stale workers
    • Fix pandas object passed as Task.upload_artifact() preview object
    • Fix incorrect timeout used for stale workers
    • Fix clearml-task calls Task.init() in the wrong place when a single local file is used
    • Fix ArgumentParser SUPPRESS as default should be resolved at remote execution in the same way (i.e. empty string equals SUPPRESS)
    • Upgrade six version (in case pathlib2>2.3.7 is installed)
    • Fix connected object base class members are not used
    • Fix clearml-init changing web host after pasting full credentials
    • Fix fileserver upload does not support path in URL
    • Fix crash on semaphore acquire error
    • Fix docs and docstrings (#558, #560)

    Thanks @eugen-ajechiloae-clearml, @pollfly and @Rizwan-Hasan for contributing!

    Source code(tar.gz)
    Source code(zip)
    clearml-1.2.0-py2.py3-none-any.whl(1.04 MB)
  • v1.1.6(Jan 20, 2022)

    Features

    • Add Task.force_store_standalone_script() to force storing standalone script instead of a Git repository reference (#340)
    • Add Logger.set_default_debug_sample_history() and Logger.get_default_debug_sample_history() to allow controlling maximum debug samples programmatically
    • Populate now stores function arg types as part of the hyperparameters
    • Add status_message argument to Task.mark_stopped()
    • Change HTTP driver timeout and retry codes (connection timeout will now trigger a retry)

    Bug Fixes

    • Fix and upgrade the SlackMonitor (#533)
    • Fix network issues causing Task to stop on status change when no status change has occurred (#535)
    • Fix Pipeline controller function support for dict as input argument
    • Fix uploading the same metric/variant from multiple processes in threading mode should create a unique file per process (since global counter is not passed between the subprocesses)
    • Fix resource monitoring should only run in the main process when using threaded logging mode
    • Fix fork patching so that the signal handler (at_exit) will be called on time
    • Fix fork (process pool) hangs or drops reports when reports are at the end of the forked function in both threaded and subprocess mode reporting
    • Fix Multi-pipeline support
    • Fix delete artifacts after upload
    • Fix artifact preview has no truth value
    • Fix storage cache cleanup does not remove all entries on a silent fail
    • Fix always store session cache in ~/.clearml (regardless of the cache folder)
    • Fix StorageManager.download_folder() fails on Windows path
    Source code(tar.gz)
    Source code(zip)
    clearml-1.1.6-py2.py3-none-any.whl(1.03 MB)
  • v1.1.5(Jan 1, 2022)

    Features

    • Add support for jsonargparse (#403)
    • Add HyperParameterOptimizer.get_top_experiments_details() returns the hparams and metrics of the top performing experiments of an HPO (#473)
    • Allow overriding initial iteration offset using an environment variable (CLEARML_SET_ITERATION_OFFSET) or Task.init(continue_last_task=<offset>) (#496)
    • Add better input handling for clearml-init in colab (#515)
    • Add environment variable for default request method (#521)
    • Add LocalClearmlJob as possible option for HPO (#525)
    • Add convenience functionality to clearml-data (#526)
    • Add support for vscode-jupyter (https://github.com/microsoft/vscode-jupyter/pull/8531)
    • Improve detection of running reporting subprocess (including zombie state)
    • Support controlling S3/Google Cloud Storage _stream_download_pool_connections using the stream_connections configuration setting in clearml.conf (default 128)
    • Add warning when losing reporting subprocess
    • Add Model.remove() to allow removing a model from the model repository
    • Add HTTP download timeout control (change default connection timeout to 30 seconds)
    • Add initial setup callback to monitoring class
    • Add Task.get_reported_plots()
    • Allow Monitor.get_query_parameters to override defaults
    • Add support for Google Cloud Storage pool_connections and pool_maxsize overrides
    • Add last worker time to AutoScaler
    • Add warning when opening an aborted Dataset
    • Store multi-pipeline execution plots on the master pipeline Task
    • Support pipeline return value stored on pipeline Task
    • Add PipelineDecorator.multi_instance_support
    • Add PipelineDecorator to clearml and clearml.automation namespaces
    • Documentation and examples
      • Update docstrings (#501)
      • Add Markdown in pipeline jupyter notebooks (#502)
      • Update pipeline example (#494)
      • Add abseil example (#509)
      • Change README to dark theme (#513)
      • Update XGBoost example (#524)
      • Change example name (#528)
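
    A minimal sketch of the iteration-offset override above (the offset value is illustrative):

        from clearml import Task

        # Resume the previous task and continue reporting from iteration 1000
        # (per the note above, the offset can also be set via the
        # CLEARML_SET_ITERATION_OFFSET environment variable)
        task = Task.init(
            project_name='examples',
            task_name='resume training',
            continue_last_task=1000,
        )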

    Bug Fixes

    • Fix TriggerScheduler on Dataset change (#491)
    • Fix links in Jupyter Notebooks (#505)
    • Fix pandas delta datetime conversion (#510)
    • Fix matplotlib auto-magic detect bar graph series name (#518)
    • Fix path limitation on storage services (posix, object storage) when storing target artifacts by limiting length of project name (full path) and task name used for object path (#516)
    • Fix multi-processing context block catching exception
    • Fix Google Cloud Storage with no default project causes a crash
    • Fix lost main-process reporting subprocess by switching back to thread mode
    • Fix forked StorageHelper should use its own ThreadExecuter
    • Fix local StorageHelper.delete() raising an exception on a non-existing file instead of returning False
    • Fix StorageHelper rename partial file throwing errors on multiple access
    • Fix resource monitor fails on permission issues (skip over parts)
    • Fix reusing Task does not reset it
    • Fix support for ClearML PyCharm Plugin 1.0.2 (support partial PyCharm git repo sync)
    • Fix Task.reset() force argument ineffective
    • Fix PY3.5 compatibility
    • Fix validation error causes infinite loop
    • Fix tasks schema prevents sending null container parts
    • Fix missing CLEARML_SET_ITERATION_OFFSET definition
    • Fix Model.get_weights_package() returns None on error
    • Fix download progress bar based on sdk.storage.log.report_download_chunk_size_mb configuration
    • Fix Conda lists the CudaToolkit version installed (for the agent to reproduce)
    • Fix Jupyter kernel shutdown causing nested atexit callbacks leaving Task in running state
    • Fix multi-subprocess can cause Task to hang at close
    • Fix TF 2.7 support (get logdir with multiple TB writers)
    Source code(tar.gz)
    Source code(zip)
    clearml-1.1.5-py2.py3-none-any.whl(1.03 MB)
  • v1.1.4(Nov 8, 2021)

    Bug Fixes

    • Fix duplicate keyword argument (affects clearml-data Dataset.get(); see the sketch after this list) https://github.com/allegroai/clearml/issues/490, ClearML Slack Channel #1, #2, #3, #4
    • Fix session raises missing host error when in offline mode #489
    • Fix Task.get_task() does not load output_uri from stored Task
    • Fix Task.get_models()['input'] returns string instead of clearml.Model
    • Fix tf.saved_model.load() binding for TensorFlow >=2.0
    • Fix hyperparams with None value converted to empty string causes inferred type to change to str in consecutive Task.connect() calls
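
    A minimal sketch of the Dataset.get() call path affected by the first fix above (the dataset project and name are illustrative):

        from clearml import Dataset

        # Fetch a dataset by project and name, then materialize a local copy
        dataset = Dataset.get(dataset_project='examples', dataset_name='my_dataset')
        local_path = dataset.get_local_copy()
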
    Source code(tar.gz)
    Source code(zip)
    clearml-1.1.4-py2.py3-none-any.whl(1.02 MB)
  • v1.1.3(Oct 25, 2021)

    Features

    • Add support for MegEngine with examples (#455)
    • Add TaskTypes to main namespace (#453)
    • Add LogUniformParameterRange for hyperparameter optimization with Optuna (#462)
    • Add joblib (equivalent to scikit) to the Task.init(auto_connect_frameworks) argument; see the sketch after this list
    • Log environment variables starting with * in environ_bind.py (#459)
    • Pipeline
      • Add eager decorated pipeline execution
      • Support pipeline monitoring for scalars/models/artifacts
      • Add PipelineController.upload_model()
      • Add PipelineController.add_step(configuration_overrides) argument allowing to override Task configuration objects
      • Change PipelineController.start_locally() default run_pipeline_steps_locally=False
      • Add PipelineController.stop(mark_failed, mark_aborted) arguments
      • Add PipelineController.run_locally decorator
      • Add PipelineController.continue_on_fail property
      • Add PipelineController.__init__(abort_on_failure) argument
      • Add ClearmlJob state cache (refresh every second)
    • Datasets
      • Add clearml-data multi-chunk support
      • Change clearml-data default chunk size to 512MB
      • Change Dataset.create() now automatically reverts to using current Task if no project/name provided
    • Add Optimizer.get_top_experiments_id_metrics_pair() for top performing experiments
    • Add support for setting default value to auto connected argparse arguments
    • Add Task.get_script() and Task.set_script() for getting and setting the task's script properties for execution
    • Add Task.mark_completed() force and status_message arguments
    • Add Task.stopped() reason argument
    • Add Task.query_tasks(), Task.get_task() and Task.get_tasks() tags argument
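
    A minimal sketch of the joblib binding toggle and the new tags filter above (project names and tags are illustrative):

        from clearml import Task

        # Disable the joblib (scikit-learn) binding while keeping the
        # other framework integrations enabled
        task = Task.init(
            project_name='examples',
            task_name='no joblib binding',
            auto_connect_frameworks={'joblib': False},
        )

        # Query tasks by tag (per the note above)
        tagged = Task.get_tasks(project_name='examples', tags=['production'])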

    Bug Fixes

    • Fix PyJWT resiliency support
    • Fix xgb train overload (#456)
    • Fix http:// throws OSError in Windows by using pathlib2 instead of os (#463)
    • Fix local diff should include staged commits, otherwise applying git diff fails (#457)
    • Fix task.upload_artifact non-standard dictionary will now revert to pickle (#452)
    • Fix S3BucketConfig.is_valid() for EC2 environments with use_credentials_chain (#478)
    • Fix audio classifier example when training with a custom dataset (#484)
    • Fix clearml-task diff was corrupted by Windows drive letter and separator (#483)
    • Fix TQDM "line cleanup" not using CR but rather arrow-up escape sequence (#181)
    • Fix task.connect(dict) value casting - if None is the default value, use backend stored type
    • Fix Jupyter notebook should always set Task as completed/stopped, never failed (exceptions are caught in interactive session)
    • Fix Pipeline support
      • Fix LocalClearmlJob setting failed status
      • Fix pipeline stopping all running steps
      • Fix nested pipeline component parent point to pipeline Task
      • Fix PipelineController.start() should not kill the process when done
      • Fix pipeline failing to create Step Task should cause the pipeline to be marked failed
      • Fix nested pipeline components missing pipeline tags
    • Fix images reported over history size were not sent if frequency was too high
    • Fix git detectors missing git repository without origin
    • Fix support for uploading LazyEvalWrapper artifacts
    • Fix duplicate task dataset tags
    • Fix FileLock create target folder
    • Fix crash inside forked subprocess might leave SafeQueue in a locked state, causing task.close() to hang
    • Fix PyTorch distributed example TimeoutSocket issue in Windows
    • Fix broken Dataset.finalize()
    • Fix Python 3.5 compatibility
    Source code(tar.gz)
    Source code(zip)
    clearml-1.1.3-py2.py3-none-any.whl(1.02 MB)
  • v1.1.2(Oct 7, 2021)

  • v1.1.1(Sep 20, 2021)

  • v1.1.0(Sep 19, 2021)

    Breaking Changes

    • New PipelineController v2 (note: new constructor is not backwards compatible)
    • Disable default demo server (available by setting the CLEARML_NO_DEFAULT_SERVER=0 environment variable)
    • Deprecate Task.completed() (use Task.mark_completed() instead)

    Features

    • Add Task Trigger Scheduler
    • Add Task Cron Scheduler
    • Add PipelineController from function
    • Add PipelineDecorator (PipelineDecorator.pipeline and PipelineDecorator.component decorators for full custom pipeline logic); see the sketch after this list
    • Add xgboost auto metric logging #381
    • Add sdk.storage.log.report_upload_chunk_size_mb and sdk.storage.log.report_download_chunk_size_mb configuration options to control upload/download log reporting #424
    • Add new optional auto_connect_frameworks argument value to Task.init() (e.g. auto_connect_frameworks={'tfdefines':False}) to allow disabling TF defines #408
    • Add support for the CLEARML_CONFIG_VERBOSE environment variable to allow external control over verbosity of the configuration loading process
    • Add support for uploading artifacts with a list of files using Task.upload_artifact(name, [Path(), Path()])
    • Add missing clearml-task parameters --docker_args, --docker_bash_setup_script and --output-uri
    • Change CreateAndPopulate to automatically list packages that are imported but not installed locally
    • Add clearml.task.populate.create_task_from_function() to create a Task from a function, wrapping function input arguments into hyper-parameter section as kwargs and storing function results as named artifacts
    • Add support for Task serialization (e.g. for pickle)
    • Add Task.get_configuration_object_as_dict()
    • Add docker_image argument to Task.set_base_docker() (deprecate docker_cmd)
    • Add auto_version_bump argument to PipelineController
    • Add sdk.development.detailed_import_report configuration option to provide a detailed report of all python package imports
    • Set current Task as Dataset parent when creating dataset
    • Add support for deferred configuration
    • Examples
      • Add Pipeline v2 examples
      • Add TaskScheduler and TriggerScheduler examples
      • Add pipeline controller callback example
      • Improve existing examples and docstrings
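
    A minimal sketch of the new decorator-based pipeline above (the names, project and return values are illustrative):

        from clearml.automation.controller import PipelineDecorator

        @PipelineDecorator.component(return_values=['data'])
        def step_one():
            # Each component runs as its own pipeline step
            return [1, 2, 3]

        @PipelineDecorator.pipeline(name='demo pipeline', project='examples', version='0.1')
        def pipeline_logic():
            data = step_one()
            print(data)

        if __name__ == '__main__':
            # Run the whole pipeline in the local process for quick testing
            PipelineDecorator.run_locally()
            pipeline_logic()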

    Bug Fixes

    • Fix plotly plots converting NaN to nan instead of null #373
    • Fix deprecation warning #376
    • Fix plotly multi-index without index names #399
    • Fix click support #437
    • Fix docstring #438
    • Fix passing task-type to clearml-task #422
    • Fix clearml-task --version throws an error #422
    • Fix clearml-task ssh repository links are not detected as remote repositories #423
    • Fix getattr throws an exception #426
    • Fix encoding while saving notebook preview #443
    • Fix poetry toml file without requirements.txt #444
    • Fix PY3.x fails calling SemLock._after_fork with forkserver context, forking while lock is acquired https://github.com/allegroai/clearml-agent/issues/73
    • Fix wrong download path in StorageManager.download_folder()
    • Fix jupyter notebook display(...) convert to print(...)
    • Fix Tensorflow add_image() with description='text'
    • Fix Task.close() should remove current_task() reference
    • Fix TaskScheduler weekdays, change default execute_immediately to False
    • Fix Python2 compatibility
    • Fix clearml-task exit with error when failing to verify output_uri (output warning instead)
    • Fix unsafe Google Storage delete object
    • Fix multi-process spawning wait-for-uploads can create a deadlock in very rare cases
    • Fix task.set_parent() fails when passing Task object
    • Fix PipelineController skipping queued Tasks
    • Remove humanfriendly dependency (unused)
    Source code(tar.gz)
    Source code(zip)
    clearml-1.1.0-py2.py3-none-any.whl(1018.39 KB)
  • v1.0.5(Aug 5, 2021)

    Features

    • Add Click support and examples #386
    • Add progress bar to SHA2 generation #396
    • Add prefix to Task reported runtime info: cpu_cores, gpu_driver_version and gpu_driver_cuda_version
    • Add support for Logger.report_text() explicit log-level reporting; see the sketch after this list
    • Add return_full_path argument to StorageManager.list()
    • Support Task.get_tasks() passing multiple project names
    • Add TaskScheduler
    • Add task_filter argument to Objective.get_top_tasks(), allow name as a task_filter field
    • Add --output-uri command-line option to clearml-task
    • Add requirements_file argument to Task.force_requirements_env_freeze() to allow specifying a local requirements file
    • Add support for list type argument in Task.connect_configuration() (previously only dict type was supported)
    • Rename TrainsTuner to ClearmlTuner
    • Update documentation links
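
    A minimal sketch of explicit log-level reporting above (the project, task name and message are illustrative):

        import logging

        from clearml import Task

        task = Task.init(project_name='examples', task_name='log level demo')
        logger = task.get_logger()

        # Report a console line with an explicit log level (per the note above)
        logger.report_text('validation loss diverged', level=logging.WARNING)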

    Bug Fixes

    • Fix Pandas with multi-index #399
    • Fix check permissions fail in HTTPDriver #394
    • Fix Dataset not setting system tag on existing data_processing Tasks
    • Fix disable redundant resource monitoring in pipeline controller
    • Fix ClearMLJob when both project and target_project are specified
    • Fix ClearMLJob docker container info is not cached
    • Fix no print logging after Python logging handlers are cleared
    • Fix PipelineController callback returning False
    • Fix machine specs when GPU is not supported
    • Fix internal logging.Logger can't be pickled (only applicable to Python 3.6 or lower)
    • Wait for reported events to flush to ensure Task.flush() with wait_for_uploads=True awaits background processes
    Source code(tar.gz)
    Source code(zip)
    clearml-1.0.5-py2.py3-none-any.whl(993.35 KB)
  • v1.0.4(Jun 22, 2021)

    Features

    • Add Google Colab notebook tutorial #368 #374
    • Add support for GIF images in Tensorboard #372
    • Add a tensorboardX example for add_video (creates GIFs in tensorboard) #372
    • Add auto scaler customizable boot bash script
    • Add Task.ignore_requirements() to exclude a package from the automatically captured requirements; see the sketch after this list
    • Deprecate Logger.tensorboard_single_series_per_graph() as it is now controlled from the UI 🙂
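
    A minimal sketch of the new requirements exclusion above (the package name is illustrative):

        from clearml import Task

        # Exclude a package from the automatically captured requirements;
        # must be called before Task.init()
        Task.ignore_requirements('pywin32')

        task = Task.init(project_name='examples', task_name='trimmed requirements')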

    Bug Fixes

    • Fix default_output_uri for Dataset creation #371
    • Fix clearml-task failing without a docker script #378
    • Fix Pytorch DDP sub-process spawn multi-process
    • Fix Task.execute_remotely() on created Task (not initialized Task)
    • Fix auto scaler custom bash script should be called last before starting agent
    • Fix auto scaler spins too many instances at once then kills the idle ones (spin time is longer than poll time)
    • Fix multi-process spawn context using ProcessFork kills sub-process before parent process ends
    Source code(tar.gz)
    Source code(zip)
    clearml-1.0.4-py2.py3-none-any.whl(981.43 KB)