Management of exclusive GPU access for distributed machine learning workloads

Overview

TensorHive

TensorHive is an open source tool for managing computing resources used by multiple users across distributed hosts. It focuses on granting exclusive access to GPUs for machine learning workloads and consists of reservation, monitoring and job execution modules.

It's designed with simplicity, flexibility and configuration-friendliness in mind.


Main features:

GPU Reservation calendar

Each column represents all reservation events for a GPU on a given day. To make a new reservation, click and drag with your mouse, select the GPU(s), add a meaningful title and optionally adjust the time range. If there are many hosts and GPUs in your infrastructure, you can use the simplified, horizontal calendar to quickly identify empty time slots and filter out already reserved GPUs.

From now on, only your processes are eligible to run on the reserved GPU(s). TensorHive periodically checks whether another user has violated the reservation. Violators are spammed with warnings on all their PTYs and emailed every once in a while; the admin is notified as well (it all depends on the configuration).
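
The periodic check can be pictured as comparing the owners of running GPU processes with the active reservation for each GPU. The following is a minimal sketch under stated assumptions: `Reservation`, `GpuProcess` and `find_violations` are hypothetical names made up for illustration and are not TensorHive's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Reservation:
    # Hypothetical record: who reserved which GPU, and for what time range.
    gpu_id: str
    owner: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class GpuProcess:
    # Hypothetical record: a process observed on a GPU and its UNIX owner.
    gpu_id: str
    owner: str
    pid: int


def find_violations(
    reservations: List[Reservation], processes: List[GpuProcess], now: datetime
) -> List[GpuProcess]:
    """Return processes running on a GPU that somebody else has reserved right now."""
    active = {r.gpu_id: r.owner for r in reservations if r.start <= now < r.end}
    return [p for p in processes if p.gpu_id in active and p.owner != active[p.gpu_id]]
```

Processes on GPUs without an active reservation are never reported, which matches the idea that unreserved GPUs stay free for anyone to use.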

[Screenshots: terminal warning, email warning, admin warning]

Infrastructure monitoring dashboard

Accessible infrastructure can be monitored in the Nodes overview tab. Here you can add new watches, select metrics and monitor ongoing GPU processes and their owners.


Job execution

Thanks to the Job execution module, you can define commands for tasks you want to run on any configured nodes. You can manage them manually, set specific spawn/terminate dates or add jobs to a queue, so that they are executed automatically when the required resources are not reserved. Commands are run within a screen session, so attaching to it while they are running is a piece of cake.
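
The queueing behaviour can be sketched as a filter over the waiting jobs. This is an illustrative sketch only: the job dictionaries, the `gpus` key and `runnable_jobs` are hypothetical and do not reflect TensorHive's internal scheduler.

```python
from typing import Dict, List, Set


def runnable_jobs(queue: List[Dict], reserved_gpus: Set[str]) -> List[Dict]:
    """Pick queued jobs, in queue order, whose GPUs are neither reserved
    nor already claimed by a job picked earlier in the same pass."""
    taken = set(reserved_gpus)
    picked = []
    for job in queue:
        needed = set(job["gpus"])
        if needed.isdisjoint(taken):
            picked.append(job)
            taken |= needed
    return picked


queue = [
    {"name": "a", "gpus": ["g0"]},
    {"name": "b", "gpus": ["g0", "g1"]},
    {"name": "c", "gpus": ["g2"]},
]
started = [job["name"] for job in runnable_jobs(queue, reserved_gpus={"g1"})]
```

Here job `b` stays in the queue because one of its GPUs is reserved, while `a` and `c` can start.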

It provides a simple but flexible (framework-agnostic) command templating mechanism that helps you automate multi-node trainings. Additionally, specialized templates make it convenient to set proper parameters for selected well-known frameworks. In the examples directory, you will find sample scenarios of using the Job execution module for various frameworks (including TensorFlow and PyTorch) and computing environments.
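
To give a feel for what command templating for multi-node training means, here is a minimal sketch using plain Python string formatting. The placeholder names (`rank`, `world_size`, `workers`), the host names, the port and `train.py` are all assumptions made for illustration; TensorHive's own template syntax may differ.

```python
from typing import Dict, List


def render_commands(template: str, hosts: List[str], port: int = 2222) -> Dict[str, str]:
    """Fill one command template once per host for a multi-node run."""
    workers = ",".join(f"{host}:{port}" for host in hosts)
    return {
        host: template.format(rank=rank, world_size=len(hosts), workers=workers)
        for rank, host in enumerate(hosts)
    }


commands = render_commands(
    "python train.py --rank={rank} --world-size={world_size} --workers={workers}",
    ["node-a", "node-b"],
)
```

Each host receives the same command with only the per-node values changed, which is the part that is tedious and error-prone to maintain by hand.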


TensorHive requires users who want to use this feature to append TensorHive's public key to their ~/.ssh/authorized_keys on every node they want to connect to.
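
Appending the key by hand works fine; if you prefer to script it, the sketch below shows one idempotent way to do it. `append_public_key` is a hypothetical helper, not part of TensorHive, and the key string is whatever public key your TensorHive instance provides.

```python
from pathlib import Path


def append_public_key(public_key: str, authorized_keys: Path) -> bool:
    """Append public_key to the given authorized_keys file unless it is
    already listed. Returns True when the file was modified."""
    key = public_key.strip()
    authorized_keys.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    text = authorized_keys.read_text() if authorized_keys.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with authorized_keys.open("a") as handle:
        handle.write(f"{prefix}{key}\n")
    authorized_keys.chmod(0o600)  # sshd commonly rejects overly permissive files
    return True


# Typical call on a node (not executed here):
# append_public_key(key_text, Path.home() / ".ssh" / "authorized_keys")
```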


Use cases

Our goal is to provide solutions for painful problems that ML engineers often have to struggle with when working with remote machines in order to run neural network trainings.

You should really consider using TensorHive if any of the profiles below describes you:

  1. You're an admin who is responsible for managing a cluster (or multiple servers) with powerful GPUs installed.
  • 😠 There are more users than resources, so they have to compete for them
  • 🎤 The users require exclusive access to the GPUs, rather than a queuing system
  • 🔮 You need to control which projects in your organization consume the most computing power
  • 🌊 Other popular tools are simply overkill, serve a different purpose or require a lot of time spent on reading documentation, installation and configuration (Grafana, Kubernetes, Slurm)
  • 🐧 People using your infrastructure expect a single interface for everything related to managing computing infrastructure: monitoring, reservation calendar and scheduling distributed jobs
  • 💥 You can't risk messing up sensitive configuration by installing software on each individual machine, and prefer a centralized solution that can be managed from one place
  2. You're a standalone user who has access to beefy GPUs scattered across multiple machines.
  • 〽️ You want to keep the GPU utilization high, considering batch size, host to device data transfer etc. - charts with metrics such as gpu_util, mem_util, mem_used are great for this purpose
  • 📅 Visualizing names of training experiments using the calendar helps you track how you're progressing on the project
  • 🐍 Launching distributed trainings is essential for you, no matter what the framework is
  • 😵 Managing a list of training commands for all your distributed training experiments drives you nuts
  • 💤 Remembering to manually launch the training before going to sleep is no fun anymore

Advantages of TensorHive

0️⃣ Dead-simple one-machine installation and configuration, no sudo requirements

1️⃣ Users can make GPU reservations for a specific time range in advance via the reservation mechanism

     ➡️ no more frustration caused by rules: "first come, first served" or "the law of the jungle".

2️⃣ Users can prepare and schedule custom tasks (commands) to be run on selected GPUs and hosts

     ➡️ automate and simplify distributed trainings - "one button to rule them all"

3️⃣ Gather all useful GPU metrics, from all configured hosts in one dashboard

     ➡️ no more manual logging in to each individual machine in order to check if GPU is currently in use or not

4️⃣ Access to specific GPUs or hosts can be granted to specific users or groups

     ➡️ division of the infrastructure can be easily adjusted to the current needs of work groups in your organization

5️⃣ Automatic execution of queued jobs when there are no active GPU reservations

     ➡️ jobs that are not urgent can be added to a queue and automatically executed later


Getting started

Prerequisites

  • All nodes must be accessible via SSH, without password, using SSH Key-Based Authentication (How to set up SSH keys - explained in Quickstart section)
  • Only NVIDIA GPUs are supported (relying on nvidia-smi command)
  • Currently TensorHive assumes that all users who want to register into the system must have identical UNIX usernames on all nodes configured by TensorHive administrator (not relevant for standalone developers)
  • (optional) We recommend installing TensorHive on a separate user account (for example tensorhive) and adding this user to the tty system group.
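
Since GPU support relies on the nvidia-smi command, it can help to see what reading metrics from it looks like. The sketch below uses nvidia-smi's CSV query mode; how TensorHive itself invokes nvidia-smi is not shown in this document, and `parse_gpu_metrics` / `query_gpu_metrics` are hypothetical helper names. The result keys mirror the gpu_util, mem_util and mem_used metrics mentioned earlier.

```python
import subprocess
from typing import Dict, List, Optional

QUERY_FIELDS = ["index", "utilization.gpu", "utilization.memory", "memory.used"]


def _to_int(value: str) -> Optional[int]:
    # nvidia-smi may print placeholders such as "[N/A]" for unsupported fields.
    value = value.strip()
    return int(value) if value.isdigit() else None


def parse_gpu_metrics(csv_text: str) -> List[Dict[str, Optional[int]]]:
    """Parse the output of `--format=csv,noheader,nounits` for QUERY_FIELDS."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, gpu_util, mem_util, mem_used = (_to_int(v) for v in line.split(","))
        rows.append(
            {"index": index, "gpu_util": gpu_util, "mem_util": mem_util, "mem_used": mem_used}
        )
    return rows


def query_gpu_metrics() -> List[Dict[str, Optional[int]]]:
    """Run nvidia-smi on the local machine (requires an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_gpu_metrics(result.stdout)
```

Utilization values are percentages and, with `nounits`, memory.used is reported in MiB.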

Installation

via pip
pip install tensorhive
From source

(optional) For development purposes we encourage separation from your current Python packages using e.g. virtualenv or Anaconda.

git clone https://github.com/roscisz/TensorHive.git && cd TensorHive
pip install -e .

TensorHive already ships with the newest web app build, but in case you modify the source, you can build it with make app. For more useful commands see our Makefile. Build tested with Node v14.15.4 and npm 6.14.10.

Basic usage

Quickstart

The init command will guide you through the basic configuration process:

tensorhive init

You can check connectivity with the configured hosts using the test command.

tensorhive test

(optional) If you want to allow your UNIX users to set up their TensorHive accounts on their own and run distributed programs through the Job execution module, use the key command to generate the SSH key for TensorHive:

tensorhive key

Now you should be ready to launch a TensorHive instance:

tensorhive

Web application and API Documentation can be accessed via URLs highlighted in green (Ctrl + click to open in browser).

Advanced configuration

You can fully customize TensorHive behaviours via INI configuration files (which will be created automatically after tensorhive init):

~/.config/TensorHive/main_config.ini
~/.config/TensorHive/mailbot_config.ini
~/.config/TensorHive/hosts_config.ini

(see example)
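
Because these are standard INI files, they can be inspected with Python's built-in configparser. The sketch below is an illustration, not TensorHive's own loading code: the section and option names are copied from a main_config.ini excerpt quoted in a user report further down this page, and your generated files may contain different values.

```python
import configparser

# Excerpt modelled on a user-reported main_config.ini; values are examples only.
SAMPLE = """
[api]
version = 0.3.1
url_hostname = something.com
url_prefix = api/%(version)s

[api.server]
backend = gevent
host = 0.0.0.0
port = 1111
debug = off
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)
# For the real file, something like:
# config.read(Path.home() / ".config/TensorHive/main_config.ini")

api_prefix = config.get("api", "url_prefix")      # %(version)s is expanded on read
api_port = config.getint("api.server", "port")
debug = config.getboolean("api.server", "debug")  # "off" parses as False
```

Note that `%(version)s` refers to another option in the same section, so changing `version` also changes the effective URL prefix.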


Contribution and feedback

We'd ❤️ to collect your observations, issues and pull requests!

Feel free to report any configuration problems, we will help you.

Currently we are gathering practical infrastructure protection scenarios from our users to extract and further support the most common TensorHive deployments.

If you consider becoming a contributor, please look at issues labeled as good-first-issue and help wanted.

Credits

Project created and maintained by:

Top contributors:

TensorHive has been greatly supported within a joint project between VoiceLab.ai and Gdańsk University of Technology titled: "Exploration and selection of methods for parallelization of neural network training using multiple GPUs".

License

Apache License 2.0

Comments
  • Tensorhive behind reverse proxy


    Hello!

    Subject of the issue

    I'm currently trying to run your application in our environment (~80 gpus) but when I try to use it behind a reverse proxy (like nginx/apache) which is a standard way of exposing a webapp, I don't find any way to specify that the gevent api should be called on the same url, or on another url than the gunicorn server.

    More precisely:

    url of gunicorn = https://something.com/

    When a user accesses it, this is the flow:

    https://something.com/ --https:443--> nginx (for ssl termination) --http:5000--> gunicorn
    

    When accessing https://something.com/, I'm trying to connect using admin account but the front-end calls http://something.com:1111/api/0.3.1/user/login : this is not a standard way to call this gevent api, as it is using port 1111, and this is not authorized in our environment.

    How can I make the front-end call the gevent api in either of these ways?

    1. On the same url as gunicorn, but maybe with a prefix like "/api": https://something.com/api/
    2. On a different url, like https://something-api.com/

    Steps to reproduce

    My main_config.ini looks like this:

    ...
    
    [api]
    title = TensorHive API
    version = 0.3.1
    url_hostname = something.com
    url_prefix = api/%(version)s
    spec_file = api_specification.yml
    impl_location = tensorhive.api.controllers
    
    [web_app.server]
    backend = gunicorn
    host = 0.0.0.0
    port = 5000
    loglevel = warning
    workers = 8
    
    [api.server]
    backend = gevent
    host = 0.0.0.0
    port = 1111
    debug = off
    
    ...
    
    

    Thanks for this project that looks very promising!

    opened by Dubrzr 8
  • Can't register unix user account in Tensorhive version 0.3.6


    When running TensorHive, the web app requires a user login. I created a UNIX username earlier with the "tensorhive init" command, but I could not log in to the web page; it threw this error:
    {"detail": "The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.", "status": 500, "title":"Internal Server error", "type": "about: blank"}

    I tried to register another UNIX username by clicking the register button. It showed me another error: "Unprivileged".

    Please find my environment below:

    • OS distribution, version: Ubuntu 18.04 LTS
    • TensorHive: version 0.3.6
    • GPU model:

    Your environment

    List relevant info:

    • OS distribution, version
    • TensorHive and API version
    • Versions of installed dependencies
    • Essential hardware specs (GPU model)

    Steps to reproduce

    Tell us how to reproduce this issue from the ground up.

    ![image](https://user-images.githubusercontent.com/78297811/107117969-e7f81b00-68b8-11eb-89de-dd8647a10df6.png)
    ![image](https://user-images.githubusercontent.com/78297811/107117972-ededfc00-68b8-11eb-91c3-7db0f94238c1.png)
    
    Thank you!
    Best Regards,
    William Le
    bug in-progress dependencies 
    opened by William-Le88 5
  • Does TensorHive support automatic scheduling/assignment to an available node?


    Hi!

    I just started playing with TensorHive and I'm trying to understand if it fits my needs. My use case is that I have multiple nodes (servers with GPUs) and multiple users that want to run ML tasks on these GPUs, with the goal of maximizing utilization, and supporting scheduling priority and fairness.

    When I create a new task in the web UI, it requires me to input the hostname. Does that mean I can't submit a task, and have TensorHive automatically select the best node (based on available resources)?

    In addition, does TensorHive include any support for packaging code and data, for example by using a Docker container?

    Thanks!

    opened by infokiller 5
  • Unprivileged when canceling ongoing reservation


    Subject of the issue

    Cannot cancel ongoing reservation

    Your environment

    List relevant info:

    • Web app

    Steps to reproduce

    Reserve some GPU and try to cancel it


    Expected behaviour

    Reservation should be deleted

    Actual behaviour

    Reservation is not deleted

    opened by tymons 4
  • Enable multiple Admin emails for mailbot


    The current TensorHive version 0.3.6 does not seem to support multiple admin e-mails for the mailbot. An organization needs to be able to manage TensorHive with several admin e-mail addresses, configurable in TensorHive.

    Expected:

    • The mailbot has a configurable setting for multiple admin e-mails.
    feature-request 
    opened by William-Le88 4
  • Exception handling bug in tasks controller


    Traceback (most recent call last):
      File "/home/tensorhive/TensorHive/venv/lib/python3.6/site-packages/flask/app.py", line 2446, in wsgi_app
        response = self.full_dispatch_request()
      File "/home/tensorhive/TensorHive/venv/lib/python3.6/site-packages/flask/app.py", line 1951, in full_dispatch_request
        rv = self.handle_user_exception(e)
      File "/home/tensorhive/TensorHive/venv/lib/python3.6/site-packages/flask_cors/extension.py", line 161, in wrapped_function
        return cors_after_request(app.make_response(f(*args, **kwargs)))
      File "/home/tensorhive/TensorHive/venv/lib/python3.6/site-packages/flask/app.py", line 1820, in handle_user_exception
        reraise(exc_type, exc_value, tb)
      File "/home/tensorhive/TensorHive/venv/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
        raise value
      File "/home/tensorhive/TensorHive/venv/lib/python3.6/site-packages/flask/app.py", line 1949, in full_dispatch_request
        rv = self.dispatch_request()
      File "/home/tensorhive/TensorHive/venv/lib/python3.6/site-packages/flask/app.py", line 1935, in dispatch_request
        return self.view_functions[rule.endpoint](**req.view_args)
      File "/home/tensorhive/TensorHive/venv/lib/python3.6/site-packages/connexion/decorators/decorator.py", line 48, in wrapper
        response = function(request)
      File "/home/tensorhive/TensorHive/venv/lib/python3.6/site-packages/connexion/decorators/security.py", line 299, in wrapper
        return function(request)
      File "/home/tensorhive/TensorHive/venv/lib/python3.6/site-packages/connexion/decorators/uri_parsing.py", line 143, in wrapper
        response = function(request)
      File "/home/tensorhive/TensorHive/venv/lib/python3.6/site-packages/connexion/decorators/validation.py", line 347, in wrapper
        return function(request)
      File "/home/tensorhive/TensorHive/venv/lib/python3.6/site-packages/connexion/decorators/parameter.py", line 126, in wrapper
        return function(**kwargs)
      File "/home/tensorhive/TensorHive/venv/lib/python3.6/site-packages/flask_jwt_extended/view_decorators.py", line 103, in wrapper
        return fn(*args, **kwargs)
      File "/home/tensorhive/TensorHive/tensorhive/controllers/task.py", line 172, in get_all
        return content, status
    UnboundLocalError: local variable 'content' referenced before assignment
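
The traceback above ends in an UnboundLocalError on `return content, status`, which typically means some code path reaches the return without assigning `content`. The sketch below reproduces that pattern and one possible fix. It is not the actual code from tensorhive/controllers/task.py: the function names, the `fetch` callback, the exception type and the 404 status are assumptions chosen for illustration.

```python
def get_all_buggy(fetch):
    try:
        tasks = fetch()
    except KeyError:
        # Bug pattern: this branch never assigns content/status.
        pass
    else:
        content, status = {"tasks": tasks}, 200
    return content, status  # raises UnboundLocalError if fetch() raised KeyError


def get_all_fixed(fetch):
    try:
        tasks = fetch()
    except KeyError as error:
        # Every branch now assigns both return values.
        content, status = {"msg": f"lookup failed: {error}"}, 404
    else:
        content, status = {"tasks": tasks}, 200
    return content, status
```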
    
    bug 
    opened by roscisz 4
  • Fix group/user serialization; add date string parsing to the restriction model


    This fixes the issue with group / user serialization where it was impossible to obtain a json representing group/user instance due to recursion (getting a group required getting all users and their groups, and then users of these groups, and so on...).

    Behavior for an outside user does not change - you should still use as_dict property to obtain valid json representation. You may use as_dict_shallow intentionally if you wish to, but know that it will not include any information about group's users / user's groups.
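
One way to picture the split between as_dict and as_dict_shallow is the sketch below. Only the two property names come from the description above; the `User` and `Group` classes here are simplified stand-ins, not the project's actual models, and the PR may implement the idea differently.

```python
from typing import Any, Dict, List


class User:
    def __init__(self, username: str) -> None:
        self.username = username
        self.groups: List["Group"] = []

    @property
    def as_dict_shallow(self) -> Dict[str, Any]:
        # No relationship data at all, so this can never recurse.
        return {"username": self.username}

    @property
    def as_dict(self) -> Dict[str, Any]:
        # One level deep: groups are embedded in their shallow form.
        return {**self.as_dict_shallow, "groups": [g.as_dict_shallow for g in self.groups]}


class Group:
    def __init__(self, name: str) -> None:
        self.name = name
        self.users: List[User] = []

    @property
    def as_dict_shallow(self) -> Dict[str, Any]:
        return {"name": self.name}

    @property
    def as_dict(self) -> Dict[str, Any]:
        return {**self.as_dict_shallow, "users": [u.as_dict_shallow for u in self.users]}


alice, ml = User("alice"), Group("ml")
alice.groups.append(ml)
ml.users.append(alice)  # circular reference: user -> group -> user
```

Because each full representation only embeds shallow representations of related objects, the circular reference no longer leads to unbounded recursion.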


    Edit: Second commit in this PR refactors a bit datetime <-> string parsing that was used in multiple places in code (Task, Reservation model) & adds the ability of parsing to the Reservation model.


    Third commit: Add graceful handling of incorrect operations in Group and Restrictio… adds proper handling of situations when someone would attempt to add user to a group that he's already in, resource to a restriction already applied to it, etc. Demonstrative test cases added as well


    Further commits feature some minor improvements

    opened by jszemplinski 4
  • Implicit reusing deleted record's id - inconsistency problem


    Steps to reproduce

    1. Open web app
    2. Make reservation
    3. Wait until it ends
    4. Check out files inside ~/.config/TensorHive/logs. There should be old_X.json and summary_X.json
    5. Delete aforementioned reservation from calendar
    6. Make new reservation
    7. See log file(s) during reservation time. There will be old_X.json, summary_X.json and X.json (but we should get X+1.json instead)
    8. Notice that summary_X.json will be overwritten by the summary of newer X.json file.

    The problem is that removing reservation record allows for reusing old ids.

    Possible solutions:

    • Ignore and allow for overriding summaries (inconsistency with reality)
    • Disable reusing old ids
    • Append some unique number to summary filename
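
The "disable reusing old ids" option can be illustrated with SQLite. This sketch assumes the records live in an SQLite table, which the issue does not state; the table and column names are made up. In SQLite, a plain INTEGER PRIMARY KEY may hand out the id of a deleted row again, while AUTOINCREMENT guarantees ids are never reused.

```python
import sqlite3


def ids_after_delete_and_insert(create_sql: str):
    """Insert a row, delete it, insert another; return both assigned ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    first = conn.execute("INSERT INTO reservations (title) VALUES ('old')").lastrowid
    conn.execute("DELETE FROM reservations WHERE id = ?", (first,))
    second = conn.execute("INSERT INTO reservations (title) VALUES ('new')").lastrowid
    conn.close()
    return first, second


reused = ids_after_delete_and_insert(
    "CREATE TABLE reservations (id INTEGER PRIMARY KEY, title TEXT)"
)
unique = ids_after_delete_and_insert(
    "CREATE TABLE reservations (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)"
)
```

With the first schema both reservations get the same id, so files named after the id collide; with the second schema the new reservation gets a fresh id.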
    bug 
    opened by micmarty 4
  • Bump axios from 0.18.0 to 0.21.2 in /tensorhive/app/web/dev


    Bumps axios from 0.18.0 to 0.21.2.

    Release notes

    Sourced from axios's releases.

    v0.21.2

    0.21.2 (September 4, 2021)

    Fixes and Functionality:

    • Updating axios requests to be delayed by pre-emptive promise creation (#2702)
    • Adding "synchronous" and "runWhen" options to interceptors api (#2702)
    • Updating of transformResponse (#3377)
    • Adding ability to omit User-Agent header (#3703)
    • Adding multiple JSON improvements (#3688, #3763)
    • Fixing quadratic runtime and extra memory usage when setting a maxContentLength (#3738)
    • Adding parseInt to config.timeout (#3781)
    • Adding custom return type support to interceptor (#3783)
    • Adding security fix for ReDoS vulnerability (#3980)

    Internal and Tests:

    • Updating build dev dependancies (#3401)
    • Fixing builds running on Travis CI (#3538)
    • Updating follow rediect version (#3694, #3771)
    • Updating karma sauce launcher to fix failing sauce tests (#3712, #3717)
    • Updating content-type header for application/json to not contain charset field, according do RFC 8259 (#2154)
    • Fixing tests by bumping karma-sauce-launcher version (#3813)
    • Changing testing process from Travis CI to GitHub Actions (#3938)

    Documentation:

    • Updating documentation around the use of AUTH_TOKEN with multiple domain endpoints (#3539)
    • Remove duplication of item in changelog (#3523)
    • Fixing gramatical errors (#2642)
    • Fixing spelling error (#3567)
    • Moving gitpod metion (#2637)
    • Adding new axios documentation website link (#3681, #3707)
    • Updating documentation around dispatching requests (#3772)
    • Adding documentation for the type guard isAxiosError (#3767)
    • Adding explanation of cancel token (#3803)
    • Updating CI status badge (#3953)
    • Fixing errors with JSON documentation (#3936)
    • Fixing README typo under Request Config (#3825)
    • Adding axios-multi-api to the ecosystem file (#3817)
    • Adding SECURITY.md to properly disclose security vulnerabilities (#3981)

    Huge thanks to everyone who contributed to this release via code (authors listed below) or via reviews and triaging on GitHub:

    ... (truncated)


    Commits
    Maintainer changes

    This version was pushed to npm by jasonsaayman, a new releaser for axios since your current version.


    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies javascript 
    opened by dependabot[bot] 3
  • Please add citation template


    Hi!

    Your repo is missing a BibTeX citation. How am I supposed to cite this amazing tool? Any self-respecting tool or framework has one. Here you can see an example off the top of my head: https://github.com/piojanu/humblerl/blob/master/README.md.

    Thanks for your work, much appreciated!

    P.S. I'd like to thank especially Karol Draszawka. He is amazing guy and I hope he will give me A for my extraordinary thesis. Peace :v:

    help wanted 
    opened by piojanu 3
  • Protection Service dies silently in case of missing email


    When user e-mail is missing, mailer fails (which is good), but the protection service no longer scans for violations:

      File "/home/tensorhive/TensorHive/tensorhive/core/services/ProtectionService.py", line 199, in do_run
        handler.trigger_action(violation_data)
      File "/home/tensorhive/TensorHive/tensorhive/core/violation_handlers/ProtectionHandler.py", line 8, in trigger_action
        self._protection_behaviour.trigger_action(*args, **kwargs)
      File "/home/tensorhive/TensorHive/tensorhive/core/violation_handlers/EmailSendingBehaviour.py", line 82, in trigger_action
        self._email_intruder(intruder_email, violation_data, timer)
      File "/home/tensorhive/TensorHive/tensorhive/core/violation_handlers/EmailSendingBehaviour.py", line 129, in _email_intruder
        self.mailer.send(email)
      File "/home/tensorhive/TensorHive/tensorhive/core/utils/mailer.py", line 76, in send
        self.server.sendmail(message.author, message.recipients, message.body)
      File "/usr/lib/python3.5/smtplib.py", line 876, in sendmail
        raise SMTPRecipientsRefused(senderrs)
    smtplib.SMTPRecipientsRefused: {'<email_missing>': (553, b'5.1.2 The recipient address <email_missing> is not a valid RFC-5321 address. g5sm4044005ljk.59 - gsmtp')}
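
A common way to keep a periodic service alive when one action fails is to isolate each handler call. The sketch below shows that pattern under stated assumptions: `run_handlers` and the callable handlers are hypothetical and are not the ProtectionService code from the traceback above.

```python
import logging
from typing import Any, Callable, Dict, Iterable, List

log = logging.getLogger(__name__)


def run_handlers(
    handlers: Iterable[Callable[[Dict[str, Any]], None]],
    violation_data: Dict[str, Any],
) -> List[Exception]:
    """Call every handler; log and collect failures instead of letting
    one failing handler abort the whole scan."""
    failures: List[Exception] = []
    for handler in handlers:
        try:
            handler(violation_data)
        except Exception as error:  # deliberately broad: the loop must survive
            log.exception("Violation handler %r failed", handler)
            failures.append(error)
    return failures
```

With this structure, a refused recipient address would be logged and the remaining handlers, as well as later scans, would still run.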
    
    bug 
    opened by roscisz 3
  • Bump express from 4.16.4 to 4.18.2 in /tensorhive/app/web/dev


    Bumps express from 4.16.4 to 4.18.2.

    Release notes

    Sourced from express's releases.

    4.18.2

    4.18.1

    • Fix hanging on large stack of sync routes

    4.18.0

    ... (truncated)

    Changelog

    Sourced from express's changelog.

    4.18.2 / 2022-10-08

    4.18.1 / 2022-04-29

    • Fix hanging on large stack of sync routes

    4.18.0 / 2022-04-25

    ... (truncated)

    Commits


    dependencies javascript 
    opened by dependabot[bot] 0
  • Bump certifi from 2020.12.5 to 2022.12.7


    Bumps certifi from 2020.12.5 to 2022.12.7.

    Commits


    dependencies python 
    opened by dependabot[bot] 0
  • Bump decode-uri-component from 0.2.0 to 0.2.2 in /tensorhive/app/web/dev


    Bumps decode-uri-component from 0.2.0 to 0.2.2.

    Release notes

    Sourced from decode-uri-component's releases.

    v0.2.2

    • Prevent overwriting previously decoded tokens 980e0bf

    https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.1...v0.2.2

    v0.2.1

    • Switch to GitHub workflows 76abc93
    • Fix issue where decode throws - fixes #6 746ca5d
    • Update license (#1) 486d7e2
    • Tidelift tasks a650457
    • Meta tweaks 66e1c28

    https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.0...v0.2.1

    dependencies javascript 
    opened by dependabot[bot] 0
  • Bump engine.io and karma in /tensorhive/app/web/dev

    Bumps engine.io to 6.2.1 and updates ancestor dependency karma. These dependencies need to be updated together.

    Updates engine.io from 1.8.3 to 6.2.1

    Release notes

    Sourced from engine.io's releases.

    6.2.1

    :warning: This release contains an important security fix :warning:

    A malicious client could send a specially crafted HTTP request, triggering an uncaught exception and killing the Node.js process:

    Error: read ECONNRESET
        at TCP.onStreamRead (internal/stream_base_commons.js:209:20)
    Emitted 'error' event on Socket instance at:
        at emitErrorNT (internal/streams/destroy.js:106:8)
        at emitErrorCloseNT (internal/streams/destroy.js:74:3)
        at processTicksAndRejections (internal/process/task_queues.js:80:21) {
      errno: -104,
      code: 'ECONNRESET',
      syscall: 'read'
    }
    

    Please upgrade as soon as possible.

    Bug Fixes

    • catch errors when destroying invalid upgrades (#658) (425e833)

    6.2.0

    Features

    • add the "maxPayload" field in the handshake details (088dcb4)

    So that clients in HTTP long-polling can decide how many packets they have to send to stay under the maxHttpBufferSize value.

    This is a backward compatible change which should not mandate a new major revision of the protocol (we stay in v4), as we only add a field in the JSON-encoded handshake data:

    0{"sid":"lv_VI97HAXpY6yYWAAAC","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000,"maxPayload":1000000}
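The handshake line above can be read programmatically. The following is a minimal Python sketch, not engine.io's own client code: `parse_handshake` and `batch_packets` are hypothetical helper names, and the one-character separator overhead and the use of character counts instead of byte counts are simplifying assumptions.

```python
import json

# Handshake packet copied from the release note above.
HANDSHAKE = (
    '0{"sid":"lv_VI97HAXpY6yYWAAAC","upgrades":["websocket"],'
    '"pingInterval":25000,"pingTimeout":5000,"maxPayload":1000000}'
)


def parse_handshake(packet: str) -> dict:
    """Strip the leading packet-type character ('0' = open) and decode the JSON body."""
    if not packet.startswith("0"):
        raise ValueError("not an open packet")
    return json.loads(packet[1:])


def batch_packets(packets, max_payload, separator_len=1):
    """Greedily group packets so that each batch stays within max_payload.

    Sizes are measured in characters for simplicity; separator_len is an
    assumed per-packet framing overhead. A single packet larger than
    max_payload still ends up in a batch of its own.
    """
    batches, current, size = [], [], 0
    for packet in packets:
        extra = len(packet) + (separator_len if current else 0)
        if current and size + extra > max_payload:
            batches.append(current)
            current, size = [], 0
            extra = len(packet)
        current.append(packet)
        size += extra
    if current:
        batches.append(current)
    return batches


limit = parse_handshake(HANDSHAKE)["maxPayload"]
batches = batch_packets(["aaaa", "bbbb", "cc"], max_payload=9)
```

A client that knows `maxPayload` can split its outgoing packets this way instead of guessing the server's limit.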
    

    6.1.3

    Bug Fixes

    • typings: allow CorsOptionsDelegate as cors options (#641) (a463d26)
    • uws: properly handle chunked content (#642) (3367440)

    ... (truncated)

    Changelog

    Sourced from engine.io's changelog.

    6.2.1 (2022-11-20)

    :warning: This release contains an important security fix :warning:

    A malicious client could send a specially crafted HTTP request, triggering an uncaught exception and killing the Node.js process:

    Error: read ECONNRESET
        at TCP.onStreamRead (internal/stream_base_commons.js:209:20)
    Emitted 'error' event on Socket instance at:
        at emitErrorNT (internal/streams/destroy.js:106:8)
        at emitErrorCloseNT (internal/streams/destroy.js:74:3)
        at processTicksAndRejections (internal/process/task_queues.js:80:21) {
      errno: -104,
      code: 'ECONNRESET',
      syscall: 'read'
    }
    

    Please upgrade as soon as possible.

    Bug Fixes

    • catch errors when destroying invalid upgrades (#658) (425e833)

    3.6.0 (2022-06-06)

    Bug Fixes

    Features

    • decrease the default value of maxHttpBufferSize (58e274c)

    This change reduces the default value from 100 mb to a more sane 1 mb.

    This helps protect the server against denial of service attacks by malicious clients sending huge amounts of data.

    See also: https://github.com/advisories/GHSA-j4f2-536g-r55m

    • increase the default value of pingTimeout (f55a79a)

    ... (truncated)

    Commits
    • 24b847b chore(release): 6.2.1
    • 425e833 fix: catch errors when destroying invalid upgrades (#658)
    • 99adb00 chore(deps): bump xmlhttprequest-ssl and engine.io-client in /examples/latenc...
    • d196f6a chore(deps): bump minimatch from 3.0.4 to 3.1.2 (#660)
    • 7c1270f chore(deps): bump nanoid from 3.1.25 to 3.3.1 (#659)
    • 535a01d ci: add Node.js 18 in the test matrix
    • 1b71a6f docs: remove "Vanilla JS" highlight from README (#656)
    • 917d1d2 refactor: replace deprecated String.prototype.substr() (#646)
    • 020801a chore: add changelog for version 3.6.0
    • ed1d6f9 test: make test script work on Windows (#643)
    • Additional commits viewable in compare view

    Updates karma from 1.7.1 to 6.4.1

    Release notes

    Sourced from karma's releases.

    v6.4.1

    6.4.1 (2022-09-19)

    Bug Fixes

    v6.4.0

    6.4.0 (2022-06-14)

    Features

    • support SRI verification of link tags (dc51a2e)
    • support SRI verification of script tags (6a54b1c)

    v6.3.20

    6.3.20 (2022-05-13)

    Bug Fixes

    • prefer IPv4 addresses when resolving domains (e17698f), closes #3730

    v6.3.19

    6.3.19 (2022-04-19)

    Bug Fixes

    • client: error out when opening a new tab fails (099b85e)

    v6.3.18

    6.3.18 (2022-04-13)

    Bug Fixes

    • deps: upgrade socket.io to v4.4.1 (52a30bb)

    v6.3.17

    6.3.17 (2022-02-28)

    Bug Fixes

    • deps: update colors to maintained version (#3763) (fca1884)

    v6.3.16

    ... (truncated)

    Changelog

    Sourced from karma's changelog.

    6.4.1 (2022-09-19)

    Bug Fixes

    6.4.0 (2022-06-14)

    Features

    • support SRI verification of link tags (dc51a2e)
    • support SRI verification of script tags (6a54b1c)

    6.3.20 (2022-05-13)

    Bug Fixes

    • prefer IPv4 addresses when resolving domains (e17698f), closes #3730

    6.3.19 (2022-04-19)

    Bug Fixes

    • client: error out when opening a new tab fails (099b85e)

    6.3.18 (2022-04-13)

    Bug Fixes

    • deps: upgrade socket.io to v4.4.1 (52a30bb)

    6.3.17 (2022-02-28)

    Bug Fixes

    • deps: update colors to maintained version (#3763) (fca1884)

    6.3.16 (2022-02-10)

    Bug Fixes

    • security: mitigate the "Open Redirect Vulnerability" (ff7edbb)

    ... (truncated)

    Commits
    • 0013121 chore(release): 6.4.1 [skip ci]
    • 63d86be fix: pass integrity value
    • 84f7cc3 chore(release): 6.4.0 [skip ci]
    • f2d0663 docs: add integrity parameter
    • dc51a2e feat: support SRI verification of link tags
    • 6a54b1c feat: support SRI verification of script tags
    • 5e71cf5 chore(release): 6.3.20 [skip ci]
    • e17698f fix: prefer IPv4 addresses when resolving domains
    • 60f4f79 build: add Node 16 and 18 to the CI matrix
    • 6ff5aaf chore(release): 6.3.19 [skip ci]
    • Additional commits viewable in compare view

    dependencies javascript 
    opened by dependabot[bot] 0
  • Bump loader-utils and html-webpack-plugin in /tensorhive/app/web/dev

    Bumps loader-utils to 1.4.2 and updates ancestor dependency html-webpack-plugin. These dependencies need to be updated together.

    Updates loader-utils from 1.2.3 to 1.4.2

    Release notes

    Sourced from loader-utils's releases.

    v1.4.2

    1.4.2 (2022-11-11)

    Bug Fixes

    v1.4.1

    1.4.1 (2022-11-07)

    Bug Fixes

    v1.4.0

    1.4.0 (2020-02-19)

    Features

    • the resourceQuery is passed to the interpolateName method (#163) (cd0e428)

    v1.3.0

    1.3.0 (2020-02-19)

    Features

    • support the [query] template for the interpolatedName method (#162) (469eeba)

    Changelog

    Sourced from loader-utils's changelog.

    1.4.2 (2022-11-11)

    Bug Fixes

    1.4.1 (2022-11-07)

    Bug Fixes

    1.4.0 (2020-02-19)

    Features

    • the resourceQuery is passed to the interpolateName method (#163) (cd0e428)

    1.3.0 (2020-02-19)

    Features

    • support the [query] template for the interpolatedName method (#162) (469eeba)

    Commits

    Updates html-webpack-plugin from 2.30.1 to 5.5.0

    Changelog

    Sourced from html-webpack-plugin's changelog.

    5.5.0 (2021-10-25)

    Features

    • Support type=module via scriptLoading option (1e42625), closes #1663

    5.4.0 (2021-10-15)

    Features

    5.3.2 (2021-06-22)

    Bug Fixes

    • update lodash and pretty error (9c7fba0)

    5.3.1 (2021-03-09)

    Bug Fixes

    • remove loader-utils from plugin core (82d0ee8)

    5.3.0 (2021-03-07)

    Features

    • allow to modify the interpolation options in webpack config (d654f5b)
    • drop loader-utils dependency (41d7a50)

    5.2.0 (2021-02-19)

    Features

    5.1.0 (2021-02-12)

    Features

    • omit html tag attribute with null/undefined/false value (aa6e78d), closes #1598

    5.0.0 (2021-02-03)

    ... (truncated)

    Commits
    • 873d75b chore(release): 5.5.0
    • ddeb774 chore: update examples
    • 1e42625 feat: Support type=module via scriptLoading option
    • 7d3645b Bump pretty-error to 4.0.0 to fix transitive vuln for ansi-regex CVE-2021-3807
    • 79be779 [chore] changes actions to run on pull_requests
    • b7e5859 [chore] fixes CI to avoid race conditions
    • 48131d3 chore(release): 5.4.0
    • 16a841a [chore] rebuild examples
    • 3bb7c17 Update index.js
    • e38ac97 Update index.js
    • Additional commits viewable in compare view

    Maintainer changes

    This version was pushed to npm by jantimon, a new releaser for html-webpack-plugin since your current version.


    dependencies javascript 
    opened by dependabot[bot] 0
  • Bump socket.io-parser and karma in /tensorhive/app/web/dev

    Bumps socket.io-parser to 4.2.1 and updates ancestor dependency karma. These dependencies need to be updated together.

    Updates socket.io-parser from 2.3.1 to 4.2.1

    Release notes

    Sourced from socket.io-parser's releases.

    4.2.1

    Bug Fixes

    • check the format of the index of each attachment (b5d0cb7)

    4.2.0

    Features

    • allow the usage of custom replacer and reviver (#112) (b08bc1a)

    4.1.2

    Bug Fixes

    • allow objects with a null prototype in binary packets (#114) (7f6b262)

    4.1.1

    4.1.0

    Features

    • provide an ESM build with and without debug (388c616)

    4.0.5

    Bug Fixes

    • check the format of the index of each attachment (b559f05)

    ... (truncated)

    Changelog

    Sourced from socket.io-parser's changelog.

    4.2.1 (2022-06-27)

    Bug Fixes

    • check the format of the index of each attachment (b5d0cb7)

    4.2.0 (2022-04-17)

    Features

    • allow the usage of custom replacer and reviver (#112) (b08bc1a)

    4.1.2 (2022-02-17)

    Bug Fixes

    • allow objects with a null prototype in binary packets (#114) (7f6b262)

    4.1.1 (2021-10-14)

    4.1.0 (2021-10-11)

    Features

    • provide an ESM build with and without debug (388c616)

    4.0.4 (2021-01-15)

    Bug Fixes

    • allow integers as event names (1c220dd)

    4.0.3 (2021-01-05)

    4.0.2 (2020-11-25)

    ... (truncated)

    Commits
    • 5a2ccff chore(release): 4.2.1
    • b5d0cb7 fix: check the format of the index of each attachment
    • c7514b5 chore(release): 4.2.0
    • 931f152 chore: add Node.js 16 in the test matrix
    • 6c9cb27 chore: bump @socket.io/component-emitter to version 3.1.0
    • b08bc1a feat: allow the usage of custom replacer and reviver (#112)
    • aed252c chore(release): 4.1.2
    • 89209fa chore: bump cached-path-relative from 1.0.2 to 1.1.0 (#113)
    • 0a3b556 chore: bump path-parse from 1.0.6 to 1.0.7 (#108)
    • 7f6b262 fix: allow objects with a null prototype in binary packets (#114)
    • Additional commits viewable in compare view

    Updates karma from 1.7.1 to 6.4.1

    Release notes

    Sourced from karma's releases.

    v6.4.1

    6.4.1 (2022-09-19)

    Bug Fixes

    v6.4.0

    6.4.0 (2022-06-14)

    Features

    • support SRI verification of link tags (dc51a2e)
    • support SRI verification of script tags (6a54b1c)

    v6.3.20

    6.3.20 (2022-05-13)

    Bug Fixes

    • prefer IPv4 addresses when resolving domains (e17698f), closes #3730

    v6.3.19

    6.3.19 (2022-04-19)

    Bug Fixes

    • client: error out when opening a new tab fails (099b85e)

    v6.3.18

    6.3.18 (2022-04-13)

    Bug Fixes

    • deps: upgrade socket.io to v4.4.1 (52a30bb)

    v6.3.17

    6.3.17 (2022-02-28)

    Bug Fixes

    • deps: update colors to maintained version (#3763) (fca1884)

    v6.3.16

    ... (truncated)

    Changelog

    Sourced from karma's changelog.

    6.4.1 (2022-09-19)

    Bug Fixes

    6.4.0 (2022-06-14)

    Features

    • support SRI verification of link tags (dc51a2e)
    • support SRI verification of script tags (6a54b1c)

    6.3.20 (2022-05-13)

    Bug Fixes

    • prefer IPv4 addresses when resolving domains (e17698f), closes #3730

    6.3.19 (2022-04-19)

    Bug Fixes

    • client: error out when opening a new tab fails (099b85e)

    6.3.18 (2022-04-13)

    Bug Fixes

    • deps: upgrade socket.io to v4.4.1 (52a30bb)

    6.3.17 (2022-02-28)

    Bug Fixes

    • deps: update colors to maintained version (#3763) (fca1884)

    6.3.16 (2022-02-10)

    Bug Fixes

    • security: mitigate the "Open Redirect Vulnerability" (ff7edbb)

    ... (truncated)

    Commits
    • 0013121 chore(release): 6.4.1 [skip ci]
    • 63d86be fix: pass integrity value
    • 84f7cc3 chore(release): 6.4.0 [skip ci]
    • f2d0663 docs: add integrity parameter
    • dc51a2e feat: support SRI verification of link tags
    • 6a54b1c feat: support SRI verification of script tags
    • 5e71cf5 chore(release): 6.3.20 [skip ci]
    • e17698f fix: prefer IPv4 addresses when resolving domains
    • 60f4f79 build: add Node 16 and 18 to the CI matrix
    • 6ff5aaf chore(release): 6.3.19 [skip ci]
    • Additional commits viewable in compare view

    dependencies javascript 
    opened by dependabot[bot] 0
Releases(1.1.0)
  • 1.1.0(Feb 2, 2022)

    What's Changed

    Main changes:

    • Extended resource protection mechanism (https://github.com/roscisz/TensorHive/pull/347)
      • configurable protection levels, including a new mode (level 2) that disallows using GPUs without a reservation, even when no other user's reservation is violated
      • violations are detected for all available resources, not only for current reservations
      • configurable process killing levels (0 - no killing, 1 - try to kill as the process owner, 2 - kill using sudo)
      • gather multiple violated resources in one message in both mailer and message sending handlers
      • send violation messages to user TTYs across multiple hosts
    • Reverse proxy support (https://github.com/roscisz/TensorHive/pull/329)
    • Persisting selected resources in calendar view (https://github.com/roscisz/TensorHive/pull/353)
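The protection levels described above can be illustrated with a short Python sketch. This is not TensorHive's implementation: `GpuProcess`, `is_violation` and `group_violations` are hypothetical names, and treating level 0 as "protection off" and level 1 as "only other users' reservations are protected" are assumptions inferred from the notes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class GpuProcess:
    owner: str   # user who started the process
    gpu_id: str  # GPU the process is running on


def is_violation(proc: GpuProcess, reserved_by: Optional[str], level: int) -> bool:
    """Decide whether a process violates the protection policy.

    reserved_by is the user holding the current reservation of proc.gpu_id,
    or None when the GPU is not reserved right now.
    """
    if level <= 0:
        return False  # assumption: level 0 means no protection
    if reserved_by is not None:
        return proc.owner != reserved_by  # someone else's reservation is violated
    return level >= 2  # level 2: using an unreserved GPU is also a violation


def group_violations(processes: List[GpuProcess],
                     reservations: Dict[str, str],
                     level: int) -> Dict[str, List[str]]:
    """Gather all violated GPUs per offending user, so one message can list them all."""
    grouped: Dict[str, List[str]] = {}
    for proc in processes:
        if is_violation(proc, reservations.get(proc.gpu_id), level):
            grouped.setdefault(proc.owner, []).append(proc.gpu_id)
    return grouped


procs = [GpuProcess("alice", "gpu0"), GpuProcess("alice", "gpu1"), GpuProcess("bob", "gpu0")]
reserved = {"gpu0": "bob"}
level_1 = group_violations(procs, reserved, level=1)
level_2 = group_violations(procs, reserved, level=2)
```

With the sample data, alice is reported only for gpu0 at level 1, and for both gpu0 and gpu1 at level 2.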

    Minor changes:

    • dependency version upgrades
    • bugfixes

    Configuration Changes

    Standard upgrade process using pip or git pull will work. However, the following changes in main_config.ini should be taken into account:

    • the enabled option under [protection_service] has been removed and is no longer relevant
    • the level option has been introduced under [protection_service] (see description)
    • the kill_processes option has been introduced under [protection_service] (see description)
    • the url_schema and url_port options have been introduced under [api] (see description)
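As a rough illustration of the options listed above, the Python sketch below parses a hypothetical excerpt of main_config.ini. The section and option names come from these release notes; every value, and the `read_upgrade_options` helper itself, is an assumption made for the example.

```python
import configparser

# Hypothetical excerpt of main_config.ini; the values are illustrative only.
SAMPLE_CONFIG = """
[protection_service]
level = 2
kill_processes = 1

[api]
url_schema = http
url_port = 1111
"""


def read_upgrade_options(text: str) -> dict:
    """Read the options introduced in 1.1.0 and flag a leftover 'enabled' option."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    protection = parser["protection_service"]
    api = parser["api"]

    kill = protection.getint("kill_processes")
    # 0 = no killing, 1 = kill as the process owner, 2 = kill using sudo
    if kill not in (0, 1, 2):
        raise ValueError("kill_processes must be 0, 1 or 2")

    return {
        "level": protection.getint("level"),
        "kill_processes": kill,
        "url_schema": api.get("url_schema"),
        "url_port": api.getint("url_port"),
        # 'enabled' was removed in 1.1.0; report it so it can be cleaned up
        "stale_enabled": "enabled" in protection,
    }


options = read_upgrade_options(SAMPLE_CONFIG)
```

A check like this can be run against an existing configuration file after upgrading to spot options that are missing or no longer used.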

    New Contributors

    • @tboinski made their first contribution in https://github.com/roscisz/TensorHive/pull/329

    Full Changelog: https://github.com/roscisz/TensorHive/compare/1.0.0...1.1.0

  • 1.0.0(Jun 8, 2021)

    What's Changed

    Main improvements:

    • expansion of the Task Execution module to the new Job execution module:
      • tasks can be grouped into jobs and managed jointly
      • tasks can be cloned
      • tasks can be edited considering the individual command segments
    • job queue for non-reserved time: an automatic scheduler can run jobs when the required resources are not reserved
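The queueing idea can be sketched in a few lines of Python. This is a simplified illustration, not the scheduler shipped with TensorHive: `QueuedJob` and `pick_runnable` are hypothetical names, and first-in-first-out ordering is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class QueuedJob:
    name: str
    gpus: FrozenSet[str]  # GPUs the job needs


def pick_runnable(queue: List[QueuedJob], reserved: Set[str]) -> List[QueuedJob]:
    """Return queued jobs whose GPUs are neither reserved nor taken by an earlier pick."""
    busy = set(reserved)
    runnable = []
    for job in queue:  # assumption: jobs are considered in FIFO order
        if job.gpus.isdisjoint(busy):
            runnable.append(job)
            busy |= job.gpus  # GPUs picked now are unavailable to later jobs
    return runnable


queue = [
    QueuedJob("a", frozenset({"gpu0"})),
    QueuedJob("b", frozenset({"gpu0", "gpu1"})),
    QueuedJob("c", frozenset({"gpu2"})),
]
names_free = [job.name for job in pick_runnable(queue, reserved=set())]
names_reserved = [job.name for job in pick_runnable(queue, reserved={"gpu2"})]
```

In this example job "b" waits because job "a" already took gpu0, and job "c" waits whenever gpu2 is reserved.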

    Minor improvements:

    • e-mail queuing and multiple admin addresses
    • dependency fixes
    • bugfixes

    Full Changelog: https://github.com/roscisz/TensorHive/compare/0.3.6...1.0.0

  • 0.3.6(Dec 23, 2020)

    What's Changed

    • Hotfix/r0.3.6 (https://github.com/roscisz/TensorHive/pull/320)

    Full Changelog: https://github.com/roscisz/TensorHive/compare/0.3.5...0.3.6

  • 0.3.5(Dec 14, 2020)

    What's Changed

    Main improvements:

    • access control - users and groups can be granted privileges to reserve, monitor and utilize specific GPUs
    • security improvements
    • functional API tests, increased test coverage

    Minor improvements:

    • redesigned configuration and DB initialization CLI
    • automatic DB migrations
    • OpenAPI upgrade to 3.0.3
    • upgraded webapp dependencies
    • removed unnecessary versioning from API URL
    • improved tail mode in tasks overview
    • several bugfixes

    New Contributors

    • @jszemplinski made their first contribution in https://github.com/roscisz/TensorHive/pull/243
    • @matpiotrowski made their first contribution in https://github.com/roscisz/TensorHive/pull/247
    • @Acrobot made their first contribution in https://github.com/roscisz/TensorHive/pull/260
    • @martyole made their first contribution in https://github.com/roscisz/TensorHive/pull/307

    Full Changelog: https://github.com/roscisz/TensorHive/compare/0.3.3...0.3.5

  • 0.3.3(Mar 5, 2020)

  • 0.3.2(Mar 10, 2020)

  • 0.3.1(Nov 27, 2019)

    What's Changed

    • UX improvements in webapp (redesigned reservation schedule component, calendar refreshing, highlighting of the current user's reservations, saving watches in local storage)
    • commandline UX improvements
    • showing GPU IDs in violation messages
    • using local time in reservation dates
    • CLI commands for connection testing and displaying TH public SSH key
    • handling missing SSH keys
    • updated configuration guides
    • allowing custom API URLs
    • vulnerable dependency version bumps
    • improved README with new screenshots
    • various bugfixes

    Full Changelog: https://github.com/roscisz/TensorHive/compare/0.3...0.3.1

  • 0.3(Jul 9, 2019)

    What's Changed

    Version 0.3 introduces the task scheduling module, which allows for:

    • easily defining computational tasks in the web application, using templates with variable placeholders
    • automatically spawning tasks on the assigned computational devices
    • viewing task logs in the web application
    • easily terminating and killing a task or a group of tasks
    • attaching a terminal to a running task using screen
    • scheduling tasks to a given time slot
    • linking tasks to a specific reservation
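To give a feel for templates with variable placeholders, here is a small Python sketch built on the standard library's `string.Template`. It is not TensorHive's own template syntax: the `$gpu`, `$rank` and `$world` placeholders, the `train.py` script and the `render_commands` helper are all hypothetical.

```python
from string import Template
from typing import Dict, List


def render_commands(template: str, assignments: List[Dict[str, object]]) -> List[str]:
    """Fill one command template once per task, using that task's placeholder values."""
    tpl = Template(template)
    # substitute() raises KeyError when a placeholder has no value,
    # which catches incomplete task definitions early.
    return [tpl.substitute(values) for values in assignments]


commands = render_commands(
    "CUDA_VISIBLE_DEVICES=$gpu python train.py --rank $rank --world-size $world",
    [
        {"gpu": 0, "rank": 0, "world": 2},
        {"gpu": 1, "rank": 1, "world": 2},
    ],
)
```

Each rendered command differs only in the substituted values, which is what makes a single template reusable across the tasks of a multi-node training run.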

    Full Changelog: https://github.com/roscisz/TensorHive/compare/0.2.4...0.3

  • 0.2.4(Mar 3, 2020)

    What's Changed

    • added user registration through adding generated RSA key to authorized_keys
    • added separate intruder/admin timer in mailbot, improved mailbot logic
    • fixed bug in reservation statistics which resulted in no statistics shown and not hiding statistic files
    • fixed lack of calendar refreshing after reservation cancelling
    • fixed refresh token handling in webapp
    • UX improvements: button sizes, datetime string conversion, default values to inputs, calendar refreshing, confirmation button in reservation datetime picker, add version information
    • added mypy code checks and flake8, also for test code

    Full Changelog: https://github.com/roscisz/TensorHive/compare/0.2.3...0.2.4

  • 0.2.3(Mar 11, 2019)

    What's Changed

    • mailbot handler for reservation protection service
    • basic reservation usage statistics
    • vertical calendar view in webapp
    • reservation and user editing dialogs in webapp
    • bugfixes and minor improvements

    Full Changelog: https://github.com/roscisz/TensorHive/compare/0.2.2...0.2.3

  • 0.2.2(Nov 5, 2018)

    What's Changed

    The 0.2.2 release with monitoring and reservation modules.

    New Contributors

    • @szarakawka made their first contribution in https://github.com/roscisz/TensorHive/pull/71

    Full Changelog: https://github.com/roscisz/TensorHive/compare/0.2.1...0.2.2

  • 0.2.1(Jul 26, 2018)

    What's Changed

    The 0.2.1 pre-release with basic monitoring module.

    Full Changelog: https://github.com/roscisz/TensorHive/compare/0.2...0.2.1

  • 0.1.2(Aug 3, 2018)

Owner
Paweł Rościszewski
XGBoost-Ray is a distributed backend for XGBoost, built on top of distributed computing framework Ray.

null 92 Dec 14, 2022
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

CatBoost 6.9k Jan 5, 2023
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 14.5k Jan 7, 2023
Uber Open Source 1.6k Dec 31, 2022
🎛 Distributed machine learning made simple.

lazycluster Distributed machine learning made simple. Use your preferred distributed ML framework like a lazy engineer. Getting Started • Highlight

Machine Learning Tooling 44 Nov 27, 2022
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l

Distributed (Deep) Machine Learning Community 23.6k Jan 3, 2023
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Master status: Development status: Package information: TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assista

Epistasis Lab at UPenn 8.9k Jan 9, 2023
Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Python Extreme Learning Machine (ELM) Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Augusto Almeida 84 Nov 25, 2022
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

Vowpal Wabbit 8.1k Dec 30, 2022
Implementing continuous integration & delivery (CI/CD) in machine learning projects

CML with cloud compute This repository contains a sample project using CML with Terraform (via the cml-runner function) to launch an AWS EC2 instance

Iterative 19 Oct 3, 2022
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear

null 23.3k Dec 31, 2022
BigDL: Distributed Deep Learning Framework for Apache Spark

BigDL: Distributed Deep Learning on Apache Spark What is BigDL? BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can w

4.1k Jan 9, 2023
Distributed Deep learning with Keras & Spark

Elephas: Distributed Deep Learning with Keras & Spark Elephas is an extension of Keras, which allows you to run distributed deep learning models at sc

Max Pumperla 1.6k Dec 29, 2022
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective. 10x Larger Models 10x Faster Trainin

Microsoft 8.4k Dec 30, 2022
A distributed deep learning platform

Apache SINGA: Distributed deep learning system. http://singa.apache.org

The Apache Software Foundation 2.7k Jan 5, 2023
WAGMA-SGD is a decentralized asynchronous SGD for distributed deep learning training based on model averaging.

WAGMA-SGD is a decentralized asynchronous SGD based on wait-avoiding group model averaging. Synchronization is relaxed by making the collectives externally triggerable: a collective can be initiated without requiring all processes to enter it. It partially reduces the data within non-overlapping groups of processes, improving parallel scalability.

Shigang Li 6 Jun 18, 2022
An open-source library of algorithms to analyse time series in GPU and CPU.

Shapelets 216 Dec 30, 2022
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make dis

Horovod 12.9k Jan 7, 2023
Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray

A unified Data Analytics and AI platform for distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray What is Analytics Zoo? Analytics Zo

2.5k Dec 28, 2022