# Distribute PySPI jobs across a PBS cluster
This repository contains scripts for distributing PySPI jobs across a PBS-type cluster. Each job will contain one calculator object that is associated with one multivariate time series (MTS).
The scripts assume the directory structure that is already set up in this repo: there is a database directory (`database`) that contains all MTS files, along with a YAML configuration file (`sample.yaml`) that specifies the relative location of each file (and, optionally, its `name`, `dim_order`, and any relevant `labels`).
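The authoritative schema is the one in this repo's `sample.yaml`; purely as an illustrative sketch (the layout below is an assumption, only the field names `name`, `dim_order`, and `labels` come from the description above), an entry might look like:

```yaml
# Hypothetical layout; consult the bundled sample.yaml for the real schema
sample1.npy:          # location relative to the database directory (assumed)
  name: sample1       # optional identifier
  dim_order: ps       # optional, e.g., processes x samples
  labels:
    - group1          # optional label(s)
```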
## Usage
- Follow the PySPI documentation to install and set up PySPI on your cluster.
- Ensure that any relevant initialization procedures (for setting up conda or anything else) are contained in the PBS script (an illustrative sketch follows this list).
- Copy all MTS (as NumPy files) to the `database` folder, and update the `sample.yaml` file accordingly (see the NumPy sketch after this list).
- Activate your conda environment:

  ```
  conda activate pyspi
  ```

- Submit the jobs:

  ```
  python distribute_jobs.py
  ```
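As a minimal sketch of the kind of initialization a PBS script might contain (the resource requests and conda paths below are assumptions for illustration; the repo's own PBS script is the authoritative template):

```bash
#!/bin/bash
#PBS -N pyspi_job
#PBS -l select=1:ncpus=1:mem=8GB
#PBS -l walltime=02:00:00

# Make conda available to the non-interactive batch shell (install path assumed)
source ~/miniconda3/etc/profile.d/conda.sh
conda activate pyspi

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
```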
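Each MTS is simply a NumPy array saved to disk. A minimal sketch of creating one (the file name and shape are illustrative; the YAML `dim_order` should match the array's layout):

```python
import numpy as np

# A toy MTS: 5 processes observed over 1000 samples ("ps" order)
mts = np.random.randn(5, 1000)
np.save('database/sample1.npy', mts)
```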
The results will be stored in the `database` folder under the same name as the NumPy files. For example, if you have the file `database/sample1.npy` in your YAML file, then there will be a new folder called `database/sample1` with a `calc.pkl` file inside that contains the calculator.
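Continuing that example, the layout after a run would look like:

```
database/
├── sample1.npy      # input MTS
└── sample1/
    └── calc.pkl     # serialized calculator
```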
In order to access the results, load the calculator with `dill`:
```python
import dill

# Unpickling requires the pyspi environment, since dill reconstructs the calculator object
with open('calc.pkl', 'rb') as f:
    calc = dill.load(f)
```
Then you can view the contents as per the standard PySPI documentation, e.g.,

```python
calc.table
calc.table['cov_EmpiricalCovariance']
```
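To work with one SPI's output as an array, a short sketch (this assumes `calc.table` is a pandas DataFrame keyed by SPI name, as in the PySPI documentation; the SPI shown is the one from the example above):

```python
# Select the empirical-covariance SPI and convert its
# process-by-process matrix to a NumPy array
cov = calc.table['cov_EmpiricalCovariance'].to_numpy()
print(cov.shape)  # (n_processes, n_processes)
```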