Distribute PySPI jobs across a PBS cluster

Overview

This repository contains scripts for distributing PySPI jobs across a PBS-type cluster. Each job will contain one calculator object that is associated with one multivariate time series (MTS).

The scripts assume the directory structure already set up in this repo: a database directory (database) contains all MTS files, along with a YAML configuration file (sample.yaml) that specifies the relative location of each file (and, optionally, its name, dim_order, and any relevant labels).
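A minimal sample.yaml might look like the following. This is a sketch based on the fields described above; the file names and labels are placeholders, and the exact path convention should follow your own repo layout:

```yaml
# Each entry describes one MTS file in the database directory.
- file: sample1.npy      # relative location of the .npy file
  name: sample1          # optional: a name for this MTS
  dim_order: ps          # optional: dimension ordering of the array
  labels:                # optional: free-form labels
    - example
- file: sample2.npy
```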

Usage

  1. Follow the PySPI documentation to install and set up PySPI on your cluster.
  2. Ensure that any relevant initialization procedures (for setting up conda or anything else) are contained in the PBS script.
  3. Copy all MTS (as numpy files) to the database folder, and update the sample.yaml file accordingly.
  4. Activate your conda environment:
conda activate pyspi
  5. Submit the jobs:
python distribute_jobs.py

The results will be stored in the database under the same name as the numpy files. For example, if you have the file database/sample1.npy in your YAML file, then there will be a new folder called database/sample1 with a calc.pkl file inside that contains the calculator.

To access the results, load the calculator with dill:

import dill

# Load the saved calculator, e.g., database/sample1/calc.pkl
with open('calc.pkl', 'rb') as f:
    calc = dill.load(f)

Then you can view the contents as per the standard PySPI documentation, e.g.,

calc.table
calc.table['cov_EmpiricalCovariance']
