A workflow management tool for numerical models on the NCI computing systems

Overview

Payu

Payu is a climate model workflow management tool for supercomputing environments.

Payu is currently only configured for use on computing clusters maintained by NCI (National Computational Infrastructure) in Australia.

See the documentation for more details.

Comments
  • Support JRA55 style forcing for MOM5

    JRA55 forcing data sets for MOM5-SIS configuration are split into one file per year due to their large size. Changing the forcing file requires dynamically altering the entry in the data_table file.

    Entries look like this:

    "ATM" , "p_surf" , "SLP" , "./INPUT/slp.nc" , "bicubic" , 1.

    Can we have a discussion about the best approach for this? I'll put in some suggestions below.
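    One possible approach, sketched under assumptions: if the per-year files followed a hypothetical `<name>.<year>.nc` naming pattern, the filename field of each data_table entry could be rewritten with a regular expression before each run. The function `set_forcing_year` and the naming scheme are illustrative only, not part of payu or MOM5.

```python
import re

# Matches a quoted "./INPUT/<name>.nc" or "./INPUT/<name>.<year>.nc" field.
# The <name>.<year>.nc scheme is a hypothetical convention for this sketch.
_FILE_FIELD = re.compile(r'"(\./INPUT/[^"/]+?)(?:\.\d{4})?\.nc"')


def set_forcing_year(data_table_text: str, year: int) -> str:
    """Point every quoted ./INPUT/*.nc entry at the file for `year`."""
    return _FILE_FIELD.sub(lambda m: f'"{m.group(1)}.{year}.nc"', data_table_text)


if __name__ == "__main__":
    entry = '"ATM" , "p_surf" , "SLP" , "./INPUT/slp.nc" , "bicubic" , 1.'
    print(set_forcing_year(entry, 1985))
    # "ATM" , "p_surf" , "SLP" , "./INPUT/slp.1985.nc" , "bicubic" , 1.
```

    Because an existing four-digit year is matched and replaced, the same function can be applied again at the start of each new year.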

    opened by aidanheerdegen 39
  • Work with Environment Modules 4 on Gadi

    Today I tried running on Gadi. Building succeeded, but unfortunately I was unable to run the 1deg_jra55_ryf experiment. The error, raised at line 40 of payu/envmod.py, is:

    FileNotFoundError: [Errno 2] No such file or directory: '/opt/Modules/v4.3.0/init/.modulespath'
    

    This error occurs essentially because payu was written for Environment Modules version 3.2.6 on Raijin, whereas Gadi runs Environment Modules 4.3.0, which is not backwards compatible with version 3.2.6. See https://modules.readthedocs.io/en/latest/diff_v3_v4.html. Therefore payu needs to be changed to work with Environment Modules version 4, and in particular with the current configuration on Gadi. The configuration of Environment Modules on Gadi may also change.

    See also pull request #128 and issue #200.
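    A minimal sketch of a more tolerant lookup, assuming that when `init/.modulespath` is missing (as on Gadi) the `MODULEPATH` environment variable already holds the module search paths. The function `module_paths` is hypothetical and is not the code in payu/envmod.py.

```python
import os
from pathlib import Path
from typing import List


def module_paths(moduleshome: str) -> List[str]:
    """Return module search paths without requiring init/.modulespath to exist."""
    modulespath = Path(moduleshome) / "init" / ".modulespath"
    if modulespath.is_file():
        paths = []
        for line in modulespath.read_text().splitlines():
            entry = line.split("#", 1)[0].strip()  # drop comments and blank lines
            if entry:
                paths.append(entry)
        return paths
    # Assumption: without .modulespath, MODULEPATH already lists the search paths.
    return [p for p in os.environ.get("MODULEPATH", "").split(":") if p]
```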

    opened by penguian 20
  • Add file tracking to payu

    Add input/restart and executable file tracking to payu, using the yamanifest module to create YAML-based manifest files that are tracked by git.

    When they exist, manifest files are used to populate the input directories, restarts (when required) and links to the executables. This should allow for easier configuration sharing, as the experiment just needs to be cloned and can be run without further configuration changes.
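    To illustrate the idea, here is a stdlib-only sketch that records a SHA-256 digest for every file under a directory, writes the result as a flat YAML mapping, and reports which paths changed between two manifests. It does not use yamanifest, whose API and file format are not reproduced here; the hash choice and all function names are assumptions.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(directory: Path) -> Dict[str, str]:
    """Map each file under `directory` (as a relative path) to its digest."""
    return {
        p.relative_to(directory).as_posix(): file_digest(p)
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    }


def manifest_to_yaml(manifest: Dict[str, str]) -> str:
    """Write a flat YAML mapping; JSON-quoted strings are valid YAML scalars."""
    return "".join(
        f"{json.dumps(name)}: {json.dumps(digest)}\n" for name, digest in manifest.items()
    )


def changed_files(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Paths that were added, removed, or whose contents changed."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```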

    opened by aidanheerdegen 19
  • Add user script to run on failure

    Possibly a useful hook to have in any case, but could be used as a temporary fix until https://github.com/payu-org/payu/issues/43 is resolved.

    Users can add their own failure user script, which can decide to resubmit under certain conditions. This is useful for the ACCESS-OM2-01 model, which is experiencing frequent random segfaults that do not recur on resubmission. See https://github.com/COSIMA/access-om2/issues/193.
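    A minimal sketch of such a hook, assuming the failure script is given as a command list and receives the failed exit status in a hypothetical `PAYU_EXIT_CODE` environment variable. Neither the function nor the variable name comes from payu.

```python
import os
import subprocess
from typing import Optional, Sequence


def run_with_failure_hook(cmd: Sequence[str],
                          error_script: Optional[Sequence[str]] = None) -> int:
    """Run `cmd`; if it exits non-zero and a user script is set, run that script.

    The failed exit status is exposed to the script via the hypothetical
    PAYU_EXIT_CODE environment variable so it can decide whether to resubmit.
    """
    result = subprocess.run(list(cmd))
    if result.returncode != 0 and error_script:
        env = dict(os.environ, PAYU_EXIT_CODE=str(result.returncode))
        subprocess.run(list(error_script), env=env)
    return result.returncode
```

    The resubmission policy (for example, retrying only on particular exit codes) would then live entirely in the user's script.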

    opened by aidanheerdegen 18
  • Checksum errors keep popping up

    I keep getting errors of this sort:

    FATAL from PE     0: MOM_restart(restore_state): Checksum of input field DTBT 4027A842B7D0AE91 does not match value 1EFB741F9B086148 stored in INPUT/MOM.res.nc
    

    See, e.g., job 9363751 in /home/552/nc3020/SOchanBcBtEddySat/layer2/layer2_tau5e-0_manyshortridges. (Possibly the logs are archived because I swept.)

    opened by navidcy 18
  • Collating regional outputs for core counts exceeding 10,000

    I have an ACCESS-OM2-01 simulation in which I am trying to save some regional diagnostics. The simulation uses Andrew’s 10461-core configuration for MOM, so the regional diagnostics routine (which writes out one NetCDF tile per core) now produces filenames with 6 digits after the .nc, like rregionocean-2d30m-vorticity_z-3-hourly-mean-ym_2012_01.nc.010431.

    It seems that payu doesn't ask mppnccombine to collate these files, likely because of this: https://github.com/payu-org/payu/blob/9348acdf92ca18aae229fc06b0b716d4cd85e1aa/payu/models/fms.py#L65-L66

    Is there a nice way to generalise this bit of code?
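    One way to generalise the match, sketched without reproducing the code at the linked lines: treat any run of digits after `.nc.` as a tile suffix and group the tiles by their base filename. `TILE_PATTERN` and `group_tiles` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Any run of digits after ".nc." counts as a tile suffix, so both
# 4-digit (.0001) and 6-digit (.010431) suffixes are recognised.
TILE_PATTERN = re.compile(r"^(?P<base>.+\.nc)\.(?P<tile>\d+)$")


def group_tiles(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group per-core tile files by the collated filename they belong to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        m = TILE_PATTERN.match(name)
        if m:
            groups[m.group("base")].append(name)
    # Sort numerically by tile index rather than lexicographically.
    return {
        base: sorted(tiles, key=lambda t: int(TILE_PATTERN.match(t).group("tile")))
        for base, tiles in groups.items()
    }
```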

    opened by AndyHoggANU 17
  • Handling runlog remotes during a local clone

    I am testing out automatic pushing of output to a target repository on GitHub, and I've noticed that our liberal git cloning of others' configurations has created lots of origin remotes pointing to the directories we cloned from, which often belong to other people we probably do not want to be pushing changes to.

    Not exactly an "issue", but any thoughts on this situation? Do we want to wipe out the remotes in some way here? Or am I worrying too much about nothing?

    (Mostly for @aidanheerdegen and @nicjhan )

    opened by marshallward 17
  • access-om2-01 crashing

    Sorry for the hopelessly vague title.

    I've apparently hit a problem in /projects/v45/apps/payu/aek with access-om2-01. See the run dir: /short/v45/aek156/access-om2/control/01deg_jra55_ryf. Over the weekend, run 29 (job 3477651) crashed after less than 40 min of walltime. archive/restart029 was created, but there was no work dir or archive/output029.

    See archive/pbs_logs/01deg_jra55_ryf.e3477651:

    WARNING: no update with \d+ (\d+) \d+ i2o.nc.
    WARNING: no update with \d+ (\d+) \d+ o2i.nc.
    WARNING: no update with \w{4} \w{4} LAG=\+(\d+).
    WARNING: no update with \d+ (\d+) \d+ i2o.nc.
    WARNING: no update with \d+ (\d+) \d+ o2i.nc.
    WARNING: no update with \w{4} \w{4} LAG=\+(\d+).
    Currently Loaded Modulefiles:
      1) payu/aek        2) python/2.7.6    3) openmpi/1.6.3   4) pbs
    Traceback (most recent call last):
      File "/jobfs/local/pbs/mom_priv/jobs/3477651.r-man2.SC", line 9, in <module>
        run_cmd.runscript()
      File "/projects/v45/apps/payu/aek/lib/payu/subcommands/run_cmd.py", line 135, in runscript
        expt.archive()
      File "/projects/v45/apps/payu/aek/lib/payu/experiment.py", line 603, in archive
        self.model.archive()
      File "/projects/v45/apps/payu/aek/lib/payu/models/access.py", line 187, in archive
        shutil.copy2(o2i_src, o2i_dst)
      File "/apps/python/2.7.6/lib/python2.7/shutil.py", line 130, in copy2
        copyfile(src, dst)
      File "/apps/python/2.7.6/lib/python2.7/shutil.py", line 82, in copyfile
        with open(src, 'rb') as fsrc:
    IOError: [Errno 2] No such file or directory: '/short/v45/aek156/access-om2/work/01deg_jra55_ryf/ocean/o2i.nc'
    

    I thought there might have been something wrong with run 28, so I re-ran 28 and then tried 29 again today (job 3562946) but it failed in exactly the same way. (I kept the previous restart029 in restart029-3477651)

    opened by aekiss 16
  • Laboratory name is always used as a path

    I thought I could set laboratory name, without a path, and that would allow a config to be shared without a hard coded path. I formed that belief based on this code block:

    https://github.com/marshallward/payu/blob/master/payu/laboratory.py#L65-L69

    That code is never reached, as this logic:

    https://github.com/marshallward/payu/blob/master/payu/laboratory.py#L40-L44

    means it isn't called, even if laboratory isn't a full path.
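    For comparison, a sketch of the behaviour the issue expected: an absolute path is used as-is, while a bare name is resolved under a per-user base directory. The `base_dir/user/lab_name` layout and the function name are assumptions for illustration, not necessarily what laboratory.py does.

```python
import os


def resolve_laboratory(lab_name: str, base_dir: str, user: str) -> str:
    """Resolve the laboratory setting to a directory path.

    Absolute paths are kept; a bare name (or relative path) is placed under
    base_dir/user, so a shared config need not hard-code a path.
    """
    if os.path.isabs(lab_name):
        return os.path.normpath(lab_name)
    return os.path.normpath(os.path.join(base_dir, user, lab_name))
```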

    bug 
    opened by aidanheerdegen 13
  • MITgcm does not correctly support runlog

    MITgcm does not properly track config files in its git repository when using runlog:

    [aph502@raijin3 SOMs_v1_no_tides]$ pwd
    /home/157/cjs157/payu/mitgcm/SOMs_v1_no_tides
    [aph502@raijin3 SOMs_v1_no_tides]$ ls
    archive      data      data.diagnostics  data.kpp  data.obcs  data.rbcs            eedata      mitgcm.err  readme
    config.yaml  data.cal  data.exf          data.mnc  data.pkg   dz.copy_me_to_input  eedata.mth  mitgcm.out  work
    [aph502@raijin3 SOMs_v1_no_tides]$ git ls-tree -r master --name-only
    config.yaml
    

    This is a major problem. People have been using runlog thinking it was tracking their experimental setup, and it has not.

    This is because MITgcm adds config files during setup, by matching files whose names start with data, but runlog is initialised earlier, in init:

    https://github.com/marshallward/payu/blob/4c15c888cd3faf0cc12963580874166341f2b2b2/payu/experiment.py#L92

    Possible solutions would be to add all known MITgcm config paths to optional_config_files, or initialise runlog in setup.
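    A sketch of the first option, assuming MITgcm namelists are the files named `data`, `data.*`, `eedata` and `eedata.*` (based on the directory listing above). The list is computed from the control directory alone, so it does not depend on setup having run; how it would be merged into optional_config_files is left open, and the function name is hypothetical.

```python
import fnmatch
import os
from typing import List

# Patterns are an assumption based on the control directory listing above.
MITGCM_CONFIG_PATTERNS = ("data", "data.*", "eedata", "eedata.*")


def mitgcm_config_files(control_path: str) -> List[str]:
    """List MITgcm namelist files in the control directory, sorted by name."""
    return sorted(
        name for name in os.listdir(control_path)
        if os.path.isfile(os.path.join(control_path, name))
        and any(fnmatch.fnmatch(name, pat) for pat in MITGCM_CONFIG_PATTERNS)
    )
```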

    bug 
    opened by aidanheerdegen 12
  • Save output/error logs of crashed runs in archive

    Currently, we preserve the most recent .err and .out logs when doing payu sweep. However, a crashed job with many processors can generate very large backtrace logs which can start to fill up one's home directory quickly.

    I am proposing here that we no longer preserve these logs, or at least make it conditional. The original idea here was that people will casually discard useful logs, and we want to keep some record of potential problems, but I don't think this is the way to do it.
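    One middle ground between always preserving logs and never preserving them is to keep only the tail of an oversized log, which is where the last messages before a crash are written. A minimal sketch, with a hypothetical `preserve_log` function and an arbitrary 1 MB default cap:

```python
import os
import shutil


def preserve_log(src: str, dst: str, max_bytes: int = 1_000_000) -> None:
    """Copy a log to dst, keeping only its last max_bytes if it is larger."""
    size = os.path.getsize(src)
    if size <= max_bytes:
        shutil.copy2(src, dst)  # small logs are kept whole
        return
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        fsrc.seek(size - max_bytes)  # skip to the tail of a large backtrace log
        fdst.write(fsrc.read())
```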

    opened by marshallward 12
  • Update docs to refer to Gadi not Raijin

    The docs still refer to Raijin in a number of places. In some cases, instructions given don't work on Gadi, e.g. https://payu.readthedocs.io/en/latest/install.html#nci-users

    opened by dougiesquire 0
  • Add CESM model?

    I've been mucking around with some CESM configurations and have ported some to payu as part of this. I've added a Cesm model to my own payu fork that should support CESM configurations run with NUOPC-CMEPS, see https://github.com/dougiesquire/payu/tree/cesm_cmeps.

    The implementation probably needs some work, but I'm using it to run the following CESM configurations:

    • "GMOM_JRA" (MOM6-CICE6-DATM-DROF-SLND-SWAV-SGLC): https://github.com/dougiesquire/gmom_jra
    • "GMOM_JRA_WD" (MOM6-CICE6-WW3-DATM-DROF-SLND-SGLC): https://github.com/dougiesquire/gmom_jra_wd

    Is there interest in a PR to add the Cesm model?

    opened by dougiesquire 2
  • Proposal: explicitly support branches

    Currently payu doesn't prevent the use of branches in the git repository of the model experiment directory, but it has no explicit knowledge of, or support for, them.

    I propose changing the way payu names the archive and work directories, by appending the branch name to the model name used for those directories.

    This has the advantage that a single experiment control directory can be used for perturbations/tests/modifications, and these can happily co-exist as fully-formed experiments. Simply switching branches in git would automatically switch between archive directories.

    This would require changing the symbolic link to the archive directory whenever the branch changes. This could be done using git hooks.
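    A sketch of the naming scheme, assuming a `-` separator and replacing characters such as `/` (common in branch names) so each branch maps to a single directory. `git rev-parse --abbrev-ref HEAD` prints the current branch name (or `HEAD` when detached); both helper names are hypothetical.

```python
import re
import subprocess


def current_branch(repo_path: str) -> str:
    """Return the checked-out branch name (requires git on PATH)."""
    return subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout.strip()


def branch_dir_name(name: str, branch: str) -> str:
    """Append the branch to the name, replacing characters unsafe in a path.

    Without this, a branch such as 'perturb/wind' would create nested dirs.
    """
    safe_branch = re.sub(r"[^A-Za-z0-9._-]", "-", branch)
    return f"{name}-{safe_branch}"
```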

    feature 
    opened by aidanheerdegen 3
  • Problematic hardcoded relative path to modulescmd executable

    While running the regression tests locally on my laptop, one test fails because payu assumes that the modulescmd executable is located in $MODULESHOME/bin, which is not the case on all Linux distributions (e.g., Debian and Ubuntu). This is obviously not a problem when running on Gadi, but it could be annoying for anyone trying to use payu on a different machine.

    PS: The CI does not see the problem because no environment modules are installed in the Ubuntu image being used.
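    A sketch of a more portable lookup: check a few directories under `$MODULESHOME` first and then fall back to a `PATH` search. The candidate names (`modulecmd` for Environment Modules 3, `modulecmd.tcl` for version 4) and the `bin`/`libexec` locations are assumptions about typical installs, and `find_modulecmd` is a hypothetical name.

```python
import os
import shutil
from typing import Optional

# Assumed executable names: modulecmd (version 3) and modulecmd.tcl (version 4).
CANDIDATES = ("modulecmd", "modulecmd.tcl")


def find_modulecmd() -> Optional[str]:
    """Locate the modules executable instead of assuming $MODULESHOME/bin."""
    home = os.environ.get("MODULESHOME")
    search_dirs = [os.path.join(home, "bin"), os.path.join(home, "libexec")] if home else []
    for d in search_dirs:
        for name in CANDIDATES:
            path = os.path.join(d, name)
            if os.path.isfile(path) and os.access(path, os.X_OK):
                return path
    for name in CANDIDATES:  # fall back to searching PATH
        found = shutil.which(name)
        if found:
            return found
    return None
```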

    cross platform 
    opened by micaeljtoliveira 1
  • `payu run` shouldn’t force changes to be tracked in git

    Hi,

    I’m trying to run many climate models as part of a pipeline, and compile the results. These models all need to be run from temporary directories, which are either not in a repository or are in a gitignored directory.

    This gives me the following warning:

    The following paths are ignored by one of your .gitignore files:
    temp_folder
    hint: Use -f if you really want to add them.
    hint: Turn this message off by running
    hint: "git config advice.addIgnoredFile false"

    which then crashes payu because git exits with a non-zero code.

    I couldn’t figure out a way to disable this behaviour with config or with some flag. Is there something I’m missing?
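    A minimal sketch of one possible behaviour, not an existing payu option: run bookkeeping git commands so that a failure produces a warning rather than aborting the run. `run_nonfatal` is a hypothetical helper.

```python
import subprocess
import warnings
from typing import Sequence


def run_nonfatal(cmd: Sequence[str], cwd: str = ".") -> bool:
    """Run a command; on failure, warn and return False instead of raising."""
    try:
        result = subprocess.run(list(cmd), cwd=cwd, capture_output=True, text=True)
    except OSError as err:  # e.g. the executable is not installed
        warnings.warn(f"could not run {cmd[0]}: {err}")
        return False
    if result.returncode != 0:
        warnings.warn(
            f"{' '.join(cmd)} exited with {result.returncode}: {result.stderr.strip()}"
        )
        return False
    return True
```

    For example, `run_nonfatal(["git", "add", "config.yaml"])` would warn and return False, rather than raise, if git refused to stage an ignored path.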

    Thanks, Tali

    docs 
    opened by talidemestre 4
  • Porting to Pawsey

    Wrap ldd in try/except, as executables on Pawsey seem to be statically linked. For the same reason, don't assume LD_LIBRARY_PATH is set.

    Commented out call to load_modules. Pawsey has a lot of default modules that it relies on, so can't reliably monkey with that.

    Removed a couple of the bespoke flags Marshall added to the Slurm scheduler, and now explicitly pass through the PAYU environment variables. These are also set in the current environment, but that didn't seem to make it through to the submitted job.
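    The ldd change described above might look roughly like this sketch: any failure to run ldd (tool missing, or an executable that is not dynamically linked, for which ldd typically exits non-zero) yields an empty mapping, and `LD_LIBRARY_PATH` is read with a default. The function name and parsing details are assumptions, not the code in the pull request.

```python
import os
import subprocess
from typing import Dict


def shared_libraries(executable: str) -> Dict[str, str]:
    """Map library names to resolved paths as reported by ldd, or {} on failure."""
    try:
        out = subprocess.run(["ldd", executable], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return {}
    libs = {}
    for line in out.splitlines():
        # Typical line: "libm.so.6 => /lib64/libm.so.6 (0x...)"
        if "=>" in line:
            name, _, rest = line.strip().partition(" => ")
            path = rest.split(" (", 1)[0].strip()
            if path.startswith("/"):  # skip "not found" entries
                libs[name] = path
    return libs


# Don't assume LD_LIBRARY_PATH is set.
ld_paths = [p for p in os.environ.get("LD_LIBRARY_PATH", "").split(":") if p]
```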

    opened by aidanheerdegen 3
Owner
The Payu Organization
A gathering of the Payu workflow tool and relevant experiments