signac-flow - manage workflows with signac

Glotzer Group

Last update: Oct 14, 2022

Overview

The signac framework helps users manage and scale file-based workflows, facilitating data reuse, sharing, and reproducibility.

The signac-flow tool provides the basic components to set up workflows, from simple to complex, for projects managed by the signac framework. That includes the definition of data pipelines, the execution of data space operations, and the submission of operations to high-performance supercomputers.

Resources

Framework documentation: Examples, tutorials, topic guides, and package Python APIs.
Package documentation: API reference for the signac-flow package.
Slack Chat Support: Get help and ask questions on the signac Slack workspace.
signac website: Framework overview and news.

Installation

The recommended installation method for signac-flow is through conda or pip. The software is tested for Python versions 3.6+ and is built for all major platforms.

To install signac-flow via the conda-forge channel, execute:

conda install -c conda-forge signac-flow

To install signac-flow via pip, execute:

pip install signac-flow

Detailed information about alternative installation methods can be found in the documentation.

Testing

You can test this package by executing

$ python -m pytest tests/

within the repository root directory.

Acknowledgment

When using signac as part of your work towards a publication, we would really appreciate it if you acknowledged signac appropriately. We have prepared examples on how to do that here. Thank you very much!

The signac framework is a NumFOCUS Affiliated Project.

Comments

Implementing a grouping feature to organize flow operations
Description

The goal of this branch is to create a grouping feature that allows operations in a FlowProject to be organized into "metaoperations." Currently, to implement this behavior, either a new FlowProject has to operate on the original set of operations or the user has to resort to commenting to prevent the group's operations from running individually.

An example snippet of the expected functionality shows a potential API and use of the feature in more detail:

a_group = Project.make_group('a')

@Project.operation
@a_group
def foo(job):
    pass

@Project.operation
@a_group
def bar(job):
    pass

To use the group, a command such as submit --group a_group would then output a job for the scheduler whose final command is something akin to python project.py run -j abc123 -o foo bar
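The proposed behavior can be sketched without signac-flow at all. The snippet below is a minimal stand-in and not the branch's implementation: `MiniProject`, its methods, and `run_command` are hypothetical names, used only to show how operations tagged with a group could be collected into a single run command.

```python
from collections import defaultdict


class MiniProject:
    """Hypothetical stand-in for FlowProject; not signac-flow's API."""

    def __init__(self):
        self.operations = {}             # operation name -> function
        self.groups = defaultdict(list)  # group name -> operation names

    def operation(self, func):
        """Register a function as an operation."""
        self.operations[func.__name__] = func
        return func

    def make_group(self, name):
        """Return a decorator that tags an operation as a member of `name`."""
        def tag(func):
            self.groups[name].append(func.__name__)
            return func
        return tag

    def run_command(self, group, job_id):
        """Build one command that runs every operation in the group."""
        names = " ".join(self.groups[group])
        return f"python project.py run -j {job_id} -o {names}"


project = MiniProject()
a_group = project.make_group("a")


@project.operation
@a_group
def foo(job):
    pass


@project.operation
@a_group
def bar(job):
    pass


command = project.run_command("a", "abc123")
print(command)
```

Because the group decorator only records names, the operations stay individually registered and the group simply decides which of them end up in the same command.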

Motivation and Context

This is a feature change that seeks to give users more control over the workflow of their project. The group concept allows operations to be tagged so that they run together in a single job, which works even if their pre- and post-conditions are not met initially but will be once other operations in the group have run.

This abstraction of a workflow is useful in multiple scenarios, such as automatically restricting operations to particular environments, running a job repeatedly in a single submission (#4), and moving workflow logic into the FlowProject rather than the individual operations for more idiomatic code. Another use case would be to mark operations as submit-only (#33). A tangentially related issue is #41, as groups could be considered to create graphs themselves, or groups might make up higher-level graphs.

Types of Changes

[x] Documentation update

[ ] Bug fix

[x] New feature

[x] Breaking change¹

¹The change breaks (or has the potential to break) existing functionality.

Checklist:

[x] I am familiar with the Contributing Guidelines.

[x] I agree with the terms of the Contributor Agreement.

[x] My name is on the list of contributors.

[x] My code follows the code style guideline of this project.

[ ] The changes introduced by this pull request are covered by existing or newly introduced tests.

If necessary:

[ ] I have updated the API documentation as part of the package doc-strings.

[x] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.

[ ] I have updated the changelog.

groups
opened by b-butler 68
Feature/enable aggregate operations
This pull request refactors the current job-operation model into a jobs-operations model; that is, each operation is a function of one or more jobs.

Prior to merging we need to tackle the following items:

[ ] Implement parallelized status update

[ ] Update changelog

[ ] Deprecate JobOperation

[ ] Ensure that the scheduler status is updated prior to submission.

GSoC aggregation
opened by csadorf 25
Adopt pytest as testing framework
Description

Adopting pytest as testing framework. This version of my code outputs an assertion error.

Motivation and Context

After this change, we will use pytest instead of unittest to run the unit tests.

Types of Changes

[ ] Documentation update

[ ] Bug fix

[x] New feature

[ ] Breaking change¹

¹The change breaks (or has the potential to break) existing functionality.

Checklist:

[x] I am familiar with the Contributing Guidelines.

[x] I agree with the terms of the Contributor Agreement.

[ ] My name is on the list of contributors.

[x] My code follows the code style guideline of this project.

[x] The changes introduced by this pull request are covered by existing or newly introduced tests.

If necessary:

[ ] I have updated the API documentation as part of the package doc-strings.

[ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.

[x] I have updated the changelog.

enhancement
opened by kidrahahjo 23
Deprecation Warning for --cmd option in script
Description

Added eligibility check for operations generated when cmd is provided in main_script

Motivation and Context

Fixes #218

Types of Changes

[ ] Documentation update

[ ] Bug fix

[ ] New feature

[x] Breaking change¹

¹The change breaks (or has the potential to break) existing functionality.

Checklist:

[x] I am familiar with the Contributing Guidelines.

[x] I agree with the terms of the Contributor Agreement.

[ ] My name is on the list of contributors.

[x] My code follows the code style guideline of this project.

[ ] The changes introduced by this pull request are covered by existing or newly introduced tests.

If necessary:

[ ] I have updated the API documentation as part of the package doc-strings.

[ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.

[ ] I have updated the changelog.
opened by vishav1771 22
issue #113, create markdown with jinja template in status view
Description

Motivation and Context

see issue #113

Types of Changes

[ ] Documentation update

[ ] Bug fix

[x] New feature

[ ] Breaking change¹

¹The change breaks (or has the potential to break) existing functionality.

Checklist:

[x] I am familiar with the Contributing Guidelines.

[x] I agree with the terms of the Contributor Agreement.

[x] My name is on the list of contributors.

[x] My code follows the code style guideline of this project.

[ ] The changes introduced by this pull request are covered by existing or newly introduced tests.

If necessary:

[x] I have updated the API documentation as part of the package doc-strings.

[ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.

[ ] I have updated the changelog.

enhancement
opened by zhou-pj 22
Implement execution hooks framework
Description

This pull request implements an execution hooks framework that enables users to execute certain functions on start, on finish, on success, and on fail of the execution of an operation.

Resolves #28, resolves #14.

Motivation and Context

The motivation for this framework is to enable users to automatically execute certain functions with operations for example for logging and tracking purposes. For example, to log when a specific operation is executed, a user can provide the following function:

@Project.operation
@Project.hooks.on_start(lambda op: logger.info(f"Executing {op}..."))
def foo(job):
    pass

The framework comes with a selection of pre-implemented hook systems to cover all the currently expected use cases, roughly in order of expensiveness:

LogOperations - Log basic information about the execution of operations to a log file.

TrackOperations - Keep detailed metadata records about the state of the project root directory and the operation, including directives, to a log file, optionally in conjunction with git.

SnapshotProject - Create a snapshot of the project root directory to keep track of the code used for the execution of an operation.

TrackWorkspaceWithGit - The workspace is treated as a git repository and automatically committed to before and after the execution of an operation. This can be done either on a per-job basis or globally for the workspace.

A hook system is meant to describe a collection of hook functions that together achieve a specific purpose. The shipped hook systems are implemented as classes that can be installed project-wide via the respective install_hooks function.

High-level API

Hooks can be installed either on a per-operation or a per-project basis. In this way users have the option to execute hook functions either with specific operations or with all operations of a project.

Furthermore, there are two ways that hooks can be installed. One way is directly in Python, for example within the __main__ clause of the project.py file:

# ...
if __name__ == '__main__':
    from flow.hooks import LogOperations
    LogOperations().install_hooks(Project()).main()

Alternatively, hooks can also be installed through the (project) configuration file:

project = my_project
[flow]
hooks=flow.hooks.LogOperations,flow.hooks.SnapshotProject(compress=True)

It is assumed that the entities provided here are callable, i.e., are either functions or functors. The first argument must be the instance of FlowProject.

Low-level API

The FlowProject class has a hooks attribute with four lists: on_start, on_finish, on_success, and on_fail, which can be appended to. Hook functions installed in this way are executed for all operations.

Furthermore, hook functions can be installed for individual operations either with a decorator:

@Project.operation
@Project.hooks.on_start(my_hook_function)
def op(job):
    pass

or by passing a dictionary to the add_operation() function: project.add_operation(..., hooks=dict(on_start=my_hook_function)).
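To make the four hook points concrete, here is a small standard-library sketch of how such lists could be dispatched around an operation. It is an illustration only: `Hooks` and `run_with_hooks` are hypothetical names, and the relative order of the success, fail, and finish hooks is an assumption rather than something this pull request specifies.

```python
class Hooks:
    """Hypothetical container with the four hook lists named in the text."""

    def __init__(self):
        self.on_start = []
        self.on_finish = []
        self.on_success = []
        self.on_fail = []


def run_with_hooks(hooks, name, func, job):
    """Run func(job), calling each hook list at the matching point."""
    for hook in hooks.on_start:
        hook(name)
    try:
        func(job)
    except Exception:
        for hook in hooks.on_fail:
            hook(name)
        raise
    else:
        for hook in hooks.on_success:
            hook(name)
    finally:
        # Assumed ordering: finish hooks run last, on success and on failure.
        for hook in hooks.on_finish:
            hook(name)


events = []
hooks = Hooks()
hooks.on_start.append(lambda op: events.append(f"start:{op}"))
hooks.on_success.append(lambda op: events.append(f"success:{op}"))
hooks.on_fail.append(lambda op: events.append(f"fail:{op}"))
hooks.on_finish.append(lambda op: events.append(f"finish:{op}"))


def ok(job):
    pass


def broken(job):
    raise RuntimeError("boom")


run_with_hooks(hooks, "ok", ok, job=None)
try:
    run_with_hooks(hooks, "broken", broken, job=None)
except RuntimeError:
    pass

print(events)
```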

Custom hook systems

Users can very easily implement their own hook systems and install them in similar manner. For example:

# myhooks.py
def log_op(operation):
    with open(operation.job.fn('operations.log'), 'a') as logfile:
        logfile.write(f"Executed operation '{operation}'.")

def install(project):
    project.hooks.on_success.append(log_op)
    return project

This could then be installed either in the project module:

# ...
if __name__ == '__main__':
    import myhooks
    myhooks.install(Project()).main()

or via the configuration file:

project = myproject
[flow]
hooks=myhooks.install
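One way a configuration value such as `myhooks.install` could be turned into a call is to import the dotted name and apply it to the project. The sketch below assumes exactly that and nothing more: `resolve` and `install_from_config` are hypothetical helpers, the `myhooks` module is faked in memory so the example runs on its own, and entries with call syntax such as `SnapshotProject(compress=True)` are not handled.

```python
import importlib
import sys
import types


def resolve(dotted_name):
    """Import 'package.module.attr' and return the attribute."""
    module_name, _, attr = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def install_from_config(project, value):
    """Apply every installer in a comma-separated config value to `project`."""
    for name in value.split(","):
        installer = resolve(name.strip())
        project = installer(project)
    return project


# Fake an importable 'myhooks' module for the sake of a runnable example.
myhooks = types.ModuleType("myhooks")


def install(project):
    project["hooks"].append("log_op")
    return project


myhooks.install = install
sys.modules["myhooks"] = myhooks

# A plain dict stands in for the FlowProject instance here.
project = install_from_config({"hooks": []}, "myhooks.install")
print(project)
```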

Types of Changes

[ ] Documentation update

[ ] Bug fix

[x] New feature

[ ] Breaking change¹

¹The change breaks (or has the potential to break) existing functionality.

Checklist:

[x] I am familiar with the Contributing Guidelines.

[x] I agree with the terms of the Contributor Agreement.

[x] My name is on the list of contributors.

[x] My code follows the code style guideline of this project.

[x] The changes introduced by this pull request are covered by existing or newly introduced tests.

If necessary:

[x] I have updated the API documentation as part of the package doc-strings.

[ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.

[x] I have updated the changelog.

enhancement
opened by csadorf 21
show_traceback = on by default

Feature description

Related to #61 and #144, I think it would be good if show_traceback = on by default.

Proposed solution

Change default behavior to enable traceback.

@glotzerlab/signac-developers What are your thoughts about this potential change? The tracebacks are sometimes a little complicated, but I think it's reasonable to expect that Python scripts will show a traceback on error.
enhancement

opened by bdice 19
Add default aggregate support to flow
Description

This pull request adds support for the aggregator class in project.py.

Motivation and Context

Types of Changes

[ ] Documentation update

[ ] Bug fix

[x] New feature

[ ] Breaking change¹

¹The change breaks (or has the potential to break) existing functionality.

Checklist:

[x] I am familiar with the Contributing Guidelines.

[x] I agree with the terms of the Contributor Agreement.

[x] My name is on the list of contributors.

[x] My code follows the code style guideline of this project.

[x] The changes introduced by this pull request are covered by existing or newly introduced tests.

If necessary:

[ ] I have updated the API documentation as part of the package doc-strings.

[ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.

[ ] I have updated the changelog.

GSoC aggregation
opened by kidrahahjo 18
Conditional use of Pool and ThreadPool
We can now use Pool or ThreadPool conditionally.

Description

This pull request is a consequence of #269, which was accidentally closed by me.

Motivation and Context

Fixes #264
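The pull request does not state here which condition selects the pool type. As a purely illustrative sketch, the snippet below assumes one plausible rule: use a process pool when the function can be pickled and fall back to a thread pool otherwise. `choose_pool_class` is a hypothetical helper, not signac-flow code.

```python
import pickle
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def choose_pool_class(func):
    """Return Pool if func can be pickled, else ThreadPool (assumed rule)."""
    try:
        pickle.dumps(func)
    except (pickle.PicklingError, AttributeError, TypeError):
        return ThreadPool
    return Pool


# Lambdas cannot be pickled by the standard pickler, so this one
# is routed to the thread pool.
unpicklable = lambda x: x * x

pool_class = choose_pool_class(unpicklable)
with pool_class(2) as pool:
    results = pool.map(unpicklable, [1, 2, 3])

print(pool_class.__name__, results)
```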

Types of Changes

[ ] Documentation update

[ ] Bug fix

[x] New feature

[ ] Breaking change¹

¹The change breaks (or has the potential to break) existing functionality.

Checklist:

[x] I am familiar with the Contributing Guidelines.

[x] I agree with the terms of the Contributor Agreement.

[x] My name is on the list of contributors.

[x] My code follows the code style guideline of this project.

[ ] The changes introduced by this pull request are covered by existing or newly introduced tests.

If necessary:

[ ] I have updated the API documentation as part of the package doc-strings.

[ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.

[ ] I have updated the changelog.
opened by kidrahahjo 18

CUDA initialized before forking

Description

I am trying to integrate fbpic, a well-known CUDA code (based on Python + Numba) for laser-plasma simulation with signac. The integration repo is signac-driven-fbpic.

I managed to successfully run on a single GPU, via python3 src/project.py run from inside the signac folder, but if I add --parallel I get

numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking

The goal is to get 8 (independent) copies of fbpic (with different input params) running in parallel on the 8 NVIDIA P100 GPUs that are on the same machine.

To reproduce

Clone the signac-driven-fbpic repo and follow the install instructions. Then go to the signac subfolder, and do

conda activate signac-driven-fbpic
python3 src/init.py
python3 src/project.py run --parallel

Error output

(signac-driven-fbpic) andrei@ServerS:~/Development/signac-driven-fbpic/signac$ python3 src/project.py run --parallel --show-traceback
Using environment configuration: UnknownEnvironment
Serialize tasks|##############################################################################################|100%
ERROR: Encountered error during program execution: 'CUDA initialized before forking'
Execute with '--show-traceback' or '--debug' to get more information.
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 2727, in _fork_with_serialization
    project._fork(project._loads_op(operation))
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 1467, in _fork
    self._operation_functions[operation.name](operation.job)
  File "src/project.py", line 172, in run_fbpic
    verbose_level=2,
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/fbpic/main.py", line 232, in __init__
    n_guard, n_damp, None, exchange_period, use_all_mpi_ranks )
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/fbpic/boundaries/boundary_communicator.py", line 267, in __init__
    self.d_left_damp = cuda.to_device( self.left_damp )
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/devices.py", line 212, in _require_cuda_context
    return fn(*args, **kws)
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/api.py", line 103, in to_device
    to, new = devicearray.auto_device(obj, stream=stream, copy=copy)
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/devicearray.py", line 683, in auto_device
    devobj = from_array_like(obj, stream=stream)
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/devicearray.py", line 621, in from_array_like
    writeback=ary, stream=stream, gpu_data=gpu_data)
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/devicearray.py", line 102, in __init__
    gpu_data = devices.get_context().memalloc(self.alloc_size)
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 697, in memalloc
    self._attempt_allocation(allocator)
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 680, in _attempt_allocation
    allocator()
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 695, in allocator
    driver.cuMemAlloc(byref(ptr), bytesize)
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 290, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 324, in _check_error
    raise CudaDriverError("CUDA initialized before forking")
numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "src/project.py", line 238, in <module>
    Project().main()
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 2721, in main
    _exit_or_raise()
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 2689, in main
    args.func(args)
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 2414, in _main_run
    run()
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/legacy.py", line 193, in wrapper
    return func(self, jobs=jobs, names=names, *args, **kwargs)
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 1597, in run
    np=np, timeout=timeout, progress=progress)
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 1421, in run_operations
    pool, cloudpickle, operations, progress, timeout)
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/site-packages/flow/project.py", line 1458, in _run_operations_in_parallel
    result.get(timeout=timeout)
  File "/home/andrei/anaconda3/envs/signac-driven-fbpic/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking

Relevant numba link.
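The error message indicates that CUDA was initialized in the parent process before the workers were forked. A commonly used way around this class of problem, sketched here with the standard library only, is to start workers with the `spawn` method so that each one begins in a fresh interpreter. Whether `--parallel` in signac-flow can be switched to this start method is not established in this report; `make_spawn_pool` is a hypothetical helper.

```python
import multiprocessing


def make_spawn_pool(processes):
    """Create a pool whose workers are started with 'spawn' instead of 'fork'."""
    context = multiprocessing.get_context("spawn")
    return context.Pool(processes)


# The context object reports which start method its pools will use.
spawn_context = multiprocessing.get_context("spawn")
print(spawn_context.get_start_method())
```

With `spawn`, the functions handed to the pool must be importable from a module, and the pool should be created under an `if __name__ == '__main__':` guard.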

System configuration

Operating System: Ubuntu 16.04
Version of Python: 3.6.8
Version of signac: 1.1.0
Version of signac-flow: 0.7.1
NVIDIA Driver Version: 410.72

enhancement expertise needed

opened by berceanu 18

Add a class for Directives
By - @b-butler

Adds two classes Directives and DirectivesItem that serve as a smart mapping for the environment and user-specified directives and a specification for environmental directives respectively. FlowProject and FlowGroup have been changed accordingly.

Description

Motivation and Context

This resolves #265 and helps to centralize logic for directives. This pull request is a necessary follow-up for #282. Also resolves #240.

Types of Changes

[ ] Documentation update

[ ] Bug fix

[x] New feature

[ ] Breaking change¹

¹The change breaks (or has the potential to break) existing functionality.

Tasks to accomplish:

[x] Test the individual DirectivesItems.

[x] Test the Directives class.

[x] Determine and add environment specific DirectivesItem and correct their get_default_directives function (this at least at the decision level will likely involve a discussion between multiple people).

[x] Some probably large degree of code refactoring (I was just initially trying to get the outline working and there).

[ ] Opening a PR in signac-docs to update documentation.

[x] Going through docstrings and ensuring they are complete, grammatically correct, and helpful. Multiple methods will need these as well.

[x] Add tracking of user specified directives. Previously TrackGetItemDict found in flow/util/misc.py was used. We can use this internally or create our own fix.

[x] Fix code so tests pass

Checklist:

[x] I am familiar with the Contributing Guidelines.

[x] I agree with the terms of the Contributor Agreement.

[x] My name is on the list of contributors.

[x] My code follows the code style guideline of this project.

[ ] The changes introduced by this pull request are covered by existing or newly introduced tests.

If necessary:

[ ] I have updated the API documentation as part of the package doc-strings.

[ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.

[x] I have updated the changelog.

enhancement GSoC directives
opened by kidrahahjo 17
Move to a fully pyproject.toml based build
Description

This PR removes setup.py and setup.cfg entirely, migrating all project and build configuration information into pyproject.toml. In the process, all linter configs have also been moved into pyproject.toml. The exception is flake8, which does not (and will not) support pyproject.toml, so the flake8 configuration is now stored in the .flake8 file, which is specific to this linter. Additionally, bump2version also does not support pyproject.toml (although unlike flake8 the proposal has not been entirely rejected, so it may eventually), so that configuration has also been moved to a project-specific .bumpversion file.

Motivation and Context

Various changes to Python packaging over the last 6 or 7 years have moved towards more static packaging and towards storing data in a backend-agnostic format. These changes allow the use of setuptools alternatives (like flit) as well as more reproducible builds based on build isolation in virtual environments that provide all necessary build dependencies. Direct invocation of setup.py has been deprecated in the process. The changes in this PR modernize flow's build system for compatibility with these new approaches.

Checklist:

[x] I am familiar with the Contributing Guidelines.

[x] I agree with the terms of the Contributor Agreement.

[x] My name is on the list of contributors.

[x] The changes introduced by this pull request are covered by existing or newly introduced tests.

[ ] The package documentation and framework documentation in signac-docs are up to date with these changes.

[ ] I have updated the changelog and added any related issue and pull request numbers for future reference.
opened by vyasr 1
Allow setting account via flag when submitting
Feature description

Discussed with @joaander today.

I would like to be able to set the account on the command line when submitting jobs through flow.

I'm setting it via environment variable, so signac-flow gives me this warning. I would like to be able to silence this warning.

Environment 'DeltaEnvironment' allows the specification of an account name that will be charged for jobs' compute time. Set the account name with the command:

$ signac config --global set 'flow.DeltaEnvironment.account' ACCOUNT_NAME

Proposed solution

This may be a good first issue.

Additional context

Other ways to set account:

environment variable

flow config

custom template
opened by cbkerr 0
fix: Multi-node GPU submissions for greatlakes and picotte.
Description

Fixes the logic so that --ntasks-per-node is normalized by the number of nodes for GPU submissions, where the number of tasks is often the number of GPUs.
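The normalization amounts to dividing the total task count by the node count. The function below is a hypothetical illustration of that arithmetic, not the template logic changed in this pull request; rounding up is an assumption made so that every task has a slot.

```python
import math


def tasks_per_node(ntasks, nodes):
    """Spread ntasks over nodes, rounding up so every task has a slot."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return math.ceil(ntasks / nodes)


# 8 GPU tasks across 2 nodes should request 4 tasks per node, not 8.
print(tasks_per_node(8, 2))
```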

Motivation and Context

Resolved: #566

Checklist:

[x] I am familiar with the Contributing Guidelines.

[x] I agree with the terms of the Contributor Agreement.

[x] My name is on the list of contributors.

[ ] The changes introduced by this pull request are covered by existing or newly introduced tests.

[x] The package documentation and framework documentation in signac-docs are up to date with these changes.

[ ] I have updated the changelog and added any related issue and pull request numbers for future reference.
opened by b-butler 3
Change Conditions Execution Order
Description

Currently we execute operation decorators from the outside in, or top to bottom. This is confusing when using decorators as functions directly.

@FlowProject.operation
def op(job):
    pass

FlowProject.pre(expensive_cond)(FlowProject.pre.true("foo")(op))

This actually runs the expensive computation first. This is to make this

@FlowProject.pre.true("foo")
@FlowProject.pre(expensive_cond)
@FlowProject.operation
def op(job):
    pass

more intuitive. However, I disagree that this is more intuitive. In fact, as someone learns Python this becomes less and less intuitive, to the point that, without our documentation suggesting the correct ordering, a Python expert would write,

@FlowProject.pre(expensive_cond)
@FlowProject.pre.true("foo")
@FlowProject.operation
def op(job):
    pass

Given our recent decorator ordering requirements we are already making users come to understand decorators apply bottom to top.

Suggestion

I think we should apply conditions in the order they come in the execution (i.e. bottom to top). As an irrelevant side note this would make the project definition run faster.
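The bottom-to-top application order of Python decorators can be checked with a few lines of plain Python. The `pre` function here is a hypothetical stand-in that only records when it is applied; it is not `FlowProject.pre`.

```python
calls = []


def pre(label):
    """Hypothetical condition decorator that records when it is applied."""
    def decorator(func):
        calls.append(label)
        return func
    return decorator


@pre("expensive_cond")
@pre("foo")
def op(job):
    pass


# The decorator closest to the function is applied first.
print(calls)
```

Registering conditions in this application order would place the cheap "foo" check ahead of the expensive one when the decorators are stacked as in the last example above.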
opened by b-butler 1
Add GitHub Actions.
Description

Migrates signac-flow's CI to use GitHub Actions.

This PR drops coverage for a few things that I think are non-essential:

Nightly/weekly testing of pip install signac signac-flow and conda install signac signac-flow

Testing against the latest commit of signac (which is currently disabled because it's broken until we release 2.0)

Checking Zenodo metadata on release/.* branches.

If another contributor wishes to add these to GitHub Actions, that would be fine.

Before merging, a repo administrator will need to update the CI checks used for branch protections.

Checklist:

[x] I am familiar with the Contributing Guidelines.

[x] I agree with the terms of the Contributor Agreement.

[x] My name is on the list of contributors.

[x] The changes introduced by this pull request are covered by existing or newly introduced tests.

[ ] The package documentation and framework documentation in signac-docs are up to date with these changes.

[ ] I have updated the changelog and added any related issue and pull request numbers for future reference.
opened by bdice 1
Added the no-progress flag for project.py status to hide the progress…
Description

Added the no-progress flag for FlowProject.print_status. It hides the progress bar output generated when FlowProject.print_status is called, giving users the option to hide the progress bar when desired.
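As a rough sketch of how such a flag can be wired from the command line to a status function, the snippet below uses argparse. The parser, the `print_status` stand-in, and its `no_progress` parameter are hypothetical and only mirror the behavior described here; they are not the code added by this pull request.

```python
import argparse


def build_parser():
    """Build a parser with a boolean flag that switches the progress bar off."""
    parser = argparse.ArgumentParser(prog="project.py status")
    parser.add_argument(
        "--no-progress",
        action="store_true",
        help="Do not show a progress bar while the status is collected.",
    )
    return parser


def print_status(jobs, no_progress=False):
    """Return the lines a status call would print (hypothetical stand-in)."""
    lines = []
    if not no_progress:
        lines.append(f"Collecting status: {len(jobs)}/{len(jobs)}")
    lines.extend(jobs)
    return lines


args = build_parser().parse_args(["--no-progress"])
status_lines = print_status(["abc123", "def456"], no_progress=args.no_progress)
print("\n".join(status_lines))
```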

Motivation and Context

The progress bar makes up a significant part of the output for small workspaces and provides no benefit there. Additionally, using FlowProject.print_status in Jupyter notebooks requires special configuration, or else it does not work. This addresses issue #602.

Checklist:

[x] I am familiar with the Contributing Guidelines.

[x] I agree with the terms of the Contributor Agreement.

[x] My name is on the list of contributors.

[x] The changes introduced by this pull request are covered by existing or newly introduced tests.

[x] The package documentation and framework documentation in signac-docs are up to date with these changes.

[x] I have updated the changelog and added any related issue and pull request numbers for future reference.
opened by iblanco11981870 1

Releases(v0.23.0)

v0.23.0(Dec 9, 2022)
Version 0.23.0

2022-12-09

Added

Official Python 3.11 support (#697).

The flow.FlowProject.operation decorator now has an aggregator keyword argument: @FlowProject.operation(aggregator=aggregator.groupsof(2)) (#681).

The FlowGroupEntry class can be called with a directives keyword argument: FlowGroupEntry(directives={...}) (#696).
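To illustrate what grouping jobs in twos means for the `aggregator.groupsof(2)` argument mentioned above, here is a standard-library sketch. `groups_of` is a hypothetical helper and not the flow implementation; it assumes the last group is simply shorter when the number of jobs is not divisible by the group size.

```python
from itertools import islice


def groups_of(items, size):
    """Yield consecutive tuples of up to `size` items (last may be shorter)."""
    iterator = iter(items)
    while True:
        chunk = tuple(islice(iterator, size))
        if not chunk:
            return
        yield chunk


job_ids = ["a1", "b2", "c3", "d4", "e5"]
aggregates = list(groups_of(job_ids, 2))
print(aggregates)
```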

Changed

Deprecated using flow.aggregate.aggregator as a decorator.

Deprecated placing @FlowProject.pre and @FlowProject.post before the FlowProject.operation decorator (#690).

Require signac version 1.8.0 (#693).

Deprecated alias CLI argument to flow init (#693).

Algorithm for computing cluster job ids (#695).

Deprecated FlowGroupEntry.with_directives in favor of a directives keyword argument in FlowGroupEntry() (#696).

Fixed

Detecting correct environment on Delta GPU nodes (#682).

Identical aggregates are used only once in submission and running (#694, #688).

Removed

show_traceback from CLI and config (#690).

Formatting the output of a FlowCmdOperation (#686).

@flow.cmd and flow.with_job (#686, #669).

@FlowProject.operation.with_directives (#686).

The flow.testing module (#691, #692).

v0.22.0(Oct 14, 2022)
Version 0.22

[0.22.0] -- 2022-10-14

Added

Support for formatting with operation function arguments for FlowCmdOperation (#666, #630).

The CLI status command can show document parameters by using flag -p doc.PARAM (#667).

FlowProject.operation now has cmd, with_job, and directives keyword only arguments (#679, #655, #669).

Changed

Deprecated formatting the output of a FlowCmdOperation (#666, #630).

@flow.cmd and flow.with_job are deprecated (#679, #669, #665).

@FlowProject.operation.with_directives is deprecated (#679, #665).

Deprecated the --show-traceback option for flow's CLI run and submit commands (#674, #169).

flow CLI run and submit show tracebacks by default (#674, #169).

Broke TestBidict and TestTemplateFilters into smaller and simpler functions (#626).

v0.21.0(Aug 18, 2022)
Version 0.21

[0.21.0] -- 2022-08-18

Added

XSEDE Delta environment and template (#658).

Changed

Changed get_config_value template filter to error without default on missing key (#649).

Changed get_config_value template filter to take a FlowProject as its first argument (#649).

Removed

Removed require_config_value template filter (#649).

Removed configuration key 'flow.import_packaged_environments' (#653).

Removed configuration key 'flow.environment_modules' (#651).

v0.20.0(Jun 23, 2022)
Version 0.20

[0.20.0] -- 2022-06-23

Added

Added support to run aggregate operations in parallel (#642, #644).

Added an argument, run_options, to FlowProject.make_group which allows passing options to exec for operations running in a different process (#650).

Changed

Deprecated configuration key 'flow.environment_modules' (#651).

Deprecated configuration key 'flow.import_packaged_environments' (#651).

Changed argument options to submit_options in FlowProject.make_group (#650).

Removed

Dropped support for cloudpickle versions below 1.6.0 (#644).

Removed upper bound on python_requires (#654).

v0.19.0(Apr 8, 2022)
Version 0.19

This release changes the names of hook triggers and improves the behavior of progress bars in Jupyter notebooks. The minimum supported Python version is now 3.8. We also welcome the first released contributions from @rohanbabbar04!

[0.19.0] -- 2022-04-07

Changed

Dropped support for tqdm versions older than 4.60.0 (#614).

Renamed hook triggers on_finish to on_exit and on_fail to on_exception (#612, #627).
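
For code that keeps hook callbacks in a dictionary keyed by trigger name, the rename above could be applied with a small migration helper. This is only a sketch: rename_hook_triggers and the dictionary layout are hypothetical, and only the two renames listed above are taken from the changelog.

```python
# Old trigger name -> new trigger name, as listed in the changelog entry above.
RENAMED_TRIGGERS = {"on_finish": "on_exit", "on_fail": "on_exception"}


def rename_hook_triggers(hooks):
    """Return a copy of a {trigger_name: callback} mapping using the new names.

    Hypothetical helper; raises if an old and a new name would collide.
    """
    renamed = {}
    for trigger, callback in hooks.items():
        new_name = RENAMED_TRIGGERS.get(trigger, trigger)
        if new_name in renamed:
            raise ValueError(f"duplicate hook trigger after renaming: {new_name}")
        renamed[new_name] = callback
    return renamed


def log_event(*args):
    """Placeholder callback for the example."""
    return args


old_hooks = {"on_finish": log_event, "on_fail": log_event}
new_hooks = rename_hook_triggers(old_hooks)
```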

Fixed

Progress bars shown in notebooks fall back to text-based output if ipywidgets is not available (#602, #614).

Removed

Internal utility functions have been removed from the public API (#615).

Dropped support for Python 3.6 and Python 3.7 following the recommended support schedules of NEP 29 (#628).

Dropped support for Jinja2 versions below 3.0.0 (#628).

v0.18.1(Feb 14, 2022)
Version 0.18.1

Hey signac users, we fixed some bugs to make signac-flow work better for you. Happy Valentine's Day from the signac team! 🌹

[0.18.1] -- 2022-02-14

Fixed

Fixed bug in project status output when no operations are eligible (#607, #609).

Improved traceback handling for errors in signac-flow (#608).

v0.18.0(Feb 4, 2022)
Version 0.18

[0.18.0] -- 2022-02-03

Added

Feature to install execution hooks for the automated execution of functions with operations (#28, #189, #508).

Changed

Add user defined FlowGroups to status output (#547, #593).

Raise UserOperationError for failed execution of FlowProject operations (#571, #601).

Fixed

Fix issue with GPU submission on Bridges-2 (#588, #589).

v0.17.0(Nov 16, 2021)
This release adds Python 3.10 support and addresses a couple of bugs. Thanks to all the contributors in this release! :art:

Version 0.17

[0.17.0] -- 2021-11-15

Added

Add official support for Python version 3.10 (#578).

Fixed

Scripts are now generated correctly when project path contains spaces and special characters (#572).

XSEDE Expanse template has been fixed to remove leading spaces (#575, #576).

Changed

FlowProject configuration is now validated independently of signac Project configuration (#573).

jsonschema is now a dependency (#573).

v0.16.0(Aug 20, 2021)
This release adds support for the XSEDE Expanse Cluster, simplifies user customization of templates, improves documentation, and fixes minor bugs. Thanks to everyone who contributed :art:

Added

The --job-output command line flag for submission can be set for SLURM, PBS, and LSF schedulers (#550).

Added a custom_content block to templates for user customization (#553, #520).

Added official support for XSEDE Expanse cluster (#537).

Added FlowProjectDefinitionError for exceptions involving FlowProject definitions (#549).

Changed

Improved documentation of directives (#554).

Raise FlowProjectDefinitionError error for inaccurate FlowProject definitions (#549).

Removed

Removed deprecated environment classes (#556).

Removed support for signac < 1.3.0 (#558).

Removed support for decommissioned XSEDE Comet cluster (#537).

Removed --memory option from University of Michigan Great Lakes cluster submission. Use directives instead (#563).

v0.15.0(Jun 24, 2021)
This release adds the ability for operations to operate on subsets of jobs known as aggregates and improves submissions to HPC clusters. Thanks to @klywang as a first-time contributor for this release. :art:

Added

Add support for aggregation (operations acting on multiple jobs) via flow.aggregator (#464, #516, #542).

Add official support for Andes cluster (#500).

Add decorator FlowProject.operation.with_directives for setting directives while registering an operation function (#309, #502).

Add new flow command flow template create for automatic creation of custom templates (#520, #534).

Changed

Jinja templates are indented for easier reading (#461, #495).

flow.directives is deprecated in favor of flow.FlowProject.operation.with_directives (#309, #502).

All environments require a scheduler in order to submit, even in pretend mode (#533).

Submitting in pretend mode will show additional scheduler command information (#533).

Support fractional timeout value in Python and command line interface (#541).

Fixed

Errors raised during submission were not being shown to users (#517, #518).

Fixed dependency flag for SLURM submissions (#530, #531).

v0.14.0(Apr 27, 2021)
This release includes improvements to status output, documentation on directives, and fixes for regressions in version 0.13. Thanks to @Charlottez112 as a first-time contributor and the other 4 people who contributed code for this release. :art:

Added

Documentation for all directives (#480).

Defined validators for the fork directive (#480).

Submission summary now appears in FlowProject status output, showing the number of queued, running, and unknown statuses (#472, #488).

Status overview now shows the number of jobs with incomplete operations and totals for the label overviews (#481, #501).

Changed

Renamed TorqueEnvironment and related classes to PBSEnvironment (#388, #491).

LSF and SLURM schedulers will appear to be present if the respective commands bjobs -V or sbatch --version exit with a non-zero error code (#498).

Only known JobStatus values will be written to the project document, to save space and writing time (#503).

Fixed

Strictly enforce that operation functions cannot be used as condition functions (and vice-versa) and prevent the registration of two operations with the same name (#496).

Changed default value of status_parallelization to none, to avoid bugs in user code caused by thread parallelism and overhead in process parallelism (#486).

Memory directives are converted to an integer number of gigabytes or megabytes in submission scripts (#482, #484).

Fixed behavior of --only-incomplete-operations (#481, #501).
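
To make the memory conversion mentioned above concrete, here is a minimal sketch of turning a possibly fractional gigabyte value into an integer amount. It is not the package's implementation: format_memory is hypothetical, and the "G"/"M" suffixes, the factor of 1024, and the rounding are all assumptions chosen for illustration.

```python
def format_memory(gigabytes):
    """Format a memory request as a whole number of gigabytes or megabytes.

    Hypothetical helper: whole-gigabyte values keep the "G" unit, and anything
    fractional is expressed in megabytes (assuming 1 GB = 1024 MB).
    """
    if gigabytes <= 0:
        raise ValueError("memory request must be positive")
    if float(gigabytes).is_integer():
        return f"{int(gigabytes)}G"
    return f"{round(gigabytes * 1024)}M"


whole = format_memory(4)
fractional = format_memory(0.5)
```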

Removed

Removed FlowProject.add_operation (#479, #487).

Removed deprecated --walltime argument (#478).

Removed deprecated flow.run interface (#478).

Removed deprecated FlowProject.export_job_statuses (#478).

Removed deprecated script feature, use submit --pretend instead (#478).

Removed deprecated CPUEnvironment, GPUEnvironment classes (#478).

v0.13.0(Mar 17, 2021)
This release adds support for the new Bridges-2 cluster, expands the use of directives to include memory and walltime requests, removes deprecated features, and fixes bugs. Thanks to the 7 people who contributed code to this release, including first-time contributors @berceanu and @adgnabro!

Added

Add official support for Bridges-2 cluster (#441).

Add support for memory requests via directives (#258, #466).

Add support for walltime requests via directives, deprecated --walltime argument to submit (#240, #476).

Fixed

Support for multi-line @flow.cmd operations (#451, #453).

FlowProject status shows labels and correct number of jobs for projects with zero operations (#454, #460).

Removed

Removed public API of deprecated class JobOperation (#445).

Removed public API of deprecated methods eligible and complete of BaseFlowOperation and FlowGroup (#445).

Removed configuration option use_buffered_mode (#445).

Removed public API of script, next_operations and submit_operations of FlowProject (#445).

Removed support for decommissioned Bridges cluster (#441).

Removed support for memory command line argument in submit (#466).

v0.12.0(Jan 30, 2021)
This release includes a wide range of performance improvements and internal refactoring that will enable the addition of an "aggregation" feature in subsequent releases (not yet available). Performance of a sample workflow that checks status, runs, and submits a FlowProject with 1000 jobs, 3 operations, and 2 label functions has improved roughly 4x compared to the 0.11.0 release.

Added

Code is formatted with black and isort pre-commit hooks (#365).

Add official support for Python version 3.9 (#365).

Documentation has been added for all public classes and methods (#387, #389).

Added internal support for aggregates of jobs (#334, #348, #351, #364, #383, #390, #415, #422, #430).

Added code coverage to continuous integration (#405).

Changed

Command line interface always uses --job-id instead of --jobid (#363, #386).

CPUEnvironment and GPUEnvironment classes are deprecated (#381).

Docstrings are now written in numpydoc style (#392).

Default environment for the University of Minnesota Mangi cluster changed from SLURM to Torque (#393).

Run commands are evaluated lazily (#70, #396).

Deprecated method export_job_statuses (#402).

Improved internal caching of scheduler status (#410).

Refactored status fetching code (#368, #417).

Optimization: Directives are no longer deep-copied (#420, #421).

The use_buffered_mode config option is deprecated. Buffering is always internally enabled (#425).

Evaluate directives when called instead of when defined (#398, #402).

Various internal refactorings and optimizations (#371, #373, #374, #375, #376, #377, #378, #379, #380, #400, #410, #416, #423, #426).

Scheduler is now an abstract base class (#426).

flow.scheduling.fakescheduler has been renamed to flow.scheduling.fake_scheduler (#426).

Arguments to submit have been changed for all scheduler classes (#426).

Python 3.6 is only tested with oldest dependencies (#436).

Drop support for tqdm versions older than 4.48.1 (#436, #440).

Drop support for Jinja2 versions older than 2.10.0 (#436).
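
Evaluating directives when called rather than when defined, as listed above, can be sketched as resolving callable values only once a job is available. This is an illustration rather than signac-flow's code: evaluate_directives is hypothetical, and a plain dictionary stands in for a job.

```python
def evaluate_directives(directives, job):
    """Resolve directive values for one job at call time (hypothetical helper).

    Callable values are invoked with the job; all other values pass through.
    """
    return {
        key: value(job) if callable(value) else value
        for key, value in directives.items()
    }


# Directives are defined up front, but the callable is not run yet.
directives = {"nranks": lambda job: job["n"] * 2, "walltime": 1.5}

# Evaluation happens only when a concrete job is supplied.
resolved = evaluate_directives(directives, {"n": 4})
```

Deferring the call means a directive can depend on per-job data that does not exist at definition time.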

Fixed

Ensure that directives are always evaluated before running or submitting (#408, #409).

Cache the fully qualified domain name during environment detection to fix a performance issue on macOS (#339, #394).

Ensure that next CLI command displays eligible jobs for the exact operation name provided (#443).

Removed

Removed the deprecated method flow.util.misc.write_human_readable_statepoints (#397).

Removed the deprecated argument --no-parallelize (#424).

Removed the deprecated env argument from submission methods (#424).

flow.render_status.Renderer class has been removed. FlowProject.print_status no longer returns the renderer (#426).

Removed deprecated status.py module (#426).

Removed the --test argument from FlowProject.submit (#439).

v0.11.0(Oct 9, 2020)
Added

Added classes _Directives and _Directive that serve as a smart mapping for directives specified by the environment or user (#265, #283).

Added support for pre-commit hooks (#333).

Add environment profile for University of Minnesota, Minnesota Supercomputing Institute, Mangi supercomputer (#353).

Changed

Make FlowCondition class private (#307, #315).

Deprecate JobOperation class, make SubmissionJobOperation a private class and deprecate the following methods of FlowProject: script, run_operations, submit_operations, next_operations (#313).

Deprecate the following methods: FlowGroup.eligible, FlowGroup.complete, BaseFlowOperation.eligible, BaseFlowOperation.complete (#337).

Fixed

Serial execution on Summit correctly counts total node requirements (#342).

Fixed performance regression in job submission in large workspaces (#354).

Removed

Drop support for Python 3.5 (#305). The signac project will follow the NEP 29 deprecation policy going forward.

Remove the deprecated methods always, make_bundles, and JobOperation.get_id (#312).

v0.10.1(Aug 21, 2020)
Fixed

Fix issue with the submission of bundled operations on cluster environments that do not allow slashes ('/') in cluster scheduler job names (#343).
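
One way to picture this kind of fix is replacing slashes in a generated job name before handing it to the scheduler. The sketch below is an assumption about the general approach, not the actual change: sanitize_job_name, the default replacement character, and the sample name are hypothetical.

```python
def sanitize_job_name(name, separator="-"):
    """Replace slashes in a scheduler job name (hypothetical helper).

    The separator is an assumed default; any character the scheduler accepts
    would work as long as it is not a slash.
    """
    if "/" in separator:
        raise ValueError("separator must not contain a slash")
    return name.replace("/", separator)


# Hypothetical bundled job name containing slashes.
safe_name = sanitize_job_name("Project/bundle/abc123")
```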

v0.10.0(Jun 27, 2020)
Added

Add FlowGroup (one or more operations can be grouped within an execution environment) (#114).

Add official support for University of Michigan Great Lakes cluster (#185).

Add official support for Bridges AI cluster (#222).

Add IgnoreConditions option for submit(), run() and script() (#38).

Add pytest support for Testing Framework (#227, #232).

Add markdown and html format support for print_status() (#113, #163).

Add memory flag option for default Slurm scheduler (#256).

Add optional environment variable to specify submission script separator (#262).

Add status_parallelization configuration to specify the parallelization used for fetching status (#264, #271).

Changed

Raises ValueError when an operation function is passed to FlowProject.pre() and FlowProject.post(), or a non-operation function is passed to FlowProject.pre.after() (#248, #249).

The option to provide the env argument to submit and submit_operations has been deprecated (#245).

The command line option --cmd for script has been deprecated and will trigger a DeprecationWarning upon use until removed (#243, #218).

Raises ValueError when --job-name is passed by the user because that interferes with status checking (#164, #241).

Submitting with --memory no longer assumes a unit of gigabytes on Bridges and Comet clusters (#257).

Buffering is enabled by default, improving the performance of status checks (#273).

Deprecate the use of no_parallelize argument while printing status (#264, #271).

Submission via the command-line interface now calls the FlowProject.submit function instead of bypassing it for FlowProject.submit_operations (#238, #286).

Updated Great Lakes GPU request syntax (#299).

Fixed

Ensure that label names are used when displaying status (#263).

Fix node counting for large resource sets on Summit (#294).

Removed

Removed ENVIRONMENT global variable in the flow.environment module (#245).

Removed vendored tqdm module and replaced it with a requirement (#247).

v0.9.0(Jan 9, 2020)
Added

Add official support for Python version 3.8 (#190, #210).

Add descriptive error message when tag is not set and cannot be autogenerated for conditions (#195).

Add "fork" directive to enforce the execution of operations within a subprocess (#159).

Operation graph detection based on function comparison (#178).

Exceptions raised during operations always show tracebacks of user code (#169, #171).

Changed

Raise a warning when a condition's tag is not set and raise an error if this occurs during graph detection (#195).

Raise errors if a forked process or @cmd operation returns a non-zero exit code (#170, #172).

Removed

Drop support for Python version 2.7 (#157, #158, #201).

The "always" condition has been deprecated and will trigger a DeprecationWarning upon use until removed (#179).

Removed deprecated UnknownEnvironment in favor of StandardEnvironment (#204).

Removed support for decommissioned INCITE Titan and Eos computers (#204).

Removed support for the legacy Python-based submission script generation (#200).

Removed legacy compatibility layers for Python 2, signac < 1.0, and soft dependencies (#205).

Removed deprecated support for implied operation names with the run command (#205).

v0.8.0(Sep 1, 2019)
Added

Add feature for integrated profiling of status updates (status --profile) to aid with the optimization of a FlowProject implementation (#107, #110).

The status view is generated with Jinja2 templates and thus more easily customizable (#67, #111).

Automatically show an overview of the number of eligible jobs for each operation in status view (#134).

Allow the provision of multiple operation-functions to the pre.after and *.copy_from conditions (#120).

Add option to specify the operation execution order (#121).

Add a testing module to easily initialize a test project (#130).

Enable option to always show the full traceback with show_traceback = on within the [flow] section of the signac configuration (#61, #144).

Add full launcher support for job submission on XSEDE Stampede2 for large parallel single processor jobs (#85, #91).

Fixes

Both the nranks and omp_num_threads directives properly support callables (#118).

Show submission error messages in combination with a TORQUE scheduler (#103, #104).

Fix issue that caused the "Fetching operation status" progress bar to be inaccurate (#108).

Fix erroneous line in the torque submission template (#126).

Ensure default parameter range detection in status printing succeeds for nested state points (#154).

Fix issue with the resource set calculation on INCITE Summit (#101).

Changed

Packaged environments are now available by default. Set import_packaged_environments = off within the [flow] section of the signac configuration to revert to previous behavior.

The following methods of the FlowProject class have been deprecated and will trigger a DeprecationWarning upon use until their removal:

classify (use labels() instead)

next_operation (use next_operations() instead)

export_job_stati (replaced by export_job_statuses)

eligible_for_submission (removed without replacement)

update_aliases (removed without replacement)

The support for Python version 2.7 is deprecated.

Removed

The support for Python version 3.4 has been dropped.

Support for signac version 0.9 has been dropped.

v0.7.1(Mar 25, 2019)
Added

Add function to automatically print all varying state point parameters in the detailed status view triggered by providing option -p/--parameters without arguments (#19, #87).

Add clear environment notification when submitting job scripts (#43, #88).

Fixes

Fix issue where the scheduler status of job-operations would not be properly updated for ineligible operations (#96).

Fixes (compute environments)

Fix issue with the TORQUE scheduler that occurred when there was no job scheduled at all on the system (for any user) (#92, #93).

Changed

The performance of status updates has been significantly improved (up to a factor of 1,000 for large data spaces) by applying a more efficient caching strategy (#94).

v0.7.0(Mar 14, 2019)
Added

Add legend explaining the scheduler-related symbols to the detailed status view (#68).

Allow the specification of the number of tasks per resource set and additional jsrun arguments for Summit scripts.

Fixes (general)

Fixes issue where callable cmd-directives were not evaluated (#47).

Fixes issue where the source file of wrapped functions was not determined correctly (#55).

Fix a Python 2.7 incompatibility and another unrelated issue with the TORQUE scheduler driver (#54, #81).

Fixes issue where providing the wrong argument type to Project.submit() would go undetected and lead to unexpected behavior (#58).

Fixes issue where using the buffered mode would lead to confusing error messages when condition-functions would raise an AttributeError exception.

Fixes issue with erroneous unused-directive-keys-warning.

Fixes (compute environments)

Fixes issues with the Summit environment resource set calculation for parallel operations under specific conditions (#63).

Fix the node size specified in the template for the ORNL Eos system (#77).

Fixes issue with a missing --gres directive when using the GPU-shared partition on the XSEDE Bridges system (#59).

Fixed University of Michigan Flux hostname pattern to ignore the Flux Hadoop cluster (#82).

Remove the Ascent environment (host decommissioned).

Note: The official support for Python 3.4 will be dropped beginning with version 0.8.0.