Related Issues:
If agreed, this issue should supersede:
- https://github.com/cylc/cylc-flow/issues/4657
- https://github.com/cylc/cylc-flow/issues/4653
After a long chat with @dpmatthews (who proposed yet another triggering approach 😁) I think we can generalise the trigger problem into two dimensions:
- Continue (yes/no).
  - After I trigger the task, will the flow continue from that point immediately?
  - Or does it only continue if/when a flow front catches up with it?
  - I.e. should the triggered tasks spawn children on completion or after "merge"?
- Overrun (yes/no).
  - Should the "merge" [1] condition be based on the pool or the DB?
  - I.e. should triggered tasks overrun previous runs of tasks?
  - I.e. should the following flow overrun the triggered tasks?
Note: From the internal implementation these two dimensions may appear to be flip-sides of the same coin, since they both boil down to the `flow_nums`. However, considering them from a user standpoint, I think it's fair to prise them apart.
Note: I am purposefully using new terminology to avoid conflation with existing terms; we may want to workshop "continue" and "overrun" a touch.
[1]: The quoted "merge" above relates to the interaction between two tasks with different `flow_nums` in general and not to the more specific concept of "flow merging" in the pool exclusively.
Combining these we get four spaces:
| | Continue | Don't Continue |
|------------|-----------------------|---------------------------------------------------|
| Overrun | (1) Reflow (as currently implemented) | (3) No Flow (current default trigger behaviour) |
| No Overrun | (2) Continue (@dpmatthews new proposed implementation) | (4) No Flow (@oliver-sanders proposed implementation) |
- The bad news is it looks like we have use cases for all four.
- Dave & I think the no-overrun cases are more important than the overrun ones.
- The good news is that they can coexist and the mechanism for supporting all four is already implemented; it's mostly an interface problem.
Going through the four spaces in detail:
1) Reflow (implemented)
Equivalent to `cylc trigger --flow=<new-flow-number>`.
Continue: Yes
Overrun: Yes
- Tasks are triggered with a new flow number.
- The reflow can overrun previous flows.
- The reflow will merge if it collides with another flow in the pool (and only in the pool, i.e. overrun).
The use case is re-running tasks which have previously run, e.g. changing configuration and re-running a sub-graph.
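To make the "merge only in the pool" point concrete, here is a minimal toy model. It assumes flow numbers are held as sets of integers and that merging is a set union; the `spawn` helper, the `Pool` type and the task IDs are hypothetical illustrations, not the cylc-flow internals.

```python
from typing import Dict, Set

# Hypothetical stand-in for the task pool: task ID -> flow numbers.
Pool = Dict[str, Set[int]]


def spawn(pool: Pool, task_id: str, flow_nums: Set[int]) -> bool:
    """Add a task to the pool, merging flow numbers on collision.

    Returns True if a new task instance was added, False if it merged
    with an instance that was already in the pool.
    """
    if task_id in pool:
        # Collision in the pool: the two flows merge (set union).
        pool[task_id] |= flow_nums
        return False
    # Not in the pool (never run, or already finished and removed):
    # the reflow runs it, overrunning any previous run.
    pool[task_id] = set(flow_nums)
    return True


pool: Pool = {"1/b": {1}}          # flow 1 is currently at task "1/b"
reran = spawn(pool, "1/a", {2})    # "1/a" has left the pool: flow 2 re-runs it
merged = not spawn(pool, "1/b", {2})  # "1/b" is in the pool: flows merge
```

Because only the pool is consulted, nothing stops the new flow from re-running a task that an earlier flow has already completed, which is what "overrun" means here.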
2) Continue (proposed)
Equivalent to `cylc trigger --flow=<all-flow-numbers>,<new-flow-number>`.
Continue: Yes
Overrun: No
- A new trigger approach proposed by @dpmatthews.
- Tasks are triggered with all existing flow numbers plus a new flow number (which we added purely so the new flow can still be targeted by CLI tools).
- Because this flow contains all existing flow numbers it will not be overrun by any of the flows which exist at the time of the trigger.
- This is intended for the sort of use cases we would expect `--flow=1` to be used for, but has been generalised to be reflow compatible.
This approach feels quite "natural". The use cases are setting off another bit of the same flow where you don't want tasks to be overrun.
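A rough sketch of why carrying all existing flow numbers prevents overrun. It assumes the no-overrun check is "has this task already run with any flow number the approaching flow carries"; that overlap rule, the `already_ran` helper and the `History` structure are my simplification for illustration, not confirmed cylc-flow behaviour.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the DB: task ID -> flow_nums of previous runs.
History = Dict[str, List[Set[int]]]


def already_ran(history: History, task_id: str, approaching: Set[int]) -> bool:
    """True if a previous run of the task shares a flow number with the
    approaching flow, in which case the flow would not run it again."""
    return any(bool(approaching & ran) for ran in history.get(task_id, []))


existing_flows = {1, 2}
new_flow = 3

# (2) Continue: triggered with all existing flow numbers plus a new one.
continue_history: History = {"1/a": [existing_flows | {new_flow}]}
# (1) Reflow: triggered with the new flow number only.
reflow_history: History = {"1/a": [{new_flow}]}

# Flow 1 later catches up with "1/a".
skipped_after_continue = already_ran(continue_history, "1/a", {1})
skipped_after_reflow = already_ran(reflow_history, "1/a", {1})
```

Under these assumptions flow 1 skips the task after a "continue" trigger (no overrun) but runs it again after a reflow trigger (overrun).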
3) No Flow (implemented)
Equivalent to `cylc trigger --flow -1`.
I am using a negative flow number rather than `None` to distinguish the two no-flow approaches.
Internally we can still maintain the same no-flow logic as present but would need to change the marker.
Continue: No
Overrun: Yes
Useful for running one-off tasks that you do not want to impact the workflow in any way (i.e. `cylc submit` type uses).
4) No Flow (proposed)
Equivalent to `cylc trigger --flow -2`.
I am using a negative flow number rather than `None` to distinguish the two no-flow approaches.
Internally we can still maintain the same no-flow logic as present.
Continue: No
Overrun: No
The use case is manually intervening in graph execution by ignoring dependencies or the runahead limit and skipping ahead to a task which you want to be considered part of the approaching flow front.
Interface
The internals to handle the four cases are already in place (`flow_nums`, DB lookups, etc.), so it mostly boils down to an interface / documentation issue.
I think all four methods could be exposed via a single `--flow` argument; however, it is sensible to provide defaults for the different behaviours. I think it would be good to document the `--flow` equivalents as they may help users to understand their function.
Note that `--reflow` currently determines the new flow number server side rather than client side, which is sensible.
1) Enable behaviours explicitly
If we are happy with the continue/overrun model (after workshopping the terms) we could expose it directly, something like:

    # 1) reflow
    cylc trigger --continue --overrun
    # 2) continue
    cylc trigger --continue
    # 3) no-flow (implemented)
    cylc trigger --overrun
    # 4) no-flow (proposed)
    cylc trigger

This is quite nice as you have to explicitly opt in to each behaviour separately, reducing the scope for unintended results and accidents.
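For illustration, a minimal sketch of how two opt-in flags would resolve to the four cases, using the standard library `argparse`. The parser, the `describe` helper and the `trigger-sketch` program name are hypothetical; this is not the `cylc trigger` option parser.

```python
import argparse
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="trigger-sketch")
    # "continue" is a Python keyword, so store the flag under another name.
    parser.add_argument("--continue", dest="continue_", action="store_true")
    parser.add_argument("--overrun", action="store_true")
    return parser


def describe(argv: Sequence[str]) -> str:
    """Map the two opt-in flags onto the four trigger behaviours."""
    args = build_parser().parse_args(argv)
    names = {
        (True, True): "reflow",
        (True, False): "continue",
        (False, True): "no-flow (implemented)",
        (False, False): "no-flow (proposed)",
    }
    return names[(args.continue_, args.overrun)]


behaviour = describe(["--continue"])
```

Both flags default to off, so passing no flags gives the no-continue, no-overrun case.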
2) Single `--flow` argument
If we don't like the continue/overrun model we could move the presets into the flow argument, something like:

    # 1) reflow
    cylc trigger --flow=new
    # 2) continue
    cylc trigger --flow=any
    # 3) no-flow (implemented)
    cylc trigger --flow=none
    # 4) no-flow (proposed)
    cylc trigger --flow=next

It's less behaviour driven so we would need to explain each option separately.
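One way to keep the presets tied to the underlying behaviours would be a simple lookup. This is only a sketch: the preset names are the ones suggested above, while `FLOW_PRESETS` and `resolve_flow` are hypothetical names, not existing cylc-flow code.

```python
from typing import Dict, Tuple

# Preset name -> (continue, overrun), following the four cases above.
FLOW_PRESETS: Dict[str, Tuple[bool, bool]] = {
    "new": (True, True),     # 1) reflow
    "any": (True, False),    # 2) continue
    "none": (False, True),   # 3) no-flow (implemented)
    "next": (False, False),  # 4) no-flow (proposed)
}


def resolve_flow(value: str) -> Tuple[bool, bool]:
    """Translate a --flow preset into its (continue, overrun) pair."""
    try:
        return FLOW_PRESETS[value]
    except KeyError:
        raise ValueError(f"unknown --flow preset: {value!r}") from None


continues, overruns = resolve_flow("any")
```

Documenting the presets in terms of this pair might reduce how much each option needs explaining separately.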
3) Separate flag for each approach
An alternative to (2) would be to come up with three/four different flags:

    # 1) reflow
    cylc trigger --reflow
    # 2) continue
    cylc trigger --flow
    # 3) no-flow (implemented)
    cylc trigger --rerun
    # 4) no-flow (proposed)
    cylc trigger # --run

Default
I think no-continue & no-overrun is the safest, sanest default because:
- The minimum set of behaviours is the simplest.
- The "Continue" cases have a dramatic impact on the workflow execution and are hard to revoke.
- The "Re-run" cases are quite advanced and require additional knowledge to operate.
But I'm biased. I think the default is less important than the clear separation of behaviours.