Provide inputs to non-root nodes #9

Open
ztaylor54 opened this issue Jan 13, 2022 · 1 comment

@ztaylor54
Contributor

💪 Motivation

Situations often arise where it would be useful to inject inputs at a node farther down the pipeline than the root node. This is especially helpful during testing, where the behavior of an individual pipeline step needs to be examined without first running data and inputs all the way through the pipeline.

It is also useful when pipeline steps fail or must be re-run due to misconfiguration or other issues, such as a failure in an externally-configured service. In cases like these, it would be desirable to execute a partial re-run of a pipeline, starting from where the previous run left off. This would avoid duplication of (possibly expensive) work performed by earlier pipeline steps.

Note: The use-case for a partial re-run likely warrants some method of "replaying" pipeline inputs - this could be achieved by caching inputs in the manager's work queues, or something similar.

📖 Additional Details

For a more concrete example, consider the following pipeline:

              +-------+      +--------+      +------+
 datainput -> | start | ---> | middle | ---> | last | -> end
              +-------+      +--------+      +------+

If an error occurs in middle, we might have reason to send data from the datainput directly to middle, thus bypassing start. This might be implemented in a DataInput spec as follows:

spec:
  data: 
    <data block>
  target: middle # add target: <node>

This would result in the DataInput's container pushing data to middle's work queue instead of the root node's.
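As a rough sketch of how the destination might be chosen, assuming the spec is available as a parsed mapping and the manager knows the pipeline's node names. The function name `resolve_target` and its parameters are hypothetical, not part of the current codebase; the fallback to the root node follows the default proposed in this issue.

```python
from typing import Any, Dict, Iterable


def resolve_target(spec: Dict[str, Any], nodes: Iterable[str], root: str) -> str:
    """Return the node whose work queue should receive the DataInput's data.

    Hypothetical helper: falls back to the root node when `target` is absent.
    """
    target = spec.get("target")
    if target is None:
        # No target given: keep today's behavior and feed the root node.
        return root
    if target not in set(nodes):
        # Fail early rather than pushing data to a queue that does not exist.
        raise ValueError(f"unknown target node: {target!r}")
    return target


nodes = ["start", "middle", "last"]
print(resolve_target({"data": {}}, nodes, root="start"))
print(resolve_target({"data": {}, "target": "middle"}, nodes, root="start"))
```

Rejecting unknown node names up front is one possible choice; silently falling back to the root would hide typos in the spec.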

There are a few considerations / caveats:

  • The DataInput schema will need to be updated to include the target: <node> option, specifying that the output queue of the DataInput should be something other than the root node. It will default to the root node if target is not specified.
  • With the current implementation, a given node may have more than one work queue (one per incoming edge), which it reads inputs from in round-robin order. Shortcut inputs could be evenly distributed across those queues, all put into one queue, or handled separately; the correct approach is unclear.
  • While the DataInput can somewhat-easily be configured to pass data to a different step in the pipeline, it is less straightforward to get the underlying container to pass inputs that middle would care about (i.e. emulate start's output).
    • This is where an input "replay" will come in handy, but there's still the case where inputs are unavailable such as during a test of a single pipeline step. This likely requires a new DataInput container to be created specifically for this purpose.
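To make the work queue question above more concrete, here is a minimal sketch of the three options mentioned (even distribution, a single queue, or a separate queue). It models queues as plain lists keyed by name; the function `distribute`, the strategy names, and the `shortcut` queue name are all assumptions for illustration, not existing behavior.

```python
from itertools import cycle
from typing import Any, Dict, Iterable, List


def distribute(items: Iterable[Any], queues: Dict[str, List[Any]],
               strategy: str = "even") -> None:
    """Push shortcut inputs into a node's work queues (hypothetical sketch)."""
    if not queues:
        raise ValueError("node has no work queues")
    if strategy == "even":
        # Round-robin across the node's existing queues, in name order.
        names = cycle(sorted(queues))
        for item in items:
            queues[next(names)].append(item)
    elif strategy == "single":
        # Put everything into one existing queue (here: first by name).
        queues[sorted(queues)[0]].extend(items)
    elif strategy == "separate":
        # Keep shortcut inputs apart from the regular incoming edges.
        queues.setdefault("shortcut", []).extend(items)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")


queues = {"edge-a": [], "edge-b": []}
distribute([1, 2, 3], queues, strategy="even")
print(queues)
```

The separate-queue option keeps shortcut inputs distinguishable from regular ones, at the cost of the node having one more queue to read from.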
@ztaylor54 ztaylor54 added enhancement New feature or request needs:triage labels Jan 13, 2022
@ztaylor54 ztaylor54 self-assigned this Jan 13, 2022
@ztaylor54
Contributor Author

Adding some more thoughts on the replay option. I think this is the easiest case (for the user, at least):

We might support a replay key in the DataInput spec that links to the ID of a previously-run DataInput:

spec:
  replay: <datainput-id>
  target: middle # add target: <node>

This would eliminate the need for an image specification, and a replay of inputs for the past DataInput would be handled by internal mechanisms (likely on the manager). We'd need to keep track of DataInput IDs during execution, but that wouldn't be an issue.
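A minimal sketch of what that bookkeeping might look like, assuming the manager records every item pushed to a node's work queue under the originating DataInput ID. The `ReplayCache` class and its method names are hypothetical, and an in-memory dict stands in for whatever storage the manager would actually use.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List, Tuple


class ReplayCache:
    """Remembers which items each node received for a given DataInput ID."""

    def __init__(self) -> None:
        self._items: DefaultDict[Tuple[str, str], List[Any]] = defaultdict(list)

    def record(self, datainput_id: str, node: str, item: Any) -> None:
        # Called whenever an item is pushed to `node`'s work queue.
        self._items[(datainput_id, node)].append(item)

    def replay(self, datainput_id: str, target: str, queue: List[Any]) -> int:
        # Re-push everything `target` received during the earlier run.
        key = (datainput_id, target)
        if key not in self._items:
            raise KeyError(f"nothing recorded for {datainput_id!r} at {target!r}")
        items = self._items[key]
        queue.extend(items)
        return len(items)


cache = ReplayCache()
cache.record("di-1", "middle", {"n": 1})
cache.record("di-1", "middle", {"n": 2})
middle_queue: List[Any] = []
print(cache.replay("di-1", "middle", middle_queue), middle_queue)
```

Keying on the node as well as the DataInput ID means a replay with target: middle re-delivers what middle actually received (i.e. start's output), which sidesteps the need to emulate start.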
