💪 Motivation
It is often useful to inject inputs further down the pipeline than the root node. This helps during testing, where the behavior of an individual pipeline step needs to be examined without first running data and inputs through the entire pipeline.
It is also useful when pipeline steps fail or must be re-run due to misconfiguration or other issues, such as a failure in an externally-configured service. In cases like these, it would be desirable to execute a partial re-run of a pipeline, starting from where the previous run left off. This would avoid duplication of (possibly expensive) work performed by earlier pipeline steps.
Note: The use-case for a partial re-run likely warrants some method of "replaying" pipeline inputs - this could be achieved by caching inputs in the manager's work queues, or something similar.
📖 Additional Details
For a more concrete example, consider the following pipeline:
If an error occurs in middle, we might have reason to send data from the datainput directly to middle, thus bypassing start. This might be implemented in a DataInput spec as follows:
This would result in the DataInput's container pushing data to middle's work queue instead of the root node's.
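The routing described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `DataInputSpec`, `Pipeline`, `resolve_target`, and `push` are hypothetical names, the `end` node is invented to round out the example, and `start` is assumed to be the root node. Only the `target` option and its fallback to the root come from this proposal.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataInputSpec:
    """Hypothetical, simplified DataInput spec; only `target` comes from the proposal."""
    name: str
    target: Optional[str] = None  # node whose work queue receives the data


class Pipeline:
    """Toy pipeline holding one work queue per node (an assumption for brevity)."""

    def __init__(self, nodes, root):
        if root not in nodes:
            raise ValueError(f"root {root!r} is not a pipeline node")
        self.root = root
        self.queues = {node: deque() for node in nodes}

    def resolve_target(self, spec):
        """Return the node a DataInput should feed, defaulting to the root."""
        node = spec.target if spec.target is not None else self.root
        if node not in self.queues:
            raise KeyError(f"unknown target node {node!r}")
        return node

    def push(self, spec, item):
        """Append `item` to the work queue of the resolved target node."""
        node = self.resolve_target(spec)
        self.queues[node].append(item)
        return node


# "end" is an invented node; "start" is assumed to be the root.
pipeline = Pipeline(nodes=["start", "middle", "end"], root="start")
default_node = pipeline.push(DataInputSpec(name="datainput"), {"id": 1})
shortcut_node = pipeline.push(DataInputSpec(name="datainput", target="middle"), {"id": 2})
```

Rejecting an unknown target at resolution time is one possible design choice; it surfaces a misspelled node name before any data is pushed.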
There are a few considerations / caveats:
The DataInput schema will need to be updated to include the target: <node> option, specifying that the output queue of the DataInput should be something other than the root node. It will default to the root node if target is not specified.
With the current implementation, a given node may have more than one work queue (one per incoming edge), from which it reads inputs in round-robin order. Shortcut inputs could be distributed evenly across these queues, placed into a single queue, or handled separately; the correct approach is unclear.
While the DataInput can somewhat-easily be configured to pass data to a different step in the pipeline, it is less straightforward to get the underlying container to pass inputs that middle would care about (i.e. emulate start's output).
This is where an input "replay" will come in handy, but there's still the case where inputs are unavailable such as during a test of a single pipeline step. This likely requires a new DataInput container to be created specifically for this purpose.
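To make the work-queue question above more concrete, here is a rough sketch of the three candidate strategies for shortcut inputs. Everything in it is an assumption made for illustration: the `NodeInputs` class, the strategy names (`spread`, `single`, `separate`), and the edge labels are hypothetical and do not reflect the current implementation.

```python
from collections import deque
from itertools import cycle


class NodeInputs:
    """Toy model of one node's incoming work queues (one per incoming edge)."""

    def __init__(self, edges):
        self.queues = {edge: deque() for edge in edges}
        self._spread = cycle(list(edges))  # rotation used by the "spread" strategy

    def inject(self, items, strategy="separate", into=None):
        """Place shortcut inputs using one of the three candidate strategies."""
        if strategy == "spread":      # distribute evenly over the existing queues
            for item in items:
                self.queues[next(self._spread)].append(item)
        elif strategy == "single":    # put everything into one existing queue
            self.queues[into].extend(items)
        elif strategy == "separate":  # dedicated queue that joins the rotation
            self.queues.setdefault("shortcut", deque()).extend(items)
        else:
            raise ValueError(f"unknown strategy {strategy!r}")

    def drain(self):
        """Yield inputs round-robin across queues, skipping empty ones."""
        while any(self.queues.values()):
            for queue in list(self.queues.values()):
                if queue:
                    yield queue.popleft()


# Hypothetical edges into "middle"; "other" is an invented upstream node.
inputs = NodeInputs(edges=["start->middle", "other->middle"])
inputs.queues["start->middle"].extend(["a1", "a2"])
inputs.queues["other->middle"].extend(["b1"])
inputs.inject(["s1", "s2"], strategy="separate")
order = list(inputs.drain())
```

In this toy model the `separate` strategy interleaves shortcut inputs with regular ones without changing the contents of the existing queues, whereas `spread` and `single` mix them into queues that upstream nodes also write to.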
This would eliminate the need for an image specification, and a replay of inputs for the past DataInput would be handled by internal mechanisms (likely on the manager). We'd need to keep track of DataInput IDs during execution, but that wouldn't be an issue.
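One way the manager-side replay might look is sketched below, under the assumption that the manager records every item it delivers to a node's work queue, keyed by DataInput ID and node. `ReplayCache`, `record`, and `replay` are hypothetical names, and the IDs and items are made up for the example.

```python
from collections import defaultdict, deque


class ReplayCache:
    """Hypothetical manager-side record of the inputs delivered to each node."""

    def __init__(self):
        # (DataInput ID, node) -> items in the order they were delivered
        self._history = defaultdict(list)

    def record(self, data_input_id, node, item):
        """Remember that `item` was delivered to `node` for this DataInput."""
        self._history[(data_input_id, node)].append(item)

    def replay(self, data_input_id, node, queue):
        """Re-enqueue the recorded inputs for `node`; return how many were replayed."""
        items = self._history.get((data_input_id, node), [])
        queue.extend(items)
        return len(items)


cache = ReplayCache()
cache.record("di-1", "start", "raw-0")
cache.record("di-1", "middle", "parsed-0")
cache.record("di-1", "middle", "parsed-1")

# Partial re-run: feed "middle" the inputs it received last time, skipping "start".
middle_queue = deque()
replayed = cache.replay("di-1", "middle", middle_queue)
```

Keying the cache by node as well as by DataInput ID means the replayed items are the ones `middle` actually received (that is, `start`'s output), so no separate container has to emulate them.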