Node output mapping #6

ztaylor54 · 2022-01-13T21:36:31Z

💪 Motivation

Currently, all KDP pipelines are basically parallelized linked lists: each node load-balances its output to all later steps. We don't support conditional branching yet, but it would be a useful feature to implement. It would allow for far more complex pipeline structures, giving pipeline designers much more flexibility to adapt existing workflows to KDP.

📖 Additional Details

We're going to need to add conditional branching to the edge definitions in a pipeline. There are probably a bunch of ways to go about this: some OK, most bad. I'd like to be thoughtful here so that we don't hack something together that breaks many of the KDP "core tenets," if you will.

The specification should likely go in the pipeline.spec, as the conditionals should be immutable. We should also require that all nodes be reachable, i.e. that each node has a path from the root given some set of conditionals. The pipeline validator (yet to be linked to the operator) should also check for possible cycles and emit a warning, possibly even blocking application of the pipeline unless a specific flag such as has_cycles (or something to that effect) is specified in the pipeline definition.
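
To make the two validator checks concrete, here is a minimal Python sketch, assuming the graph has already been parsed into a list of node names and a list of edge dicts with source and target keys. The function name validate_graph and its return shape are hypothetical and not part of KDP. Every conditional edge is treated as potentially taken, which is the most permissive reading of "reachable given some set of conditionals."

```python
from collections import defaultdict


def validate_graph(nodes, edges, root):
    """Return (unreachable_nodes, has_cycle) for a pipeline graph.

    Hypothetical helper: `nodes` is a list of node names, `edges` a list of
    dicts with "source" and "target" keys, `root` the entry node.
    """
    adjacency = defaultdict(list)
    for edge in edges:
        adjacency[edge["source"]].append(edge["target"])

    # Reachability: iterative depth-first search from the root, treating
    # every edge (conditional or not) as one that could be taken.
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adjacency[node])
    unreachable = sorted(set(nodes) - seen)

    # Cycle detection: three-colour depth-first search. Meeting a GREY node
    # (still on the current DFS path) means we found a back edge.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {name: WHITE for name in nodes}

    def visit(node):
        colour[node] = GREY
        for nxt in adjacency[node]:
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    has_cycle = any(colour[name] == WHITE and visit(name) for name in nodes)
    return unreachable, has_cycle
```

A validator built this way could emit a warning when has_cycle is true and reject the pipeline when unreachable is non-empty; whether a cycle should block application is exactly the open question above.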

The goal here is to be as concise and readable as possible so as not to bloat the pipeline definition schema, while maintaining high extensibility and flexibility for pipeline developers.

Ideas for implementation

Consider the following pipeline definition:

              +-------+      +--------+      +------+
 datainput -> | start | ---> | middle | ---> | last | -> end
              +-------+      +--------+      +------+

With the following corresponding yaml:

graph:
    nodes:
      start:
        # <...>
      middle:
        # <...>
      last:
        # <...>
    edges:
    - source: start
      target: middle
    - source: middle
      target: last

A valid use case might look like the following:

              +-------+      +--------+      +--------+
 datainput -> | start | ---> | middle | ---> | last_0 | -> end
              +-------+      +--------+      +--------+
                                 |
                                 |           +--------+
                                 └---------> | last_1 | -> end
                                             +--------+

Here, input flows to last_0 in the nominal case, and to last_1 on an error condition or some other result. Let's say that middle sets a boolean flag got_error at the top level of its output JSON object.

We could extend graph.edges to include the mapping in the following manner:

edges:
- source: start
  target: middle
- source: middle
  target: last_0
  # lack of "conditions" could imply default branch
- source: middle
  target: last_1
  conditions:
  - match_value: # support multiple types of conditions
      key: "got_error"
      val: "true"
  # an array, to allow multiple conditions? or perhaps take an approach similar to Elasticsearch boolean queries
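
As a rough illustration of how a node might resolve these edges at runtime, here is a Python sketch. The semantics are assumptions, not decisions made in this issue: all entries in conditions must match (logical AND), an edge without conditions is the default branch and is used only when no conditional edge matches, and values are compared as lowercase strings so that the YAML string "true" matches a JSON boolean. The names select_targets and _matches are hypothetical.

```python
def _matches(condition, output):
    """Evaluate one condition against a node's output (a dict).

    Only the `match_value` condition type from the YAML above is handled.
    """
    match = condition.get("match_value")
    if match is None:
        raise ValueError(f"unsupported condition type: {condition!r}")
    # Assumption: compare as lowercase strings so the YAML string "true"
    # matches the JSON boolean true.
    actual = str(output.get(match["key"])).lower()
    return actual == str(match["val"]).lower()


def select_targets(edges, source, output):
    """Return the target nodes that `source` should send `output` to."""
    conditional, default = [], []
    for edge in edges:
        if edge["source"] != source:
            continue
        conditions = edge.get("conditions")
        if not conditions:
            # No conditions: treated as the default branch (assumption).
            default.append(edge["target"])
        elif all(_matches(c, output) for c in conditions):
            conditional.append(edge["target"])
    # Conditional matches take precedence over the default branch.
    return conditional or default


edges = [
    {"source": "start", "target": "middle"},
    {"source": "middle", "target": "last_0"},
    {
        "source": "middle",
        "target": "last_1",
        "conditions": [{"match_value": {"key": "got_error", "val": "true"}}],
    },
]
```

With these edges, select_targets(edges, "middle", {"got_error": True}) routes to last_1, while an output without got_error falls through to the default branch last_0.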

As is evidenced above, there's still plenty to think about. This seems like a good start.


ztaylor54 added enhancement New feature or request needs:triage labels Jan 13, 2022

ztaylor54 self-assigned this Jan 13, 2022
