Node output mapping #6

Open
ztaylor54 opened this issue Jan 13, 2022 · 0 comments
Assignees
Labels
enhancement New feature or request needs:triage

Comments

@ztaylor54
Contributor

💪 Motivation

Currently, all KDP pipelines are essentially parallelized linked lists: each node load-balances its output to all later steps. We don't yet support conditional branching, but it would be a useful feature to implement. It would allow far more complex pipeline structures and give pipeline designers much more flexibility to adapt existing workflows to KDP.

📖 Additional Details

We'll need to add conditional branching to the edge definitions in a pipeline. There are probably many ways to go about this: some OK, most bad. I'd like to be thoughtful here so that we don't hack something together that breaks many of the KDP "core tenets," if you will.

The specification should likely go in the pipeline.spec, since the conditionals should be immutable. We should also require that all nodes be reachable, i.e. that each node has a path from the root under some set of conditionals. The pipeline validator (yet to be linked to the operator) should also check for possible cycles and emit a warning, possibly even blocking application of the pipeline unless a specific flag such as has_cycles (or something to that effect) is specified in the pipeline definition.
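
As a rough illustration of the two validator checks described above, here is a minimal Python sketch. It is not KDP code: the function name validate_graph, the root argument, and the plain-dict edge format are assumptions made for the example. It also treats every conditional edge as traversable, which over-approximates "reachable under some set of conditionals."

```python
def validate_graph(nodes, edges, root):
    """Return (unreachable_nodes, has_cycle) for a pipeline graph.

    nodes: iterable of node names
    edges: list of {"source": ..., "target": ...} dicts (conditions are ignored)
    root:  name of the entry node (hypothetical parameter for this sketch)
    """
    # Build an adjacency list keyed by source node.
    adjacency = {}
    for edge in edges:
        adjacency.setdefault(edge["source"], []).append(edge["target"])

    # Reachability: iterative depth-first search starting at the root.
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adjacency.get(node, []))
    unreachable = sorted(set(nodes) - seen)

    # Cycle detection: three-colour DFS. Meeting a GREY node means we
    # followed an edge back into the path currently being explored.
    WHITE, GREY, BLACK = 0, 1, 2
    all_names = set(nodes) | set(adjacency)
    for targets in adjacency.values():
        all_names.update(targets)
    colour = {name: WHITE for name in all_names}

    def visit(node):
        colour[node] = GREY
        for nxt in adjacency.get(node, []):
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    has_cycle = any(colour[name] == WHITE and visit(name)
                    for name in sorted(all_names))
    return unreachable, has_cycle
```

A validator built along these lines could warn when has_cycle is true and reject the pipeline when unreachable is non-empty; how that interacts with a has_cycles flag is still an open design question.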

The goal here is to be as concise and readable as possible so as not to bloat the pipeline definition schema, while maintaining high extensibility and flexibility for pipeline developers.

Ideas for implementation

Consider the following pipeline definition:

              +-------+      +--------+      +------+
 datainput -> | start | ---> | middle | ---> | last | -> end
              +-------+      +--------+      +------+

With the following corresponding yaml:

graph:
    nodes:
      start:
        # <...>
      middle:
        # <...>
      last:
        # <...>
    edges:
    - source: start
      target: middle
    - source: middle
      target: last

A valid use case might look like the following:

              +-------+      +--------+      +--------+
 datainput -> | start | ---> | middle | ---> | last_0 | -> end
              +-------+      +--------+      +--------+
                                 |
                                 |           +--------+
                                 └---------> | last_1 | -> end
                                             +--------+

Here, input flows to last_0 in the nominal case and to last_1 on an error condition or some other result. Let's say that middle sets a boolean flag got_error at the top level of its output JSON object.

We could extend graph.edges to include the mapping in the following manner:

edges:
- source: start
  target: middle
- source: middle
  target: last_0
  # lack of "conditions" could imply default branch
- source: middle
  target: last_1
  conditions:
  - match_value: # support multiple types of conditions
      key: "got_error"
      val: "true"
  # array for multiple? or perhaps take an approach similar to Elasticsearch boolean queries

As is evidenced above, there's still plenty to think about. This seems like a good start.
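
To make the match_value idea above a bit more concrete, here is a minimal Python sketch of how a node's output might be routed. Everything beyond the YAML keys shown above is an assumption for illustration: the function names matches and select_targets are hypothetical, multiple conditions on one edge are combined with AND, values are compared as lowercase strings so that the YAML string "true" matches a JSON boolean, and edges without conditions act as the default branch only when no conditional edge matches.

```python
def matches(condition, output):
    """Evaluate one condition against a node's JSON output (a dict)."""
    if "match_value" in condition:
        spec = condition["match_value"]
        if spec["key"] not in output:
            return False
        # Assumption: compare as lowercase strings so that the YAML
        # string "true" matches a JSON boolean true.
        return str(output[spec["key"]]).lower() == str(spec["val"]).lower()
    raise ValueError(f"unsupported condition type: {sorted(condition)}")


def select_targets(edges, source, output):
    """Return the targets that `output` from `source` should be routed to."""
    outgoing = [e for e in edges if e["source"] == source]
    conditional = [e for e in outgoing if e.get("conditions")]
    default = [e for e in outgoing if not e.get("conditions")]

    # Assumption: all conditions on an edge must hold (logical AND).
    matched = [e["target"] for e in conditional
               if all(matches(c, output) for c in e["conditions"])]
    # Fall back to unconditioned edges only when nothing matched.
    return matched or [e["target"] for e in default]


edges = [
    {"source": "start", "target": "middle"},
    {"source": "middle", "target": "last_0"},
    {"source": "middle", "target": "last_1",
     "conditions": [{"match_value": {"key": "got_error", "val": "true"}}]},
]
```

With these edges, select_targets(edges, "middle", {"got_error": True}) yields ["last_1"], while an output without got_error, or with it set to false, falls through to ["last_0"]. Whether a matching conditional edge should suppress the default branch, as it does here, is one of the choices the schema would need to pin down.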

@ztaylor54 ztaylor54 added enhancement New feature or request needs:triage labels Jan 13, 2022
@ztaylor54 ztaylor54 self-assigned this Jan 13, 2022