Add pipeline composition RFC #723

Hardcode84 · 2024-04-18T16:09:12Z

Please review these guidelines to help with the review process:

Have you provided a meaningful PR description?
Have you added a test, a reproducer, or a reference to an issue with a reproducer?
Have you tested your changes locally for CPU and GPU devices?
Have you made sure that new changes do not introduce compiler warnings?
If this PR is a work in progress, are you filing the PR as a draft?
Have you organized your commits logically and ensured each can be built by itself?

chencha3 · 2024-04-19T15:24:34Z

docs/rfcs/PipelineComposition.md

+    LogicalResult run(Operation *op);
+};
+```
+`PipelineSchedule` object encapsulates compiled pipeline graph. Main method is `LogicalResult run(Operation *op);` which follows existing MLIR `PassManager::run`.


What do you mean by the compiled pipeline graph?

A PipelineGraph object is populated with a set of pipelines and their dependencies, and then compiles them into an internal representation that runs those pipelines in order, according to those dependencies.

Is the schedule the result of the linearization of the DAG or is it the class that will linearize it?

PipelineSchedule is the linearized DAG; createPipelineSchedule does the linearization.

chencha3 · 2024-04-19T15:27:03Z

docs/rfcs/PipelineComposition.md

+        ArrayRef<StringRef> predecessors,
+        ArrayRef<StringRef> successors,
+        ArrayRef<StringRef> jumpTargets,
+        std::function<void(OpPassManager &)> populateFunc);


So the pipeline is a set of Patterns populated in populateFunc?

A pipeline is a set of passes.

Jianhui-Li · 2024-04-27T00:54:15Z

docs/rfcs/PipelineComposition.md

+
+## Motivation
+
+TBD use cases from IREE, TPP


It would help the reader to start with a motivation. I assume that the dependency-based graph would avoid mistakes where a user configures the pipeline manually and unintentionally breaks a dependency. Is that correct?

Expanded motivation.

rengolin · 2024-05-07T14:19:58Z

docs/rfcs/PipelineComposition.md

+After user populated the graph object they must call `createPipelineSchedule` method to compile the resulted graph into runnable schedule.
+`createPipelineSchedule` will build a DAG from pipelines dependencies provided by user, and will try to get linear execution order to satisfy these dependencies.
+
+If two pipelines don't have direct or indirect dependencies, order in which they will be executed is not specified, but stable.


I think that's asking too much of this framework. I'd say "stability depends on the passes accepting canonical forms from each other", and we make sure we always run canonicalization between DAG nodes.

Here I only meant that the order of pipelines/passes is stable regardless of the order in which registerPipeline was called (in the POC impl I'm just sorting by pipeline name first to make it stable), but yes, I can remove this to leave more implementation freedom.
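The name-sorted tie-breaking mentioned above could look roughly like the following. This is a minimal sketch using Kahn's algorithm over plain standard-library types; `linearize` and its edge-list input are hypothetical names for illustration, not the RFC's API, and the POC may be implemented differently.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: fills `order` with a linear order of pipeline names
// that satisfies every (runs first, runs second) edge. Returns false if the
// edges contain a cycle or mention a name that was not registered.
// Ready pipelines live in a std::set, so ties are broken by name and the
// result does not depend on registration order.
bool linearize(const std::vector<std::string> &names,
               const std::vector<std::pair<std::string, std::string>> &edges,
               std::vector<std::string> &order) {
  order.clear();
  std::map<std::string, std::vector<std::string>> successors;
  std::map<std::string, int> pending; // unscheduled predecessors per pipeline
  for (std::size_t i = 0; i < names.size(); ++i)
    pending[names[i]] = 0;
  for (std::size_t i = 0; i < edges.size(); ++i) {
    successors[edges[i].first].push_back(edges[i].second);
    ++pending[edges[i].second];
  }

  std::set<std::string> ready;
  for (std::map<std::string, int>::const_iterator it = pending.begin();
       it != pending.end(); ++it)
    if (it->second == 0)
      ready.insert(it->first);

  while (!ready.empty()) {
    std::string next = *ready.begin(); // smallest name among ready pipelines
    ready.erase(ready.begin());
    order.push_back(next);
    const std::vector<std::string> &outs = successors[next];
    for (std::size_t i = 0; i < outs.size(); ++i)
      if (--pending[outs[i]] == 0)
        ready.insert(outs[i]);
  }
  // Anything left unscheduled is part of a cycle or depends on an
  // unregistered pipeline.
  return order.size() == pending.size();
}
```

With this approach, registering the same pipelines in a different order yields the same schedule, because only names and edges influence the result.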

rengolin · 2024-05-07T14:21:40Z

docs/rfcs/PipelineComposition.md

+    LogicalResult run(Operation *op);
+};
+```
+`PipelineSchedule` object encapsulates compiled pipeline graph. Main method is `LogicalResult run(Operation *op);` which follows existing MLIR `PassManager::run`.


Is the schedule the result of the linearization of the DAG or is it the class that will linearize it?

rengolin · 2024-05-07T14:24:55Z

docs/rfcs/PipelineComposition.md

+Passes inside pipeline can set this attribute to indicate they want compilation flow to jump to the specific point.
+After current pipeline is finished, runtime will check if module object have attribute set and if it does, jump to the selected pipeline and clear the attribute.
+
+Setting attribute to the value, which wasn't in `jumpTargets` for the current pipeline will result in error and abort the compilation flow.


jumpTargets seem to be used for control flow.

I'd create a conditional and looping semantics instead as a type of sub-graph.

For example a pipeline node that lists a bunch of passes (or sub-nodes) and has arity (ex. until-converge-max-n). Or another that has two sub nodes with a select from an IR property (ex. DLTI target information).

Giving users the ability to jump to arbitrary targets is a foot gun that we might not want to create.

FYI, for looping until convergence/fixed point I've added llvm/llvm-project#87166.

It works fine for simple cases like canonicalization+CSE, but in numba-mlir I had a dozen passes in the potential loop, from multiple different pipelines, so I wanted explicit control over when to loop.
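For reference, the jump semantics from the quoted RFC text can be sketched as a small driver loop. Everything here is a stand-in: `Module` holds a plain string instead of an MLIR attribute, `Stage` and `runSchedule` are hypothetical names, and the `maxSteps` cap is an assumption added so the sketch always terminates; it is not part of the RFC text.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Stand-in for the module; the RFC stores the request as an attribute.
struct Module {
  std::string jumpRequest; // empty means "no jump requested"
  int value = 0;           // dummy payload so stages have something to change
};

// Hypothetical entry of the linearized schedule.
struct Stage {
  std::string name;
  std::set<std::string> jumpTargets; // stages this one is allowed to jump to
  std::function<void(Module &)> run; // stand-in for running the passes
};

// Runs stages in order, honouring jump requests. Returns false if a stage
// requests a target outside its jumpTargets, if the target is not in the
// schedule, or if maxSteps is exceeded.
bool runSchedule(const std::vector<Stage> &stages, Module &module,
                 std::size_t maxSteps) {
  std::size_t index = 0;
  for (std::size_t step = 0; index < stages.size(); ++step) {
    if (step >= maxSteps)
      return false;
    const Stage &stage = stages[index];
    stage.run(module);
    if (module.jumpRequest.empty()) {
      ++index; // no jump requested: continue with the next stage
      continue;
    }
    std::string target = module.jumpRequest;
    module.jumpRequest.clear(); // the request is consumed once it is read
    if (stage.jumpTargets.count(target) == 0)
      return false; // target not declared for the current stage: abort
    std::size_t next = 0;
    while (next < stages.size() && stages[next].name != target)
      ++next;
    if (next == stages.size())
      return false; // declared, but no such stage in the schedule
    index = next;
  }
  return true;
}
```

The validation against `jumpTargets` is what keeps jumps from being fully arbitrary: a stage can only go where it declared it might go at registration time.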

rengolin

Some items are missing from our discussion, mainly how to build the DAG and how to schedule it.

Building the DAG

Building a DAG simply means taking all passes and inserting each one in the first available slot (similar to tree insertion). Since there is no implicit ordering for passes, this may take up to O(n^2).

We could reduce the complexity by creating sub-graphs inside sub-graphs and connecting the super-graphs together.

For example:

All passes before bufferization are a sub-graph that leads into bufferization. There is no implicit order (needs to be scheduled). This is equivalent to saying bufferization depends on all of those passes, but explicitly joining all nodes into a single one.
Bufferization as a node with all cleanups
Same for vectorization, lowering, etc.

         /----\        /----\        /----\
Ingress -------- Buff -------- Vect -------- Lower -> HW
         \----/        \----/        \----/

Where Buff, Vect and Lower are fixed sequences of passes (per target, so can be conditional).

Scheduling

Each of those sub-graphs above will need to be scheduled. This is just graph scheduling, and can be much simpler if we hide loops and conditionals inside nodes.

Loops become a single node that is guaranteed to finish (run until convergence, but stop hard at N iterations, where N is configurable but less than a global MaxN).
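A bounded loop node of this kind might be sketched as follows. The names `runUntilConverged`, `LoopResult`, and `kGlobalMaxIterations` (with its value of 64) are assumptions made for illustration; the body is modelled as a callback that reports whether it changed the IR.

```cpp
#include <algorithm>
#include <functional>

// Hypothetical global ceiling (MaxN) on iterations for any loop node.
const int kGlobalMaxIterations = 64;

enum class LoopResult { Converged, HitLimit };

// Runs `body` until it reports no change, but never more than
// min(requestedMax, kGlobalMaxIterations) times. `body` returns true if it
// changed the IR during that iteration. If `iterationsRun` is non-null, it
// receives the number of times `body` was called.
LoopResult runUntilConverged(const std::function<bool()> &body,
                             int requestedMax, int *iterationsRun = nullptr) {
  const int limit = std::min(requestedMax, kGlobalMaxIterations);
  int count = 0;
  LoopResult result = LoopResult::HitLimit;
  while (count < limit) {
    ++count;
    if (!body()) {
      result = LoopResult::Converged; // fixed point reached
      break;
    }
  }
  if (iterationsRun)
    *iterationsRun = count;
  return result;
}
```

Returning `HitLimit` separately from `Converged` lets the caller decide whether hitting the cap is an error or merely a warning.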

If we follow the sub-graph design, then scheduling is always restricted to the sub-graph. This works well with a recursive algorithm that schedules the outer-most graph, then descends into sub-graphs, expanding them in linear form.

Cleaning up

After the graph is linear (with potential loop and conditional nodes), we can start the cleanup, for example, de-duplicating passes that have no writable transforms in between.
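One possible reading of this cleanup step, as a sketch: drop a pass if the same pass already ran and nothing has modified the IR since. `ScheduledPass`, its `mutatesIR` flag, and `dedupe` are hypothetical, and the sketch assumes every pass is idempotent (running it twice in a row is the same as running it once).

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical entry in the linearized schedule.
struct ScheduledPass {
  std::string name;
  bool mutatesIR; // false for analyses and other read-only passes
};

// Removes a pass when an identical pass already ran and no IR-mutating pass
// has been scheduled in between. Assumes passes are idempotent.
std::vector<ScheduledPass> dedupe(const std::vector<ScheduledPass> &schedule) {
  std::vector<ScheduledPass> result;
  std::set<std::string> validSinceLastWrite;
  for (const ScheduledPass &pass : schedule) {
    if (validSinceLastWrite.count(pass.name))
      continue; // same pass already ran and nothing changed the IR since
    if (pass.mutatesIR)
      validSinceLastWrite.clear(); // earlier results are now stale
    validSinceLastWrite.insert(pass.name);
    result.push_back(pass);
  }
  return result;
}
```

For instance, two adjacent canonicalization runs collapse into one, while a canonicalization that follows bufferization is kept.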

Failure

Failure can happen at any stage above and the error message must make clear which stage and what happened. Failed creating a DAG, sub-DAG, scheduling some sub-graph, etc.

rengolin · 2024-05-07T14:33:14Z

docs/rfcs/PipelineComposition.md

+    void registerPipeline(
+        StringRef name,
+        ArrayRef<StringRef> predecessors,
+        ArrayRef<StringRef> successors,


I'd also avoid having both predecessors and successors. This feels like duplication and is hard to get right on larger graphs.

What I had in mind is just:

Dependencies: Passes that you must run before (analyses and transforms)

Post-clean up: Canonicalization that can help the following passes

Dependencies can be bundles or specific passes. Bundles can be just a list of passes (ex. buff+ownership), a loop or a conditional (see below). Both bundles and passes have deps/cleanups and we can simplify the graph after linearization.

Post-cleanups would also be simplified (de-duped) if one pass lists it as its cleanups and the following pass lists it as its dependencies.

Regarding having both predecessors and successors,

Consider the following (hypothetical) pipeline:

numpy-to-linalg   torch-to-linalg
             \     /
          bufferization
             /     \
linalg-to-cpu     linalg-to-gpu

We don't want bufferization to know about specific ***-to-linalg pipelines, as those are frontend details irrelevant to bufferization, and we don't want it to know about linalg-to-*** either, as those are backend details.
So the pipeline should look like

numpy-to-linalg: [], [bufferization]
torch-to-linalg: [], [bufferization]
bufferization: [], []
linalg-to-cpu: [bufferization], []
linalg-to-gpu: [bufferization], []
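To illustrate why both lists can coexist, here is a minimal sketch of folding them into a single set of "runs before" edges. `PipelineDesc` and `collectEdges` are hypothetical names, not the RFC's API.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one registered pipeline.
struct PipelineDesc {
  std::string name;
  std::vector<std::string> predecessors; // must run before this pipeline
  std::vector<std::string> successors;   // must run after this pipeline
};

typedef std::pair<std::string, std::string> Edge; // (runs first, runs second)

// Folds both lists into one set of edges. An edge declared from both sides
// collapses into a single entry because std::set ignores duplicates.
std::set<Edge> collectEdges(const std::vector<PipelineDesc> &pipelines) {
  std::set<Edge> edges;
  for (const PipelineDesc &p : pipelines) {
    for (const std::string &pred : p.predecessors)
      edges.insert(Edge(pred, p.name));
    for (const std::string &next : p.successors)
      edges.insert(Edge(p.name, next));
  }
  return edges;
}
```

For the five registrations above this would produce four edges, all touching bufferization, even though bufferization itself lists neither predecessors nor successors.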

Hardcode84 · 2024-05-07T15:58:49Z

Subgraphs are useful by themselves, but regarding encapsulating control flow into subgraphs, let's say we have the following pipeline:

frontend
    |
    V
python-to-standard
    |
    V
lower-to-llvm

frontend: [], [], []
python-to-standard: [frontend], [lower-to-llvm], []
lower-to-llvm: [], [], []

Now an (external) user wants to add a numpy-to-linalg stage, and both the python-to-standard and numpy-to-linalg stages must run until a fixed point.

frontend
    |
    V
python-to-standard
   | ^
   V |
numpy-to-linalg
    |
    V
lower-to-llvm

With jumps they can just do

numpy-to-linalg: [python-to-standard], [lower-to-llvm], /*jump*/[python-to-standard]
... bufferization and such...

And the rest of the pipeline will stay unchanged.

With subgraphs they would have to extract the existing python-to-standard stage, wrap both stages in a subgraph, and reinsert it into the pipeline.

chencha3 reviewed Apr 19, 2024

Jianhui-Li reviewed Apr 27, 2024

rengolin reviewed May 7, 2024

Hardcode84 force-pushed the graph-rfc branch from e78f869 to e8ce1fa Compare May 17, 2024 15:25

Hardcode84 added 7 commits May 17, 2024 23:14

Add pipeline composition RFC

5caba5a

typos

668be9c

motivation

32104b2

add numba-mlir example

53f094a

TPP TBD

29b08c2

add TPP pipeline

a3c8cc6

subgraphs, some clarifications

117f1c1

Hardcode84 force-pushed the graph-rfc branch from e8ce1fa to 117f1c1 Compare May 17, 2024 21:15
