Add support for scheduling in attention operators #253
Conversation
51d25a6 to dd182ab (Compare)
This PR adds support for scheduling in attention operators. In particular, the following changes are implemented:
1. Parallel Floyd-Warshall allows for faster scheduling
2. Support for iter_args in rotating registers
3. Support for VALU and SHUFFLE delays and resources
4. Add infer_type for remaining wave ops

Signed-off-by: Harsh Menon <[email protected]>
LGTM, just a few comments
@@ -227,11 +249,11 @@ def liveness_analysis(
    logger.debug(
        f"Node: {node}, User: {user.fx_node}, lifetime: {user.scheduling_parameters['stage'] - custom.scheduling_parameters['stage']}"
    )
    lifetime[node] = max(
    user_lifetime = (
is there a purpose to this refactoring aside from styling?
I was doing something with it, but dropped that and decided to keep it like this for styling.
Operation.WRITE_GLOBAL: np.array([[1, 0, 0]]),
Operation.MMA: np.array([[0, 0, 1]]),
Operation.NOOP: np.array([[0, 0, 0]]),
Operation.READ_SHARED: np.array([[0, 1, 0, 0, 0]]),
Not related to this PR, but does this mean we cannot read and write at the same time?
This just specifies how many resources the instruction uses in one cycle. We can have multiple reads/writes by increasing the total number of resources available.
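For illustration, here is a minimal sketch (not the PR's actual code; the resource categories and totals are assumed) of how such per-cycle resource vectors are typically checked in resource-constrained modulo scheduling:

```python
import numpy as np

# Assumed resource categories: [global_mem, shared_mem, mma, valu, shuffle].
TOTAL_RESOURCES = np.array([2, 2, 1, 2, 1])

# Each op reserves some units for one cycle (rows = cycles after issue).
OP_USAGE = {
    "READ_SHARED": np.array([[0, 1, 0, 0, 0]]),
    "MMA": np.array([[0, 0, 1, 0, 0]]),
}

def fits(table: np.ndarray, op: str, cycle: int) -> bool:
    """True if issuing `op` at `cycle` keeps every resource within budget."""
    usage = OP_USAGE[op]
    window = table[cycle : cycle + usage.shape[0]]
    return bool(np.all(window + usage <= TOTAL_RESOURCES))

# With a shared-memory budget of 2, two READ_SHARED ops fit in one cycle,
# but a third would exceed the budget and be rejected.
table = np.zeros((4, 5), dtype=int)
for expected in (True, True, False):
    assert fits(table, "READ_SHARED", 0) == expected
    table[0] += OP_USAGE["READ_SHARED"][0]
```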
if (
    pipelining_stage == PipelineStage.KERNEL
    or pipelining_stage == PipelineStage.EPILOGUE
    or pipelining_stage == PipelineStage.PROLOGUE
Can you explain this change a little bit more?
Sure, will also add a comment.
# In situations where we have an iter_arg as a rotating register,
# we also have the output as a rotating register. So when we
# update the output, we also update the iter_arg with the old
# value of the output rotating register. Consider this example:
#
# Stage 0:
#     iter_arg0
#
#     output = compute(...) -> here we update iter_arg0 to have the output value
#                              for the next stage, so that it gets picked up in
#                              stage 1.
#
# Stage 1:
#     b = use(iter_arg0)
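A small, hypothetical sketch of the update order described in that comment (the real pass operates on fx graph nodes, not a plain dict):

```python
def rotate_output(registers: dict, output: str, iter_arg: str, new_value):
    # The iter_arg picks up the old value of the output rotating register ...
    registers[iter_arg] = registers[output]
    # ... before the output register is overwritten with the newly computed
    # value, so the next stage reads the value from the previous iteration.
    registers[output] = new_value
```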
new_node.index[dim] = new_node.index[dim].subs(
    {induction_variable: current_induction_variables[iteration]}
)
if new_node.index:
Are these for the new ops, such as Register, which don't have an index?
I hit this for the ExtractOps that don't have an index.
new_node.index[dim] = new_node.index[dim].subs(
    {induction_variable: current_induction_variables[iteration]}
)
if custom_node.expanded_dims:
Can you explain why this is needed now?
The expanded dims are needed for the reshape ops during codegen.
Signed-off-by: Harsh Menon <[email protected]>
Overall LGTM, just a few comments on parallelism and subs
D[i, j] = edge.weight.delay - edge.weight.iteration_difference * T

# Parallel implementation
pool = mp.get_context("fork").Pool(processes=mp.cpu_count())
Pool creation/destruction can be expensive (it starts new processes and then waits for them to die in close/join). We should probably have a shared global pool (created on demand).
Yes that makes sense. Right now, we instantiate this pool multiple times within the modulo scheduling loop. Instead, we could do this just once in the constructor and tear it down in the destructor. How does that sound?
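A minimal sketch of the suggested lazily created shared pool (the names here are illustrative, not the PR's actual implementation):

```python
import multiprocessing as mp

_POOL = None

def get_pool():
    """Create the fork pool on first use and reuse it across scheduling runs."""
    global _POOL
    if _POOL is None:
        _POOL = mp.get_context("fork").Pool(processes=mp.cpu_count())
    return _POOL

def shutdown_pool():
    """Tear the pool down explicitly, e.g. when the scheduler is destroyed."""
    global _POOL
    if _POOL is not None:
        _POOL.close()
        _POOL.join()
        _POOL = None
```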
pool = mp.get_context("fork").Pool(processes=mp.cpu_count())
for k in range(N):
    func = partial(all_pairs_longest_path_parallel, N, D, k)
    results = pool.map(func, range(N))
We can potentially parallelize it even more by having two loops: the first calling pool.map_async and the second aggregating the results from them.
Nice, that makes sense. Thanks!
So I tried this, and it doesn't work because each iteration depends on the results of the previous one. The iterations of the loop cannot all use the same value of the D matrix; after every iteration we need to update the D matrix and then use the updated matrix in the next iteration.
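To make the dependency explicit, here is a plain NumPy sketch (not the PR's exact code) of the all-pairs longest-path recurrence: the outer k loop must stay sequential because step k reads the D matrix produced by step k-1, while the per-row work inside a single k step is what pool.map distributes.

```python
import numpy as np

def relax_row(D: np.ndarray, k: int, i: int) -> np.ndarray:
    # Longest path i -> j that may pass through intermediate nodes {0..k}.
    return np.maximum(D[i], D[i, k] + D[k])

def all_pairs_longest_paths(D: np.ndarray) -> np.ndarray:
    N = D.shape[0]
    for k in range(N):                                 # sequential by necessity
        rows = [relax_row(D, k, i) for i in range(N)]  # this part maps onto pool.map
        D = np.vstack(rows)                            # publish updated D before the next k
    return D
```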
@@ -190,7 +257,8 @@ def evaluate_all_pairs_longest_paths(
    """
    D_static = dict(D)
    for key in D_static:
        D_static[key] = D_static[key].subs(T, initiation_interval)
        if isinstance(D_static[key], sympy.Expr):
We have utils.safe_subs
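For reference, a hedged sketch of what such a helper typically looks like; the actual utils.safe_subs in the repository may differ in signature:

```python
import sympy

def safe_subs(value, old, new):
    """Substitute only when `value` is a sympy expression; pass plain numbers through."""
    if isinstance(value, sympy.Expr):
        return value.subs(old, new)
    return value

# Usage, replacing the explicit isinstance check in the loop:
# D_static[key] = safe_subs(D_static[key], T, initiation_interval)
```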
Sure, will change.