[core][compiled graph] Support inter-execution compute-communication overlap #47944
Labels
compiled-graph
enhancement
Request for new feature and/or capability
triage
Needs triage (eg: priority, bug/not-bug, and owning component)
Description
The existing compute-communication overlap in compiled graphs only supports intra-execution overlap, i.e., only operations from the same execution loop can be overlapped. This is insufficient for some use cases (e.g., vLLM), where the performance gain mainly comes from overlapping compute and communication operations from different executions.
We need to design and implement a mechanism to support inter-execution overlap.
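To illustrate the gap, here is a minimal timing sketch in plain Python. It does not use any Ray API. It assumes a hypothetical workload in which each execution is one compute step followed by one dependent communication step, so nothing inside a single execution can overlap. The function names and the durations are illustrative assumptions, not measurements.

```python
def makespan_without_inter_overlap(compute_ms, comm_ms, num_executions):
    """Total time when execution i+1 starts only after execution i's
    communication has finished (no overlap across executions)."""
    return num_executions * (compute_ms + comm_ms)


def makespan_with_inter_overlap(compute_ms, comm_ms, num_executions):
    """Total time when compute and communication run on separate
    resources (e.g., separate streams), so the communication of
    execution i can run while execution i+1 is computing."""
    compute_free = 0.0  # time at which the compute resource is next free
    comm_free = 0.0     # time at which the communication resource is next free
    for _ in range(num_executions):
        # Compute steps run back to back on the compute resource.
        compute_done = compute_free + compute_ms
        compute_free = compute_done
        # Communication needs its own compute result and a free channel.
        comm_start = max(compute_done, comm_free)
        comm_free = comm_start + comm_ms
    return comm_free


if __name__ == "__main__":
    # Hypothetical durations chosen only for illustration.
    print(makespan_without_inter_overlap(10, 6, 4))
    print(makespan_with_inter_overlap(10, 6, 4))
```

Under these assumptions, the second schedule hides all but the last communication step behind the next execution's compute, which is the kind of gain the requested mechanism targets.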
Use case
No response