
Optimize merge algorithm for data sizes equal to or greater than 4M items #1933

Draft
wants to merge 5 commits into base: main
Conversation

SergeyKopienko
Contributor

@SergeyKopienko commented Nov 6, 2024

In this PR we optimize the merge algorithm for data sizes equal to or greater than 4M items.
The main idea is to perform two submits:

  1. In the first submit we find the split points on a sparse "base" subset of the diagonals.
  2. In the second submit we find the split points on all remaining diagonals and run a serial merge for each diagonal (as before).
     But when we search for the split point on the current diagonal, we set up index limits for `rng1` and `rng2`.
     For these limits we load the split point data of the previous and next "base" diagonals, calculated in step (1).

With this approach we get a good performance gain for the biggest data sizes with float and int data types.

As an additional benefit, we get a significant performance boost for small and middle data sizes in the merge_sort algorithm.

@SergeyKopienko added this to the 2022.8.0 milestone Nov 6, 2024
@SergeyKopienko force-pushed the dev/skopienko/optimize_merge_to_main branch 2 times, most recently from 5a8ff9e to fedebda on November 6, 2024 16:56
…introduce new function __find_start_point_in

Signed-off-by: Sergey Kopienko <[email protected]>
…introduce __parallel_merge_submitter_large for merge of biggest data sizes

Signed-off-by: Sergey Kopienko <[email protected]>
@SergeyKopienko force-pushed the dev/skopienko/optimize_merge_to_main branch 2 times, most recently from 142ffa0 to a6164fd on November 7, 2024 08:41
…using __parallel_merge_submitter_large for merge data equal or greater then 4M items

Signed-off-by: Sergey Kopienko <[email protected]>
@SergeyKopienko force-pushed the dev/skopienko/optimize_merge_to_main branch from a6164fd to d4721ca on November 7, 2024 12:24
Signed-off-by: Sergey Kopienko <[email protected]>
auto __scratch_acc = __result_and_scratch.template __get_scratch_acc<sycl::access_mode::write>(
__cgh, __dpl_sycl::__no_init{});

__cgh.parallel_for<_FindSplitPointsKernelOnMidDiagonal>(
Contributor Author

@SergeyKopienko Nov 7, 2024

@rarutyun The compile error is here:
https://github.com/oneapi-src/oneDPL/actions/runs/11722920053/job/32653481992?pr=1933

D:\a\oneDPL\oneDPL\include\oneapi\dpl\pstl\hetero\dpcpp\parallel_backend_sycl_merge.h(322,64): error: definition with same mangled name '...' as another definition


_PRINT_INFO_IN_DEBUG_MODE(__exec);

using _FindSplitPointsOnMidDiagonalKernel =
Contributor Author

@rarutyun I have fixed the error here. Is this the correct way?
I am using __kernel_name_generator here because I need two kernel names: one passed via the template parameter pack, and a second one that I have to create inside.

Contributor
I haven't yet looked at this in detail, but can't we just pass the _IdType to __kernel_name_generator directly, and use a single _find_split_points_kernel_on_mid_diagonal type?
