Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/radix sort #1867

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Feature/radix sort #1867

wants to merge 19 commits into from

Conversation

oleksandr-pavlyk
Copy link
Collaborator

@oleksandr-pavlyk oleksandr-pavlyk commented Oct 17, 2024

This PR implements radix sort, and exposes it via new kind keyword of dpt.sort and dpt.argsort functions.
Supported values of kind keyword are "stable", "radixsort", and "mergesort", with the default value being None (same as "stable"). The "stable" kind uses radix sort for boolean and short integral types,


  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

Add common typedef for sort_contig_fn_ptr_t from merge_sort header
and radix_sort header files into a new file.
Used it in cpp files.
Supported values for kind are "radixsort", "mergesort", "stable".
The default is None (same as "stable").

For stable, radix sort is used for bool, (u)int8, (u)int16.

Radix sort is supported for integral, boolean and real floating point
types.
Copy link

Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_145 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

@coveralls
Copy link
Collaborator

coveralls commented Oct 17, 2024

Coverage Status

coverage: 87.705% (+0.02%) from 87.686%
when pulling 09236c9 on feature/radix-sort
into 7ffcca4 on master.

@oleksandr-pavlyk
Copy link
Collaborator Author

Build with DPCTL_TARGET_CUDA=ON succeeds, test suite run on RTX 3050 passed.

@oleksandr-pavlyk oleksandr-pavlyk marked this pull request as ready for review October 18, 2024 21:34
Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_149 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_158 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

@ndgrigorian
Copy link
Collaborator

static constexpr size_t determine_automatically = 0;

This line and the initial sorted_block_size value seem to be unused.

static constexpr size_t determine_automatically = 0;
size_t sorted_block_size =
    (sort_nelems >= 512) ? 512 : determine_automatically;

True or not, in merge_sort_detail::sort_over_work_group_contig_impl the sorted_block_size is set to the same value with no check against the current value of sorted_block_size. So these lines can seemingly be removed entirely.

…eter

The intent is to reduce the build time, build memory footprint, and
binary size of the sorting_impl module.

With this change it stands at 46MB, before it was 72MB.
…x_impl

With this change, _tensor_sorting_impl goes back to 17MB, and
_tensor_sorting_radix_impl is 30MB. The memory footprint of linking
should be greatly reduced, speeding up the building process,
reducing the required memory footprint, and providing better
parallelisation opportunities for the build job.

The build time on my Core i7 reduced from 45 minutes to 33 minutes.
…late-parameter-to-an-argument

Change descending from template parameter to an argument
Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_174 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

@ndgrigorian
Copy link
Collaborator

Nit: renaming of sort.cpp/sort.hpp and argsort.cpp/argsort.hpp to merge_sort and merge_argsort may be good for clarity.

sort.cpp -> merge_sort.cpp, argsort.cpp -> merge_argsort.cpp

Refined exception texts thrown when implementation function
pointer is found missing.
Copy link

github-actions bot commented Nov 4, 2024

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_188 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

@oleksandr-pavlyk
Copy link
Collaborator Author

@ndgrigorian I have pushed changes to address all the issues highlighted in review.

ndgrigorian
ndgrigorian previously approved these changes Nov 4, 2024
Copy link
Collaborator

@ndgrigorian ndgrigorian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM now, I've run some benchmarks for different array sizes and shapes and run into no problems.

Thank you @oleksandr-pavlyk !

Copy link

github-actions bot commented Nov 6, 2024

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_189 ran successfully.
Passed: 895
Failed: 0
Skipped: 119


template <typename argTy, typename IndexTy>
sycl::event
radix_argsort_axis1_contig_impl(sycl::queue &exec_q,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This impl function seems to be unused anywhere. Should it be kept around or removed as dead code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants