Implement HOST_UDF aggregation for groupby #17592

Merged
merged 24 commits into from
Dec 20, 2024

Conversation

ttnghia
Contributor

@ttnghia ttnghia commented Dec 13, 2024

This PR implements the HOST_UDF aggregation, which allows a host-side user-defined function (UDF) to be executed through the libcudf aggregation framework.

  • A host-side function can be any independent function that runs on the host machine. Depending on its implementation, it may or may not launch device kernels.
  • Such a user-defined function must implement the interface provided by libcudf (cudf::host_udf_base), which lets it interact fully with the libcudf aggregation framework.
  • Because the function is implemented on the user application side, it is free to perform whatever operations the user needs.

Partially contributes to #16633.


Usage

  1. Define a functor deriving from cudf::host_udf_base and implement the required virtual functions declared in that base struct. For example:
     struct my_aggregation : cudf::host_udf_base {
        ...
     };
  2. Create a libcudf HOST_UDF aggregation, constructed from an instance of the functor defined above. For example:
     auto agg = cudf::make_host_udf_aggregation<cudf::groupby_aggregation>(
         std::make_unique<my_aggregation>());
  3. Perform the aggregation operation using the created instance.
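
To make the three steps above concrete without depending on libcudf, here is a minimal host-only sketch of the same pattern. Everything in it is a hypothetical stand-in: `host_udf_like`, `host_udf_aggregation_like`, `make_host_udf_aggregation_like`, and `groupby_aggregate` are illustrative names, not the libcudf API. The real `cudf::host_udf_base` declares its own set of virtual functions and works on device columns, which this sketch does not try to reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for the role of cudf::host_udf_base.
// NOT the real libcudf interface: it only shows the "derive and override" shape.
struct host_udf_like {
  virtual ~host_udf_like() = default;
  // Called once per group with that group's values; runs entirely on the host.
  virtual double operator()(std::vector<double> const& group_values) const = 0;
};

// Step 1: a user-defined functor deriving from the base interface.
struct sum_of_squares : host_udf_like {
  double operator()(std::vector<double> const& group_values) const override
  {
    return std::accumulate(group_values.begin(), group_values.end(), 0.0,
                           [](double acc, double v) { return acc + v * v; });
  }
};

// Hypothetical stand-in for an aggregation object that owns the functor.
struct host_udf_aggregation_like {
  std::unique_ptr<host_udf_like> udf;
};

// Step 2: build the aggregation from an instance of the functor.
host_udf_aggregation_like make_host_udf_aggregation_like(std::unique_ptr<host_udf_like> udf)
{
  return host_udf_aggregation_like{std::move(udf)};
}

// Step 3: a toy groupby that hands each group's values to the functor.
// Returns (key, result) pairs ordered by key.
std::vector<std::pair<int, double>> groupby_aggregate(std::vector<int> const& keys,
                                                      std::vector<double> const& values,
                                                      host_udf_aggregation_like const& agg)
{
  assert(keys.size() == values.size());
  assert(agg.udf != nullptr);

  // Collect the values belonging to each key.
  std::map<int, std::vector<double>> groups;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    groups[keys[i]].push_back(values[i]);
  }

  // Invoke the user-defined function once per group.
  std::vector<std::pair<int, double>> result;
  result.reserve(groups.size());
  for (auto const& entry : groups) {
    result.emplace_back(entry.first, (*agg.udf)(entry.second));
  }
  return result;
}
```

In this toy version the "framework" only sees the base interface, so any functor deriving from `host_udf_like` can be plugged in without changing `groupby_aggregate`. That is the design idea the HOST_UDF aggregation relies on, under the assumptions stated above.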

@ttnghia ttnghia added the labels feature request, 3 - Ready for Review, libcudf, Java, Spark, and non-breaking on Dec 13, 2024
@ttnghia ttnghia requested a review from PointKernel December 13, 2024 19:51
@ttnghia ttnghia self-assigned this Dec 13, 2024
@ttnghia ttnghia requested review from a team as code owners December 13, 2024 19:51
@ttnghia ttnghia requested a review from davidwendt December 13, 2024 19:51
@github-actions github-actions bot added the CMake label on Dec 13, 2024
Member

@PointKernel PointKernel left a comment

Looks great. Just a few nits and questions on how to improve the documentation.

Review thread on cpp/tests/groupby/host_udf_example_tests.cu (outdated, resolved)
Review thread on cpp/tests/groupby/host_udf_example_tests.cu (outdated, resolved)
Review thread on cpp/tests/groupby/host_udf_tests.cpp (outdated, resolved)
Review thread on cpp/include/cudf/aggregation.hpp (outdated, resolved)
Review thread on cpp/include/cudf/aggregation.hpp (outdated, resolved)
@ttnghia ttnghia requested a review from res-life December 18, 2024 15:13
Signed-off-by: Nghia Truong <[email protected]>
Member

@PointKernel PointKernel left a comment

Two small questions; otherwise looks great.

Review thread on cpp/include/cudf/aggregation.hpp (outdated, resolved)
Review thread on cpp/src/groupby/sort/aggregate.cpp (outdated, resolved)
@ttnghia ttnghia requested a review from PointKernel December 18, 2024 19:15
Member

@PointKernel PointKernel left a comment

LGTM ship it :) (oh my, sorry for the typo. corrected)

Contributor

@res-life res-life left a comment

LGTM for the Java part.

Review thread on cpp/include/cudf/aggregation.hpp (outdated, resolved)
Review thread on cpp/include/cudf/aggregation.hpp (outdated, resolved)
Review thread on cpp/include/cudf/aggregation.hpp (outdated, resolved)
Review thread on cpp/src/groupby/groupby.cu (resolved)
Review thread on cpp/src/groupby/groupby.cu (resolved)
Review thread on cpp/src/groupby/sort/aggregate.cpp (resolved)
@ttnghia ttnghia requested a review from a team as a code owner December 19, 2024 20:24
@ttnghia ttnghia requested a review from davidwendt December 19, 2024 21:45
Signed-off-by: Nghia Truong <[email protected]>
Contributor

@vyasr vyasr left a comment

CMake approval (didn't look at the C++).

Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Contributor Author

ttnghia commented Dec 20, 2024

/merge

Signed-off-by: Nghia Truong <[email protected]>
@rapids-bot rapids-bot bot merged commit 27404bc into rapidsai:branch-25.02 Dec 20, 2024
104 of 105 checks passed
@ttnghia ttnghia deleted the host_udf_groupby branch December 20, 2024 03:52
Labels
3 - Ready for Review: Ready for review by team; CMake: CMake build issue; feature request: New feature or request; Java: Affects Java cuDF API; libcudf: Affects libcudf (C++/CUDA) code; non-breaking: Non-breaking change; Spark: Functionality that helps Spark RAPIDS
5 participants