group() built-in function in DaphneDSL #903

Open
pdamme opened this issue Nov 12, 2024 · 2 comments

@pdamme
Collaborator

pdamme commented Nov 12, 2024

DaphneIR has a GroupOp for relational-style grouping and aggregation, plus a corresponding group-kernel. However, at the moment, this operation can only be created through DAPHNE's SQL parser, but not through DaphneDSL.

This task is to add a group() built-in function to DaphneDSL that creates a GroupOp in DaphneIR. The interface should be (following the notation used in the docs): group(arg:frame, groupCols:str, ..., sumCol:str). That is, the built-in function takes a frame, an arbitrary number of columns to group on, and a single column to calculate the sum on. While this interface does not expose all features of the GroupOp/group-kernel (e.g., multiple aggregates, aggregate functions other than sum), it would be a good first step and sufficient for implementing the Star Schema Benchmark in DaphneDSL. We can reflect the full functionality of the GroupOp in DaphneDSL later.
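
To make the intended semantics concrete, here is a minimal Python model of grouping a frame on some columns and summing one column. It is only a sketch and not DAPHNE code: the frame is modeled as a dict of column lists, and the function name `group_sum`, the result column label `SUM(...)`, and the first-appearance order of the output rows are all assumptions made for illustration.

```python
def group_sum(frame, group_cols, sum_col):
    """Group the rows of `frame` on `group_cols` and sum `sum_col` per group.

    `frame` is a dict mapping column names to equally long lists
    (a stand-in for a frame; hypothetical, not the DAPHNE data structure).
    """
    sums = {}  # maps a tuple of group-key values to the running sum
    for i in range(len(frame[sum_col])):
        key = tuple(frame[c][i] for c in group_cols)
        sums[key] = sums.get(key, 0) + frame[sum_col][i]

    # Assemble the result: one column per grouping column plus the aggregate.
    agg_name = f"SUM({sum_col})"  # hypothetical label for the aggregate column
    result = {c: [] for c in group_cols}
    result[agg_name] = []
    for key, total in sums.items():  # dicts keep insertion order
        for c, v in zip(group_cols, key):
            result[c].append(v)
        result[agg_name].append(total)
    return result


frame = {
    "region": ["EU", "US", "EU", "US"],
    "year": [1, 1, 2, 1],
    "revenue": [10, 20, 30, 40],
}
by_region = group_sum(frame, ["region"], "revenue")
by_region_year = group_sum(frame, ["region", "year"], "revenue")
```

In this model, `by_region` has one row per distinct region, and `by_region_year` has one row per distinct (region, year) pair.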

Hints:

  • Add the new built-in function in the DaphneDSL parser.
  • Add a few script-level test cases for the group built-in function.
  • group is a variadic built-in/op/kernel. See existing ops like createFrame for an example. E.g., have a look at how they are handled in src/parser/daphnedsl/DaphneDSLBuiltins.cpp.
  • For an example of how to create a GroupOp, see the SQL parser in src/parser/sql/SQLVisitor.cpp.
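
As a rough illustration of the variadic interface, the sketch below shows how a flat argument list in the shape group(arg, groupCols..., sumCol) could be split into its three parts. This is plain Python rather than the DaphneDSL parser code; the helper name `split_group_args` is hypothetical, and accepting zero grouping columns is an assumption, since the issue only says "an arbitrary number".

```python
def split_group_args(args):
    """Split [frame, groupCol1, ..., groupColN, sumCol] into its parts.

    Hypothetical helper for illustration only; it models the argument
    layout described in the issue, not the actual parser.
    """
    if len(args) < 2:
        raise ValueError("group() needs a frame and at least a column to sum")
    frame, *col_names = args
    for name in col_names:
        if not isinstance(name, str):
            raise TypeError("column arguments must be strings")
    # The last column name is the one to sum; all others are grouping columns.
    *group_cols, sum_col = col_names
    return frame, group_cols, sum_col


frame = {"a": [1], "b": [2], "s": [3]}
parts = split_group_args([frame, "a", "b", "s"])
```

Here `parts` is the tuple `(frame, ["a", "b"], "s")`. Treating the last argument specially keeps the call site flat, at the cost of making the position of the sum column significant.
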
@saminbassiri
Contributor

Hi, I will work on this issue.

@pdamme
Collaborator Author

pdamme commented Nov 13, 2024

Great, please go ahead!

saminbassiri added a commit to saminbassiri/daphne that referenced this issue Nov 20, 2024
 - This built-in function creates a GroupOp in DaphneIR.
 - Only support 'SUM' as an aggregation function.
 - Get only one aggregation column.
 - Get an arbitrary number of columns to group on.
- Add support for string values in the 'group' kernel function.
  - SUM, MIN, and MAX are the only aggregation functions applied to string columns.
  - Other aggregation functions throw an exception if they receive strings as arguments or results.
  - Additionally, 'DeduceValueTypeAndExecute' cannot handle string values due to unsupported operations on strings.
  - Therefore, 'ColumnGroupAggStringVTArg', which is specialized for strings, is used.
  - Alternatively, 'ColumnGroupAgg' is called exclusively with string values.
- The 'group' function internally calls the 'order' and 'extractCol' kernel functions.
  - These two functions are updated to handle string values correctly.
- Added script-level test cases to validate the new functionality.
- Close issue daphne-eu#903
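
The commit message notes that the group kernel relies on ordering and column extraction, and that these must handle strings. One way a sort-based grouping can work with string keys is sketched below in Python. This is an illustrative model only: `sorted_group_sum` is a hypothetical name, and the actual kernel may be structured differently.

```python
def sorted_group_sum(keys, values):
    """Sum `values` per distinct key by sorting first, then scanning runs.

    Works for any comparable key type, including strings
    (compared lexicographically).
    """
    # Row positions ordered by key; a stand-in for an ordering step.
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    out_keys, out_sums = [], []
    for i in order:
        if out_keys and out_keys[-1] == keys[i]:
            out_sums[-1] += values[i]  # same group as the previous row
        else:
            out_keys.append(keys[i])  # a new group starts here
            out_sums.append(values[i])
    return out_keys, out_sums


group_keys, group_sums = sorted_group_sum(["b", "a", "b", "c", "a"], [1, 2, 3, 4, 5])
```

After sorting, equal keys are adjacent, so a single pass suffices to close one group and open the next. In this model the output groups come out in sorted key order.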