901 inner join and semi join with result cardinality hint #918

Open
wants to merge 2 commits into
base: main

saminbassiri
Contributor

Enhance Join Operations with Customizable Result Size Allocation


PR Description

Enhancements to Join Operations

This update introduces a new optional parameter, numRowRes, to the innerJoin and semiJoin DaphneDSL built-ins. It lets the caller state how many rows to allocate for the join result instead of relying on the worst-case default. Addresses issue #901.

Key Changes:

  1. Kernel changes:

    • innerJoin:
      • If numRowRes = -1, the result size defaults to numRowRhs * numRowLhs (cartesian product).
      • Otherwise, the result size is defined by numRowRes.
    • semiJoin:
      • If numRowRes = -1, the result size defaults to numRowLhs.
      • Otherwise, the result size is defined by numRowRes.
  2. DaphneDSL Updates:

    • numRowRes is now an optional argument for innerJoin and semiJoin.
    • Defaults to -1 if not provided.
  3. DaphneIR Adjustments:

    • numRowRes is now a mandatory argument for InnerJoinOp and SemiJoinOp.
  4. Implementation Updates:

    • Modified DaphneDSLBuiltins.cpp to set default values for numRowRes.
    • Updated SQLVisitor.cpp to ensure compatibility by passing -1 as numRowRes.
    • Adjusted kernels.json to reflect the new parameter for relevant operations.
  5. Testing:

    • Added script-level test cases to validate correct behavior across various scenarios.
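
As a rough illustration of the sizing rules listed under "Kernel changes", the sketch below resolves the number of result rows to allocate from the input sizes and the hint. It is not the code from this PR: the helper names (`hintOrDefault`, `innerJoinAllocRows`, `semiJoinAllocRows`) and the `NO_HINT` constant are hypothetical, rejecting hints below -1 is an assumption, and overflow of the cartesian product is not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Sentinel for "no cardinality hint given" (-1, as in the PR description).
// The constant name is hypothetical.
constexpr int64_t NO_HINT = -1;

// Hypothetical stand-in for the DSL layer: an omitted optional argument
// becomes -1 before it reaches the IR op, where the argument is mandatory.
inline int64_t hintOrDefault(std::optional<int64_t> numRowRes) {
    return numRowRes.value_or(NO_HINT);
}

// Shared check; rejecting values below -1 is an assumption of this sketch.
inline size_t checkedHint(int64_t numRowRes) {
    if (numRowRes < 0)
        throw std::invalid_argument("numRowRes must be -1 or non-negative");
    return static_cast<size_t>(numRowRes);
}

// Rows to allocate for an inner join: the cartesian product without a hint,
// otherwise exactly the hinted size.
inline size_t innerJoinAllocRows(size_t numRowLhs, size_t numRowRhs,
                                 int64_t numRowRes) {
    if (numRowRes == NO_HINT)
        return numRowLhs * numRowRhs;
    return checkedHint(numRowRes);
}

// Rows to allocate for a semi join: at most every left-hand row can match,
// so numRowLhs is the default without a hint.
inline size_t semiJoinAllocRows(size_t numRowLhs, int64_t numRowRes) {
    if (numRowRes == NO_HINT)
        return numRowLhs;
    return checkedHint(numRowRes);
}
```

With two inputs of 1,000 rows each, the unhinted inner join would reserve 1,000,000 rows, while a hint of 1,500 reserves only 1,500. That difference is the over-allocation the hint is meant to avoid.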

Bug Fixes

  • Resolved issues in the CastSca, CastObj, and EwBinaryObjSca kernel functions, where string values were not handled correctly during type casting.

- Introduced `numRowRes` as a parameter for `InnerJoin` and `SemiJoin` kernel functions, indicating the size of the result.
- In `InnerJoin`:
  - If `numRowRes` is -1, the result size is set to `numRowRhs * numRowLhs`.
  - Otherwise, the result size is determined by `numRowRes`.
- In `SemiJoin`:
  - If `numRowRes` is -1, the result size defaults to `numRowLhs`.
  - Otherwise, the result size is determined by `numRowRes`.
- Updated DaphneDSL:
  - Added `numRowRes` as an optional parameter for `innerJoin` and `semiJoin` built-in functions.
  - If not provided, `numRowRes` defaults to -1, which is passed to DaphneIR operations.
- Modified DaphneIR:
  - Made `numRowRes` a mandatory argument for `InnerJoinOp` and `SemiJoinOp`.
- Implementation Updates:
  - Updated `DaphneDSLBuiltins.cpp` to handle default `numRowRes` values.
  - Set `numRowRes` to -1 in `SQLVisitor.cpp` for compatibility.
  - Adjusted `kernels.json` to reflect the new parameter in `innerJoin` and `semiJoin`.
- Added script-level test cases to validate the new functionality.
- Addresses issue `daphne-eu#901` by allowing users to specify result size to prevent over-allocation.