Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] nested spike discussion #193

Closed
wants to merge 5 commits into from

Conversation

forestmvey
Copy link

@forestmvey forestmvey commented Dec 16, 2022

Description

This proposal will outline a suggestion for the scope and implementation of the nested function in the V2 engine.

Requirements

nested in the new engine needs to support all functionality of nested in the legacy engine. This includes:

  • nested in SELECT, WHERE, GROUP BY. example queries:

    • SELECT nested(message.info) FROM multi_nested

    • SELECT nested(message.*) FROM multi_nested

    • SELECT nested(message.info) FROM multi_nested WHERE nested('message', message.info = 'a' OR message.info = 'a')

    • SELECT * FROM opensearch-sql_test_index_nested_type WHERE nested(message.info)='b';

    • SELECT nested(message.info) FROM multi_nested GROUP BY nested(message.info)

    • SELECT nested(message.info) FROM multi_nested GROUP BY message.info

  • alias of the result should be the argument of the nested function

  • nested in function calls like SELECT nested(message.info) FROM multi_nested WHERE wildcardquery(nested(message.info), 'z*')

  • multilevel nested field queries like SELECT nested(message.author.address.street) FROM opensearch-sql_test_index_multi_nested

  • multiple nested functions in one query which is a limitation on elasticsearch like SELECT nested(message.author.address.street), nested(message.info), nested(message.dayOfWeek) FROM opensearch-sql_test_index_multi_nested

  • nested accepts strings as arguments for identifiers (This still needs to be investigated if it should not be supported). Example queries are:

    • SELECT nested("field.subield") FROM test;

    • SELECT nested("field"."subield") FROM test;

    • SELECT nested('field.subield') FROM test;

    • SELECT nested('field'.'subield') FROM test;

Limitations

If we decide to not support strings as argument, then this will become a breaking change for 3.x

Implementation

The new SQL engine implementation of nested function will be similar to the implementation of nested function in legacy. The logical plan builder implementation will be the same. Legacy’s logical plan builder is a part of NestedFieldProjection while the new engine will contain it in OpenSearchRequestBuilder. This is currently implemented in this PoC for nested in only the SELECT clause. Implementation for nested in WHERE and GROUP BY may be different.

Proof of Concept

The PoC does not satisfy all requirements for nested in SELECT. It only supports an end to end implementation of SELECT nested(message.info) FROM opensearch-sql_test_index_nested_type;. Further testing and support for more cases will be added once we are in agreement of the PoC.

SQL queries to support for each stage of implementation

// Support nested in SELECT (POC)
SELECT nested(message.info), someField FROM opensearch-sql_test_index_nested_type;

// Support for *
SELECT nested(message.*) FROM opensearch-sql_test_index_nested_type;

// WHERE support
SELECT * FROM opensearch-sql_test_index_nested_type WHERE nested(comment.likes) > 1;
SELECT * FROM opensearch-sql_test_index_nested_type WHERE nested(comment.likes) < 3;
SELECT * FROM opensearch-sql_test_index_nested_type WHERE nested(message.info)='b';
SELECT * FROM opensearch-sql_test_index_nested_type WHERE nested(message.info) = IN_TERMS('a', 'b');
SELECT * FROM opensearch-sql_test_index_nested_type WHERE nested('message', message.info = 'a' AND message.author ='h');
SELECT * FROM opensearch-sql_test_index_nested_type WHERE nested('message', message.info = 'a' AND message.author ='i');
SELECT * FROM opensearch-sql_test_index_nested_type WHERE nested(message.info) = 'a';
SELECT comment.data FROM opensearch-sql_test_index_nested_type WHERE MATCH_QUERY(NESTED(comment.data), 'aa');
SELECT comment.data FROM opensearch-sql_test_index_nested_type WHERE SCORE(MATCH_QUERY(NESTED(comment.data), 'ab'), 10);

// GROUP BY support
SELECT COUNT(*) FROM opensearch-sql_test_index_nested_type GROUP BY  nested(message.info)
SELECT COUNT(*) FROM opensearch-sql_test_index_nested_type GROUP BY message.info

// Multilevel nested support
SELECT nested(message.author.address.street) FROM opensearch-sql_test_index_multi_nested;
SELECT nested(message.author.address) FROM opensearch-sql_test_index_multi_nested;
SELECT nested(message.author.address.street), nested(message.info), nested(message.dayOfWeek) FROM opensearch-sql_test_index_multi_nested;

// Additional tests from legacy
SELECT COUNT(*) FROM opensearch-sql_test_index_nested_type GROUP BY  nested(message.info),filter('myFilter',message.info = 'a'),reverse_nested(comment.data,'~comment');
SELECT min(nested(message.dayOfWeek)) as minDays FROM opensearch-sql_test_index_nested_type;
SELECT sum(reverse_nested(comment.likes,'~comment')) bla FROM opensearch-sql_test_index_nested_type GROUP BY  nested(message.info),filter('myFilter',message.info = 'a');
SELECT sum(reverse_nested(myNum)) bla FROM opensearch-sql_test_index_nested_type GROUP BY nested(message.info),filter('myFilter',message.info = 'a');
SELECT COUNT(*) FROM opensearch-sql_test_index_nested_type GROUP BY  nested(message.info),filter('myFilter',message.info = 'a'),reverse_nested(someField);
SELECT sum(nested(message.dayOfWeek)) as sumDays FROM opensearch-sql_test_index_nested_type;
SELECT COUNT(*) FROM opensearch-sql_test_index_nested_type GROUP BY  nested(message.info),filter('myFilter',message.info = 'a'),histogram('field'='comment.likes','reverse_nested'='~comment','interval'='2' , 'alias' = 'someAlias' );
SELECT COUNT(*) FROM opensearch-sql_test_index_nested_type GROUP BY  nested(message.info),filter('myFilter',message.info = 'a'),histogram('field'='myNum','reverse_nested'='','interval'='2', 'alias' = 'someAlias' );
SELECT COUNT(*) FROM opensearch-sql_test_index_nested_type GROUP BY  nested(message.info),filter('myFilter',message.info = 'a'),reverse_nested(someField,'');
SELECT COUNT(*) FROM opensearch-sql_test_index_nested_type GROUP BY  nested(message.info),filter('myFilter',message.info = 'a');

// Support for string arguments as legacy supports
SELECT nested("field.subield") FROM test;
SELECT nested("field"."subield") FROM test;
SELECT nested('field.subield') FROM test;
SELECT nested('field'.'subield') FROM test;

Schedule

Priority Description Target Version
P0 nested function working in SELECT, WHERE, and GROUP BY 2.x
P0 Filter and histogram should work with nested fields as specified in the legacy IT tests 2.x
P1 Fix bugs in opensearch-project#52 if they still exist 2.x

Issues Resolved

opensearch-project#1111

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@codecov
Copy link

codecov bot commented Dec 16, 2022

Codecov Report

Merging #193 (da8a674) into draft-nested-spike (4b3cdbb) will decrease coverage by 2.11%.
The diff coverage is 32.07%.

@@                   Coverage Diff                    @@
##             draft-nested-spike     #193      +/-   ##
========================================================
- Coverage                 98.36%   96.26%   -2.11%     
- Complexity                 3643     3644       +1     
========================================================
  Files                       343      348       +5     
  Lines                      9017     9227     +210     
  Branches                    585      605      +20     
========================================================
+ Hits                       8870     8882      +12     
- Misses                      142      334     +192     
- Partials                      5       11       +6     
Flag Coverage Δ
sql-engine 96.26% <32.07%> (-2.11%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ain/java/org/opensearch/sql/analysis/Analyzer.java 98.74% <0.00%> (-1.26%) ⬇️
...c/main/java/org/opensearch/sql/expression/DSL.java 99.49% <0.00%> (-0.51%) ⬇️
...opensearch/sql/expression/ReferenceExpression.java 82.35% <0.00%> (-17.65%) ⬇️
...org/opensearch/sql/planner/DefaultImplementor.java 96.96% <0.00%> (-3.04%) ⬇️
.../opensearch/sql/planner/logical/LogicalNested.java 0.00% <0.00%> (ø)
...ch/sql/planner/logical/LogicalPlanNodeVisitor.java 95.65% <0.00%> (-4.35%) ⬇️
...ch/sql/planner/optimizer/LogicalPlanOptimizer.java 100.00% <ø> (ø)
.../sql/planner/physical/PhysicalPlanNodeVisitor.java 95.00% <0.00%> (-5.00%) ⬇️
...pensearch/sql/planner/physical/UnnestOperator.java 0.00% <0.00%> (ø)
.../opensearch/sql/storage/read/TableScanBuilder.java 90.00% <0.00%> (-10.00%) ⬇️
... and 23 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@GabeFernandez310
Copy link

Can you please add some example queries? I'm not entirely sure how the nested function behaves and how it's meant to be used.

@@ -149,6 +149,8 @@ public enum BuiltinFunctionName {
STDDEV_POP(FunctionName.of("stddev_pop")),
// take top documents from aggregation bucket.
TAKE(FunctionName.of("take")),
// TODO: nested aggregation function

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we wanted to remove the TODO?

Suggested change
// TODO: nested aggregation function
//nested aggregation function

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this should remain as it is not implemented yet.

@@ -53,6 +53,7 @@ public class TestsConstants {
public final static String TEST_INDEX_BEER = TEST_INDEX + "_beer";
public final static String TEST_INDEX_NULL_MISSING = TEST_INDEX + "_null_missing";
public final static String TEST_INDEX_CALCS = TEST_INDEX + "_calcs";
public final static String TEST_INDEX_MULTI_NESTED= TEST_INDEX + "_multi_nested";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public final static String TEST_INDEX_MULTI_NESTED= TEST_INDEX + "_multi_nested";
public final static String TEST_INDEX_MULTI_NESTED = TEST_INDEX + "_multi_nested";

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be fixed in a non draft PR as there are many checkstyle failures with this PoC.

;

nestedFunction
: NESTED LR_BRACKET nestedField RR_BRACKET

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this support the following query?
SELECT nested(message.info) FROM multi_nested WHERE nested('message', message.info = 'a' OR message.info = 'a')

Specifically the second part
nested('message', message.info = 'a' OR message.info = 'a')

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not. Nested in this PoC is only supported in SELECT. Nested in WHERE will be implemented in follow up tasks.

@forestmvey forestmvey force-pushed the dev-spike-nested branch 2 times, most recently from 87d836a to e5fdf25 Compare February 9, 2023 15:43
vamsi-amazon and others added 5 commits February 10, 2023 15:02
Sql Plugin integ tests require add files to keystore to add prometheus datasource. So adding a custom integ script which can be consumed by build repo while triggering integ tests during every release.

Signed-off-by: Vamsi Manohar <[email protected]>
Signed-off-by: Guian Gumpac <[email protected]>
Signed-off-by: forestmvey <[email protected]>
Signed-off-by: Guian Gumpac <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants