Add Nested Function Support In SELECT Clause #1490

Merged

Conversation

forestmvey
Collaborator

@forestmvey forestmvey commented Mar 30, 2023

Description

Syntax: nested( [field] | [field,path] )

The nested function allows querying nested-type fields of an index. If the user does not provide the path parameter, it is determined dynamically, and the resulting query should be identical to the one produced when the user supplies the correct parameter values. The condition parameter is not supported when the nested function is used in the SELECT clause. The nested structure is flattened so that the full path of an object becomes the key and the object it refers to becomes the value. Wildcard support will be added in a follow-up PR, along with support for push down with relevance-based search functions.

   Sample input =
   keys = ['comments.likes']
   row = comments: {
     likes: 2
   }
   output =
   flattenedRow = {comments.likes: 2}
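
The flattening described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class `NestedFlattenSketch` and its `flatten` method are hypothetical names, not code from this PR, and rows are modeled as plain `Map` instances rather than the engine's own value types.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the PR's implementation.
final class NestedFlattenSketch {

  // For each dotted key, walk the nested maps segment by segment and
  // store the value found at the end under the full dotted path.
  static Map<String, Object> flatten(Map<String, Object> row, List<String> keys) {
    Map<String, Object> flattened = new LinkedHashMap<>();
    for (String key : keys) {
      Object current = row;
      for (String segment : key.split("\\.")) {
        if (!(current instanceof Map)) {
          current = null; // the path does not exist in this row
          break;
        }
        current = ((Map<?, ?>) current).get(segment);
      }
      if (current != null) {
        flattened.put(key, current);
      }
    }
    return flattened;
  }

  public static void main(String[] args) {
    Map<String, Object> row = Map.of("comments", Map.of("likes", 2));
    // prints {comments.likes=2}
    System.out.println(flatten(row, List.of("comments.likes")));
  }
}
```

Keys whose path is missing from the row are simply skipped in this sketch; how the engine treats missing paths is not stated in this description.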

Example Queries

Simple nested query: the array in message.info is flattened into two rows.

SELECT nested(message.info) FROM nested_objects;
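
The query above passes only the field, so per the description the path is determined dynamically. The rule itself is not spelled out in this text; the sketch below assumes the path is the field name with its last dot-separated segment removed. `NestedPathSketch` and `derivePath` are hypothetical names used only for illustration.

```java
// Hypothetical helper for illustration; the derivation rule is an assumption.
final class NestedPathSketch {

  // Assumption: the nested path is everything before the last '.' in the field.
  static String derivePath(String field) {
    int lastDot = field.lastIndexOf('.');
    if (lastDot <= 0) {
      throw new IllegalArgumentException("field has no parent path: " + field);
    }
    return field.substring(0, lastDot);
  }

  public static void main(String[] args) {
    // prints "message" under the assumption stated above
    System.out.println(derivePath("message.info"));
  }
}
```

Under that assumption, `nested(message.info)` and `nested(message.info, message)` would describe the same query.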

After flattening, notice that comment.data has repeating ab

SELECT nested(message.info), nested(comment.data) FROM nested_objects;
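
One way to read the repetition noted above is that rows from different nested paths are combined, so a path with a single value repeats alongside every value of a path with several. The sketch below assumes that behavior; it is not taken from the PR's operator. `NestedCrossJoinSketch` and `expand` are hypothetical names, and the values "a" and "b" are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; the combination rule is an assumption.
final class NestedCrossJoinSketch {

  // Combines every row built so far with every value of the next nested path.
  static List<Map<String, Object>> expand(Map<String, List<Object>> valuesByPath) {
    List<Map<String, Object>> rows = new ArrayList<>();
    rows.add(new LinkedHashMap<>()); // start from a single empty row
    for (Map.Entry<String, List<Object>> entry : valuesByPath.entrySet()) {
      List<Map<String, Object>> next = new ArrayList<>();
      for (Map<String, Object> row : rows) {
        for (Object value : entry.getValue()) {
          Map<String, Object> copy = new LinkedHashMap<>(row);
          copy.put(entry.getKey(), value);
          next.add(copy);
        }
      }
      rows = next;
    }
    return rows;
  }

  public static void main(String[] args) {
    Map<String, List<Object>> values = new LinkedHashMap<>();
    values.put("message.info", List.of("a", "b")); // placeholder values
    values.put("comment.data", List.of("ab"));
    // prints two rows, each with comment.data=ab
    expand(values).forEach(System.out::println);
  }
}
```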

someField is of type object

SELECT nested(message.info), someField FROM nested_objects;

To Do

  • Nested function nested inside another function call (not currently supported in legacy)

Changes from Legacy Functionality

  • The nested function does not support the condition parameter in the SELECT clause.
  • Updated partiql.rst to handle object arrays the same way as the V2 engine.
  • The legacy implementation uses the FIELD as the column identifier; the V2 engine uses the function name.

Issues Resolved

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@codecov-commenter

codecov-commenter commented Mar 30, 2023

Codecov Report

Merging #1490 (fcb3558) into main (e805151) will decrease coverage by 1.31%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main    #1490      +/-   ##
============================================
- Coverage     98.49%   97.18%   -1.31%     
- Complexity     3928     4100     +172     
============================================
  Files           347      371      +24     
  Lines          9771    10448     +677     
  Branches        645      703      +58     
============================================
+ Hits           9624    10154     +530     
- Misses          142      287     +145     
- Partials          5        7       +2     
Flag Coverage Δ
sql-engine 97.18% <100.00%> (-1.31%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...rg/opensearch/sql/analysis/ExpressionAnalyzer.java 100.00% <ø> (ø)
...ch/sql/planner/optimizer/LogicalPlanOptimizer.java 100.00% <ø> (ø)
...pensearch/sql/sql/parser/AstExpressionBuilder.java 100.00% <ø> (ø)
...ain/java/org/opensearch/sql/analysis/Analyzer.java 100.00% <100.00%> (ø)
...va/org/opensearch/sql/analysis/NestedAnalyzer.java 100.00% <100.00%> (ø)
...main/java/org/opensearch/sql/executor/Explain.java 100.00% <100.00%> (ø)
...c/main/java/org/opensearch/sql/expression/DSL.java 100.00% <100.00%> (ø)
...opensearch/sql/expression/ReferenceExpression.java 100.00% <100.00%> (ø)
...h/sql/expression/function/BuiltinFunctionName.java 100.00% <100.00%> (ø)
...h/sql/expression/function/OpenSearchFunctions.java 100.00% <100.00%> (ø)
... and 16 more

... and 27 files with indirect coverage changes

Comment on lines +363 to +375
for (UnresolvedExpression expr : node.getProjectList()) {
  NestedAnalyzer nestedAnalyzer = new NestedAnalyzer(
      namedExpressions, expressionAnalyzer, child
  );
  child = nestedAnalyzer.analyze(expr, context);
}
Collaborator

How about we generate a single LogicalNested here rather than get them merged later in MergeNestedAndNested?

Collaborator Author

I will update to create the single LogicalNested here rather than merging further down query execution.

Collaborator

Thanks. I'm just thinking if we have all the project items and always merge multiple LogicalNested later, we may be able to save that optimizer rule.

Collaborator

I checked the latest revision and am wondering whether the for loop is still needed, since we generate a single nested operator with all namedExpressions, right?

Collaborator Author

Yes, the for loop is still needed. The namedExpressions are passed in each time for project push down, but the nested fields of the LogicalNested are filled in from each nested function in the projectList. Both the namedExpressions and the nested functions are required.

Collaborator

How about using the for-loop inside NestedAnalyzer and changing the interface?

child = nestedAnalyzer.analyze(node.getProjectList(), context);

Collaborator Author

My preference is to keep the for-loop as is, so we don't have to complicate adding additional arguments to an already formed LogicalPlan. If we move the for-loop into NestedAnalyzer, we will need some not-so-nice logic in NestedAnalyzer.analyze to aggregate the arguments from the analyzed LogicalPlans.
NestedAnalyzer.java#L36-L74
If you would prefer that implementation, I can make the necessary revisions!
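
The pattern under discussion in this thread, threading `child` through one analyze call per project item so that a single nested node collects every field, can be sketched as below. This is a simplified illustration and not the PR's code: `Plan`, `Nested`, `analyze`, and `analyzeAll` are hypothetical stand-ins, and expressions are modeled as plain strings rather than `UnresolvedExpression` instances.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for illustration; not the PR's NestedAnalyzer.
final class NestedLoopSketch {

  /** Hypothetical stand-in for a logical plan node. */
  static class Plan {
    final String name;

    Plan(String name) {
      this.name = name;
    }
  }

  /** Hypothetical stand-in for LogicalNested: one node holding all nested fields. */
  static final class Nested extends Plan {
    final Plan child;
    final List<String> fields = new ArrayList<>();

    Nested(Plan child) {
      super("nested");
      this.child = child;
    }
  }

  // Returns the child unchanged for non-nested expressions. For nested(...)
  // it reuses an existing Nested node or wraps the child in a new one.
  static Plan analyze(String expr, Plan child) {
    if (!expr.startsWith("nested(") || !expr.endsWith(")")) {
      return child;
    }
    String field = expr.substring("nested(".length(), expr.length() - 1);
    Nested nested = child instanceof Nested ? (Nested) child : new Nested(child);
    nested.fields.add(field);
    return nested;
  }

  // The caller-side loop: each call receives the plan returned by the previous one.
  static Plan analyzeAll(List<String> projectList, Plan source) {
    Plan child = source;
    for (String expr : projectList) {
      child = analyze(expr, child);
    }
    return child;
  }

  public static void main(String[] args) {
    Plan plan = analyzeAll(
        List.of("nested(message.info)", "nested(comment.data)", "someField"),
        new Plan("relation"));
    // prints [message.info, comment.data]
    System.out.println(((Nested) plan).fields);
  }
}
```

Keeping the loop in the caller leaves `analyze` a single-expression function; moving it inside would require `analyze` to accept the whole project list and aggregate the fields itself.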

@dai-chen
Collaborator

dai-chen commented Apr 4, 2023

Meanwhile, could you double-check whether the logical and physical operators are generic and extensible enough:

  1. Later, can we expose this capability to PPL? I see there are SPL commands for JSON, MultiValue and flatten too.
  2. Does the physical operator have anything specific to OpenSearch? Does it work for any Struct or Struct Array field no matter where it comes from, e.g. Prometheus, CloudWatch, etc.?

@dai-chen dai-chen added the enhancement New feature or request label Apr 4, 2023
@forestmvey
Collaborator Author

> Meanwhile, could you double-check whether the logical and physical operators are generic and extensible enough:
>
>   1. Later, can we expose this capability to PPL? I see there are SPL commands for JSON, MultiValue and flatten too.
>   2. Does the physical operator have anything specific to OpenSearch? Does it work for any Struct or Struct Array field no matter where it comes from, e.g. Prometheus, CloudWatch, etc.?

I think the physical operator should work well for exposing this functionality in PPL. It is generic and should be able to handle Struct and Array fields from other data sources. More investigation will be needed later, but the code is generic enough that porting shouldn't be too difficult.

GumpacG
GumpacG previously approved these changes Apr 6, 2023
``nested(field | [field, path])``

The ``nested`` function maps to the ``nested`` query used in the search engine. It returns nested field types in documents that match the specified field(s).
If the user does not provide the ``path`` parameter it will be generated dynamically. The ``field`` ``user.office.cubicle`` would dynamically generate the path
Collaborator

What is user.office.cubicle? Is it an example subfield?

Collaborator Author

Yes this is just an example field.

Collaborator

I see. I feel it is missing context. Could you add, for example: The field user.office.cubicle would dynamically generate the path

Collaborator Author

Thanks, I've updated the comment to be more descriptive.

import org.junit.jupiter.api.Disabled;
import org.opensearch.sql.legacy.SQLIntegTestCase;

public class NestedIT extends SQLIntegTestCase {
Collaborator

Question: Are nested functions limited to the SELECT clause, or can they also be used in the WHERE or GROUP BY clause? What is the expected result if the nested function is used in a WHERE or GROUP BY clause?

Collaborator Author

This PR only supports nested in the SELECT clause. Follow-up PRs will be made for the GROUP BY and WHERE clauses.

Collaborator

Maybe: rename to NestedInSelectIT

@forestmvey forestmvey dismissed stale reviews from GumpacG and MaxKsyunz via c04d962 April 6, 2023 19:16
@MaxKsyunz MaxKsyunz merged commit fbc72a4 into opensearch-project:main Apr 12, 2023
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1490-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 fbc72a41ef66d33c4650171260ddf8c8702903ca
# Push it to GitHub
git push --set-upstream origin backport/backport-1490-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1490-to-2.x.

@opensearch-trigger-bot
Contributor

The backport to 2.7 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.7 2.7
# Navigate to the new working tree
cd .worktrees/backport-2.7
# Create a new branch
git switch --create backport/backport-1490-to-2.7
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 fbc72a41ef66d33c4650171260ddf8c8702903ca
# Push it to GitHub
git push --set-upstream origin backport/backport-1490-to-2.7
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.7

Then, create a pull request where the base branch is 2.7 and the compare/head branch is backport/backport-1490-to-2.7.

forestmvey added a commit to Bit-Quill/opensearch-project-sql that referenced this pull request Apr 13, 2023
Users can now query nested fields in an index.

---------

Signed-off-by: forestmvey <[email protected]>
(cherry picked from commit fbc72a4)
forestmvey added a commit to Bit-Quill/opensearch-project-sql that referenced this pull request Apr 13, 2023
Users can now query nested fields in an index.

---------

Signed-off-by: forestmvey <[email protected]>
(cherry picked from commit fbc72a4)
Signed-off-by: forestmvey <[email protected]>
Yury-Fridlyand pushed a commit that referenced this pull request Apr 13, 2023
Users can now query nested fields in an index.

---------


(cherry picked from commit fbc72a4)

Signed-off-by: forestmvey <[email protected]>
Yury-Fridlyand pushed a commit that referenced this pull request Apr 13, 2023
Users can now query nested fields in an index.

---------

Signed-off-by: forestmvey <[email protected]>
(cherry picked from commit fbc72a4)
acarbonetto pushed a commit to Bit-Quill/opensearch-project-sql that referenced this pull request Apr 18, 2023
…roject#1490)

Users can now query nested fields in an index.

---------

Signed-off-by: forestmvey <[email protected]>
forestmvey added a commit to Bit-Quill/opensearch-project-sql that referenced this pull request May 23, 2023
Users can now query nested fields in an index.

---------

Signed-off-by: forestmvey <[email protected]>
(cherry picked from commit fbc72a4)
9 participants