Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] P0 Nested Function Support #1111

Closed
forestmvey opened this issue Nov 25, 2022 · 5 comments
Closed

[FEATURE] P0 Nested Function Support #1111

forestmvey opened this issue Nov 25, 2022 · 5 comments

Comments

@forestmvey
Copy link
Collaborator

forestmvey commented Nov 25, 2022

Is your feature request related to a problem?
The nested query is supported in the legacy engine and not the V2 engine. As Legacy engine is becoming deprecated we need to support this functionality in the V2 engine.

Example:

SELECT nested(message.info) FROM nested_objects;
SELECT * FROM nested_objects WHERE nested(message.info)='b';
SELECT nested(message.info) FROM nested_objects ORDER BY nested(message.info);
SELECT nested(message.*) FROM nested_objects;

What solution would you like?
Port over functionality for the nested query to the V2 engine with the same functionality as in Legacy.

Release-Schedule

  • P[0] SELECT clause support
    • P[0] DOT notation field support - Released: 1490
    • P[0] * support as inner field, Example: nestedField.innerFields.* - Released: 1773
  • P[0] WHERE clause support; one syntax will be prioritized.
    • P[0] Syntax: nested(field | field, path) OPERATOR LITERAL - Released: 1657
  • P[0] ORDER BY clause support - Released: 1789

What alternatives have you considered?
N/A

Do you have any additional context?
N/A

@MaxKsyunz
Copy link
Collaborator

MaxKsyunz commented Mar 21, 2023

Update 2023-04-25: Discussed this with Forest offline.


@dai-chen I'd like to follow-up on this comment. As I understand it these two queries are to be equivalent

SELECT nested(message.info), name FROM table
SELECT m.info, t.name FROM table AS t, t.message AS m.

In particular, for table like

message name
{info : 1 } nameA
[{info : 2}, {info : 3}] nameB

the result will be

message.info name
1 nameA
2 nameB
3 nameB

What's the use case for the cartesian product behaviour? As an end-user, I'd expect SELECT <expression> FROM table to return SELECT COUNT(*) FROM table number of results. For UIs, complex relationships usually shown in a parent-child arrangement, so I'd expect to have deal with complex properties like message.info.

@forestmvey
Copy link
Collaborator Author

forestmvey commented Apr 25, 2023

Demo video for nested function use in SELECT clause.

nested_select_clause_demo.mp4

@GumpacG
Copy link
Collaborator

GumpacG commented Apr 27, 2023

Hi @dai-chen and @penghuo, nested(field.*) needs to be supported and some grammar changes are required. Below are options that were most reasonable out of many that were explored.

A new rule is proposed in order to add a new node that is similar to AllFields and will be used in SymbolTable to find all fields with the same path as the argument (e.g. field.* will include field.subfield1, field.subfield2,...). The proposed rule is to have an allTupleFields which can be used for both object and nested types when we begin to support SELECT field.*.

allTupleFields
    : ident DOT STAR
    ;

Options for grammar changes:

  1. allTupleFields in expressionAtom
  2. allTupleFields in functionCall
  3. allTupleFields in functionArg

Option details:

Grammar changes Pros Cons
1. allTupleFields in expressionAtom
expressionAtom
     : allTupleFields             #allTupleFieldsExpressionAtom
  • adds support for partiQL syntax
  • less/no refactoring when adding support for partiQL and is used in the following clauses:
    • SELECT
    • WHERE
    • GROUP BY
    • ORDER BY
    • HAVING
  • FROM clause is not affected
  • Supports:
    • SELECT message.* FROM nested
    • SELECT nested(message.*) FROM nested
May introduce errors or worse, invalid results like:
opensearchsql> SELECT * FROM nested_table GROUP BY message.*;
fetched rows / total rows = 0/0
+---------------------+----------------+------------------+
| message.dayOfWeek   | message.info   | message.author   |
|---------------------+----------------+------------------|
+---------------------+----------------+------------------+
opensearchsql> SELECT * FROM nested_table GROUP BY myNum.*;
fetched rows / total rows = 0/0
2. Specific case only for nestedAllFunctionCall in functionCall
functionCall
    : nestedFunctionName LR_BRACKET allTupleFields RR_BRACKET      #nestedAllFunctionCall
  • Isolated implementation for star within nested function which shouldn’t affect other functions
  • Supports
    SELECT nested(message.*) FROM nested
  • Doesn’t support PartiQL which may lead to some refactoring when implementing PartiQL support.
3. allTupleFields in functionArg
functionArg
    : expression
    | allTupleFields
  • Only affects functions
  • Supports
    SELECT nested(message.*) FROM nested
  • May introduce errors based on user errors for queries like
    SELECT abs(int0.*) FROM calcs

Additional Context:

Option 1:
This change has the same effect as adding allTupleFields to columnName as it is only used in the expressionAtom. PartiQL implementation can be started in order for this option to reduce the risks. Implementing PartiQL could take a great amount of effort as it depends on updating indexScans to support table aliases.

Option 2:
This option seems the safest in the meantime while PartiQL support is yet to start. Value is added in the short term while refactoring if needed in the future is minimal. This may not need to be refactor, if it does then it is just a matter of removing it from the grammar.

Option 3:
This option has the risk of option 1 without its benefits of avoiding PartiQL refactoring in the future.

@GumpacG
Copy link
Collaborator

GumpacG commented Apr 27, 2023

Discussed offline and agreed on option 2.
cc: @penghuo, @dai-chen

@acarbonetto
Copy link
Collaborator

Phase 2 support: #1806

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants