[DRAFT] Array Support POC #282
base: integ-array-support-poc
Codecov Report
@@ Coverage Diff @@
## integ-array-support-poc #282 +/- ##
=============================================================
- Coverage 99.98% 99.30% -0.69%
- Complexity 2624 2637 +13
=============================================================
Files 205 206 +1
Lines 5955 6025 +70
Branches 378 392 +14
=============================================================
+ Hits 5954 5983 +29
- Misses 1 38 +37
- Partials 0 4 +4
Regarding future support for features like geo_shape, consider cases like:
Note: this example may be an edge case and may not represent typical real-world use cases.
1. Please add more tests:
- Array of objects
- Array in object
- Array in nested
Consider tricky combinations, such as this one for the first test:
[
{
"id" : 1,
"name" : "one"
},
{
"id" : 2,
"name" : "two"
},
null
]
2. What happens if I run the following query?
select int0[0] from calcs
Please add a test for this.
3. Please add a design doc.
4. Please fix the column headers/names.
@@ -254,7 +254,7 @@ public void nested_function_and_field_with_order_by_clause() {
 rows("a", 4),
 rows("b", 2),
 rows("c", 3),
-rows("zz", new JSONArray(List.of(3, 4))));
+rows("zz", 3));
Isn't this a breaking change? Please compare this with nested behavior in V1 and V2 @ 2.8.
This is V1 behaviour, but my updates make this consistent with V2 behaviour. I believe this should be treated as a bug fix rather than a breaking change, because a select on this value without the nested function would return 3. It is a very odd case, but the new behaviour aligns with what is expected in V2.
@@ -814,6 +815,11 @@ columnName
 : qualifiedName
 ;

+arrayColumnName
+: qualifiedName LT_SQR_PRTHS COLON_SYMB RT_SQR_PRTHS
+| qualifiedName LT_SQR_PRTHS decimalLiteral RT_SQR_PRTHS
Could decimalLiteral be negative?
@Override
public UnresolvedExpression visitArrayColumnName(ArrayColumnNameContext ctx) {
UnresolvedExpression qualifiedName = visit(ctx.qualifiedName());
if (ctx.decimalLiteral() == null) {
Make decimalLiteral a named value in the ANTLR grammar, so you can reference it by name here.
@@ -105,8 +107,17 @@ public List<NamedExpression> visitAllFields(AllFields node,
 AnalysisContext context) {
 TypeEnvironment environment = context.peek();
 Map<String, ExprType> lookupAllFields = environment.lookupAllFields(Namespace.FIELD_NAME);
-return lookupAllFields.entrySet().stream().map(entry -> DSL.named(entry.getKey(),
-new ReferenceExpression(entry.getKey(), entry.getValue()))).collect(Collectors.toList());
+return lookupAllFields.entrySet().stream().map(entry ->
Could you add an example in the PR description for SELECT *? Will this return the full array for a column when SELECT * is queried? Since SELECT array still returns just the first value, is this also the case for SELECT *?
Yes, the functionality for SELECT * will stay the same; otherwise it would be a breaking change.
@@ -107,6 +107,6 @@ private static boolean isQuoted(String text, String mark) {
 }

 public static String removeParenthesis(String qualifier) {
-return qualifier.replaceAll("\\[.+\\]", "");
+return qualifier.replaceAll("\\[\\d+\\]", "");
Is it a regex? Do you want to match : or \d+:\d+ (or even \d*:\d*)?
Please add a comment for this function.
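To make the question above concrete, here is a minimal Java sketch comparing the old and new patterns from the diff on a few sample qualifiers. The class name, the sample inputs, and the third pattern (which also strips the [:] form) are hypothetical and not part of the PR; they only illustrate how String.replaceAll treats each regex.

```java
// Sketch only: compares the two patterns from the diff on sample qualifiers.
class BracketStripSketch {
  // Pattern before the change: greedy ".+" spans from the first '[' to the last ']'.
  static String stripGreedy(String qualifier) {
    return qualifier.replaceAll("\\[.+\\]", "");
  }

  // Pattern after the change: removes only bracketed runs of digits.
  static String stripDigits(String qualifier) {
    return qualifier.replaceAll("\\[\\d+\\]", "");
  }

  // Hypothetical variant (not in the PR) that also removes the "[:]" form.
  static String stripIndexOrColon(String qualifier) {
    return qualifier.replaceAll("\\[(?:\\d+|:)\\]", "");
  }

  public static void main(String[] args) {
    System.out.println(stripGreedy("a[0].b[1]"));        // a
    System.out.println(stripDigits("a[0].b[1]"));        // a.b
    System.out.println(stripDigits("a[:]"));             // a[:]
    System.out.println(stripIndexOrColon("a[:].b[2]"));  // a.b
  }
}
```

Under these assumptions, the new pattern leaves a[:] untouched, so whether the colon form should be stripped as well is a decision the function's comment could record.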
Signed-off-by: forestmvey <[email protected]>
f2df802 to 41d7bf5
Description
Add support for indexing arrays using square brackets. This implementation makes users responsible for how their data is handled. If array indexing brackets are supplied, the plugin assumes that the supplied field name refers to an array. If the value is not an array, or the user supplies an invalid index, an empty value is returned.
The goal of adding syntax for returning and indexing arrays is to let users work with arrays without introducing a breaking change to the SQL plugin. By leaving the handling of stored array data to the user, we avoid the need for additional data mapping.
Syntax
array[:]
array[index]<[index]>
Examples
Dataset:
Query:
Result:
1
Query:
Result:
[[1], [2, [3, 4]], 5]
Query:
Result:
[1]
Query:
Result:
[2, [3, 4]]
Query:
Result:
[3, 4]
Query:
Result:
3
Query:
Result:
5
Query:
Result:
null
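The indexing behaviour described in the Description can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the plugin's implementation: the class and method names are hypothetical, the sample data is chosen to mirror the results listed above, and the "empty value" for a non-array or an invalid index is modelled as null.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: models array[index]<[index]> lookups on nested lists.
class ArrayIndexSketch {
  // Walks `value` along `indexes`; returns null (assumed "empty value")
  // when a step hits a non-list or an out-of-range index.
  static Object resolve(Object value, int... indexes) {
    Object current = value;
    for (int index : indexes) {
      if (!(current instanceof List)) {
        return null; // indexing into a non-array value
      }
      List<?> list = (List<?>) current;
      if (index < 0 || index >= list.size()) {
        return null; // invalid index
      }
      current = list.get(index);
    }
    return current;
  }

  public static void main(String[] args) {
    // Hypothetical data shaped like the results above.
    List<Object> array = Arrays.<Object>asList(
        Arrays.<Object>asList(1),
        Arrays.<Object>asList(2, Arrays.<Object>asList(3, 4)),
        5);
    System.out.println(resolve(array, 0));       // [1]
    System.out.println(resolve(array, 1));       // [2, [3, 4]]
    System.out.println(resolve(array, 1, 1));    // [3, 4]
    System.out.println(resolve(array, 1, 1, 0)); // 3
    System.out.println(resolve(array, 2));       // 5
    System.out.println(resolve(array, 3));       // null (index out of range)
    System.out.println(resolve(array, 2, 0));    // null (5 is not an array)
  }
}
```

Returning null instead of throwing keeps the sketch in line with the stated intent that invalid indexes yield an empty value rather than an error.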
Object Arrays
The V2 engine currently provides the following functionality for object arrays.
Dataset:
Query:
Result
{"id": 1}
Query:
Result
[{"id": [1, 2]}, {"id": 2}]
Added Functionality
Query:
Result
[{"id": [1, 2]}, {"id": 2}]
Query:
Result
{"id": 2}
Query:
Result
[1, 2]
Query:
Result
2
Edge Cases
Additional Functionality
array[<firstIndex>:<lastIndex>]
Limitations
Array support is limited to the support that OpenSearch itself has for arrays. Arrays are not mapped as an array type but as the type of the values within them. Arrays are also stored as a single document, so the entire array is returned if any portion of it matches a filter. For example, an array consisting of [1, 2] is returned whole when filtering for a value of 1 or 2.
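The filtering limitation described above can be illustrated with a small in-memory simulation. This sketch does not call OpenSearch; the class name, method name, and sample data are hypothetical, and it only mimics the "match any element, return the whole array" behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: in-memory stand-in for filtering documents that hold arrays.
class ArrayFilterSketch {
  // Keeps every array that contains `target`, returning each array in full.
  static List<List<Integer>> filter(List<List<Integer>> docs, int target) {
    List<List<Integer>> hits = new ArrayList<>();
    for (List<Integer> doc : docs) {
      if (doc.contains(target)) {
        hits.add(doc); // the whole array is returned, not just the match
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<List<Integer>> docs = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
    System.out.println(filter(docs, 1)); // [[1, 2]]
    System.out.println(filter(docs, 2)); // [[1, 2]]
  }
}
```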
Clause Support
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.