You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Ideally should be built as a separate crate (e.g. datafusion-json / seafowl-datafusion-json) that defines DataFusion UDFs/UDAFs to manipulate JSON data, but if it isn't implementable as that, we can keep it in Seafowl or contribute as a patch to DataFusion.
Minimum useful set of functions
json_extract (SQLite-style) / json_extract_path (PG-style): get a value out of a JSON value
json_set: replace/add a value inside of a JSON value
json_group_array+json_group_object (SQLite-style) / json_agg+json_object_agg (PG-style): aggregation function that turns a table into a JSON object/array
json_object/json_array: create an inline JSON object/array from the function arguments
Investigate how to "tag" a JSON value using Arrow's type system
SQLite tries to get away with treating JSON values as strings, but sometimes we do need to distinguish between a string and a string that is actually a JSON object.
In the first case, we're setting the value of a to be the string {"b": 42}. In the second case, we're setting it to a JSON object. We might need to define an equivalent function / Arrow datatype (even if we ultimately store JSON values as strings in the Parquet file, we need to occasionally distinguish between strings and strings that are JSON values in our type system).
SQLite JSON functions like json_each/json_tree and PG JSON functions like json_each/json_array_elements (that return SETOF) actually return a table that we can then join on.
User-defined table functions are currently unsupported in DataFusion (apache/datafusion#1604 / apache/datafusion#1608): we could contribute support for them and then build JSON support on top of that, or implement it using DataFusion's custom logical plan / optimizer rules if possible.
The text was updated successfully, but these errors were encountered:
Support for querying / writing JSON values, similar to SQLite's support (https://www.sqlite.org/json1.html) and PG's support (https://www.postgresql.org/docs/14/functions-json.html).
Ideally should be built as a separate crate (e.g. datafusion-json / seafowl-datafusion-json) that defines DataFusion UDFs/UDAFs to manipulate JSON data, but if it isn't implementable as that, we can keep it in Seafowl or contribute as a patch to DataFusion.
Minimum useful set of functions
json_extract
(SQLite-style) /json_extract_path
(PG-style): get a value out of a JSON valuejson_set
: replace/add a value inside of a JSON valuejson_group_array
+json_group_object
(SQLite-style) /json_agg
+json_object_agg
(PG-style): aggregation function that turns a table into a JSON object/arrayjson_object
/json_array
: create an inline JSON object/array from the function argumentsInvestigate how to "tag" a JSON value using Arrow's type system
SQLite tries to get away with treating JSON values as strings, but sometimes we do need to distinguish between a string and a string that is actually a JSON object.
There's a difference between
and
In the first case, we're setting the value of
a
to be the string{"b": 42}
. In the second case, we're setting it to a JSON object. We might need to define an equivalent function / Arrow datatype (even if we ultimately store JSON values as strings in the Parquet file, we need to occasionally distinguish between strings and strings that are JSON values in our type system).We could use the Arrow struct type (https://github.com/apache/arrow-rs/blob/master/arrow-schema/src/datatype.rs#L169), though it requires the types of the values to be predefined. We could instead make the struct have a single field,
json: Utf8
, that Seafowl will know to treat as JSON.Some inspiration from Snowflake? https://docs.snowflake.com/en/sql-reference/data-types-semistructured.html
Table-valued functions (set-returning functions)
SQLite JSON functions like
json_each
/json_tree
and PG JSON functions likejson_each
/json_array_elements
(that returnSETOF
) actually return a table that we can then join on.User-defined table functions are currently unsupported in DataFusion (apache/datafusion#1604 / apache/datafusion#1608): we could contribute support for them and then build JSON support on top of that, or implement it using DataFusion's custom logical plan / optimizer rules if possible.
The text was updated successfully, but these errors were encountered: