From 15abb8af61e23482eb2518351fcd6af071cc0edb Mon Sep 17 00:00:00 2001
From: st1page <1245835950@qq.com>
Date: Tue, 1 Aug 2023 20:19:00 +0800
Subject: [PATCH 1/6] add Background

---
 rfcs/xxxx-lambda-expression | 155 ++++++++++++++++++++++++++++++++++++
 1 file changed, 155 insertions(+)
 create mode 100644 rfcs/xxxx-lambda-expression

diff --git a/rfcs/xxxx-lambda-expression b/rfcs/xxxx-lambda-expression
new file mode 100644
index 00000000..4be3d8d1
--- /dev/null
+++ b/rfcs/xxxx-lambda-expression
@@ -0,0 +1,155 @@
+---
+feature: my_excited_feature
+authors:
+  - "st1page"
+start_date: "2023/08/01"
+---
+
+# Lambda expression
+
+## Background
+
+To better support user's semi-structure data, We have supported `Array` and `Struct` types compatible with PG. But in some user's requirement, We found we can not support them easily.
+
+### Induction
+
+```SQL
+CREATE TABLE t(
+  arr_i int[],
+  arr_json jsonb[],
+  k int primary key
+);
+```
+
+Consider we have the table definition and want to get the sum of the `arr_i` and sum of `arr_json`'s field `x`. In PG user can write the correlated subquery like this.
+
+```SQL
+dev=# explain (verbose) 
+    select *, 
+        (select sum(i) from unnest(arr_i) i) as sum_i, 
+        (select sum((i->'x')::int) from unnest(arr_json) i) as sum_x 
+    from t;
+                                        QUERY PLAN                                        
+------------------------------------------------------------------------------------------
+ Seq Scan on public.t  (cost=0.00..294.75 rows=850 width=84)
+   Output: t.arr_i, t.arr_json, t.k, (SubPlan 1), (SubPlan 2)
+   SubPlan 1
+     ->  Aggregate  (cost=0.13..0.14 rows=1 width=8)
+           Output: sum(i.i)
+           ->  Function Scan on pg_catalog.unnest i  (cost=0.00..0.10 rows=10 width=4)
+                 Output: i.i
+                 Function Call: unnest(t.arr_i)
+   SubPlan 2
+     ->  Aggregate  (cost=0.18..0.19 rows=1 width=8)
+           Output: sum(((i_1.i -> 'x'::text))::integer)
+           ->  Function Scan on pg_catalog.unnest i_1  (cost=0.00..0.10 rows=10 width=32)
+                 Output: i_1.i
+                 Function Call: unnest(t.arr_json)
+```
+
+The correlated column is in the table function `Unnest` and it is hard to do the sub-query decorrelation. Unfortunately, we do not implement `Apply` operator in streaming execution.
+Their are some equivalent query that I think could be decorrelated plan. But they all are complex and with high cost.
+
+```SQL
+  select * from t t1 
+    join (
+      select k, sum(i)
+      from (select k, unnest(arr_i) i from t) u group by k
+    ) t2 on t1.k = t2.k
+    join (
+      select k, sum((i->'x')::int) 
+      from (select k, unnest(arr_json) i from t) u group by k
+    ) t3 on t1.k = t3.k;
+  /* Or */
+  select distinct on(k)
+    k, arr_i, arr_json,
+    sum(i) over (partition by k),
+    sum((j->'x')::int) over (partition by k)
+  from ( select *, unnest(arr_i) i, unnest(arr_json) j from t) foo;
+```
+
+In investigation of other SQL systems, there are 3 levels' enhancement on this kind of query, this RFC will be concerned with the three levels.
+
+### level 1: array aggregate expression
+
+It is a kind of SQL **scalar expression** which accept a array type and return the array's aggregation result, such as `array_sum`, `array_max`. It can solve some part of the issue but can not handle the complex nested datatype in the array.
+
+```SQL
+select *, array_sum(arr_i) from t;
+```
+
+### level 2: scalar lambda expression
+
+Implement lambda expression function to make user can define there `transform` logic for array's each elements.
+It can be used with the array aggregate expression together.
+
+```SQL
+select *,
+  array_sum(arr_i),
+  array_sum(transform(arr_json, v -> (v->'x')::int))
+from t;
+```
+
+### level 3: aggregate lambda expression
+
+Based on the level 2, these system support the `reduce` to make user use lambda function implement the aggregation logic.
+
+```SQL
+select *,
+  reduce(arr_i, 0, (s, v) -> s + v),
+  reduce(arr_json, 0, (s, v) -> s + (v->'x')::int)),
+from t;
+```
+
+Btw, It often can be used to express UDAF too.
+
+## Investigation
+
+Simple array expressions
+
+| array_sum | array_max/min | array_sort | array_zip | array_zip |
+|  ----  | ----  |
+|   |  |
+|   |  |
+
+array expressions with lambda expression paramter
+
+### PrestoDB
+
+PrestoDB supports many array aggregate expressions <https://trino.io/docs/current/functions/array.html?highlight=array#array_max>. Also it implement both scalar and aggregation lambda expression on array
+
+- <https://prestodb.io/blog/2020/03/02/presto-lambda>
+
+And Trino support them too. <https://trino.io/docs/current/functions/array.html?highlight=array#reduce>
+
+### Starrocks
+
+### Doris
+
+### Databricks
+
+### ClickHouse
+
+<https://clickhouse.com/docs/en/sql-reference/functions#higher-order-functions---operator-and-lambdaparams-expr-function>
+
+## Motivation
+
+Why are you want to introduce the feature?
+
+## Design
+
+Explain the feature in detail.
+
+## Unresolved questions
+
+- Are there some questions that haven't been resolved in the RFC?
+- Can they be resolved in some future RFCs?
+- Move some meaningful comments to here.
+
+## Alternatives
+
+What other designs have been considered and what is the rationale for not choosing them?
+
+## Future possibilities
+
+Some potential extensions or optimizations can be done in the future based on the RFC.

From ebd4ed532136d9901ad0adfb06927b7280062359 Mon Sep 17 00:00:00 2001
From: st1page <1245835950@qq.com>
Date: Wed, 2 Aug 2023 03:57:21 +0800
Subject: [PATCH 2/6] investigation

---
 rfcs/xxxx-lambda-expression | 103 +++++++++++++++++++++++++-----------
 1 file changed, 71 insertions(+), 32 deletions(-)

diff --git a/rfcs/xxxx-lambda-expression b/rfcs/xxxx-lambda-expression
index 4be3d8d1..f982583e 100644
--- a/rfcs/xxxx-lambda-expression
+++ b/rfcs/xxxx-lambda-expression
@@ -105,40 +105,79 @@ Btw, It often can be used to express UDAF too.
 
 ## Investigation
 
-Simple array expressions
-
-| array_sum | array_max/min | array_sort | array_zip | array_zip |
-|  ----  | ----  |
-|   |  |
-|   |  |
-
-array expressions with lambda expression paramter
-
-### PrestoDB
-
-PrestoDB supports many array aggregate expressions <https://trino.io/docs/current/functions/array.html?highlight=array#array_max>. Also it implement both scalar and aggregation lambda expression on array
-
-- <https://prestodb.io/blog/2020/03/02/presto-lambda>
-
-And Trino support them too. <https://trino.io/docs/current/functions/array.html?highlight=array#reduce>
-
-### Starrocks
-
-### Doris
-
-### Databricks
-
-### ClickHouse
-
-<https://clickhouse.com/docs/en/sql-reference/functions#higher-order-functions---operator-and-lambdaparams-expr-function>
-
-## Motivation
-
-Why are you want to introduce the feature?
+ Array expressions support. (All of them supports the `array_size/cardinality`, `unnest/flatten/explode` which is not listed in the table.)
+
+|            | array_cnt | array_sum | array_cum_sum | array_avg |  array_max/min   | array_sort | array_distinct | array_zip | Unnest<br>with idx |
+| ---------- | :-------: | :-------: | :-----------: | :-------: | :--------------: | :--------: | :------------: | :-------: | :----------------: |
+| Databricks |           |           |               |           |        Y         |     Y      |       Y        |     Y     |         Y          |
+| Trino      |           |           |               |           |        Y         |     Y      |       Y        |     Y     |         Y          |
+| PrestoDB   |     Y     |     Y     |       Y       |     Y     |        Y         |     Y      |       Y        |     Y     |         Y          |
+| Starrocks  |     Y     |     Y     |       Y       |     Y     |        Y         |     Y      |       Y        |     Y     |                    |
+| Doris      |     Y     |     Y     |       Y       |     Y     |        Y         |     Y      |       Y        |     Y     |                    |
+| ClickHouse |     Y     |     Y     |       Y       |     Y     |        Y         |     Y      |       Y        |     Y     |                    |
+| DuckDB     |     Y     |     Y     |       Y       |     Y     |        Y         |     Y      |       Y        |           |                    |
+| Vertica    |     Y     |     Y     |               |     Y     |        Y         |            |                |           |         Y          |
+| FlinkSQL   |           |           |               |     Y     | only MAX, no MIN |            |                |           |                    |
+| Snowflake  |           |           |               |           |                  |            |       Y        |           |         Y          |
+
+
+
+Array expressions with lambda expression paramter
+
+|            | transform | transform<br>with idx | filter | filter<br>with idx | sort cmp | reduce | UDAF  | Lambda Limitations                                                                                                     |
+| ---------- | --------- | --------------------- | ------ | ------------------ | -------- | ------ | :---: | ---------------------------------------------------------------------------------------------------------------------- |
+| Databricks | Y         | Y                     | Y      | Y                  | Y        | Y      |   Y   | No subquery, No SQL UDF                                                                                                |
+| Trino      | Y         |                       | Y      |                    | Y        | Y      |   Y   | No subquery, No Aggregations                                                                                           |
+| PrestoDB   | Y         |                       | Y      |                    | Y        | Y      |   Y   | No subquery, No Aggregations                                                                                           |
+| Starrocks  | Y         |                       | Y      |                    | Y        |        |       |                                                                                                                        |
+| Doris      | Y         |                       | Y      |                    | Y        |        |       |                                                                                                                        |
+| ClickHouse | Y         | Y                     | Y      | Y                  | Y        |        |       |                                                                                                                        |
+| DuckDB     | Y         |                       | Y      |                    |          |        |       |                                                                                                                        |
+| Vertica    | Y         | Y                     | Y      | Y                  | Y        |        |       | arg name [limitations](https://docs.vertica.com/23.3.x/en/sql-reference/language-elements/lambda-functions/#arguments) |
+| FlinkSQL   |           |                       |        |                    |          |        |       |                                                                                                                        |
+| Snowflake  |           |                       |        |                    |          |        |       |                                                                                                                        |
+
+  - No capture in lambda function, I think No system support refer column or sub query in the lambda function
+  - Snowflake do not anything about it and fall back to the `unnest/flatten/explode` https://stackoverflow.com/questions/62807878/how-to-get-maximum-value-from-an-array-column-in-snowflake
+  - [to confirm] Accroding to the doc, FlinkSQL only support flatten the array in join clause https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#array-expansion
+  - There are some Syntactic sugar. 
+    - Trino/PrestoDB/Starrocks: support pass multiple arrays in `unnest/flatten/explode` so that the arrays will do zip first.
+    - ClickHousr/Starrocks: `array_sum(lambda_function, arr1,arr2...) = array_sum(array_map(lambda_function, arr1,arr2...))`
+  - ClickHouse support any aggregation function on array directly, e.g. `SELECT arrayReduce('max', [1, 2, 3]);`
+  
+### Refers
+  - Databricks
+    - https://docs.databricks.com/sql/language-manual/sql-ref-lambda-functions.html
+    - https://docs.databricks.com/sql/language-manual/sql-ref-functions-builtin-alpha.html
+  - Trino
+    - https://trino.io/docs/current/functions/lambda.html
+    - https://trino.io/docs/current/functions/array.html?highlight=array#reduce
+    - https://trino.io/docs/current/functions/aggregate.html?highlight=lambda+expressions#lambda-aggregate-functions
+    - https://trino.io/docs/current/functions/array.html?highlight=array#array_max
+  - PrestoDB
+    - https://prestodb.io/blog/2020/03/02/presto-lambda
+    - https://prestodb.io/docs/current/functions/lambda.html
+    - https://prestodb.io/docs/current/functions/aggregate.html#reduce_agg
+  - Starrocks
+    - https://docs.starrocks.io/en-us/3.0/sql-reference/sql-functions/Lambda_expression
+  - Doris
+    - https://doris.apache.org/docs/dev/sql-manual/sql-functions/array-functions/array_max
+    - https://doris.apache.org/search?q=lambda
+  - ClickHouse
+    - https://clickhouse.com/docs/en/sql-reference/functions/array-functions
+    - https://clickhouse.com/docs/en/sql-reference/functions#higher-order-functions---operator-and-lambdaparams-expr-function
+  - DuckDB
+    - https://duckdb.org/docs/sql/functions/nested#list-aggregates
+    - https://duckdb.org/docs/sql/functions/nested#lambda-functions
+  - Vertica
+    - https://docs.vertica.com/23.3.x/en/sql-reference/language-elements/lambda-functions/
+  - FlinkSQL
+    - https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/systemfunctions/
 
 ## Design
-
-Explain the feature in detail.
+[TODO]
+We can support missing array function's at first and they has been implemented for aggregators so we can reuse the code.
+For the lambda expression, it is like a anonymous SQL UDF. We can implement the SQL UDF first and design lambda expression later.
 
 ## Unresolved questions
 

From bb05310ba9587178814a3178a37839c4c4b1c49e Mon Sep 17 00:00:00 2001
From: st1page <1245835950@qq.com>
Date: Wed, 2 Aug 2023 03:57:49 +0800
Subject: [PATCH 3/6] fin

---
 rfcs/xxxx-lambda-expression | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/rfcs/xxxx-lambda-expression b/rfcs/xxxx-lambda-expression
index f982583e..d934a02a 100644
--- a/rfcs/xxxx-lambda-expression
+++ b/rfcs/xxxx-lambda-expression
@@ -180,15 +180,16 @@ We can support missing array function's at first and they has been implemented f
 For the lambda expression, it is like a anonymous SQL UDF. We can implement the SQL UDF first and design lambda expression later.
 
 ## Unresolved questions
+[TODO]
 
 - Are there some questions that haven't been resolved in the RFC?
 - Can they be resolved in some future RFCs?
 - Move some meaningful comments to here.
 
 ## Alternatives
-
+[TODO]
 What other designs have been considered and what is the rationale for not choosing them?
 
 ## Future possibilities
-
+[TODO]
 Some potential extensions or optimizations can be done in the future based on the RFC.

From db68201ab6e0556f91acf800b6dd6c61c074057b Mon Sep 17 00:00:00 2001
From: st1page <1245835950@qq.com>
Date: Wed, 2 Aug 2023 12:00:03 +0800
Subject: [PATCH 4/6] rename

---
 rfcs/{xxxx-lambda-expression => 0069-lambda-expression.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename rfcs/{xxxx-lambda-expression => 0069-lambda-expression.md} (100%)

diff --git a/rfcs/xxxx-lambda-expression b/rfcs/0069-lambda-expression.md
similarity index 100%
rename from rfcs/xxxx-lambda-expression
rename to rfcs/0069-lambda-expression.md

From 0084b771e3e41e77b8ecd6ca0fe7c1a91e43190f Mon Sep 17 00:00:00 2001
From: stonepage <40830455+st1page@users.noreply.github.com>
Date: Tue, 8 Aug 2023 14:50:40 +0800
Subject: [PATCH 5/6] Update rfcs/0069-lambda-expression.md

Co-authored-by: TennyZhuang <zty0826@gmail.com>
---
 rfcs/0069-lambda-expression.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/0069-lambda-expression.md b/rfcs/0069-lambda-expression.md
index d934a02a..42eb2acc 100644
--- a/rfcs/0069-lambda-expression.md
+++ b/rfcs/0069-lambda-expression.md
@@ -1,5 +1,5 @@
 ---
-feature: my_excited_feature
+feature: lambda_expression
 authors:
   - "st1page"
 start_date: "2023/08/01"

From 0c39fa02a3b591011da5a65b6d3ec1844312569f Mon Sep 17 00:00:00 2001
From: Eric Fu <eric@singularity-data.com>
Date: Fri, 11 Aug 2023 15:51:18 +0800
Subject: [PATCH 6/6] fix typo

Co-authored-by: CAJan93 <jan@singularity-data.com>
---
 rfcs/0069-lambda-expression.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/0069-lambda-expression.md b/rfcs/0069-lambda-expression.md
index 42eb2acc..aa25a9a1 100644
--- a/rfcs/0069-lambda-expression.md
+++ b/rfcs/0069-lambda-expression.md
@@ -11,7 +11,7 @@ start_date: "2023/08/01"
 
 To better support user's semi-structure data, We have supported `Array` and `Struct` types compatible with PG. But in some user's requirement, We found we can not support them easily.
 
-### Induction
+### Introduction
 
 ```SQL
 CREATE TABLE t(