substrait-io · scgkiran · Aug 14, 2024 · Aug 14, 2024 · Aug 15, 2024 · Aug 16, 2024
@@ -0,0 +1,138 @@
+# Substrait Test Format
+
+This document describes the format for Substrait scalar test files.
+A test file consists of the following elements:
+
+1. Version declaration
+2. One or more include statements
+3. One or more test groups, each containing one or more test cases
+
+## Syntax
+
+### Version Declaration
+The version declaration must be the first line of the file. It specifies the version of the test file format. The version declaration must be in the following format:
+```
+### SUBSTRAIT_SCALAR_TEST: V1
+```
+
+### Include Statements
+A test file must contain at least one include statement. Each include statement specifies the path to a YAML file that defines Substrait extension functions. An include statement must be in the following format:
+```
+### SUBSTRAIT_INCLUDE: /extensions/functions_aggregate_approx.yaml
+```
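The two header rules above (version declaration on the first line, then at least one include) could be checked with a few lines of Python. This is only a sketch: `parse_header` and both regular expressions are hypothetical names, and it assumes the include lines directly follow the version line with no blank lines in between, which the format text does not state explicitly.

```python
import re

# Hypothetical patterns for the two header line kinds described above.
VERSION_RE = re.compile(r"^### SUBSTRAIT_SCALAR_TEST:\s*(\S+)\s*$")
INCLUDE_RE = re.compile(r"^### SUBSTRAIT_INCLUDE:\s*(\S+)\s*$")


def parse_header(lines):
    """Return (version, includes, index of the first non-header line)."""
    if not lines:
        raise ValueError("empty test file")
    match = VERSION_RE.match(lines[0])
    if match is None:
        raise ValueError("first line must be the version declaration")
    version = match.group(1)

    includes = []
    index = 1
    # Assumption: include lines are contiguous and come right after the version.
    while index < len(lines):
        match = INCLUDE_RE.match(lines[index])
        if match is None:
            break
        includes.append(match.group(1))
        index += 1
    if not includes:
        raise ValueError("at least one include statement is required")
    return version, includes, index


header = parse_header([
    "### SUBSTRAIT_SCALAR_TEST: V1",
    "### SUBSTRAIT_INCLUDE: /extensions/functions_arithmetic.yaml",
    "",
    "# Common Maths",
])
```

Here `header` holds the version string, the list of include paths, and the index at which the test groups start.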
+
+### Test Groups
+A test group is a collection of test cases that are logically related.
+- **description**: A string describing the test group or case. The description must start with a `#` character.
+    ```code
+    # Common Maths
+    ```
+### Test Cases
+A test case consists of the following elements:
+
+- **function**: The name of the function being tested. The function name must be a string.
+- **arguments**: Comma-separated list of arguments to the function. The arguments must be literals.
+- **options**: Optional comma-separated list of options in `key:value` format. The options describe the behavior of the function. The test should be run only on dialects that support the options. If options are not specified, the test should be run for all permutations of the options.
+- **result**: The expected result of the function. Either `SUBSTRAIT_ERROR` or a literal value.
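+- **literal**: A value annotated with its data type, in the format `<name>::<datatype>`, for example `126::i8`.
+- **description**: An optional string describing the test case, written after a `#` at the end of the line.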
+
+    ```code
+    add(126::i8, 1::i8) = 127::i8  # addition of two numbers
+    ```
+
+### Spec
+
+```
+doc         := <version>
+               (<include>)+
+               ((<test_group>)?(<test_case>)+\n)+
+version     := ### SUBSTRAIT_SCALAR_TEST: <test_library_version>
+include     := ### SUBSTRAIT_INCLUDE: <uri>
+test_group  := # <description>
+test_case   := <function>(<arguments>) ([<options>])? = <result> (#<description>)?
+description := string
+function    := string
+arguments   := <argument>, <argument>, ... <argument>
+argument    := <literal>
+literal     := <name>::<datatype>
+result      := SUBSTRAIT_ERROR | <literal>
+options     := <optLiteral>, <optLiteral>, ... <optLiteral>
+optLiteral  := <option_name>:<option_value>
+```
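Until an ANTLR grammar exists, the `test_case` rule can be approximated with a regular expression plus a small comma splitter. The sketch below is not the project's parser: `CASE_RE`, `split_top_level`, `parse_literal`, and `parse_test_case` are hypothetical names, and it assumes that string literals never contain `#` or the sequence `) =`. Commas inside single quotes or inside `<...>` type parameters are not treated as separators.

```python
import re

# Hypothetical pattern for:
#   <function>(<arguments>) ([<options>])? = <result> (#<description>)?
CASE_RE = re.compile(
    r"^(?P<func>\w+)\((?P<args>.*?)\)\s*"
    r"(?:\[(?P<opts>[^\]]*)\])?\s*=\s*"
    r"(?P<result>.*?)\s*(?:#\s*(?P<desc>.*))?$"
)


def split_top_level(text):
    """Split on commas that are outside single quotes and outside <...>."""
    parts, current, depth, in_quote = [], [], 0, False
    for ch in text:
        if ch == "'":
            in_quote = not in_quote
        elif not in_quote and ch == "<":
            depth += 1
        elif not in_quote and ch == ">":
            depth -= 1
        if ch == "," and not in_quote and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_literal(text):
    """Split '<name>::<datatype>' at the last '::' into (value, type)."""
    value, sep, type_name = text.rpartition("::")
    if not sep:
        raise ValueError(f"literal is missing '::<datatype>': {text!r}")
    return value.strip(), type_name.strip()


def parse_test_case(line):
    match = CASE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a test case: {line!r}")
    options = {}
    for item in split_top_level(match.group("opts") or ""):
        key, _, value = item.partition(":")
        options[key.strip()] = value.strip()
    result = match.group("result")
    return {
        "function": match.group("func"),
        "arguments": [parse_literal(a) for a in split_top_level(match.group("args"))],
        "options": options,
        "result": result if result == "SUBSTRAIT_ERROR" else parse_literal(result),
        "description": (match.group("desc") or "").strip() or None,
    }


case = parse_test_case(
    "add(127::i8, 1::i8) [overflow:ERROR] = SUBSTRAIT_ERROR  #check overflow"
)
```

Splitting each literal at the last `::` keeps values that themselves contain single colons, such as timestamps, intact.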
+
+**TODO:** use ANTLR to describe the grammar and generate parser
+### Literals
+
+#### String
+- **string**, **fixedchar**, **varchar**: A sequence of characters enclosed in single quotes. Example: 'Hello, world!'
+
+#### Integer
+Integers are represented as sequences of digits. Negative numbers are preceded by a minus sign.
+- **i8**: 8-bit integer, range: -128 to 127
+- **i16**: 16-bit integer, range: -32,768 to 32,767
+- **i32**: 32-bit integer, range: -2,147,483,648 to 2,147,483,647
+- **i64**: 64-bit integer, range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
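The ranges listed above follow from two's-complement widths, so a test runner could validate integer literals before evaluating a case. A minimal sketch, with `INT_BITS`, `int_range`, and `parse_int_literal` as hypothetical helper names:

```python
# Bit widths of the integer types listed above.
INT_BITS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64}


def int_range(type_name):
    """Return the inclusive (min, max) for a signed integer type."""
    bits = INT_BITS[type_name]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def parse_int_literal(text, type_name):
    """Parse a digit sequence with optional minus sign and range-check it."""
    value = int(text)
    low, high = int_range(type_name)
    if not low <= value <= high:
        raise ValueError(f"{text} is out of range for {type_name}")
    return value


smallest_i8 = parse_int_literal("-128", "i8")
```

With this check, a literal such as `128::i8` would be rejected when the file is read, rather than producing a confusing failure later.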
+
+#### Fixed Point decimals
+- **decimal**: Fixed-point decimal number. Maximum 38 digits total, with up to 37 digits after the decimal point.
+  Example: 123.456
+
+#### Floating Point numbers
+- **float**: General floating-point number, can be represented as:
+  * Standard decimal notation: 123.456
+  * Scientific notation: 1.23e4
+  * Special values: `nan` (Not a Number), `+inf` (Positive Infinity), `-inf` (Negative Infinity)
+- **float32**: Single-precision float, approximately 6 significant digits, range: ~1.2e-38 to ~3.4e38
+- **float64**: Double-precision float, approximately 15 significant digits, range: ~2.2e-308 to ~1.8e308
+
+#### Boolean
+- Valid values: TRUE, FALSE, NULL
+
+#### Date and Time
+All date and time literals use ISO 8601 format:
+
+- **date**: `YYYY-MM-DD`, example: `2021-01-01`
+- **time**: `HH:MM:SS[.fraction]`, example: `12:00:00.000`
+- **timestamp**: `YYYY-MM-DD HH:MM:SS[.fraction]`, example: `2021-01-01 12:00:00`
+- **timestamp_tz**: `YYYY-MM-DD HH:MM:SS[.fraction]±HH:MM`, example: `2021-01-01 12:00:00+05:30`
+- **interval year**: `INTERVAL 'P[n]Y[n]M'`, example: `INTERVAL 'P2Y3M'` (2 years, 3 months)
+- **interval days**: `INTERVAL 'P[n]DT[n]H[n]M[n]S'`, example: `INTERVAL 'P2DT3H2M9S'` (2 days, 3 hours, 2 minutes, 9 seconds)
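The two interval forms above can be told apart by pattern alone. The sketch below is illustrative only: `YEAR_RE`, `DAY_RE`, and `parse_interval` are hypothetical names, and it assumes every `[n]` is a non-negative whole number (no fractional seconds), which the format text does not settle.

```python
import re

# Hypothetical patterns for the two interval literal forms described above.
YEAR_RE = re.compile(r"^INTERVAL 'P(?:(\d+)Y)?(?:(\d+)M)?'$")
DAY_RE = re.compile(
    r"^INTERVAL 'P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?'$"
)


def parse_interval(text):
    """Return ('iyear', years, months) or ('iday', days, hours, minutes, seconds)."""
    match = YEAR_RE.match(text)
    if match and any(match.groups()):
        years, months = (int(g or 0) for g in match.groups())
        return ("iyear", years, months)
    match = DAY_RE.match(text)
    if match and any(match.groups()):
        days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
        return ("iday", days, hours, minutes, seconds)
    raise ValueError(f"not a valid interval literal: {text!r}")


year_interval = parse_interval("INTERVAL 'P2Y3M'")
day_interval = parse_interval("INTERVAL 'P2DT3H2M9S'")
```

A literal that omits the leading `P`, or that has no components at all, raises `ValueError` under this sketch.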
+
+#### Other complex types
+**TODO** Add support for complex types like arrays, structs, maps etc.
+
+### Data Types
+
+- **bool**: Boolean
+- **i8**: 8-bit signed integer
+- **i16**: 16-bit signed integer
+- **i32**: 32-bit signed integer
+- **i64**: 64-bit signed integer
+- **f32**: 32-bit floating point number
+- **f64**: 64-bit floating point number
+- **dec**: Fixed-point `decimal<P,S>`
+- **str**: Variable-length string
+- **fchar**: Fixed-length string `fixedchar<N>`
+- **vchar**: Variable-length string `varchar<N>`
+- **vbin**: Fixed-length binary `fixedbinary<N>`
+- **date**: Date
+- **time**: Time
+- **ts**: Timestamp
+- **tstz**: Timestamp with timezone
+- **iyear**: Interval year
+- **iday**: Interval days
+
+### Example of a test file
+
+```code
+### SUBSTRAIT_SCALAR_TEST: V1
+### SUBSTRAIT_INCLUDE: /extensions/functions_arithmetic.yaml
+
+# Common Maths
+add(126::i8, 1::i8) = 127::i8
+
+# Arithmetic Overflow Tests
+add(127::i8, 1::i8) [overflow:ERROR] = SUBSTRAIT_ERROR  #check overflow
+```
+The above test file has two test groups, "Common Maths" and "Arithmetic Overflow Tests". Each group has one test case. The test case in the second group has a description (`check overflow`), whereas the case in the first group does not.
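The grouping described above (a `#` line opens a group, and the test cases that follow belong to it) can be sketched as a single pass over the file. `split_groups` and the sample text are hypothetical, and the sketch assumes a test case line never starts with `#`.

```python
def split_groups(text):
    """Group test case lines under the most recent '# <description>' line."""
    groups = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the '###' version/include header lines.
        if not line or line.startswith("###"):
            continue
        if line.startswith("#"):
            current = {"description": line[1:].strip(), "cases": []}
            groups.append(current)
        else:
            if current is None:
                # Test cases before any group line form an unnamed group.
                current = {"description": None, "cases": []}
                groups.append(current)
            current["cases"].append(line)
    return groups


sample = """### SUBSTRAIT_SCALAR_TEST: V1
### SUBSTRAIT_INCLUDE: /extensions/functions_arithmetic.yaml

# Common Maths
add(126::i8, 1::i8) = 127::i8

# Arithmetic Overflow Tests
add(120::i8, 10::i8) [overflow:SATURATE] = 127::i8  #saturating add
"""

groups = split_groups(sample)
```

Each entry in `groups` keeps its raw test case lines, so a line-level parser can be applied to them afterwards.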
@@ -0,0 +1,31 @@
+### SUBSTRAIT_SCALAR_TEST: 1.0
+### SUBSTRAIT_INCLUDE: /extensions/functions_arithmetic.yaml
+
+# basic: Basic examples without any special cases
+add(120::i8, 5::i8) = 125::i8
+add(100::i16, 100::i16) = 200::i16
+add(30000::i32, 30000::i32) = 60000::i32
+add(2000000000::i64, 2000000000::i64) = 4000000000::i64
+
+# overflow: Examples demonstrating overflow behavior
+add(120::i8, 10::i8) [overflow:ERROR] = error
+add(30000::i16, 30000::i16) [overflow:ERROR] = error
+add(2000000000::i32, 2000000000::i32) [overflow:ERROR] = error
+add(9223372036854775807::i64, 1::i64) [overflow:ERROR] = error
+
+# overflow: Examples demonstrating overflow behavior tests: overflow with SATURATE
+add(120::i8, 10::i8) [overflow:SATURATE] = 127::i8
+add(-120::i8, -10::i8) [overflow:SATURATE] = -128::i8
+
+# overflow: Examples demonstrating overflow behavior tests: overflow with SILENT
+add(120::i8, 10::i8) [overflow:SILENT] = undefined
+
+# floating_exception: Examples demonstrating exceptional floating point cases
+add(1.5e+308::fp64, 1.5e+308::fp64) = inf::fp64
+add(-1.5e+308::fp64, -1.5e+308::fp64) = -inf::fp64
+
+# rounding: Examples demonstrating floating point rounding behavior
+add(4.5::fp32, 2.500001::fp32) [rounding:TIE_TO_EVEN] = 7.000001::fp32
+
+# types: Examples demonstrating behavior of different data types
+add(4.5::fp64, 2.5000007152557373::fp64) = 7.00000071525573::fp64
@@ -0,0 +1,21 @@
+### SUBSTRAIT_SCALAR_TEST: 1.0
+### SUBSTRAIT_INCLUDE: /extensions/functions_arithmetic_decimal.yaml
+
+# basic: Basic examples without any special cases
+power(8::decimal, 2::decimal<38, 0>) = 64::fp64
+power(1.0::decimal, -1.0::decimal<38, 0>) = 1.0::fp64
+power(2.0::decimal<38, 0>, -2.0::decimal<38, 0>) = 0.25::fp64
+power(13::decimal<38, 0>, 10::decimal<38, 0>) = 137858491849::fp64
+
+# result_more_than_input_precision: Examples demonstrating result with more precision than input
+power(16::decimal<2, 0>, 4::decimal<38, 0>) = 65536::fp64
+
+# floating_exception: Examples demonstrating exceptional floating point cases
+power(1.5e+10::decimal<38, 0>, 1.5e+20::decimal<38, 0>) = inf::fp64
+power(-16::decimal<4, 0>, 1001::decimal<4, 0>) = -inf::fp64
+
+# complex_number: Examples demonstrating complex number output
+power(-1::decimal, 0.5::decimal<38,1>) [complex_number_result:NAN] = nan
+
+# complex_number: Examples demonstrating complex number output tests: complex_number_result with ERROR
+power(-1::decimal, 0.5::decimal<38,1>) [complex_number_result:ERROR] = error
@@ -0,0 +1,25 @@
+### SUBSTRAIT_SCALAR_TEST: 1.0
+### SUBSTRAIT_INCLUDE: /extensions/functions_datetime.yaml
+
+# timestamps: examples using the timestamp type
+lt(2016-12-31 13:30:15::timestamp, 2017-12-31 13:30:15::timestamp) = True::boolean
+lt(2018-12-31 13:30:15::timestamp, 2017-12-31 13:30:15::timestamp) = False::boolean
+
+# timestamp_tz: examples using the timestamp_tz type
+lt(1999-01-08 01:05:05-08:00::timestamp_tz, 1999-01-08 04:05:06-05:00::timestamp_tz) = True::boolean
+lt(1999-01-08 01:05:06-08:00::timestamp_tz, 1999-01-08 04:05:06-05:00::timestamp_tz) = False::boolean
+
+# date: examples using the date type
+lt(2020-12-30::date, 2020-12-31::date) = True::boolean
+lt(2020-12-31::date, 2020-12-30::date) = False::boolean
+
+# interval: examples using the interval type
+lt(INTERVAL 'P7D'::interval, INTERVAL 'P6D'::interval) = False::boolean
+lt(INTERVAL 'P5D'::interval, INTERVAL 'P6D'::interval) = True::boolean
+lt(INTERVAL 'P5Y'::interval, INTERVAL 'P6Y'::interval) = True::boolean
+lt(INTERVAL 'P7Y'::interval, INTERVAL 'P6Y'::interval) = False::boolean
+
+# null_input: examples with null args or return
+lt(None::interval, INTERVAL 'P5D'::interval) = Null::boolean
+lt(None::date, 2020-12-30::date) = Null::boolean
+lt(None::timestamp, 2018-12-31 13:30:15::timestamp) = Null::boolean