feat: add support for multiple partition columns and filters in to_pyarrow_dataset() and OR filters in write_deltalake() #1722

… to_pyarrow_dataset()

- Partitions with multiple columns can be passed as lists of tuples in DNF format
- Multiple partition filters can be passed
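For reviewers less familiar with DNF (disjunctive normal form) filters: the outer list is read as OR, and each inner list of `(column, op, value)` tuples is read as AND. The sketch below shows that reading in plain Python. It is not the code in this PR; `matches` is a hypothetical helper, and only the `=`, `!=`, `in` and `not in` operators are assumed.

```python
from typing import Any, Dict, List, Tuple

FilterTuple = Tuple[str, str, Any]


def matches(partition: Dict[str, str], filters: List[List[FilterTuple]]) -> bool:
    """Hypothetical helper: True if the partition satisfies at least one AND-group."""
    ops = {
        "=": lambda a, b: a == b,
        "!=": lambda a, b: a != b,
        "in": lambda a, b: a in b,
        "not in": lambda a, b: a not in b,
    }
    # Outer list: OR over groups. Inner list: AND over tuples.
    return any(
        all(ops[op](partition[col], val) for col, op, val in group)
        for group in filters
    )


# (year = "2023" AND month = "01") OR (year = "2022")
filters = [
    [("year", "=", "2023"), ("month", "=", "01")],
    [("year", "=", "2022")],
]

print(matches({"year": "2023", "month": "01"}, filters))  # True
print(matches({"year": "2023", "month": "02"}, filters))  # False
print(matches({"year": "2022", "month": "07"}, filters))  # True
```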

- Add tests for various filter/partition scenarios that can be passed to to_pyarrow_dataset()

Ref delta-io#1479

- Tests partition filters based on AND and OR conditions using single and multiple partition columns

- validate_filters ensures partitions and filters are in DNF format (list of tuples, list of lists of tuples) and checks for empty lists
- stringify_partition_values ensures values are converted from dates, ints, etc. to strings for partition columns
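The two helpers described above could look roughly like the following. This is an illustrative sketch, not the implementation in this PR: the `_sketch` function names, the error messages, and the use of plain `str()` for value conversion are all assumptions.

```python
from datetime import date
from typing import Any, List, Tuple, Union

FilterTuple = Tuple[str, str, Any]
DNF = List[List[FilterTuple]]


def validate_filters_sketch(filters: Union[List[FilterTuple], DNF]) -> DNF:
    """Normalize filters to a list of lists of tuples; reject empty lists."""
    if not isinstance(filters, list) or len(filters) == 0:
        raise ValueError("filters must be a non-empty list")
    if all(isinstance(f, tuple) for f in filters):
        groups = [filters]  # a flat list of tuples is a single AND-group
    elif all(isinstance(f, list) for f in filters):
        groups = filters
    else:
        raise ValueError("filters must be a list of tuples or a list of lists of tuples")
    for group in groups:
        if len(group) == 0:
            raise ValueError("filter groups must not be empty")
        for f in group:
            if not (isinstance(f, tuple) and len(f) == 3):
                raise ValueError("each filter must be a (column, op, value) tuple")
    return groups


def stringify_partition_values_sketch(groups: DNF) -> DNF:
    """Convert filter values to strings (assumption: str() is good enough here)."""
    def to_str(value: Any) -> Any:
        if isinstance(value, (list, tuple)):
            return [to_str(v) for v in value]
        return str(value)

    return [[(col, op, to_str(val)) for col, op, val in group] for group in groups]


validated = validate_filters_sketch([("year", "=", 2023), ("day", "=", date(2023, 10, 13))])
print(stringify_partition_values_sketch(validated))
# [[('year', '=', '2023'), ('day', '=', '2023-10-13')]]
```

Normalizing a flat list of tuples into a one-element outer list means downstream code only has to handle one shape.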

- Use pyarrow.parquet filters_to_expression instead of the custom implementation
- Move __stringify_partition_values to _util to be able to test more easily
- Move partition validation to validate_filters function
- Move fragment building to separate method

…validate_filters

- validated_filters is guaranteed to be a list of lists of tuples

- Shows that the output will still be a list of lists of tuples


Commits on Oct 13, 2023

Commits on Oct 14, 2023

Commits on Oct 17, 2023
