feat: add support for multiple partition columns and filters in to_pyarrow_dataset() and OR filters in write_deltalake() #1722
base: main
Commits on Oct 13, 2023
Commit 4c0551a: Add support for multiple partition columns and multiple partitions in to_pyarrow_dataset()
- Partitions with multiple columns can be passed as lists of tuples in DNF (disjunctive normal form) format
- Multiple partition filters can be passed
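To illustrate the DNF convention this commit refers to, here is a minimal pure-Python sketch of how such filters are usually read: a flat list of tuples is a single AND group, and a list of lists is an OR across AND groups. The helper names `to_dnf` and `matches`, the supported operators, and the sample partitions are assumptions made for illustration; this is not the code from the PR.

```python
from typing import Any, Dict, List, Tuple, Union

FilterTuple = Tuple[str, str, Any]
DNF = List[List[FilterTuple]]


def to_dnf(filters: Union[List[FilterTuple], DNF]) -> DNF:
    """Wrap a flat list of tuples (one AND group) into the nested form."""
    if filters and isinstance(filters[0], tuple):
        return [list(filters)]
    return [list(group) for group in filters]


def matches(filters: Union[List[FilterTuple], DNF], partition: Dict[str, str]) -> bool:
    """Return True if at least one AND group is fully satisfied."""
    # Hypothetical operator table; the real library may support a different set.
    ops = {
        "=": lambda a, b: a == b,
        "!=": lambda a, b: a != b,
        "in": lambda a, b: a in b,
        "not in": lambda a, b: a not in b,
    }
    return any(
        all(ops[op](partition[col], val) for col, op, val in group)
        for group in to_dnf(filters)
    )


# Made-up partition values for a table partitioned by year and month.
partitions = [
    {"year": "2023", "month": "9"},
    {"year": "2023", "month": "10"},
    {"year": "2022", "month": "10"},
]

# Flat list: year = 2023 AND month = 10
and_filter = [("year", "=", "2023"), ("month", "=", "10")]
# Nested list: (year = 2022) OR (month = 9)
or_filter = [[("year", "=", "2022")], [("month", "=", "9")]]

and_hits = [p for p in partitions if matches(and_filter, p)]
or_hits = [p for p in partitions if matches(or_filter, p)]
```

With these inputs the AND filter selects only the 2023/10 partition, while the OR filter selects the 2023/9 and 2022/10 partitions.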
Commit b128bc3: Add test_pyarrow_dataset_partitions pytest
- Add tests for various filter/partition scenarios which can be passed to to_pyarrow_dataset()
Commits on Oct 14, 2023
Commit 133f246
Commit 3b53dc4: Add test_overwriting_multiple_partitions pytest
- Tests partition filters based on AND and OR conditions, using both single and multiple partition columns
Commit f42494c
Commits on Oct 17, 2023
Commit 4d390d5: Add validate_filters and stringify_partition_values to _util.py
- validate_filters ensures partitions and filters are in DNF format (a list of tuples or a list of lists of tuples) and checks for empty lists
- stringify_partition_values ensures partition column values (dates, ints, etc.) are converted to strings
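The two helpers described in this commit can be approximated as follows. This is a hedged sketch based only on the commit description: the `_sketch` function names, the error messages, and the exact string encodings (ISO format for dates, lowercase for booleans) are assumptions, not the implementation in `_util.py`.

```python
from datetime import date
from typing import Any, List, Tuple, Union

FilterTuple = Tuple[str, str, Any]
DNF = List[List[FilterTuple]]


def validate_filters_sketch(filters: Union[List[FilterTuple], DNF]) -> DNF:
    """Return filters as a list of lists of tuples, rejecting malformed input."""
    if not isinstance(filters, list) or len(filters) == 0:
        raise ValueError("filters must be a non-empty list")
    if all(isinstance(item, tuple) for item in filters):
        groups = [filters]  # flat list of tuples: one AND group
    elif all(isinstance(item, list) for item in filters):
        groups = filters  # already a list of AND groups
    else:
        raise ValueError("filters must be tuples or lists of tuples, not a mix")
    for group in groups:
        if len(group) == 0:
            raise ValueError("filter groups must not be empty")
        for item in group:
            if not isinstance(item, tuple) or len(item) != 3:
                raise ValueError(f"expected (column, op, value), got {item!r}")
    return [list(group) for group in groups]


def encode_value(value: Any) -> str:
    """Assumed encoding: lowercase booleans, ISO dates, str() for the rest."""
    if isinstance(value, bool):  # check bool before falling through to str()
        return str(value).lower()
    if isinstance(value, date):  # also covers datetime, a subclass of date
        return value.isoformat()
    return str(value)


def stringify_partition_values_sketch(groups: DNF) -> DNF:
    """Convert every filter value (or each element of a list value) to str."""
    out: DNF = []
    for group in groups:
        new_group: List[FilterTuple] = []
        for column, op, value in group:
            if isinstance(value, (list, tuple)):
                encoded: Any = [encode_value(v) for v in value]
            else:
                encoded = encode_value(value)
            new_group.append((column, op, encoded))
        out.append(new_group)
    return out


validated = validate_filters_sketch(
    [("day", "=", date(2023, 10, 17)), ("id", "in", [1, 2])]
)
stringified = stringify_partition_values_sketch(validated)
```

Normalizing to the nested form first means later code only has to handle one shape, which matches the later commit note that the validated filters are always a list of lists of tuples.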
Commit 99e2041: Refactor dataset expressions and fragment building in DeltaTable
- Use pyarrow.parquet filters_to_expression instead of the custom implementation
- Move __stringify_partition_values to _util so it can be tested more easily
- Move partition validation to the validate_filters function
- Move fragment building to a separate method
Commit 87e397f: Add tests for filters_to_expression, stringify_partition_values, and validate_filters
Commit 32224d4
Commit 1b52925
Commit e87a1ec: Update types and add validated_filters variable
- validated_filters is guaranteed to be a list of lists of tuples
Commit 87ce8e1
Commit 0b8b6bb
- Shows that the output will still be a list of lists of tuples