Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: always include the Invariants writerFeature when upgrading to v7 #2957

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rtyler
Copy link
Member

@rtyler rtyler commented Oct 23, 2024

The presence of a timestampntz column in CreateBuilder results in the reader/writer being set to 3/7. The protocol reaquires that if the table is writer v7, that invariants "must exist in the table protocl's writerFeatures.

[DELTA_FEATURES_PROTOCOL_METADATA_MISMATCH] Unable to operate on this table because the following table features are enabled in metadata but not listed in protocol: invariants. SQLSTATE: 0AKDE

Sidenote: Yes I know that we cannot actually support invariant utilization in delta-rs, but creating a table with timestampntz will (rightfully) be invalid and unreadable by Databricks and other compliant Delta readers:

Sponsored-by: Scribd Inc

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Oct 23, 2024
Copy link

codecov bot commented Oct 23, 2024

Codecov Report

Attention: Patch coverage is 84.09091% with 7 lines in your changes missing coverage. Please review.

Project coverage is 72.37%. Comparing base (c05931a) to head (a84692a).

Files with missing lines Patch % Lines
crates/core/src/kernel/models/actions.rs 0.00% 4 Missing ⚠️
crates/core/src/operations/create.rs 92.50% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2957      +/-   ##
==========================================
+ Coverage   72.35%   72.37%   +0.02%     
==========================================
  Files         131      131              
  Lines       40602    40644      +42     
  Branches    40602    40644      +42     
==========================================
+ Hits        29376    29416      +40     
- Misses       9336     9339       +3     
+ Partials     1890     1889       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

The presence of a timestampntz column in CreateBuilder results in the
reader/writer being set to 3/7. The protocol reaquires that if the table
is writer v7, that `invariants` "must exist in the table protocl's
writerFeatures.

Sponsored-by: Scribd Inc
Signed-off-by: R. Tyler Croy <[email protected]>
@github-actions github-actions bot added the binding/python Issues for the Python package label Oct 23, 2024
@rtyler rtyler marked this pull request as ready for review October 23, 2024 01:44
@rtyler rtyler enabled auto-merge October 23, 2024 01:45
@ion-elgreco
Copy link
Collaborator

@rtyler to my understanding it's not required, but it's assumed when a table has NOT NULL columns

@ion-elgreco
Copy link
Collaborator

See discussion here: #2882 (comment)

@rtyler
Copy link
Member Author

rtyler commented Oct 23, 2024

I added this because of the error and this statement in the protocol

If the table is on Writer Version 7, the feature invariants must exist in the table protocol's writerFeatures.

@rtyler
Copy link
Member Author

rtyler commented Oct 23, 2024

@ion-elgreco I should also mention that this came from a table created by delta-rs and does not contain any actual invariants. I believe newer DBRs and Delta/Spark releases are more strictly adhering to the protocol and expect invariants to be defined on writerVersion 7

The phrasing of the protocol is lousy here. append only, check constraints, and a few other features have similar "to support this feature it must be in the writerFeatures verbiage. But as best as I can tell Delta/Spark is only being pedantic about invariants 😕

(venv) plantain% pwd
/home/tyler/source/github/delta-io/delta-rs/unoptimized-table/table/_delta_log
(venv) plantain% rg "invariant"
(venv) plantain%

@ion-elgreco
Copy link
Collaborator

I added this because of the error and this statement in the protocol

If the table is on Writer Version 7, the feature invariants must exist in the table protocol's writerFeatures.

It's a bit poorly written, but the sentence starts with "To support this feature". I also misread it before and assumed it was required but it's just to support it in v7 we need to add the feature.

Did you check when you create a v3,7 table with nullable columns in spark what the protocol log states?

@rtyler
Copy link
Member Author

rtyler commented Oct 23, 2024

Okay so I did some testing with PySpark and I am going to pull this into draft again to modify the changes a bit. Essentially if a non-nullable column exists and things are upgraded to writer7, the invariants writer feature is added, but no invariants are set 😕

This behavior is not specified anywhere in the protocol which is fun 💀

DeltaTable.create(spark).tableName('nonulliz').addColumn('ts', 'TIMESTAMP_NTZ', nullable=False).location('./nonulliz').execute()
{
  "commitInfo": {
    "timestamp": 1729706895482,
    "operation": "CREATE TABLE",
    "operationParameters": {
      "partitionBy": "[]",
      "clusterBy": "[]",
      "description": null,
      "isManaged": "false",
      "properties": "{}"
    },
    "isolationLevel": "Serializable",
    "isBlindAppend": true,
    "operationMetrics": {},
    "engineInfo": "Apache-Spark/3.5.3 Delta-Lake/3.2.0",
    "txnId": "1e0e57c1-d1a2-41d9-9402-53516b8f4c77"
  }
}
{
  "metaData": {
    "id": "721d696f-8396-4a69-851c-23702b8b61bb",
    "format": {
      "provider": "parquet",
      "options": {}
    },
    "schemaString": "{\"type\":\"struct\",\"fields\":[{\"name\":\"ts\",\"type\":\"timestamp_ntz\",\"nullable\":false,\"metadata\":{}}]}",
    "partitionColumns": [],
    "configuration": {},
    "createdTime": 1729706895473
  }
}
{
  "protocol": {
    "minReaderVersion": 3,
    "minWriterVersion": 7,
    "readerFeatures": [
      "timestampNtz"
    ],
    "writerFeatures": [
      "timestampNtz",
      "invariants"
    ]
  }
}

@rtyler rtyler marked this pull request as draft October 23, 2024 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants