[GOBBLIN-2163] Adding Iceberg TableMetadata Validator #4064

Merged: 13 commits, merged Oct 29, 2024


Blazer-007 (Contributor) commented Oct 3, 2024

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • [✔️] Here are some details about my PR, including screenshots (if applicable):
    • Added a class that validates whether the schema and partition spec in the metadata of two tables match, and throws an IllegalArgumentException on any mismatch.
    • This class will be used in IcebergPartitionDatasetFinder to decide whether to proceed with copying partition data files between the source and destination Iceberg tables.
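The validation described above could look roughly like the following. This is a minimal, self-contained sketch, not the PR's actual class: the records `FieldSpec`, `PartitionFieldSpec` and `TableShape` and the method `validateSameStructure` are hypothetical stand-ins for Iceberg's `TableMetadata`, `Schema` and `PartitionSpec`, used so the example runs without Iceberg on the classpath.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-ins for Iceberg metadata types (NOT the Iceberg API).
record FieldSpec(int id, String name, String type) {}
record PartitionFieldSpec(int sourceId, String transform) {}
record TableShape(List<FieldSpec> schema, List<PartitionFieldSpec> partitionSpec) {}

final class TableShapeValidator {
  private TableShapeValidator() {}

  /**
   * Throws IllegalArgumentException if the schemas or the partition specs of
   * the two tables differ; returns normally when both match exactly.
   */
  static void validateSameStructure(TableShape source, TableShape destination) {
    // Records and lists compare element by element, so this is an exact match.
    if (!Objects.equals(source.schema(), destination.schema())) {
      throw new IllegalArgumentException(
          "Schema mismatch: source=" + source.schema()
              + ", destination=" + destination.schema());
    }
    if (!Objects.equals(source.partitionSpec(), destination.partitionSpec())) {
      throw new IllegalArgumentException(
          "Partition spec mismatch: source=" + source.partitionSpec()
              + ", destination=" + destination.partitionSpec());
    }
  }
}
```

Checking the schema before the partition spec means a pair of tables that differs in both is reported as a schema mismatch. Against the real Iceberg API, comparisons such as `Schema.sameSchema` and `PartitionSpec.compatibleWith` would likely take the place of plain list equality.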

Tests

  • [✔️] My PR adds the following unit tests OR does not need testing for this extremely good reason:
    • IcebergTableMetadataValidatorTest

Commits

  • [✔️] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"
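Guidelines 1, 2, 3 and 5 are mechanical enough to check in code. A minimal sketch follows; the `CommitMessageChecker` class is hypothetical and not part of Gobblin's tooling. Guidelines 4 (imperative mood) and 6 (explaining "what" and "why") still need human judgment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that checks only the mechanical guidelines 1, 2, 3 and 5.
final class CommitMessageChecker {
  private CommitMessageChecker() {}

  /** Returns a list of guideline violations; an empty list means none were found. */
  static List<String> check(String message) {
    List<String> problems = new ArrayList<>();
    String[] lines = message.split("\n", -1);
    String subject = lines[0];
    if (subject.length() > 50) {                       // guideline 2
      problems.add("subject longer than 50 characters");
    }
    if (subject.endsWith(".")) {                       // guideline 3
      problems.add("subject ends with a period");
    }
    if (lines.length > 1 && !lines[1].isEmpty()) {     // guideline 1
      problems.add("subject not separated from body by a blank line");
    }
    for (int i = 2; i < lines.length; i++) {           // guideline 5
      if (lines[i].length() > 72) {
        problems.add("body line " + (i + 1) + " longer than 72 characters");
      }
    }
    return problems;
  }
}
```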

Will-Lo (Contributor) commented Oct 3, 2024

Will the integration with IcebergPartitionDatasetFinder be in a separate PR?

Blazer-007 (Contributor Author) replied:

> Will the integration with IcebergPartitionDatasetFinder be in a separate PR?

It will be in this PR, but only after PR #4058 is merged.

jainbhupendra24 left a comment

Overall looks good, with small suggestions.

Blazer-007 force-pushed the virai_metadata_validator branch from 9e02107 to 753dc94 on October 24, 2024 02:33
Comment on lines 98 to 112
@Test
public void testValidateSchemaWithEvolvedSchemaI() {
  // TODO: This test should pass in the future when we support schema evolution
  // Schema 3 has one more field than Schema 1
  verifyFailUnlessCompatibleStructureIOException(tableMetadataWithSchema1AndUnpartitionedSpec,
      tableMetadataWithSchema3AndUnpartitionedSpec, SCHEMA_MISMATCH_EXCEPTION);
}

@Test
public void testValidateSchemaWithEvolvedSchemaII() {
  // TODO: This test should pass in the future when we support schema evolution
  // Schema 3 has one more field than Schema 1
  verifyFailUnlessCompatibleStructureIOException(tableMetadataWithSchema3AndUnpartitionedSpec,
      tableMetadataWithSchema1AndUnpartitionedSpec, SCHEMA_MISMATCH_EXCEPTION);
}
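The body of `verifyFailUnlessCompatibleStructureIOException` is not shown in the review. Judging by its name, it asserts that validating the two table metadata objects fails with an `IOException` carrying the expected message. A hypothetical, framework-free version of such a helper might look like this; the `StructureCheck` interface and the `verifyFailsWithIOException` method are illustrative names, not Gobblin code.

```java
import java.io.IOException;

// Hypothetical functional interface standing in for the validator call under test.
@FunctionalInterface
interface StructureCheck {
  void run() throws IOException;
}

final class ValidatorTestSupport {
  private ValidatorTestSupport() {}

  /**
   * Passes only if the check throws an IOException whose message contains
   * expectedMessageFragment; otherwise throws an AssertionError.
   */
  static void verifyFailsWithIOException(StructureCheck check, String expectedMessageFragment) {
    try {
      check.run();
    } catch (IOException e) {
      String message = String.valueOf(e.getMessage());
      if (!message.contains(expectedMessageFragment)) {
        throw new AssertionError("Unexpected message: " + message);
      }
      return; // expected failure observed
    }
    throw new AssertionError("Expected an IOException but none was thrown");
  }
}
```

A test framework's own assertion, for example TestNG's `Assert.expectThrows`, could replace the hand-written try/catch.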
phet (Contributor) commented Oct 28, 2024

Looking across these two test cases: when we later support schema evolution, do we really expect it to be a symmetric operation (where both cases would pass)? I'd have expected to support only forward compatibility, not backward compatibility, i.e. S is compatible with S', but S' is no longer compatible with S.
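One way to encode the one-directional relation described here is to treat a schema S as compatible with an evolved schema S' only when every column of S survives unchanged in S'. The sketch below assumes a hypothetical `Column` record identified by field ID, name and type; it ignores type promotion, renames and nested fields, which a real check against Iceberg schemas would likely need to consider.

```java
import java.util.List;

// Hypothetical simplified column; Iceberg tracks columns by field ID.
record Column(int id, String name, String type) {}

final class SchemaCompatibility {
  private SchemaCompatibility() {}

  /**
   * Returns true if every column of `schema` is present, with the same ID,
   * name and type, in `evolved`. Extra columns in `evolved` are allowed,
   * so the relation is deliberately not symmetric.
   */
  static boolean isCompatibleWith(List<Column> schema, List<Column> evolved) {
    return evolved.containsAll(schema);
  }
}
```

With Schema 3 having one more column than Schema 1, only one direction of the pair is accepted, which is the asymmetry raised in the comment.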

Blazer-007 (Contributor Author) replied:

That's a valid point. We can discuss this in more detail, but I fully agree with supporting only forward compatibility, not backward compatibility.

Removing the comment from the first test.

phet (Contributor) left a comment

Very nice work all around, Vivek!

phet merged commit b4e4d4a into apache:master on Oct 29, 2024 (6 checks passed)
Blazer-007 (Contributor Author) replied:

> Very nice work all around, Vivek!

Thanks for all the help and suggestions along the way, Kip.

Blazer-007 deleted the virai_metadata_validator branch on December 9, 2024 17:32
Labels
None yet
Projects
None yet

4 participants