-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set a UUID when building a Schema object. #32399
Conversation
ba86cbd
to
ac58efe
Compare
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
083dac1
to
bfbb621
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this PR contains several indepenent fixes, I'm good with most of them, but not sure about the oneOfType one, suggested to involve other reviewer who might now
} | ||
|
||
@Override | ||
public byte[] getArgument() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like a breaking change. However I am not sure the implication, good to involve others reviewer, possibly @robertwb ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous behavior seems incorrect as proto serialization is not necessarily deterministic and the serialized proto equality doesn't match schema equality logic.
But I think that you are correct that this would break schema update compatability because LogicalType base class appears to be serializable.
@robertwb Any ideas on how to best proceed? We could
- keep OneOf as-is and just document that it's equality isn't guaranteed to be correct due to proto serialization and clear the uuid before serializing the proto.
- add a new type with correct semantics and deprecate this one
- try to integrate w/ update compatibility version but that seems difficult given schemas are created all over. We coudl have some static method we call that modifies static state instead of plumbing the option everywhere. getArgument base is to return an Object so we can have single class support both types based upon the configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question.
First, let's separate this change out into its own PR, so we can get the rest (which looks good) in.
I'm worried about what existing pipelines might break if we change this. Does a (null?) Row eventually get serialized with its schema as proto bytes anyway? We could look into explicitly using deterministic proto serialization here as an improvement at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some change of OneOf is needed because it's schema now has a uuid set and thus doesn't compare equal.
However I changed to keep using serialized proto but to recursively remove the UUID from the schema proto before serializing.
PTAL
Schemas are immutable so this meets the guarantee that a consistent UUID is used for matching schemas. Cleanup some cases setting a random UUID after creating Schema. Fix case where same UUID was assigned to different Schema after sorting. Use Immutable data structures to enforce immutability. Update OneOfType which using serialized proto equality which was incorrect if there was uuid, or encoding positions. Instead we can use a null Row using the generated oneof schema.
6633f19
to
a6af989
Compare
Errors appear unrelated flakiness, retriggering but ready for a look |
Run Java PreCommit |
Run Java_IOs_Direct PreCommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test failure testMultipleClientsFailingIsHandledGracefullyByServer
know issue: #30056, unrelated.
Schemas are immutable so this meets the guarantee that a consistent UUID is used for matching schemas. Cleanup some cases setting a random UUID after creating Schema, fix case where same UUID was assigned to different Schema after sorting, and use Immutable data structures to enforce immutability.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.