-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ORC-2054: [C++] fix return wrong result if lack of hasnull #2055
base: main
Are you sure you want to change the base?
Conversation
Thanks for the fix! It makes sense to me. Please make the CI happy. BTW, has Trino fixed this issue on its writer side? |
The ci is fixed, it seems need approval to rerun. We are trying to fix it on Trino. |
Please fix the format check. Thanks! |
Done, CI passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
What changes were proposed in this pull request?
This pr fix the bug that if the column statistics in a orc file is not fully written, and lack of hasnull field, user may get a wrong result using c++ to read it.
For example, a file struct<string col1, string col2>, has 10 lines, col1 all has value, col2 all is null. the column 1's stat written by trino may be
numberOfValues: 10
stringStatistics {
minimum: "10"
maximum: "100"
sum: 565
}. col2's stat is numberOfValues: 0. They all have no hasnull field. When we want to get where col2 is null, we will get nothing.
Why are the changes needed?
User may get a wrong result with this bug.
How was this patch tested?
Add unit tests.
Was this patch authored or co-authored using generative AI tooling?
No