BloomFilter and float point is ambiguous #407
Comments
Gang Wu / @wgtmac:
Gang Wu / @wgtmac:
Gabor Szadovszky / @gszadovszky: If we still want to handle FP Bloom filters, I agree with @wgtmac's proposal. (It is similar to the approach we implemented for min/max values.) Keep in mind that we need to handle the case when someone wants to filter on a NaN.
Gang Wu / @wgtmac:
Gabor Szadovszky / @gszadovszky: Maybe we do not need to handle +0.0 and -0.0 differently from the other values. (We needed to handle them separately for min/max values because the comparison is not trivial and there were actual issues.) Anyone who deals with FP numbers should know about the difference between +0.0 and -0.0. Because the FP spec allows multiple NaN values (even though Java uses one actual bit pattern for it), we need to avoid using the Bloom filter in this case. The dictionary is a different matter: we deserialize it to Java Double/Float values in a Set, so we will have one NaN value that is the very same one we are searching for. (It is more for the other implementations to deal with NaN if the language has several NaN values.)
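One possible reading of the suggestion above, as a rough sketch and not the actual Parquet reader logic: skip the Bloom filter whenever the probe value is NaN, and treat +0.0 and -0.0 like any other values. The names `can_prune_with_bloom_filter` and `make_toy_filter` are hypothetical, and a plain Python set of encoded bytes stands in for a real Bloom filter.

```python
import math
import struct


def encode_double(value: float) -> bytes:
    """Little-endian IEEE 754 binary64 encoding, the form a hash would see."""
    return struct.pack("<d", value)


def make_toy_filter(values):
    """Hypothetical stand-in for a Bloom filter: an exact set of encodings.

    A real Bloom filter can return false positives; this toy one cannot,
    but it has the same "definitely absent / maybe present" interface.
    """
    stored = {encode_double(v) for v in values}
    return lambda v: encode_double(v) in stored


def can_prune_with_bloom_filter(value: float, might_contain) -> bool:
    """Return True only if the filter proves `value` is absent.

    NaN probes never prune: a written NaN may use a different bit pattern
    than the probe, so a filter miss proves nothing.
    """
    if math.isnan(value):
        return False
    return not might_contain(value)


might_contain = make_toy_filter([1.5, -0.0])
print(can_prune_with_bloom_filter(2.5, might_contain))           # value absent
print(can_prune_with_bloom_filter(1.5, might_contain))           # value present
print(can_prune_with_bloom_filter(float("nan"), might_contain))  # NaN: never prune
```

Under this sketch a probe for +0.0 would prune a row group that contains only -0.0, which matches the view that the two zeros are distinct values as far as the filter is concerned.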
Currently, Parquet can use a Bloom filter for any physical type. However, when a Bloom filter is applied to floating-point values, the semantics are ambiguous:
What do +0 and -0 mean? Are they equal?
Should qNaN and sNaN be written to the Bloom filter? Are they equal?
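To make the ambiguity concrete, here is a small illustration, assuming the filter hashes the raw IEEE 754 bytes of each value. The helper names and the two specific NaN bit patterns are chosen for this example only.

```python
import math
import struct


def double_bits(value: float) -> int:
    """Raw IEEE 754 binary64 bit pattern of `value` as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def double_from_bits(bits: int) -> float:
    """Interpret a 64-bit unsigned integer as an IEEE 754 binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# +0.0 and -0.0 compare equal, yet differ in the sign bit.
print(0.0 == -0.0)
print(hex(double_bits(0.0)), hex(double_bits(-0.0)))

# Two example NaN encodings: same exponent, different mantissa payloads.
NAN_BITS_A = 0x7FF8000000000000
NAN_BITS_B = 0x7FF8000000000001
nan_a = double_from_bits(NAN_BITS_A)
nan_b = double_from_bits(NAN_BITS_B)

# Both decode to NaN, and NaN never compares equal, not even to itself.
print(math.isnan(nan_a), math.isnan(nan_b))
print(nan_a == nan_a)
```

A byte-level hash therefore separates values that compare equal (the two zeros) and cannot tell which of the many NaN encodings a writer used.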
Reporter: Xuwei Fu / @mapleFU
Note: This issue was originally created as PARQUET-2255. Please see the migration documentation for further details.