Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Double in CellValue by Scientific #177

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

olafklinke
Copy link

The XML contains decimal values, which are accurately represented by Scientific but not by Double as in the CellDouble constructor of CellValue. This PR makes CellDouble a pattern synonym and replaces the old constructor by a new constuctor CellDecimal which holds a Scientific value.
For motivation, see this issue.

Copy link
Owner

@qrilka qrilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks fine but let's at least fix the tests.
I'm OK with publishing as version 1.2

@@ -105,5 +106,5 @@ fillCacheFieldsFromRecords fields recs =
then field {cfItems = mapMaybe recToCellValue recVals}
else field
recToCellValue (CacheText t) = Just $ CellText t
recToCellValue (CacheNumber n) = Just $ CellDouble n
recToCellValue (CacheNumber n) = Just $ CellDecimal (fromFloatDigits n)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use Scientific in CacheNumber as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I don't understand what CacheNumber is, I trust in your judgement here.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically as far as I remember it's the same thing but for pivot table cache

@@ -105,6 +105,7 @@ Library
, network-uri
, old-locale >= 1.0.0.5
, safe >= 0.3
, scientific
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding some reasonable bounds, e.g. >= 0.3.6 (the version where Hashable was fixed)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we'e only relying on

  • Fractional instance of Scientific,
  • the toRealFloat function

actually a quite low version bound could suffice. 0.3.0.0 appears to be the birth version of toRealFloat.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer the safer bound

@olafklinke
Copy link
Author

Fixed the test suite. Note that the code base may have some CellDouble patterns lurking, which do an implicit Scientific -> Double conversion where it may be unreasonable. Sadly the one-directional pattern synonym can not make the change in representation completely transparent. (That is, Haskell currently does not allow writing a bi-directional pattern that uses a different conversion function each way.) We might want to come up with a good test that exhibits the rounding errors in using the CellDouble pattern.
Further I am not an expert at writing prisms, nor lens idiosyncrasies at all, so please feel free to edit my PR so that all required lenses are present. Otherwise I guess this PR would introduce an unnecessary amount of breakage in downstream packages.

@qrilka
Copy link
Owner

qrilka commented May 19, 2024

Would you mind checking my comments on the PR? And it would be great to add the test you proposed as well. As for unforseen corner case - let's take a chance and fix problems when we see some.

@qrilka
Copy link
Owner

qrilka commented Jun 1, 2024

@olafklinke it looks like the microlens flag is a bit broken

@olafklinke
Copy link
Author

Ah yes, the manually defined _CellDecimal prism conditional on the microlens flag was missing. This is fixed in 6c9ddcb. Further there are two warnings about incomplete pattern matches in src/Codec/Xlsx/Writer/Stream.hs:346 and src/Codec/Xlsx/Writer/Stream.hs:357 but these are actually covered by the CellDouble view pattern. Therefore I added a COMPLETE pragma.

@qrilka
Copy link
Owner

qrilka commented Jun 3, 2024

Thanks for fixing that, could you also address my other comments? BTW it would be neat to squash the commits

@olafklinke
Copy link
Author

could you also address my other comments

What is left unaddressed?
Changing CacheNumber to have a Scientific argument is not trivial, because in Codec.Xlsx.Writer.Internal we would need a ToAttrVal Scientific instance, but the txtd function that is used for the Double instance relies on RealFloat which Scientific is not. Hence we need a better conversion Scientific -> Text.Builder first. We could use the one provided by scientific but I have no idea whether that conforms to the XML standard. Would that be covered by the test suite?

@qrilka
Copy link
Owner

qrilka commented Jun 4, 2024

This looks like this uncovers the problem with the PR - it doesn't fix the writer as a result saving and reading will be not idempotent :-
Looks like OOXML spec doesn't specify anything about decimals in cell values :(
But for cached fields it's

<xsd:attribute name="v" use="required" type="xsd:double"/>

so I suppose we could use it as a guide here. Does this sound fine with you?
Also we need to add a decimal which is not representable with the current xlsx version and which will make writer/read not idempontent in this PR.

Also let's add a bound on scientific

@olafklinke
Copy link
Author

There are two idempotency issues here: The immediate Haskell idempotency issue is that Data.Text.Read.rational and Data.Text.Lazy.Builder.scientificBuilder need to be mutually inverse. That is within our power. The second idempotency issue is whether other OOXML-processing software can change the representation of decimal numbers so that it trips up Data.Text.Read.rational, or whether the text representations produced by Data.Text.Lazy.Builder.scientificBuilder adhere to the XML standard. (I think they do, it is quite liberal.)
Indeed the standard for xsd:double states that the value range must adhere to IEEE-754 which Scientific can easily surpass. Conversely, Double has values that are not representable in Scientific, e.g. infty = 1/0 :: Double which fromFloatDigits maps to 1.797693134862316e308.

In fact, even the current published version of xlsxdoes not adhere to the standard, since Data.Text.Read.rational fails on the texts "NaN" and "INF" which are legal xsd:double according to the spec.

Summary: If the OOXML standard really forces all decimals in XLSX to be IEEE-754 doubles, then #176 has no basis, as all double precision floating point numbers do have an exact representation in scientific notation, and a slightly improved rational parser would have us covered for the NaN and INF corner cases.

@qrilka
Copy link
Owner

qrilka commented Jun 5, 2024

As I said in the other doc it seems that I was extrapolating too much, cell values are not xsd:double but cached numbers are. Ok, in this situation I'd propose to leave CacheNumber as is and make read . write == id including currently unrepresentable date values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants