-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(ir): remove the decimal precision promotion logic #8195
Conversation
@webmiche Thoughts on dropping the type evolution here? It seems to be causing an issue for BigQuery which has much simpler promotion rules than DB2. |
432d151
to
e4df99b
Compare
90aad0e
to
e7d3900
Compare
@chelsea-lin I rebased your branch and retargeted the PR against |
For context, I have not been working a lot with databases and ibis lately given that my project finished. I think this touches the exact core discussion we had back in the initial PR: type inference rules are backend-specific whereas ibis tries to build data structures that are as generic as possible. And I guess what we have here is exactly that, a backend that does not follow the same type inference rules. I think there is a range of possibilities of what could be done for this case. From the top of my head:
I personally advise against inferring type inference rules from backend choice as this somewhat defeats the purpose of ibis in the first place. Also, the ibis specific type rules sound like just another specification on top of all the others, so that's no real solution. I think the third choice is probably most reasonable. Honestly, I don't know exactly how the ibis data structures work currently, so I can't tell the ramifications of such a change. Maybe deleting the type inference/binop promotion-functionality already achieves this to some extent. Maybe, this change is only the first towards handling this in a principled way. So, all in all, I am fine with dropping it. If I was still working on the project I implemented the type inference for originally, I would probably add some clue code to handle this in between the ibis data structures and my data structures. For this, I would definitely appreciate it if ibis communicates to me which decimals are defined by the frontend and which it is not sure about. The "opaque" decimal solution would achieve this, AFAIU. Maybe @ingomueller-net has more thoughts on this. |
e7d3900
to
838b242
Compare
@webmiche Thanks for the response! I was thinking something similar re opaque decimals. We took this approach a long time ago with strings, and it has not proven to be an issue while also allowing us to mostly ignore the various I think we'd have to implement this and try it out on a few backends to see what its blast radius would be. I suspect there might be some issues at the Ibis <-> pyarrow boundary. |
838b242
to
8cccf5b
Compare
@kszucs I also need the commit in |
We keep |
436ef1b
to
a64622e
Compare
cec4807
to
d20f5de
Compare
Given this isn't breaking anything I'm inclined to merge it. @kszucs Thoughts? |
794184d
to
5ee75c8
Compare
5ee75c8
to
c46d5b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @chelsea-lin!
Agree. |
Thanks, everybody, for the work on Ibis and this issue! What I have asked myself in the past due to this and similar issues is: what is the degree of portability that Ibis aims for? Since the precision promotion logic is different in some backends, Ibis programs may result in different results when run on different backends. Is that expected? Is there a place where these differences are tracked? When do we consider a backend as "correct" if we allow these kind of differences? |
@ingomueller-net Good questions!
I don't have a general and precise answer for this. Here are some aspects of Ibis's output that aren't portable across backends and likely never will be:
I'm sure there are others.
Depends. Generally speaking, no, but there's functionality that is out of control that cannot be replicated across backends.
At least I can answer this one definitively: no there is not :) Perhaps a docs page with specific details here is in order!
It would be difficult to do without a definition of correctness. I'd like to see if there's a way we can capture the backend differences first, and then see whether such a definition comes out of that. |
Thanks for the prompt answer, @cpcloud!
That's a good one! And somewhat tricky: one could argue that any possible order is "correct" by accepting any possible ordering (like the SQL standard does, I believe). [ Off topic: There are actually quite a number of sources of non-determinism. I just now remembered that, in some previous life, I wrote a paper about them as well as a potential solution, which, however, I think no real system ever implemented ;) ]
+1 |
To add to the issue of aggregations, note that the result of any non-associative function depends on the input order, including And then the result of most window functions obviously depends on the order. Again, all of these are non-deterministic in many SQL systems, so you don't get a guarantee to obtain the same result from run to run against the same system instance there either. I, thus, think it'd be perfectly valid to declare all possible orderings as "correct." However, it has to be clear what exactly the semantics of each operation are, possibly referring to non-determinism for some operations. |
Oh, and another point that came up when I asked the same question on substrait: The semantics of |
Description of changes
This change is removing the decimal precision promotion logic from the unbounded ibis expr. The logic was introduced by #4330 so the decimal precision can matches the DB2. However, this rule is not generic for all. For example, BQ only support
precision=38
.Issues closed