toLeopard: Static number casting issues in equality operator + NaN leaking #123
Comments
I like Possibly better: Regardless, there'll obviously be documentation in the
I'm glad you're thinking about this. I'm not totally clear on what the difference is between I'm curious what nuance I'm missing.
@PullJosh Just to clarify, I'm going to refer to The point of these differentiations is to provide more expressive capability to the desiring input. That is, we have a baseline amount of detail that the reporter's spec (in That's the sake of input types in general, which is fairly clear, but what's less obvious is that the desired type fundamentally has more control than the satisfying one (the reporter spec). Our general goal is to transition as smoothly as possible from a satisfying input type to the desired one. (This is why we have a concept of a "satisfying" input type in the first place - so that we don't unnecessarily cast an expression which satisfies boolean to a desired boolean, for example.) I think the part that's most challenging to wrap your head around is that an input type is, in truth, just a keyword declaring an agreement about what value an expression could evaluate to. That's the full sum of its intrinsic meaning... but intrinsic meaning has nothing to do with actual behavior. It's like if I wrote in a dictionary that one quality of a skilled receptionist is to be agreeable. Well, OK, that's a dictionary definition that all involved parties might agree on (let's say so!)... but that same, single definition has very different practical implications for the receptionist and the client. For the receptionist, it indicates what standard they should be looking to meet; for the client, it indicates what standard they're looking to judge by (consciously or not). The dictionary definition has useful meaning, but it doesn't actually do anything on its own. Its practical effect always depends on its interpretation by a specific party - its extrinsic meaning. Likewise, the extrinsic meaning of "a number that might be NaN" has to be interpreted twofold: once by the satisfying reporter, once by the desiring input. For the satisfying input, there's only one interpretation: "If my expression's value is or might be NaN, then I need to declare so." 
But for the desiring input, there are two interpretations. One is, "I require that you most certainly give me a number. And if it's NaN, then I require that you leave it as it is." The other is, "I require that you most certainly give me a number. But really, I don't mind what that number is. It's okay if it's NaN, but you can give me zero instead, and I'll treat that the same as NaN. Whatever works better for you." See how all three of those interpretations are in line with a keyword that simply means "a number that might be NaN"? I think that's the reason it isn't terribly obvious that you might want to differentiate between those two desiring interpretations, especially when there's only one applicable satisfying interpretation. I only alluded to the key detail briefly in my original post:
This belies that Because the desiring input is the one which, in the end, has more expressive power, I decided to reframe the terminology in terms of how the expression gets cast, instead of what its value might be. That's how the second batch of names (which I've been using in this comment) came about.
Individual reporter specs never differentiate between a desired input type of Instead, it's the in-between casting (performed at the very bottom of So at the crux of it, why do we differentiate between strict and loose casting? The answer comes from the third party in play here - the goal of the caster itself, which is to bridge those two values as smoothly as possible. The trick is that casting a number strictly is less smooth than casting it loosely (read: to zero) - mainly, from a purely syntactical standpoint! Zero casting a number uses (On a less "aesthetic" level - although aesthetics are mostly what count for the caster - strict casting also fundamentally conveys more information than zero casting. Most non-number values get cast to NaN in strict casting; in loose casting they are cast as zero, and so cannot be differentiated from an expression which was actually zero in the first place. This detail turns out to be absolutely critical for the equals comparison operator... but it's also highly inappropriate for other operations to concern themselves with, particularly when they don't care if the value is zero or NaN. In general, inputs shouldn't be asking for more information than they actually need.) To boil everything down:
While I think there's an intuition to this that can be understood fairly easily once you "get" it, it's all rather a bit to wrap your head around in the first place. If I gave myself permission to shake up the status quo and express this in a simpler way from the start, what might I change? Well - personally I think the hardest parts are:
All said, my alternate approach would be to just state extrinsic meanings on their own, explicitly, and NOT try to bundle the extrinsic meanings for satisfying and desiring together. They do both center around the same fundamental intrinsic meanings, yes (such as "a string" or "a number that might be NaN"), but it muddies the waters to try to put them together using one word for two extrinsic meanings. I'd call those extrinsic meanings "traits", and define one set for satisfying, one for desiring:
enum DesirableTraits {
IsBoolean,
IsNumber,
IsString,
// only applicable alongside IsNumber
IsCastToNaN,
IsCastToZero,
// if *neither* is desired then it's treated
// the same as "LooselyCastNumber"
}
enum SatisfyingTraits {
IsBoolean,
IsNumber,
IsString,
// only applicable alongside IsNumber
IsNotNaN,
// if not satisfied then it's treated as though
// it may be NaN
}
It would be possible - and common - to specify multiple traits, both when satisfying and when desiring. Then the caster compares desired traits with satisfied ones to figure out the smoothest way to bridge, performing the same work it currently does, just with more coherent terminology. I think there's a simpler intuitive logic to this, but to lay it out explicitly:
It'd take a little extra work to go through block definitions with this system, but IMO it would be worth it, because this is a lot easier to parse and intuitively understand, and is a lot more extensible for defining more kinds of desirable and satisfying traits later on, if that becomes relevant. Sorry for the devastatingly long reply LOL. Thank you very much for your time checking out and considering all this!
Thank you for the novel. (Sincerely) This helped me understand the situation a lot better. I definitely prefer your method of explicitly declaring both desirable traits and satisfying traits separately, although it does feel a little funny that they almost always (but not always always) come in perfectly matching pairs. For now, it's probably best to tack on the three slightly different variations of Number/possibly NaN values as you wrote in your original post, but if these kinds of situations keep popping up repeatedly then maybe we would want to revisit this and follow the differentiated input/output traits you described.
You got it! Since the situation's laid out pretty clearly (including notes for revisiting), I'm not worried about putting this on the backburner and coming back to traits sometime later on. I have a WIP branch with basically all the work done, so apart from putting together a Scratch project that actually tests all the expected behavior, it's pretty much good to go!
Branching from #9 (comment).
We try to use `===` for `operator_equals` where possible, but at the moment are too greedy. Casting to a number doesn't work exactly the same in `Cast.compare` and that's something sb-edit is currently unaware of.
We only use `===` in `operator_equals` if at least one of the arguments can be parsed as a number. Since it's impossible to represent `NaN` this way (`parseNumber` only returns true if the value is non-NaN after casting to number), we don't need to worry about `(...) === NaN` type comparisons (which are treated as `String(...).toLowerCase() === "nan"` in Scratch).
Therefore, if a block expression returns NaN, it can't be equal to the written number value. This will be useful later.
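To make the "too greedy" case concrete, here is a minimal sketch. `scratchEquals` is a simplified model of how Scratch's `Cast.compare` decides equality, written from memory rather than copied from scratch-vm, and `looseNumber`, `strictNumber`, and `isWhiteSpace` are hypothetical stand-ins for the casts discussed in this issue, not Leopard's actual code.

```typescript
// Assumption: mirrors the idea behind Cast.isWhiteSpace (null or blank string).
function isWhiteSpace(value: unknown): boolean {
  return value === null || (typeof value === "string" && value.trim().length === 0);
}

// Simplified model of Scratch equality: compare numerically only when both
// sides are genuinely numeric, otherwise compare as lowercase strings.
function scratchEquals(a: unknown, b: unknown): boolean {
  let n1 = Number(a);
  let n2 = Number(b);
  if (n1 === 0 && isWhiteSpace(a)) n1 = NaN;
  else if (n2 === 0 && isWhiteSpace(b)) n2 = NaN;
  if (isNaN(n1) || isNaN(n2)) {
    return String(a).toLowerCase() === String(b).toLowerCase();
  }
  return n1 === n2;
}

// Hypothetical loose cast: anything non-numeric becomes zero.
function looseNumber(value: unknown): number {
  const n = Number(value);
  return isNaN(n) ? 0 : n;
}

// Hypothetical strict cast: non-numeric and blank values become NaN.
function strictNumber(value: unknown): number {
  return isWhiteSpace(value) ? NaN : Number(value);
}

// Translating `expr = 0` with a loose cast is too greedy...
const greedy = looseNumber("abc") === 0; // true, but Scratch says false
// ...while a strict cast agrees with the model, since NaN never equals a number.
const strict = strictNumber("abc") === 0; // false
console.log(greedy, strict, scratchEquals("abc", 0));
```

Because the written number is never NaN, a NaN on the expression side can only make `===` false, which is the same answer the string comparison gives in this model.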
We similarly have an issue where the division operator can return `NaN` (zero divided by zero) and this is treated as satisfying number inputs. While this is fine for certain situations, in the context of most math operators (in Scratch), `NaN` is supposed to be treated as zero. Thus NaN may leak from operators which currently identify as satisfying `Number`, and this is a compatibility issue.
My proposal is to implement a "strict" mode for casting to number in Leopard: `this.toNumber(..., true)`. It would report `NaN` for any values which can never resolve a numerical comparison (in Scratch). This isn't drastically complicated; it mostly means returning `NaN` for cases we already detect (and currently return `0` for). It also means specifically banning the values `null` and whitespace-only strings from casting to zero (see `Cast.isWhiteSpace`).
Then replace `InputShape.Number` with three new shapes:
- `InputShape.NumberPossiblyNaN`: A number value which was representable as a number in the first place or is now represented as NaN.
  - […] `NumberNotNaN`.
  - […] `operator_mathop` for sure.
  - `this.toNumber((...), true)`
- `InputShape.NumberNotNaN`: A number value which was representable as a number in the first place or is now represented as zero.
  - […] `NumberPossiblyNaN`.
  - […] `NumberNotNaN` inputs.)
  - `this.toNumber((...))`
- `InputShape.LooseNumber`: A number value for which `0` and `NaN` are dynamically treated as equal.
  - […] `NumberNotNaN` and `NumberPossiblyNaN`.
  - `this.toNumber((...))`
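As a rough illustration of the proposed strict mode, the sketch below writes the cast as a standalone function. The name `toNumber` and its `strict` flag follow the proposal above, but the body is an assumption about how it could work, not Leopard's actual implementation, and `isWhiteSpace` is a hypothetical helper modelled on `Cast.isWhiteSpace`.

```typescript
// Hypothetical helper: null and whitespace-only strings count as "blank".
function isWhiteSpace(value: unknown): boolean {
  return value === null || (typeof value === "string" && value.trim().length === 0);
}

// Sketch of the proposed cast. Loose mode (the default) maps every
// non-numeric value to zero; strict mode reports NaN instead, and also
// refuses to turn null or blank strings into zero.
function toNumber(value: unknown, strict: boolean = false): number {
  if (strict && isWhiteSpace(value)) return NaN;
  const n = Number(value);
  if (isNaN(n)) return strict ? NaN : 0;
  return n;
}

console.log(toNumber("abc"));       // 0
console.log(toNumber("abc", true)); // NaN
console.log(toNumber("  ", true));  // NaN
console.log(toNumber("42", true));  // 42
console.log(toNumber(0 / 0));       // 0
```

The last call shows the leak being plugged: a NaN produced by zero divided by zero becomes zero as soon as it passes through a loose cast.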
Then differentiate the desired input types for all existing uses of `InputShape.Number`. For example, math blocks require `NumberNotNaN`. Equals-comparison of numbers requires `NumberPossiblyNaN`. Blocks like "repeat", "wait", and "say and wait" all benefit from using `LooseNumber` because they will still behave correctly if provided NaN, treating that value as zero, and don't require casting more strictly like equals-comparison - casting to zero instead of NaN is fine.
We also need to adapt `staticBlockInputToLiteral`. Since it already takes a desired input shape, this doesn't necessitate an (internal) interface change, conveniently. At the moment values like `3ee3e3e3e33eee333` always get returned as strings, which is clearly dangerous. (These are functionally NaN being returned for the desired input shape `Number`!) Non-numeric values should be cast to zero or NaN adapting to the desired input shape.
In summary, we want to avoid using the casting provided by `NumberPossiblyNaN` everywhere except for equals-comparison; we also want to ensure that `NaN` never leaks out of division and into operators that are supposed to treat `NaN` as zero but don't currently perform any dynamic casting for any number-shaped reporters.
Although the implementation for all this is relatively trivial, it's quite a bit to reason about when ensuring Leopard number behavior matches Scratch number behavior. But these tools are necessary, and together, enable rather clean translation. In the vast majority of cases, only equals-comparison is affected; division is also affected when both operands are a block or one operand is zero, but not when one of the operands is a non-zero constant - likely a large portion of uses of division in general. And inputs which don't care whether passed NaN or zero don't get affected at all, regardless what the reporter satisfies.
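For the static-literal side of this, a minimal sketch of how a written value might adapt to the desired shape. `staticNumberLiteral` and the `NumberShape` union are hypothetical names for illustration only; the real `staticBlockInputToLiteral` has its own signature and handles more than numbers.

```typescript
// Hypothetical stand-in for the three proposed number shapes.
type NumberShape = "NumberPossiblyNaN" | "NumberNotNaN" | "LooseNumber";

// Converts a written (static) input value to a number literal.
// Numeric text is returned as its number; non-numeric or blank text becomes
// NaN only when the desired shape asks for strict casting, and zero otherwise.
function staticNumberLiteral(raw: string, desired: NumberShape): number {
  const blank = raw.trim().length === 0;
  const parsed = blank ? NaN : Number(raw);
  if (!isNaN(parsed)) return parsed;
  return desired === "NumberPossiblyNaN" ? NaN : 0;
}

console.log(staticNumberLiteral("3ee3e3e3e33eee333", "NumberNotNaN"));      // 0
console.log(staticNumberLiteral("3ee3e3e3e33eee333", "NumberPossiblyNaN")); // NaN
console.log(staticNumberLiteral("12.5", "LooseNumber"));                    // 12.5
```

The point of the sketch is only that the result is always a number, never the original string, whichever of the three shapes is desired.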