Document the design of quoted literals #477
exploration/0477-quoted-literals.md
- **[r1; high priority]** Minimize the need to escape characters inside literals. In particular, choose a delimiter that isn't frequently used in translation content. Having to escape characters inside literals is inconvenient and error-prone when done by hand, and it also introduces the backslash into the message, `\`, which is the escape introducer. The backslash then needs to be escaped too, when the message is embedded in code or containers. (This is how some syntaxes produce the gnarly `\\\`.)
- **[r2; high priority]** Minimize the need to escape characters when embedding messages in code or containers. In particular, choose a delimiter that isn't frequently used as a string delimiter in programming languages and container formats. However, note that many programming languages also provide alternative ways of delimiting strings, e.g. _raw strings_ or triple-quoted literals.
- **[r3; high priority]** Minimize the need to change the message in other ways than to escape some of its characters (e.g. rephrase content or change syntax).
What does this mean, and why is it a high-priority requirement?
I meant to include here the need to change straight quotes to curly quotes, or to switch from single quotes to double quotes, or the other way around.
I'm OK with dropping this to medium priority.
I would appreciate this priority being dropped a peg, and including a bit more explanation in the text. All the other requirements include a bit of reasoning and commentary to explain themselves.
Dropped to medium, added a bit of explanation. PTAL.
Thank you for the update. This is still missing the sort of rationalisation that is provided for the other requirements. It is not immediately obvious (at least to me) why this is a requirement. What sort of operation on a message source hits this issue?
OK, I see that this requirement isn't well worded. I'll try to rephrase it (possibly tomorrow).
As a discussion point for right now, in your commit on dual quoting, you wrote:
> If the container format does not itself support dual quoting, the embedded message's quotes may be adjusted to avoid their escaping.
This is an example of "needing to change the message in other ways than to escape some of its characters" that I wanted to capture here.
Ideally, it should be possible to drop the message verbatim into code or a container. The next best thing (in my mind) is to only escape some characters, do it rarely, and also do it without thinking about it too much. OTOH, having to modify the body in any other way, e.g. switch from straight quotes to curly quotes in translation content, or switch between single and double quotes around literals, is more effort (again, in my mind). I think we should prioritize solutions that don't require such edits, but I also acknowledge that this isn't as high on the priority list as r1 and r2.
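To make the backslash pile-up behind r1 and r2 concrete, here is a minimal Python sketch. It assumes JSON as the container and uses two hypothetical message strings (the placeholder syntax and the `:string` annotation are illustrative only, not taken from the spec): one delimits the literal with vertical bars, the other with double quotes that have to be escaped inside the literal.

```python
import json

# Hypothetical messages; the exact MF2 syntax here is an assumption.
# Literal delimited with vertical bars: inner quotes need no escaping.
msg_bars = 'Hello {|"World"| :string}'
# Literal delimited with double quotes: inner quotes are escaped with a backslash.
msg_quotes = 'Hello {"\\"World\\"" :string}'


def backslashes_after_json(message: str) -> int:
    """Count the backslashes in the JSON string form of a message."""
    return json.dumps(message).count("\\")


if __name__ == "__main__":
    for msg in (msg_bars, msg_quotes):
        print(json.dumps(msg), backslashes_after_json(msg))
```

In this sketch the bar-delimited message only picks up the backslashes JSON needs for its own quotes, while the quote-delimited one also has each of its own backslashes doubled, which is where runs of three backslashes before a quote come from.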
It might be useful to identify a few specific exemplars of likely container formats in which MF2 messages are expected to often be embedded by developers, such as Java, JS, .properties, YAML, JSON, and our own resource format. Are there others we should explicitly consider?
exploration/0477-quoted-literals.md
- **[r4; medium priority]** Don't surprise users with syntax that's too exotic. We expect quoted literals to be rare, which means fewer opportunities to get used to their syntax and remember it.
Why is this classified as "medium" rather than "high" priority?
I chose "high" for things that can break translations or make working with them objectively harder. The exoticness of the syntax is in the eye of the beholder, and thus I chose "medium" — it's difficult to rate a solution objectively here.
While I agree that some comparisons here are difficult or impossible to rank objectively: "Is `||` more or less exotic than `()` as text delimiters?" is for instance a near-impossible question to answer.
I would, however, posit that comparisons like "Is `||` more or less exotic than `""` as text delimiters?" may be objectively answered. Witness, for instance, their use in each of our comments on this thread.
Agreed. That's why it's medium, not low :)
exploration/0477-quoted-literals.md
- [r1 GOOD] Writing `"` and `'` in literals doesn't require escaping them via `\`. This means no extra `\` that need escaping.
- [r2 GOOD] Embedding messages in most code or containers doesn't require escaping the literal delimiters.
- [r3 GOOD] Messages don't have to be modified otherwise before embedding them.
- [r4 FAIR] Vertical lines are not commonly used as string delimiters and thus can be harder to learn for beginners. There's prior art in a practice of using vertical lines as delimiters for inline code literals.
Could you give a reference for this prior art? Is it Lisp symbol names that you're referring to, or is there something else as well?
I would also contest the "FAIR" classification; this really is "POOR".
I don't have a concrete reference other than my own experience from NNTP and IRC, where `|foo|` was a common way to delimit code before backticks became popular.
https://en.wikipedia.org/wiki/Posting_style#Quoted_line_prefix mentions the vertical bar, but for quoting replies, not inline code.
https://en.wikipedia.org/wiki/Vertical_bar#Delimiter mentions it as a delimiter for strings, but in all fairness, in rare and obscure cases.
I would prefer some reference being included as a link, if the assertion is retained.
I think the applicability of the prior art is debatable, but as long as it's qualified in some way (e.g. via a link, or by describing it in the text as "rare and obscure") I don't think it's worth quibbling over.
I split out "Dual quoting" into its own alternative, and updated "Use quotation marks" to correspond. I left the r3 assessment as
@stasm Have a look at semantic line breaks at some point? It's the markdown style we've been using for most content here.
Oh, sure! I thought we were only using it for spec text. I'll admit that it was easier for me to write this design doc without worrying about line breaks. But, happy to refactor this PR to use them.
Eh, it makes markdown sources a bit easier to read and edit. I would not consider it a hard requirement of any sort, more of a good practice in general. 😇
Thanks for this. A good start.
I'm going to recommend that we merge this design document at the "proposed" maturity level.
My line comment concerns and suggestions above are still valid and largely unaddressed, and I'd really prefer not needing to reformulate them as a separate PR.
We discussed this at the teleconference and agreed not to merge before I address all comments. Sorry for not having updated the status here.
@stasm ping to address comments so we can merge
Thanks for the ping. I'll do it before the Monday meeting.
It's POOR when embedding into an XML dialect, but otherwise, <> shouldn't cause too many issues for r2.
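As a rough illustration of the XML case, the Python sketch below runs two hypothetical messages through the standard library's `xml.sax.saxutils.escape`. The message syntax, including the `<...>` literal delimiters, is assumed for the sake of the example.

```python
from xml.sax.saxutils import escape

# Hypothetical messages; the literal delimiters are assumptions for illustration.
msg_angle = "Hello {<World> :string}"  # literal delimited with angle brackets
msg_bars = "Hello {|World| :string}"   # literal delimited with vertical bars


def needs_xml_escaping(message: str) -> bool:
    """Return True if embedding the message as XML text changes it."""
    return escape(message) != message


if __name__ == "__main__":
    for msg in (msg_angle, msg_bars):
        print(escape(msg), needs_xml_escaping(msg))
```

Under these assumptions, the angle-bracket message has both delimiters rewritten as character entities, while the bar-delimited message goes into the XML text node unchanged.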
Co-authored-by: Eemeli Aro <[email protected]>
2090872 to 670bd09
I added a new alternative to the doc that effectively merges the proposed design with It's an attempt to combine the best of both approaches by allowing for "normal" quotes where they're available, while also supporting something a bit more exotic when embedding in formats like JSON.
Co-authored-by: Addison Phillips <[email protected]>
Messages don't have to be modified otherwise before embedding them,
unless they happen to contain conflicting quote delimiters.
This is actually a con "in a trenchcoat" :-)
Messages MUST be modified if they "happen to contain conflicting quote delimiters", e.g. those that conflict with the syntax.
My big concern here is: how do tools up and down the translation stack decide which quotes to use?
There should be a "con" somewhere for having multiple ways of doing the same thing.
That's `r6`, and it's included in the table.
Oops, sorry, typo above: I meant `r5`, the "one way" requirement.
An alternative to #465 — an explainer doc on why we need quoted literals and why we settled on delimiting them with vertical bars.
Rendered.