Document the design of quoted literals #477

Merged
merged 32 commits into from
Nov 27, 2023

Conversation

stasm
Collaborator

@stasm stasm commented Sep 17, 2023

An alternative to #465 — an explainer doc on why we need quoted literals and why we settled on delimiting them with vertical bars.

Rendered.

@stasm stasm marked this pull request as ready for review September 17, 2023 22:24

- **[r1; high priority]** Minimize the need to escape characters inside literals. In particular, choose a delimiter that isn't frequently used in translation content. Having to escape characters inside literals is inconvenient and error-prone when done by hand, and it also introduces the backslash into the message, `\`, which is the escape introducer. The backslash then needs to be escaped too, when the message is embedded in code or containers. (This is how some syntaxes produce the gnarly `\\\`.)
- **[r2; high priority]** Minimize the need to escape characters when embedding messages in code or containers. In particular, choose a delimiter that isn't frequently used as a string delimiter in programming languages and container formats. However, note that many programming languages also provide alternative ways of delimiting strings, e.g. _raw strings_ or triple-quoted literals.
- **[r3; high priority]** Minimize the need to change the message in other ways than to escape some of its characters (e.g. rephrase content or change syntax).
Collaborator

What does this mean, and why is it a high-priority requirement?

Collaborator Author

I meant to include here the need to change straight quotes to curly quotes, or to switch from single quotes to double quotes, or the other way around.

Collaborator Author

I'm OK with dropping this to medium priority.

Collaborator

I would appreciate this priority being dropped a peg, and including a bit more explanation in the text. All the other requirements include a bit of reasoning and commentary to explain themselves.

Collaborator Author

Dropped to medium, added a little bit of explanation. PTAL.

Collaborator

Thank you for the update. This is still missing the sort of rationalisation that is provided for the other requirements. It is not immediately obvious (at least to me) why this is a requirement. What sort of operation on a message source hits this issue?

Collaborator Author

OK, I see that this requirement isn't well worded. I'll try to rephrase it (possibly tomorrow).

As a discussion point for right now, in your commit on dual quoting, you wrote:

> If the container format does not itself support dual quoting, the embedded message's quotes may be adjusted to avoid their escaping.

This is an example of "needing to change the message in other ways than to escape some of its characters" that I wanted to capture here.

Ideally, it should be possible to drop the message verbatim into code or a container. The next best thing (in my mind) is to only escape some characters, do it rarely, and also do it without thinking about it too much. OTOH, having to modify the body in any other way, e.g. switch from straight quotes to curly quotes in translation content, or switch between single and double quotes around literals, is more effort (again, in my mind). I think we should prioritize solutions that don't require such edits, but I also acknowledge that this isn't as high on the priority list as r1 and r2.
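
A minimal sketch of the escaping cost described above, assuming JSON as the container and Python's standard `json` module as the embedder. The message strings, the `:example` annotation, and the helper `added_backslashes` are hypothetical illustrations, not syntax or code taken from this PR.

```python
import json


def added_backslashes(message: str) -> int:
    """Count the backslashes that embedding in a JSON string adds to a message."""
    embedded = json.dumps(message, ensure_ascii=False)
    return embedded.count("\\") - message.count("\\")


# Hypothetical messages: the same literal, delimited two different ways.
pipe_plain = '{|New York| :example}'
quote_plain = '{"New York" :example}'

# The literal content itself contains double quotes. With double-quote
# delimiters, the message must already escape them with a backslash.
pipe_inner = '{|say "hi"| :example}'
quote_inner = r'{"say \"hi\"" :example}'

for msg in (pipe_plain, quote_plain, pipe_inner, quote_inner):
    print(added_backslashes(msg), json.dumps(msg, ensure_ascii=False))
# Added backslashes, in order: 0, 2, 2, 6
```

Under these assumptions, the pipe-delimited messages gain backslashes only for quotes that appear in the content itself, while the double-quoted message with an escaped inner quote ends up containing the `\\\"` sequence that r1 warns about.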

Collaborator

It might be useful to identify a few specific exemplars of likely container formats in which MF2 messages are expected to often be embedded by developers, such as Java, JS, .properties, YAML, JSON, our own resource format. Are there others we should explicitly consider?

- **[r4; medium priority]** Don't surprise users with syntax that's too exotic. We expect quoted literals to be rare, which means fewer opportunities to get used to their syntax and remember it.
Collaborator

Why is this classified as "medium" rather than "high" priority?

Collaborator Author

@stasm stasm Sep 18, 2023

I chose "high" for things that can break translations or make working with them objectively harder. The exoticness of the syntax is in the eye of the beholder, and thus I chose "medium" — it's difficult to rate a solution objectively here.

Collaborator

I agree that some comparisons here are difficult or impossible to rank objectively: "Is || more or less exotic than () as text delimiters?" is, for instance, a near-impossible question to answer.

I would, however, posit that comparisons like "Is || more or less exotic than "" as text delimiters?" may be objectively answered. Witness, for instance, their use in each of our comments on this thread.

Collaborator Author

Agreed. That's why it's medium, not low :)

- [r1 GOOD] Writing `"` and `'` in literals doesn't require escaping them via `\`. This means no extra `\` that need escaping.
- [r2 GOOD] Embedding messages in most code or containers doesn't require escaping the literal delimiters.
- [r3 GOOD] Messages don't have to be modified otherwise before embedding them.
- [r4 FAIR] Vertical lines are not commonly used as string delimiters and thus can be harder to learn for beginners. There's prior art in a practice of using vertical lines as delimiters for inline code literals.
Collaborator

Could you give a reference for this prior art? Is it Lisp symbol names that you're referring to, or is there something else as well?

I would also contest the "FAIR" classification; this really is "POOR".

Collaborator Author

I don't have a concrete reference other than my own experience from NNTP and IRC, where |foo| was a common way to delimit code before backticks became popular.

Collaborator Author

https://en.wikipedia.org/wiki/Posting_style#Quoted_line_prefix mentions the vertical bar, but for quoting replies, not inline code.

https://en.wikipedia.org/wiki/Vertical_bar#Delimiter mentions it as a delimiter for strings, but in all fairness, in rare and obscure cases.

Collaborator

I would prefer some reference being included as a link, if the assertion is retained.

I think the applicability of the prior art is debatable, but as long as it's qualified in some way (e.g. via a link, or by describing it in the text as "rare and obscure") I don't think it's worth quibbling over.

@eemeli
Collaborator

eemeli commented Sep 18, 2023

I split out "Dual quoting" into its own alternative, and updated "Use quotation marks" to correspond.

I left the r3 assessment as ??? for these, as I'm still not sure what it's measuring, and it wasn't clear to me if the previous "FAIR" would apply to both alternatives.

@stasm Have a look at semantic line breaks at some point? It's the markdown style we've been using for most content here.

@stasm
Collaborator Author

stasm commented Sep 18, 2023

> @stasm Have a look at semantic line breaks at some point? It's the markdown style we've been using for most content here.

Oh, sure! I thought we were only using it for spec text. I'll admit that it was easier for me to write this design doc without worrying about line breaks. But, happy to refactor this PR to use them.

@eemeli
Collaborator

eemeli commented Sep 18, 2023

Eh, it makes markdown sources a bit easier to read and edit. I would not consider it a hard requirement of any sort, more of a good practice in general. 😇

Member

@aphillips aphillips left a comment

Thanks for this. A good start.

@aphillips
Member

I'm going to recommend that we merge this design document at the "proposed" maturity level.

@eemeli
Collaborator

eemeli commented Oct 7, 2023

> I'm going to recommend that we merge this design document at the "proposed" maturity level.

My line comment concerns and suggestions above are still valid and largely unaddressed, and I'd really prefer not needing to reformulate them as a separate PR.

@stasm
Collaborator Author

stasm commented Oct 9, 2023

We discussed this at the teleconference and agreed not to merge before I address all comments. Sorry for not having updated the status here.

@aphillips
Member

@stasm ping to address comments so we can merge

@stasm
Collaborator Author

stasm commented Oct 27, 2023

Thanks for the ping. I'll do it before the Monday meeting.

@stasm stasm force-pushed the design-quoted-literals branch from 2090872 to 670bd09 Compare November 10, 2023 17:59
@eemeli
Collaborator

eemeli commented Nov 10, 2023

I added a new alternative to the doc that effectively merges the proposed design with [a2] by allowing for either |vertical pipes|, 'single quotes' or "double quotes".

It's an attempt to combine the best of both approaches by allowing for "normal" quotes where they're available, while also supporting something a bit more exotic when embedding in formats like JSON.
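
A rough sketch of how a parser might accept all three delimiters under this alternative. The escape rule used here (a backslash may precede only the active delimiter or another backslash) and the function name `parse_quoted_literal` are assumptions made for illustration; the design doc, not this sketch, defines the actual grammar.

```python
from typing import Tuple

# Delimiters allowed by the combined alternative described above.
DELIMITERS = ("|", "'", '"')


def parse_quoted_literal(source: str, start: int = 0) -> Tuple[str, int]:
    """Parse a quoted literal at source[start].

    Returns the unescaped value and the index just past the closing delimiter.
    """
    if start >= len(source) or source[start] not in DELIMITERS:
        raise ValueError("expected a quoted literal")
    delim = source[start]  # the closing delimiter must match the opening one
    chars = []
    i = start + 1
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            # Assumed rule: only the active delimiter and the backslash
            # itself may be escaped.
            if i + 1 >= len(source) or source[i + 1] not in (delim, "\\"):
                raise ValueError("invalid escape sequence")
            chars.append(source[i + 1])
            i += 2
        elif ch == delim:
            return "".join(chars), i + 1
        else:
            chars.append(ch)
            i += 1
    raise ValueError("unterminated quoted literal")


print(parse_quoted_literal('|say "hi"|'))      # ('say "hi"', 10)
print(parse_quoted_literal("'vertical | bar'"))  # ('vertical | bar', 16)
```

Because the closing delimiter is whichever one opened the literal, the other two delimiter characters can appear in the content without escaping.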

@stasm stasm requested a review from eemeli November 17, 2023 13:25
Comment on lines +326 to +327
Messages don't have to be modified otherwise before embedding them,
unless they happen to contain conflicting quote delimiters.
Member

This is actually a con "in a trenchcoat" :-)

Messages MUST be modified if they "happen to contain conflicting quote delimiters", e.g. those that conflict with the syntax.

My big concern here is: how do tools up and down the translation stack decide what quotes to use?

There should be a "con" somewhere for having multiple ways of doing the same thing.
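
To make the tooling question concrete, here is one conceivable policy a serializer could follow, sketched under stated assumptions. Nothing in this PR proposes it: the function names, the `avoid` parameter, the preference order, and the escape rule are all hypothetical.

```python
def choose_delimiter(value: str, avoid: str = "",
                     preference=("|", "'", '"')) -> str:
    """Pick the delimiter that needs the fewest escapes inside `value`.

    `avoid` lists delimiters that clash with the surrounding container
    (for example '"' when the message will be embedded in JSON).
    Ties go to the earliest entry in `preference`.
    """
    candidates = [d for d in preference if d not in avoid] or list(preference)
    # min() returns the first minimal item, so preference order breaks ties.
    return min(candidates, key=value.count)


def serialize_literal(value: str, avoid: str = "") -> str:
    """Quote `value`, escaping backslashes and the chosen delimiter."""
    delim = choose_delimiter(value, avoid)
    escaped = value.replace("\\", "\\\\").replace(delim, "\\" + delim)
    return delim + escaped + delim


print(serialize_literal('say "hi"'))     # |say "hi"|
print(serialize_literal("a|b"))          # 'a|b'
print(serialize_literal("x", avoid="|")) # 'x'
```

Two tools configured with different `avoid` or `preference` values would emit different source text for the same literal, which is the kind of divergence the "one way" requirement is meant to flag.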

Collaborator

That's r6, and it's included in the table.

Collaborator

Oops, sorry, typo above: I meant r5, the "one way" requirement.

@aphillips aphillips merged commit 849acdc into main Nov 27, 2023
1 check passed
@aphillips aphillips deleted the design-quoted-literals branch November 27, 2023 17:43