Trailing non-breaking space not caught by trailing whitespace detection #250

Closed
oliverguenther opened this issue Aug 13, 2018 · 1 comment · Fixed by #315
oliverguenther commented Aug 13, 2018

I'm not sure if I have overlooked an option or an earlier discussion about this, but whenever an inline style has a trailing (or leading) non-breaking space (`&nbsp;`), the resulting CommonMark is broken: the delimiter run next to that space is no longer a flanking run, so the emphasis is not parsed.

For example, this HTML

  <strong>bold&nbsp;</strong>

results in the following string

**bold **

http://jsfiddle.net/29hqet7s/6/
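To make the failure mode concrete, here is a minimal sketch of the whitespace part of CommonMark's flanking rule. The helper names are hypothetical and are not part of Turndown; the punctuation conditions of the rule are deliberately left out.

```javascript
// Unicode whitespace as CommonMark defines it:
// general category Zs (which includes U+00A0), plus tab, LF, FF and CR.
const UNICODE_WHITESPACE = /[\p{Zs}\t\n\f\r]/u;

// Hypothetical helper: an opening delimiter run must not be followed by whitespace.
function openingRunCanOpen(inner) {
  const first = Array.from(inner)[0];
  return first !== undefined && !UNICODE_WHITESPACE.test(first);
}

// Hypothetical helper: a closing delimiter run must not be preceded by whitespace.
function closingRunCanClose(inner) {
  const chars = Array.from(inner);
  const last = chars[chars.length - 1];
  return last !== undefined && !UNICODE_WHITESPACE.test(last);
}

// True when `**${inner}**` passes the whitespace checks on both sides.
function isSafeStrongContent(inner) {
  return openingRunCanOpen(inner) && closingRunCanClose(inner);
}

console.log(isSafeStrongContent("bold"));       // true
console.log(isSafeStrongContent("bold\u00A0")); // false: NBSP before the closing **
console.log(isSafeStrongContent("bold "));      // false: ASCII space before the closing **
```

The non-breaking space fails the same check as a regular space, which is why `**bold\u00A0**` does not render as bold.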

This appears to be related to both #222 and #102.

I'm wondering what the expected result should be. I would expect it to either output the entity or strip it. In the latter issue you mentioned this exact problem with trailing whitespace, so I assume that is the expected behavior? I would be happy to send a PR for either option (possibly behind a flag).

oliverguenther changed the title from "Trailing non-breaking space turned into regular space" to "Trailing non-breaking space not caught by trailing whitespace detection" Aug 13, 2018
oliverguenther added a commit to opf/openproject that referenced this issue Aug 23, 2018
Turndown doesn't replace &nbsp; before syntax identifiers, so
`<strong>foobar&nbsp;</strong>` will result in `**foobar **` (with the
space being `\u00A0`).

We can fix this by replacing all &nbsp; coming from CKEditor beforehand,
knowingly modifying the behavior of the spaces.

Bug report at turndown: mixmark-io/turndown#250
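The workaround described in this commit message could look roughly like the sketch below. It assumes the editor output is available as an HTML string. `replaceNbsp` is a hypothetical name, not a CKEditor or Turndown function, and the actual OpenProject change may differ.

```javascript
// Hypothetical pre-processing step, run on the HTML before it is converted.
// It replaces the named entity, both numeric entities and the literal
// U+00A0 character with a regular ASCII space.
function replaceNbsp(html) {
  return html.replace(/&nbsp;|&#160;|&#xA0;|\u00A0/gi, " ");
}

// The cleaned string would then be handed to the HTML-to-Markdown converter.
const cleaned = replaceNbsp("<strong>foobar&nbsp;</strong>");
console.log(cleaned); // <strong>foobar </strong>
```

As the commit message notes, this knowingly changes behavior: intentional non-breaking spaces elsewhere in the document become breakable ASCII spaces.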
oliverguenther added a commit to opf/openproject that referenced this issue Aug 24, 2018
Turndown doesn't replace &nbsp; before syntax identifiers, so
`<strong>foobar&nbsp;</strong>` will result in `**foobar **` (with the
space being `\u00A0`).

We can fix this by replacing all &nbsp; coming from CKEditor beforehand,
knowingly modifying the behavior of the spaces.

Bug report at turndown: mixmark-io/turndown#250
@martincizek
Collaborator

> I'm wondering what the expected result should be.

Turndown's current approach for ASCII whitespace is to move the space out of the inline element. For example, `<strong>foo </strong>bar` becomes `**foo** bar`.

The best thing we can do for non-ASCII whitespace is actually the same, except that it should not be merged with other whitespace. So the best result we can think of is probably:

`<strong>foo&nbsp;</strong> bar` -> `**foo**\u00A0 bar`
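The idea of moving edge whitespace out of the inline element without merging it can be sketched as follows. This is an illustration only, not Turndown's implementation. `wrapInline` is a hypothetical helper, and it relies on JavaScript's `\s` class, which matches U+00A0 as well as ASCII whitespace.

```javascript
// Hypothetical helper: wrap `content` in `delimiter`, but keep any leading or
// trailing whitespace outside the delimiters, character for character.
function wrapInline(content, delimiter) {
  // Group 1: leading whitespace, group 2: the core text, group 3: trailing whitespace.
  const [, leading, core, trailing] = /^(\s*)([\s\S]*?)(\s*)$/.exec(content);
  if (core === "") {
    // Whitespace-only content: emit it unchanged, with no delimiters at all.
    return leading + trailing;
  }
  return leading + delimiter + core + delimiter + trailing;
}

// <strong>foo&nbsp;</strong> bar: the NBSP moves outside and stays distinct
// from the ASCII space that follows it.
console.log(JSON.stringify(wrapInline("foo\u00A0", "**") + " bar"));

// <strong>foo </strong>bar: the ASCII case behaves as before.
console.log(wrapInline("foo ", "**") + "bar"); // **foo** bar
```

Because the whitespace is copied verbatim rather than normalized, the non-breaking space survives as U+00A0 and is not collapsed into the adjacent ASCII space.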

We have done a thorough analysis on the topic: https://github.com/orchitech/turndown/wiki/Whitespace

And a PR will come soon. :)

martincizek added a commit to orchitech/turndown that referenced this issue Mar 31, 2020
Do not merge ASCII and non-ASCII whitespace.
Make sure non-ASCII whitespace is moved out of inline elements to prevent generating broken Markdown.
Fix mixmark-io#102.
Fix mixmark-io#250.
martincizek added a commit to orchitech/turndown that referenced this issue Jul 6, 2020
Do not merge ASCII and non-ASCII whitespace.
Make sure non-ASCII whitespace is moved out of inline elements to prevent generating broken Markdown.
Fix mixmark-io#102.
Fix mixmark-io#250.
michbart pushed a commit to orchitech/turndown that referenced this issue Nov 30, 2020
Do not merge ASCII and non-ASCII whitespace.
Make sure non-ASCII whitespace is moved out of inline elements to prevent generating broken Markdown.
Fix mixmark-io#102.
Fix mixmark-io#250.