Content trimming #222

Open
fabioimpe opened this issue Apr 11, 2018 · 6 comments
Comments

@fabioimpe

Hi,
just a question, is there any way to avoid trimming content during the conversion?

For example,
<p>hello </p>
should be converted to:
hello <-- note the trailing whitespace

Thank you

@domchristie
Collaborator

Not at the moment. What's your use case?

@domchristie
Collaborator

I should add: whitespace is significant in markdown, so to avoid potential issues when the resulting markdown is parsed, the input is stripped of all insignificant whitespace, and trimmed. For example, if whitespace were not trimmed in <em>hello world </em>, then this would convert to _hello world _, which when parsed would result in: <p>_hello world _</p> :/
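
To illustrate the point above, here is a minimal sketch of hoisting flanking whitespace outside the emphasis delimiters instead of leaving it inside. The helper name `wrapEmphasis` is hypothetical and this is not Turndown's actual implementation; it only shows why `_hello world_ ` parses as emphasis while `_hello world _` does not.

```javascript
// Hypothetical helper (not part of Turndown): wrap inline content in an
// emphasis delimiter while keeping leading/trailing whitespace OUTSIDE
// the delimiters, so the closing delimiter directly follows a non-space.
function wrapEmphasis(content, delimiter = "_") {
  // Split into leading whitespace, inner text, and trailing whitespace.
  const [, leading, inner, trailing] = /^(\s*)([\s\S]*?)(\s*)$/.exec(content);
  // Whitespace-only content gets no delimiters at all.
  if (inner === "") return leading + trailing;
  return `${leading}${delimiter}${inner}${delimiter}${trailing}`;
}

console.log(JSON.stringify(wrapEmphasis("hello world ")));
console.log("It's" + wrapEmphasis(" theoretically ") + "possible");
```

With this approach the whitespace survives in the output, but next to the delimiters rather than between them.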

@gabibianconi

Hi! I was wondering if you can find a way to only trim content when the element is strong, italic, etc?
So, <em>hello world </em> will still be converted to _hello world_
and <p> hello world </p> would be converted to hello world (white space at the beginning and the end)

My use case:
HTML: <span>Date:</span> 7/27/2023 11:00 AM to 12:00 PM
Current markdown: Date:7/27/2023 11:00 AM to 12:00 PM
Expected markdown: Date: 7/27/2023 11:00 AM to 12:00 PM (whitespace after Date:)

@alexander-turner

alexander-turner commented Sep 8, 2024

I have a situation where the HTML is <p>It's<i> theoretically </i>possible</p>. This gets converted to the markdown It's_theoretically_possible, which is not what I want.

This isn't getting shunted outside of the _ characters because I need to parse the <i> tags individually to get around the lack of support for #445. So, I recommend just merging that PR instead.

@martincizek
Collaborator

> Hi! I was wondering if you can find a way to only trim content when the element is strong, italic, etc?
> So, <em>hello world </em> will still be converted to _hello world_
> and <p> hello world </p> would be converted to hello world (white space at the beginning and the end)

This has no solution in pure Markdown; it would need to be expressed in HTML. As Turndown is a converter to Markdown, it prefers collapsing the whitespace in a manner that doesn't break flanking delimiters; see https://spec.commonmark.org/0.31.2/#emphasis-and-strong-emphasis.
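
The flanking rules referenced above can be sketched roughly as follows. This is a simplified reading of the CommonMark emphasis rules for a single-character delimiter only; the function names are hypothetical, and the whitespace and punctuation tests are approximations rather than the spec's exact character classes.

```javascript
// Simplified sketch of CommonMark flanking checks for ONE delimiter
// character at `index` (delimiter runs longer than 1 are not handled).
// Start and end of the string count as whitespace, as in the spec.
const isWhitespace = (ch) => ch === "" || /\s/.test(ch);
// Approximation of "Unicode punctuation" (general categories P and S).
const isPunctuation = (ch) => ch !== "" && /[\p{P}\p{S}]/u.test(ch);

function flanking(text, index) {
  const before = index > 0 ? text[index - 1] : "";
  const after = index + 1 < text.length ? text[index + 1] : "";
  const left = !isWhitespace(after) &&
    (!isPunctuation(after) || isWhitespace(before) || isPunctuation(before));
  const right = !isWhitespace(before) &&
    (!isPunctuation(before) || isWhitespace(after) || isPunctuation(after));
  return { left, right, before, after };
}

// `*` may open emphasis when left-flanking; `_` has an extra intraword restriction.
function canOpenEmphasis(text, index) {
  const { left, right, before } = flanking(text, index);
  if (text[index] === "*") return left;
  return left && (!right || isPunctuation(before));
}

// `*` may close emphasis when right-flanking; `_` is again stricter.
function canCloseEmphasis(text, index) {
  const { left, right, after } = flanking(text, index);
  if (text[index] === "*") return right;
  return right && (!left || isPunctuation(after));
}

const broken = "_hello world _";
console.log(canCloseEmphasis(broken, broken.lastIndexOf("_")));
```

Under these rules a closing `_` preceded by a space is not right-flanking, which is why untrimmed content would leave the delimiters as literal underscores.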

HTML also collapses whitespace, and it does so in favour of the first whitespace in a collapsible whitespace run. So even HTML may not work as you want; you can try it in your browser: I say <em style="background-color: red;"> hello world </em> to all.
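
As a rough model of that behaviour, the sketch below collapses whitespace across a flat list of inline text segments, keeping only the first whitespace character of each run even when the run spans an element boundary. The function `collapseSegments` and the flat-segment representation are assumptions made for illustration; real browsers apply the CSS white-space processing rules to the rendered tree.

```javascript
// Hypothetical model: each array entry is the text of one inline node,
// in document order, inside a single block.
function collapseSegments(segments) {
  // At the start of a block, leading whitespace is dropped, so begin
  // as if a space had just been emitted.
  let previousWasSpace = true;
  const out = [];
  for (const segment of segments) {
    let result = "";
    for (const ch of segment) {
      if (/[ \t\n\r]/.test(ch)) {
        // Only the FIRST whitespace of a run survives, as a single space.
        if (!previousWasSpace) result += " ";
        previousWasSpace = true;
      } else {
        result += ch;
        previousWasSpace = false;
      }
    }
    out.push(result);
  }
  // Drop the single trailing space at the end of the block, if any.
  for (let i = out.length - 1; i >= 0; i--) {
    if (out[i] === "") continue;
    if (out[i].endsWith(" ")) out[i] = out[i].slice(0, -1);
    break;
  }
  return out;
}

// "I say<em> hello world </em> to all." as three text segments:
console.log(collapseSegments(["I say", " hello world ", " to all."]));
```

In this model both surviving spaces stay inside the `<em>` segment and the space after the closing tag is dropped, which matches the red background extending over the spaces in the browser test.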

> My use case:
> HTML: <span>Date:</span> 7/27/2023 11:00 AM to 12:00 PM
> Current markdown: Date:7/27/2023 11:00 AM to 12:00 PM
> Expected markdown: Date: 7/27/2023 11:00 AM to 12:00 PM (whitespace after Date:)

Actually, I wasn't able to reproduce this. You can try it on Turndown's demo site.

@martincizek
Collaborator

martincizek commented Sep 8, 2024

> I have a situation where the HTML is <p>It's<i> theoretically </i>possible</p>. This gets converted to the markdown It's_theoretically_possible, which is not what I want.

I can't reproduce this. Maybe it's because of your rules mentioned below?

> This isn't getting shunted outside of the _ characters because I need to parse the <i> tags individually to get around the lack of support for #445. So, I recommend just merging that PR instead.

I'm not sure how <i> tags relate to malformed lists, where only <ul> and <li> are concerned. Still, the best solution for #445 is to fix the DOM before passing it to Turndown.

And if you need intra-word emphasis, you may want to set the emphasis delimiter (the emDelimiter option) to '*', as the Markdown spec suggests.
