Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply the standard rule mechanism also to text nodes #304

Closed

Conversation

martincizek
Copy link
Collaborator

There might be strong reasons for creating rules also for text nodes. This simple patch makes it possible. We believe it even makes the current code more consistent, as we can rely on the node name "#text".

Backround:
Our use case is related to text escaping. As the docs admit, it is quite simplistic and adds unnecessary backslashes. Unfortunately any simple GFM escaping also necessarily corrupts data. We aim at nearly-lossless conversions, so we have developed GfmEscape to address context-dependent escaping.

I admit that full escaping complexity might be overkill for Turndown.
But Turndown is still a fantastic framework for user-developed conversions, so let's just provide the desired hook. :)

Thank you!

Copy link
Collaborator

@domchristie domchristie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @martincizek!
Thanks for all your work and your detailed pull requests 💫

I really like how this cleans up the process function. However I'm not sure it's the right solution yet. As you've figured out, escaping in Turndown is simple, but naive. It works for most cases, however it's often customised. I'd like a offer an improved system for customising escaping, perhaps along the lines of addEscape, removeEscape (custom escaping also discussed in #242 (comment)). Is this something you'd be interested in helping out on?

Comment on lines -163 to -169
var replacement = ''
if (node.nodeType === 3) {
replacement = node.isCode ? node.nodeValue : self.escape(node.nodeValue)
} else if (node.nodeType === 1) {
replacement = replacementForNode.call(self, node)
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

Comment on lines +8 to +14
replacement: function (content, node, options) {
if (node.isCode) return node.nodeValue
return options.escapes.reduce(function (accumulator, escape) {
return accumulator.replace(escape[0], escape[1])
}, node.nodeValue).trim()
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight concern here is that it requires a fair bit of knowledge to add to the behaviour i.e. a developer has to remember to:

  • pass through unchanged code content
  • iterate over the existing escapes and performs the replacements
  • trim the value

@martincizek
Copy link
Collaborator Author

Hey @domchristie, thank you for your reply! As I went through all the Turndown's code over time (again, nice job!), I admit another approach would be better - introducing textReplacement (similarly to blankReplacement etc.).

We have done it, please check this commit, we just somehow forgot to make a new PR and deprecate this PR. :) Is this the way to go?

Regarding relation to escaping:

  • I believe textReplacement is still a valid extension point even if there were totally customizable escaping
  • Even though we can now override the escape() method, it is not called for isCode nodes and the node and options arguments are not passed. All these things are necessary for implementing for comprehensive escaping.

@martincizek
Copy link
Collaborator Author

Regarding help with the escaping subsystem:

  • Currently, iterative escaping is used. I.e. all text is escaped according to first replacement, the result is escaped with the second replacement, etc.
  • Iterative escaping can only work just in the simplest cases, that's why we created the UnionReplacer project. The README also describes its Turndown-related background and issues with different escaping approaches.

So I believe that Turndown would need something like UnionReplacer to do the custom escaping right, efficiently and user friendly (without need of placeholdering magic).

The mentioned project GfmEscape is actually very thoroughly configured UnionReplacer with comprehensive replacements, so this one is not appropriate to integrate.

But UnionReplacer seems to be a perfect match for configurable escaping (actually it is designed for it :-)). I know Turndown does not use any libs, but this one:

  1. was actually designed to be used with tools like Turndown
  2. it is something like 100 lines (but extreme effort was put into making it this small)
  3. Our performance tests suggested that the only eventual performance footprint is related to increased code amount (those 100 lines). It becomes more efficient than iterative escaping for around 5 replacements depending on the JS engine.

Is integrating UnionReplacer dependency an option when making the configurable escaping? If it is an option, then we can help. :)

If it were done this way, the comprehensive GfmEscape rules can be used as some sort of Turndown plugin then.

@martincizek
Copy link
Collaborator Author

Closing this PR in favour of #339, which provides related hooks in a way more consistent with the rest of Turndown.

@martincizek martincizek closed this Jul 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants