Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XCOMMONS-2349: Improve performance of AttributeFilter in HTMLCleaner #159

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

michitux
Copy link
Contributor

This includes two implementations in two commits, the first is based on ThreadLocal, which is changed to use the execution context in the second commit, which I believe should be the better option. As it is not clear that XPathFactory.newInstance(); is thread-safe, I have also added a synchronized-block which should also be added to all other uses if we agree that this is an issue.

Note: I'm not sure that this is the right approach. I'll stop investigating here for now, but I did a quick profile (in debug mode, no idea how much this impacts performance) of the 1000 HTML macros test page and it seemed to me as if XPath evaluation accounts for almost 90% of the HTMLFilter running time. This is quite bad imho as HTMLFilter still needs about as much time as the rest of the Macro transformation. I think this is way too much even though it is a roughly factor 10 improvement (factor 5 overall less running time). Therefore, we might either need to cache even more (maybe cache the XPathExpression as in the first commit, didn't benchmark that) or use a different implementation for this filter.

Jira issue: https://jira.xwiki.org/browse/XCOMMONS-2349

@michitux
Copy link
Contributor Author

Removed the synchronized-block after a discussion in #xwiki.

* Store the XPathExpression in a thread-local variable.

Note: this will lead to memory leaks if threads are re-used for other applications after stopping XWiki.
* Store the XPathFactory in the execution context.
* Do not protect the call to XPathFactory.newInstance() as it seems thread-safe after all.
* Replace XPath by simple loop over all elements.
* Make sure that replaced attributes are sorted lexicographically to
  guarantee deterministic results.
@michitux
Copy link
Contributor Author

I've just pushed a new version that simply doesn't use any XPath after having learned how to traverse the DOM in another context. In my personal opinion the code isn't more complicated, in fact imho it is much simpler than the version that cached the XPath-Factory. The only thing I didn't check is if there are penalties due to modification of the DOM during the iteration as the DOM list is live. Benchmarks with 100 runs on current 14.5-SNAPSHOT:

Before 1000HTMLMacros:
0.26493 +- 0.00230 seconds

After 1000HTMLMacros:
0.083247 +- 0.000539 seconds

As comparison 1000Macros:
0,051540 +- 0,000744

A short profile shows that org.xwiki.rendering.internal.transformation.XWikiRenderingContext#getTargetSyntax which is called twice by the HTML macro takes about 50% of the running time of the HTML macro. When I wrote the code that calls it I expected this to be a simple property read and not something expensive that reads a configuration option and loads a document (from the cache) to get an XObject.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant