Use Rmodepdf and LuaXML to display block HTML elements #469
Comments
You can wrap HTML fragments in a dummy element to prevent parsing issues. I also think that you can process the text nodes for Markdown, so it should be possible to use it here. This is a proof of concept:

kpse.set_program_name "luatex"
local domobject = require("luaxml-domobject")
local transform = require("luaxml-transform")

local function parse(block)
  -- wrap the text in a container element, so it doesn't matter that the HTML markup can be incomplete
  -- <body> is a good candidate
  local dom = domobject.html_parse("<body>" .. block .. "</body>")
  return dom
end

local function should_expand(element)
  -- test if we should expand markdown in this element
  local element_name = element:get_element_name()
  -- do some tests with the element name
  -- ...
  -- for now, just return true
  return true
end

local function process_markdown(text)
  -- this is just an example. the real function would need to be much more complex
  text = text:gsub("%*(..-)%*", "\\textit{%1}")
  return text
end

local function expand_markdown(element)
  -- recursively loop over child elements and expand markdown in text nodes
  for i, child in ipairs(element:get_children()) do
    if child:is_element() then
      -- recurse for child elements
      expand_markdown(child)
    elseif child:is_text() and should_expand(element) then
      -- run this only on text nodes in elements that should be processed
      child._text = process_markdown(child._text)
    end
  end
end

local transformer = transform.new()
-- disable escaping of TeX commands and braces
transformer.unicodes = {
  [92] = nil,
  [123] = nil,
  [125] = nil,
}

-- actions for HTML elements
transformer:add_action("i", "\\textit{%s}")
transformer:add_action("b", "\\textbf{%s}")

local test = "Hello <i>world</i>! Another text <b>with *markdown*</b>"
local dom = parse(test)
expand_markdown(dom:root_node())
-- debugging print of the processed DOM
print(dom:serialize())
-- and now convert to TeX
print(transformer:process_dom(dom))

For this test string:
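The two placeholder functions in the proof of concept could be fleshed out a little further. The following is only a sketch in plain Lua, not part of LuaXML or the Markdown package: the `verbatim_elements` set is an assumption made for illustration, and `should_expand` here takes an element name (a string) rather than a DOM element so that the snippet runs without LuaXML.

```lua
-- Assumption: elements whose text should be left untouched by Markdown expansion.
local verbatim_elements = { code = true, pre = true, script = true, style = true }

-- Decide by element name whether Markdown in its text nodes should be expanded.
local function should_expand(element_name)
  return not verbatim_elements[element_name:lower()]
end

-- Still a toy converter: handle strong emphasis first, then plain emphasis,
-- so that "**x**" is not consumed by the single-asterisk rule.
local function process_markdown(text)
  text = text:gsub("%*%*(..-)%*%*", "\\textbf{%1}")
  text = text:gsub("%*(..-)%*", "\\textit{%1}")
  return text
end

print(process_markdown("**bold** and *italic*"))
```

In the proof of concept, the same check could be applied to the result of `element:get_element_name()`. A real converter would still need to handle nesting, escaping, and the rest of the CommonMark inline syntax.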
That's a compelling approach: First, parse the Markdown document as an HTML document, construct a DOM, and only then convert the text nodes from Markdown to LaTeX. However, it seems incompatible with the current approach of CommonMark in general and the Markdown package in particular, where we first parse the whole document as a Markdown document and then identify HTML code within the document. An alternative would be to redefine
However, this seems like a lot of plumbing in TeX, which runs the risk of breaking commands that change catcodes such as
\markdownRendererInlineHtmlFragment{2}{<i>}{world}{</i>}
However, we can't just do that without breaking compatibility, since users may already rely on
Well, I don't know much about CommonMark or how the Markdown package processes the document, so I am not sure what the best way is and cannot comment on this :( I can only help on the LuaXML end, I am afraid.
That's OK, few people do! I am happy to put in the work on the Markdown side of things.
However, things would still break if, instead of "world", there were some brittle content that needs to appear at the top level of a file. We can still fix this by putting "world" into a separate file.
Come to think of it, in CommonMark, block HTML elements do not necessarily represent complete HTML fragments that can be represented as a DOM either. Therefore, we would need to do something similar to the command
Both changes seem significant and possibly breaking for some users. Let's do something simpler instead and only use Rmodepdf and LuaXML for raw HTML blocks and HTML file transclusion, as these are both very likely to contain complete HTML fragments.
As discussed with @michal-h21 before and after their TUG 2024 talk (slides, preprint), we may want to look into using the LuaXML library with the default transformation rules from rmodepdf to display block HTML elements.
For inline HTML elements, this does not seem applicable, because inline HTML elements produce renderers that do not necessarily represent complete HTML fragments that can be represented as a DOM:
We can't easily change this, since the CommonMark standard allows Markdown markup within inline HTML elements.
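To make "complete HTML fragment" concrete, here is a rough heuristic in plain Lua that checks whether the tags in a string are balanced. The function name `is_complete_fragment` and the pattern-based scanning are assumptions made for illustration; this is neither a real HTML parser nor something that LuaXML or the Markdown package provides.

```lua
-- HTML void elements never take a closing tag.
local void_elements = {
  area = true, base = true, br = true, col = true, embed = true, hr = true,
  img = true, input = true, link = true, meta = true, source = true,
  track = true, wbr = true,
}

-- Hypothetical helper: returns true when every opening tag in `fragment`
-- is closed again in the right order. Attribute values containing ">"
-- are not handled; comments and doctypes are ignored.
local function is_complete_fragment(fragment)
  local stack = {}
  for closing, name, attrs in fragment:gmatch("<(/?)(%a[%w%-]*)([^>]*)>") do
    name = name:lower()
    local self_closing = attrs:sub(-1) == "/"
    if closing == "/" then
      -- a closing tag must match the most recently opened element
      if stack[#stack] ~= name then return false end
      stack[#stack] = nil
    elseif not void_elements[name] and not self_closing then
      stack[#stack + 1] = name
    end
  end
  return #stack == 0
end

print(is_complete_fragment("<i>world</i>"))
print(is_complete_fragment("<i>"))
```

Under this heuristic, an inline opening tag such as `<i>` or a closing tag such as `</i>` on its own is not a complete fragment, whereas a raw HTML block or a transcluded HTML file usually is.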