Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adopt save-and-restore pattern for libxml2 error handlers everywhere #2172

Open
flavorjones opened this issue Jan 13, 2021 · 5 comments
Open

Comments

@flavorjones
Copy link
Member

flavorjones commented Jan 13, 2021

See #2168 and #2169 for details, but the short version is that we should be more rigorous about saving-and-restoring error handlers and error handler metadata around libxml2 calls, in case any are being made recursively within Nokogiri.

For example, these lines in Nokogiri::HTML::Document::EncodingReader are calling HTML::SAX::PushParser to parse a chunk from the IO read callback of a regular document parse:

handler = SAXHandler.new
parser = Nokogiri::HTML::SAX::PushParser.new(handler)
parser << chunk rescue Nokogiri::SyntaxError

To allow users to do similarly complex things, we should always save-and-restore the error callbacks (which are the only global state I can think of that we regularly manipulate).

We're doing this in the HTML::SAX::PushParser class to cover ourselves in the aforementioned case:

Nokogiri_structured_error_func_save_and_set(&handler_state, NULL, NULL);
status = htmlParseChunk(ctx, chunk, size, Qtrue == _last_chunk ? 1 : 0);
Nokogiri_structured_error_func_restore(&handler_state);

typedef struct _libxmlStructuredErrorHandlerState {
void *user_data;
xmlStructuredErrorFunc handler;
} libxmlStructuredErrorHandlerState ;
void init_xml_syntax_error();
void Nokogiri_structured_error_func_save(libxmlStructuredErrorHandlerState *handler_state);
void Nokogiri_structured_error_func_save_and_set(libxmlStructuredErrorHandlerState *handler_state,
void *user_data,
xmlStructuredErrorFunc handler);
void Nokogiri_structured_error_func_restore(libxmlStructuredErrorHandlerState *handler_state);

void
Nokogiri_structured_error_func_save(libxmlStructuredErrorHandlerState *handler_state)
{
/* this method is tightly coupled to the implementation of xmlSetStructuredErrorFunc */
handler_state->user_data = xmlStructuredErrorContext;
handler_state->handler = xmlStructuredError;
}
void
Nokogiri_structured_error_func_save_and_set(libxmlStructuredErrorHandlerState *handler_state,
void *user_data,
xmlStructuredErrorFunc handler)
{
Nokogiri_structured_error_func_save(handler_state);
xmlSetStructuredErrorFunc(user_data, handler);
}
void
Nokogiri_structured_error_func_restore(libxmlStructuredErrorHandlerState *handler_state)
{
xmlSetStructuredErrorFunc(handler_state->user_data, handler_state->handler);
}

This issue is opened to make sure we remember to do this everywhere.

It's somewhat related to wrapping we need to do around any libxml2 callbacks which re-enter the Ruby interpreter and how we handle those exceptions, all of which are detailed at #1610.

@nwellnhof
Copy link

Also see xmlCtxtSetErrorHandler available since libxml2 2.13.0.

@flavorjones
Copy link
Member Author

Thanks for the pointer, @nwellnhof, that's great, and I'll make sure the error functions take the additional parser context argument so that I can polyfill.

Also note, the function names were changed in 04001e7 to be noko__structured_error_func_save_and_set, etc.

In that commit I also did a bit of the described wrapping.

@flavorjones flavorjones added this to the v1.17.0 milestone Jun 26, 2024
@nwellnhof
Copy link

I don't think the polyfill approach will work here. Save-and-restore is the only option for older libxml2.

@flavorjones
Copy link
Member Author

@nwellnhof Sorry, you're right, I used "polyfill" when I meant "shim", something like

void
noko__structured_error_func_save_and_set(
  xmlParserCtxtPtr ctxt,
  libxmlStructuredErrorHandlerState *handler_state,
  void *user_data,
  xmlStructuredErrorFunc handler
)
{
#ifdef HAVE_XMLCTXTSETERRORHANDLER
  xmlCtxtSetErrorHandler(ctxt, handler, user_data);
#else
  noko__structured_error_func_save(handler_state);
  xmlSetStructuredErrorFunc(user_data, handler);
#endif
}

though now I see that there are flavors of this for XPath, XInclude, TestReader, and schemas, too ... in any case, we'll migrate towards these APIs over time for sure.

@nwellnhof
Copy link

That would work but the main benefit is that xmlCtxtSetErrorHandler only has to be called once on a parser context, preferably right after creating it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants