Better handling for unclosed tags #1088

Open
ricardoboss opened this issue Jun 2, 2024 · 0 comments

@ricardoboss

Hi there!

I'm having trouble parsing some HTML that is not under my control and contains unclosed tags. For example:

```html
<html>
<head>
	<title>Hello World</title>

	<link href="test.css">
</head>
<body>
	<h1>Hello World</h1>
</body>
</html>
```

As you can see, the `<link>` tag is never closed (although in HTML `<link>` is a void element, so it does not need an end tag). This causes the parser to nest everything after it inside the `<link>` element, as its children, and to append a "shadow" `</link></head><body></body></html>` at the end.

This makes it impossible to traverse the DOM as expected: for example, `<body>` ends up inside `<link>` instead of being a child of `<html>`.

I'd like a way to configure how such cases are handled: either by specifying which tags cannot contain content (auto-closing tags), or by a setting that makes the parser automatically close any still-open tags once their parent tag has been closed.
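Both options can be sketched with a small stack-based tree builder. This is a minimal illustration in Python, not the Dart `html` package's API: it uses the standard library's `html.parser.HTMLParser` only as a tokenizer, and `TreeBuilder`, `void_elements`, and `close_on_parent_end` are hypothetical names. The void-element list follows the HTML Living Standard, and text nodes are dropped to keep the output short. With both options off, the builder reproduces the nesting problem described above, though not necessarily the Dart parser's exact output.

```python
from html.parser import HTMLParser

# Void elements as listed in the HTML Living Standard: they never have
# content, and an end tag for them is not allowed.
HTML_VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


class Node:
    """A bare element node: a tag name plus child elements (no text)."""

    def __init__(self, tag: str) -> None:
        self.tag = tag
        self.children: list["Node"] = []


class TreeBuilder(HTMLParser):
    """Hypothetical tree builder exposing the two options from the issue."""

    def __init__(self, void_elements=frozenset(), close_on_parent_end=False):
        super().__init__()
        self.void_elements = void_elements
        self.close_on_parent_end = close_on_parent_end
        self.root = Node("#document")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        # Option 1: a void element is closed immediately, so it can
        # never become the parent of the content that follows it.
        if tag not in self.void_elements:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in self.void_elements:
            return  # e.g. a stray </link>: nothing was left open for it
        if self.stack[-1].tag == tag:
            self.stack.pop()
        elif self.close_on_parent_end and any(n.tag == tag for n in self.stack[1:]):
            # Option 2: the end tag of an ancestor implicitly closes
            # every element still open inside it.
            while self.stack[-1].tag != tag:
                self.stack.pop()
            self.stack.pop()
        # Otherwise the mismatched end tag is ignored (the naive behavior).


def parse(markup: str, **options) -> Node:
    builder = TreeBuilder(**options)
    builder.feed(markup)
    builder.close()
    return builder.root


def serialize(node: Node) -> str:
    # Writes an end tag for every element (void ones included) so the
    # nesting is visible at a glance.
    return "".join(f"<{c.tag}>{serialize(c)}</{c.tag}>" for c in node.children)


SAMPLE = """<html>
<head>
\t<title>Hello World</title>

\t<link href="test.css">
</head>
<body>
\t<h1>Hello World</h1>
</body>
</html>"""

print(serialize(parse(SAMPLE)))
# <html><head><title></title><link><body><h1></h1></body></link></head></html>
print(serialize(parse(SAMPLE, void_elements=HTML_VOID_ELEMENTS)))
# <html><head><title></title><link></link></head><body><h1></h1></body></html>
print(serialize(parse(SAMPLE, close_on_parent_end=True)))
# <html><head><title></title><link></link></head><body><h1></h1></body></html>
```

Either option alone fixes this sample. A void-element list addresses the root cause for `<link>`, while closing open children on the parent's end tag also covers non-void elements that are left open.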

Any help would be appreciated!

mosuem transferred this issue from dart-archive/html on Oct 29, 2024.