-
-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix whitespace loss, retain doctypes, remove inlined styles from HEAD #9
Conversation
Fix a couple things related to whitespace and style selection.
CI failure is due to Travis failing when pulling the meeseeks dependency from Github. Not sure what that's about, as deps are pulled successfully locally. Tests are passing locally. The next release of meeseeks will be compatible with this change. |
Looks great @willbarrett, thanks! For reference, the whitespace issue is found in I think an API that makes it easy to exchange the HTML parser would be the best way of going about this. Then we can provide an option for a pure elixir dependency in case a developer wouldn't want to compile rust code. The other changes are great too. I've a few ideas for what to improve, but I'll get back to you when I've tried out this PR 😄 |
I've opened #10, but it'll require rebasing of this branch. |
Nice. I'll rebase + follow that approach Monday. |
Sounds good. I'll be working on removing the rest of the |
Sorry, got busy with some unrelated work. I've merged in #11, and I'll start to work on the issues you've highlighted. |
No worries - I got busy too. I'll rebase on this branch today and update it now that your changes are merged. |
FYI, I've worked on the whitespace issue in #12. From what I've read, we want to preserve whitespace only when child elements consists of text nodes and inline-block nodes. So I've let Premailex taken care of the quirks in both Meeseeks and Floki. |
I switched to use Meeseeks for the |
In our case, the full readable whitespace is desirable - premailex is just one step of a long compilation process, so we'd like the result to be readable rather than fully concatenated. Perhaps we could add an option to retain all whitespace vs. only the lexically-relevant whitespace? |
That makes sense, I'll update #12 with the option to retain all whitespace. |
OK, cool. Since you're working on that changeset, is there a part of this that you'd like me to break off into a separate PR? I'd be happy to help out since I brought the issues to light. |
That would be great! I prefer to keep the PR's small, so if you could break this one up into separate PR's:
|
Awesome. I've got a bunch of PR reviews to get to, but I'll open at least one of those this afternoon. Retaining doctypes is somewhat harder under Floki. Is it acceptable if some of these changes require Meeseeks to work properly? |
Yeah, it's acceptable. |
I was thinking it through, and have rewritten the code so instead of removing whitespace by default, it'll only remove the whitespace when doing plain text conversion. So using Meeseeks, you will be able to preserve the original formatting. I'll include the rest of the improvements in a |
Nice. Yesterday afternoon went sideways, hopefully I'll have a PR for you this AM. |
Hey @danschultzer - I've tweaked the functionality described in issue #8 , and added some test cases. Happy to discuss this work! Here's the high-level:
Improvements:
<b>this space</b> <i>should be respected</i>
TODO
Why Meeseeks?
Currently, there isn't a sane HTML parser written in Elixir or Erlang, according to multiple places on the interwebs, and copious research. Meeseeks provides a good wrapper and nice functionality around the html5ever parser, written in Rust, and the maintainer's a super-smart super-helpful guy. The main bug being fixed here, the missing whitespace between tags, will result in broken display for most users of this library at some point. I think it's worth moving to a different parser for the functionality necessary to resolve the issue.