⚠️ Self-closing tags get corrupted 🚨 #83
Comments
Steps to reproduce:
var scrape = require('html-metadata');
scrape.loadFromString('<div itemscope><span itemprop="price" content="139.90" /> <span itemprop="priceCurrency" content="PLN" /></div>').then(e => console.log(JSON.stringify(e)));
// {"schemaOrg":{"items":[{"properties":{"priceCurrency":["PLN"],"price":[" "]}}]}}
Possible resolution:
var dom = microdataDom(htmlparser.parseDOM(html, {
decodeEntities: true,
+ recognizeSelfClosing: true
}), config);
var cheerio = require('cheerio');
cheerio.load('<div itemscope><span itemprop="price" content="139.90" /> <span itemprop="priceCurrency" content="PLN" /></div>').html()
// '<html><head></head><body><div itemscope><span itemprop="price" content="139.90"> <span itemprop="priceCurrency" content="PLN"></span></span></div></body></html>'
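To illustrate why the price ends up as " ", here is a minimal sketch of a toy tag parser with an opt-in flag for self-closing syntax. It is not htmlparser2 or cheerio code: `parseTags` and `textContent` are hypothetical helpers written only for this example, and the `recognizeSelfClosing` option here merely mimics the name used in the proposed patch. Without the flag, `<span ... />` is treated as an open tag, so the whitespace and the second span are nested inside the first one.

```javascript
// Toy parser for illustration only; NOT the htmlparser2 or cheerio implementation.
function parseTags(html, { recognizeSelfClosing = false } = {}) {
  const root = { name: '#root', children: [] };
  const stack = [root];
  // Matches an opening/closing tag (with optional trailing "/") or a run of text.
  const tagRe = /<(\/?)([a-zA-Z][\w-]*)([^>]*?)(\/?)>|([^<]+)/g;
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    const [, closing, name, , selfClose, text] = m;
    const parent = stack[stack.length - 1];
    if (text !== undefined) {
      parent.children.push({ text });
    } else if (closing) {
      // Pop back to the nearest open element with the same name.
      for (let i = stack.length - 1; i > 0; i--) {
        if (stack[i].name === name) { stack.length = i; break; }
      }
    } else {
      const node = { name, children: [] };
      parent.children.push(node);
      // Only close the element immediately when "/>" is honoured.
      if (!(selfClose && recognizeSelfClosing)) stack.push(node);
    }
  }
  return root;
}

// Concatenates all text found below a node.
function textContent(node) {
  if (node.text !== undefined) return node.text;
  return node.children.map(textContent).join('');
}

const html = '<div itemscope><span itemprop="price" content="139.90" /> <span itemprop="priceCurrency" content="PLN" /></div>';
const strict = parseTags(html);
const lenient = parseTags(html, { recognizeSelfClosing: true });

// First span swallows the following whitespace and the second span.
console.log(JSON.stringify(textContent(strict.children[0].children[0])));  // " "
// First span is closed immediately and has no text.
console.log(JSON.stringify(textContent(lenient.children[0].children[0]))); // ""
```

With the flag off, the div has a single child (the price span); with it on, the div has three children (span, whitespace, span), which matches the structure the markup intends.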
Looks like cheeriojs/cheerio#598 might have a solution (setting {xmlMode: true}?)
It's not enough (see # 1). And I'm not sure whether "xml mode" supports html5.
The library doesn't support html5 tags (e.g. a self-closing span).
When parsing the following:
It adds "foo ... bar" to the price attribute until it finds a closing </span> tag.
The issue is in chtml, which replaces /> with >.