-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
better handling of percent-encoded image references #605
base: master
Are you sure you want to change the base?
better handling of percent-encoded image references #605
Conversation
Don't mangle non-encoded non-ASCII characters.
Correctly handle percent encoded URLs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me
IMAGE_SOURCE_FROM_BYTES sounds somewhat useful. Nothing like that is used anywhere?
if ( stream.isNull() ) { | ||
// Try again in case cover_image_path is percent-encoded | ||
cover_image_path = LVCombinePaths(codeBase, DecodeHTMLUrlString(cover_image_href)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure we don't need this "try again" ? We're still doing it elsewhere.
The thing is that we may get crappy EPUBs, with sometimes the %escapes literally in the zip item names (that is, the correctly %encoded attribute values in some XML, once decoded, won't be found, because the zip contains the original value with %), so these "try again" were added to handle these crappy cases. Also, zip item names don't have any encoding, so we meet stuff as bytes, which may be or not be utf8.
Some possibly related issues that I managed to find again:
koreader/koreader#7661
Bottomest item in #326.
Search for "try again" in ef95bcc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure we don't need this "try again" ? We're still doing it elsewhere. The thing is that we may get crappy EPUBs, with sometimes the %escapes literally in the zip item names (that is, the correctly %encoded attribute values in some XML, once decoded, won't be found, because the zip contains the original value with %), so these "try again" were added to handle these crappy cases.
In theory, but are there actually instances of this?
Also, zip item names don't have any encoding, so we meet stuff as bytes, which may be or not be utf8.
Some possibly related issues that I managed to find again: koreader/koreader#7661 Bottomest item in #326. Search for "try again" in ef95bcc.
At least the current code in ldomNode::getObjectImageSource
does it in the right order: correct behavior first (percent-encoded), ugly workaround second…
But that code in ldomDocumentFragmentWriter::convertHref
seems icky:
// Depending on what's calling us, href may or may not have
// gone thru DecodeHTMLUrlString() to decode %-encoded bits.
// We'll need to try again with DecodeHTMLUrlString() if not
// initially found in "pathSubstitutions" (whose filenames went
// thru DecodeHTMLUrlString(), and so did 'codeBase').
Possibly percent-decoding a string 2 times…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, but are there actually instances of this?
I think so, if the code "try again with" is there :) (May be - don't remember - the original code was bad, and I added the first chunk to do it right, keeping the "bad" chunk as a fallbadk (as it has most often never hurt us) just in case.
At least the current code in ldomNode::getObjectImageSource does it in the right order: correct behavior first (percent-encoded), ugly workaround second…
So, re-order existing "try again" code in the right order - rather than removing the "try again".
I don't remember all this and don't want to dig in - so trusting you. Just remember crengine doesn't enforce/require strictly-perfect-EPUB, but was made (with such "try again" branches) to handle the crappy ones we've met over the years.
Nope. |
DecodeHTMLUrlString
so it does not mangle non-ASCII characterslunasvgDrawImageHelper
to properly handle a percent-encoded image URLCf. koreader/koreader#12656.
This change is