Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallible UTF-8 decoding? #50

Open
askeksa-google opened this issue Sep 23, 2022 · 2 comments
Open

Fallible UTF-8 decoding? #50

askeksa-google opened this issue Sep 23, 2022 · 2 comments

Comments

@askeksa-google
Copy link

The UTF-8 decoding API in Dart has an allowMalformed option that specifies the behavior on malformed input. allowMalformed: true emits replacement characters for malformed portions. allowMalformed: false throws an exception on malformed input.

While allowMalformed: true corresponds directly to string.new_lossy_utf8, it seems difficult to implement allowMalformed: false without validating the input beforehand, which seems silly, since the decoding performs validation anyway.

Would it be possible to add another decoding instruction (e.g. string.new_utf8_try) that fails in a non-trapping manner on malformed input, for instance by producing null?

It's not important for the instruction to communicate why it failed, since the failing case is not considered performance critical, and thus any diagnostic information that the Dart API wants to provide about the failure can just be obtained by scanning the input separately, once the failure situation has been established.

@dcodeIO
Copy link
Contributor

dcodeIO commented Sep 23, 2022

Somewhat related to #19, where in general it seems that trapping operations aren't very useful. Returning null seems fine to me, so an appropriate language-level exception can be thrown as a result, or null returned. Seems preferable over letting the instruction throw, as it seems likely that such an exception would have to be caught and then replaced with a language-level exception anyway.

@Liedtke
Copy link

Liedtke commented Jan 19, 2023

V8 has implemented this here:
string.new_utf8_try, opcode 0xfb8f.
The instruction traps (out of bounds) on invalid offset / length values.
The instruction returns null on any encoding errors.
Otherwise it is equivalent to string.new_utf8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants