
UTF-8 has no endianness #1

Open

lilydjwg opened this issue Apr 12, 2016 · 1 comment

lilydjwg commented Apr 12, 2016

The Unicode encodings that do have endianness are UTF-16, UCS-2, UTF-32, and UCS-4, so the description isn't quite right. It looks like your program is using UTF-16LE?
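To illustrate the point, here is a minimal Python sketch. The sample character and the variable names are my own choices for illustration, not anything from the project. It encodes one character in several encodings: UTF-8 produces a single fixed byte sequence, while the UTF-16 and UTF-32 results depend on the byte order chosen.

```python
# Illustrative sketch: the sample character and names are assumptions, not project code.
text = "中"  # code point U+4E2D

# UTF-8 is defined as a sequence of bytes, so there is only one possible result.
utf8 = text.encode("utf-8")

# UTF-16 and UTF-32 are defined in terms of 16-bit and 32-bit code units,
# so the byte order of each unit has to be chosen.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")
utf32_le = text.encode("utf-32-le")
utf32_be = text.encode("utf-32-be")

for name, data in [
    ("UTF-8", utf8),
    ("UTF-16LE", utf16_le),
    ("UTF-16BE", utf16_be),
    ("UTF-32LE", utf32_le),
    ("UTF-32BE", utf32_be),
]:
    print(f"{name:9} {data.hex(' ')}")
```

The LE and BE outputs are byte-reversed versions of each other within each code unit, whereas the UTF-8 line has no such variant.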

Author

lilydjwg commented Apr 12, 2016

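Also, section 6.1 of the README has quite a few problems.

「序号」 ("serial number")? If you mean "code point", I'd suggest translating it as 「码点」. UTF-16 is also variable-length: it takes two or four bytes per character (surrogate pairs).

I can't make sense of the sentence "unicode, UCS-2, and UTF-16 are numerically identical". Unicode itself cannot be stored or transmitted directly, whereas the latter two can be, once a byte order has been specified. They have no notion of a "numeric value".

Currently UTF-8 is at most four bytes long, because Unicode code points are only assigned up to U+10FFFF.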
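As a rough sketch of why UTF-16 is variable-length, the hypothetical helper below (`utf16_code_units` is a name made up for this example) splits a code point into UTF-16 code units by hand. Code points up to U+FFFF fit in one 16-bit unit (two bytes); anything above needs a surrogate pair (four bytes).

```python
# Illustrative sketch: utf16_code_units is a hypothetical helper, not project code.
def utf16_code_units(ch: str) -> list:
    """Return the UTF-16 code units for a single character."""
    cp = ord(ch)
    if cp < 0x10000:
        # Basic Multilingual Plane: one 16-bit code unit.
        return [cp]
    # Supplementary planes: split the 20-bit offset into a surrogate pair.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]

bmp_units = utf16_code_units("中")      # U+4E2D
astral_units = utf16_code_units("😀")   # U+1F600

print([hex(u) for u in bmp_units])
print([hex(u) for u in astral_units])

# Cross-check against Python's built-in encoder.
print(len("中".encode("utf-16-le")), len("😀".encode("utf-16-le")))
```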
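A quick way to check the four-byte limit is to encode the code points at each length boundary. This is only a sketch, and the names `boundaries` and `utf8_lengths` are chosen for illustration.

```python
# Illustrative sketch: names are assumptions chosen for this example.
# Code points on either side of each UTF-8 length boundary,
# ending with the highest code point Unicode defines.
boundaries = [0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]

utf8_lengths = {cp: len(chr(cp).encode("utf-8")) for cp in boundaries}

for cp, length in utf8_lengths.items():
    print(f"U+{cp:04X}: {length} byte(s)")

# Python refuses to build a character beyond U+10FFFF at all.
try:
    chr(0x110000)
    beyond_range_rejected = False
except ValueError:
    beyond_range_rejected = True
```

Since U+10FFFF itself already fits in four bytes, no valid code point needs a longer sequence.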
