《上古音系》 contains PUA code points left over from GB 18030 conversion #13

Open
Artoria2e5 opened this issue Sep 12, 2016 · 2 comments

Comments

Artoria2e5 commented Sep 12, 2016

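While idling over at whatwg/encoding#27, I searched Google for the PUA character “” and ended up at http://ytenx.org/dciangx/cjeng// .

Because PUA characters are provisional by nature and have no Unicode character property data, I suggest applying the PUA mapping I compiled on Wikipedia and replacing them with the official characters, which have existed since Unicode 4.1. For link compatibility, you could assume that any PUA character in a request path comes from GB conversion, replace them all, and then handle the request as usual.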
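A minimal sketch of the replacement and the link-compatibility idea, in Python. The names `PUA_TO_STANDARD`, `replace_pua`, and `normalize_request_path` are made up for illustration, and nothing here reflects how the site is actually implemented. The single table entry is only a sample and should be checked against the mapping table before use; the real table would hold every pair.

```python
from urllib.parse import quote, unquote

# Hypothetical table: PUA code point -> official code point.
# The one entry below is a sample (assumed pair, verify it against the
# mapping table); a real table would list every pair to be replaced.
PUA_TO_STANDARD = {
    0xE81E: 0x9FB4,
}


def replace_pua(text, table=PUA_TO_STANDARD):
    """Replace every PUA character listed in `table`; leave the rest alone."""
    return text.translate(table)


def normalize_request_path(path, table=PUA_TO_STANDARD):
    """Decode a percent-encoded path, replace PUA characters, re-encode.

    Assumes any PUA character in the path came from GB conversion.
    """
    decoded = unquote(path)  # UTF-8 percent-decoding
    return quote(replace_pua(decoded, table), safe="/")
```

With this, an old link whose path contains a PUA character could be rewritten (or redirected) to the path that uses the official character, so existing links keep working after the data is fixed.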

Owner

BYVoid commented Mar 27, 2020

Sorry, it has been years. What are the corresponding correct characters?

Author

Artoria2e5 commented Mar 28, 2020

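There are 24 characters to replace (see https://www.unicode.org/L2/L2006/06394-gb18030-2005.txt ). Once those are done, I suggest searching for [\uE700-\uE800]; anything still left is probably GBK residue and can be handled with the private-use character mapping table on Wikipedia.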
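The follow-up search could be sketched like this in Python. `find_pua_residue` is a hypothetical helper name, and the character class is the one given in the comment above; widen it if other PUA ranges also need checking.

```python
import re

# Same range as the character class in the comment: U+E700 through U+E800.
PUA_RESIDUE = re.compile("[\ue700-\ue800]")


def find_pua_residue(text):
    """Return (offset, 'U+XXXX') for each leftover PUA character in `text`."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in PUA_RESIDUE.finditer(text)
    ]
```

Running this over each data file after the replacement gives a list of offsets and code points, which can then be looked up one by one in the mapping table.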
