doesn't work when specific Japanese characters exist in a tag #250
Comments
For example, the UTF-8 byte sequences of the following characters end with \xe2, \x80 or \x8b, so the same problem occurs.
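As a rough way to check whether a given character falls into this group, the sketch below tests the last byte of its UTF-8 encoding against the three bytes named above. The helper name endsWithMaskByte() and the sample characters ('一' U+4E00, '下' U+4E0B, 'あ' U+3042) are illustrative assumptions and not part of the plugin.

```php
<?php
// Hypothetical helper (not part of the tag plugin): reports whether the
// last byte of a string is one of the three bytes \xe2, \x80 or \x8b.
function endsWithMaskByte(string $s): bool
{
    if ($s === '') {
        return false;
    }
    // substr($s, -1) is the final byte; look it up in the three-byte set.
    return strpos("\xe2\x80\x8b", substr($s, -1)) !== false;
}

// Sample characters written as raw UTF-8 bytes so the file encoding does not matter.
$samples = [
    "\xE4\xB8\x80", // '一' (U+4E00), ends with \x80
    "\xE4\xB8\x8B", // '下' (U+4E0B), ends with \x8b
    "\xE3\x81\x82", // 'あ' (U+3042), ends with \x82
];

foreach ($samples as $char) {
    // bin2hex() prints the raw bytes of each sample.
    printf("%s %s\n", bin2hex($char), endsWithMaskByte($char) ? 'affected' : 'safe');
}
```

With these samples, the first two are reported as affected and the third as safe.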
Thanks for the extra info; I think I now understand the cause. The intent of the trim() call was to remove U+200B, a multibyte character that is three bytes long in UTF-8. However, because trim() is not multibyte-aware, it treats those three bytes as three separate characters to strip. So we should use this here instead: $tags = str_replace("\xe2\x80\x8b", '', $tags); // strip word/wordpad breaklines(U+200b)
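The difference between the two calls can be sketched as follows. This is a minimal illustration, assuming the original call had the form trim($tags, "\xe2\x80\x8b"); the input string is made up for the example.

```php
<?php
$zwsp = "\xe2\x80\x8b";          // U+200B ZERO WIDTH SPACE as UTF-8 bytes
$tags = "\xE4\xB8\x80" . $zwsp;  // hypothetical input: '一' followed by U+200B

// trim() reads its second argument as a set of single bytes, so it strips
// any trailing \xe2, \x80 or \x8b, including the last byte of '一'.
$trimmed = trim($tags, $zwsp);

// str_replace() only removes the complete three-byte sequence.
$replaced = str_replace($zwsp, '', $tags);

echo bin2hex($trimmed), "\n";   // e4b8   (truncated, no longer valid UTF-8)
echo bin2hex($replaced), "\n";  // e4b880 ('一' intact, U+200B removed)
```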
Thanks, I think it works well. I confirmed that the following small test code worked as expected.
Thanks for testing. trim() only strips at the ends of the string, whereas str_replace() removes matches everywhere. I think that is fine for tags. I will implement it.
The Tag plugin doesn't work when specific Japanese characters, e.g. '一' (U+4E00), exist in a tag, as follows. This is because the UTF-8 byte sequence of '一' (\xE4\xB8\x80) gets corrupted by the following code in syntax_plugin_tag_tag::handle (tag.php). It removes \x80 from \xE4\xB8\x80, and the result becomes the invalid sequence \xE4\xB8.
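The corruption can be reproduced outside the plugin with a few lines of PHP. This is only a sketch: the exact trim() call is assumed from the description above, and preg_match() with the /u modifier is used here as one way to test for valid UTF-8.

```php
<?php
$tag = "\xE4\xB8\x80";                 // '一' (U+4E00) as UTF-8 bytes

// Assumed form of the call in handle(): the mask is read byte by byte,
// so the trailing \x80 of '一' is stripped.
$result = trim($tag, "\xe2\x80\x8b");

echo bin2hex($result), "\n";           // e4b8

// preg_match() with the /u modifier returns 1 for valid UTF-8 input and
// false when the subject is not valid UTF-8.
$before = preg_match('//u', $tag) === 1;
$after  = preg_match('//u', $result) === 1;

var_dump($before);                     // bool(true)
var_dump($after);                      // bool(false)
```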