Segmenter results for Thai sentence seem incorrect. #3208
Unanswered
riajain0412
asked this question in
Q&A
-
The breakpoints are UTF-8 byte offsets into the string, not character counts.
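To make that concrete: every code point in the Thai Unicode block takes three bytes in UTF-8. The reported offsets 0 9 21 39 51 60 66 are all multiples of three, so if the sentence contains only Thai characters, they would correspond to code point positions 0 3 7 13 17 20 22. Here is a minimal sketch using only the Rust standard library that converts byte offsets into character indices and slices out the segments. The helper names, the sample string, and its breakpoints are illustrative assumptions. They are not the sentence from the question and not actual segmenter output.

```rust
/// Convert UTF-8 byte offsets into character (Unicode scalar value) indices.
/// Hypothetical helper; panics if an offset is not on a char boundary.
fn byte_offsets_to_char_indices(text: &str, byte_offsets: &[usize]) -> Vec<usize> {
    byte_offsets
        .iter()
        // Count the chars that precede each byte offset.
        .map(|&b| text[..b].chars().count())
        .collect()
}

/// Slice the text between each pair of consecutive byte offsets.
/// Hypothetical helper; the offsets must be sorted and on char boundaries.
fn segments<'a>(text: &'a str, byte_offsets: &[usize]) -> Vec<&'a str> {
    byte_offsets
        .windows(2)
        .map(|w| &text[w[0]..w[1]])
        .collect()
}

fn main() {
    // Illustrative sample: 7 Thai code points, 3 bytes each = 21 bytes.
    let text = "ภาษาไทย";
    // Hand-picked byte offsets for illustration, not segmenter output.
    let breakpoints = [0, 12, 21];

    // Prints [0, 4, 7]
    println!("{:?}", byte_offsets_to_char_indices(text, &breakpoints));
    // Prints ภาษา | ไทย
    println!("{}", segments(text, &breakpoints).join(" | "));
}
```

Because the offsets are byte positions, they can be used directly to slice a Rust `&str`, which is also indexed by bytes. The conversion is only needed when a character-based position is required.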
-
Okay. Thank you for your help.
-
I ran the code below to check the breakpoints for a Thai sentence:
It gave this result: 0 9 21 39 51 60 66.
However, the Thai sentence above has only 17-18 characters, so how come the ICU4X segmenter gives 39, 51, 60, etc. as breakpoints?
Is this the expected result? If so, how should I interpret these indices?