Hi, my question is this: for English, if I understand correctly, the output of the model is directly the index of a character, so it can be mapped back and forth between characters and the index sequence. For Japanese, what is the output of the model, and how do I create a mapping between indices and Japanese kanji?
I see the english_characters, but what about Japanese? And to get the japanese_characters, should token_type be 'char' or 'bpe'?
ENGLISH_CHARACTERS = [a-z],
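For illustration, here is a minimal sketch of how a character vocabulary like ENGLISH_CHARACTERS maps between characters and indices. The exact list (space, apostrophe, a-z) and the `<blank>` token at index 0 are assumptions for this example, not necessarily the toolkit's actual layout. A real CTC or transducer decoder also runs its own search step before this index-to-character lookup.

```python
import string

# Hypothetical English character vocabulary. Reserving index 0 for a
# blank token is an assumption for this sketch.
BLANK = "<blank>"
ENGLISH_CHARACTERS = [BLANK, " ", "'"] + list(string.ascii_lowercase)

char_to_index = {c: i for i, c in enumerate(ENGLISH_CHARACTERS)}
index_to_char = dict(enumerate(ENGLISH_CHARACTERS))


def encode(text):
    """Map a transcript to vocabulary indices (KeyError on unknown chars)."""
    return [char_to_index[c] for c in text.lower()]


def decode(indices):
    """Map output indices back to text, skipping the blank at index 0."""
    return "".join(index_to_char[i] for i in indices if i != 0)


ids = encode("hi there")
print(ids)          # [10, 11, 1, 22, 10, 7, 20, 7]
print(decode(ids))  # hi there
```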
@ymzlygw I think for Japanese, Korean, and Chinese we should use subwords instead of characters. If you can define a vocabulary that contains all the characters of the language, as we do for English, then you can use character mode. As far as I know, those languages have characters that are combinations of "some characters in an alphabet", so defining a character vocabulary file would be quite a lot of work for you.
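If you do want character mode, a minimal sketch of "define a vocabulary that contains all the characters" is to collect every distinct character from the training transcripts and assign each one an index. The example transcripts, the `<blank>` token at index 0, and the NFKC normalization step are assumptions for illustration. On a real Japanese corpus this typically yields thousands of entries, and any character never seen in training has no index (a KeyError in this sketch).

```python
import unicodedata


def build_char_vocab(transcripts, blank="<blank>"):
    """Collect every distinct character in the transcripts (hypothetical helper)."""
    chars = set()
    for line in transcripts:
        # NFKC folds full-width Latin letters/digits and half-width
        # katakana into their canonical forms before counting.
        chars.update(unicodedata.normalize("NFKC", line).strip())
    # Sort for a deterministic index assignment; blank stays at index 0.
    return [blank] + sorted(chars)


# Hypothetical training transcripts.
transcripts = ["今日は晴れです", "明日は雨です"]
vocab = build_char_vocab(transcripts)

char_to_index = {c: i for i, c in enumerate(vocab)}
index_to_char = dict(enumerate(vocab))

ids = [char_to_index[c] for c in "今日は雨"]
print(len(vocab))                              # 10 (9 characters + blank)
print("".join(index_to_char[i] for i in ids))  # 今日は雨
```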
Hi, I tried to train a Chinese model, but the results don't seem good. I followed the Conformer steps the same way as for English. Could you suggest how to properly train a Chinese model? Thanks!