This is a simple pinyin autocorrection tool.
This repository is for learning and discussion purposes only.
The program takes a sequence of pinyin as input and guesses the sequence you most likely intended.
For instance,
if you want to type "中国" but enter "zhognguo" (a typo), the program corrects it to "zhongguo":
>>> correct("zhognguo")
if you want to type "清华大学" but mistype it as "qignhuadaxeu", it will be corrected to "qinghuadaxue":
>>> correct("qignhuadaxeu")
The split function breaks a continuous sequence of pinyin input into separate tokens. For example:
>>> split("pingguo")
"ping guo"
>>> split("qinghuadaxue")
"qing hua da xue"
Under the hood, the algorithm combines edit distance, a probabilistic model, and n-grams.
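
To make the edit-distance and probability parts concrete, here is a minimal sketch, not this repository's actual code. It corrects a single syllable by generating every string within one edit (including adjacent transpositions) and keeping the most frequent known syllable. The names SYLLABLE_COUNTS, edits1, and correct_syllable, as well as the counts themselves, are made up for illustration, and the n-gram step over whole sequences is left out.

```python
# Hypothetical toy frequencies; a real model would estimate these from a corpus.
SYLLABLE_COUNTS = {"zhong": 50, "guo": 40, "qing": 30, "hua": 35,
                   "da": 60, "xue": 25, "ping": 20}

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(token):
    """Return every string within one edit of token.

    An edit is a deletion, an adjacent transposition, a replacement,
    or an insertion of a single letter.
    """
    splits = [(token[:i], token[i:]) for i in range(len(token) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct_syllable(token):
    """Return the most frequent known syllable within one edit of token."""
    if token in SYLLABLE_COUNTS:
        return token  # already a known syllable
    candidates = [c for c in edits1(token) if c in SYLLABLE_COUNTS]
    if not candidates:
        return token  # nothing close enough; leave the input unchanged
    return max(candidates, key=SYLLABLE_COUNTS.get)
```

With this toy table, correct_syllable("zhogn") yields "zhong" and correct_syllable("xeu") yields "xue". A fuller version would score whole syllable sequences with n-gram probabilities rather than ranking each syllable on its own.
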
Dynamic programming is also used in the pinyin split function.
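
As an illustration of how dynamic programming can drive the splitting, the sketch below segments a string into known syllables and prefers the segmentation with the fewest syllables. It is an assumption about the general approach, not the repository's implementation: the name split_pinyin, the tiny SYLLABLES table, and the fewest-syllables tie-breaking rule are all hypothetical.

```python
# Hypothetical, deliberately tiny table; a real one would list every valid pinyin syllable.
SYLLABLES = {"ping", "guo", "qing", "hua", "da", "xue", "xi", "an", "xian"}


def split_pinyin(text):
    """Segment text into known syllables, using as few syllables as possible.

    Returns a space-separated string, or None if no segmentation exists.
    """
    max_len = max(len(s) for s in SYLLABLES)
    # best[i] is the shortest syllable list covering text[:i], or None if unreachable.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if best[start] is not None and piece in SYLLABLES:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return " ".join(best[-1]) if best[-1] is not None else None
```

Here split_pinyin("qinghuadaxue") gives "qing hua da xue", and the ambiguous input "xian" stays as one syllable rather than becoming "xi an". A probabilistic version could replace the syllable count with a sum of log-probabilities so that the most likely segmentation wins instead of the shortest one.
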