GitHub - SimZhou/pinyin_autocorrection: a showcase for a pinyin autocorrection tool

Pinyin Autocorrection

This is a simple pinyin autocorrection tool.

This repository is for learning and discussion purposes only.

The program takes a sequence of pinyin as input and guesses the most likely sequence you intended.

For instance,

If you want to type "中国" but enter "zhognguo" (a typo), the program corrects it to "zhongguo":

>>> correct("zhognguo")
"zhongguo"

If you want to type "清华大学" but mistype it as "qignhuadaxeu", it is corrected to "qinghuadaxue":

>>> correct("qignhuadaxeu")
"qinghuadaxue"

The program also provides a split function, which breaks a continuous string of pinyin input into separate syllables. For example:

>>> split("pingguo")
"ping guo"
>>> split("qinghuadaxue")
"qing hua da xue"

Under the hood, the correction algorithm combines edit distance, a probabilistic model, and n-grams.
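
To give a feel for the edit-distance part, here is a minimal sketch of candidate generation for a single mistyped syllable. It is not the repository's code: the names `SYLLABLES`, `edits1`, and `candidates` are hypothetical, the syllable table is a tiny toy set, and the probabilistic and n-gram ranking of candidates is left out.

```python
import string

# Hypothetical toy table; a real one would list every valid pinyin syllable.
SYLLABLES = {"zhong", "guo", "qing", "hua", "da", "xue", "ping"}


def edits1(token):
    """Return all strings one edit away from token
    (one deletion, adjacent transposition, replacement, or insertion)."""
    letters = string.ascii_lowercase
    splits = [(token[:i], token[i:]) for i in range(len(token) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def candidates(token):
    """Return the valid syllables that match token exactly or within one edit."""
    if token in SYLLABLES:
        return {token}
    return edits1(token) & SYLLABLES


print(candidates("zhogn"))
print(candidates("xeu"))
```

When several candidates survive, a probabilistic model over syllables (for example n-gram frequencies) would be needed to choose among them; this sketch stops at producing the candidate set.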

Dynamic programming is used for the pinyin split function.
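
One way such a dynamic-programming split could work is sketched below. This is an illustration under assumptions, not the repository's implementation: `split_pinyin` and `SYLLABLES` are hypothetical names, the syllable table is a toy subset, and the rule of preferring the segmentation with the fewest syllables is an assumed tie-breaking choice.

```python
# Hypothetical toy table, including short syllables that create ambiguity.
SYLLABLES = {"ping", "pin", "guo", "gu", "o", "qing", "hua", "da", "xue", "xu", "e", "a"}

MAX_SYLLABLE_LEN = 6  # the longest pinyin syllables (e.g. "zhuang") have six letters


def split_pinyin(text):
    """Split text into valid syllables, preferring the fewest syllables.

    Returns a space-separated string, or None if no segmentation exists.
    """
    n = len(text)
    # best[i] is the fewest-syllable segmentation of text[:i], or None if there is none.
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_SYLLABLE_LEN), i):
            piece = text[j:i]
            if best[j] is not None and piece in SYLLABLES:
                candidate = best[j] + [piece]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return " ".join(best[n]) if best[n] is not None else None


print(split_pinyin("pingguo"))
print(split_pinyin("qinghuadaxue"))
```

Each prefix is solved once and reused, so the running time grows linearly with the input length for a fixed maximum syllable length. A more realistic version might score segmentations by probability rather than by syllable count.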
