SimZhou/pinyin_autocorrection

Pinyin Autocorrection

This is a simple pinyin autocorrection solution.

This repository is for learning and discussion purposes only.

Functions

1. Pinyin correction

This program takes a sequence of pinyin as input and guesses the most likely sequence you intended.

For instance,

if you want to type "中国" but enter "zhognguo" (a typo), the program corrects it to "zhongguo":

>>> correct("zhognguo")
"zhongguo"

if you want to type "清华大学" but mistype it as "qignhuadaxeu", it is corrected to "qinghuadaxue":

>>> correct("qignhuadaxeu")
"qinghuadaxue"

2. Pinyin split

This function splits a continuous sequence of pinyin inputs into separate tokens. For example:

>>> split("pingguo")
"ping guo"
>>> split("qinghuadaxue")
"qing hua da xue"

Algorithm

Under the hood, the algorithm combines edit distance, a probabilistic model, and n-grams.
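As a rough illustration of the edit-distance and probabilistic parts, here is a minimal sketch that corrects a single syllable. It is not the repository's code: the names `SYLLABLE_COUNTS`, `edits1`, and `correct_syllable` are hypothetical, the frequencies are made up, and the n-gram step (scoring a syllable in the context of its neighbours) is left out.

```python
# Toy syllable frequencies (made-up numbers, for illustration only).
SYLLABLE_COUNTS = {
    "zhong": 50, "guo": 40, "qing": 30, "hua": 25,
    "da": 60, "xue": 20, "ping": 15,
}
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string within one edit of `word`.

    Edits are deletion, adjacent transposition, substitution, and insertion.
    """
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct_syllable(syllable):
    """Pick the most frequent known syllable within one edit of the input."""
    if syllable in SYLLABLE_COUNTS:
        return syllable  # already a known syllable
    candidates = [c for c in edits1(syllable) if c in SYLLABLE_COUNTS]
    if not candidates:
        return syllable  # nothing close enough; leave the input unchanged
    return max(candidates, key=SYLLABLE_COUNTS.get)


print(correct_syllable("zhogn"))
print(correct_syllable("xeu"))
```

Transpositions are included because both typos in the examples above ("zhogn", "xeu") are swapped adjacent letters. A fuller version would weight candidates by how likely each kind of typo is, and by the surrounding syllables.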

Dynamic programming is also used for the pinyin split function.
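One way such a dynamic-programming split could look is sketched below. This is an assumption about the approach, not the repository's implementation: `VALID_SYLLABLES` is a tiny hypothetical vocabulary, `split_pinyin` is a made-up name, and preferring the segmentation with the fewest syllables is a stand-in for whatever scoring the real tool uses.

```python
# Tiny hypothetical vocabulary; a real one would list all valid pinyin syllables.
VALID_SYLLABLES = {
    "ping", "guo", "qing", "hua", "da", "xue",
    "pin", "gu", "o", "xu", "e",
}
MAX_SYLLABLE_LEN = 6  # longest standard pinyin syllables, e.g. "zhuang"


def split_pinyin(text):
    """Split `text` into valid syllables, preferring the fewest tokens.

    Returns a space-separated string, or None if no full split exists.
    """
    n = len(text)
    # best[i] holds the shortest token list covering text[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_SYLLABLE_LEN), end):
            piece = text[start:end]
            if best[start] is not None and piece in VALID_SYLLABLES:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return " ".join(best[n]) if best[n] is not None else None


print(split_pinyin("pingguo"))
print(split_pinyin("qinghuadaxue"))
```

Each position reuses the best split of every shorter prefix, so the work grows linearly with input length instead of enumerating every possible segmentation.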

About

A showcase for a pinyin autocorrection tool.
