Autocorrect

Spelling corrector in Python. Currently supports English, Polish, Turkish, Russian and Ukrainian, but you can easily add new languages.

Based on: https://github.com/phatpiglet/autocorrect

Installation

pip install autocorrect

Examples

>>> from autocorrect import Speller
>>> spell = Speller()
>>> spell("I'm not sleapy and tehre is no place I'm giong to.")
"I'm not sleepy and there is no place I'm going to."

>>> spell = Speller(lang='pl')
>>> spell('ptaaki latatją kluczmm')
'ptaki latają kluczem'

Adding new languages

First, add the language's special letters to autocorrect/constants.py.

Next, you need a large body of text. The easiest way is to download a Wikipedia dump. For example, for Spanish go to https://dumps.wikimedia.org/eswiki/latest/ and download eswiki-latest-pages-articles.xml.bz2, then decompress it:

bzip2 -d eswiki-latest-pages-articles.xml.bz2

After that:

>>> from autocorrect.word_count import count_words
>>> count_words('eswiki-latest-pages-articles.xml', 'es')

Then pack the resulting word_count.json into the data directory:

tar -zcvf autocorrect/data/es.tar.gz word_count.json
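
To give a feel for what the counting and packing steps involve, here is a minimal sketch using only the standard library. It is not the code in autocorrect.word_count: the function names count_words_sketch and pack_counts, the Spanish letter set, and the in-memory approach are all assumptions made for illustration. A real Wikipedia dump is far too large to hold in one string and would need to be processed in chunks.

```python
import io
import json
import re
import tarfile
from collections import Counter


def count_words_sketch(text, letters="a-záéíóúüñ"):
    """Count lowercase word occurrences (hypothetical helper, not the library's).

    `letters` is a regex character-class body; the default is an assumed
    Spanish alphabet, standing in for what constants.py would define.
    """
    pattern = re.compile("[" + letters + "]+")
    return Counter(pattern.findall(text.lower()))


def pack_counts(counts, archive_path):
    """Store the counts as word_count.json inside a gzip-compressed tar file."""
    payload = json.dumps(dict(counts), ensure_ascii=False).encode("utf-8")
    info = tarfile.TarInfo(name="word_count.json")
    info.size = len(payload)
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.addfile(info, io.BytesIO(payload))


# Tiny stand-in for the text extracted from a dump.
counts = count_words_sketch("El niño come. El perro come también.")
print(counts.most_common(2))
```

Calling pack_counts(counts, 'es.tar.gz') would produce an archive with the same layout as the tar command above.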

Speed

%timeit spell("I'm not sleapy and tehre is no place I'm giong to.")
410 µs ± 6.84 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit spell("There is no comin to consiousnes without pain.")
186 ms ± 1.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
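
The gap between the two timings is plausibly explained by double typos: "consiousnes" is two edits away from "consciousness", so the search has to look two edits deep. The sketch below follows the classic edit-distance candidate generation found in Norvig-style correctors. It is an assumption that this library works the same way, and the names edits1 and edits2 are illustrative rather than taken from its API.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """Return every string exactly one edit operation away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def edits2(word, alphabet=string.ascii_lowercase):
    """Return every string reachable by applying two edit operations."""
    return {e2 for e1 in edits1(word, alphabet) for e2 in edits1(e1, alphabet)}


single = edits1("comin")
double = edits2("comin")
# The second set is far larger, which is why double typos cost more time.
print(len(single), len(double))
```

Here "coming" already appears among the single-edit candidates for "comin", whereas "consciousness" only shows up in the two-edit set for "consiousnes".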

Contribute

https://github.com/fsondej/autocorrect

Todo

some words are corrected to implausible versions (see english2 in unit_tests)
Python 2 doesn't support correction with Polish special characters
add an option to disable double typos for speed
loading spellers multiple times appears to leak memory
in double typos, the same words are checked twice
clean the repo: https://stackoverflow.com/questions/2116778/reduce-git-repository-size
maybe use Git LFS
