GitHub - rdonkin/chardet: Python 2/3 compatible character encoding detector.

Chardet: The Universal Character Encoding Detector

Detects

ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP (Japanese)
EUC-KR, ISO-2022-KR (Korean)
KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
ISO-8859-5, windows-1251 (Bulgarian)
ISO-8859-1, windows-1252 (Western European languages)
ISO-8859-7, windows-1253 (Greek)
ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
TIS-620 (Thai)

Note

Our ISO-8859-2 and windows-1250 (Hungarian) probers have been temporarily disabled until we can retrain the models.

Requires Python 2.7 or 3.4+.

Installation

Install from PyPI:

pip install chardet

Documentation

For users, docs are now available at https://chardet.readthedocs.io/.

Command-line Tool

chardet comes with a command-line script which reports on the encodings of one or more files:

% chardetect somefile someotherfile
somefile: windows-1252 with confidence 0.5
someotherfile: ascii with confidence 1.0
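
The same report can be produced from Python. The following is a minimal sketch, not the implementation of the chardetect script: it assumes the library's `chardet.detect()` function, which takes a byte string and returns a dictionary with `encoding` and `confidence` keys. The helper names `format_report` and `detect_file` are hypothetical and exist only for illustration.

```python
import sys


def format_report(name, result):
    """Format a detection result like the command-line output shown above."""
    return "{}: {} with confidence {}".format(
        name, result["encoding"], result["confidence"]
    )


def detect_file(path):
    """Read a file as raw bytes and ask chardet to guess its encoding."""
    # Third-party dependency (pip install chardet), imported here so that
    # format_report stays usable without it.
    import chardet

    with open(path, "rb") as handle:
        return chardet.detect(handle.read())


if __name__ == "__main__":
    # Usage: python report.py somefile someotherfile
    for path in sys.argv[1:]:
        print(format_report(path, detect_file(path)))
```

Files are opened in binary mode because detection works on undecoded bytes. The confidence value is an estimate, so a low score such as 0.5 is better treated as a guess than as a definitive answer.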

About

This is a continuation of Mark Pilgrim's excellent chardet. Previously, two versions had to be maintained: one that supported Python 2.x and one that supported Python 3.x. We've recently merged with Ian Cordasco's charade fork, so there is now one coherent version that works on Python 2.7 and 3.4+.

maintainer:	Dan Blanchard

