Releases · BYVoid/uchardet

05 Dec 12:25

Jehan

v0.0.5

886e03a

Version 0.0.5 released.

Revert the UTF-16 and UTF-32 label change:
it was an error to specify endianness for texts with a BOM.
The Unicode standard explicitly warns against it, and it
even (partially) breaks conversions.
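
The conversion problem can be illustrated with a small sketch. It uses Python's built-in codecs rather than iconv, which is an assumption made only to keep the example self-contained. With the generic `UTF-16` label the decoder reads the BOM to pick the byte order and drops it; with an endianness-specific label the BOM is decoded as an ordinary U+FEFF character and stays in the output.

```python
# The text "hi" encoded as little-endian UTF-16, preceded by a BOM (FF FE).
data = b"\xff\xfe" + "hi".encode("utf-16-le")

# Generic label: the BOM selects the byte order and is removed from the text.
generic = data.decode("utf-16")

# Endianness-specific label: the BOM is kept as a leading U+FEFF character.
explicit = data.decode("utf-16-le")

print(repr(generic))
print(repr(explicit))
```

The stray U+FEFF in the second result is the kind of partial breakage a BOM-prefixed text can suffer when it is labelled with an explicit byte order.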
Added support for:
- French: Windows-1252
- German: ISO-8859-1, Windows-1252
- Esperanto: ISO-8859-3
- Turkish: ISO-8859-3, ISO-8859-9
- Thai: ISO-8859-11 (and TIS-620 model rebuilt)
Single-byte charset detection algorithm improved:
detection of control characters lowers confidence.
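
As a rough illustration of the idea that control characters lower confidence, here is a minimal sketch. It is not uchardet's actual algorithm: the function name `penalize_control_chars`, the linear penalty, and the set of tolerated control bytes are all assumptions made for this example.

```python
def penalize_control_chars(data: bytes, base_confidence: float) -> float:
    """Scale a confidence score down by the share of unexpected control bytes.

    Hypothetical helper: the linear penalty is an assumption, not the
    formula used by uchardet.
    """
    if not data:
        return base_confidence
    # Tab, line feed and carriage return are normal in text files.
    allowed = {0x09, 0x0A, 0x0D}
    # Count C0 control bytes (below 0x20) and DEL (0x7F), minus the allowed ones.
    control = sum(
        1 for b in data if (b < 0x20 and b not in allowed) or b == 0x7F
    )
    return base_confidence * (1.0 - control / len(data))


clean = penalize_control_chars(b"hello\n", 0.9)
noisy = penalize_control_chars(b"\x01\x02ab", 0.8)
```

Plain text keeps its score, while input where half the bytes are control characters loses half of it, which pushes binary-looking data away from single-byte charset candidates.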


03 Dec 19:08

Jehan

v0.0.4

e4260f4

Version 0.0.4 released.

Add support of ISO-8859-1 and ISO-8859-15 for French.
Re-enable Hungarian language models (ISO-8859-2 and Windows-1250) which used to conflict with other charsets (should be better now).
Differentiate ASCII detection and detection failure.
Improve single-byte charset detection confidence algorithm (fixes for instance Windows-1251 Russian text detection).
"UTF-16" is now output with endianness information (UTF-16LE/BE).
Add UTF-32 BOM detection.
Discard single byte charsets upon illegal codepoint detection.
Internal redesign of single-byte charmaps with more semantics, and variable sample size length (different languages have different sizes of grapheme lists).
Many more test files (33 unit tests should pass with make test).
Add Python scripts to generate language models from Wikipedia data in a single command.
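
The BOM detection mentioned in the notes above can be sketched as a prefix check. This is a minimal illustration, not uchardet's implementation: the name `detect_bom` and the table layout are assumptions, and the returned labels follow the 0.0.5 convention of not carrying endianness.

```python
from typing import Optional

# Longer BOMs come first: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so checking UTF-16 first would misreport it.
_BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32"),  # big-endian
    (b"\xff\xfe\x00\x00", "UTF-32"),  # little-endian
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16"),          # big-endian
    (b"\xff\xfe", "UTF-16"),          # little-endian
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return a charset label if data starts with a known BOM, else None."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return None
```

A `None` result only means that no BOM was found; a real detector would then fall back to statistical analysis of the content.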


19 Nov 14:35

Jehan

v0.0.3

ff5fd5e

Version 0.0.3 Released.

A quick release after 0.0.2 mostly to fix a bad crash on the command
line tool when charset detection failed (or detected ASCII).

Additionally:

The build now includes more test files for various languages and
encodings, and a make test target for unit testing (20 encoding
detection tests should pass when running it).
The build has a new BUILD_STATIC option, set to ON by default,
which allows disabling the static library build if it is not needed.
All encoding names are iconv-compatible, enabling developers to
directly feed the result of uchardet_get_charset() into libiconv.
Compilation warnings fixed.
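
The practical benefit of converter-compatible names is that a detected label can be handed to a converter unchanged. The sketch below shows the pattern using Python's codec registry as a stand-in for libiconv, which is an assumption made to keep the example self-contained. The helper `convert_to_utf8` is hypothetical, and the `detected` argument stands for a name such as the one returned by uchardet_get_charset(). Treating an empty string as a detection failure is also an assumption.

```python
import codecs


def convert_to_utf8(data: bytes, detected: str) -> bytes:
    """Re-encode data to UTF-8 using a detected charset name as-is."""
    if not detected:
        # Assumed convention: an empty name means detection failed.
        raise ValueError("charset detection failed")
    # Raises LookupError if the converter does not know this name.
    codecs.lookup(detected)
    return data.decode(detected).encode("utf-8")


# "café" stored as Windows-1252, with the label passed through unchanged.
converted = convert_to_utf8(b"caf\xe9", "WINDOWS-1252")
```

No name-mapping table is needed between the detector and the converter, which is the point of keeping the labels compatible.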


16 Nov 15:18

Jehan

v0.0.2

d0ccdd5

Version 0.0.2

The primary goal of this release is to set a fixed point in time for distributions, since most are using various commits as their source, but still calling it 0.0.1 (there was actually a version 0.0.1 tarball available in GoogleCode, dating from 2011).

Version 0.0.2 mostly fixes various bugs and allows querying the charsets of multiple files in the same command with the uchardet command-line tool.
