Add support for ISO-8859-1 and ISO-8859-15 for French.
Re-enable Hungarian language models (ISO-8859-2 and Windows-1250) which used to conflict with other charsets (should be better now).
Differentiate ASCII detection and detection failure.
Improve single-byte charset detection confidence algorithm (fixes for instance Windows-1251 Russian text detection).
"UTF-16" is now reported with endianness information (UTF-16LE/BE).
Add UTF-32 BOM detection.
Discard single byte charsets upon illegal codepoint detection.
Internal redesign of single-byte charmaps with more semantics and variable sample sizes (different languages have grapheme lists of different sizes).
Many more test files (all 33 unit tests should pass with make test).
Add Python scripts to generate language models from Wikipedia data in a single command.
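
To illustrate the UTF-16 endianness and UTF-32 BOM items above, here is a minimal Python sketch of byte-order-mark detection. It is not the library's actual implementation: the function name detect_bom and the returned labels are assumptions made for this example. The main point it shows is ordering: the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE), so the 4-byte marks have to be tested first.

```python
import codecs
from typing import Optional

# Hypothetical lookup table for this sketch. Order matters: the 4-byte
# UTF-32 marks are listed before the 2-byte UTF-16 marks because the
# UTF-32LE BOM starts with the same two bytes as the UTF-16LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "UTF-32LE"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "UTF-32BE"),  # 00 00 FE FF
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
)


def detect_bom(data: bytes) -> Optional[str]:
    """Return an encoding label if data starts with a known BOM, else None.

    Caveat of this simple approach: UTF-16LE text whose first character
    after the BOM is U+0000 is indistinguishable from a UTF-32LE BOM.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

For example, detect_bom(b"\xff\xfeA\x00") yields "UTF-16LE", while input without a BOM yields None and would have to fall through to statistical detection.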