prepare new version: 0.2.2
adbar committed Jun 14, 2022
1 parent bfa6121 commit 8c1d9b0
Showing 3 changed files with 20 additions and 6 deletions.
14 changes: 14 additions & 0 deletions HISTORY.rst
@@ -3,6 +3,20 @@ History
=======


+0.2.2
+-----
+
+* Fixed bug in probability normalization (#6)
+* Fully implemented data type argument in ``classify()``
+* Adapted training scripts to Python3 (untested)
+
+
+0.2.1
+-----
+
+* Maintenance: update and simplify code
+
+
0.2.0
-----

10 changes: 5 additions & 5 deletions README.rst
@@ -15,7 +15,7 @@ Changes in this fork
Execution speed has been improved and the code base has been optimized for Python 3.6+:

- Import: Loading the package (``import py3langid``) is about 30% faster
-- Startup: Loading the default classification model is 20-25x faster
+- Startup: Loading the default classification model is 25-30x faster
- Execution: Language detection with ``langid.classify`` is 5-6x faster on paragraphs (less on longer texts)

For implementation details see this blog post: `How to make language detection with langid.py faster <https://adrien.barbaresi.eu/blog/language-detection-langid-py-faster.html>`_.
@@ -50,7 +50,7 @@ Basics:
>>> text = 'This text is in English.'
# identified language and probability
>>> langid.classify(text)
-('en', -56.77428913116455)
+('en', -56.77429)
# unpack the result tuple in variables
>>> lang, prob = langid.classify(text)
# all potential languages
@@ -68,11 +68,11 @@ More options:
>>> identifier.set_languages(['de', 'en', 'fr'])
# this won't work well...
>>> identifier.classify('这样不好')
-('en', -81.83166265487671)
+('en', -81.831665)
# normalization of probabilities to an interval between 0 and 1
>>> identifier = LanguageIdentifier.from_pickled_model(MODEL_FILE, norm_probs=True)
->>> identifier.classify('This should be enough text.'))
+>>> identifier.classify('This should be enough text.')
('en', 1.0)
@@ -94,7 +94,7 @@ On the command-line
# define a subset of target languages
$ echo "This won't be recognized properly." | langid -n -l fr,it,tr
-('it', 0.9703832808613264)
+('it', 0.97038305)
2 changes: 1 addition & 1 deletion py3langid/__init__.py
@@ -1,3 +1,3 @@
from .langid import classify, rank, set_languages

-__version__ = '0.2.1'
+__version__ = '0.2.2'
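
Note on "normalization of probabilities to an interval between 0 and 1": the README examples show raw scores such as -56.77429 by default and values such as 1.0 with ``norm_probs=True``. The sketch below illustrates one common way to turn log-scale scores into values between 0 and 1 (a softmax with the maximum subtracted for numerical stability). It is an assumption for illustration only and may differ from what py3langid does internally; the function name ``normalize_log_probs`` and the example scores are hypothetical.

```python
import math


def normalize_log_probs(log_probs):
    """Map log-scale scores to values in [0, 1] that sum to 1 (softmax)."""
    # Subtract the largest score first so that exp() stays in a safe range.
    top = max(log_probs.values())
    exps = {lang: math.exp(score - top) for lang, score in log_probs.items()}
    total = sum(exps.values())
    return {lang: value / total for lang, value in exps.items()}


# Hypothetical scores for illustration, not actual py3langid output.
scores = {'en': -56.8, 'de': -80.1, 'fr': -92.4}
probs = normalize_log_probs(scores)
best = max(probs, key=probs.get)
print(best, probs[best])
```

Because the gap between the top score and the others is large on a log scale, the best candidate ends up very close to 1, which is consistent with the ``('en', 1.0)`` result shown in the README hunk.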
