Releases: adbar/simplemma
simplemma-0.5.0
- faster, more efficient code
- dropped support for Python 3.5
simplemma-0.4.0
- new languages: Armenian, Greek, Macedonian, Norwegian (Bokmål), and Polish
- language data reviewed for: Dutch, Finnish, German, Hungarian, Latin, Russian, and Swedish
- Urdu removed from the language list due to issues with the data
- added support for Python 3.10 and dropped support for Python 3.4
- improved decomposition and tokenization algorithms
simplemma-0.3.0
- improved models and disambiguation
- improved tokenization
- extended rules for German
simplemma-0.2.2
- Work on decomposition rules
- Reviewed language data
- Cleaner code
simplemma-0.2.1
- Better decomposition into subwords by greedy algorithm
- First benchmarks and data-based corrections: German, French, English, Spanish
simplemma-0.2.0
- Languages added: Danish, Dutch, Finnish, Georgian, Indonesian, Latin, Latvian, Lithuanian, Luxembourgish, Turkish, Urdu
- Improved word pair coverage
- Tokenization functions added
- Limit greediness and range of potential candidates
simplemma-0.1.0
- prepared release v0.1.0