Releases: adbar/simplemma

simplemma-0.5.0

19 Nov 16:26
  • faster, more efficient code
  • dropped support for Python 3.5

simplemma-0.4.0

19 Oct 16:44
  • new languages: Armenian, Greek, Macedonian, Norwegian (Bokmål), and Polish
  • language data reviewed for: Dutch, Finnish, German, Hungarian, Latin, Russian, and Swedish
  • Urdu removed from the language list due to issues with the data
  • added support for Python 3.10 and dropped support for Python 3.4
  • improved decomposition and tokenization algorithms

simplemma-0.3.0

08 Apr 20:18
  • improved models and disambiguation
  • improved tokenization
  • extended rules for German

simplemma-0.2.2

24 Feb 13:31
  • Work on decomposition rules
  • Reviewed language data
  • Cleaner code

simplemma-0.2.1

02 Feb 16:27
  • Better decomposition into subwords using a greedy algorithm
  • First benchmarks and data-based corrections: German, French, English, Spanish

simplemma-0.2.0

25 Jan 17:54
  • Languages added: Danish, Dutch, Finnish, Georgian, Indonesian, Latin, Latvian, Lithuanian, Luxembourgish, Turkish, Urdu
  • Improved word pair coverage
  • Tokenization functions added
  • Limit greediness and range of potential candidates

simplemma-0.1.0

18 Jan 18:51
v0.1.0

prepare release