Db, MultipartDb, Batch, and more; performance improvements with __slots__

@thammegowda thammegowda released this 04 Aug 01:03
· 46 commits to master since this release
  • add nlcodec-freqs CLI to setup.py
  • log time and memory usage for learn task
  • log BPE merge operations once every 2s instead of logging every operation
  • use __slots__: ~25% faster, 30% less memory for BPE with 3M word types
  • nlcodec.db.core with Db and MultipartDb
  • nlcodec.db.batch with Batch and BatchIterable
  • CLI nlcodec.learn for learning BPE using pyspark
  • CLI nlcodec.bitextdb to build a database from parallel text
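
For readers unfamiliar with the __slots__ change noted above, here is a minimal sketch of the general technique. The class names TokenPlain and TokenSlotted are hypothetical and are not taken from nlcodec; the sketch only illustrates why declaring __slots__ tends to reduce per-object memory, which matters when millions of word-type objects are alive at once.

```python
# Hypothetical classes for illustration only; not nlcodec's actual types.

class TokenPlain:
    """Regular class: every instance carries its own __dict__."""

    def __init__(self, name, freq):
        self.name = name
        self.freq = freq


class TokenSlotted:
    """Slotted class: attributes live in fixed slots, with no per-instance __dict__."""

    __slots__ = ("name", "freq")

    def __init__(self, name, freq):
        self.name = name
        self.freq = freq


plain = TokenPlain("the", 100)
slotted = TokenSlotted("the", 100)

# The plain instance has a per-instance dict; the slotted one does not.
has_dict_plain = hasattr(plain, "__dict__")
has_dict_slotted = hasattr(slotted, "__dict__")

# Slotted instances reject attributes that were not declared in __slots__.
try:
    slotted.extra = 1
    rejected = False
except AttributeError:
    rejected = True

print(has_dict_plain, has_dict_slotted, rejected)
```

The trade-off is flexibility: slotted instances cannot gain new attributes at runtime, so this fits small, fixed-shape objects created in large numbers. The actual savings depend on the workload; the figures in the list above are the ones reported for this release.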