Target vocabulary sizes and run-time selection of SIMD code paths

@github-actions github-actions released this 24 Jun 07:00
· 61 commits to master since this release
  • Add support for training with a target vocabulary size. This is an alternative to setting a minimum token count: training attempts to create a vocabulary of the given size. Target vocabulary sizes are enabled through the --context-target-size, --target-size, and --ngram-target-size options. (@sebpuetz)
  • SIMD code paths are now dynamically selected at run-time. It is therefore no longer necessary to compile finalfrontier with specific target features to use the code paths for newer SIMD instruction sets. (@danieldk)
  • Add dot product implementation using FMA (fused multiply-add). (@danieldk)
  • Enable training with the fastText indexer. With future changes in finalfusion-rust and finalfusion-convert, this will allow you to create fastText embeddings with finalfrontier! (@sebpuetz)
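
For readers unfamiliar with run-time SIMD selection, the following is a minimal sketch of the general technique in Rust. It is not finalfrontier's actual code: the function names `dot`, `dot_scalar`, and `dot_avx_fma` are hypothetical, and only x86_64 with AVX and FMA is covered. The binary is compiled without special target features, and the FMA path is chosen only if the CPU reports support for it.

```rust
// Portable fallback used when no suitable SIMD extension is detected.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Hypothetical AVX + FMA dot product. The attribute lets the compiler emit
// AVX/FMA instructions for this function only, so the rest of the binary
// still runs on older CPUs.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,fma")]
unsafe fn dot_avx_fma(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;

    let n = a.len().min(b.len());
    let chunks = n / 8;
    let mut acc = _mm256_setzero_ps();
    for i in 0..chunks {
        // Unaligned loads of 8 floats from each slice.
        let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
        // Fused multiply-add: acc = va * vb + acc, with a single rounding.
        acc = _mm256_fmadd_ps(va, vb, acc);
    }

    // Horizontal sum of the 8 accumulator lanes.
    let mut lanes = [0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut sum: f32 = lanes.iter().sum();

    // Scalar tail for the remaining (fewer than 8) elements.
    for i in chunks * 8..n {
        sum += a[i] * b[i];
    }
    sum
}

// Dispatches to the fastest available implementation at run-time.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());

    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") {
            // Safe to call: the required CPU features were just verified.
            return unsafe { dot_avx_fma(a, b) };
        }
    }

    dot_scalar(a, b)
}
```

Calling `dot(&a, &b)` on two equal-length `f32` slices returns the same result (up to floating-point rounding) on any machine. A real implementation would likely cache the detection result, for example in a function pointer, rather than checking on every call.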