
Explicit n-grams, single finalfrontier command

Released by @danieldk on 08 Nov 08:48
  • The most user-visible change is that ff-train-deps and ff-train-skipgram have been merged into a single command, finalfrontier. Dependency and skipgram embeddings are now trained with finalfrontier deps and finalfrontier skipgram, respectively.

  • Support for training explicit subword (n-gram) embeddings has been added.

    Thus far, finalfrontier has followed the same subword approach as fastText: each subword (n-gram) is mapped to an embedding using the FNV-1 hash function. Because hashing maps n-grams into a fixed number of buckets, this approach bounds the number of embeddings when the corpus contains a large number of distinct n-grams, at the cost of hash collisions: different n-grams can end up sharing an embedding. With the --subwords ngrams option, finalfrontier instead uses an explicit n-gram vocabulary, in which each n-gram in the vocabulary has its own embedding.

  • The hogwild and finalfrontier-utils crates have been merged into the finalfrontier crate. Consequently, finalfrontier now consists of a single crate.

  • When the number of threads is not specified, finalfrontier has traditionally used half of the logical CPUs. This default has been refined to half the number of logical CPUs, capped at 20 threads. Using more than 20 threads can drastically slow convergence on typical corpora.
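The difference between hashed and explicit subwords can be sketched in Rust, the language finalfrontier is written in. This is a minimal illustration under stated assumptions, not finalfrontier's implementation: the helper names (`ngrams`, `fnv1_64`, `hashed_index`, `build_ngram_vocab`, `explicit_index`), the fastText-style `<`/`>` word-boundary markers, and the bucket count are all hypothetical. It shows hashing sending every n-gram to one of a fixed number of buckets, where distinct n-grams may collide, while an explicit vocabulary gives each known n-gram its own index and has no entry for n-grams it never saw.

```rust
use std::collections::HashMap;

/// Character n-grams of length `min_n..=max_n`, taken from the word wrapped
/// in fastText-style boundary markers (an assumption of this sketch).
fn ngrams(word: &str, min_n: usize, max_n: usize) -> Vec<String> {
    let chars: Vec<char> = format!("<{}>", word).chars().collect();
    let mut result: Vec<String> = Vec::new();
    for n in min_n..=max_n {
        if n == 0 || n > chars.len() {
            continue;
        }
        for start in 0..=chars.len() - n {
            result.push(chars[start..start + n].iter().collect());
        }
    }
    result
}

/// 64-bit FNV-1: for each byte, multiply by the FNV prime, then XOR the byte in.
fn fnv1_64(data: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;
    let mut hash = OFFSET_BASIS;
    for &byte in data {
        hash = hash.wrapping_mul(PRIME);
        hash ^= byte as u64;
    }
    hash
}

/// Hashed scheme: any n-gram, seen or not, lands in one of `buckets` slots,
/// so two different n-grams can share a slot (a collision).
fn hashed_index(ngram: &str, buckets: u64) -> u64 {
    fnv1_64(ngram.as_bytes()) % buckets
}

/// Explicit scheme: assign each distinct n-gram its own index, in order of
/// first occurrence (hypothetical helper; no frequency cutoff is applied).
fn build_ngram_vocab(words: &[&str], min_n: usize, max_n: usize) -> HashMap<String, usize> {
    let mut vocab = HashMap::new();
    for word in words {
        for gram in ngrams(word, min_n, max_n) {
            let next = vocab.len();
            vocab.entry(gram).or_insert(next);
        }
    }
    vocab
}

/// Explicit lookup: n-grams outside the vocabulary have no index.
fn explicit_index(ngram: &str, vocab: &HashMap<String, usize>) -> Option<usize> {
    vocab.get(ngram).copied()
}

fn main() {
    // Hypothetical bucket count; fastText's default is 2,000,000 buckets.
    let buckets: u64 = 1 << 21;
    let vocab = build_ngram_vocab(&["cat", "cats"], 3, 3);
    for gram in ngrams("cat", 3, 3) {
        println!(
            "{:>5}: hashed bucket {:>7}, explicit index {:?}",
            gram,
            hashed_index(&gram, buckets),
            explicit_index(&gram, &vocab)
        );
    }
    // An n-gram absent from the vocabulary still gets a hashed bucket.
    println!(
        "  <do: hashed bucket {:>7}, explicit index {:?}",
        hashed_index("<do", buckets),
        explicit_index("<do", &vocab)
    );
}
```

The trade-off mirrors the release notes: the hashed table has a fixed size regardless of how many distinct n-grams the corpus contains, while the explicit vocabulary avoids collisions but grows with the number of n-grams it keeps.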
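The new default thread count can be written as a small Rust function. This is a sketch, not finalfrontier's code: `default_threads` is a hypothetical name, the lower bound of one thread is our own assumption (the release notes do not say what happens on a single-CPU machine), and `std::thread::available_parallelism` stands in for counting logical CPUs, which finalfrontier may do differently.

```rust
use std::thread;

/// Half the logical CPUs, capped at 20 threads. The floor of 1 is an
/// assumption of this sketch, so that a single-CPU machine still trains.
fn default_threads(logical_cpus: usize) -> usize {
    (logical_cpus / 2).clamp(1, 20)
}

fn main() {
    // available_parallelism() reports the parallelism available to this
    // process, which usually matches the number of logical CPUs.
    let cpus = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    println!("{} logical CPUs -> {} training threads", cpus, default_threads(cpus));
}
```

With this rule, an 8-CPU machine gets 4 threads, while a 64-CPU machine gets 20 rather than 32.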