Releases: finalfusion/finalfrontier
Skip SSE tests on non-SSE platforms
Directly save to other formats
Do you use fastText, but would you also like to get your hands on structured skipgram, directional skipgram, or dependency embedding models? This is now possible: finalfrontier 0.9.1 adds support for saving trained embeddings in the fastText format 🎉.
With the new `--output` flag, you can save embeddings to other formats in addition to finalfusion. Options are: `fasttext`, `word2vec` binary, `text`, or `textdims`.
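The two plain-text options differ only in a header line. A minimal sketch of the layouts, assuming `text` is one word plus its vector per line and `textdims` prepends a "vocabulary size, dimensions" header; the function name and sample data are illustrative, not finalfrontier's actual writer:

```python
# Sketch of the two plain-text embedding layouts (illustrative only).

def write_text(path, embeds, dims_header=False):
    """Write word embeddings as text; with dims_header=True, prepend a
    'vocab_size dimensions' line (the word2vec-style textdims layout)."""
    with open(path, "w", encoding="utf-8") as f:
        if dims_header:
            dims = len(next(iter(embeds.values())))
            f.write(f"{len(embeds)} {dims}\n")
        for word, vec in embeds.items():
            f.write(word + " " + " ".join(map(str, vec)) + "\n")

embeds = {"berlin": [0.1, 0.2], "potsdam": [0.3, 0.4]}
write_text("embeddings.txt", embeds)                         # text
write_text("embeddings.dims.txt", embeds, dims_header=True)  # textdims
```

The `word2vec` binary format uses the same header but stores vectors as raw little-endian floats.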
Target vocabulary sizes and run-time selection of SIMD code paths
- Add support for training with a target vocabulary size. This is an alternative to setting a minimum token count and will attempt to create a vocabulary of the given size. Target vocabulary sizes are enabled through the `--context-target-size`, `--target-size`, and `--ngram-target-size` options. (@sebpuetz)
- SIMD code paths are now dynamically selected at run-time. It is thus no longer necessary to compile finalfrontier with specific target features to use code paths for newer SIMD instruction sets. (@danieldk)
- Add dot product implementation using FMA (fused multiply-add). (@danieldk)
- Enable training with the fastText indexer. With future changes in `finalfusion-rust` and `finalfusion-convert`, this will allow you to create fastText embeddings with finalfrontier! (@sebpuetz)
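Conceptually, a target vocabulary size replaces a fixed minimum count with a cutoff derived from the desired size: keep the most frequent token types until the target is reached. A rough sketch; the function name and tie handling are illustrative, and finalfrontier's actual selection may differ:

```python
# Sketch of target-size vocabulary selection (illustrative only).
from collections import Counter

def target_size_vocab(tokens, target_size):
    """Keep the target_size most frequent token types, as an
    alternative to a fixed minimum token count."""
    counts = Counter(tokens)
    return [tok for tok, _ in counts.most_common(target_size)]

tokens = "the cat sat on the mat the cat".split()
print(target_size_vocab(tokens, 2))  # ['the', 'cat']
```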
CoNLL-U dependencies and improved error messages
- Update the dependency format from CoNLL-X to CoNLL-U.
- Improve error handling and error messages.
- Remove the use of end-of-sentence markers.
- Upgrade to finalfusion 0.12.
Explicit n-grams, single finalfrontier command
- The most user-visible change is that `ff-train-deps` and `ff-train-skipgram` have been merged into one command, `finalfrontier`. Dependency and skipgram embeddings can be trained with `finalfrontier deps` and `finalfrontier skipgram`, respectively.
- Support for training explicit subwords has been added. Thus far, finalfrontier has followed the same subword approach as fastText: each subword (n-gram) is mapped to an embedding using the FNV-1 hash function. This approach reduces the number of embeddings when the corpus contains a large number of possible n-grams, at the cost of collisions. With the `--subwords ngrams` option, finalfrontier uses an (explicit) n-gram vocabulary instead.
- The `hogwild` and `finalfrontier-utils` crates have been merged into the `finalfrontier` crate. Consequently, finalfrontier now consists of a single crate.
- When the number of threads is not specified, finalfrontier has traditionally used half of the logical CPUs. This has been refined to use half the number of logical CPUs, capped at 20 threads. Using more than 20 threads can slow convergence drastically on typical corpora.
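The hashed-versus-explicit distinction above can be sketched as follows. The FNV constants are the standard 32-bit parameters; the word bracketing, n-gram range, and bucket count are illustrative, not finalfrontier's exact configuration:

```python
# Sketch contrasting hashed subword lookup (FNV-1 modulo a bucket
# count) with an explicit n-gram vocabulary (illustrative only).

def fnv1_32(data: bytes) -> int:
    h = 0x811C9DC5                          # FNV 32-bit offset basis
    for byte in data:
        h = (h * 0x01000193) & 0xFFFFFFFF   # multiply by FNV prime...
        h ^= byte                           # ...then XOR (FNV-1 order)
    return h

def ngrams(word, n_min=3, n_max=6):
    token = f"<{word}>"                     # bracket the word
    return [token[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(token) - n + 1)]

# Hashed: n-grams share a fixed number of embedding buckets, so
# distinct n-grams may collide. Explicit: every n-gram in the
# vocabulary gets its own index, at the cost of a larger vocabulary.
buckets = 2_000_000
hashed = {ng: fnv1_32(ng.encode()) % buckets for ng in ngrams("tree")}
explicit = {ng: i for i, ng in enumerate(sorted(set(ngrams("tree"))))}
print(len(hashed), len(explicit))  # 10 10
```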
0.6.1
Directional skipgram
This release has the following changes:
- Add support for the directional skip-gram model (Song et al., 2018).
- Store norms in a finalfusion chunk, making it possible to retrieve the unnormalized embeddings.
- Better defaults for skip-gram models: context size 5 -> 10, dimensions 100 -> 300, epochs 5 -> 15.
- Improved command-line option handling.
Dependency embeddings
The addition in this release is support for dependencies as context. This makes it possible to train dependency embeddings as described by Levy & Goldberg, 2014. The dependency embedding model can be tuned in fine-grained detail (such as the depth of the relations).
- Add dependency relations.
- Refactor training to make it easier to add different context types.
- Precompiled releases, including a MUSL target.
- Migration to Rust 2018.
- `ff-train` has been renamed to `ff-train-skipgram`.
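As a rough illustration of the dependency contexts described by Levy & Goldberg (2014): each word's contexts are its syntactic neighbours, typed with the dependency relation (inverted on the head side). The function name, relation labels, and sample arcs below are illustrative, not finalfrontier's API:

```python
# Sketch of dependency-based contexts (illustrative only).

def dependency_contexts(arcs):
    """Map each token to its typed dependency contexts, given
    (head, relation, dependent) arcs."""
    contexts = {}
    for head, rel, dep in arcs:
        # The dependent sees its head through the relation...
        contexts.setdefault(dep, []).append(f"{head}/{rel}")
        # ...and the head sees the dependent through the inverse relation.
        contexts.setdefault(head, []).append(f"{dep}/{rel}-1")
    return contexts

arcs = [
    ("discovers", "nsubj", "scientist"),
    ("discovers", "dobj", "star"),
    ("scientist", "amod", "australian"),
]
ctx = dependency_contexts(arcs)
print(ctx["discovers"])  # ['scientist/nsubj-1', 'star/dobj-1']
print(ctx["scientist"])  # ['discovers/nsubj', 'australian/amod-1']
```

Limiting the depth of the relations, as mentioned above, amounts to also collecting contexts along paths of more than one arc up to a maximum length.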
Switch from rust2vec to finalfusion
- The `rust2vec` crate has been renamed to `finalfusion`. This minor release changes finalfrontier to use `finalfusion` as a dependency.
- This is the first release that provides builds for Linux (both glibc and a static MUSL binary) and macOS.
Directly to finalfusion
The most important change in this release is that finalfrontier stores trained embeddings in the finalfusion format, which is implemented by rust2vec and finalfusion-python. This format is more generic than the old finalfrontier format and easier to implement readers for.
As a result of these changes, finalfrontier is now only for training embeddings. To actually use the embeddings in your own program, use rust2vec.
Summary of changes since v0.2.0:
- Store trained embeddings in finalfusion format.
- Remove `ff-similarity`, `ff-convert`, and `ff-compute-accuracy`. This functionality is provided by rust2vec.