All notable changes to this project will be documented in this file. Note that the project (and python wheel) is built from a duorepo (2 separate repos used together), so changes from both will be reflected here, but the commits are spread between both.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning since v1.0.0.
- NativeWFST support for checking for impossible graphs (no successful path), which can then fail to compile.
- Debugging info for NativeWFST.
- `lattice_beam` default value reduced from `8.0` to `6.0`, to hopefully avoid occasional errors.
- Reloading grammars with NativeWFST.
- Minor fix for OpenBLAS compilation for some architectures on linux/mac
- Native FST support, via direct wrapping of OpenFST, rather than Python text-format implementation
- Eliminates grammar (G) FST compilation step
- Internalized many graph construction steps, via direct use of native Kaldi/OpenFST functions, rather than invoking separate CLI processes
- Eliminates need for many temporary files (FSTs, `.conf`s, etc) and pipes
- Example usage for allowing mixing of free dictation with strict command phrases
- Experimental support for "look ahead" graphs, as an alternative to full HCLG compilation
- Experimental support for rescoring with CARPA LMs
- Experimental support for rescoring with RNN LMs
- Experimental support for "priming" RNNLM previous left context for each utterance
- OpenBLAS is now the default linear algebra library (rather than Intel MKL) on Linux/MacOS
- Because it is open source and provides good performance on all hardware (including AMD)
- Windows is more difficult for this, and support will be implemented soon in a later release
- Default `tmp_dir` is now set to `[model_dir]/cache.tmp`
- `tmp_dir` is now optional, and only needed if caching compiled FSTs (or for certain framework/option combinations)
- File cache is now stored at `[model_dir]/file_cache.json`
- Optimized adding many new words to the lexicon, in many different grammars, all in one loading session: only rebuild `L_disambig.fst` once at the end.
- External interfaces: `Compiler.__init__()`, decoding setup, etc.
- Internal interfaces: wrappers, etc.
- Major refactoring of C++ components, with a new inheritance hierarchy and configuration mechanism, making it easier to use and test features with and without "activity"
- Many build changes
- Python 2.7 support: it may still work, but will not be a focus.
- Google cloud speech-to-text removed, as an unneeded dependency. Alternative dictation is still supported as an option, via a callback to an external provider.
- Separate CLI Kaldi/OpenFST executables
- Indirect AGF graph compilation (framework==`agf-indirect`)
- Non-native FSTs
- parsing_framework==`text`
- New speech models (should be better in general, and support new noise resistance)
- Make failed AGF graph compilation save and output stderr upon failure automatically
- Example of complete usage with a grammar and microphone audio
- Various documentation
- Top FST now accepts various noise phones (if present in speech model), making it more resistant to noise
- Cleanup error handling in compiler, supporting Dragonfly backend automatically printing excerpt of the Rule that failed
- Mysterious windows newline bug in some environments
- Add automatic saving of text FST & compiled FST files with log level 5
- Miscellaneous naming
- Support compiling some complex grammars (Caster text manipulation), by simplifying during compilation (remove epsilons, and determinize)
- Add missing rnnlm library file in MacOS build
- Windows wheels now only require the VS2017 (not VS2019) redistributables to be installed
- Can now pass configuration dict to `KaldiAgfNNet3Decoder`, `PlainDictationRecognizer` (without `HCLG.fst`).
- Continuous Integration builds run on GitHub Actions for Windows (x64), MacOS (x64), Linux (x64).
- Refactor of passing configuration to initialization.
- `PlainDictationRecognizer.decode_utterance` can take `chunk_size` parameter.
- Smaller binaries: MacOS 11MB -> 7.6MB, Linux 21MB -> 18MB.
- Confidence measurement in the presence of multiple, redundant rules.
- Python3 int division bug for cloud dictation.
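
The Python 3 integer division fix listed above belongs to a common class of porting bugs. The changelog does not show the affected code, so what follows is only a generic sketch with hypothetical names (`SAMPLE_RATE`, `BYTES_PER_SAMPLE`, `duration_ms_to_bytes`), not the project's actual fix. In Python 3, `/` always returns a float, so code that relied on Python 2 integer division needs `//` wherever an integer count is required.

```python
SAMPLE_RATE = 16000       # hypothetical: samples per second
BYTES_PER_SAMPLE = 2      # hypothetical: 16-bit mono audio


def duration_ms_to_bytes(duration_ms):
    """Return the number of audio bytes covering duration_ms milliseconds."""
    # `//` keeps the result an int on both Python 2 and Python 3;
    # `/` would yield a float on Python 3, which cannot be used as a slice index.
    return SAMPLE_RATE * duration_ms // 1000 * BYTES_PER_SAMPLE


audio = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)  # one second of silence
chunk = audio[:duration_ms_to_bytes(250)]      # first 250 ms of audio
```

Using `//` rather than wrapping the result in `int()` makes the intent explicit and behaves the same way on both Python versions.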