v0.1.6

EricLBuehler released this 13 May 09:33

What's Changed

Causal Masking and model selection from .toml files by @EricLBuehler in #278
Remove sliding window mask from quantized phi3 by @EricLBuehler in #280
Fix Causal Mask by @EricLBuehler in #282
Fix mask caching by @EricLBuehler in #283
More intelligent scheduler by @EricLBuehler in #279
Use warn! macro by @EricLBuehler in #289
Use a public repo for tests tokenizer.json by @EricLBuehler in #290
Implement Speculative Decoding by @EricLBuehler in #242
Add X-LoRA support for GGUF by @EricLBuehler in #293
Add some "senseful" fallbacks for ISQ by @LLukas22 in #272
Implement dynamic LoRA swapping by @EricLBuehler in #262
More verbose logging when loading locally by @EricLBuehler in #298
Make speculative decoding faster without anything fancy by @EricLBuehler in #297
Fix bug with mistralrs CUDA by @joshpopelka20 in #299

New Contributors

@joshpopelka20 made their first contribution in #299

New Features

Speculative decoding introduced
GGUF support for Phi 3
Dynamic LoRA adapter activation support

Full Changelog: v0.1.5...v0.1.6

Contributors

LLukas22, EricLBuehler, and joshpopelka20
