v0.1.8
## Overview
- Documentation improvements
- Better handling of CTRL-C in interactive mode
- Matmul via low-precision kernels to take advantage of faster cuBLAS GEMM (thanks @lucasavila00)
- New loading API (thanks @Jeadie)
- Various small bug fixes
- Reduce dependency complexity (thanks @LLukas22)
## What's Changed
- bug fix: llama kv cache part by @keisuke-niimi-insightedge-jp in #300
- Refactor cache manager and kv cache by @EricLBuehler in #304
- Update the docs for ISQ and misc by @EricLBuehler in #310
- Make `pyo3` an optional dependency in `mistralrs-core` by @LLukas22 in #303
- Update kv cache by @EricLBuehler in #312
- Print gguf metadata consistently by @EricLBuehler in #313
- Allow loading LoRA without activating adapters and fix bugs by @EricLBuehler in #306
- Remove spurious tokenizer warnings by @EricLBuehler in #314
- Better handling of ctrlc by @EricLBuehler in #315
- Add analysis bot by @EricLBuehler in #316
- Quantized: Use cublas for prompt by @lucasavila00 in #238
- Support loading model into pipeline from local filesystem by @Jeadie in #308
- Fix the ctrlc handler by @EricLBuehler in #318
- Don't force QLlama to have >2 input dims by @Jeadie in #320
- Matmul via f16 when possible by @EricLBuehler in #317
## New Contributors
- @keisuke-niimi-insightedge-jp made their first contribution in #300
- @Jeadie made their first contribution in #308
Full Changelog: v0.1.7...v0.1.8