Releases: EricLBuehler/mistral.rs
v0.1.15
What's Changed
- Patch incorrect unwrap and bump version by @EricLBuehler in #383
Full Changelog: v0.1.14...v0.1.15
v0.1.14
What's Changed
- Improve docs for loading `gguf` purely locally by @EricLBuehler in #371
- Fix message with chat template by @EricLBuehler in #374
- Fix tokenizer write on read only file system by @EricLBuehler in #373
- Manual subtyping for u32 in GGUF max seq len by @EricLBuehler in #376
- refactor: DRY `varbuilder_utils.rs` by @polarathene in #364
- Support multiple GGUF files by @EricLBuehler in #379
- Organize normal loading metadata by @EricLBuehler in #381
- Bump version 0.1.13 -> 0.1.14 by @EricLBuehler in #382
Full Changelog: v0.1.13...v0.1.14
v0.1.13
What's Changed
- Prepare to accept multiple model types by @EricLBuehler in #369
- Fix chat template and F16/BF16 CUDA GEMM when RUST_BACKTRACE is set by @EricLBuehler in #370
Full Changelog: v0.1.12...v0.1.13
v0.1.12
What's Changed
- Add an example by @EricLBuehler in #357
- Fix no auth token for local loading by @EricLBuehler in #360
- fix: Ensure committed files are normalized to LF by @polarathene in #361
- Fix unauth check by @EricLBuehler in #362
- Allow default unigram unk token for GGUF by @EricLBuehler in #363
- Disable cublaslt if using f16 kernels by @EricLBuehler in #359
- refactor: GGUF + GGML Loaders with `ModelKind` by @polarathene in #356
- Clamp n device layers to n model layers by @EricLBuehler in #367
- Bump version to 0.1.12 by @EricLBuehler in #368
Full Changelog: v0.1.11...v0.1.12
v0.1.11
What's Changed
- refactor: `ModelKind` with `strum` + `derive_more` by @polarathene in #335
- Update dependencies by @EricLBuehler in #343
- Set device to cpu if loading isq by @EricLBuehler in #346
- Expose some APIs on the Rust side by @EricLBuehler in #348
- Propagating Regex init error by @gregszumel in #349
- Add a verbose mode by @EricLBuehler in #353
- Refactor `deserialize_chat_template` by @Jeadie in #354
- Add support for using GGUF tokenizer by @EricLBuehler in #345
New Contributors
- @gregszumel made their first contribution in #349
Full Changelog: v0.1.10...v0.1.11
v0.1.10
What's Changed
- Fixes and verbosity improvements for device mapping by @EricLBuehler in #332
- chore: `SimpleModelPaths` should be renamed to `LocalModelPaths` by @polarathene in #331
- Remove candle-layer-norm dep by @EricLBuehler in #333
- Refactor layers.rs by @EricLBuehler in #338
- chore: Simplify `utils/token.rs:get_token()` by @polarathene in #328
- chore: Use `strum` to simplify `GGUFArchitecture` maintenance by @polarathene in #334
- Fix mistral model repeat kv by @EricLBuehler in #340
New Contributors
- @polarathene made their first contribution in #331
Full Changelog: v0.1.9...v0.1.10
v0.1.9
What's Changed
- Improve chat templates docs by @EricLBuehler in #327
- Use cuBLASlt in attention by @EricLBuehler in #325
Full Changelog: v0.1.8...v0.1.9
v0.1.8
Overview
- Documentation improvements
- Better handling of CTRL-C in interactive mode
- Matmul via low-precision kernels to take advantage of faster cuBLAS GEMM kernels (thanks @lucasavila00); see the sketch after this list
- New loading API (thanks @Jeadie)
- Various small bug fixes
- Reduce dependency complexity (thanks @LLukas22)
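
As a minimal sketch of the low-precision matmul item above, here is the general cast-to-f16 pattern using candle (the tensor library mistral.rs is built on). The shapes and device selection are arbitrary illustrations, and the real dispatch logic in #317 is more involved:

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    // Falls back to CPU when no CUDA device is present.
    let device = Device::cuda_if_available(0)?;
    let a = Tensor::randn(0f32, 1.0, (64, 128), &device)?;
    let b = Tensor::randn(0f32, 1.0, (128, 256), &device)?;

    // Full-precision path: the safe default.
    let _full = a.matmul(&b)?;

    // Low-precision path: cast to f16, multiply (on CUDA this can hit
    // the faster half-precision GEMM kernels), then cast back to f32
    // for downstream consumers.
    let half = a
        .to_dtype(DType::F16)?
        .matmul(&b.to_dtype(DType::F16)?)?
        .to_dtype(DType::F32)?;
    println!("{:?}", half.dims());
    Ok(())
}
```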
What's Changed
- bug fix: llama kv cache part by @keisuke-niimi-insightedge-jp in #300
- Refactor cache manager and kv cache by @EricLBuehler in #304
- Update the docs for ISQ and misc by @EricLBuehler in #310
- Make `pyo3` an optional dependency in `mistralrs-core` by @LLukas22 in #303
- Update kv cache by @EricLBuehler in #312
- Print gguf metadata consistently by @EricLBuehler in #313
- Allow loading LoRA without activating adapters and fix bugs by @EricLBuehler in #306
- Remove spurious tokenizer warnings by @EricLBuehler in #314
- Better handling of ctrlc by @EricLBuehler in #315
- Add analysis bot by @EricLBuehler in #316
- Quantized: Use cublas for prompt by @lucasavila00 in #238
- Support loading model into pipeline from local filesystem by @Jeadie in #308
- Fix the ctrlc handler by @EricLBuehler in #318
- Don't force QLlama to have >2 input dims by @Jeadie in #320
- Matmul via f16 when possible by @EricLBuehler in #317
New Contributors
- @keisuke-niimi-insightedge-jp made their first contribution in #300
- @Jeadie made their first contribution in #308
Full Changelog: v0.1.7...v0.1.8
v0.1.7
What's Changed
- Add terminate on next step handler via ctrlc by @EricLBuehler in #301
- Update containers to cuda 12.4 + Fix missing libraries by @LLukas22 in #302
Full Changelog: v0.1.6...v0.1.7
This release has relatively few changes; its major purpose is to update the containers and synchronize the versions.
v0.1.6
What's Changed
- Causal Masking and model selection from `.toml` files by @EricLBuehler in #278
- Remove sliding window mask from quantized phi3 by @EricLBuehler in #280
- Fix Causal Mask by @EricLBuehler in #282
- Fix mask caching by @EricLBuehler in #283
- More intelligent scheduler by @EricLBuehler in #279
- Use `warn!` macro by @EricLBuehler in #289
- Use a public repo for tests tokenizer.json by @EricLBuehler in #290
- Implement Speculative Decoding by @EricLBuehler in #242
- Add X-LoRA support for GGUF by @EricLBuehler in #293
- Add some "senseful" fallbacks for `isq` by @LLukas22 in #272
- Implement dynamic LoRA swapping by @EricLBuehler in #262
- More verbose logging when loading locally by @EricLBuehler in #298
- Make speculative decoding faster without anything fancy by @EricLBuehler in #297
- fix bug with mistralrs cuda by @joshpopelka20 in #299
New Contributors
- @joshpopelka20 made their first contribution in #299
New Features
- Speculative decoding introduced; see the toy sketch after this list
- GGUF support for Phi 3
- Dynamic LoRA adapter activation support
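
For readers new to speculative decoding, here is a minimal, self-contained sketch of the greedy draft-and-verify loop it is based on. The `draft_next`/`target_next` functions are hypothetical stand-ins for real model forward passes; none of this is mistral.rs's actual API:

```rust
type Token = u32;

/// Toy draft model: fast but approximate next-token prediction.
/// Stands in for a small, cheap LLM.
fn draft_next(ctx: &[Token]) -> Token {
    ctx.last().map_or(1, |t| (t * 2) % 100)
}

/// Toy target model: the model whose output must be matched exactly.
/// Agrees with the draft most of the time, diverges occasionally.
fn target_next(ctx: &[Token]) -> Token {
    let d = draft_next(ctx);
    if ctx.len() % 4 == 3 { (d + 1) % 100 } else { d }
}

/// One speculative step: draft `k` tokens, verify them against the
/// target, and keep the agreeing prefix plus one corrected token.
fn speculative_step(ctx: &mut Vec<Token>, k: usize) -> usize {
    let base = ctx.len();
    // 1. Draft k tokens autoregressively with the cheap model.
    for _ in 0..k {
        let t = draft_next(ctx);
        ctx.push(t);
    }
    // 2. Verify each drafted position (in a real engine the target
    //    model scores all k positions in one batched forward pass).
    let mut accepted = 0;
    for i in 0..k {
        let expect = target_next(&ctx[..base + i]);
        if ctx[base + i] == expect {
            accepted += 1;
        } else {
            // 3. First mismatch: discard the rest of the draft and
            //    take the target model's token instead.
            ctx.truncate(base + i);
            ctx.push(expect);
            return accepted + 1;
        }
    }
    accepted
}

fn main() {
    let mut ctx = vec![7];
    for _ in 0..5 {
        let n = speculative_step(&mut ctx, 4);
        println!("accepted {n} token(s), ctx = {ctx:?}");
    }
}
```

The speedup comes from the target model verifying all k drafted positions in a single batched forward pass rather than k sequential ones, so runs of accepted draft tokens cost roughly one target-model step.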
Full Changelog: v0.1.5...v0.1.6