Release v0.6.4 · PygmalionAI/aphrodite-engine

What's Changed

frontend: enable kobold api by default by @AlpinDale in #803
feat: add serviceinfo endpoint by @AlpinDale in #807
feat: update to serviceinfo v0.2 by @AlpinDale in #808
Mask dynatemp using min/max, rather than exp by @50h100a in #813
fix: temperature issues by @50h100a in #814
fix: --max-seq-len-to-capture arg by @AlpinDale in #818
[IMPORTANT] updating test units by @AlpinDale in #769
fix: tokenization api test by @AlpinDale in #821
feat: add chat method for LLM class by @AlpinDale in #822
feat: support chunked prefill with LoRA by @AlpinDale in #823
SPMD optimizations by @AlpinDale in #824
fix: sampler test with new transformers version by @AlpinDale in #826
feat: add cuda sampling kernels for top_k and top_p by @AlpinDale in #828
feat: add metrics for prefix cache hit rate by @AlpinDale in #829
fix: unbound tokenizer error by @AlpinDale in #830
feat: multi-step scheduling by @AlpinDale in #831
feat: Add DRY (Do not Repeat Yourself) sampling by @selalipop in #827
feat: add no_repeat_ngram sampler by @AlpinDale in #832
feat: add skew sampling by @AlpinDale in #834
fix: hidden states handling in batch expansion for spec decoding by @AlpinDale in #839
chore: refactor executor classes for easier inheritance by @AlpinDale in #840
fix: latency and serving benchmarks by @AlpinDale in #841
feat: Machete Kernels for Hopper GPUs by @AlpinDale in #842
feat: add sampler_priority by @AlpinDale in #837
fix: disable awq_marlin override for awq models by @AlpinDale in #843
chore: bump mistral_common to 1.5.0 by @AlpinDale in #844
ci: bump version to 0.6.4 by @AlpinDale in #845
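
For readers unfamiliar with the no_repeat_ngram sampler added in #832, the general technique can be sketched as below. This is a minimal illustration of the idea only, not the code from that PR: the function name `banned_next_tokens` and its parameters are hypothetical, and the engine's real sampler presumably operates on batched logits tensors rather than Python lists.

```python
from typing import List, Set


def banned_next_tokens(token_ids: List[int], ngram_size: int) -> Set[int]:
    """Return the token ids that would complete an n-gram already
    present in token_ids (hypothetical helper, for illustration only)."""
    if ngram_size <= 0 or len(token_ids) < ngram_size:
        # No full n-gram has been generated yet, so nothing can repeat.
        return set()
    if ngram_size == 1:
        # With unigrams, every token seen so far is banned.
        return set(token_ids)

    # The last (n - 1) tokens form the prefix of the next candidate n-gram.
    prefix = tuple(token_ids[-(ngram_size - 1):])
    banned: Set[int] = set()
    # Scan every earlier n-gram; if its first (n - 1) tokens match the
    # current prefix, its final token would create a repeat.
    for i in range(len(token_ids) - ngram_size + 1):
        if tuple(token_ids[i:i + ngram_size - 1]) == prefix:
            banned.add(token_ids[i + ngram_size - 1])
    return banned


history = [1, 2, 3, 1, 2]
print(banned_next_tokens(history, 3))
```

A sampler built on this idea would typically set the logits of the returned token ids to negative infinity before sampling, so that no n-gram of the chosen size appears twice in the output.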

New Contributors

@dependabot made their first contribution in #796
@selalipop made their first contribution in #827

Full Changelog: v0.6.3...v0.6.4
