Releases: PygmalionAI/aphrodite-engine
v0.4.3
This is a big release! We've had many new and exciting changes.
What's New
- Mixtral 8x7B support by @AlpinDale in #155
- add ROCm support for MI200-300 GPUs by @AlpinDale in #95
- implement fused Add RMSNorm kernels by @AlpinDale in #125
- add SqueezeLLM support by @AlpinDale in #140
- add chat templates for the OpenAI endpoint by @AlpinDale in #138
- speed up compilation times by 2 to 3x by @AlpinDale in #130
- support Phi 1.5 models by @AlpinDale in #121
NOTE: You'll need to run pip install megablocks if you're using the wheels.
Full Changelog: v0.4.2...v0.4.3
v0.4.2
What's Changed
- fix: correct auto ntk scaling_factor for 4k ctx case by @sandwichdoge in #101
- fix: cpu memory limit detection for containers by @g4rg in #103
- feat: yi support by @AlpinDale in #104
- fix: docker port by @Krisseck in #105
- feat: min_p by @StefanGliga in #106
- chore: api keys for OAI server by @AlpinDale in #107
New Contributors
- @sandwichdoge made their first contribution in #101
- @Krisseck made their first contribution in #105
Full Changelog: v0.4.1...v0.4.2
v0.4.1
v0.4
What's Changed
- Make entrypoint executable by @city-unit in #83
- Correct Conda Env Creation in Dockerfile by @city-unit in #82
- feat: prompt logprobs and batched samplers by @AlpinDale in #77
- feat: add mistral support for GPTQ by @AlpinDale in #86
- feat: finish up tests and workflows by @AlpinDale in #87
- feat: flattened 1D tensor -> 2D tensor by @AlpinDale in #85
- chore: reformats by @AlpinDale in #90
- fix: pylint complaints by @AlpinDale in #91
- fix: remove unnecessary lines by @g4rg in #81
- fix: sync CPU delay in sampler by @AlpinDale in #93
- New Mirostatv2 implementation by @50h100a in #96
- feat: spaces between special tokens by @AlpinDale in #94
- chore: clean up endpoints by @AlpinDale in #98
- feat: add exllamav2 for GPTQ by @AlpinDale in #99
- fix: force v2 for ctxlen larger than 8192 by @AlpinDale in #100
New Contributors
- @city-unit made their first contribution in #83
Full Changelog: v0.3.7...v0.4
v0.3.7
What's Changed
- fix: prompt processing overhead introduced by #66 by @AlpinDale in #71
- fix: launch AWQ kernels on the current CUDAStream by @AlpinDale in #75
- Added min_tokens and reimplemented ignore_eos using a new logit processor by @50h100a in #70
- feat: add PagedAttention V2 kernels by @AlpinDale in #76
- feat: Enable banning tokens by @StefanGliga in #80
Full Changelog: v0.3.6...v0.3.7
v0.3.6
What's Changed
- Locked Ray version to 2.5.1 by @RecoveredApparatus in #58
- fix: requests stalling in KAI non-streaming endpoint by @g4rg in #46
- feat: refactor megatron and quants by @AlpinDale in #57
- chore: fix datatype check by @AlpinDale in #65
- feat: YaRN context window extension support by @AlpinDale in #67
- fix: change the timing of logit sorting by @AlpinDale in #66
New Contributors
- @RecoveredApparatus made their first contribution in #58
Full Changelog: v0.3.5...v0.3.6
v0.3.5
What's Changed
- fix: add kcpp /generate/check stub by @g4rg in #47
- fix: more KAI parameter adaptations by @g4rg in #45
- Allow CORS connections from anywhere by @thesentinel2615 in #51
- fix: attention kernel attribute by @AlpinDale in #52
- feat: AWQ support for Turing GPUs by @AlpinDale in #53
- Micromamba Runtime by @henk717 in #54
- Make NVCC work for different versions by @official-elinas in #55
- chore: allow the user to specify install method by @AlpinDale in #56
New Contributors
- @thesentinel2615 made their first contribution in #51
- @henk717 made their first contribution in #54
- @official-elinas made their first contribution in #55
Full Changelog: v0.3.4...v0.3.5
v0.3.4
What's Changed
- feat: add detokenize test suite by @AlpinDale in #33
- feat: add model and sampler tests by @AlpinDale in #34
- license: AGPL-3.0 -> MIT by @AlpinDale in #32
- KoboldAI endpoint by @g4rg in #31
- Revert license back to AGPLv3 by @AlpinDale in #38
- fix: torch version mismatch by @AlpinDale in #43
- Adds a copy of embedded Kobold Lite Web UI by @LostRuins in #42
New Contributors
- @g4rg made their first contribution in #31
- @LostRuins made their first contribution in #42
Full Changelog: v0.3.3...v0.3.4
v0.3.3
What's Changed
- feat: add tail-free sampling by @StefanGliga in #23
- chore: fix tfs by @StefanGliga in #29
- Restore RoPE Scaling by @50h100a in #25
- Added top_a and repetition_penalty samplers. by @50h100a in #24
- Fix LogitProcessor infrastructure by @50h100a in #26
- feat: Add Eta, Epsilon and Locally Typical sampling by @StefanGliga in #27
New Contributors
- @StefanGliga made their first contribution in #23
- @50h100a made their first contribution in #25
Full Changelog: v0.3.2...v0.3.3