v0.0.0beta33
yunfeng-scale released this 20 May 23:23
What's Changed
- Necessary Changes for long context llama-3-8b by @sam-scale in #516
- Increase max gpu utilization for 70b models by @dmchoiboi in #517
- Infer hardware from model name by @yunfeng-scale in #515
- Improve TensorRT-LLM Functionality by @seanshi-scale in #487
- Upgrade vLLM version for batch completion by @dmchoiboi in #518
- Revert "Upgrade vLLM version for batch completion" by @dmchoiboi in #520
- Allow H100 to be used by @yunfeng-scale in #522
- vLLM version 0.4.2 Docker image by @squeakymouse in #521
- Image cache and balloon on H100s, also temporarily stop people from using A100 by @yunfeng-scale in #523
Full Changelog: v0.0.0beta32...v0.0.0beta33