Release v0.0.0beta33 · scaleapi/llm-engine

What's Changed

Necessary Changes for long context llama-3-8b by @sam-scale in #516
Increase max gpu utilization for 70b models by @dmchoiboi in #517
Infer hardware from model name by @yunfeng-scale in #515
Improve TensorRT-LLM Functionality by @seanshi-scale in #487
Upgrade vLLM version for batch completion by @dmchoiboi in #518
Revert "Upgrade vLLM version for batch completion" by @dmchoiboi in #520
Allow H100 to be used by @yunfeng-scale in #522
vLLM version 0.4.2 Docker image by @squeakymouse in #521
Image cache and balloon on H100s, also temporarily stop people from using A100 by @yunfeng-scale in #523

Full Changelog: v0.0.0beta32...v0.0.0beta33