Releases: scaleapi/llm-engine
v0.0.0beta34
What's Changed
- Hardcode llama 3 70b endpoint param by @yunfeng-scale in #524
- Don't fail checking GPU memory by @yunfeng-scale in #525
- Option to read Redis URL from AWS Secret by @seanshi-scale in #526
- Fix formatting on completions documentation guide by @saiatmakuri in #527
- Higher priority for gateway by @yunfeng-scale in #529
- Non-interactive installation during docker build by @yunfeng-scale in #533
- [Client] Add guided_grammar and other missing fields by @seanshi-scale in #532
Full Changelog: v0.0.0beta33...v0.0.0beta34
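PR #532 above adds `guided_grammar` and other missing fields to the Python client. A minimal sketch of how that field might be used with `Completion.create`; the model name and grammar string below are placeholders, not taken from this release:

```python
from llmengine import Completion

# Hypothetical grammar constraining the output to "yes" or "no".
# The exact grammar syntax accepted by the backend is an assumption here.
grammar = 'root ::= "yes" | "no"'

response = Completion.create(
    model="llama-3-70b-instruct",   # placeholder model name
    prompt="Is the sky blue? Answer yes or no: ",
    max_new_tokens=5,
    temperature=0.0,
    guided_grammar=grammar,         # client field added in #532
)
print(response.output.text)
```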
v0.0.0beta33
What's Changed
- Necessary Changes for long context llama-3-8b by @sam-scale in #516
- Increase max gpu utilization for 70b models by @dmchoiboi in #517
- Infer hardware from model name by @yunfeng-scale in #515
- Improve TensorRT-LLM Functionality by @seanshi-scale in #487
- Upgrade vLLM version for batch completion by @dmchoiboi in #518
- Revert "Upgrade vLLM version for batch completion" by @dmchoiboi in #520
- Allow H100 to be used by @yunfeng-scale in #522
- vLLM version 0.4.2 Docker image by @squeakymouse in #521
- Image cache and balloon on H100s, also temporarily stop people from using A100 by @yunfeng-scale in #523
Full Changelog: v0.0.0beta32...v0.0.0beta33
v0.0.0beta32
What's Changed
- Add emitting token count metrics to datadog statsd by @seanshi-scale in #458
- Downgrade sse-starlette version by @squeakymouse in #478
- Return 400 for botocore client errors by @yunfeng-scale in #479
- Increase Kaniko Memory by @saiatmakuri in #481
- Batch job metrics by @yunfeng-scale in #480
- Use base model name as metric tag by @yunfeng-scale in #483
- Change LLM Engine base path from global var by @squeakymouse in #482
- Remove fine-tune limit for internal users by @squeakymouse in #484
- Parallel Python execution for tool completion by @yunfeng-scale in #470
- Allow JSONL for fine-tuning datasets by @squeakymouse in #486
- Fix throughput_benchmarks ITL calculation, add option to use a json file of prompts by @seanshi-scale in #485
- Add Model.update() to Python client by @squeakymouse in #490
- Bump idna from 3.4 to 3.7 in /clients/python by @dependabot in #491
- Bump idna from 3.4 to 3.7 in /model-engine by @dependabot in #492
- Properly add mixtral 8x22b by @yunfeng-scale in #493
- support mixtral 8x22b instruct by @saiatmakuri in #495
- fix return_token_log_probs on vLLM > 0.3.3 endpoints by @saiatmakuri in #498
- Package update + more docs on dev setup by @dmchoiboi in #500
- Add Llama 3 models by @yunfeng-scale in #501
- Enforce model checkpoints existing for endpoint/bundle creation by @dmchoiboi in #503
- guided decoding with grammar by @saiatmakuri in #488
- adding asyncenginedead error catch by @ian-scale in #504
- Default include_stop_str_in_output to None by @squeakymouse in #506
- get latest inference framework tag from configmap by @saiatmakuri in #505
- integration tests for completions by @saiatmakuri in #507
- patch service config identifier by @saiatmakuri in #509
- require safetensors for LLM endpoint creation by @saiatmakuri in #510
- Add py.typed for proper typechecking support on clients by @dmchoiboi in #513
- Fix package name mapping in setup.py by @dmchoiboi in #514
New Contributors
- @dmchoiboi made their first contribution in #500
Full Changelog: v0.0.0beta28...v0.0.0beta32
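PR #498 above fixes `return_token_log_probs` on newer vLLM endpoints. A minimal sketch of requesting per-token log probabilities through the Python client; the model name is a placeholder and the shape of the returned token entries is an assumption:

```python
from llmengine import Completion

response = Completion.create(
    model="llama-3-8b-instruct",    # placeholder model name
    prompt="The capital of France is",
    max_new_tokens=4,
    return_token_log_probs=True,    # behavior fixed for vLLM > 0.3.3 in #498
)

# Assumes each returned token entry exposes `token` and `log_prob` attributes.
for tok in response.output.tokens:
    print(tok.token, tok.log_prob)
```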
v0.0.0beta28
What's Changed
- Tool completion respect num new tokens by @yunfeng-scale in #469
- Azure fixes + additional asks by @squeakymouse in #468
- Metrics for stuck async requests by @squeakymouse in #471
- Fix cacher by @yunfeng-scale in #472
- Add retries to deflake integration tests by @squeakymouse in #473
- add suffix to integration tests by @saiatmakuri in #474
- fix docs tests gateway endpoint by @saiatmakuri in #475
- Guided decoding by @yunfeng-scale in #476
Full Changelog: v0.0.0beta27...v0.0.0beta28
v0.0.0beta27
What's Changed
- Try to fix async requests getting stuck by @squeakymouse in #466
- [Client] Add num_prompt_tokens to the client's CompletionOutputs by @seanshi-scale in #467
Full Changelog: v0.0.0beta26...v0.0.0beta27
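PR #467 above adds `num_prompt_tokens` to the client's `CompletionOutput`. A minimal sketch of reading it alongside the existing completion token count (model name is a placeholder):

```python
from llmengine import Completion

response = Completion.create(
    model="llama-2-7b",             # placeholder model name
    prompt="Summarize: LLM Engine serves open-source models.",
    max_new_tokens=32,
)

output = response.output
# num_prompt_tokens is the field added in #467; num_completion_tokens predates it.
print(f"prompt tokens: {output.num_prompt_tokens}, "
      f"completion tokens: {output.num_completion_tokens}")
```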
v0.0.0beta26
What's Changed
- [SC-836587] Pin boto3 and urllib3 versions to fix error in inference image by @edgan8 in #432
- include stop string in completions output by @saiatmakuri in #435
- Logging post inference hook implementation by @tiffzhao5 in #428
- add codellama-70b models by @saiatmakuri in #436
- Add hook validation and support logging for python client by @tiffzhao5 in #437
- Azure refactor for async endpoints by @squeakymouse in #425
- Remove post inference hook handling in main container by @tiffzhao5 in #438
- Clean up logs for logging hook by @tiffzhao5 in #439
- Fix Infra Task Gateway by @saiatmakuri in #443
- support gemma models by @saiatmakuri in #444
- Fix infra config dependency by @squeakymouse in #449
- Add emitted timestamp for logging by @tiffzhao5 in #450
- Change cache update time for async endpoint integration test by @tiffzhao5 in #451
- Bump aiohttp from 3.9.1 to 3.9.2 in /model-engine by @dependabot in #446
- Bump python-multipart from 0.0.6 to 0.0.7 in /model-engine by @dependabot in #447
- Bump gitpython from 3.1.32 to 3.1.41 in /model-engine by @dependabot in #453
- Log endpoint in sensitive_log_mode by @squeakymouse in #455
- Bump orjson from 3.8.6 to 3.9.15 in /model-engine by @dependabot in #456
- Allow the load test script to use a csv of inputs by @seanshi-scale in #440
- add some debugging to vllm docker by @yunfeng-scale in #454
- Add product label validation by @edgan8 in #442
- Add log statement for gateway sending async task by @tiffzhao5 in #459
- Some batch inference improvements by @yunfeng-scale in #460
- Fix cacher by @yunfeng-scale in #462
- Fix vllm batch docker image by @yunfeng-scale in #463
- Add tool completion to batch inference by @yunfeng-scale in #461
- fix llm-engine finetune.create failures by @ian-scale in #464
- Change back batch infer GPU util and add tool completion client changes by @yunfeng-scale in #465
New Contributors
- @edgan8 made their first contribution in #432
Full Changelog: v0.0.0beta25...v0.0.0beta26
v0.0.0beta25
What's Changed
- LLM benchmark script improvements by @seanshi-scale in #427
- Allow using pydantic v2 by @seanshi-scale in #429
- Fix helm chart nodeSelector for GPU endpoints by @squeakymouse in #430
- Allow pydantic 2 in python client requested requirements by @seanshi-scale in #433
- Fix batch job permissions by @yunfeng-scale in #431
- [Client] Add Auth headers to the python async routes by @seanshi-scale in #434
Full Changelog: v0.0.0beta22...v0.0.0beta25
v0.0.0beta22
What's Changed
- Change middleware format by @squeakymouse in #393
- Fix custom framework Dockerfile by @squeakymouse in #395
- fixing tensorrt-llm enum value (fixes #390) by @ian-scale in #396
- overriding model length for zephyr 7b alpha by @ian-scale in #398
- time completions use case by @saiatmakuri in #397
- update docs to show model len / context windows by @ian-scale in #401
- Add MultiprocessingConcurrencyLimiter to gateway by @squeakymouse in #399
- change code-llama to codellama by @ian-scale in #400
- fix completions request id by @saiatmakuri in #402
- Allow latest inference framework tag by @squeakymouse in #403
- Bump helm chart version by @seanshi-scale in #406
- 4x sqlalchemy pool size by @yunfeng-scale in #405
- bump datadog module to 0.47.0 by @saiatmakuri in #407
- Fix autoscaler node selector by @seanshi-scale in #409
- Log request sizes by @yunfeng-scale in #410
- add support for mixtral-8x7b and mixtral-8x7b-instruct by @saiatmakuri in #408
- Make sure metadata is not incorrectly wiped during endpoint update by @yunfeng-scale in #413
- Always return output for completions sync response by @yunfeng-scale in #412
- handle update endpoint errors by @saiatmakuri in #414
- [bug-fix] LLM Artifact Gateway .list_files() by @saiatmakuri in #416
- enable sensitive log mode by @song-william in #415
- Throughput benchmark script by @yunfeng-scale in #411
- Upgrade vllm to 0.2.7 by @yunfeng-scale in #417
- LLM batch completions API by @yunfeng-scale in #418
- Small update to vllm batch by @yunfeng-scale in #419
- sensitive content flag by @yunfeng-scale in #421
- Revert a broken refactoring by @yunfeng-scale in #423
- [Logging I/O] Post inference hooks as background tasks by @tiffzhao5 in #422
- Batch inference client / doc by @yunfeng-scale in #424
- Minor fixes for batch inference by @yunfeng-scale in #426
Full Changelog: v0.0.0beta20...v0.0.0beta22
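PRs #418 and #424 above introduce the LLM batch completions API and its client/docs support. A rough sketch of what a batch request might look like from the Python client; the helper class names, response fields, and S3 paths below are assumptions, not something confirmed by this changelog:

```python
from llmengine import (
    Completion,
    CreateBatchCompletionsModelConfig,      # assumed class name
    CreateBatchCompletionsRequestContent,   # assumed class name
)

# All values below are placeholders.
response = Completion.batch_create(
    output_data_path="s3://my-bucket/batch-output/",
    model_config=CreateBatchCompletionsModelConfig(
        model="mistral-7b",
        checkpoint_path="s3://my-bucket/checkpoints/mistral-7b/",
        labels={},
    ),
    content=CreateBatchCompletionsRequestContent(
        prompts=["What is deep learning?", "Explain quantization."],
        max_new_tokens=64,
        temperature=0.0,
    ),
)
print(response.job_id)  # field name assumed
```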
v0.0.0beta20
What's Changed
- Patch post_file client method by @song-william in #323
- Add pod disruption budget to all endpoints by @yunfeng-scale in #328
- create celery worker with inference worker profile by @saiatmakuri in #327
- Bump http forwarder request CPU by @yunfeng-scale in #330
- [Docs] Clarify get-events API usage by @seanshi-scale in #320
- Enable additional Datadog tagging for jobs by @song-william in #324
- fix celery worker profile for s3 access by @saiatmakuri in #333
- Hardcode number of forwarder workers by @yunfeng-scale in #334
- Standardize logging initialization by @song-william in #337
- Fix up the mammoth max length issue. by @sam-scale in #335
- Add docs for Model.create, update default values and fix per_worker concurrency by @yunfeng-scale in #332
- updating docs to add codellama models by @ian-scale in #343
- Add PodDisruptionBudget to model engine by @yunfeng-scale in #342
- Allow auth to accept API keys by @saiatmakuri in #326
- Add job_name in build logs for easier debugging by @song-william in #340
- Make PDB optional by @yunfeng-scale in #344
- Revert "fix celery worker profile for s3 access" by @yixu34 in #345
- Revert "Revert "fix celery worker profile for s3 access"" by @saiatmakuri in #346
- Pass file ID to fine-tuning script by @squeakymouse in #347
- llama should have None max length by @sam-scale in #348
- taking out codellama13b and 34b by @ian-scale in #349
- Change DATADOG_TRACE_ENABLED to DD_TRACE_ENABLED by @edwardpark97 in #350
- Allow fine-tuning hyperparameter to be Dict by @squeakymouse in #353
- adding real auth to integration tests by @ian-scale in #352
- add new llm-jp models to llm-engine by @ian-scale in #354
- Generalize SQS region by @jaisanliang in #355
- Track LLM Metrics by @saiatmakuri in #356
- Remove extra trace facet "launch.resource_name" by @saiatmakuri in #359
- Ianmacleod/add codellama instruct, refactor codellama models by @ian-scale in #360
- Various changes/bugfixes to chart/code to streamline deployment on different forms of infra by @seanshi-scale in #339
- Add PR template by @song-william in #341
- Unmount aws config from root by @song-william in #361
- Implement automated code coverage for CI by @tiffzhao5 in #362
- Download only known files by @squeakymouse in #364
- Documentation fix by @squeakymouse in #365
- Change more AWS config mount paths by @squeakymouse in #367
- Validating inference framework image tags by @tiffzhao5 in #357
- Ianmacleod/add codellama 34b by @ian-scale in #369
- Better error when model is not ready for predictions by @tiffzhao5 in #368
- Improve metrics route team tags by @saiatmakuri in #371
- Enable custom istio metric tags with Telemetry API by @song-william in #373
- Use Variable name for Telemetry Helm Resources by @song-william in #374
- Forward HTTP status code for sync requests by @yunfeng-scale in #375
- Integrate TensorRT-LLM by @yunfeng-scale in #358
- Fine-tuning e2e integration test by @tiffzhao5 in #372
- Found a bug in the codellama vllm model_len logic. by @sam-scale in #380
- Fix sample.yaml by @yunfeng-scale in #381
- count prompt tokens by @saiatmakuri in #366
- Fix integration test by @yunfeng-scale in #383
- emit metrics on token counts by @saiatmakuri in #382
- Increase llama-2 max_input_tokens by @sam-scale in #384
- Revert "Found a bug in the codellama vllm model_len logic." by @yunfeng-scale in #386
- Some updates to integration tests by @yunfeng-scale in #385
- Celery autoscaler by @squeakymouse in #378
- Don't install Celery autoscaler for test deployments by @squeakymouse in #388
- LLM update API route by @squeakymouse in #387
- adding zephyr 7b by @ian-scale in #389
- update tensor-rt llm in enum by @ian-scale in #390
- pypi version bump by @ian-scale in #391
New Contributors
- @edwardpark97 made their first contribution in #350
- @jaisanliang made their first contribution in #355
- @tiffzhao5 made their first contribution in #362
Full Changelog: v0.0.0beta19...v0.0.0beta20
v0.0.0beta19
What's Changed
- Increase graceful timeout and hardcode AWS_PROFILE by @squeakymouse in #306
- bump pypi version by @ian-scale in #303
- Ianmacleod/add mistral by @ian-scale in #307
- Ianmacleod/add falcon 180b by @ian-scale in #309
- update 180b inference framework by @ian-scale in #310
- Adding code llama to TGI by @mfagundo-scale in #311
- Add AWQ enum by @yunfeng-scale in #317
- Fix documentation to reference Files API by @squeakymouse in #312
- Return TGI errors by @yunfeng-scale in #313
- Fix streaming endpoint failure handling by @yunfeng-scale in #314
- Validate quantization by @yunfeng-scale in #315
- Properly return PENDING status for docker image batch jobs/fine tune jobs by @seanshi-scale in #318
- add user_id and team_id as log facets by @song-william in #321
- publish 0.0.0b19 by @yunfeng-scale in #322
New Contributors
- @mfagundo-scale made their first contribution in #311
Full Changelog: v0.0.0beta18...v0.0.0beta19
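PR #312 above points the documentation at the Files API. A minimal sketch of uploading a fine-tuning dataset and referencing it by file ID; the local file, model name, and response fields are placeholders or assumptions:

```python
from llmengine import File, FineTune

# Upload a local training dataset; the returned object is assumed to carry an `id`.
upload = File.upload(open("training_dataset.csv", "r"))

# Start a fine-tune against the uploaded file (model name is a placeholder).
fine_tune = FineTune.create(
    model="llama-2-7b",
    training_file=upload.id,
)
print(fine_tune.id)  # field name assumed
```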