
[0.6.0] Release Candidate #481

Merged: 721 commits into main on Sep 3, 2024
Conversation

@AlpinDale (Member) commented May 25, 2024

Target all PRs to this branch.

Warning

Highly experimental branch. A lot of things may not work.

Currently, the following are dysfunctional in this branch:

AlpinDale and others added 29 commits August 22, 2024 14:11
* feat: massive api server refactoring

* fix: tokenizer endpoint issues

* fix: BatchResponseData body should be optional
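
For context on the `BatchResponseData` fix above: in the OpenAI-compatible batch format, a request line can fail before any response body is produced, so the body field has to admit `None`. A minimal pydantic sketch of the idea; the field names here are assumptions loosely following the OpenAI batch response shape, not the actual Aphrodite code:

```python
from typing import Optional

from pydantic import BaseModel


class BatchResponseData(BaseModel):
    # Assumed fields, loosely modeled on the OpenAI batch response format.
    status_code: int = 200
    request_id: str = ""
    # A batch line can error out before a body exists, so this must be
    # Optional rather than a required dict.
    body: Optional[dict] = None
```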
* add utils for getting the partition offset and size for current tp rank

* disable asymmetric TP for quants and lora, handle GQA allocation

* the actual splitting work in the linear layers

* padding size for the vocab/lm_head should be optional

* cache engine and spec decode model runner (kwargs only)

* pass the tp_rank to model runners

* llama support

This reverts commit 6dd6408.
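
The asymmetric-TP commits above revolve around per-rank partition math: with uneven splits, each tensor-parallel rank owns a different slice of a weight dimension, so the offset and size have to be derived from the rank's relative share rather than a plain `total // tp_size`. A hedged sketch of what such a utility could look like; the function name and the `ratios` parameter are illustrative assumptions, not Aphrodite's actual API:

```python
def get_partition_offset_and_size(
    total_size: int, tp_rank: int, tp_size: int, ratios: list[int]
) -> tuple[int, int]:
    """Return (offset, size) of this rank's slice of a weight dimension.

    `ratios` gives each rank's relative share, e.g. [2, 1, 1] splits the
    dimension 50/25/25 across three ranks instead of evenly.
    """
    assert len(ratios) == tp_size
    denom = sum(ratios)
    # Integer shares; hand any rounding remainder to the last rank so
    # the per-rank sizes always sum to total_size.
    sizes = [total_size * r // denom for r in ratios]
    sizes[-1] += total_size - sum(sizes)
    offset = sum(sizes[:tp_rank])
    return offset, sizes[tp_rank]


# e.g. a 4096-wide dimension split 2:1:1 across three ranks:
# rank 0 -> (0, 2048), rank 1 -> (2048, 1024), rank 2 -> (3072, 1024)
```

Grouped-query attention adds the constraint that head counts, not just raw dimensions, have to divide cleanly across ranks, which is presumably why the commits above handle GQA allocation separately and disable asymmetric TP for quantized and LoRA layers.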
AlpinDale and others added 28 commits September 2, 2024 03:18
* refactor gguf kernels

* fix: incorrect filename for vecdotq header

* finish up the re-impl

* add requirements
…618)

* add getting started page

* add debugging tips

* add openai docs

* add distributed guide

* add production metrics and model support matrix

* add guide for adding new models

* huge update

* add vlm usage docs
@AlpinDale changed the title from [0.5.4] Release Candidate to [0.6.0] Release Candidate on Sep 3, 2024
@AlpinDale merged commit f1d0b77 into main on Sep 3, 2024
5 checks passed