[0.6.0] Release Candidate #481
Merged
* feat: massive api server refactoring
* fix: tokenizer endpoint issues
* fix: BatchResponseData body should be optional
…uired feature space
* add utils for getting the partition offset and size for current tp rank
* disable asymmetric TP for quants and lora, handle GQA allocation
* the actual splitting work in the linear layers
* padding size for the vocab/lm_head should be optional
* cache engine and spec decode model runner (kwargs only)
* pass the tp_rank to model runners
* llama support
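
The first item in the commit message above mentions utilities that return a partition offset and size for the current TP rank. As a rough illustration of what such a helper might compute, here is a minimal sketch. The function name `partition_offset_and_size`, its parameters, and the policy of giving the remainder to the lowest ranks are all assumptions made for this example; they are not taken from the branch.

```python
from typing import Tuple


def partition_offset_and_size(total: int, tp_size: int, tp_rank: int,
                              multiple_of: int = 1) -> Tuple[int, int]:
    """Return (offset, size) of the slice of a dimension owned by tp_rank.

    Hypothetical helper: the dimension is split in whole units of
    `multiple_of` (for example one attention head), and any units left
    over after an even split go to the lowest-numbered ranks.
    """
    if not 0 <= tp_rank < tp_size:
        raise ValueError("tp_rank must be in [0, tp_size)")
    if total % multiple_of != 0:
        raise ValueError("total must be divisible by multiple_of")

    units = total // multiple_of
    base, extra = divmod(units, tp_size)

    # The first `extra` ranks each take one additional unit.
    my_units = base + (1 if tp_rank < extra else 0)
    # Every earlier rank holds `base` units, plus one more if it is
    # among the first `extra` ranks.
    offset_units = tp_rank * base + min(tp_rank, extra)

    return offset_units * multiple_of, my_units * multiple_of


if __name__ == "__main__":
    # 10 heads of width 64 split across 3 ranks (an uneven split).
    for rank in range(3):
        print(rank, partition_offset_and_size(10 * 64, 3, rank, multiple_of=64))
```

With this policy the slices are contiguous and together cover the whole dimension, so each rank can slice a weight tensor with `weight[offset:offset + size]` without coordinating with other ranks.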
This reverts commit 6dd6408.
…ling and non-beam search usecase (#616)
* refactor gguf kernels
* fix: incorrect filename for vecdotq header
* finish up the re-impl
* add requirements
…618)
* add getting started page
* add debugging tips
* add openai docs
* add distributed guide
* add production metrics and model support matrix
* add guide for adding new models
* huge update
* add vlm usage docs
Target all PRs to this branch.
Warning
Highly experimental branch. A lot of things may not work.
Currently, the following are dysfunctional in this branch: