docs: Website documentation for vLLM inferencing using RayServe on AWS Inferentia #637
Conversation
@sindhupalakodety thanks for the PR! Minor comments. Also for next time, please open an issue in the repo for tracking.
Thanks for the doc PR 👍🏼 Left a few comments to be addressed.
inf2.8xlarge has only one Neuron device, but we are showing 6 here. Please update accordingly.
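For reference, one way to confirm how many Neuron devices a node actually exposes (a rough sketch; assumes the Neuron device plugin is deployed and `<node-name>` is replaced with a real inf2 node):

```bash
# Check the Neuron devices advertised on the node by the device plugin
kubectl describe node <node-name> | grep -i "aws.amazon.com/neuron"

# Or, directly on the instance (requires the aws-neuronx-tools package):
neuron-ls
```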
Sure. Will work on it.
Don't delete this file as part of this PR. We can deprecate it later on, along with the code, in favour of RayServe with vLLM.
Hi Vara, this is the file I changed. I renamed it to "RayServe with vLLM". Do you want a copy of the original file kept in the repo?
```yaml
---
title: RayServe with vLLM
sidebar_position: 2
description: Deploy Llama-3 models on AWS Inferentia accelerators for efficient inference using vLLM.
```
Update the description to "Deploying Llama-3 Models on AWS Inferentia2 with Ray for Efficient Inference Using vLLM"
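Applied to the frontmatter above, the change would look roughly like this (only the description line differs; the other fields are kept from the diff):

```yaml
---
title: RayServe with vLLM
sidebar_position: 2
description: Deploying Llama-3 Models on AWS Inferentia2 with Ray for Efficient Inference Using vLLM
```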
```bash
kubectl get ds neuron-device-plugin-daemonset --namespace kube-system
```

```bash
NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
neuron-device-plugin-daemonset   2         2         2       2            2           <none>          2d2h
```
Please change the daemonset name from `neuron-device-plugin-daemonset` to `neuron-device-plugin`.
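With that rename, the verification command in the doc would read roughly as follows (a sketch, assuming the plugin is deployed under the shorter name in kube-system):

```bash
kubectl get ds neuron-device-plugin --namespace kube-system
```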
Sure, will work on it.
Please remove this image.
Sure. Will work on it.
lgtm.
What does this PR do?
🛑 Please open an issue first to discuss any significant work and flesh out details/direction - we would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.
Created website documentation for vLLM inferencing using RayServe on AWS Inferentia.
Motivation
To enable users to use vLLM together with RayServe and Trainium, and to use the llmperf tool for benchmarking.
More

- Created documentation under the `website/docs` or `website/blog` section for this feature
- Ran `pre-commit run -a` with this PR. Link for installing pre-commit locally

For Moderators

Additional Notes