
docs: Website documentation for vllm inferencing using rayserve on AWS Inferentia #637

Merged · 6 commits into awslabs:main · Sep 19, 2024

Conversation

sindhupalakodety (Contributor)


What does this PR do?

🛑 Please open an issue first to discuss any significant work and flesh out details/direction - we would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.

Created website documentation for vLLM inference using RayServe on AWS Inferentia.

Motivation

To enable users to run vLLM with RayServe on Trainium, and to use the llmperf tool for benchmarking.
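
As a rough sketch of the benchmarking step (illustrative only; the endpoint URL, model name, and parameters below are assumptions, not taken from this PR), an llmperf run against an OpenAI-compatible RayServe endpoint could look like:

```bash
# Hypothetical llmperf invocation; the endpoint URL and model name are assumptions.
export OPENAI_API_BASE="http://localhost:8000/v1"  # assumed RayServe/vLLM endpoint
export OPENAI_API_KEY="placeholder"                # llmperf expects a value to be set

python token_benchmark_ray.py \
  --model "NousResearch/Meta-Llama-3-8B-Instruct" \
  --llm-api openai \
  --mean-input-tokens 550 \
  --stddev-input-tokens 150 \
  --mean-output-tokens 150 \
  --stddev-output-tokens 10 \
  --num-concurrent-requests 5 \
  --max-num-completed-requests 100 \
  --timeout 600 \
  --results-dir "result_outputs"
```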

More

  • Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
  • Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
  • Yes, I ran pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

  • E2E test successfully completed before merge?

Additional Notes

Collaborator @askulkarni2 left a comment

@sindhupalakodety thanks for the PR! Minor comments. Also for next time, please open an issue in the repo for tracking.

website/docs/gen-ai/inference/Neuron/vLLM-rayserve.md (two resolved review threads)
@askulkarni2 askulkarni2 changed the title Created a website documentation for vllm inferencing using rayserve on AWS Inferentia feat: Website documentation for vllm inferencing using rayserve on AWS Inferentia Sep 5, 2024
@askulkarni2 askulkarni2 changed the title feat: Website documentation for vllm inferencing using rayserve on AWS Inferentia doc: Website documentation for vllm inferencing using rayserve on AWS Inferentia Sep 5, 2024
@askulkarni2 askulkarni2 changed the title doc: Website documentation for vllm inferencing using rayserve on AWS Inferentia docs: Website documentation for vllm inferencing using rayserve on AWS Inferentia Sep 5, 2024
Collaborator @vara-bonthu left a comment

Thanks for the doc PR 👍🏼 Left a few comments to be addressed.

Collaborator

inf2.8xlarge has only one Neuron device and we are showing 6 here. Please update accordingly.
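
One illustrative way to double-check the per-node Neuron device count exposed by the device plugin (not part of this PR):

```bash
# Lists each node with its allocatable aws.amazon.com/neuron devices
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.aws\.amazon\.com/neuron}{"\n"}{end}'
```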

Contributor Author

Sure. Will work on it.

Collaborator

Don't delete this file as part of this PR. We can deprecate it later, along with the code, in favour of RayServe with vLLM.

Contributor Author

Hi Vara, this is the file I changed; I renamed it to "RayServe with vLLM". Do you want a copy of the original file to remain in the repo?

```yaml
---
title: RayServe with vLLM
sidebar_position: 2
description: Deploy Llama-3 models on AWS Inferentia accelerators for efficient inference using vLLM.
```
Collaborator

Update the description to "Deploying Llama-3 Models on AWS Inferentia2 with Ray for Efficient Inference Using vLLM"
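
A sketch of the updated front matter with the suggested description (assuming the rest of the file stays unchanged):

```yaml
---
title: RayServe with vLLM
sidebar_position: 2
description: Deploying Llama-3 Models on AWS Inferentia2 with Ray for Efficient Inference Using vLLM
---
```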

Comment on lines 134 to 138
```bash
kubectl get ds neuron-device-plugin-daemonset --namespace kube-system
```
```bash
NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
neuron-device-plugin-daemonset   2         2         2       2            2           <none>          2d2h
```
Collaborator

Please change the daemonset name from neuron-device-plugin-daemonset to neuron-device-plugin.
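
For reference, after the rename the check would presumably become:

```bash
# Assumes the daemonset has been renamed from neuron-device-plugin-daemonset
kubectl get ds neuron-device-plugin --namespace kube-system
```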

Contributor Author

Sure, will work on it.

Collaborator

Please remove this image.

Contributor Author

Sure. Will work on it.

@ratnopamc ratnopamc self-requested a review September 19, 2024 00:36
Collaborator @ratnopamc left a comment

lgtm.

@vara-bonthu vara-bonthu merged commit 5da52b2 into awslabs:main Sep 19, 2024
4 checks passed