From 35e30a9297f7c1a96b3ed4fcb08eddb478ed83a9 Mon Sep 17 00:00:00 2001
From: Michael Gschwind <61328285+mikekgfb@users.noreply.github.com>
Date: Mon, 14 Oct 2024 21:41:13 -0700
Subject: [PATCH] Fix dangling open skip in README.md

1 - Fix an extraneous skip end that is out of order with a skip begin.
2 - Fix some typos.

PS: This might cause some README tests to fail, as they have not been run in a long time.
---
 README.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 4f58f714c..6e8eaa061 100644
--- a/README.md
+++ b/README.md
@@ -171,7 +171,7 @@ python3 torchchat.py download llama3.1
 Additional Model Inventory Management Commands
 
 ### Where
-This subcommand shows location of a particular model.
+This subcommand shows the location of a particular model.
 ```bash
 python3 torchchat.py where llama3.1
 ```
@@ -216,7 +216,6 @@ This mode generates text based on an input prompt.
 python3 torchchat.py generate llama3.1 --prompt "write me a story about a boy and his bear"
 ```
 
-[skip default]: end
 
 ### Server
 This mode exposes a REST API for interacting with a model.
@@ -286,6 +285,8 @@ First, follow the steps in the Server section above to start a local server. The
 streamlit run torchchat/usages/browser.py
 ```
 
+[skip default]: end
+
 Use the "Max Response Tokens" slider to limit the maximum number of tokens generated by the model for each response.
 
 Click the "Reset Chat" button to remove the message history and start a fresh chat.
@@ -293,7 +294,7 @@ Use the "Max Response Tokens" slider to limit the maximum number of tokens gener
 
 ### AOTI (AOT Inductor)
 [AOTI](https://pytorch.org/blog/pytorch2-2/) compiles models before execution for faster inference. The process creates a [DSO](https://en.wikipedia.org/wiki/Shared_library) model (represented by a file with extension `.so`)
-that is then loaded for inference. This can be done with both Python and C++ enviroments.
+that is then loaded for inference. This can be done with both Python and C++ environments.
 
 The following example exports and executes the Llama3.1 8B Instruct model. The first command compiles and performs the actual export.
 
@@ -308,9 +309,9 @@ python3 torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.s
 For more details on quantization and what settings to use for your use case visit our [customization guide](docs/model_customization.md).
 
-### Run in a Python Enviroment
+### Run in a Python Environment
 
-To run in a python enviroment, use the generate subcommand like before, but include the dso file.
+To run in a python environment, use the generate subcommand like before, but include the dso file.
 
 ```
 python3 torchchat.py generate llama3.1 --dso-path exportedModels/llama3.1.so --prompt "Hello my name is"
 ```
@@ -377,7 +378,7 @@ While ExecuTorch does not focus on desktop inference, it is capable of doing so.
 This is handy for testing out PTE models without sending them to a physical device.
 
-Specifically there are 2 ways of doing so: Pure Python and via a Runner
+Specifically, there are 2 ways of doing so: Pure Python and via a Runner
 
 Deploying via Python