Fix dangling open skip in README.md #1299

Open · wants to merge 1 commit into `main`
13 changes: 7 additions & 6 deletions README.md
@@ -171,7 +171,7 @@ python3 torchchat.py download llama3.1
<summary>Additional Model Inventory Management Commands</summary>

### Where
This subcommand shows location of a particular model.
This subcommand shows the location of a particular model.
```bash
python3 torchchat.py where llama3.1
```
@@ -216,7 +216,6 @@ This mode generates text based on an input prompt.
python3 torchchat.py generate llama3.1 --prompt "write me a story about a boy and his bear"
```

[skip default]: end

### Server
This mode exposes a REST API for interacting with a model.
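As an illustration, once the server is up it can be queried with an OpenAI-style chat completion request. The port and endpoint below are assumptions based on common defaults; adjust them to match what the server prints on startup.

```bash
# Hypothetical request: assumes the server listens on 127.0.0.1:5000 and
# exposes an OpenAI-compatible /v1/chat/completions endpoint.
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "stream": true,
    "max_tokens": 200,
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write me a story about a boy and his bear"}
    ]
  }'
```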
@@ -286,14 +285,16 @@ First, follow the steps in the Server section above to start a local server. The
streamlit run torchchat/usages/browser.py
```

[skip default]: end

Use the "Max Response Tokens" slider to limit the maximum number of tokens generated by the model for each response. Click the "Reset Chat" button to remove the message history and start a fresh chat.


## Desktop/Server Execution

### AOTI (AOT Inductor)
[AOTI](https://pytorch.org/blog/pytorch2-2/) compiles models before execution for faster inference. The process creates a [DSO](https://en.wikipedia.org/wiki/Shared_library) model (represented by a file with extension `.so`)
that is then loaded for inference. This can be done with both Python and C++ enviroments.
that is then loaded for inference. This can be done with both Python and C++ environments.

The following example exports and executes the Llama3.1 8B Instruct
model. The first command compiles and performs the actual export.
@@ -308,9 +309,9 @@ python3 torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.s
For more details on quantization and what settings to use for your use
case visit our [customization guide](docs/model_customization.md).
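For example, a quantized export might look like the following sketch. The `--quantize` flag and the config path shown here are assumptions for illustration; consult the customization guide for the settings that actually apply to your hardware.

```bash
# Illustrative only: the quantization config path is an assumption;
# see docs/model_customization.md for supported options.
python3 torchchat.py export llama3.1 \
  --quantize torchchat/quant_config/cuda.json \
  --output-dso-path exportedModels/llama3.1.so
```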

### Run in a Python Enviroment
### Run in a Python Environment

To run in a python enviroment, use the generate subcommand like before, but include the dso file.
To run in a python environment, use the generate subcommand like before, but include the dso file.

```
python3 torchchat.py generate llama3.1 --dso-path exportedModels/llama3.1.so --prompt "Hello my name is"
@@ -377,7 +378,7 @@ While ExecuTorch does not focus on desktop inference, it is capable
of doing so. This is handy for testing out PTE
models without sending them to a physical device.

Specifically there are 2 ways of doing so: Pure Python and via a Runner
Specifically, there are 2 ways of doing so: Pure Python and via a Runner
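As a rough sketch of the pure-Python path, assuming a `.pte` file has already been exported and that `generate` accepts it via a `--pte-path` flag (mirroring the `--dso-path` usage above):

```bash
# Sketch under assumptions: exportedModels/llama3.1.pte must already exist
# from a prior export step, and --pte-path is assumed to parallel --dso-path.
python3 torchchat.py generate llama3.1 \
  --pte-path exportedModels/llama3.1.pte \
  --prompt "Hello my name is"
```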

<details>
<summary>Deploying via Python</summary>