Fix dangling open skip in README.md #1299

Open · wants to merge 1 commit into `main`
13 changes: 7 additions & 6 deletions README.md
@@ -171,7 +171,7 @@ python3 torchchat.py download llama3.1
<summary>Additional Model Inventory Management Commands</summary>

### Where
This subcommand shows location of a particular model.
This subcommand shows the location of a particular model.
```bash
python3 torchchat.py where llama3.1
```
@@ -216,7 +216,6 @@ This mode generates text based on an input prompt.
python3 torchchat.py generate llama3.1 --prompt "write me a story about a boy and his bear"
```

[skip default]: end

### Server
This mode exposes a REST API for interacting with a model.
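As an illustration, once the server is up it can be queried with an OpenAI-style chat completion request. The port and endpoint below are assumptions based on common defaults; adjust them to match what the server prints on startup.

```bash
# Hypothetical request: assumes the server listens on 127.0.0.1:5000 and
# exposes an OpenAI-compatible /v1/chat/completions endpoint.
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "stream": true,
    "max_tokens": 200,
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write me a story about a boy and his bear"}
    ]
  }'
```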
@@ -286,14 +285,16 @@ First, follow the steps in the Server section above to start a local server. The
streamlit run torchchat/usages/browser.py
```

[skip default]: end

Use the "Max Response Tokens" slider to limit the maximum number of tokens generated by the model for each response. Click the "Reset Chat" button to remove the message history and start a fresh chat.


## Desktop/Server Execution

### AOTI (AOT Inductor)
[AOTI](https://pytorch.org/blog/pytorch2-2/) compiles models before execution for faster inference. The process creates a [DSO](https://en.wikipedia.org/wiki/Shared_library) model (represented by a file with extension `.so`)
that is then loaded for inference. This can be done with both Python and C++ enviroments.
that is then loaded for inference. This can be done with both Python and C++ environments.

The following example exports and executes the Llama3.1 8B Instruct
model. The first command compiles and performs the actual export.
@@ -308,9 +309,9 @@ python3 torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.s
For more details on quantization and what settings to use for your use
case visit our [customization guide](docs/model_customization.md).
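For example, a quantized export might look like the following sketch. The `--quantize` flag and the config path shown here are assumptions for illustration; consult the customization guide for the settings that actually apply to your hardware.

```bash
# Illustrative only: the quantization config path is an assumption;
# see docs/model_customization.md for supported options.
python3 torchchat.py export llama3.1 \
  --quantize torchchat/quant_config/cuda.json \
  --output-dso-path exportedModels/llama3.1.so
```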

### Run in a Python Enviroment
### Run in a Python Environment

To run in a python enviroment, use the generate subcommand like before, but include the dso file.
To run in a python environment, use the generate subcommand like before, but include the dso file.

```
python3 torchchat.py generate llama3.1 --dso-path exportedModels/llama3.1.so --prompt "Hello my name is"
@@ -377,7 +378,7 @@ While ExecuTorch does not focus on desktop inference, it is capable
of doing so. This is handy for testing out PTE
models without sending them to a physical device.

Specifically there are 2 ways of doing so: Pure Python and via a Runner
Specifically, there are 2 ways of doing so: Pure Python and via a Runner
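As a rough sketch of the pure-Python path, assuming a `.pte` file has already been exported and that `generate` accepts it via a `--pte-path` flag (mirroring the `--dso-path` usage above):

```bash
# Sketch under assumptions: exportedModels/llama3.1.pte must already exist
# from a prior export step, and --pte-path is assumed to parallel --dso-path.
python3 torchchat.py generate llama3.1 \
  --pte-path exportedModels/llama3.1.pte \
  --prompt "Hello my name is"
```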

<details>
<summary>Deploying via Python</summary>