# Together README

WIP commit to test serving of the 125M model. Once it's ready, we can serve the larger models easily.

Important files:

- The Python serving script is in `H3/examples/serving_h3.py`.
- The bash serving script is in `serve.sh`.
- The Dockerfile is in `Together_dockerfile` (a build sketch follows this list).
- The config is in `local-cfg.yaml`.
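
As a rough sketch of how these pieces fit together, the serving image could be built from `Together_dockerfile` with something like the following. The image tag is a placeholder and the build context is assumed to be the repository root:

```bash
# Build the serving image from the repository root.
# The tag "h3-serving" is a placeholder; use whatever name your setup expects.
docker build -f Together_dockerfile -t h3-serving .
```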

By default, we expect the model checkpoint to be at `/home/user/.together/models/H3-125M/model.pt` inside the Docker container.

You can get it by running these commands:

```bash
git lfs install
git clone https://huggingface.co/danfu09/H3-125M
```
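
A minimal sketch of how the cloned checkpoint could end up at the expected path inside the container. The host-side directory layout, the `--gpus` flag, and the `h3-serving` image tag are assumptions (carried over from the build sketch above), and the exact run command depends on the Dockerfile's entrypoint and `serve.sh`:

```bash
# After cloning H3-125M on the host, bind-mount it so the checkpoint appears
# at /home/user/.together/models/H3-125M/model.pt inside the container.
# Host paths and the image tag below are placeholders, not part of this repo.
mkdir -p ~/.together/models
mv H3-125M ~/.together/models/
docker run --gpus all \
  -v ~/.together/models:/home/user/.together/models \
  h3-serving
```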

You may need to run these commands after the initial install to install FlashAttention (we can add them to the Dockerfile). Note that the submodule commands are run twice: once in the H3 repository and again inside the `flash-attention` directory:

```bash
git submodule init
git submodule update
cd flash-attention
git submodule init
git submodule update
pip install -e .
```
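
As an optional sanity check (not part of the original steps), you can confirm that the editable FlashAttention install is importable:

```bash
# Should exit silently if the editable install succeeded; an ImportError means
# the build or submodule setup needs another look.
python -c "import flash_attn"
```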