
Fix shortfin gibberish tokens and add cpu_llm_server_test.py #351

Merged 5 commits into nod-ai:main from make_llm_count on Oct 29, 2024

Conversation

renxida (Contributor) commented Oct 29, 2024

Rob and I found some weirdness in paged_llm_v1 that was inconsistent with how shortfin works.

We hacked a solution for now, but we really need to fix paged_llm_v1 properly by cleaning up seq_lens and start_positions and adding CLEAR documentation on EXACTLY what they do.

Currently we're able to make the LLM count.

Right now the context length is effectively 16 tokens. By adding cache allocation, we'll soon be able to actually talk with the LLM.

@renxida force-pushed the make_llm_count branch 2 times, most recently from 789abe8 to 59bbe42 on Oct 29, 2024
renxida (Contributor, Author) commented Oct 29, 2024

Removed the failing integration tests; will put them in another PR.

renxida (Contributor, Author) commented Oct 29, 2024

To test these changes, run: https://gist.github.com/renxida/d40e14b1c9b4606279d19899bcce5e9a

```diff
-    while token_int != 128001:
+    # TODO: Use correct eot token from config.
+    # while token_int != 128001:
+    for i in range(15):
```


This needs a proper fix so we iterate over a variable number of tokens instead of a hard-coded 15; okay to land and iterate.
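A proper fix might read the end-of-text token id from the model config and stop decoding when it appears, while still capping generation at a token budget. A minimal sketch (the names `generate_next_token`, `eos_token_id`, and `max_new_tokens` are hypothetical stand-ins, not the actual shortfin API):

```python
def decode(generate_next_token, eos_token_id, max_new_tokens=16):
    """Collect generated tokens until EOS or the token budget runs out.

    generate_next_token: zero-arg callable returning the next token id.
    eos_token_id: end-of-text id, read from the model config rather
                  than hard-coded (e.g. 128001 for Llama 3).
    """
    tokens = []
    for _ in range(max_new_tokens):
        token_int = generate_next_token()
        if token_int == eos_token_id:
            break  # model signalled end of text
        tokens.append(token_int)
    return tokens


# Example with a stubbed token stream: decoding stops at the EOS id.
stream = iter([5, 7, 42, 128001, 9])
print(decode(lambda: next(stream), eos_token_id=128001))  # → [5, 7, 42]
```

This keeps the hard upper bound (so a model that never emits EOS can't loop forever) while removing the magic constant from the loop condition.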

@renxida renxida merged commit a8cd1d8 into nod-ai:main Oct 29, 2024
9 checks passed