Fix shortfin gibberish tokens and add cpu_llm_server_test.py #351

renxida · 2024-10-29T02:32:12Z

Rob and I found some behavior in paged_llm_v1 that is inconsistent with how shortfin works.

We hacked together a solution for now, but we really need to fix paged_llm_v1 by cleaning up seq_lens and start_positions and adding some CLEAR documentation as to EXACTLY what they do.

With this change, we're able to make the LLM count.

Right now the context length is effectively 16 tokens. By adding cache allocation, we'll soon be able to actually talk with the LLM.

renxida · 2024-10-29T03:01:41Z

Removed the failing integration tests; will put them in another PR.

renxida · 2024-10-29T03:02:17Z

To test these changes, run: https://gist.github.com/renxida/d40e14b1c9b4606279d19899bcce5e9a

kumardeepakamd · 2024-10-29T04:07:34Z

shortfin/python/shortfin_apps/llm/components/generate.py

-            while token_int != 128001:
+            # TODO: Use correct eot token from config.
+            # while token_int != 128001:
+            for i in range(15):


Needs a proper fix to iterate over a variable number of tokens. Okay to land and iterate.
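
For reference, one possible shape for that follow-up fix is a loop that stops on an end-of-text token but is still capped by a token budget. This is only a sketch and not shortfin's actual code: `generate_tokens`, `next_token`, `eos_token_ids`, and `max_new_tokens` are hypothetical names, and the stop ID would come from the model config rather than being hardcoded. The values 128001 and 15 in the usage lines are taken from the diff above.

```python
from typing import Callable, Iterable, List


def generate_tokens(
    next_token: Callable[[], int],
    eos_token_ids: Iterable[int],
    max_new_tokens: int,
) -> List[int]:
    """Collect tokens until a stop ID appears or the budget runs out.

    next_token is a hypothetical stand-in for one decode step.
    """
    stop_ids = frozenset(eos_token_ids)
    tokens: List[int] = []
    for _ in range(max_new_tokens):
        token = next_token()
        if token in stop_ids:
            # The stop token itself is not added to the output.
            break
        tokens.append(token)
    return tokens


# Usage with a fake decoder that emits a fixed stream of token IDs.
fake_stream = iter([10, 11, 12, 128001, 13])
result = generate_tokens(lambda: next(fake_stream), [128001], 15)
```

Keeping the budget alongside the stop-token check means generation still terminates if the model never emits the stop token.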

renxida requested review from rsuderman and kumardeepakamd October 29, 2024 02:32

renxida added 2 commits October 28, 2024 19:54

package up the fixes from Xida & Rob's debug session and add cpu_llm_server_test

bb3a970

remove old test and add some boilerplate for later testing both cpu and gpu

2623cbe

renxida force-pushed the make_llm_count branch 2 times, most recently from 789abe8 to 59bbe42 Compare October 29, 2024 02:55

renxida added 2 commits October 28, 2024 19:57

add transformers because we need AutoTokenizers

8e32419

remove testing changes

56e5b2a

renxida force-pushed the make_llm_count branch from 59bbe42 to 56e5b2a Compare October 29, 2024 03:00

kumardeepakamd approved these changes Oct 29, 2024

Merge branch 'main' into make_llm_count

326a6a3

renxida merged commit a8cd1d8 into nod-ai:main Oct 29, 2024
9 checks passed
