Shortfin LLM Accuracy Degradation in Latest iree-compiler
Release Candidate
#437
This test is currently failing. See #437 for details.
Highlighting the commit range: iree-org/iree@candidate-20241105.1069...candidate-20241106.1070
Tips for running a bisect: https://iree.dev/developers/debugging/compile-time-regressions/#running-git-bisect
If you have your machine configured to build: SHARK-Platform/build_tools/integration_tests/llm/conftest.py, lines 101 to 111 at 79fe7e2.
Hopefully that will then point to a specific commit that regressed the accuracy. If you also need to vary the IREE runtime, your IREE source dir would need to be used from the shortfin build: SHARK-Platform/shortfin/CMakeLists.txt, lines 174 to 175 at 79fe7e2.
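The bisect workflow linked above is, at its core, a binary search over the candidate commit range. The sketch below is only an illustration of how `git bisect` narrows a range like candidate-20241105.1069...candidate-20241106.1070 down to a single first-bad commit; the commit names and the `is_bad` predicate are made up, and in a real bisect `is_bad` would build the compiler at that commit and run the shortfin accuracy test.

```python
# Illustrative only: commit names and the is_bad predicate are placeholders,
# not real IREE history. In practice is_bad() builds IREE at that commit and
# runs the CPU LLM server integration test.
def bisect(commits, is_bad):
    """Return the first commit for which is_bad(commit) is True.

    Assumes commits are ordered oldest -> newest, commits[0] is known-good,
    commits[-1] is known-bad, and the regression is monotonic.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]

# Pretend the regression landed at commit 11 of a 16-commit range.
commits = [f"commit-{i}" for i in range(16)]
first_bad = bisect(commits, lambda c: int(c.split("-")[1]) >= 11)
print(first_bad)  # commit-11
```

With N commits between two nightly RCs, this converges in roughly log2(N) build-and-test cycles, which is why bisecting beats testing each commit in order.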
Used bisect to triage the issue to the following commit: iree-org/iree@2a5d123

Verification
Failing Local Test
Using the commit listed above: iree-org/iree@2a5d123
Commit: 2a5d12323c216e275dcc5f955b70aa60d89d47ed
Output:
INFO cpu_llm_server_test:cpu_llm_server_test.py:35 Generating request...
INFO cpu_llm_server_test:cpu_llm_server_test.py:48 Prompt text:
INFO cpu_llm_server_test:cpu_llm_server_test.py:49 1 2 3 4 5
INFO cpu_llm_server_test:cpu_llm_server_test.py:52 Generate endpoint status code: 200
INFO cpu_llm_server_test:cpu_llm_server_test.py:54 Generated text:
INFO cpu_llm_server_test:cpu_llm_server_test.py:87 6000 1 10000 10000 1000

Passing Local Test
Compared to the output of iree-org/iree@1afe2bc
Commit: 1afe2bc8993456702841c1b78dd2e4bf211bdca0
Output:
INFO cpu_llm_server_test:cpu_llm_server_test.py:35 Generating request…
INFO cpu_llm_server_test:cpu_llm_server_test.py:48 Prompt text:
INFO cpu_llm_server_test:cpu_llm_server_test.py:49 1 2 3 4 5
INFO cpu_llm_server_test:cpu_llm_server_test.py:52 Generate endpoint status code: 200
INFO cpu_llm_server_test:cpu_llm_server_test.py:54 Generated text:
INFO cpu_llm_server_test:cpu_llm_server_test.py:87 6 7 8 9 10 11 12 13 14
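The pass/fail distinction in the two logs above boils down to whether the model keeps counting after the prompt "1 2 3 4 5". A minimal sketch of that kind of check is below; the function name, the threshold, and the check itself are illustrative assumptions, not the actual logic in cpu_llm_server_test.py.

```python
# Hypothetical sanity check, not the real test's assertion: given a counting
# prompt, require that the first few generated tokens continue the sequence.
def continuation_looks_sane(prompt: str, generated: str, min_correct: int = 3) -> bool:
    last = int(prompt.split()[-1])
    expected = [str(last + i) for i in range(1, min_correct + 1)]
    return generated.split()[:min_correct] == expected

# Failing RC output from the logs above:
print(continuation_looks_sane("1 2 3 4 5", "6000 1 10000 10000 1000"))  # False
# Passing RC output:
print(continuation_looks_sane("1 2 3 4 5", "6 7 8 9 10 11 12 13 14"))  # True
```

Note that a lenient prefix check like this would also accept the flaky-but-passing `6 7 8 8 8 8 8 8` output mentioned later in the issue, which matches the observation that the test has some tolerance for imperfect completions.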
@IanWood1 @MaheshRavishankar fyi ^
We should revert the change at HEAD.
…iree-org#19032)" This reverts commit 2a5d123. Seems to cause accuracy regressions nod-ai/shark-ai#437 Signed-off-by: MaheshRavishankar <[email protected]>
…#19032) (#19070) This reverts commit 2a5d123. Seems to cause accuracy regressions nod-ai/shark-ai#437 Signed-off-by: MaheshRavishankar <[email protected]>
Description
There appears to be an accuracy drop when using the shortfin LLM server with the latest iree-compiler, which causes failures in the CPU LLM Server Integration Test.
I haven't pinned down the exact commit responsible, but it appears to be an accuracy degradation with the latest iree-compiler RC. Associated PRs between the `latest RC` and the `latest - 1 RC` can be found here.

Degradation from latest iree-compiler RC:
Long story short, the test passes locally for `(iree-compiler = 20241105.1069, iree-runtime = 20241105.1069)` and for `(iree-compiler = 20241105.1069, iree-runtime = 20241106.1070)`. In other words, the test passes with the `latest - 1` RC for both iree-compiler and iree-runtime, and with the `latest` RC for iree-runtime paired with the `latest - 1` RC for iree-compiler.
However, the test begins to fail locally for `(iree-compiler = 20241106.1070, iree-runtime = 20241105.1069)`, i.e. it fails whenever the latest iree-compiler RC is used.
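The three compiler/runtime combinations above can be laid out as a small matrix to show why the evidence points at iree-compiler rather than iree-runtime. The version strings below are the actual RCs from this issue; the dictionary itself is just an illustration of the reasoning, not test infrastructure.

```python
# Pass/fail results for the compiler/runtime combinations tested locally,
# as reported in this issue.
results = {
    ("compiler=20241105.1069", "runtime=20241105.1069"): "pass",
    ("compiler=20241105.1069", "runtime=20241106.1070"): "pass",
    ("compiler=20241106.1070", "runtime=20241105.1069"): "fail",
}

passing_compilers = {c for (c, r), out in results.items() if out == "pass"}
failing_compilers = {c for (c, r), out in results.items() if out == "fail"}
passing_runtimes = {r for (c, r), out in results.items() if out == "pass"}

# The failing compiler version never appears in any passing run...
print(failing_compilers - passing_compilers)  # {'compiler=20241106.1070'}
# ...while the runtime used in the failing run also appears in a passing run,
# so the runtime is exonerated:
print("runtime=20241105.1069" in passing_runtimes)  # True
```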
There is a level of flakiness in this test involved due to the actual output of the model.
Passing Output
For the passing tests, the log output shows the same result for both passing combinations.
Note the `6 7 8 8 8 8 8 8`, which is still a pretty inaccurate response. I believe this is mostly because the test uses a Llama 3b CPU model, which by itself isn't exactly the epitome of accuracy. I've seen better results locally using Llama 8b on GPU. It may be worth discussing an upgrade to 8b.

Failing Output
However, using the latest iree-compiler RC, we see an even more inaccurate output, which shows SOMETHING is still going on with shortfin accuracy under the latest iree-compiler candidate.
Most of the commits that differ between `latest - 1` and `latest` are additions (new functionality) or test/doc-related changes. The modifications to existing functionality appear in the following commits: 19035, 19032, 19033.
So those appear to be the best candidates for what may be causing the difference in shortfin output.
The full range of PRs between the `latest - 1 RC` and the `latest RC` can be found here.