
Shortfin LLM Accuracy Degradation in Latest iree-compiler Release Candidate #437

Open
stbaione opened this issue Nov 6, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@stbaione
Contributor

stbaione commented Nov 6, 2024

Description

There appears to be an accuracy drop when using the shortfin LLM server with the latest iree-compiler release candidate, which causes failures in the CPU LLM Server Integration Test.

I haven't yet pinned down the exact commit that introduced the issue, but the accuracy degradation appears with the latest iree-compiler RC.

Associated PRs between latest RC and latest - 1 RC can be found here.

Degradation from latest iree-compiler RC:

Long story short, the test passes or fails locally depending on the iree-compiler/iree-runtime combination:

- iree-compiler 20241105.1069 (latest - 1) + iree-runtime 20241105.1069 (latest - 1): pass
- iree-compiler 20241105.1069 (latest - 1) + iree-runtime 20241106.1070 (latest): pass
- iree-compiler 20241106.1070 (latest) + iree-runtime 20241105.1069 (latest - 1): fail

In other words, the test passes whenever the latest - 1 iree-compiler RC is used, with either iree-runtime RC, but begins to fail as soon as the latest iree-compiler RC is used.

There is some inherent flakiness in this test due to the variability of the model's output.

Passing Output

For the passing tests, log output shows:

INFO     cpu_llm_server_test:cpu_llm_server_test.py:33 Generating request...
INFO     cpu_llm_server_test:cpu_llm_server_test.py:46 Prompt text:
INFO     cpu_llm_server_test:cpu_llm_server_test.py:47 1 2 3 4 5 
INFO     cpu_llm_server_test:cpu_llm_server_test.py:50 Generate endpoint status code: 200
INFO     cpu_llm_server_test:cpu_llm_server_test.py:52 Generated text:
INFO     cpu_llm_server_test:cpu_llm_server_test.py:84 6 7 8 8 8 8 8 8 

This output is the same for both passing combinations.

Note: 6 7 8 8 8 8 8 8 is still a fairly inaccurate response. I believe this is largely because the test uses a Llama 3b CPU model, which isn't exactly the epitome of accuracy on its own. I've seen better results locally with Llama 8b on GPU; it may be worth discussing an upgrade to 8b.
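On that tangent: a more tolerant assertion could accept a degraded-but-directionally-correct continuation while still catching the regression. This is a hypothetical sketch, assuming the test only needs the first generated token to be correct; `check_continuation` and `EXPECTED_FIRST` are illustrative names, not part of cpu_llm_server_test.py:

```python
# Hypothetical flakiness-tolerant check; not from cpu_llm_server_test.py.
EXPECTED_FIRST = "6"  # the prompt "1 2 3 4 5" should at least continue with 6


def check_continuation(generated: str, expected_first: str = EXPECTED_FIRST) -> bool:
    """Pass if the first generated token is the expected next number,
    tolerating degraded tails like '6 7 8 8 8 8 8 8'."""
    tokens = generated.split()
    return bool(tokens) and tokens[0] == expected_first


# The passing output satisfies the check; the regressed output does not.
assert check_continuation("6 7 8 8 8 8 8 8")
assert not check_continuation("6000 1.")
```

A check like this would stay green across minor sampling noise while still failing on a clearly wrong continuation.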

Failing Output

However, using the latest iree-compiler RC, we see a noticeably less accurate output:

INFO     cpu_llm_server_test:cpu_llm_server_test.py:33 Generating request...
INFO     cpu_llm_server_test:cpu_llm_server_test.py:46 Prompt text:
INFO     cpu_llm_server_test:cpu_llm_server_test.py:47 1 2 3 4 5 
INFO     cpu_llm_server_test:cpu_llm_server_test.py:50 Generate endpoint status code: 200
INFO     cpu_llm_server_test:cpu_llm_server_test.py:52 Generated text:
INFO     cpu_llm_server_test:cpu_llm_server_test.py:84 6000 1.

This shows that something in the latest iree-compiler candidate is still affecting shortfin accuracy.

Most of the commits that differ between latest - 1 and latest are additions (new functionality) or test/doc-related changes. The modifications to existing functionality appear in the following PRs:

#19035, #19032, #19033

So those appear to be the best candidates for what may be causing the difference in shortfin output.


@stbaione stbaione added the bug Something isn't working label Nov 6, 2024
@stbaione stbaione linked a pull request Nov 7, 2024 that will close this issue
stbaione added a commit that referenced this issue Nov 7, 2024
This test is currently failing. See #437 for details.
@ScottTodd
Member

Highlighting the commit range: iree-org/iree@candidate-20241105.1069...candidate-20241106.1070

@ScottTodd
Member

Tips for running a bisect: https://iree.dev/developers/debugging/compile-time-regressions/#running-git-bisect

If you have your machine configured to build iree-compile from source, you can bisect through the IREE commits in that range. Then put that iree-build/tools/iree-compile on your PATH ahead of any from your Python install so that the test uses it:

subprocess.run(
    [
        "iree-compile",
        mlir_path,
        "-o",
        vmfb_path,
    ]
    + settings["device_flags"],
    check=True,
)
logger.info(f"Model successfully compiled to {vmfb_path}")

Hopefully that will then point to a specific commit that regressed the accuracy.
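The PATH-shadowing step above can be sketched in a self-contained way. This is an illustration only: `mytool` stands in for iree-compile, and a temporary directory stands in for iree-build/tools.

```python
import os
import shutil
import stat
import tempfile

# A binary in a directory prepended to PATH shadows any same-named binary
# found later on the PATH; shutil.which resolves in PATH order.
build_dir = tempfile.mkdtemp()
tool = os.path.join(build_dir, "mytool")  # stand-in for iree-compile
with open(tool, "w") as f:
    f.write("#!/bin/sh\necho from-source-build\n")
os.chmod(tool, os.stat(tool).st_mode | stat.S_IEXEC)

# Prepend the build dir, as you would with `export PATH=~/iree-build/tools:$PATH`.
search_path = build_dir + os.pathsep + os.environ.get("PATH", "")
resolved = shutil.which("mytool", path=search_path)
assert resolved == tool  # the source-build copy wins the lookup
```

The same lookup happens when the test's subprocess.run call spawns iree-compile, so whichever copy is earlier on PATH is the one being bisected.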

If you also need to vary the IREE runtime, your IREE source dir would need to be used from the shortfin build:

if(SHORTFIN_IREE_SOURCE_DIR)
  add_subdirectory(${SHORTFIN_IREE_SOURCE_DIR} shortfin_iree SYSTEM EXCLUDE_FROM_ALL)
endif()

@stbaione
Contributor Author

stbaione commented Nov 7, 2024

Used git bisect to triage the issue to the following commit: iree-org/iree@2a5d123

Verification

Failing Local Test

Using the commit listed above (iree-org/iree@2a5d123), we get the following output:

Commit: 2a5d12323c216e275dcc5f955b70aa60d89d47ed
Output:
INFO     cpu_llm_server_test:cpu_llm_server_test.py:35 Generating request...
INFO     cpu_llm_server_test:cpu_llm_server_test.py:48 Prompt text:
INFO     cpu_llm_server_test:cpu_llm_server_test.py:49 1 2 3 4 5 
INFO     cpu_llm_server_test:cpu_llm_server_test.py:52 Generate endpoint status code: 200
INFO     cpu_llm_server_test:cpu_llm_server_test.py:54 Generated text:
INFO     cpu_llm_server_test:cpu_llm_server_test.py:87 6000 1 10000 10000 1000

Passing Local Test

Compared to the output of iree-org/iree@1afe2bc, the commit directly before the one listed above:

Commit: 1afe2bc8993456702841c1b78dd2e4bf211bdca0
Output:
INFO     cpu_llm_server_test:cpu_llm_server_test.py:35 Generating request…
INFO     cpu_llm_server_test:cpu_llm_server_test.py:48 Prompt text:
INFO     cpu_llm_server_test:cpu_llm_server_test.py:49 1 2 3 4 5 
INFO     cpu_llm_server_test:cpu_llm_server_test.py:52 Generate endpoint status code: 200
INFO     cpu_llm_server_test:cpu_llm_server_test.py:54 Generated text:
INFO     cpu_llm_server_test:cpu_llm_server_test.py:87 6 7 8 9 10 11 12 13 14 

@ScottTodd
Member

@IanWood1 @MaheshRavishankar fyi ^

@MaheshRavishankar

We should revert the change at HEAD.

MaheshRavishankar added a commit to MaheshRavishankar/iree that referenced this issue Nov 8, 2024
…iree-org#19032)"

This reverts commit 2a5d123.

Seems to cause accuracy regressions nod-ai/shark-ai#437

Signed-off-by: MaheshRavishankar <[email protected]>
MaheshRavishankar added a commit to iree-org/iree that referenced this issue Nov 8, 2024
…#19032) (#19070)

This reverts commit 2a5d123.

Seems to cause accuracy regressions
nod-ai/shark-ai#437

Signed-off-by: MaheshRavishankar <[email protected]>