Shortfin LLM Accuracy Degradation in Latest iree-compiler
Release Candidate
#437
This test is currently failing. See #437 for details.
Highlighting the commit range: iree-org/iree@candidate-20241105.1069...candidate-20241106.1070
Tips for running a bisect: https://iree.dev/developers/debugging/compile-time-regressions/#running-git-bisect
If you have your machine configured to build: SHARK-Platform/build_tools/integration_tests/llm/conftest.py, lines 101 to 111 at 79fe7e2.
Hopefully that will then point to a specific commit that regressed the accuracy. If you also need to vary the IREE runtime, your IREE source dir would need to be used from the shortfin build: SHARK-Platform/shortfin/CMakeLists.txt, lines 174 to 175 at 79fe7e2.
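The bisect workflow linked above is, at its core, a binary search over the candidate commit range. The sketch below is only an illustration of how `git bisect` narrows a range like candidate-20241105.1069...candidate-20241106.1070 down to a single first-bad commit; the commit names and the `is_bad` predicate are made up, and in a real bisect `is_bad` would build the compiler at that commit and run the shortfin accuracy test.

```python
# Illustrative only: commit names and the is_bad predicate are placeholders,
# not real IREE history. In practice is_bad() builds IREE at that commit and
# runs the CPU LLM server integration test.
def bisect(commits, is_bad):
    """Return the first commit for which is_bad(commit) is True.

    Assumes commits are ordered oldest -> newest, commits[0] is known-good,
    commits[-1] is known-bad, and the regression is monotonic.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]

# Pretend the regression landed at commit 11 of a 16-commit range.
commits = [f"commit-{i}" for i in range(16)]
first_bad = bisect(commits, lambda c: int(c.split("-")[1]) >= 11)
print(first_bad)  # commit-11
```

With N commits between two nightly RCs, this converges in roughly log2(N) build-and-test cycles, which is why bisecting beats testing each commit in order.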
Used bisect to triage the issue to the following commit: iree-org/iree@2a5d123

Verification
Failing Local Test
Using the commit listed above: iree-org/iree@2a5d123
Commit: 2a5d12323c216e275dcc5f955b70aa60d89d47ed
Output:
INFO cpu_llm_server_test:cpu_llm_server_test.py:35 Generating request...
INFO cpu_llm_server_test:cpu_llm_server_test.py:48 Prompt text:
INFO cpu_llm_server_test:cpu_llm_server_test.py:49 1 2 3 4 5
INFO cpu_llm_server_test:cpu_llm_server_test.py:52 Generate endpoint status code: 200
INFO cpu_llm_server_test:cpu_llm_server_test.py:54 Generated text:
INFO cpu_llm_server_test:cpu_llm_server_test.py:87 6000 1 10000 10000 1000

Passing Local Test
Compared to the output of iree-org/iree@1afe2bc
Commit: 1afe2bc8993456702841c1b78dd2e4bf211bdca0
Output:
INFO cpu_llm_server_test:cpu_llm_server_test.py:35 Generating request…
INFO cpu_llm_server_test:cpu_llm_server_test.py:48 Prompt text:
INFO cpu_llm_server_test:cpu_llm_server_test.py:49 1 2 3 4 5
INFO cpu_llm_server_test:cpu_llm_server_test.py:52 Generate endpoint status code: 200
INFO cpu_llm_server_test:cpu_llm_server_test.py:54 Generated text:
INFO cpu_llm_server_test:cpu_llm_server_test.py:87 6 7 8 9 10 11 12 13 14
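The pass/fail distinction in the two logs above boils down to whether the model keeps counting after the prompt "1 2 3 4 5". A minimal sketch of that kind of check is below; the function name, the threshold, and the check itself are illustrative assumptions, not the actual logic in cpu_llm_server_test.py.

```python
# Hypothetical sanity check, not the real test's assertion: given a counting
# prompt, require that the first few generated tokens continue the sequence.
def continuation_looks_sane(prompt: str, generated: str, min_correct: int = 3) -> bool:
    last = int(prompt.split()[-1])
    expected = [str(last + i) for i in range(1, min_correct + 1)]
    return generated.split()[:min_correct] == expected

# Failing RC output from the logs above:
print(continuation_looks_sane("1 2 3 4 5", "6000 1 10000 10000 1000"))  # False
# Passing RC output:
print(continuation_looks_sane("1 2 3 4 5", "6 7 8 9 10 11 12 13 14"))  # True
```

Note that a lenient prefix check like this would also accept the flaky-but-passing `6 7 8 8 8 8 8 8` output mentioned later in the issue, which matches the observation that the test has some tolerance for imperfect completions.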
@IanWood1 @MaheshRavishankar fyi ^
We should revert the change at HEAD.
…iree-org#19032)" This reverts commit 2a5d123. Seems to cause accuracy regressions nod-ai/shark-ai#437 Signed-off-by: MaheshRavishankar <[email protected]>
…#19032) (#19070) This reverts commit 2a5d123. Seems to cause accuracy regressions nod-ai/shark-ai#437 Signed-off-by: MaheshRavishankar <[email protected]>
Description
There appears to be an accuracy drop when using the shortfin LLM server with the latest iree-compiler, which causes failures in the CPU LLM Server Integration Test.
I haven't pinned down the exact commit responsible, but it appears to be an accuracy degradation with the latest iree-compiler RC. Associated PRs between the `latest RC` and the `latest - 1 RC` can be found here.

Degradation from latest iree-compiler RC:
Long story short, the test passes locally for `(iree-compiler = 20241105.1069, iree-runtime = 20241105.1069)` and for `(iree-compiler = 20241105.1069, iree-runtime = 20241106.1070)`. In other words, the test passes with the `latest - 1` RC for both iree-compiler and iree-runtime, and with the `latest` RC for iree-runtime paired with the `latest - 1` RC for iree-compiler.
However, the test begins to fail locally for `(iree-compiler = 20241106.1070, iree-runtime = 20241105.1069)`, i.e. it fails whenever the latest iree-compiler RC is used.
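The three compiler/runtime combinations above can be laid out as a small matrix to show why the evidence points at iree-compiler rather than iree-runtime. The version strings below are the actual RCs from this issue; the dictionary itself is just an illustration of the reasoning, not test infrastructure.

```python
# Pass/fail results for the compiler/runtime combinations tested locally,
# as reported in this issue.
results = {
    ("compiler=20241105.1069", "runtime=20241105.1069"): "pass",
    ("compiler=20241105.1069", "runtime=20241106.1070"): "pass",
    ("compiler=20241106.1070", "runtime=20241105.1069"): "fail",
}

passing_compilers = {c for (c, r), out in results.items() if out == "pass"}
failing_compilers = {c for (c, r), out in results.items() if out == "fail"}
passing_runtimes = {r for (c, r), out in results.items() if out == "pass"}

# The failing compiler version never appears in any passing run...
print(failing_compilers - passing_compilers)  # {'compiler=20241106.1070'}
# ...while the runtime used in the failing run also appears in a passing run,
# so the runtime is exonerated:
print("runtime=20241105.1069" in passing_runtimes)  # True
```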
There is a level of flakiness in this test involved due to the actual output of the model.
Passing Output
For the passing tests, the log output shows the same result for both passing combinations.
Note the `6 7 8 8 8 8 8 8`, which is still a pretty inaccurate response. I believe this is mostly because the test uses a Llama 3b CPU model, which by itself isn't exactly the epitome of accuracy. I've seen better results locally using Llama 8b on GPU. It may be worth discussing an upgrade to 8b.

Failing Output
However, using the latest iree-compiler RC, we see an even more inaccurate output, which shows SOMETHING is still going on with shortfin accuracy under the latest iree-compiler candidate.
Most of the commits that differ between `latest - 1` and `latest` are additions (new functionality) or test/doc-related changes. The modifications to existing functionality appear in the following commits: 19035, 19032, 19033.
So those appear to be the best candidates for what may be causing the difference in shortfin output.
The full range of PRs between the `latest - 1 RC` and the `latest RC` can be found here.