CPU LLM Integration Test #373
Conversation
Looks like we should maybe paste together the sharktank and shortfin CI YAMLs to make a new environment for integration testing. @ScottTodd, I need your opinion on this. Maybe also make it nightly, because it is pretty slow to run. But I kind of want to make sure nobody pushes something that breaks shortfin serving, and maybe we could make it fast enough by caching the HF downloads and tokenizer conversions.
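One way the caching idea could look, as a minimal sketch assuming the CI runner persists a cache directory between jobs; the cache path and model id below are hypothetical placeholders, not from this PR:

```python
# Hypothetical sketch: reuse a persistent Hugging Face cache between CI runs so
# model and tokenizer downloads only happen on the first run.
import os

# Assumption: the runner keeps this directory across jobs (e.g. via a cache step).
os.environ.setdefault("HF_HOME", "/ci-cache/huggingface")

from huggingface_hub import snapshot_download

# Hypothetical model id; subsequent runs resolve this from the local cache
# instead of re-downloading the weights and tokenizer files.
snapshot_download(repo_id="org/model-name")
```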
Reached out to @ScottTodd this morning. Am currently working on setting up a workflow file.
Sounds good! Lmk when you set that up -- I also have tests that need to go under the same bucket.
@ScottTodd Thanks for the feedback! Working on requested changes and setting up CI workflow |
Small patch reflecting some recent changes in `sf.Program` and `sf.ProgramFunction`. It was originally included as part of this PR, which adds an integration test to shortfin llm serving: #373. But I'm splitting it out, since that part may take a little more time for adjustments and adding the workflow file. Without it, you get the following error when trying to launch the server:

```text
[2024-10-30 11:59:09.939] [info] [manager.py:40] System manager command processor stopped
[2024-10-30 11:59:09.991] [error] [on.py:121] Traceback (most recent call last):
  File "/home/amd/stephen/repos/forks/SHARK-Platform/.venv/lib/python3.12/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/home/amd/.pyenv/versions/3.12.5/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/amd/stephen/repos/forks/SHARK-Platform/.venv/lib/python3.12/site-packages/shortfin_apps/llm/server.py", line 42, in lifespan
    service.start()
  File "/home/amd/stephen/repos/forks/SHARK-Platform/.venv/lib/python3.12/site-packages/shortfin_apps/llm/components/service.py", line 69, in start
    self.inference_program = sf.Program(
                             ^^^^^^^^^^^
TypeError: __new__(): incompatible function arguments. The following argument types are supported:
    1. __new__(cls: object, modules: collections.abc.Sequence[_shortfin_default.lib.local.ProgramModule], *, devices: collections.abc.Sequence[_shortfin_default.lib.local.Device], trace_execution: bool = False, isolation: _shortfin_default.lib.local.ProgramIsolation = ProgramIsolation.PER_FIBER) -> _shortfin_default.lib.local.Program

Invoked with types: nanobind.nb_type_0, kwargs = { modules: list, fiber: _shortfin_default.lib.local.Fiber, trace_execution: bool }
[2024-10-30 11:59:09.991] [error] [on.py:59] Application startup failed. Exiting.
```

With it, you're able to start the server, send requests, and receive responses.
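For reference, a minimal sketch of the call-site change the traceback points at, based only on the signature reported in the `TypeError`; this is not the exact patch, and the helper and argument names are placeholders:

```python
# Hypothetical sketch: sf.Program no longer accepts a `fiber=` keyword; per the
# reported signature it takes the program modules positionally plus a `devices=`
# keyword (with `isolation` covering per-fiber behavior).
import shortfin as sf


def build_program(modules, devices):
    """Hypothetical helper; `modules` and `devices` stand in for the service's own state."""
    # Old call shape (now rejected by the bindings):
    #   sf.Program(modules=modules, fiber=fiber, trace_execution=False)
    # New call shape matching the signature in the TypeError:
    return sf.Program(
        modules,          # sequence of sf.ProgramModule
        devices=devices,  # sequence of devices instead of a fiber
        trace_execution=False,
    )
```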
Pushing in pieces to keep better track of feedback.
The `CI - shortfin - ASan` workflow is failing on the integration test. The reason is a leak in the sentencepiece module:

```text
==5818==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x562fe591c1a3 in malloc (/home/runner/work/SHARK-Platform/SHARK-Platform/pyenv/versions/3.12.3/bin/python3.12+0xc71a3) (BuildId: 967a1d6b86eb9d5315fb009c5d4c54dd3a71a6cb)
    #1 0x7f9854031cd8 in _PyObject_New /tmp/python-build.20240924192752.4394/Python-3.12.3/Objects/object.c:319:33
    #2 0x7f983ba6f7f4 (/home/runner/work/SHARK-Platform/SHARK-Platform/pyenv/versions/3.12.3/lib/python3.12/site-packages/sentencepiece/_sentencepiece.cpython-312-x86_64-linux-gnu.so+0x6f7f4) (BuildId: 27ff3b63261f7a54f28902f49e11139b4d3b711f)
```

The sentencepiece module had to be added for the tokenizer to work in the integration test:

```text
E ImportError:
E LlamaTokenizer requires the SentencePiece library but it was not found in your environment. Checkout the instructions on the
E installation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones
E that match your environment. Please note that you may need to restart your runtime after installation.
```

The only test I see that uses `AutoTokenizer` is the CPU LLM integration test, which lives outside the shortfin tests. The same tests pass ASan before sentencepiece is installed, which makes me think that something at the build/installation layer is causing the memory leaks.
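For context, a minimal sketch of the tokenizer usage that pulls SentencePiece into the environment; the model id below is a hypothetical placeholder, not necessarily the one used in the test:

```python
# Hypothetical sketch: AutoTokenizer resolves Llama-style checkpoints to
# LlamaTokenizer, which loads the native sentencepiece extension module; that
# native module is what LeakSanitizer is flagging.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("org/llama-style-model")  # hypothetical model id
input_ids = tokenizer("1 2 3 4 5")["input_ids"]
print(input_ids)
```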
Nice! Mostly LGTM. A few comments about how this could continue to evolve.
LGTM once we decide on which triggers (`pull_request`, `push`, `schedule`) are appropriate.
```yaml
on:
  workflow_dispatch:
  schedule:
    # Weekdays at 13:00 UTC = 05:00 PST / 06:00 PDT.
    - cron: "5 4 * * 1-5"
  pull_request:
  push:
    branches:
      - main
```
@saienduri is `nodai-amdgpu-mi250-x86-64` available enough for the `pull_request` trigger here? Runners matching that label are currently busy running on-demand jobs in the E2ESHARK Test Suite: https://github.com/nod-ai/SHARK-TestSuite/actions/runs/11673948346/job/32505667904

This job itself takes ~5 minutes today, but we shouldn't be queueing for 1h+.
Ah, the job itself is a nightly, but maybe it's time to allocate an mi250 runner just for SHARK-Platform.
Sounds good and makes sense as we scale these integration tests in the future. What needs to happen for us to enable that?
Add integration tests for llm server application on cpu
Move `cpu_llm_server_test.py` to the top-level testing directory, add `iree-llvmcpu-target-cpu=host` to `cpu_settings`, skip if transformers is not installed, add an `available_port` fixture to find an available port instead of hardcoding one, and use the `/health` route to check for server availability (sketched below)
Add `ci-shark-and-shortfin.yml` workflow, remove the `llama_export_compile_serve.sh` script, remove the `test.yml` workflow file
Remove `exists` checks for non-cached files
Parameterize settings and batch_sizes
Add logging throughout the test, move the integration test to `build_tools/integration_tests/llm`, rename the CI file
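A minimal sketch of the `available_port` fixture and `/health` readiness check described above, assuming a pytest-based test against an HTTP server on localhost; the helper name and timeout are illustrative, not the exact test code:

```python
# Hypothetical sketch, not the exact test code. The fixture asks the OS for a
# free port (bind to port 0), and the helper polls /health until the server
# responds instead of sleeping for a fixed amount of time.
import socket
import time

import pytest
import requests


@pytest.fixture
def available_port() -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        return sock.getsockname()[1]


def wait_for_server(port: int, timeout: float = 30.0) -> None:
    """Block until GET /health succeeds or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(f"http://localhost:{port}/health", timeout=1).status_code == 200:
                return
        except requests.exceptions.ConnectionError:
            pass  # server not accepting connections yet
        time.sleep(0.5)
    raise TimeoutError(f"llm server did not become healthy on port {port}")
```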
Force-pushed from beb5757 to 00244e6.
Something about this new test started failing overnight. The test might be flaky, or it might not be hermetic enough (e.g. by downloading dependencies that aren't pinned to specific versions): https://github.com/nod-ai/SHARK-Platform/actions/workflows/ci-shark-platform.yml?query=branch%3Amain

The workflow initially succeeded at this run/commit, but then failed on a retry: https://github.com/nod-ai/SHARK-Platform/actions/runs/11704581609
Yeah, for some reason the LLM server started returning faulty responses, which caused an assertion failure. I'm seeing the exact same output for a failure here: https://github.com/nod-ai/SHARK-Platform/actions/runs/11706659706/job/32604263601?pr=435

The action does not pin IREE to a specific version, so my initial thought is that may be the culprit:

```shell
pip install --no-compile -f https://iree.dev/pip-release-links.html --src deps \
  -e "git+https://github.com/iree-org/iree-turbine.git#egg=iree-turbine"
```

Other workflows specify an installation candidate instead. In a way this does highlight that there may be a bug between shortfin and the latest IREE, but it shouldn't block development. I used the installation instructions from the README for installing IREE, rather than a specific candidate, precisely to catch stuff like this, but that's kind of annoying from a developer perspective. I'm thinking we pin it to a specific version, and we'll catch these kinds of issues when we bump IREE.
Patch `shortfin/python/shortfin_apps/llm/components/service.py` to re-enable llm serving after recent changes.