[sharktank] Add perplexity CI to sharktank dashboard #466

Open

wants to merge 17 commits into base: main

Changes from 8 commits
27 changes: 20 additions & 7 deletions .github/workflows/ci_eval.yaml
@@ -7,6 +7,7 @@
name: CI - Perplexity

on:
pull_request:
workflow_dispatch:
schedule:
# Weekdays nightly at 07:00 UTC = 23:00 PST / 00:00 PDT.
@@ -21,9 +22,9 @@ concurrency:
cancel-in-progress: true

jobs:
test_perplexity_vmfb:
test_perplexity_iree:
timeout-minutes: 1000
name: "IREE/vmfb"
name: "Perplexity-IREE"
strategy:
matrix:
version: [3.11]
@@ -74,12 +75,18 @@ jobs:
iree-base-runtime \
"numpy<2.0"

- name: Run perplexity test with vmfb
run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_vmfb_test.py --longrun --iree-device='hip://7' --iree-hip-target=gfx942 --iree-hal-target-backends=rocm --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json
- name: Run perplexity test with IREE
run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_iree_test.py --longrun --iree-device='hip://7' --iree-hip-target=gfx942 --iree-hal-target-backends=rocm --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json --html=perplexity/perplexity_iree.html

- name: Deploy to GitHub Pages
uses: peaceiris/actions-gh-pages@v3
with:
github_token: ${{ secrets.SHARK_PLATFORM_GH_TOKEN }}
publish_dir: ./perplexity

test_perplexity_torch:
timeout-minutes: 1000
name: "Torch/eager mode"
name: "Perplexity-Torch"
strategy:
matrix:
version: [3.11]
@@ -122,5 +129,11 @@ jobs:
pip install --no-compile -f https://iree.dev/pip-release-links.html --src deps \
-e "git+https://github.com/iree-org/iree-turbine.git#egg=iree-turbine"

- name: Run perplexity test in eager mode
run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json
- name: Run perplexity test with Torch
run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json --html=perplexity/perplexity_torch.html

- name: Deploy to GitHub Pages
uses: peaceiris/actions-gh-pages@v3
Collaborator
Suggested change
uses: peaceiris/actions-gh-pages@v3
uses: peaceiris/actions-gh-pages@4f9cc6602d3f66b9c108549d475ec49e8ef4d45e # v4.0.0

Actions should be pinned as suggested by OpenSSF Scorecard, see https://github.com/ossf/scorecard/blob/main/docs/checks.md#pinned-dependencies.

Collaborator
Furthermore, is it intended to push to gh-pages with every run of the CI? This probably creates the same problems outlined in issue #395.

Contributor
This is a nightly job, so I don't think it will pose any presubmit problems.

with:
github_token: ${{ secrets.SHARK_PLATFORM_GH_TOKEN }}
publish_dir: ./perplexity
19 changes: 17 additions & 2 deletions sharktank/sharktank/evaluate/README.md
@@ -9,16 +9,31 @@ pip install -r sharktank/requirements-tests.txt

### Perplexity

Test perplexity for Llama3.1 8B & 405B (FP16 & FP8) models:
Perplexity measures how well a language model predicts the next token in a sequence. A lower score indicates that the model assigns higher probability to (i.e., is more certain about) the correct next token. Perplexity serves as an intrinsic evaluation metric of model quality, independent of any downstream task.

In SHARK-Platform, we use perplexity to track code regressions and quality loss across quantized models (with FP16 as the baseline). We use 100 prompts randomly selected from the Wikitext-2 test set and report the mean perplexities shown below. These numbers are not comparable across models with different tokenizers, nor with other projects, due to differing implementations.
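
For intuition, here is a minimal sketch of the calculation (an illustration only, not the sharktank implementation): perplexity is the exponential of the mean negative log-likelihood the model assigns to the ground-truth next tokens.

```python
import math

def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood).

    `token_log_probs` holds the model's natural-log probability of each
    ground-truth next token in the evaluated sequence.
    """
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Example: a model that is fairly confident about most tokens.
print(perplexity([-0.8, -1.2, -0.5, -2.0]))  # ~3.08
```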

* Test perplexity for Llama3.1 8B (FP16) model:

```bash
pytest sharktank/tests/evaluate/perplexity_test.py --longrun
```

Get perplexity for a new model:
* Calculate perplexity for a new model:

```bash
python -m sharktank.evaluate.perplexity \
--gguf-file=llama3_70b_f16.gguf \
--tokenizer-config-json=tokenizer_config.json
```

### LLaMA 3.1 Perplexity Scoreboard

| CPU | GPU |
|:-------------: |:----------:|
| AMD EPYC 9554 | MI300X |


|Models |Model size (GB) |Torch |IREE |
|:--------|:---------------|:----------|:----------|
|8B f16 |16.07 |14.930181 |14.991893 |
2 changes: 1 addition & 1 deletion sharktank/tests/evaluate/baseline_perplexity_scores.json
@@ -210,7 +210,7 @@
],
"mean_perplexity": 6.060831
},
"llama3_8B_f16_decomposed_vmfb": {
"llama3_8B_f16_decomposed_iree": {
"perplexities": [
6.651368,
22.059452,
@@ -8,7 +8,7 @@
import pytest
import json

from sharktank.evaluate import perplexity_vmfb
from sharktank.evaluate import perplexity_iree

longrun = pytest.mark.skipif("not config.getoption('longrun')")

@@ -32,10 +32,10 @@ def test_llama3_8B_f16_decomposed(self):

# Llama 3.1 8B decomposed

model_name = "llama3_8B_f16_decomposed_vmfb"
model_name = "llama3_8B_f16_decomposed_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_8b_f16_model}",
f"--tokenizer-config-json={self.llama3_8b_tokenizer}",
@@ -67,10 +67,10 @@ def test_llama3_8B_f16(self):

# Llama 3.1 8B non-decomposed

model_name = "llama3_8B_f16_vmfb"
model_name = "llama3_8B_f16_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_8b_f16_model}",
f"--tokenizer-config-json={self.llama3_8b_tokenizer}",
@@ -102,10 +102,10 @@ def test_llama3_8B_fp8_decomposed(self):

# Llama 3.1 8B decomposed

model_name = "llama3_8B_fp8_decomposed_vmfb"
model_name = "llama3_8B_fp8_decomposed_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_8b_fp8_model}",
f"--tokenizer-config-json={self.llama3_8b_tokenizer}",
@@ -137,10 +137,10 @@ def test_llama3_8B_fp8(self):

# Llama 3.1 8B non-decomposed

model_name = "llama3_8B_fp8_vmfb"
model_name = "llama3_8B_fp8_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_8b_fp8_model}",
f"--tokenizer-config-json={self.llama3_8b_tokenizer}",
@@ -172,10 +172,10 @@ def test_llama3_405B_f16_decomposed(self):

# Llama 3.1 405B decomposed

model_name = "llama3_405B_f16_decomposed_vmfb"
model_name = "llama3_405B_f16_decomposed_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_405b_f16_model}",
f"--tokenizer-config-json={self.llama3_405b_tokenizer}",
@@ -207,10 +207,10 @@ def test_llama3_405B_f16(self):

# Llama 3.1 405B non-decomposed

model_name = "llama3_405B_f16_vmfb"
model_name = "llama3_405B_f16_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_405b_f16_model}",
f"--tokenizer-config-json={self.llama3_405b_tokenizer}",
@@ -242,10 +242,10 @@ def test_llama3_405B_fp8_decomposed(self):

# Llama 3.1 405B decomposed

model_name = "llama3_405B_fp8_decomposed_vmfb"
model_name = "llama3_405B_fp8_decomposed_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_405b_fp8_model}",
f"--tokenizer-config-json={self.llama3_405b_tokenizer}",
@@ -277,10 +277,10 @@ def test_llama3_405B_fp8(self):

# Llama 3.1 405B non-decomposed

model_name = "llama3_405B_fp8_vmfb"
model_name = "llama3_405B_fp8_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_405b_fp8_model}",
f"--tokenizer-config-json={self.llama3_405b_tokenizer}",