[sharktank] Evaluation - Add Perplexity test #233

archana-ramalingam · 2024-09-27T09:50:02Z

Add Perplexity test for LLM evaluation

rsuderman

Just some minor optional changes.

sharktank/sharktank/utils/load_llm.py

sharktank/sharktank/utils/tokenizer.py

sharktank/tests/evaluate/perplexity.py


ScottTodd · 2024-10-16T16:08:08Z

sharktank/tests/evaluate/perplexity_test.py

+        llama_8b_f16_gguf_path = "/data/extra/models/llama3.1_8B/llama8b_f16.gguf"
+        llama_8b_f16_tokenizer_path = (
+            "/data/extra/models/llama3.1_8B/tokenizer_config.json"
+        )


Unit tests failed on this PR and after merge:

https://github.com/nod-ai/SHARK-Platform/actions/runs/11359973189/job/31596941530

https://github.com/nod-ai/SHARK-Platform/actions/runs/11360375745/job/31598023494

Any files required to run a test should be one of the following:

- included in the repository (if small enough)
- downloaded (and cached) on demand as part of the test
- downloaded (and cached) ahead of time via a script

As written, this test will only run on a machine where some unknown, undocumented setup steps have already been performed.
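A minimal sketch of the "downloaded (and cached) on demand" option, using only the Python standard library. The helper name `fetch_cached`, its parameters, and the cache layout are hypothetical and not part of sharktank; a real test would also need a download location for the model files, which this thread does not specify.

```python
import urllib.request
from pathlib import Path


def fetch_cached(url: str, cache_dir: Path, filename: str) -> Path:
    """Return a local path for `url`, downloading only if it is not cached yet.

    Hypothetical helper for illustration; names and layout are assumptions.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if not target.exists():
        # Download under a temporary name first, so an interrupted download
        # never leaves a partial file that later looks like a cache hit.
        partial = target.with_name(target.name + ".part")
        urllib.request.urlretrieve(url, partial)
        partial.replace(target)
    return target
```

A test could call such a helper in its setup and skip itself when the download is unavailable, so the required files are stated in the test rather than assumed to exist at a fixed path on the runner.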

ScottTodd · 2024-10-16T16:09:57Z

.github/workflows/eval_test.yaml

+      - name: Run perplexity test
+        run:  pytest sharktank/tests/evaluate/perplexity_test.py


If more tests are going into this category, we could use pytest marks or some other filtering to pick up the list of tests to run. As it stands, the new tests run in multiple workflows (on every commit and here on a nightly schedule).
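One way the mark-based filtering could look, as a sketch only. The marker name `nightly` and the placeholder test are assumptions for illustration, not names used in the repository.

```python
import pytest


# Hypothetical marker name. It would also need to be registered under
# `markers` in the pytest configuration to avoid unknown-mark warnings.
@pytest.mark.nightly
def test_perplexity_placeholder():
    # Stand-in body; the real test would load a model and check perplexity.
    assert True
```

With a mark like this, the per-commit workflow could run `pytest -m "not nightly"` and the scheduled workflow could run `pytest -m nightly`, so each test runs in only one place without listing file paths in the workflow.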

ScottTodd · 2024-10-16T16:11:35Z

requirements.txt

@@ -7,6 +7,7 @@ onnx==1.15.0
 huggingface-hub==0.22.2
 transformers==4.40.0
 sentencepiece==0.2.0
+datasets==3.0.0


We should steer towards putting requirements in the subproject requirements files instead of this top-level file, especially if this is a test-only requirement.

Existing requirements file specific to sharktank: https://github.com/nod-ai/SHARK-Platform/blob/main/sharktank/requirements.txt

Test requirements file for shortfin: https://github.com/nod-ai/SHARK-Platform/blob/main/shortfin/requirements-tests.txt


archana-ramalingam added 5 commits September 27, 2024 04:19

Add 'datasets' package to load golden dataset

aee0d58

Isolate padding function in tokenizer

1ee8594

Add utility function to load/run LLMs for evaluation pipeline

0103293

Add perplexity test

1dfdbc6

Cleanup

14050f5

archana-ramalingam requested a review from rsuderman September 27, 2024 10:04

archana-ramalingam added 3 commits September 27, 2024 05:16

delete file

b7c75f3

Add perplexity test

a44a8a2

Fix dataset loading

8034432

rsuderman approved these changes Sep 27, 2024

sharktank/sharktank/utils/load_llm.py (outdated, resolved)

sharktank/sharktank/utils/tokenizer.py (resolved)

sharktank/tests/evaluate/perplexity.py (outdated, resolved)

archana-ramalingam and others added 20 commits September 27, 2024 15:14

Update page_cache_size

a26b17d

add run_perplexity and prompts

cd079a7

Merge branch 'main' into perplexity-test

df84163

Shift logits and change activation dtype

3e0871e

Add Grok model

4a74107

Remove decode and run prefill on every turn

64e812d

Change activation dtype to enable quantized models

29e6031

Add timing wrapper

9c168f3

Add instructions to run evaluation-perplexity

38590bb

Add prompts text file

2bf8739

Add logging + cleanup

70f6ba5

Add CI perplexity test

848da59

Update prompt file path

7e49580

Remove unit tests for nightly

ec6968f

Add relative path + push attention_mask to device

f7667ec

Remove debug changes

7054141

Merge branch 'main' into perplexity-test

70a7b10

Update dtype to F32 for compatibility across torch versions

134c77f

Merge branch 'main' into perplexity-test

27f4e15

Add decode

8a0a081

archana-ramalingam and others added 10 commits October 11, 2024 01:16

Fix padding logits

e47fe4a

Add local model path

b15c06d

Add CI test for evaluation

6da2b38

Add perplexity calculated from prefill logits only

0afe63f

Merge branch 'main' into perplexity-test

e4ccb10

Add CI tests for perplexity

478f1a1

Clean up

96458a8

Merge branch 'perplexity-test' of https://github.com/nod-ai/sharktank …

b4e3635

…into perplexity-test

Clean up

4af53c3

Update argument

e2eb98c

archana-ramalingam merged commit e30d0af into main Oct 16, 2024
6 of 9 checks passed

archana-ramalingam deleted the perplexity-test branch October 16, 2024 06:46

ScottTodd mentioned this pull request Oct 16, 2024

Ignore dir containing perplexity test from presubmit workflow #283

Closed

ScottTodd reviewed Oct 16, 2024


archana-ramalingam added a commit that referenced this pull request Oct 16, 2024

Revert "[sharktank] Evaluation - Add Perplexity test (#233)"

a2bc667

This reverts commit e30d0af.

archana-ramalingam mentioned this pull request Oct 16, 2024

Revert "[sharktank] Evaluation - Add Perplexity test" #285

Merged

archana-ramalingam restored the perplexity-test branch October 16, 2024 16:31

archana-ramalingam added a commit that referenced this pull request Oct 16, 2024

Revert "[sharktank] Evaluation - Add Perplexity test" (#285)

6b90ac7

Reverts #233
