[sharktank] Evaluation - Add Perplexity test #233

Merged
merged 38 commits into from
Oct 16, 2024
38 commits
aee0d58
Add 'datasets' package to load golden dataset
archana-ramalingam Sep 27, 2024
1ee8594
Isolate padding function in tokenizer
archana-ramalingam Sep 27, 2024
0103293
Add utility function to load/run LLMs for evaluation pipeline
archana-ramalingam Sep 27, 2024
1dfdbc6
Add perplexity test
archana-ramalingam Sep 27, 2024
14050f5
Cleanup
archana-ramalingam Sep 27, 2024
b7c75f3
delete file
archana-ramalingam Sep 27, 2024
a44a8a2
Add perplexity test
archana-ramalingam Sep 27, 2024
8034432
Fix dataset loading
archana-ramalingam Sep 27, 2024
a26b17d
Update page_cache_size
archana-ramalingam Sep 27, 2024
cd079a7
add run_perplexity and prompts
archana-ramalingam Sep 27, 2024
df84163
Merge branch 'main' into perplexity-test
archana-ramalingam Sep 28, 2024
3e0871e
Shift logits and change activation dtype
archana-ramalingam Sep 30, 2024
4a74107
Add Grok model
archana-ramalingam Sep 30, 2024
64e812d
Remove decode and run prefill on every turn
archana-ramalingam Oct 1, 2024
29e6031
Change activation dtype to enable quantized models
archana-ramalingam Oct 1, 2024
9c168f3
Add timing wrapper
archana-ramalingam Oct 1, 2024
38590bb
Add instructions to run evaluation-perplexity
archana-ramalingam Oct 2, 2024
2bf8739
Add prompts text file
archana-ramalingam Oct 2, 2024
70f6ba5
Add logging + cleanup
archana-ramalingam Oct 2, 2024
848da59
Add CI perplexity test
archana-ramalingam Oct 2, 2024
7e49580
Update prompt file path
archana-ramalingam Oct 2, 2024
ec6968f
Remove unit tests for nightly
archana-ramalingam Oct 2, 2024
f7667ec
Add relative path + push attention_mask to device
archana-ramalingam Oct 3, 2024
7054141
Remove debug changes
archana-ramalingam Oct 3, 2024
70a7b10
Merge branch 'main' into perplexity-test
archana-ramalingam Oct 3, 2024
134c77f
Update dtype to F32 for compatibility across torch versions
archana-ramalingam Oct 4, 2024
27f4e15
Merge branch 'main' into perplexity-test
archana-ramalingam Oct 7, 2024
8a0a081
Add decode
archana-ramalingam Oct 10, 2024
e47fe4a
Fix padding logits
archana-ramalingam Oct 11, 2024
b15c06d
Add local model path
archana-ramalingam Oct 11, 2024
6da2b38
Add CI test for evaluation
archana-ramalingam Oct 11, 2024
0afe63f
Add perplexity calculated from prefill logits only
archana-ramalingam Oct 15, 2024
e4ccb10
Merge branch 'main' into perplexity-test
archana-ramalingam Oct 16, 2024
478f1a1
Add CI tests for perplexity
archana-ramalingam Oct 16, 2024
96458a8
Clean up
archana-ramalingam Oct 16, 2024
b4e3635
Merge branch 'perplexity-test' of https://github.com/nod-ai/sharktank…
archana-ramalingam Oct 16, 2024
4af53c3
Clean up
archana-ramalingam Oct 16, 2024
e2eb98c
Update argument
archana-ramalingam Oct 16, 2024
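
Several commits above ("Shift logits and change activation dtype", "Fix padding logits", "Add perplexity calculated from prefill logits only") point at the usual mechanics of a perplexity metric. The PR's own implementation is not shown on this page, so the following is only a minimal sketch, assuming the standard definition (the exponential of the mean negative log-likelihood of each next token). The function names `log_softmax` and `perplexity` and the list-based inputs are hypothetical and chosen for illustration; the real test presumably operates on torch tensors.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def perplexity(logits, token_ids, attention_mask):
    """Perplexity of one sequence (hypothetical helper, not the PR's API).

    logits[t] holds the scores the model produced after reading token t,
    so it is compared against token t + 1 (the "shift").
    Positions whose attention_mask entry is 0 are padding and are skipped.
    """
    total_nll = 0.0
    count = 0
    for t in range(len(token_ids) - 1):
        if attention_mask[t + 1] == 0:
            continue  # the target is padding, so it must not affect the score
        log_probs = log_softmax(logits[t])
        total_nll -= log_probs[token_ids[t + 1]]
        count += 1
    if count == 0:
        raise ValueError("need at least one non-padding target token")
    return math.exp(total_nll / count)
```

With uniform logits over a vocabulary of size V, every target has probability 1/V, so this sketch returns V regardless of how much padding follows the real tokens.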
60 changes: 60 additions & 0 deletions .github/workflows/eval_test.yaml
@@ -0,0 +1,60 @@
name: Evaluation Tests

on:
  workflow_dispatch:
  schedule:
    # Weekdays nightly at 07:00 UTC = 23:00 PST / 00:00 PDT.
    - cron: "0 7 * * 1-5"

concurrency:
  # A PR number if a pull request and otherwise the commit hash. This cancels
  # queued and in-progress runs for the same PR (presubmit) or commit
  # (postsubmit). The workflow name is prepended to avoid conflicts between
  # different workflows.
  group: ${{ github.workflow }}-${{ github.event.number || github.sha }}
  cancel-in-progress: true

jobs:
  test_perplexity:
    name: "Evaluation Tests - perplexity"
    strategy:
      matrix:
        version: [3.11]
        os: [ubuntu-latest, windows-latest]
      fail-fast: false
    runs-on: ${{matrix.os}}
    defaults:
      run:
        shell: bash
    env:
      PIP_CACHE_DIR: "${{ github.workspace }}/.pip-cache"
    steps:
      - name: "Setting up Python"
        id: setup_python
        uses: actions/setup-python@v3
        with:
          python-version: ${{matrix.version}}

      - name: "Checkout Code"
        uses: actions/checkout@v3

      - name: Cache Pip Packages
        uses: actions/cache@v4
        id: cache-pip
        with:
          path: ${{ env.PIP_CACHE_DIR }}
          key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('*requirements.txt') }}

      - name: Install pip deps
        run: |
          python -m pip install --no-compile --upgrade pip
          # Note: We install in three steps in order to satisfy requirements
          # from non default locations first. Installing the PyTorch CPU
          # wheels saves multiple minutes and a lot of bandwidth on runner setup.
          pip install --no-compile -r pytorch-cpu-requirements.txt
          pip install --no-compile -f https://iree.dev/pip-release-links.html --src deps \
            -e "git+https://github.com/iree-org/iree-turbine.git#egg=iree-turbine"
          pip install --no-compile -r requirements.txt -e sharktank/ shortfin/

      - name: Run perplexity test
        run: pytest sharktank/tests/evaluate/perplexity_test.py
Comment on lines +59 to +60
Member
If more tests are going in this category, we could use pytest marks or some other filtering to pick up the list of tests to run. As it stands, the new tests run in multiple workflows (on every commit and here on a nightly schedule).
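
The pytest-marks idea in this comment could look roughly like the sketch below. The marker name `nightly` and both test functions are hypothetical placeholders and are not part of this PR; only `pytest.mark` and the `-m` selection flag are standard pytest features.

```python
import pytest

# Hypothetical marker name. To avoid unknown-marker warnings it would also be
# registered in the project's pytest configuration, for example in pytest.ini:
#   [pytest]
#   markers =
#       nightly: long-running evaluation tests for the scheduled workflow


@pytest.mark.nightly
def test_perplexity_placeholder():
    # Stand-in body; the real test lives in
    # sharktank/tests/evaluate/perplexity_test.py.
    assert True


def test_fast_unit_placeholder():
    # Unmarked test that a per-commit workflow would still pick up.
    assert True
```

Under this assumption, the scheduled workflow would run `pytest -m nightly`, while the per-commit workflow would run `pytest -m "not nightly"`, so each test executes in only one place.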

10 changes: 10 additions & 0 deletions docs/model_cookbook.md
@@ -257,6 +257,16 @@ iree-run-module \
--parameters=model=/tmp/open_llama_3b_v2/open-llama-3b-v2-f16.gguf
```

## Evaluation pipeline

Run perplexity test:

```bash
python -m sharktank.evaluate.perplexity \
  --gguf-file=llama8b_f16.gguf \
  --tokenizer-config-json=tokenizer_config.json
```

## Generating data for llama models

```bash
1 change: 1 addition & 0 deletions requirements.txt
@@ -7,6 +7,7 @@ onnx==1.15.0
huggingface-hub==0.22.2
transformers==4.40.0
sentencepiece==0.2.0
datasets==3.0.0
Member
We should steer towards putting requirements in the subproject requirements files instead of this top-level file, especially if this is a test-only requirement.


# It is expected that you have installed a PyTorch version/variant specific
# to your needs, so we only include a minimum version spec.
12 changes: 12 additions & 0 deletions sharktank/sharktank/evaluate/data/eval_prompts.txt
@@ -0,0 +1,12 @@
Robert Boulter is an English film, television and theatre actor.
Robert Boulter had a guest-starring role on the television series "The Bill" in 2000.
Du Fu was a prominent Chinese poet of the Tang dynasty.
Along with Li Bai (Li Po), Du Fu is frequently called the greatest of the Chinese poets.
The Ise-class battleships were a pair of dreadnought battleships built for the Imperial Japanese Navy (IJN) during World War I.
Originally intended to be repeats of the preceding Fusō class, the Ise-class battleships were redesigned before construction began. Both ships carried supplies for the survivors of the Great Kantō earthquake in 1923.
They were modernized in 1934-37 with improvements to their armour and machinery and a rebuilt superstructure in the pagoda mast style. Afterwards they played a minor role in the Second Sino-Japanese War.
Richard Gale "Dick" Rifenburg (August 21, 1926-December 5, 1994) was an American football player and a pioneering television broadcaster for the forerunner to WIVB-TV in Buffalo.
Rifenburg played college football for the University of Michigan Wolverines in 1944 and from 1946 to 1948. He was a consensus selection at end on the 1948 College Football All-America Team.
Rifenburg played professionally in the National Football League (NFL) with the Detroit Lions for one season in 1950. After retiring from football he settled in Buffalo and became a sports broadcaster.
An oxaziridine is an organic molecule that features a three-membered heterocycle containing oxygen, nitrogen, and carbon. In their largest application, oxazidines are intermediates in the industrial production of hydrazine.
Oxaziridine derivatives are also used as specialized reagents in organic chemistry for a variety of oxidations, including alpha hydroxylation of enolates, epoxidation and aziridination of olefins, and other heteroatom transfer reactions.
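
The prompts file above stores one prompt per line. How the PR reads it is not shown on this page; as a rough sketch, assuming plain UTF-8 text with blank lines ignored, a loader could be as small as the following. The name `load_prompts` is hypothetical.

```python
from pathlib import Path


def load_prompts(path):
    """Return the non-empty lines of a prompts file, one prompt per line.

    Hypothetical helper for illustration; not taken from the PR.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

For example, `load_prompts("sharktank/sharktank/evaluate/data/eval_prompts.txt")` would return a list of 12 strings for the file added in this diff.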