[sharktank] Evaluation - Add Perplexity test #233
Merged
38 commits
aee0d58 Add 'datasets' package to load golden dataset (archana-ramalingam)
1ee8594 Isolate padding function in tokenizer (archana-ramalingam)
0103293 Add utility function to load/run LLMs for evaluation pipeline (archana-ramalingam)
1dfdbc6 Add perplexity test (archana-ramalingam)
14050f5 Cleanup (archana-ramalingam)
b7c75f3 delete file (archana-ramalingam)
a44a8a2 Add perplexity test (archana-ramalingam)
8034432 Fix dataset loading (archana-ramalingam)
a26b17d Update page_cache_size (archana-ramalingam)
cd079a7 add run_perplexity and prompts (archana-ramalingam)
df84163 Merge branch 'main' into perplexity-test (archana-ramalingam)
3e0871e Shift logits and change activation dtype (archana-ramalingam)
4a74107 Add Grok model (archana-ramalingam)
64e812d Remove decode and run prefill on every turn (archana-ramalingam)
29e6031 Change activation dtype to enable quantized models (archana-ramalingam)
9c168f3 Add timing wrapper (archana-ramalingam)
38590bb Add instructions to run evaluation-perplexity (archana-ramalingam)
2bf8739 Add prompts text file (archana-ramalingam)
70f6ba5 Add logging + cleanup (archana-ramalingam)
848da59 Add CI perplexity test (archana-ramalingam)
7e49580 Update prompt file path (archana-ramalingam)
ec6968f Remove unit tests for nightly (archana-ramalingam)
f7667ec Add relative path + push attention_mask to device (archana-ramalingam)
7054141 Remove debug changes (archana-ramalingam)
70a7b10 Merge branch 'main' into perplexity-test (archana-ramalingam)
134c77f Update dtype to F32 for compatibility across torch versions (archana-ramalingam)
27f4e15 Merge branch 'main' into perplexity-test (archana-ramalingam)
8a0a081 Add decode (archana-ramalingam)
e47fe4a Fix padding logits (archana-ramalingam)
b15c06d Add local model path (archana-ramalingam)
6da2b38 Add CI test for evaluation (archana-ramalingam)
0afe63f Add perplexity calculated from prefill logits only (archana-ramalingam)
e4ccb10 Merge branch 'main' into perplexity-test (archana-ramalingam)
478f1a1 Add CI tests for perplexity (archana-ramalingam)
96458a8 Clean up (archana-ramalingam)
b4e3635 Merge branch 'perplexity-test' of https://github.com/nod-ai/sharktank… (archana-ramalingam)
4af53c3 Clean up (archana-ramalingam)
e2eb98c Update argument (archana-ramalingam)
@@ -0,0 +1,60 @@
name: Evaluation Tests

on:
  workflow_dispatch:
  schedule:
    # Weekdays nightly at 07:00 UTC = 23:00 PST / 00:00 PDT.
    - cron: "0 7 * * 1-5"

concurrency:
  # A PR number if a pull request and otherwise the commit hash. This cancels
  # queued and in-progress runs for the same PR (presubmit) or commit
  # (postsubmit). The workflow name is prepended to avoid conflicts between
  # different workflows.
  group: ${{ github.workflow }}-${{ github.event.number || github.sha }}
  cancel-in-progress: true

jobs:
  test_perplexity:
    name: "Evaluation Tests - perplexity"
    strategy:
      matrix:
        version: [3.11]
        os: [ubuntu-latest, windows-latest]
      fail-fast: false
    runs-on: ${{matrix.os}}
    defaults:
      run:
        shell: bash
    env:
      PIP_CACHE_DIR: "${{ github.workspace }}/.pip-cache"
    steps:
      - name: "Setting up Python"
        id: setup_python
        uses: actions/setup-python@v3
        with:
          python-version: ${{matrix.version}}

      - name: "Checkout Code"
        uses: actions/checkout@v3

      - name: Cache Pip Packages
        uses: actions/cache@v4
        id: cache-pip
        with:
          path: ${{ env.PIP_CACHE_DIR }}
          key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('*requirements.txt') }}

      - name: Install pip deps
        run: |
          python -m pip install --no-compile --upgrade pip
          # Note: We install in three steps in order to satisfy requirements
          # from non default locations first. Installing the PyTorch CPU
          # wheels saves multiple minutes and a lot of bandwidth on runner setup.
          pip install --no-compile -r pytorch-cpu-requirements.txt
          pip install --no-compile -f https://iree.dev/pip-release-links.html --src deps \
            -e "git+https://github.com/iree-org/iree-turbine.git#egg=iree-turbine"
          pip install --no-compile -r requirements.txt -e sharktank/ shortfin/

      - name: Run perplexity test
        run: pytest sharktank/tests/evaluate/perplexity_test.py
@@ -7,6 +7,7 @@ onnx==1.15.0
huggingface-hub==0.22.2
transformers==4.40.0
sentencepiece==0.2.0
datasets==3.0.0

# It is expected that you have installed a PyTorch version/variant specific
# to your needs, so we only include a minimum version spec.

Review comment (on the `datasets==3.0.0` line): We should steer towards putting requirements in the subproject requirements files instead of this top-level file, especially if this is a test-only requirement.
@@ -0,0 +1,12 @@
Robert Boulter is an English film, television and theatre actor.
Robert Boulter had a guest-starring role on the television series "The Bill" in 2000.
Du Fu was a prominent Chinese poet of the Tang dynasty.
Along with Li Bai (Li Po), Du Fu is frequently called the greatest of the Chinese poets.
The Ise-class battleships were a pair of dreadnought battleships built for the Imperial Japanese Navy (IJN) during World War I.
Originally intended to be repeats of the preceding Fusō class, the Ise-class battleships were redesigned before construction began. Both ships carried supplies for the survivors of the Great Kantō earthquake in 1923.
They were modernized in 1934-37 with improvements to their armour and machinery and a rebuilt superstructure in the pagoda mast style. Afterwards they played a minor role in the Second Sino-Japanese War.
Richard Gale "Dick" Rifenburg (August 21, 1926-December 5, 1994) was an American football player and a pioneering television broadcaster for the forerunner to WIVB-TV in Buffalo.
Rifenburg played college football for the University of Michigan Wolverines in 1944 and from 1946 to 1948. He was a consensus selection at end on the 1948 College Football All-America Team.
Rifenburg played professionally in the National Football League (NFL) with the Detroit Lions for one season in 1950. After retiring from football he settled in Buffalo and became a sports broadcaster.
An oxaziridine is an organic molecule that features a three-membered heterocycle containing oxygen, nitrogen, and carbon. In their largest application, oxazidines are intermediates in the industrial production of hydrazine.
Oxaziridine derivatives are also used as specialized reagents in organic chemistry for a variety of oxidations, including alpha hydroxylation of enolates, epoxidation and aziridination of olefins, and other heteroatom transfer reactions.
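
For readers unfamiliar with the metric these prompts are scored with: the commit messages mention shifting logits and fixing padding logits. The sketch below shows one common way to compute perplexity from per-position logits with those two details in mind. It is not the code from this PR; the function names, the pure-Python formulation, and the toy inputs are assumptions made for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def perplexity(logits, token_ids, attention_mask):
    """Perplexity of one sequence (hypothetical helper).

    logits[t] is assumed to score the token at position t + 1, so the
    targets are the input tokens shifted left by one. Positions whose
    target is padding (attention_mask == 0) are excluded from the average.
    """
    nll = 0.0
    count = 0
    for t in range(len(token_ids) - 1):
        if attention_mask[t + 1] == 0:
            continue  # target is padding, do not score it
        nll -= log_softmax(logits[t])[token_ids[t + 1]]
        count += 1
    if count == 0:
        raise ValueError("no non-padding target tokens to score")
    # Perplexity is the exponential of the mean negative log-likelihood.
    return math.exp(nll / count)

# Toy input: vocabulary of 4, uniform logits, last position is padding.
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(4)]
print(perplexity(uniform_logits, [0, 1, 2, 0], [1, 1, 1, 0]))
```

With uniform logits the model is equally unsure about every token, so the result is approximately the vocabulary size (4 here), which makes a convenient sanity check.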
If more tests are going in this category, we could use pytest marks or some other filtering to pick up the list of tests to run. As this is now, the new tests are running in multiple workflows (on every commit and here on a nightly schedule).
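
The pytest-marks idea from this comment could look roughly like the following. This is a sketch only: the marker name `nightly` and the placeholder test are hypothetical and do not appear in the PR. The `pytest_configure` hook would live in a `conftest.py`; it is shown next to the test here for brevity.

```python
import pytest

def pytest_configure(config):
    # Belongs in conftest.py. Registers the hypothetical "nightly" marker
    # so that running with --strict-markers does not reject it.
    config.addinivalue_line(
        "markers", "nightly: long-running evaluation tests run on a schedule"
    )

@pytest.mark.nightly
def test_perplexity_placeholder():
    # Stand-in for an expensive evaluation test such as the perplexity run.
    assert True
```

Under this arrangement the nightly workflow could select only the marked tests with `pytest -m nightly`, while the per-commit workflow could skip them with `pytest -m "not nightly"`, so each test runs in one workflow rather than both.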