[sharktank] Evaluation - Add Perplexity test for vmfb #306

Merged Oct 29, 2024 · 65 commits

07130b8
Get baseline_perplexity_scores from azure sharkpublic blob
archana-ramalingam Oct 22, 2024
cd21d75
Merge branch 'main' into perplexity-vmfb
archana-ramalingam Oct 22, 2024
ebe1e69
Add perplexity for vmfb
archana-ramalingam Oct 23, 2024
c6b9998
Merge branch 'perplexity-vmfb' of https://github.com/nod-ai/SHARK-Pla…
archana-ramalingam Oct 23, 2024
aa47d67
Add vmfb runner script
archana-ramalingam Oct 23, 2024
1a7933a
Update test
archana-ramalingam Oct 23, 2024
026318a
Rename perplexity torch test
archana-ramalingam Oct 23, 2024
a2d7c7a
Merge branch 'main' into perplexity-vmfb
archana-ramalingam Oct 24, 2024
089f590
Revert npy to json
archana-ramalingam Oct 24, 2024
1711f85
Update gguf to irpa
archana-ramalingam Oct 24, 2024
74b376f
Add vmfb test
archana-ramalingam Oct 24, 2024
6a9b5b3
Reduce tqdm progress print frequency
archana-ramalingam Oct 24, 2024
dfa3218
Add -s flag for pytest to display test progress
archana-ramalingam Oct 24, 2024
0e83a2a
Merge main with branch
archana-ramalingam Oct 24, 2024
7c85d0d
Update vmfb perplexity
archana-ramalingam Oct 24, 2024
688d208
Merge branch 'main' into perplexity-vmfb
archana-ramalingam Oct 24, 2024
26b48de
Address review comments
archana-ramalingam Oct 24, 2024
4bb5857
Merge branch 'perplexity-vmfb' of https://github.com/nod-ai/SHARK-Pla…
archana-ramalingam Oct 24, 2024
3945f37
Add export & compile tests
archana-ramalingam Oct 25, 2024
c9fa072
Update export test script
archana-ramalingam Oct 25, 2024
7f4de96
Cleanup
archana-ramalingam Oct 25, 2024
1a26ed7
Test export
archana-ramalingam Oct 25, 2024
2725512
Update artifacts dir
archana-ramalingam Oct 25, 2024
d4d1d18
Add batch size
archana-ramalingam Oct 25, 2024
3c22732
Merge main
archana-ramalingam Oct 25, 2024
1f02051
Test export
archana-ramalingam Oct 25, 2024
6190176
Remove artifacts dir
archana-ramalingam Oct 25, 2024
9fe2c40
Remove export test and add as tool
archana-ramalingam Oct 25, 2024
cf6ee83
Add log messages
archana-ramalingam Oct 25, 2024
9dbc07a
Add log messages
archana-ramalingam Oct 25, 2024
f5c4fef
Update vmfb runner module name dynamically
archana-ramalingam Oct 25, 2024
3a91051
Update llama3_8B_f16_decomposed_vmfb perplexities
archana-ramalingam Oct 25, 2024
006c5d4
Move CI to mi300x-3
archana-ramalingam Oct 25, 2024
7fe9594
Address review comments
archana-ramalingam Oct 26, 2024
03baccb
Revert debug to info logging
archana-ramalingam Oct 26, 2024
52a6fc1
Test
archana-ramalingam Oct 26, 2024
da04fd1
Merge branch 'main' into perplexity-vmfb
archana-ramalingam Oct 26, 2024
d1ed9a2
Update export mlir to remove tensor_parallelism_size arg
archana-ramalingam Oct 26, 2024
8ab20e0
Merge branch 'perplexity-vmfb' of https://github.com/nod-ai/SHARK-Pla…
archana-ramalingam Oct 26, 2024
1876f54
Make non_decomposed version the default
archana-ramalingam Oct 26, 2024
8b274da
Merge branch 'main' into perplexity-vmfb
archana-ramalingam Oct 26, 2024
563f72e
Fix export cmd string parsing issues
archana-ramalingam Oct 26, 2024
e58a10c
Merge branch 'perplexity-vmfb' of https://github.com/nod-ai/SHARK-Pla…
archana-ramalingam Oct 26, 2024
4607fb2
Upgrade to latest iree to resolve dynamo error
archana-ramalingam Oct 26, 2024
19e29d9
Add error handling if mlir export fails
archana-ramalingam Oct 28, 2024
493feeb
Update perplexity scores
archana-ramalingam Oct 28, 2024
b65c882
test benchmark export
archana-ramalingam Oct 28, 2024
ea311e8
test benchmark export
archana-ramalingam Oct 28, 2024
b220688
Remove export tests
archana-ramalingam Oct 28, 2024
2a79eda
Merge branch 'main' into perplexity-vmfb
archana-ramalingam Oct 28, 2024
09796b7
Remove hardcoded paths
archana-ramalingam Oct 28, 2024
8069f24
Xfail 405b as sharding vmfb is unsupported
archana-ramalingam Oct 28, 2024
fb78644
Update mi-300x-3 path
archana-ramalingam Oct 28, 2024
c3aa964
Test pytest command
archana-ramalingam Oct 28, 2024
7d277d3
Test pytest command
archana-ramalingam Oct 28, 2024
5f54084
Revert benchmarking test changes
archana-ramalingam Oct 28, 2024
052f24a
Revert debug changes
archana-ramalingam Oct 28, 2024
a9227c7
Xfail 405b eager mode perplexity till sharding is fixed
archana-ramalingam Oct 28, 2024
31aebbd
Add xfail to 405b as sharding needs to be fixed
archana-ramalingam Oct 28, 2024
461034b
Final testing
archana-ramalingam Oct 28, 2024
22da6e7
Fix CI test script
archana-ramalingam Oct 29, 2024
fe4988a
Remove CI debugging
archana-ramalingam Oct 29, 2024
fb7d720
Merge branch 'main' into perplexity-vmfb
archana-ramalingam Oct 29, 2024
e2c6c17
Remove dummy 405b vmfb baseline numbers
archana-ramalingam Oct 29, 2024
59082ed
Merge branch 'perplexity-vmfb' of https://github.com/nod-ai/SHARK-Pla…
archana-ramalingam Oct 29, 2024
63 changes: 58 additions & 5 deletions .github/workflows/ci_eval.yaml
@@ -15,13 +15,13 @@ concurrency:
cancel-in-progress: true

jobs:
test_perplexity:
test_perplexity_vmfb:
timeout-minutes: 1000
name: "Evaluation Tests - perplexity"
name: "Evaluation Tests - perplexity_vmfb"
strategy:
matrix:
version: [3.11]
runs-on: [llama-mi300]
runs-on: [llama-mi300x-3]
fail-fast: false
runs-on: ${{matrix.runs-on}}
defaults:
@@ -58,5 +58,58 @@ jobs:
-e "git+https://github.com/iree-org/iree-turbine.git#egg=iree-turbine"
pip install --no-compile -r requirements.txt -r sharktank/requirements-tests.txt -e sharktank/

- name: Run perplexity test
run: pytest -n 4 -v -s sharktank/tests/evaluate/perplexity_test.py --longrun
# Try with the latest nightly releases, not what iree-turbine pins.
# We could also pin to a known working or stable version.
# This should eventually stabilize. Do the best we can for now.
pip install -f https://iree.dev/pip-release-links.html --upgrade \
iree-compiler \
iree-runtime \
"numpy<2.0"
- name: Run perplexity test with vmfb
run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_vmfb_test.py --longrun --iree-device='hip://7' --iree-hip-target='gfx942' --llama3-8b-f16-model-path=/data/llama-3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama-3.1/8b/tokenizer_config.json

test_perplexity_torch:
timeout-minutes: 1000
name: "Evaluation Tests - perplexity_torch"
strategy:
matrix:
version: [3.11]
runs-on: [llama-mi300x-3]
fail-fast: false
runs-on: ${{matrix.runs-on}}
defaults:
run:
shell: bash
env:
PIP_CACHE_DIR: "${{ github.workspace }}/.pip-cache"
SHARK_PLATFORM_REPO_ROOT: ${{ github.workspace }}
steps:
- name: "Setting up Python"
id: setup_python
uses: actions/setup-python@v3
with:
python-version: ${{matrix.version}}

- name: "Checkout Code"
uses: actions/checkout@v3

- name: Cache Pip Packages
uses: actions/cache@v4
id: cache-pip
with:
path: ${{ env.PIP_CACHE_DIR }}
key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('*requirements.txt') }}

- name: Install sharktank deps
run: |
python -m pip install --no-compile --upgrade pip
# Note: We install in three steps in order to satisfy requirements
# from non default locations first. Installing the PyTorch CPU
# wheels saves multiple minutes and a lot of bandwidth on runner setup.
pip install --no-compile -r pytorch-cpu-requirements.txt
pip install --no-compile -f https://iree.dev/pip-release-links.html --src deps \
-e "git+https://github.com/iree-org/iree-turbine.git#egg=iree-turbine"
pip install --no-compile -r requirements.txt -r sharktank/requirements-tests.txt -e sharktank/

- name: Run perplexity test in eager mode
run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun --llama3-8b-f16-model-path=/data/llama-3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama-3.1/8b/tokenizer_config.json
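
For local reproduction, the vmfb CI step above can be driven from Python with the same flags. A minimal sketch, assuming pytest-xdist is installed (for -n) and that the model, tokenizer, and device values — which are specific to the CI runner — are adapted to the local machine:

import pytest

# Mirrors the "Run perplexity test with vmfb" CI step above; the paths and
# the hip://7 device URI are the CI runner's, not portable defaults.
pytest.main([
    "-n", "8", "-v", "-s",
    "sharktank/tests/evaluate/perplexity_vmfb_test.py",
    "--longrun",
    "--iree-device=hip://7",
    "--iree-hip-target=gfx942",
    "--llama3-8b-f16-model-path=/data/llama-3.1/8b/llama8b_f16.irpa",
    "--llama3-8b-tokenizer-path=/data/llama-3.1/8b/tokenizer_config.json",
])
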
94 changes: 75 additions & 19 deletions sharktank/conftest.py
@@ -72,20 +72,19 @@ def pytest_addoption(parser):
help="Enable long and slow tests",
)

# TODO: Remove all hardcoded paths in CI tests
parser.addoption(
"--llama3-8b-tokenizer-path",
type=Path,
action="store",
default="/data/extra/models/llama3.1_8B/tokenizer_config.json",
help="Llama3.1 8b tokenizer path, defaults to 30F CI system path",
)

parser.addoption(
"--llama3-8b-f16-gguf-path",
"--llama3-8b-f16-model-path",
type=Path,
action="store",
default="/data/extra/models/llama3.1_8B/llama8b_f16.gguf",
help="Llama3.1 8b gguf model path, defaults to 30F CI system path",
help="Llama3.1 8b model path, defaults to 30F CI system path",
)

parser.addoption(
@@ -100,16 +99,14 @@
"--llama3-405b-tokenizer-path",
type=Path,
action="store",
default="/data/extra/models/llama3.1_405B/tokenizer_config.json",
help="Llama3.1 405b tokenizer path, defaults to 30F CI system path",
)

parser.addoption(
"--llama3-405b-f16-gguf-path",
"--llama3-405b-f16-model-path",
type=Path,
action="store",
default="/data/extra/models/llama3.1_405B/llama405b_fp16.gguf",
help="Llama3.1 405b gguf model path, defaults to 30F CI system path",
help="Llama3.1 405b model path, defaults to 30F CI system path",
)

parser.addoption(
@@ -121,20 +118,49 @@
)

parser.addoption(
"--baseline-perplexity-score-json",
"--baseline-perplexity-scores",
type=Path,
action="store",
default="sharktank/tests/evaluate/baseline_perplexity_scores.json",
help="Llama3.1 8B & 405B model baseline perplexity scores json",
help="Llama3.1 8B & 405B model baseline perplexity scores",
)

parser.addoption(
"--iree-device",
type=str,
action="store",
help="List an IREE device from iree-run-module --list_devices",
)

parser.addoption(
"--iree-hip-target",
action="store",
default="gfx942",
help="Specify the iree-hip target version (e.g., gfx942)",
)

parser.addoption(
"--iree-hal-target-backends",
action="store",
default="rocm",
help="Specify the iree-hal target backend (e.g., rocm)",
)

parser.addoption(
"--tensor-parallelism-size",
action="store",
type=int,
default=1,
help="Number of devices for tensor parallel sharding",
)

parser.addoption(
"--bs",
action="store",
type=int,
default=4,
help="Batch size for mlir export",
)


def set_fixture_from_cli_option(
request: FixtureRequest,
@@ -183,27 +209,57 @@ def iree_hip_target_type(request: FixtureRequest) -> Optional[str]:


@pytest.fixture(scope="class")
def get_model_path(request: FixtureRequest):
def tensor_parallelism_size(request: FixtureRequest) -> Optional[str]:
return set_fixture_from_cli_option(
request, "tensor_parallelism_size", "tensor_parallelism_size"
)


@pytest.fixture(scope="class")
def baseline_perplexity_scores(request: FixtureRequest) -> Optional[str]:
return set_fixture_from_cli_option(
request, "baseline_perplexity_scores", "baseline_perplexity_scores"
)


@pytest.fixture(scope="class")
def batch_size(request: FixtureRequest) -> Optional[str]:
return set_fixture_from_cli_option(request, "bs", "batch_size")


@pytest.fixture(scope="class")
def get_model_artifacts(request: FixtureRequest):
model_path = {}
model_path["llama3_8b_tokenizer_path"] = set_fixture_from_cli_option(
request, "--llama3-8b-tokenizer-path", "llama3_8b_tokenizer"
)
model_path["llama3_8b_f16_gguf_path"] = set_fixture_from_cli_option(
request, "--llama3-8b-f16-gguf-path", "llama3_8b_f16_model"
model_path["llama3_8b_f16_model_path"] = set_fixture_from_cli_option(
request, "--llama3-8b-f16-model-path", "llama3_8b_f16_model"
)
model_path["llama3_8b_fp8_model_path"] = set_fixture_from_cli_option(
request, "--llama3-8b-fp8-model-path", "llama3_8b_fp8_model"
)
model_path["llama3_405b_tokenizer_path"] = set_fixture_from_cli_option(
request, "--llama3-405b-tokenizer-path", "llama3_405b_tokenizer"
)
model_path["llama3_405b_f16_gguf_path"] = set_fixture_from_cli_option(
request, "--llama3-405b-f16-gguf-path", "llama3_405b_f16_model"
model_path["llama3_405b_f16_model_path"] = set_fixture_from_cli_option(
request, "--llama3-405b-f16-model-path", "llama3_405b_f16_model"
)
model_path["llama3_405b_fp8_model_path"] = set_fixture_from_cli_option(
request, "--llama3-405b-fp8-model-path", "llama3_405b_fp8_model"
)
model_path["baseline_perplexity_score_json"] = set_fixture_from_cli_option(
request, "--baseline-perplexity-score-json", "baseline_perplexity_score_json"
)
return model_path


@pytest.fixture(scope="class")
def get_iree_flags(request: FixtureRequest):
model_path = {}
model_path["iree_device"] = set_fixture_from_cli_option(
request, "--iree-device", "iree_device"
)
model_path["iree_hip_target"] = set_fixture_from_cli_option(
request, "--iree-hip-target", "iree_hip_target"
)
model_path["iree_hal_target_backends"] = set_fixture_from_cli_option(
request, "--iree-hal-target-backends", "iree_hal_target_backends"
)
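
The class-scoped fixtures above all route through set_fixture_from_cli_option (its body is collapsed in this diff), which reads a pytest option and, for class-scoped requests, attaches the value to the requesting test class. A hedged sketch of how a test might consume get_model_artifacts and get_iree_flags — the class name and assertions are hypothetical, and the attribute names assume the helper's last argument becomes the class attribute:

import pytest

@pytest.mark.usefixtures("get_model_artifacts", "get_iree_flags")
class TestPerplexityExample:
    # Hypothetical consumer: the fixtures are assumed to setattr each CLI
    # value onto this class under the names passed in conftest.py above.
    def test_options_are_wired(self):
        assert self.llama3_8b_f16_model is not None
        assert self.iree_hip_target == "gfx942"  # default set in conftest.py

With this pattern a test never parses options itself; it only declares the fixtures it needs and reads the resulting attributes.
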
@@ -42,10 +42,10 @@
logging.Formatter(fmt="\n%(levelname)s:%(name)-8s %(message)s")
)

__all__ = ["Perplexity", "run_perplexity"]
__all__ = ["Perplexity_torch", "run_perplexity_torch"]


class Perplexity:
class Perplexity_torch:
"""
Perplexity (PPL) is one of the most common metrics for evaluating language models.
It is defined as the exponentiated average negative log-likelihood of a sequence,
@@ -59,8 +59,6 @@ def __init__(
device,
kv_cache_type,
):
self.batch_size = 16

self.device = device
self.kv_cache_type = kv_cache_type
self.activation_dtype = torch.float32
@@ -173,6 +171,8 @@ def get_logits(self):
(self.token_ids != 0).int().detach().clone().to(self.device)
)

self.bs = len(self.test_prompts)

is_first_token = True
start = 0
for i in tqdm(
@@ -263,8 +263,6 @@ def compute_perplexity(self):
def get_perplexity(self, test_prompts):

self.test_prompts = test_prompts
self.bs = len(self.test_prompts)

self.get_logits()

self.out_logits = self.out_logits[..., :-1, :].contiguous()
@@ -282,15 +280,15 @@ def get_perplexity(self, test_prompts):
return self.compute_perplexity()


def run_perplexity(
def run_perplexity_torch(
dataset,
tokenizer,
device,
kv_cache_type,
tensor_parallelism_size,
attention_kernel,
):
perplexity = Perplexity(device=device, kv_cache_type=kv_cache_type)
perplexity = Perplexity_torch(device=device, kv_cache_type=kv_cache_type)

perplexity.load_model(dataset, tokenizer, tensor_parallelism_size, attention_kernel)
test_prompts = perplexity.get_prompts()
@@ -326,7 +324,7 @@ def main(argv):
dataset = cli.get_input_dataset(args)
tokenizer = cli.get_tokenizer(args)

ppl = run_perplexity(
ppl = run_perplexity_torch(
dataset=dataset,
tokenizer=tokenizer,
device=device,
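
As the Perplexity_torch docstring above states, perplexity is the exponentiated average negative log-likelihood, PPL(X) = exp(-(1/t) * sum_i log p(x_i | x_<i)). A minimal PyTorch sketch of that metric, mirroring the logits/target shift visible in get_perplexity; the ignore_index=0 is an assumption matching the (token_ids != 0) mask in get_logits, and this is an illustration, not the PR's exact implementation:

import torch
import torch.nn.functional as F

def perplexity_from_logits(logits: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab); token_ids: (batch, seq_len).
    # Shift so position i predicts token i+1, as in get_perplexity above.
    shifted_logits = logits[..., :-1, :].contiguous()
    targets = token_ids[..., 1:].contiguous()
    # cross_entropy expects the class dim at position 1, so move vocab there.
    nll = F.cross_entropy(
        shifted_logits.transpose(1, 2),
        targets,
        ignore_index=0,  # assumed pad id, matching (token_ids != 0) above
        reduction="mean",
    )
    return torch.exp(nll)  # PPL = exp(mean negative log-likelihood)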