
[RFC] Local models, remote install and more loosely dependencies #14

Closed
philschmid wants to merge 10 commits

Conversation

philschmid commented Jun 28, 2024

Hello,

I was playing around with mix_eval and noticed a few issues, which I addressed in a fork so I could use it properly in my setup. First, as #11 also points out, some __init__.py files are missing, which prevents remote installs via pip install git+https://github.com/Psycoy/MixEval --upgrade. Additionally, the dependencies are pinned very strictly, which makes it harder to integrate MixEval into existing environments.
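
For reference, a minimal sketch of the packaging fix, assuming a standard setuptools layout (the exact package directories are hypothetical):

# Hypothetical sketch: each package directory needs an __init__.py so that
# setuptools' find_packages() picks it up during a pip install from git
touch mix_eval/__init__.py mix_eval/models/__init__.py mix_eval/utils/__init__.py
pip install "git+https://github.com/philschmid/MixEval" --upgrade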

I created a fork to make usage easier.

This is a fork of the original MixEval repository. The original repository can be found here. I created this fork to make the integration and use of MixEval easier during the training of new models. This fork includes several improvements to make usage easier and more flexible, including:

  • Evaluation of local models during or post-training with transformers
  • Hugging Face Datasets integration to avoid the need for local files
  • Support for Hugging Face TGI or vLLM to accelerate evaluation and make it more manageable
  • Improved markdown outputs and timing information for training runs
  • Fixed pip install for remote or CI integration

Getting started

# Fork with looser dependencies
pip install git+https://github.com/philschmid/MixEval --upgrade
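
A quick sanity check that the remote install worked; if the missing __init__.py files are in place, the import succeeds:

# Prints the installed location of the package if the import works
python -c "import mix_eval; print(mix_eval.__file__)"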

Note: If you want to evaluate models that are not already included, take a look here. A Zephyr example is here.

Evaluating open LLMs

Remote Hugging Face model with existing config:

# MODEL_PARSER_API=<your openai api key>
MODEL_PARSER_API=$(echo $OPENAI_API_KEY) python -m mix_eval.evaluate \
    --data_path hf://zeitgeist-ai/mixeval \
    --model_name zephyr_7b_beta \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --batch_size 20 \
    --output_dir results \
    --api_parallel_num 20
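
Side note: MODEL_PARSER_API=$(echo $OPENAI_API_KEY) is just a roundabout variable copy; for a typical single-token key, the plain assignment is equivalent:

# Equivalent, without the $(echo ...) subshell (same flags as above)
MODEL_PARSER_API=$OPENAI_API_KEY python -m mix_eval.evaluate ...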

Using vLLM/TGI with a hosted or local API:

  1. Start your environment:
python -m vllm.entrypoints.openai.api_server --model alignment-handbook/zephyr-7b-dpo-full
  2. Run the following command:
MODEL_PARSER_API=$(echo $OPENAI_API_KEY) API_URL=http://localhost:8000/v1 python -m mix_eval.evaluate \
    --data_path hf://zeitgeist-ai/mixeval \
    --model_name local_api \
    --model_path alignment-handbook/zephyr-7b-dpo-full \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --batch_size 20 \
    --output_dir results \
    --api_parallel_num 20
  3. Results:
| Metric                      | Score   |
| --------------------------- | ------- |
| MBPP                        | 100.00% |
| OpenBookQA                  | 62.50%  |
| DROP                        | 47.60%  |
| BBH                         | 43.10%  |
| MATH                        | 38.10%  |
| PIQA                        | 37.50%  |
| TriviaQA                    | 37.30%  |
| BoolQ                       | 35.10%  |
| CommonsenseQA               | 34.00%  |
| GSM8k                       | 33.60%  |
| MMLU                        | 29.00%  |
| HellaSwag                   | 27.90%  |
| AGIEval                     | 26.80%  |
| GPQA                        | 0.00%   |
| ARC                         | 0.00%   |
| SIQA                        | 0.00%   |
| overall score (final score) | 34.85%  |

Total time: 398.05 seconds

Evaluation takes roughly 6.5 minutes.
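
The same flow should work with Hugging Face TGI instead of vLLM. A hedged sketch, assuming TGI's OpenAI-compatible /v1 route (the image tag, port mapping, and model are illustrative):

# Serve the model with TGI (port 8080 on the host is illustrative)
docker run --gpus all --shm-size 1g -p 8080:80 \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id alignment-handbook/zephyr-7b-dpo-full

# Then point the evaluator at TGI's OpenAI-compatible route
MODEL_PARSER_API=$OPENAI_API_KEY API_URL=http://localhost:8080/v1 python -m mix_eval.evaluate \
    --data_path hf://zeitgeist-ai/mixeval \
    --model_name local_api \
    --model_path alignment-handbook/zephyr-7b-dpo-full \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --batch_size 20 \
    --output_dir results \
    --api_parallel_num 20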

Local Hugging Face model from path:

# MODEL_PARSER_API=<your openai api key>
MODEL_PARSER_API=$(echo $OPENAI_API_KEY) python -m mix_eval.evaluate \
    --data_path hf://zeitgeist-ai/mixeval \
    --model_path my/local/path \
    --output_dir results/agi-5 \
    --model_name local_chat \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --batch_size 20 \
    --api_parallel_num 20
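
Since the point of the fork is evaluating checkpoints produced during training, here is a small hedged sketch for sweeping a transformers Trainer output directory (the checkpoint paths are hypothetical):

# Hypothetical: evaluate every saved Trainer checkpoint in a run directory
for ckpt in my/local/path/checkpoint-*; do
  MODEL_PARSER_API=$OPENAI_API_KEY python -m mix_eval.evaluate \
      --data_path hf://zeitgeist-ai/mixeval \
      --model_path "$ckpt" \
      --model_name local_chat \
      --benchmark mixeval_hard \
      --version 2024-06-01 \
      --batch_size 20 \
      --output_dir "results/$(basename "$ckpt")" \
      --api_parallel_num 20
done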

Remote Hugging Face model without a config, using defaults

Note: We use the model name local_chat to avoid the need for a config file and load the model from the Hugging Face Hub.

# MODEL_PARSER_API=<your openai api key>
MODEL_PARSER_API=$(echo $OPENAI_API_KEY) python -m mix_eval.evaluate \
    --data_path hf://zeitgeist-ai/mixeval \
    --model_path alignment-handbook/zephyr-7b-sft-full \
    --output_dir results/handbook-zephyr \
    --model_name local_chat \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --batch_size 20 \
    --api_parallel_num 20

@philschmid changed the title from "Mak dependencies more loosely and add __init__.py to allo pip install git+https" to "[RFC] Local models, remote install and more loosely dependencies" on Jun 28, 2024

Psycoy commented Jun 29, 2024

Hey Philipp,

Thanks a lot for your nice commits!
To avoid breaking the original docs, I merged a subset of the commits manually.
I acknowledged your help at the end of the README, thanks again!

carstendraschner (Contributor) commented

Hi @philschmid,

Thank you for your PR; I was working on something similar ;)
For inference, could it be that you missed setting the padding side, since you are overriding build_tokenizer?
When using local_chat I sometimes get empty responses, which might be due to the padding side (decoder-only models typically need left padding for batched generation).
What do you think?
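
For illustration, a minimal sketch of what I mean, assuming a plain transformers tokenizer (the model name is just an example):

# Decoder-only models generally need left padding for batched generation;
# right padding can lead to empty or truncated completions.
python - <<'PY'
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-dpo-full")
tok.padding_side = "left"          # generate right after the prompt, not after pad tokens
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # many chat models ship without a pad token
print(tok.padding_side, tok.pad_token)
PY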

See the local chat model in your fork :)

Regards, and thank you for your efforts!
Carsten

@philschmid closed this on Sep 23, 2024