MT-Bench-X

MT-Bench-X is a framework for evaluating the multilingual instruction-following capabilities of large language models (LLMs). Adapting multilingual pre-trained LLMs into articulate and effective assistants is crucial for their application across diverse linguistic regions. To that end, we release the multilingual evaluation benchmark MT-Bench-X, which evaluates multilingual models that have been instruction-tuned on different language compositions. We focus on a selection of the most widely spoken Indo-European languages: English, German, French, Italian, and Spanish.

For more details, see our paper.

This evaluation framework allows you to:

Generate the answers to the MT-Bench-X benchmark across the selected languages
Let GPT-4-as-a-judge assess the answers
Summarize the results in a single file

Installation

cd MT-Bench-X
mkdir venvs
python3 -m virtualenv --prompt mtbenchx --system-site-packages "venvs/mtbenchx"
. venvs/mtbenchx/bin/activate
pip install --upgrade pip
pip install -e .
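
To confirm that the installation put the command on your PATH, a quick check along the following lines can help. This is a minimal sketch: the helper name check_mtbenchx_installed is hypothetical and not part of MT-Bench-X; it only assumes that the package installs an executable called mtbenchx, as used in the Usage section.

```shell
# Hypothetical helper (not part of MT-Bench-X): report whether the
# mtbenchx entry point can be found on PATH.
check_mtbenchx_installed() {
    if command -v mtbenchx >/dev/null 2>&1; then
        echo "mtbenchx found at: $(command -v mtbenchx)"
    else
        echo "mtbenchx not found; is the virtual environment activated?" >&2
        return 1
    fi
}
```

Call check_mtbenchx_installed after activating the virtual environment; a non-zero return code usually means the environment is not active or the install step failed.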

Usage

An OpenAI API key must be set in your environment (OPENAI_API_KEY) so that GPT-4 can judge the generated model answers.
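
Because the judging step depends on the key, it can be worth failing early when it is missing. The snippet below is a small sketch of such a guard; the function name check_openai_key is hypothetical, and the only thing taken from this README is the variable name OPENAI_API_KEY.

```shell
# Hypothetical guard (not part of MT-Bench-X): succeed only if
# OPENAI_API_KEY is set to a non-empty value.
check_openai_key() {
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "OPENAI_API_KEY is not set; GPT-4 judging will fail." >&2
        return 1
    fi
    echo "OPENAI_API_KEY is set."
}
```

One way to use it is to export the key once per shell session (export OPENAI_API_KEY=...) and run check_openai_key before starting a long evaluation.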

Example execution:

# --model-path: load the model from a local checkpoint or by a Hugging Face Hub ID
# --model-id: used to get the correct ModelAdapter and Conversation(-template)
# --model-id-postfix: allows for identification of several checkpoints by a postfix
# --parallel: how many OpenAI requests to execute in parallel
# --num-gpus-per-model (with --num-gpus-total): allows for data-parallel answer generation
OPENAI_API_KEY=xyz mtbenchx \
    --model-path "your_model_path" \
    --model-id "llama-2" \
    --model-id-postfix "my-local-model-variation" \
    --question-begin 6 \
    --question-end 10 \
    --eval-languages DE EN \
    --max-new-token 1024 \
    --parallel 6 \
    --num-gpus-per-model 1 \
    --num-gpus-total 8

Type mtbenchx --help for more information about the input arguments.
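
If you want to evaluate one language at a time, a dry run that prints the commands before executing anything can be a useful starting point. The sketch below makes several assumptions: the helper print_mtbenchx_cmd is hypothetical, the codes FR, IT and ES are guessed by analogy with the DE and EN codes in the example above, and only a subset of the flags from that example is shown, so further arguments may be required.

```shell
# DE and EN appear in the example above; FR, IT and ES are assumed codes
# for French, Italian and Spanish. Check `mtbenchx --help` for accepted values.
LANGUAGES="EN DE FR IT ES"

# Hypothetical helper: print (do not run) one mtbenchx call for one language.
# $1: model path, $2: model id, $3: language code
print_mtbenchx_cmd() {
    printf 'mtbenchx --model-path "%s" --model-id "%s" --eval-languages %s\n' "$1" "$2" "$3"
}

# Dry run: list the per-language commands without executing them.
for lang in $LANGUAGES; do
    print_mtbenchx_cmd "your_model_path" "llama-2" "$lang"
done
```

Since --eval-languages accepts several codes at once in the example above, a single call covering all languages may be preferable; the loop is mainly useful when runs need to be scheduled or retried separately.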

Citation

@misc{
 weber2024investigatingmultilingualinstructiontuningpolyglot,
 title={Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?}, 
 author={Alexander Arno Weber and Klaudia Thellmann and Jan Ebert and Nicolas Flores-Herr and Jens Lehmann and Michael Fromm and Mehdi Ali},
 year={2024},
 eprint={2402.13703},
 archivePrefix={arXiv},
 primaryClass={cs.CL},
 url={https://arxiv.org/abs/2402.13703}, 
}
