Update Arena Hard tasks docstrings
gabrielmbmb committed Jun 14, 2024
1 parent e5e9794 commit 60d17c1
Showing 1 changed file with 10 additions and 2 deletions.
12 changes: 10 additions & 2 deletions src/distilabel/steps/tasks/benchmarks/arena_hard.py
@@ -24,7 +24,9 @@


class ArenaHard(Task):
-"""This `Task` is based on the "From Live Data to High-Quality Benchmarks: The
+"""Evaluates two assistant responses using an LLM as judge.
+
+This `Task` is based on the "From Live Data to High-Quality Benchmarks: The
Arena-Hard Pipeline" paper that presents Arena Hard, which is a benchmark for
instruction-tuned LLMs that contains 500 challenging user queries. GPT-4 is used
as the judge to compare the model responses against a baseline model, which defaults
@@ -145,7 +147,9 @@ def format_output(


class ArenaHardResults(GlobalStep):
-"""This `Step` is based on the "From Live Data to High-Quality Benchmarks: The
+"""Process Arena Hard results to calculate the ELO scores.
+
+This `Step` is based on the "From Live Data to High-Quality Benchmarks: The
Arena-Hard Pipeline" paper that presents Arena Hard, which is a benchmark for
instruction-tuned LLMs that contains 500 challenging user queries. This step is
a `GlobalStep` that should run right after the `ArenaHard` task to calculate the
@@ -155,6 +159,10 @@ class ArenaHardResults(GlobalStep):
Arena-Hard-Auto has the highest correlation and separability to Chatbot Arena
among popular open-ended LLM benchmarks.
+Input columns:
+- evaluation (`str`): The evaluation of the responses generated by the LLMs.
+- score (`str`): The score extracted from the evaluation.
+
References:
- [From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline](https://lmsys.org/blog/2024-04-19-arena-hard/)
- [`arena-hard-auto`](https://github.com/lm-sys/arena-hard-auto/tree/main)
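
The new `ArenaHardResults` summary says the step turns the judge's `score` column into ELO scores. As a rough illustration of that idea only, here is a minimal sketch of a classic online Elo update. It is not the code in `arena_hard.py` and not necessarily the method used by `arena-hard-auto`. The score labels (`"A>>B"`, `"A>B"`, `"A=B"`, `"B>A"`, `"B>>A"`), the function names, the K-factor, and the initial rating of 1000 are all assumptions made for this example.

```python
# Hypothetical mapping from a judge label to the outcome for model A
# (1.0 = win, 0.5 = tie, 0.0 = loss). The label strings are assumed.
OUTCOME_FOR_A = {
    "A>>B": 1.0,
    "A>B": 1.0,
    "A=B": 0.5,
    "B>A": 0.0,
    "B>>A": 0.0,
}


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    ratings: dict,
    model_a: str,
    model_b: str,
    score: str,
    k: float = 4.0,
    initial: float = 1000.0,
) -> dict:
    """Apply one pairwise judgment to the ratings table, in place."""
    rating_a = ratings.get(model_a, initial)
    rating_b = ratings.get(model_b, initial)
    expected_a = expected_score(rating_a, rating_b)
    actual_a = OUTCOME_FOR_A[score]
    # A gains what B loses, so the sum of both ratings is preserved.
    delta = k * (actual_a - expected_a)
    ratings[model_a] = rating_a + delta
    ratings[model_b] = rating_b - delta
    return ratings


if __name__ == "__main__":
    table = {}
    for label in ["A>B", "A=B", "B>>A", "A>>B"]:
        update_elo(table, "candidate", "baseline", label)
    print(table)
```

An online update like this depends on the order of the judgments; fitting all comparisons at once (for example with a Bradley-Terry model) avoids that, at the cost of a slightly longer implementation.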