Structured output eval #152

oplatek · 2024-11-11T14:41:16Z

What?

Introduces generation with structured outputs using the OpenAI client, for both the OpenAI API and vLLM.

Defines the JSON schema for ErrorSpanAnnotations as:

from pydantic import BaseModel, Field


class Annotation(BaseModel):
    text: str = Field(description="The text which is annotated.")
    error_type: int = Field(description="Index to the list of categories defined for the annotation campaign.")
    reason: str = Field(description="The reason for the annotation.")


class OutputAnnotations(BaseModel):
    annotations: list[Annotation] = Field(description="The list of annotations.")
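
As a rough illustration of how such a schema can be used, the sketch below derives the JSON schema from the model and validates a raw model reply against it. It assumes Pydantic v2 (`model_json_schema`, `model_validate_json`); the sample reply strings are made up for the example and are not taken from this PR.

```python
from pydantic import BaseModel, Field, ValidationError


class Annotation(BaseModel):
    text: str = Field(description="The text which is annotated.")
    error_type: int = Field(description="Index to the list of categories defined for the annotation campaign.")
    reason: str = Field(description="The reason for the annotation.")


class OutputAnnotations(BaseModel):
    annotations: list[Annotation] = Field(description="The list of annotations.")


# JSON schema (a plain dict) that can be handed to a structured-decoding backend.
schema = OutputAnnotations.model_json_schema()

# Hypothetical raw reply from a model (illustrative only).
raw_reply = '{"annotations": [{"text": "in 1999", "error_type": 1, "reason": "Wrong year."}]}'
parsed = OutputAnnotations.model_validate_json(raw_reply)

# A reply with missing fields raises ValidationError instead of being silently accepted.
try:
    OutputAnnotations.model_validate_json('{"annotations": [{"text": "x"}]}')
    rejected = False
except ValidationError:
    rejected = True
```

Validation failures surface as exceptions, so malformed replies can be handled explicitly rather than patched up with heuristics.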

Why?

Smaller models without structured outputs fail to follow the expected JSON format; with structured outputs they now comply 🎉
Pydantic is an excellent way to define structured output 💪
Parsing is less heuristic 😉

Limitations

Note that Annotation has the attribute error_type instead of type as before, because type is a reserved word in JSON schema.
Right after parsing, this PR creates a dictionary with the key type instead of error_type, so error_type is used only for LLM calls and their parsing.
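
The renaming step described above can be sketched roughly as follows. This is a minimal standard-library illustration, not the code from this PR: the function name `to_internal_annotations` and the sample reply are hypothetical.

```python
import json


def to_internal_annotations(raw_reply: str) -> list[dict]:
    """Parse a structured reply and rename `error_type` to `type`.

    Hypothetical helper: the schema sent to the LLM uses `error_type`,
    while the dictionaries used after parsing use the key `type`.
    """
    payload = json.loads(raw_reply)
    converted = []
    for ann in payload["annotations"]:
        converted.append(
            {
                "text": ann["text"],
                "type": ann["error_type"],  # renamed right after parsing
                "reason": ann["reason"],
            }
        )
    return converted


# Made-up reply that follows the OutputAnnotations schema.
raw_reply = '{"annotations": [{"text": "in 1999", "error_type": 1, "reason": "Wrong year."}]}'
result = to_internal_annotations(raw_reply)
```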

Limited testing: I tested all three LLM-eval configs (OpenAI, Ollama, VLLM)

The wiki page does not cover vLLM installation, and vLLM requires a GPU.


kasnerz · 2024-11-12T12:52:50Z

@oplatek Thanks, looks useful!

Given that we allow annotating any span categories, not just errors, can we call it category or category_index instead of error_type?

And will you be able to write the wiki page for VLLM?

oplatek · 2024-11-12T15:29:17Z

Given that we allow annotating any span categories, not just errors, can we call it category or category_index instead of error_type?

Good point! Will fix it

And will you be able to write the wiki page for VLLM?

Yes, I will reference what I did for using VLLM on UFAL GPUs. I hope it will work for most people (I have very limited experience with vLLM, but their documentation looks great).

oplatek added 5 commits November 8, 2024 14:22

removed unused imports in models.py

aaeb751

add VLLMetric and OpenAIMetric supporting structured decoding

0c9721c

OpenAI with structured output tested

98ffa84

Added example for VLLM inference. Completely changed parsing of LLM-eval annotations. Now we use pydantic and enforce structure.

after fix url works for vLLM

1d4a660

fix ollama parsing and use localhost for vllm example

7582948

oplatek requested a review from kasnerz November 11, 2024 17:15

Merge remote-tracking branch 'origin/release-1.0.0' into structured-o…

6d7c34e

…utput-eval

rename error_type to annotation_type and fix it also in default prompt

f6941c3

oplatek merged commit 8532c1e into release-1.0.0 Nov 12, 2024
1 of 2 checks passed

oplatek mentioned this pull request Nov 13, 2024

Support vLLM as backend #75

Closed

oplatek deleted the structured-output-eval branch November 13, 2024 14:59
