Structured output eval #152

Merged
merged 7 commits into from
Nov 12, 2024

@oplatek oplatek commented Nov 11, 2024

What?

Introduces generation with structured outputs, using the OpenAI client for both the OpenAI API and vLLM.

Defines the JSON schema for ErrorSpanAnnotations as:

from pydantic import BaseModel, Field


class Annotation(BaseModel):
    text: str = Field(description="The text which is annotated.")
    error_type: int = Field(description="Index to the list of categories defined for the annotation campaign.")
    reason: str = Field(description="The reason for the annotation.")


class OutputAnnotations(BaseModel):
    annotations: list[Annotation] = Field(description="The list of annotations.")
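
A minimal sketch of how these models might be used, assuming Pydantic v2 (`model_json_schema`, `model_validate_json`). The models are repeated so the snippet runs on its own. The `response_format` dictionary follows the shape the OpenAI Chat Completions API documents for JSON-schema-constrained output; whether this PR passes the schema exactly this way is an assumption, and the schema name and the sample reply are made up for illustration. No request is sent.

```python
import json

from pydantic import BaseModel, Field


class Annotation(BaseModel):
    text: str = Field(description="The text which is annotated.")
    error_type: int = Field(description="Index to the list of categories defined for the annotation campaign.")
    reason: str = Field(description="The reason for the annotation.")


class OutputAnnotations(BaseModel):
    annotations: list[Annotation] = Field(description="The list of annotations.")


# JSON Schema derived from the models (Pydantic v2 API).
schema = OutputAnnotations.model_json_schema()

# Hypothetical request fragment: the name "output_annotations" is illustrative.
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "output_annotations", "schema": schema},
}

# Made-up model reply, used only to demonstrate validation.
raw_reply = json.dumps(
    {"annotations": [{"text": "in 1999", "error_type": 1, "reason": "Wrong year."}]}
)

# Validation raises pydantic.ValidationError if the reply does not match the schema.
parsed = OutputAnnotations.model_validate_json(raw_reply)
print(parsed.annotations[0].error_type)
```

Because the reply is validated against the model, a malformed reply fails loudly instead of being patched up by ad hoc string handling.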

Why?

  • Smaller models fail to follow the expected JSON format without structured outputs; with them, they comply 🎉
  • Pydantic is an excellent way to define structured output 💪
  • Parsing is less heuristic 😉

Limitations

Note that Annotation now has the attribute error_type instead of the previous type, because type is a reserved keyword in JSON Schema.
Immediately after parsing, this PR builds the dictionary with the key type instead of error_type, so error_type is used only for the LLM calls and for parsing their output.
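
The renaming step described above could look roughly like the following sketch. The helper name `to_internal` and the sample reply are hypothetical and not taken from the PR; only the `error_type` to `type` mapping comes from the description.

```python
import json

# Made-up structured reply that uses the schema's field name `error_type`.
raw_reply = '{"annotations": [{"text": "in 1999", "error_type": 1, "reason": "Wrong year."}]}'


def to_internal(annotation: dict) -> dict:
    """Return a copy of the annotation with `error_type` renamed to `type`."""
    converted = dict(annotation)  # copy, so the parsed reply is left untouched
    converted["type"] = converted.pop("error_type")
    return converted


annotations = [to_internal(a) for a in json.loads(raw_reply)["annotations"]]
print(annotations)
```

Keeping the rename in one place right after parsing means the rest of the code can keep using the key type, while only the LLM-facing schema uses error_type.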

Limited testing: I tested all three LLM-eval configs (OpenAI, Ollama, VLLM)

The wiki page does not cover vLLM installation, and vLLM requires a GPU.

@oplatek oplatek requested a review from kasnerz November 11, 2024 17:15

kasnerz commented Nov 12, 2024

@oplatek Thanks, looks useful!

Given that we allow annotating any span categories, not just errors, can we call it category or category_index instead of error_type?

And will you be able to write the wiki page for VLLM?


oplatek commented Nov 12, 2024

Given that we allow annotating any span categories, not just errors, can we call it category or category_index instead of error_type?

Good point! Will fix it

And will you be able to write the wiki page for VLLM?

Yes, I will reference what I did to run vLLM on the UFAL GPUs. I hope it will work for most people (I have very limited experience with vLLM, but its documentation looks great).

@oplatek oplatek merged commit 8532c1e into release-1.0.0 Nov 12, 2024
1 of 2 checks passed
@oplatek oplatek mentioned this pull request Nov 13, 2024
@oplatek oplatek deleted the structured-output-eval branch November 13, 2024 14:59