
Add StructuredGeneration task and support for grammar in InferenceEndpointsLLM #680

Merged
merged 13 commits into develop from inference-endpoints-structured-gen on Jun 3, 2024

Conversation


@alvarobartt alvarobartt commented May 29, 2024

Description

This PR adds the StructuredGeneration task, similar to the existing TextGeneration task, but it also expects a grammar as input and produces both the chat-like input and the grammar within format_input. To achieve that, the typing of Task.process and LLM.generate has been updated so that the received input(s) contain either only the chat (for most cases) or both the chat and the grammar (for the StructuredGeneration case).

Note

This is still a work in progress and subject to change, but at the moment this seems the most straightforward and intuitive way to do it.
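For illustration, here is a minimal sketch of the updated input typing. The aliases ChatType and Grammar are illustrative and not necessarily the names used in the codebase:

from typing import Any, Dict, List, Optional, Tuple

# Illustrative aliases; the actual type names in distilabel may differ.
ChatType = List[Dict[str, str]]  # OpenAI-style chat messages
Grammar = Dict[str, Any]         # e.g. {"type": "json", "value": {...}}

# Most tasks keep formatting their input as a plain chat...
standard_input: ChatType = [
    {"role": "user", "content": "Generate a character from an RPG game."}
]

# ...while `StructuredGeneration.format_input` also carries the row's grammar,
# so `LLM.generate` receives a (chat, grammar) tuple instead.
structured_input: Tuple[ChatType, Optional[Grammar]] = (
    [{"role": "user", "content": "Generate a character from an RPG game."}],
    {"type": "regex", "value": "(\\d{1,2})°C"},
)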

Additionally, this PR adds the grammar arg to InferenceEndpointsLLM so that it can be provided via the init or as a runtime parameter.

Note

The main difference between using the grammar arg and using the StructuredGeneration task is that the grammar arg is intended to be used with any task, whenever we want every output to match a certain format; e.g. in UltraFeedback we may want the output to match a certain regex to avoid output-parsing issues. The StructuredGeneration task, on the other hand, is intended for cases where each row has a different grammar and we want the output for a given instruction to follow that row's grammar; e.g. a function-calling scenario where each generation for a given instruction must match a certain function schema.

Examples

grammar at the LLM level (same grammar for every generation)

from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration
from pydantic import BaseModel


class Character(BaseModel):
    name: str
    description: str
    role: str
    weapon: str


with Pipeline(name="inference-endpoints-structured-generation") as pipeline:
    load_data = LoadDataFromDicts(
        name="load_data",
        data=[{"instruction": "Generate a character from an RPG game."}],
    )

    text_generation_cohere = TextGeneration(
        name="text_generation_cohere",
        llm=InferenceEndpointsLLM(
            model_id="CohereForAI/c4ai-command-r-plus",
            tokenizer_id="CohereForAI/c4ai-command-r-plus",
            api_key="***",  # type: ignore
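            # the same grammar (the JSON schema of `Character`) is applied to every generation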
            grammar={
                "type": "json",
                "value": Character.model_json_schema(),
            },
        ),
        use_system_prompt=False,
        input_batch_size=10,
        output_mappings={"model_name": "generation_model"},
    )

    load_data >> text_generation_cohere  # type: ignore


if __name__ == "__main__":
    distiset = pipeline.run(
        parameters={  # type: ignore
            text_generation_cohere.name: {
                "llm": {
                    "generation_kwargs": {
                        "temperature": 0.7,
                        "max_new_tokens": 4096,
                        "stop_sequences": ["<EOS_TOKEN>", "<|END_OF_TURN_TOKEN|>"],
                    }
                }
            },
        },
    )
    if distiset is not None:
        distiset.push_to_hub(
            "distilabel-internal-testing/inference-endpoints-structured-generation",
            token="***",
        )

grammar via StructuredGeneration (one grammar per row)

from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks.structured_generation import StructuredGeneration
from pydantic import BaseModel


class Character(BaseModel):
    name: str
    description: str
    role: str
    weapon: str


class Animal(BaseModel):
    name: str
    species: str
    habitat: str
    diet: str


with Pipeline(name="inference-endpoints-structured-generation") as pipeline:
    load_data = LoadDataFromDicts(
        name="load_data",
        data=[
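            # each row carries its own grammar: JSON schemas for the first two rows, a regex for the third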
            {
                "instruction": "Generate a character from a RPG game.",
                "grammar": {
                    "type": "json",
                    "value": Character.model_json_schema(),
                },
            },
            {
                "instruction": "Generate an animal from a zoo.",
                "grammar": {
                    "type": "json",
                    "value": Animal.model_json_schema(),
                },
            },
            {
                "instruction": "What's the weather like today in Seattle in Celsius degrees?",
                "grammar": {
                    "type": "regex",
                    "value": "(\\d{1,2})°C",
                },
            },
        ],
    )

    task = StructuredGeneration(
        name="task",
        llm=InferenceEndpointsLLM(
            model_id="CohereForAI/c4ai-command-r-plus",
            tokenizer_id="CohereForAI/c4ai-command-r-plus",
            api_key="***",  # type: ignore
        ),
        use_system_prompt=False,
        output_mappings={"model_name": "generation_model"},
    )

    load_data >> task  # type: ignore


if __name__ == "__main__":
    distiset = pipeline.run(
        parameters={  # type: ignore
            task.name: {
                "llm": {
                    "generation_kwargs": {
                        "temperature": 0.7,
                        "max_new_tokens": 4096,
                        "stop_sequences": ["<EOS_TOKEN>", "<|END_OF_TURN_TOKEN|>"],
                    }
                }
            },
        },
    )
    if distiset is not None:
        distiset.push_to_hub(
            "distilabel-internal-testing/inference-endpoints-structured-generation-multiple",
            token="***",
        )

- Now the `generate` method in the `LLM` can receive either a chat or a tuple with the chat and the grammar for that chat
- `grammar` is an arg at the `LLM` level, settable via the init or as a runtime parameter (see the sketch after this list)
- The `grammar` can be specified per row via the `StructuredGeneration` task; to apply a single global `grammar`, use the `grammar` arg of the `LLM` together with the `TextGeneration` task instead
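Since `grammar` is exposed as a runtime parameter, it should also be possible to provide it through `pipeline.run` instead of at init time. A hedged sketch, reusing the names from the first example above; this mirrors how `generation_kwargs` is passed there, and the exact parameter layout is an assumption:

if __name__ == "__main__":
    distiset = pipeline.run(
        parameters={
            text_generation_cohere.name: {
                "llm": {
                    # assumption: `grammar` can be overridden here like any
                    # other runtime parameter of the LLM
                    "grammar": {
                        "type": "json",
                        "value": Character.model_json_schema(),
                    },
                    "generation_kwargs": {
                        "temperature": 0.7,
                        "max_new_tokens": 4096,
                    },
                }
            },
        },
    )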
@alvarobartt alvarobartt added this to the 1.2.0 milestone May 29, 2024
@alvarobartt alvarobartt self-assigned this May 29, 2024
@alvarobartt alvarobartt linked an issue May 29, 2024 that may be closed by this pull request
codspeed-hq bot commented May 31, 2024

CodSpeed Performance Report

Merging #680 will not alter performance

Comparing inference-endpoints-structured-gen (e7399d1) with develop (1624b1e)

Summary

✅ 1 untouched benchmarks

@alvarobartt alvarobartt marked this pull request as ready for review June 3, 2024 08:32
@plaguss plaguss merged commit 918c19f into develop Jun 3, 2024
7 checks passed
@plaguss plaguss deleted the inference-endpoints-structured-gen branch June 3, 2024 11:05
Successfully merging this pull request may close these issues.

[FEATURE] Add structured generation for InferenceEndpointsLLM