Update docs to mention add_raw_output attribute
gabrielmbmb committed Jun 4, 2024
1 parent 06924a9 commit 578ccb2
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion docs/sections/learn/tutorial/task/index.md
@@ -8,6 +8,7 @@ The subclasses of [`Task`][distilabel.steps.tasks.Task] are intended to be used

For example, the most basic task is the [`TextGeneration`][distilabel.steps.tasks.TextGeneration] task, which generates text based on a given instruction, and it can be used standalone as well as within a [`Pipeline`][distilabel.pipeline.Pipeline].

```python
from distilabel.steps.tasks import TextGeneration

@@ -18,12 +19,23 @@ task = TextGeneration(
task.load()

next(task.process([{"instruction": "What's the capital of Spain?"}]))
# [{'instruction': "What's the capital of Spain?", "generation": "The capital of Spain is Madrid.", "model_name": "gpt-4"}]
# [
#     {
#         "instruction": "What's the capital of Spain?",
#         "generation": "The capital of Spain is Madrid.",
#         "model_name": "gpt-4",
#         "distilabel_metadata": {
#             "raw_output_text-generation": "The capital of Spain is Madrid."
#         }
#     }
# ]
```

!!! NOTE
    The `load` method must always be called when a task is used standalone. If the [`Pipeline`][distilabel.pipeline.Pipeline] context manager is used, there's no need to call it, since it is called automatically on `Pipeline.run`. In any other case, `load` needs to be called from the parent class, e.g. a [`Task`][distilabel.steps.tasks.Task] with an [`LLM`][distilabel.llms.LLM] needs `Task.load` to be called, which loads both the task and the LLM.

As the comment in the code snippet above shows, the task has enriched the input dictionaries by adding the `generation`, the `model_name` that was used to generate it, and the `distilabel_metadata` dictionary, which contains the raw output (without post-processing) from the LLM. In this case, the `TextGeneration` task does no post-processing, so the `generation` and the raw output are the same. Other tasks do post-process the output, and that post-processing can fail in some situations, which is why it is useful to have the raw output available in the `distilabel_metadata` dictionary. If this default behaviour is not desired, every `Task` has an `add_raw_output` attribute that can be set to `False` when creating the task instance or at run time.
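
To make the value of the raw output concrete, here is a minimal, self-contained sketch of the behaviour described above. It does not use distilabel: `enrich_with_generation` and its `post_process` parameter are hypothetical stand-ins for illustration only, and only the output keys (`generation`, `model_name`, `distilabel_metadata`, `raw_output_text-generation`) and the `add_raw_output` flag name come from the snippet and text above.

```python
from typing import Any, Callable, Dict, List, Optional


def enrich_with_generation(
    inputs: List[Dict[str, Any]],
    raw_outputs: List[str],
    model_name: str,
    task_name: str = "text-generation",
    post_process: Optional[Callable[[str], Any]] = None,
    add_raw_output: bool = True,
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in (not distilabel code) for how a task enriches rows."""
    rows = []
    for row, raw in zip(inputs, raw_outputs):
        try:
            # No post-processing means the generation is the raw output itself.
            generation = post_process(raw) if post_process else raw
        except Exception:
            # Post-processing failed, so there is no parsed value to return.
            generation = None
        enriched = {**row, "generation": generation, "model_name": model_name}
        if add_raw_output:
            # The untouched LLM output stays available for inspection.
            enriched["distilabel_metadata"] = {f"raw_output_{task_name}": raw}
        rows.append(enriched)
    return rows


inputs = [{"instruction": "Rate this answer from 1 to 5."}]
raw_outputs = ["I'd give it a four."]  # cannot be parsed by int()

# Default behaviour: the raw output is kept next to the failed generation.
with_raw = enrich_with_generation(inputs, raw_outputs, "gpt-4", post_process=int)

# Opting out: the metadata dictionary is not added at all.
without_raw = enrich_with_generation(
    inputs, raw_outputs, "gpt-4", post_process=int, add_raw_output=False
)
```

In this sketch the failed post-processing leaves `generation` empty in both calls, but only `with_raw` still carries the text the model actually produced, which is the situation the raw output is meant to help debug.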

## Defining custom Tasks

To define a custom task, we need to inherit from the [`Task`][distilabel.steps.tasks.Task] class, implement the `format_input` and `format_output` methods, and set the `inputs` and `outputs` properties, as with [`Step`][distilabel.steps.Step] subclasses.