Add StepResources docs

argilla-io · Jun 26, 2024 · 849a806
1 parent 5c685e9
Showing 5 changed files with 37 additions and 2 deletions.
diff --git a/docs/api/step/resources.md b/docs/api/step/resources.md
@@ -0,0 +1,3 @@
+# StepResources
+
+::: distilabel.steps.base.StepResources
diff --git a/docs/sections/how_to_guides/advanced/assigning_resources_to_step.md b/docs/sections/how_to_guides/advanced/assigning_resources_to_step.md
@@ -0,0 +1,30 @@
+# Assigning resources to a `Step`
+
+When dealing with complex pipelines that get executed in a distributed environment with abundant resources (CPUs and GPUs), sometimes it's necessary to allocate these resources judiciously among the `Step`s. This is why `distilabel` allows you to specify the number of `replicas`, `cpus` and `gpus` for each `Step`. Let's see that with an example:
+
+```python
+from distilabel.pipeline import Pipeline
+from distilabel.llms import vLLM
+from distilabel.steps import StepResources
+from distilabel.steps.tasks import PrometheusEval
+
+
+with Pipeline(name="resources") as pipeline:
+    ...
+
+    prometheus = PrometheusEval(
+        llm=vLLM(
+            model="prometheus-eval/prometheus-7b-v2.0",
+            chat_template="[INST] {{ messages[0]['content'] }}\\n{{ messages[1]['content'] }}[/INST]",
+        ),
+        resources=StepResources(replicas=2, cpus=1, gpus=1),
+        mode="absolute",
+        rubric="factual-validity",
+        reference=False,
+        num_generations=1,
+        group_generations=False,
+    )
+```
+
+In the example above, we're creating a `PrometheusEval` task (remember that `Task`s are `Step`s) that will use `vLLM` to serve the `prometheus-eval/prometheus-7b-v2.0` model. This task is resource intensive as it requires an LLM, which in turn requires a GPU to run fast. With that in mind, we have specified the `resources` required for the task using the [`StepResources`][distilabel.steps.base.StepResources] class, and we have defined that we need `1` GPU and `1` CPU per replica of the task. In addition, we have defined that we need `2` replicas, i.e. we will run two instances of the task so that the computation for the whole dataset runs faster. When running the pipeline, `distilabel` will create the tasks on nodes that have the specified resources available.
+
diff --git a/docs/sections/how_to_guides/advanced/structured_generation.md b/docs/sections/how_to_guides/advanced/structured_generation.md
@@ -111,7 +111,7 @@ These were some simple examples, but one can see the options this opens.
 
 !!! Tip
     A full pipeline example can be seen in the following script:
-    [`examples/structured_generation_with_outlines.py`](../../pipeline_samples/examples/#llama-cpp-with-outlines)
+    [`examples/structured_generation_with_outlines.py`](../../pipeline_samples/examples/index.md#llamacpp-with-outlines)
 
 [^1]:
     You can check the variable type by importing it from:

diff --git a/docs/sections/pipeline_samples/examples/index.md b/docs/sections/pipeline_samples/examples/index.md
@@ -2,7 +2,7 @@
 
 This section contains different example pipelines that showcase different tasks, maybe you can take inspiration from them.
 
-### [llama.cpp with `outlines`](#llama-cpp-with-outlines)
+### [llama.cpp with `outlines`](#llamacpp-with-outlines)
 
 Generate RPG characters following a `pydantic.BaseModel` with `outlines` in `distilabel`.
 

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -160,6 +160,7 @@ nav:
           - Using a file system to pass data of batches between steps: "sections/how_to_guides/advanced/fs_to_pass_data.md"
           - Using CLI to explore and re-run existing Pipelines: "sections/how_to_guides/advanced/cli/index.md"
           - Cache and recover pipeline executions: "sections/how_to_guides/advanced/caching.md"
+          - Assigning resources to a step: "sections/how_to_guides/advanced/assigning_resources_to_step.md"
           - Structured data generation: "sections/how_to_guides/advanced/structured_generation.md"
           - Serving an LLM for sharing it between several tasks: "sections/how_to_guides/advanced/serving_an_llm_for_reuse.md"
   - Pipeline Samples:
@@ -176,6 +177,7 @@ nav:
           - GeneratorStep: "api/step/generator_step.md"
           - GlobalStep: "api/step/global_step.md"
           - "@step": "api/step/decorator.md"
+          - StepResources: "api/step/resources.md"
           - Step Gallery:
               - Argilla: "api/step_gallery/argilla.md"
               - Hugging Face: "api/step_gallery/hugging_face.md"