Skip to content

Commit

Permalink
distilabel v1.2.0
Browse files Browse the repository at this point in the history
  • Loading branch information
gabrielmbmb authored Jun 18, 2024
2 parents f9057f0 + 63ee8c5 commit 3910aca
Show file tree
Hide file tree
Showing 224 changed files with 18,861 additions and 5,809 deletions.
42 changes: 42 additions & 0 deletions .github/workflows/codspeed.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
name: Benchmarks

on:
push:
branches:
- "main"
pull_request:

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true

jobs:
benchmarks:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: "3.12"
# Looks like it's not working very well for other people:
# https://github.com/actions/setup-python/issues/436
# cache: "pip"
# cache-dependency-path: pyproject.toml

- uses: actions/cache@v3
id: cache
with:
path: ${{ env.pythonLocation }}
key: ${{ runner.os }}-python-${{ env.pythonLocation }}-${{ hashFiles('pyproject.toml') }}-benchmarks-v00

- name: Install dependencies
if: steps.cache.outputs.cache-hit != 'true'
run: ./scripts/install_dependencies.sh

- name: Run benchmarks
uses: CodSpeedHQ/action@v2
with:
token: ${{ secrets.CODSPEED_TOKEN }}
run: pytest tests/ --codspeed
4 changes: 4 additions & 0 deletions .github/workflows/docs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,10 @@ jobs:
- run: mike deploy dev --push
if: github.ref == 'refs/heads/develop'
env:
GH_ACCESS_TOKEN: ${{ secrets.GH_ACCESS_TOKEN }}

- run: mike deploy ${{ github.ref_name }} latest --update-aliases --push
if: startsWith(github.ref, 'refs/tags/')
env:
GH_ACCESS_TOKEN: ${{ secrets.GH_ACCESS_TOKEN }}
18 changes: 8 additions & 10 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,12 @@ on:
types:
- opened
- synchronize
workflow_dispatch:
inputs:
tmate_session:
description: Starts the workflow with tmate enabled.
required: false
default: "false"

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
Expand All @@ -19,7 +25,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11"]
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
fail-fast: false

steps:
Expand All @@ -42,14 +48,7 @@ jobs:

- name: Install dependencies
if: steps.cache.outputs.cache-hit != 'true'
run: |
python_version=$(python -c "import sys; print(sys.version_info[:2])")
pip install -e .[dev,tests,anthropic,argilla,cohere,groq,hf-inference-endpoints,hf-transformers,litellm,llama-cpp,ollama,openai,outlines,vertexai,vllm]
if [ "${python_version}" != "(3, 8)" ]; then
pip install -e .[mistralai]
fi;
pip install git+https://github.com/argilla-io/LLM-Blender.git
run: ./scripts/install_dependencies.sh

- name: Lint
run: make lint
Expand All @@ -59,4 +58,3 @@ jobs:

- name: Integration Tests
run: make integration-tests
timeout-minutes: 5
5 changes: 2 additions & 3 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -11,11 +11,10 @@ repos:
- --fuzzy-match-generates-todo

- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.1.4
rev: v0.4.5
hooks:
- id: ruff
args:
- --fix
args: [--fix]
- id: ruff-format

ci:
Expand Down
4 changes: 2 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,12 @@ sources = src/distilabel tests

.PHONY: format
format:
ruff --fix $(sources)
ruff check --fix $(sources)
ruff format $(sources)

.PHONY: lint
lint:
ruff $(sources)
ruff check $(sources)
ruff format --check $(sources)

.PHONY: unit-tests
Expand Down
18 changes: 15 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ Compute is expensive and output quality is important. We help you **focus on dat

Synthesize and judge data with **latest research papers** while ensuring **flexibility, scalability and fault tolerance**. So you can focus on improving your data and training your models.

## 🏘️ Community
## Community

We are an open-source community-driven project and we love to hear from you. Here are some ways to get involved:

Expand All @@ -68,7 +68,7 @@ Distilabel is a tool that can be used to **synthesize data and provide AI feedba
- Our [distilabeled Intel Orca DPO dataset](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) and the [improved OpenHermes model](https://huggingface.co/argilla/distilabeled-OpenHermes-2.5-Mistral-7B),, show how we **improve model performance by filtering out 50%** of the original dataset through **AI feedback**.
- The [haiku DPO data](https://github.com/davanstrien/haiku-dpo) outlines how anyone can create a **dataset for a specific task** and **the latest research papers** to improve the quality of the dataset.

## 👨🏽‍💻 Installation
## Installation

```sh
pip install distilabel --upgrade
Expand Down Expand Up @@ -116,7 +116,7 @@ with Pipeline(

generate_with_openai = TextGeneration(llm=OpenAILLM(model="gpt-3.5-turbo"))

load_dataset.connect(generate_with_openai)
load_dataset >> generate_with_openai

if __name__ == "__main__":
distiset = pipeline.run(
Expand Down Expand Up @@ -153,3 +153,15 @@ If you build something cool with `distilabel` consider adding one of these badge

To directly contribute with `distilabel`, check our [good first issues](https://github.com/argilla-io/distilabel/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) or [open a new one](https://github.com/argilla-io/distilabel/issues/new/choose).

## Citation

```bibtex
@misc{distilabel-argilla-2024,
author = {Álvaro Bartolomé Del Canto and Gabriel Martín Blázquez and Agustín Piqueres Lajarín and Daniel Vila Suero},
title = {Distilabel: An AI Feedback (AIF) framework for building datasets with and for LLMs},
year = {2024},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/argilla-io/distilabel}}
}
```
2 changes: 1 addition & 1 deletion docs/api/cli.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Command Line Interface (CLI)

This section contains the API reference for the CLI. For more information on how to use the CLI, see [Tutorial - CLI](../sections/learn/tutorial/cli/index.md).
This section contains the API reference for the CLI. For more information on how to use the CLI, see [Tutorial - CLI](../sections/how_to_guides/advanced/cli/index.md).

## Utility functions for the `distilabel pipeline` sub-commands

Expand Down
6 changes: 6 additions & 0 deletions docs/api/distiset.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# Distiset

This section contains the API reference for the Distiset. For more information on how to use the CLI, see [Tutorial - CLI](../sections/how_to_guides/advanced/distiset.md).

:::distilabel.distiset.Distiset
:::distilabel.distiset.create_distiset
3 changes: 3 additions & 0 deletions docs/api/llm/cohere.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# CohereLLM

::: distilabel.llms.cohere
2 changes: 1 addition & 1 deletion docs/api/llm/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,6 @@

This section contains the API reference for the `distilabel` LLMs, both for the [`LLM`][distilabel.llms.LLM] synchronous implementation, and for the [`AsyncLLM`][distilabel.llms.AsyncLLM] asynchronous one.

For more information and examples on how to use existing LLMs or create custom ones, please refer to [Tutorial - LLM](../../sections/learn/tutorial/llm/index.md).
For more information and examples on how to use existing LLMs or create custom ones, please refer to [Tutorial - LLM](../../sections/how_to_guides/basic/llm/index.md).

::: distilabel.llms.base
2 changes: 1 addition & 1 deletion docs/api/pipeline/index.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Pipeline

This section contains the API reference for the `distilabel` pipelines. For an example on how to use the pipelines, see the [Tutorial - Pipeline](../../sections/learn/tutorial/pipeline/index.md).
This section contains the API reference for the `distilabel` pipelines. For an example on how to use the pipelines, see the [Tutorial - Pipeline](../../sections/how_to_guides/basic/pipeline/index.md).

::: distilabel.pipeline.base
::: distilabel.pipeline.local
2 changes: 1 addition & 1 deletion docs/api/step/decorator.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,6 @@

This section contains the reference for the `@step` decorator, used to create new [`Step`][distilabel.steps.Step] subclasses without having to manually define the class.

For more information check the [Tutorial - Step](../../sections/learn/tutorial/step/index.md) page.
For more information check the [Tutorial - Step](../../sections/how_to_guides/basic/step/index.md) page.

::: distilabel.steps.decorator
2 changes: 1 addition & 1 deletion docs/api/step/generator_step.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,6 @@

This section contains the API reference for the [`GeneratorStep`][distilabel.steps.base.GeneratorStep] class.

For more information and examples on how to use existing generator steps or create custom ones, please refer to [Tutorial - Step - GeneratorStep](../../sections/learn/tutorial/step/generator_step.md).
For more information and examples on how to use existing generator steps or create custom ones, please refer to [Tutorial - Step - GeneratorStep](../../sections/how_to_guides/basic/step/generator_step.md).

::: distilabel.steps.base.GeneratorStep
2 changes: 1 addition & 1 deletion docs/api/step/global_step.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,6 @@

This section contains the API reference for the [`GlobalStep`][distilabel.steps.base.GlobalStep] class.

For more information and examples on how to use existing global steps or create custom ones, please refer to [Tutorial - Step - GlobalStep](../../sections/learn/tutorial/step/global_step.md).
For more information and examples on how to use existing global steps or create custom ones, please refer to [Tutorial - Step - GlobalStep](../../sections/how_to_guides/basic/step/global_step.md).

::: distilabel.steps.base.GlobalStep
2 changes: 1 addition & 1 deletion docs/api/step/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

This section contains the API reference for the `distilabel` step, both for the [`_Step`][distilabel.steps.base._Step] base class and the [`Step`][distilabel.steps.Step] class.

For more information and examples on how to use existing steps or create custom ones, please refer to [Tutorial - Step](../../sections/learn/tutorial/step/index.md).
For more information and examples on how to use existing steps or create custom ones, please refer to [Tutorial - Step](../../sections/how_to_guides/basic/step/index.md).

::: distilabel.steps.base
options:
Expand Down
2 changes: 1 addition & 1 deletion docs/api/step_gallery/columns.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Columns

This section contains the existing steps intended to be used for commong column operations to apply to the batches.
This section contains the existing steps intended to be used for common column operations to apply to the batches.

::: distilabel.steps.combine
::: distilabel.steps.expand
Expand Down
1 change: 1 addition & 0 deletions docs/api/step_gallery/extra.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
# Extra

::: distilabel.steps.generators.data
::: distilabel.steps.deita
::: distilabel.steps.formatting
::: distilabel.steps.typing
7 changes: 7 additions & 0 deletions docs/api/step_gallery/hugging_face.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# Hugging Face

This section contains the existing steps integrated with `Hugging Face` so as to easily push the generated datasets to Hugging Face.

::: distilabel.steps.LoadDataFromDisk
::: distilabel.steps.LoadDataFromFileSystem
::: distilabel.steps.LoadDataFromHub
2 changes: 1 addition & 1 deletion docs/api/task/generator_task.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,6 @@

This section contains the API reference for the `distilabel` generator tasks.

For more information on how the [`GeneratorTask`][distilabel.steps.tasks.GeneratorTask] works and see some examples, check the [Tutorial - Task - GeneratorTask](../../sections/learn/tutorial/task/generator_task.md) page.
For more information on how the [`GeneratorTask`][distilabel.steps.tasks.GeneratorTask] works and see some examples, check the [Tutorial - Task - GeneratorTask](../../sections/how_to_guides/basic/task/generator_task.md) page.

::: distilabel.steps.tasks.base.GeneratorTask
2 changes: 1 addition & 1 deletion docs/api/task/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

This section contains the API reference for the `distilabel` tasks.

For more information on how the [`Task`][distilabel.steps.tasks.Task] works and see some examples, check the [Tutorial - Task](../../sections/learn/tutorial/task/index.md) page.
For more information on how the [`Task`][distilabel.steps.tasks.Task] works and see some examples, check the [Tutorial - Task](../../sections/how_to_guides/basic/task/index.md) page.

::: distilabel.steps.tasks.base
options:
Expand Down
3 changes: 3 additions & 0 deletions docs/api/task/typing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# Task Typing

::: distilabel.steps.tasks.typing
1 change: 1 addition & 0 deletions docs/api/task_gallery/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,4 @@ This section contains the existing [`Task`][distilabel.steps.tasks.Task] subclas
- "!_Task"
- "!GeneratorTask"
- "!ChatType"
- "!typing"
Binary file modified docs/assets/distilabel-badge-light.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading

0 comments on commit 3910aca

Please sign in to comment.