1.0.0
What's Changed
- Add `Step` abstract class and new `Pipeline` by @gabrielmbmb in #338
- Add runtime parameters validation by @gabrielmbmb in #345
- Pipeline local execution by @gabrielmbmb in #346
- Add `Task` (minimal implementation) by @alvarobartt in #347
- Refactor `_BatchManager` to have list of batches per step by @gabrielmbmb in #353
- Refactor getting parameters from `Step.process` method by @gabrielmbmb in #355
- Add `LLM`, `OpenAILLM`, `TransformersLLM`, and `LlamaCppLLM` by @alvarobartt in #354
- Fix `Task` and `TextGeneration` by @alvarobartt in #356
- Add `combine_dicts` function and `CombineColumns` class by @alvarobartt in #358
- Add `PushToHub` step and fix `typing` by @alvarobartt in #357
- Add serialization for the new components by @plaguss in #349
- Fix `OpenAILLM.api_key` due to `SecretStr` and `StepInput` wrong imports by @alvarobartt in #359
- Add `GlobalStep`, fix `_BatchManager`, and add `logging` by @alvarobartt in #362
- Migrate vllm to the new API by @plaguss in #361
- Update `_BatchManager` to work with `GlobalStep`s and `input_batch_size` per step by @gabrielmbmb in #366
- Clean up outdated / unused files by @alvarobartt in #369
- Add `input_mappings` and `output_mappings` attributes by @gabrielmbmb in #367
- Move batching from `Task` to `LLM`, fix `vLLM.generate` and add `DISTILABEL_LOG_LEVEL` by @alvarobartt in #371
- Improve runtime parameter definition by @gabrielmbmb in #372
- Add `AsyncOpenAI` and update `OpenAILLM` accordingly by @alvarobartt in #381
- Update serde by @gabrielmbmb in #382
- Add `MistralLLM` and add `generation_kwargs` as `RuntimeParameters` by @alvarobartt in #383
- Move `steps` out of `pipeline` by @gabrielmbmb in #384
- Add tests and docstring for `Task` and subclasses by @alvarobartt in #385
- Add `step` decorator by @gabrielmbmb in #387
- Add `input` propagation through `Task.process` by @alvarobartt in #399
- Improve `Pipeline` error handling by @gabrielmbmb in #400
- Fix `combine_dicts` and `StepInput` import in `PushToHub` by @alvarobartt in #401
- Improve `GlobalStep` error handling by @gabrielmbmb in #402
- Changed " by italics in EvolInstruct tutorial where one "" was missing by @ignacioct in #398
- Add `get_last_hidden_states` method and update `TransformersLLM` by @gabrielmbmb in #414
- docs: correct small typos in tutorial by @sdiazlor in #419
- docs: readme positioning by @davidberenstein1957 in #386
- Add `num_generations` and `group_generations` parameters to `Task` by @gabrielmbmb in #416
- Add `Argilla` and `PromptCompletionToArgilla` by @alvarobartt in #420
- Add `EvolInstruct` and `EvolInstructGenerator` tasks by @alvarobartt in #407
- Wrap optional `LLM` dependencies under `load` by @alvarobartt in #428
- Add `ComplexityScorer` task by @gabrielmbmb in #421
- Implement caching mechanism for the pipelines by @plaguss in #370
- Add method to Pipeline to handle keyboard interruptions via ctrl+c by @plaguss in #406
- Add `GenerateEmbeddings` task by @gabrielmbmb in #427
- Add `api_key` within `LLM.load` and add `llm_kwargs` as `RuntimeParameter` by @alvarobartt in #432
- Add `GeneratorStep.process` validation in `DAG` and smaller fixes by @alvarobartt in #435
- Add `EvolComplexity` task by @davidberenstein1957 in #415
- Add `QualityScorer` Task by @ignacioct in #425
- Add `CudaDevicePlacementMixin` class by @gabrielmbmb in #436
- Return `distiset` from `Pipeline.run` by @plaguss in #417
- Update README.md by @strickvl in #451
- Add `InferenceEndpointsLLM` by @alvarobartt in #439
- Fix `Distiset` after `PushToHub` and smaller fixes by @alvarobartt in #452
- Fix `Step.process_applying_mappings` by @alvarobartt in #453
- Add `AnyscaleLLM` by @davidberenstein1957 in #447
- Add general function to obtain schema for parquet writer by @plaguss in #454
- Add `TogetherLLM` by @davidberenstein1957 in #449
- Fix `LLM` subclasses based on `OpenAILLM` by @alvarobartt in #455
- Improve batching and caching by @gabrielmbmb in #457
- Add `EvolQuality` task by @davidberenstein1957 in #429
- Add `VertexAILLM` by @davidberenstein1957 in #445
- Add `use_cache` to `BasePipeline` by @plaguss in #463
- Add `AnthropicLLM` by @sdiazlor in #444
- Add `multiprocess` dependency by @gabrielmbmb in #467
- Add `UltraFeedback` by @alvarobartt in #464
- Add `OllamaLLM` by @davidberenstein1957 in #405
- Add `RuntimeParametersMixin` and `LLM` runtime parameters by @gabrielmbmb in #466
- Add `LiteLLM` by @davidberenstein1957 in #441
- Add CLI by @gabrielmbmb in #471
- Set `_batch_manager` to `None` after run by @gabrielmbmb in #473
- Add create_distiset function by @plaguss in #480
- Add `overload` to `step` decorator by @gabrielmbmb in #474
- Move Enum to Dict[str, str] to avoid serialization errors during caching by @plaguss in #482
- Include a dataset card and the `pipeline.yaml` on `Distiset.push_to_hub` by @plaguss in #479
- Add `PairRM` task for ranking responses by @plaguss in #450
- Update `_WriteBuffer` to write several parquet files by @gabrielmbmb in #483
- Extend `Argilla` integration `TextGeneration`, `Preference`, and more by @alvarobartt in #472
- Add `DeitaFiltering` step by @gabrielmbmb in #481
- Add `InstructionBacktranslation` by @alvarobartt in #486
- Fix huggingface_hub TextGenerationError import by @Wauplin in #485
- Improve azure openai support by @BramVanroy in #461
- Add `SelfInstruct` task by @ignacioct in #456
- Use `QueueHandler` for `Pipeline` logging by @gabrielmbmb in #489
- Improve `_stop` and `logging` by @gabrielmbmb in #491
- Fix creating empty `Dataset` in `create_distiset` function by @gabrielmbmb in #492
- Add imports from `__init__` modules by @gabrielmbmb in #493
- `batch_size` and `input_batch_size` runtime parameters by @gabrielmbmb in #495
- Update serialization method of _BatchManager to write each step on its own file by @plaguss in #496
- Fix `asyncio` in `AsyncLLM` to use the running event loop if any by @alvarobartt in #501
- Added authentication header to allow private/gated dataset use by @bjoernpl in #498
- Fix generator yielding batches all at once if `batch_size` == `input_batch_size` by @gabrielmbmb in #510
- Run output queue loop in thread and improve stop by @gabrielmbmb in #511
- Update `docs` for `distilabel` v1.0 with `mkdocs-material` by @plaguss in #476
- Add `CohereLLM` by @gabrielmbmb in #508
- `distilabel` v1.0 by @alvarobartt in #352
- Remove draft comment by @plaguss in #515
- Fix `docs/sections/papers/*.md` and add example in `docs/index.md` by @alvarobartt in #516
- Small fixes for the docs (images and nav bar) by @gabrielmbmb in #519
- Fix CTRL + C when still loading steps by @gabrielmbmb in #521
- Empty input queues when `CTRL + C` by @gabrielmbmb in #528
- Add `filelock` and `flash-attn` to `vllm` extra by @alvarobartt in #529
- Fix error in README.md when pushing the custom dataset card by @plaguss in #530
- Fix pipeline stuck when empty batches by @gabrielmbmb in #531
- Add `EvolQuality` to `tasks.__init__.py` by @davidberenstein1957 in #525
- Show information about subprocess exception by @gabrielmbmb in #532
- Update `TextGeneration.format_input` method to allow OpenAI format by @gabrielmbmb in #533
- Improve create_distiset by @plaguss in #534
- Fixes regarding `RuntimeParameter`s and `pydantic` model attributes by @gabrielmbmb in #535
- Fix parsing `LLM` generation kwargs by @gabrielmbmb in #537
- pass on Distiset's kwargs to Dataset.push_to_hub() by @rasdani in #522
- Set `config="default"` in `Distiset` when only one leaf `Step` by @alvarobartt in #540
- docs: update documentation for huggingface inference endpoints. by @burtenshaw in #539
- Remove `flash-attn` from `vllm` extra by @alvarobartt in #542
- Docs fix argilla imports by @burtenshaw in #541
- Fix not all exceptions being able to be pickled by @gabrielmbmb in #543
- Update CLI example by @gabrielmbmb in #544
- Check that `Step.name` doesn't contain dots or spaces by @gabrielmbmb in #545
New Contributors
- @strickvl made their first contribution in #451
- @Wauplin made their first contribution in #485
- @BramVanroy made their first contribution in #461
- @bjoernpl made their first contribution in #498
- @rasdani made their first contribution in #522
Full Changelog: 0.6.0...1.0.0