Release 1.0.0 · argilla-io/distilabel

What's Changed

Add Step abstract class and new Pipeline by @gabrielmbmb in #338
Add runtime parameters validation by @gabrielmbmb in #345
Pipeline local execution by @gabrielmbmb in #346
Add Task (minimal implementation) by @alvarobartt in #347
Refactor _BatchManager to have list of batches per step by @gabrielmbmb in #353
Refactor getting parameters from Step.process method by @gabrielmbmb in #355
Add LLM, OpenAILLM, TransformersLLM, and LlamaCppLLM by @alvarobartt in #354
Fix Task and TextGeneration by @alvarobartt in #356
Add combine_dicts function and CombineColumns class by @alvarobartt in #358
Add PushToHub step and fix typing by @alvarobartt in #357
Add serialization for the new components by @plaguss in #349
Fix OpenAILLM.api_key due to SecretStr and StepInput wrong imports by @alvarobartt in #359
Add GlobalStep, fix _BatchManager, and add logging by @alvarobartt in #362
Migrate vllm to the new API by @plaguss in #361
Update _BatchManager to work with GlobalSteps and input_batch_size per step by @gabrielmbmb in #366
Clean up outdated / unused files by @alvarobartt in #369
Add input_mappings and output_mappings attributes by @gabrielmbmb in #367
Move batching from Task to LLM, fix vLLM.generate and add DISTILABEL_LOG_LEVEL by @alvarobartt in #371
Improve runtime parameter definition by @gabrielmbmb in #372
Add AsyncOpenAI and update OpenAILLM accordingly by @alvarobartt in #381
Update serde by @gabrielmbmb in #382
Add MistralLLM and add generation_kwargs as RuntimeParameters by @alvarobartt in #383
Move steps out of pipeline by @gabrielmbmb in #384
Add tests and docstring for Task and subclasses by @alvarobartt in #385
Add step decorator by @gabrielmbmb in #387
Add input propagation through Task.process by @alvarobartt in #399
Improve Pipeline error handling by @gabrielmbmb in #400
Fix combine_dicts and StepInput import in PushToHub by @alvarobartt in #401
Improve GlobalStep error handling by @gabrielmbmb in #402
Replace quotation marks with italics in EvolInstruct tutorial where one quotation mark was missing by @ignacioct in #398
Add get_last_hidden_states method and update TransformersLLM by @gabrielmbmb in #414
docs: correct small typos in tutorial by @sdiazlor in #419
docs: readme positioning by @davidberenstein1957 in #386
Add num_generations and group_generations parameters to Task by @gabrielmbmb in #416
Add Argilla and PromptCompletionToArgilla by @alvarobartt in #420
Add EvolInstruct and EvolInstructGenerator tasks by @alvarobartt in #407
Wrap optional LLM dependencies under load by @alvarobartt in #428
Add ComplexityScorer task by @gabrielmbmb in #421
Implement caching mechanism for the pipelines by @plaguss in #370
Add method to Pipeline to handle keyboard interruptions via ctrl+c by @plaguss in #406
Add GenerateEmbeddings task by @gabrielmbmb in #427
Add api_key within LLM.load and add llm_kwargs as RuntimeParameter by @alvarobartt in #432
Add GeneratorStep.process validation in DAG and smaller fixes by @alvarobartt in #435
Add EvolComplexity task by @davidberenstein1957 in #415
Add QualityScorer Task by @ignacioct in #425
Add CudaDevicePlacementMixin class by @gabrielmbmb in #436
Return distiset from Pipeline.run by @plaguss in #417
Update README.md by @strickvl in #451
Add InferenceEndpointsLLM by @alvarobartt in #439
Fix Distiset after PushToHub and smaller fixes by @alvarobartt in #452
Fix Step.process_applying_mappings by @alvarobartt in #453
Add AnyscaleLLM by @davidberenstein1957 in #447
Add general function to obtain schema for parquet writer by @plaguss in #454
Add TogetherLLM by @davidberenstein1957 in #449
Fix LLM subclasses based on OpenAILLM by @alvarobartt in #455
Improve batching and caching by @gabrielmbmb in #457
Add EvolQuality task by @davidberenstein1957 in #429
Add VertexAILLM by @davidberenstein1957 in #445
Add use_cache to BasePipeline by @plaguss in #463
Add AnthropicLLM by @sdiazlor in #444
Add multiprocess dependency by @gabrielmbmb in #467
Add UltraFeedback by @alvarobartt in #464
Add OllamaLLM by @davidberenstein1957 in #405
Add RuntimeParametersMixin and LLM runtime parameters by @gabrielmbmb in #466
Add LiteLLM by @davidberenstein1957 in #441
Add CLI by @gabrielmbmb in #471
Set _batch_manager to None after run by @gabrielmbmb in #473
Add create_distiset function by @plaguss in #480
Add overload to step decorator by @gabrielmbmb in #474
Move Enum to Dict[str, str] to avoid serialization errors during caching by @plaguss in #482
Include a dataset card and the pipeline.yaml on Distiset.push_to_hub by @plaguss in #479
Add PairRM task for ranking responses by @plaguss in #450
Update _WriteBuffer to write several parquet files by @gabrielmbmb in #483
Extend Argilla integration TextGeneration, Preference, and more by @alvarobartt in #472
Add DeitaFiltering step by @gabrielmbmb in #481
Add InstructionBacktranslation by @alvarobartt in #486
Fix huggingface_hub TextGenerationError import by @Wauplin in #485
Improve azure openai support by @BramVanroy in #461
Add SelfInstruct task by @ignacioct in #456
Use QueueHandler for Pipeline logging by @gabrielmbmb in #489
Improve _stop and logging by @gabrielmbmb in #491
Fix creating empty Dataset in create_distiset function by @gabrielmbmb in #492
Add imports from __init__ modules by @gabrielmbmb in #493
batch_size and input_batch_size runtime parameters by @gabrielmbmb in #495
Update serialization method of _BatchManager to write each step on its own file by @plaguss in #496
Fix asyncio in AsyncLLM to use the running event loop if any by @alvarobartt in #501
Added authentication header to allow private/gated dataset use by @bjoernpl in #498
Fix generator yielding batches all at once if batch_size == input_batch_size by @gabrielmbmb in #510
Run output queue loop in thread and improve stop by @gabrielmbmb in #511
Update docs for distilabel v1.0 with mkdocs-material by @plaguss in #476
Add CohereLLM by @gabrielmbmb in #508
distilabel v1.0 by @alvarobartt in #352
Remove draft comment by @plaguss in #515
Fix docs/sections/papers/*.md and add example in docs/index.md by @alvarobartt in #516
Small fixes for the docs (images and nav bar) by @gabrielmbmb in #519
Fix CTRL + C when still loading steps by @gabrielmbmb in #521
Empty input queues when CTRL + C by @gabrielmbmb in #528
Add filelock and flash-attn to vllm extra by @alvarobartt in #529
Fix error in README.md when pushing the custom dataset card by @plaguss in #530
Fix pipeline stuck when empty batches by @gabrielmbmb in #531
Add EvolQuality to tasks.__init__.py by @davidberenstein1957 in #525
Show information about subprocess exception by @gabrielmbmb in #532
Update TextGeneration.format_input method to allow OpenAI format by @gabrielmbmb in #533
Improve create_distiset by @plaguss in #534
Fixes regarding RuntimeParameters and pydantic model attributes by @gabrielmbmb in #535
Fix parsing LLM generation kwargs by @gabrielmbmb in #537
pass on Distiset's kwargs to Dataset.push_to_hub() by @rasdani in #522
Set config="default" in Distiset when only one leaf Step by @alvarobartt in #540
docs: update documentation for huggingface inference endpoints. by @burtenshaw in #539
Remove flash-attn from vllm extra by @alvarobartt in #542
Docs fix argilla imports by @burtenshaw in #541
Fix not all exceptions being able to be pickled by @gabrielmbmb in #543
Update CLI example by @gabrielmbmb in #544
Check that Step.name doesn't contain dots or spaces by @gabrielmbmb in #545

New Contributors

@strickvl made their first contribution in #451
@Wauplin made their first contribution in #485
@BramVanroy made their first contribution in #461
@bjoernpl made their first contribution in #498
@rasdani made their first contribution in #522

Full Changelog: 0.6.0...1.0.0
