Add GenerateSentencePair task #689
CodSpeed Performance Report: Merging #689 will not alter performance.
@@ -30,17 +30,15 @@
from distilabel.steps.tasks.prometheus_eval import PrometheusEval
from distilabel.steps.tasks.quality_scorer import QualityScorer
from distilabel.steps.tasks.self_instruct import SelfInstruct
from distilabel.steps.tasks.sentence_transformers import GenerateSentencePair
from distilabel.steps.tasks.structured_generation import StructuredGeneration
from distilabel.steps.tasks.text_generation import ChatGeneration, TextGeneration
from distilabel.steps.tasks.typing import ChatItem, ChatType
from distilabel.steps.tasks.ultrafeedback import UltraFeedback


__all__ = [
The import order here was alphabetical; what's the rationale behind this change? Maybe we should change this in other places too, so that we're aligned?
The rationale was to list the names in `__all__` in the order in which they were imported, which is more common than listing them alphabetically.
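If the import-order convention is adopted across modules, a small check could keep each `__all__` aligned with its imports. The following is a minimal sketch using only the standard-library `ast` module; the helpers `imported_names`, `all_names` and `follows_import_order` are hypothetical and not part of distilabel, and they assume `__all__` is a plain list of string literals.

```python
import ast
from typing import List

# Hypothetical helpers (not part of distilabel) that check whether a module's
# `__all__` lists its exports in the same order they were imported.


def imported_names(source: str) -> List[str]:
    """Return the names bound by `from x import a, b` statements, in source order."""
    names: List[str] = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ImportFrom):
            names.extend(alias.asname or alias.name for alias in node.names)
    return names


def all_names(source: str) -> List[str]:
    """Return the entries of a module-level `__all__ = [...]` (assumed string literals)."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign) and any(
            isinstance(target, ast.Name) and target.id == "__all__"
            for target in node.targets
        ):
            return [elt.value for elt in node.value.elts]
    return []


def follows_import_order(source: str) -> bool:
    """True if the imported names in `__all__` appear in the order they were imported."""
    position = {name: i for i, name in enumerate(imported_names(source))}
    exported = [name for name in all_names(source) if name in position]
    return exported == sorted(exported, key=position.__getitem__)


# A trimmed-down module in the style of the diff in this PR.
SAMPLE = '''
from distilabel.steps.tasks.self_instruct import SelfInstruct
from distilabel.steps.tasks.sentence_transformers import GenerateSentencePair
from distilabel.steps.tasks.text_generation import ChatGeneration, TextGeneration

__all__ = ["SelfInstruct", "GenerateSentencePair", "ChatGeneration", "TextGeneration"]
'''

print(follows_import_order(SAMPLE))  # True
```

Replacing the comparison with `exported == sorted(exported)` would instead enforce the alphabetical convention, so the same check works whichever ordering the team settles on.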
LGTM! Just read the comments above, and note that some docstrings are missing!
Co-authored-by: alvarobartt <[email protected]>
Description
This PR adds a new task called `GenerateSentencePair` that allows building datasets for training embedding models. The task generates a `positive` sentence based on the provided `anchor` sentence and, if the `triplet` attribute is `True`, a `negative` sentence as well. The task can be used to paraphrase the anchor, to generate semantically similar content with respect to the anchor, or to generate a query for the anchor.

In addition, this PR updates the `add_raw_output` attribute so that it is now a `RuntimeParameter` with `True` as its default value, meaning the raw outputs of the LLMs are stored in the final dataset by default.
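To make the anchor/positive/negative flow and the new `add_raw_output` default concrete, here is a minimal, framework-free sketch of the behaviour described above. It is not distilabel's implementation: the prompt wording, the `## Positive` / `## Negative` answer format, the action keys, the `raw_output` column name, and the `fake_llm` stand-in are all hypothetical choices made for illustration.

```python
import re
from typing import Any, Callable, Dict, List, Optional

# Hypothetical instruction wording for the three use cases described in the PR.
ACTIONS = {
    "paraphrase": "paraphrase",
    "semantically-similar": "be semantically similar to",
    "query": "be a query for",
}


def build_prompt(anchor: str, action: str, triplet: bool) -> str:
    """Ask for a positive sentence and, in triplet mode, a negative one too."""
    task = f"Write a positive sentence that should {ACTIONS[action]} the anchor sentence."
    if triplet:
        task += " Also write a negative sentence that should not."
    fmt = "## Positive\n<sentence>"
    if triplet:
        fmt += "\n## Negative\n<sentence>"
    return f"{task}\nAnswer using this format:\n{fmt}\n\nAnchor: {anchor}"


def parse_output(raw: Optional[str], triplet: bool) -> Dict[str, Optional[str]]:
    """Extract each '## <Section>' body; sections that are missing become None."""
    keys = ["positive", "negative"] if triplet else ["positive"]
    result: Dict[str, Optional[str]] = {key: None for key in keys}
    if raw is None:
        return result
    for key in keys:
        # Capture everything after the header up to the next header or the end.
        match = re.search(rf"## {key.capitalize()}\s*\n(.*?)(?=\n## |\Z)", raw, re.DOTALL)
        if match:
            result[key] = match.group(1).strip()
    return result


def generate_sentence_pair(
    rows: List[Dict[str, Any]],
    llm: Callable[[str], str],
    action: str = "paraphrase",
    triplet: bool = False,
    add_raw_output: bool = True,
) -> List[Dict[str, Any]]:
    """Add `positive` (and, if `triplet`, `negative`) columns to rows with an `anchor`."""
    outputs = []
    for row in rows:
        raw = llm(build_prompt(row["anchor"], action, triplet))
        new_row = {**row, **parse_output(raw, triplet)}
        if add_raw_output:
            # Keep the unparsed LLM text; on by default, mirroring the PR's new default.
            new_row["raw_output"] = raw  # hypothetical column name
        outputs.append(new_row)
    return outputs


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call that always returns a well-formatted answer."""
    return "## Positive\nThe cat sat on the rug.\n## Negative\nStock prices fell sharply."


rows = generate_sentence_pair(
    [{"anchor": "A cat was sitting on the mat."}], fake_llm, triplet=True
)
print(rows[0]["positive"])  # The cat sat on the rug.
print(rows[0]["negative"])  # Stock prices fell sharply.
```

Rows with `triplet=False` only gain a `positive` column, which suits (anchor, positive) pair losses; triplet mode adds the `negative` needed for triplet-style training. Passing `add_raw_output=False` drops the raw text, which is the opt-out the new default implies.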