notebook: add argilla dataset creation (and uploading everything) #6

Open
wants to merge 1 commit into
base: main

stefan-it
Member

Hi,

this notebook performs the following operations:

  • Use the translated dataset and enrich it with metadata (e.g. translation model, original ID and even sentence embeddings)
  • Create a Hugging Face Dataset and upload it to the hub
  • Create an Argilla dataset
  • Upload the created Argilla dataset to our Hugging Face Space demo
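The enrichment step in the list above can be sketched roughly as follows. This is only an illustration, not the notebook's actual code: the helper `enrich_record`, the field names (`instruction`, `input`, `output`), the metadata keys, and the model name are all assumptions made for the example.

```python
from typing import Any, Dict, Optional, Sequence


def enrich_record(
    record: Dict[str, Any],
    translation_model: str,
    original_id: int,
    embedding: Optional[Sequence[float]] = None,
) -> Dict[str, Any]:
    """Return a copy of `record` with a `metadata` dict attached (hypothetical helper)."""
    metadata: Dict[str, Any] = {
        "translation_model": translation_model,
        "original_id": original_id,
    }
    # Sentence embeddings are optional; store them as a plain list if given.
    if embedding is not None:
        metadata["sentence_embedding"] = list(embedding)
    return {**record, "metadata": metadata}


# Hypothetical translated records; the real column names may differ.
translated = [
    {"instruction": "Nenne drei Farben.", "input": "", "output": "Rot, Blau, Gelb."},
    {"instruction": "Was ist 2 + 2?", "input": "", "output": "4"},
]

enriched = [
    enrich_record(rec, translation_model="example-translation-model", original_id=i)
    for i, rec in enumerate(translated)
]
```

A list of dicts like `enriched` could then be turned into a Hugging Face dataset and, from there, into an Argilla dataset; those steps are left out here because they depend on the library versions the notebook uses.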

@PhilipMay
Member

Is it an Argilla requirement that the metadata column must contain dicts? What is the purpose of the metadata column and why don't we split the info in the dicts into multiple columns?

@PhilipMay
Member

I do not understand the column renaming. Why the underscore and why 2 renames that do not change anything?

@dvsrepo

dvsrepo commented Apr 3, 2023

> Is it an Argilla requirement that the metadata column must contain dicts? What is the purpose of the metadata column and why don't we split the info in the dicts into multiple columns?

The underscore on the instruction field is due to a current limitation in our field ordering. We need it so that the instruction field is shown at the top of the record. We plan to fix this soon. The other renames are indeed not needed; they were old code I provided to @stefan-it
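To illustrate why an underscore prefix can move a field to the top, here is a minimal sketch. It assumes the fields are displayed in plain lexicographic order of their names, which the comment above does not state explicitly; the helper `rename_for_display` and the field names are hypothetical.

```python
from typing import Any, Dict


def rename_for_display(record: Dict[str, Any], first_field: str = "instruction") -> Dict[str, Any]:
    """Prefix `first_field` with an underscore so it sorts before lowercase names."""
    renamed = dict(record)  # leave the caller's dict untouched
    renamed["_" + first_field] = renamed.pop(first_field)
    return renamed


record = {"instruction": "Was ist 2 + 2?", "input": "", "output": "4"}

# Assumption: the UI orders fields by sorting their names.
order_before = sorted(record)
order_after = sorted(rename_for_display(record))

print(order_before)  # ['input', 'instruction', 'output']
print(order_after)   # ['_instruction', 'input', 'output']
```

In ASCII, `_` sorts before all lowercase letters, so `_instruction` comes first, whereas the unprefixed `instruction` would land after `input`.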


3 participants