Data Import Error #6492

Open
richb-rv opened this issue Oct 9, 2024 · 3 comments

Comments

@richb-rv

richb-rv commented Oct 9, 2024

Describe the bug
We get an incorrect formatting error when attempting to import new data.

Validation error
Error at item 0: "llm.inputs.retrieved_context" key is expected in task data [assume: item["data"] = task root with values] :: {'data': {'observability.identifiers.user': [{'key': 'session_id', 'value': ''}], 'observability.identifiers.system': [{'key': 'correlation_id', 'value': ''}, {'key': 'trace_id', 'value': ''}, {'key': 'parent_span_id', 'value': ''}], 'observability.identifiers.llm': [{'key': 'interaction_id', 'value': ''}, {'key': 'runnable_sequence_id', 'value': ''}, {'key': 'runnable_sequence_step', 'value': ''}, {'key': 'runnable_id', 'value': ''}], 'llm.inputs.retrieved_context': [{'id': '1', 'title': '', 'body': ''}, {'id': '2', 'title': '', 'body': ''}], 'llm.outputs': [{'key': 'text_response', 'value': ''}]}, 'file_upload_id': 28}

I believe this error is saying that the key llm.inputs.retrieved_context, which my interface references, is missing from the uploaded data. However, the key is present.

If we import the data file first and then add the interface, it works fine. If the interface already exists when we import, we get the error above.

To Reproduce

Example Interface:

<View>
  <Style> .lsf-select { display: none; } </Style>
  <List name="retrieved-context" value="$llm.inputs.retrieved_context" title="Retrieved Context" />
  <header>LLM Outputs:</header>
  <Paragraphs name="llm-outputs" nameKey="key" textkey="value" value="$llm.outputs" layout="dialogue" />
  <Choices name="sentiment" toName="llm-outputs" choice="single" showInLine="true">
   <Choice value="ambiguous"/>
   <Choice value="factually accurate"/>
   <Choice value="factually inaccurate"/>
  </Choices>
</View>
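
A quick way to cross-check a config like the one above against a task is sketched below. This is not Label Studio's own validator; it only assumes that every `value="$..."` attribute names a key expected at the top level of the task data, which is what the error message suggests. The helper names `referenced_keys` and `missing_flat_keys` are hypothetical, and the config is trimmed to the two data-bound tags.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the labeling config from this issue (only the data-bound tags).
CONFIG = """<View>
  <List name="retrieved-context" value="$llm.inputs.retrieved_context" title="Retrieved Context" />
  <Paragraphs name="llm-outputs" nameKey="key" textkey="value" value="$llm.outputs" layout="dialogue" />
</View>"""


def referenced_keys(config_xml):
    """Return every name referenced as value="$name" in the config, in document order."""
    root = ET.fromstring(config_xml)
    keys = []
    for element in root.iter():
        value = element.get("value", "")
        if value.startswith("$"):
            keys.append(value[1:])  # drop the leading "$"
    return keys


def missing_flat_keys(config_xml, task_data):
    """List referenced names that are not literal top-level keys of task_data."""
    return [key for key in referenced_keys(config_xml) if key not in task_data]


# Hypothetical task mirroring the flattened keys used in the uploaded file.
task_data = {"llm.inputs.retrieved_context": [], "llm.outputs": []}
print(missing_flat_keys(CONFIG, task_data))
```

If this literal-key check reports nothing missing while the import still fails, the importer is presumably resolving the dotted name in some other way.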

Example data:
fa-test.json

Steps to reproduce the behavior:

  1. Create a new project
  2. Add Label Interface
  3. Try to Import the data file

Expected behavior
Data file is uploaded and rendered through the interface

Screenshots
With data input directly into the labeling interface configuration:
Screenshot 2024-10-09 at 9 35 12 AM

When data is uploaded prior to setting up the labeling interface:
Screenshot 2024-10-09 at 9 35 28 AM

When attempting to import data as a file after labeling interface is saved:
Screenshot 2024-10-09 at 9 35 46 AM

Environment (please complete the following information):

  • OS: Mac OS Sonoma 14.5
  • Label Studio Version [e.g. 1.13.1]

Additional context
The same example data works if input as data in the labeling interface preview
The same example data also renders correctly in the UI if you:

  1. Create a new project
  2. Upload the example data file FIRST
  3. Create the labeling interface
@AbubakarSaad
Collaborator

AbubakarSaad commented Oct 9, 2024

Hello Rich,

That's because of the way the data is structured. A key named llm.inputs.retrieved_context implies a nested structure similar to this:
"llm": {
  "inputs": {
    "retrieved_context": [...]
  }
},
But if you drop the llm.inputs prefix and name the key "retrieved_context", it works.
Screenshot 2024-10-09 at 2 46 07 PM
Screenshot 2024-10-09 at 2 46 21 PM
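
For an existing flattened file, the rename suggested above could be scripted roughly as follows. This is a sketch under assumptions: the helper name `strip_key_prefixes` is hypothetical, it keeps only the last dot-separated segment of each key, and the labeling config would have to be updated to match (for example `$retrieved_context` instead of `$llm.inputs.retrieved_context`).

```python
def strip_key_prefixes(task_data):
    """Rename keys like 'a.b.c' to 'c'; fail loudly if two keys would collide."""
    renamed = {}
    for key, value in task_data.items():
        short = key.rsplit(".", 1)[-1]  # keep only the last segment
        if short in renamed:
            raise ValueError(f"duplicate key after renaming: {short!r}")
        renamed[short] = value
    return renamed


# Hypothetical task using two of the flattened keys from this issue.
task_data = {
    "llm.inputs.retrieved_context": [{"id": "1", "title": "", "body": ""}],
    "llm.outputs": [{"key": "text_response", "value": ""}],
}
print(strip_key_prefixes(task_data))
```

The collision check matters because unrelated keys can share a last segment once their prefixes are dropped.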

@richb-rv
Author

richb-rv commented Oct 9, 2024

Hmm, okay, interesting. So I'm not able to target nested items using dot notation.
For instance, with your example:

"llm": {
"inputs": {
"retrieved_context": [...]
}
},

using dot notation like llm.inputs.retrieved_context does not actually target retrieved_context.
(This is the reason we flattened the data and created the key the way we did.)

However, I did realize that the . was causing the issue. It also seems that you can't use any special character as a separator in the key name, for example something like llm:inputs:retrieved_context.

Are both of those statements accurate?

@richb-rv
Author

richb-rv commented Oct 9, 2024

Hey @AbubakarSaad, I did some more digging here, and I think there are a couple of bugs. The main one:
It appears that I can nest data, but I can't do that for the example data when creating the labeling interface.
There seems to be some difference in how JSON is parsed between the labeling interface preview, the UI file import feature, and importing tasks via the API.

Thank you!
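
For comparing the three paths with the same content in both shapes, a flattened task can be converted to the nested form mechanically. The sketch below is an illustration only: `unflatten` is a hypothetical helper, and whether each path resolves `$llm.inputs.retrieved_context` against the nested shape is exactly what this thread leaves open.

```python
def unflatten(task_data, sep="."):
    """Turn {'a.b.c': v} into {'a': {'b': {'c': v}}}, rejecting conflicting keys."""
    nested = {}
    for dotted_key, value in task_data.items():
        parts = dotted_key.split(sep)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                # e.g. both "a" and "a.b" are present in the flat data
                raise ValueError(f"key conflict while nesting {dotted_key!r}")
        if parts[-1] in node:
            raise ValueError(f"key conflict while nesting {dotted_key!r}")
        node[parts[-1]] = value
    return nested


# Hypothetical flattened task using two of the keys from this issue.
flat = {
    "llm.inputs.retrieved_context": [{"id": "1", "title": "", "body": ""}],
    "llm.outputs": [{"key": "text_response", "value": ""}],
}
print(unflatten(flat))
```

Importing the flat and nested variants through the preview, the UI upload, and the API would show which path treats the dot as a separator and which treats it as part of the key.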
