
seems to validate non-HED columns #1031

Open
yarikoptic opened this issue Oct 11, 2024 · 4 comments


@yarikoptic
Contributor

I ran:

```
❯ hed-validator -o logs/hed-validator.log .
Using HEDTOOLS version: {'version': '0+untagged.2394.g2159297', 'full-revisionid': '215929781a603d0c097dca8c38246acffb313d09', 'dirty': False, 'error': None, 'date': '2024-10-11T13:35:47-0500'}
Number of issues: 1785282
hed-validator -o logs/hed-validator.log .  390.58s user 4.47s system 100% cpu 6:34.92 total
```

The full log is at http://www.oneukrainian.com/tmp/hed-validator-20241011-1.log.gz in case someone has a boring weekend. Many of the errors are of the form:

```
Errors in file 'sub-0001_ses-04_task-fractional_acq-mb8_run-01_events.tsv'
        Issues in row 2:
                Issues in column duration:
                        hed string: 11.0
                                CHARACTER_INVALID: Invalid character '.' in tag '11.0'  Problem spans string indexes: 2, 3
                                TAG_INVALID: '11.0' in 11.0 is not a valid base HED tag.  Problem spans string indexes: 0, 4
                Issues in column onset:
                        hed string: 12.0532553100586
                                CHARACTER_INVALID: Invalid character '.' in tag '12.0532553100586'  Problem spans string indexes: 2, 3
                                TAG_INVALID: '12.0532553100586' in 12.0532553100586 is not a valid base HED tag.  Problem spans string indexes: 0, 16
```

whenever we have a sidecar like this:

```
❯ head -n 20 task-fractional_events.json
{
  "onset": {
    "LongName": "Onset time of event",
    "Description": "Marks the start of an ongoing event of temporal extent.",
    "Units": "s",
    "HED": "Property/Data-property/Data-marker/Temporal-marker/Onset"
  },
  "duration": {
    "LongName": "The period of time during which an event occurs. Refers to Image duration or response time after stimulus depending on event_type",
    "Description": "a. For falsebelief and falsephoto trial types, duration refers to the image presentations of falsebelief and falsephoto stories. b. For rating_falsebelief and rating_falsephoto, duration refers to the response time to answer true false questions, followed by falsebelief or flasephoto stimulu",
    "Units": "s",
    "HED": "Property/Data-property/Data-value/Spatiotemporal-value/Temporal-value/Duration"
  },
...
```

Have we used HED incorrectly to provide semantics for those columns?

attn @jungheejung

@VisLab
Member

VisLab commented Oct 12, 2024

@yarikoptic @jungheejung -- sorry I didn't see this issue until just now -- if you continue to have issues, please re-post. Thx.

Several things.

  1. The onset column should not have HED in it at all. The onset is treated as a special column and not annotated.
  2. Duration requires a value. Annotate it as Duration/# in the sidecar, which gives one annotation that applies to the entire column. The # is replaced by the actual column value when the annotation is assembled. HED also requires that you state, in a parenthesized group, what the duration applies to.
  3. Please use short forms of tags (for example, Duration rather than Property/Data-property/Data-value/Spatiotemporal-value/Temporal-value/Duration).
  4. As a recommended strategy, it would be good to validate the sidecar (usually there is only one per dataset) using the online tools at https://hedtools.org/hed_dev/sidecar before trying to validate your dataset.
  5. The error above is for the tsv file, which you didn't include so I can't be sure that this will be the only error.
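Points 1 to 3 can be roughly pre-checked before running the real validator. The sketch below is a hypothetical helper, not part of hedtools: it assumes the sidecar has already been loaded as a dict, treats only `onset` as a special column, and uses a crude heuristic (two or more slashes in a tag) to guess at long-form tags. It is no substitute for the online sidecar validator or `hed-validator`.

```python
import re

# Hypothetical pre-check, NOT part of hedtools. It only flags the sidecar
# problems from points 1-3: HED on onset, value columns without '#',
# and tags that look like long-form paths.
SPECIAL_COLUMNS = {"onset"}  # assumption: only onset is checked as special


def split_tags(hed_string):
    """Crudely split a HED string into tags on commas and parentheses."""
    return [t.strip() for t in re.split(r"[(),]", hed_string) if t.strip()]


def precheck_sidecar(sidecar):
    problems = []
    for column, entry in sidecar.items():
        if not isinstance(entry, dict) or "HED" not in entry:
            continue  # column carries no HED annotation
        hed = entry["HED"]
        if column in SPECIAL_COLUMNS:
            problems.append(f"{column}: remove the HED annotation")
            continue
        if isinstance(hed, str):
            # A plain string annotates a value column, so it needs '#'.
            strings = [hed]
            if "#" not in hed:
                problems.append(f"{column}: value column HED lacks '#'")
        else:
            # Categorical column: dict mapping each level to a HED string.
            strings = list(hed.values())
        for s in strings:
            for tag in split_tags(s):
                # Heuristic only: two or more slashes suggests a long form.
                if tag.count("/") >= 2:
                    problems.append(f"{column}: '{tag}' looks like a long-form tag")
    return problems


original = {
    "onset": {"Units": "s",
              "HED": "Property/Data-property/Data-marker/Temporal-marker/Onset"},
    "duration": {"Units": "s",
                 "HED": "Property/Data-property/Data-value/Spatiotemporal-value/Temporal-value/Duration"},
}
for problem in precheck_sidecar(original):
    print(problem)
```

Run on the HED fields of the original sidecar, this reports the HED on `onset`, the missing `#` in `duration`, and the long-form `duration` tag; a sidecar with no problems returns an empty list.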

The corrected form:

```json
{
  "onset": {
    "LongName": "Onset time of event",
    "Description": "Marks the start of an ongoing event of temporal extent.",
    "Units": "s"
  },
  "duration": {
    "LongName": "The period of time during which an event occurs. Refers to Image duration or response time after stimulus depending on event_type",
    "Description": "a. For falsebelief and falsephoto trial types, duration refers to the image presentations of falsebelief and falsephoto stories. b. For rating_falsebelief and rating_falsephoto, duration refers to the response time to answer true false questions, followed by falsebelief or flasephoto stimulu",
    "Units": "s",
    "HED": "(Duration/#, (Label/Entire-event-time))"
  }
}
```
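To make the `#` substitution from point 2 concrete, here is a simplified sketch using the row 2 `duration` value (11.0) from the error log. It is an illustration only, not how hedtools assembles annotations: a real assembly also handles units, categorical columns, and other columns of the row. The requirement of exactly one `#` is an assumption of this sketch.

```python
# Simplified illustration, NOT hedtools' assembly code: the '#' placeholder
# in a value-column annotation is replaced by the value in that row.
def assemble_value_column(hed_template: str, value: str) -> str:
    # Assumption of this sketch: a value-column annotation has exactly one '#'.
    if hed_template.count("#") != 1:
        raise ValueError("value-column annotation should contain exactly one '#'")
    return hed_template.replace("#", value)


template = "(Duration/#, (Label/Entire-event-time))"
print(assemble_value_column(template, "11.0"))
```

With the corrected sidecar, row 2 would assemble to `(Duration/11.0, (Label/Entire-event-time))` instead of the bare string `11.0` that triggered the errors above.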

Note: I think we could do a more precise job of annotation using the curly brace notation. If you respond with the entire JSON file, I would be happy to suggest modifications.

@yarikoptic
Contributor Author

> Note: I think we could do a more precise job of annotation using the curly brace notation --- If you respond with the entire JSON file, I would be happy to suggest modifications.

FWIW, the full "git portion" of that dataset is now shared on GitHub (https://github.com/spatialtopology/ds005256). Hopefully it will become public on OpenNeuro soon.

@jungheejung

@VisLab Thank you so much for the point-by-point suggestions on the HED errors.
Also appreciate the resources, such as the json HED validator.

@VisLab
Member

VisLab commented Oct 28, 2024

@jungheejung The JSON files look much better. Are you still having issues with HED validation? If so, are the issues with the python validator or the bids-validator?

If you would like a Zoom review of your JSON annotations with a HED maintainer, email [email protected].
