[ENH] Add ContextURI to allow to define the context for the entity values #1939

Open: 1 commit to be merged into base branch master

yarikoptic (Collaborator)

It is often desirable to be able to state that the values used for entities in a dataset belong to some controlled vocabulary, or are simply defined centrally by some "id" authority. Examples include unique scanning session IDs assigned per scanning center, or subject_ids defined per study or centrally for the center.

This is of particular interest for large studies where multiple datasets are created, one per site or primary data modality, and later possibly composed into a single dataset or made parts of one larger multi-site dataset. In such cases it becomes quite important to annotate that particular entities (subject_id, session_id and possibly even _desc- or _acq- values) are defined in the scope of the specific larger study and thus refer to the "same" thing given the same ContextURI and value.

TODOs:

  • discuss how to centrally define context prefixes (see below)
  • add an example to the bids-examples datasets

Context Prefixes

In .jsonld etc. it is common to define shared JSON-LD contexts centrally; they can even be defined externally and referenced via the @context attribute. E.g. in https://dandiarchive.s3.amazonaws.com/dandisets/000003/draft/dandiset.jsonld we point to https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.0/context.json, whose @context declares "ORCID": "https://orcid.org/" and "spdx": "http://spdx.org/licenses/". Now if we specify "license": "spdx:apache-2.0", we know that the license "identity" is really http://spdx.org/licenses/apache-2.0 (the actual URL does not even have to exist).
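To make the prefix mechanism concrete, here is a minimal Python sketch of compact IRI expansion using the two mappings quoted above. The function name `expand_compact_iri` is made up for illustration, and the logic is a deliberate simplification: a real JSON-LD processor applies more rules than this (term definitions, `@vocab`, special-casing of values that are already absolute IRIs, and so on).

```python
# Prefix mappings as quoted above from the DANDI context file.
context = {
    "ORCID": "https://orcid.org/",
    "spdx": "http://spdx.org/licenses/",
}


def expand_compact_iri(value: str, context: dict[str, str]) -> str:
    """Expand "prefix:suffix" via the context.

    Hypothetical helper, not part of any library. Values whose prefix is
    not declared in the context are returned unchanged.
    """
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return value


print(expand_compact_iri("spdx:apache-2.0", context))
# → http://spdx.org/licenses/apache-2.0
```

Note that the expansion is purely textual: nothing checks that the resulting URL resolves, which matches the remark that it does not have to exist.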

So, I wonder if we could/should also define in dataset_description.json a Context: dict[str, str] which would provide similar mappings. Then I could

  • in dataset_description.json, have "Context": {"thelab": "http://thelab.example.com/term/"}
  • in participants.json, give participant_id "ContextURI": "thelab:subject", which for every participant_id would ultimately be expanded into http://thelab.example.com/term/subject/{participant_id} when mapping across datasets.
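The two bullets above could play out roughly as in the following Python sketch. It only illustrates the proposal: "Context" and "ContextURI" are the fields under discussion here, not part of the current BIDS specification, and the helper `entity_uri` is hypothetical. It is also an assumption that the value is appended as it appears in participants.tsv (with the "sub-" prefix) and that the sidecar entry sits under the column name.

```python
# Contents of the two JSON files, inlined as dicts to keep the sketch
# self-contained. Both keys are proposed fields, not existing BIDS ones.
dataset_description = {
    "Context": {"thelab": "http://thelab.example.com/term/"},
}
participants_sidecar = {
    "participant_id": {"ContextURI": "thelab:subject"},
}


def entity_uri(column: str, value: str, sidecar: dict, description: dict) -> str:
    """Build a dataset-independent URI for one entity value (hypothetical helper)."""
    context_uri = sidecar[column]["ContextURI"]
    prefix, sep, suffix = context_uri.partition(":")
    base = description.get("Context", {}).get(prefix)
    if sep and base is not None:
        # Known prefix: expand "thelab:subject" into the full context URI.
        context_uri = base + suffix
    # Assumption: the entity value is appended as the last path segment.
    return f"{context_uri.rstrip('/')}/{value}"


print(entity_uri("participant_id", "sub-01", participants_sidecar, dataset_description))
# → http://thelab.example.com/term/subject/sub-01
```

Two datasets that declare the same expanded ContextURI for participant_id would then produce identical URIs for identical values, which is what would let a tool treat them as the "same" subject.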

attn @satra and @tekrajchhetri, who know the "linked" stuff better and could recommend how we could align even better

@yarikoptic added the "opinions wanted" label (Please read and offer your opinion on this matter) on Oct 28, 2024