[ENH] Add ContextURI to allow to define the context for the entity values #1939
It is often desirable to be able to state that the values used for entities in a dataset belong to some controlled vocabulary, or are simply defined centrally by some "id" authority. E.g. these could be unique scanning session IDs per scanning center, or similarly subject_ids defined per study or centrally for the center.
This is of particular interest for large studies where multiple datasets could be created, one per site or primary data modality, and later possibly composed into a single dataset or made parts of one larger multi-site dataset. In such cases it becomes quite important to annotate that particular entities (subject_id, session_id and possibly even _desc- or _acq- values) are defined in the scope of the specific larger study and thus correspond to the "same" thing given the same contextURI and value.
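To illustrate the "same thing given the same contextURI and value" idea, here is a minimal Python sketch. The `ScopedId` type, the `shared_entities` helper, and the example.org URIs are all hypothetical and only meant to show the matching rule, not a proposed implementation.

```python
from typing import NamedTuple


class ScopedId(NamedTuple):
    """An entity value qualified by the context in which it was assigned."""

    context_uri: str  # hypothetical: the ContextURI declared for the entity
    value: str        # the entity value as used in the dataset


def shared_entities(a: set[ScopedId], b: set[ScopedId]) -> set[ScopedId]:
    """Return identifiers that denote the same thing in both datasets.

    Two identifiers match only if BOTH the context URI and the value agree.
    """
    return a & b


# Two hypothetical per-site datasets from one multi-site study.
study = "https://study.example.org/subject"
site_a = {ScopedId(study, "01"), ScopedId(study, "02")}
site_b = {
    ScopedId(study, "02"),
    # Same value "01", but assigned by a different authority: not a match.
    ScopedId("https://other.example.org/subject", "01"),
}

common = shared_entities(site_a, site_b)
```

Here only subject "02" is shared: the "01" values coincide, but they were assigned under different contexts and so say nothing about identity.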
TODOs:
Context Prefixes

In JSON-LD it is common to centrally define shared contexts, which can even be defined externally and referenced via the `@context` attribute. E.g. in https://dandiarchive.s3.amazonaws.com/dandisets/000003/draft/dandiset.jsonld we point to https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.0/context.json, which states within its `@context` that `"ORCID": "https://orcid.org/",` and `"spdx": "http://spdx.org/licenses/",`. Now if we specify `"license": "spdx:apache-2.0"`, we know that the license "identity" is really `http://spdx.org/licenses/apache-2.0` (the actual URL does not even have to exist).

So, I wonder if we could/should also define within `dataset_description.json` a `Context: dict[str, str]` which would provide similar mappings. Then I could

- in `dataset_description.json` have `"Context": {"thelab": "http://thelab.example.com/term/"}`
- in `participants.json` for `participant_id` have `"ContextURI": "thelab:subject"`

which in turn, for every `participant_id`, would ultimately get expanded into `http://thelab.example.com/term/subject/{participant_id}` when mapping across datasets.

attn @satra and @tekrajchhetri who know "linked" stuff better and could express their recommendations on how we could align even better
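The prefix expansion described above could be sketched in Python roughly as follows. The function names `expand_context_uri` and `entity_uri` are hypothetical, and the handling of unknown prefixes (leave the value untouched) is an assumption rather than anything decided in this PR.

```python
def expand_context_uri(context_uri: str, context: dict[str, str]) -> str:
    """Expand a compact 'prefix:suffix' value using a prefix mapping.

    Values without a known prefix (e.g. an already absolute URL, whose
    apparent prefix would be 'http') are returned unchanged.
    """
    prefix, sep, suffix = context_uri.partition(":")
    if not sep or prefix not in context:
        return context_uri
    return context[prefix] + suffix


def entity_uri(context_uri: str, context: dict[str, str], value: str) -> str:
    """Build the full URI for one entity value under its expanded ContextURI."""
    base = expand_context_uri(context_uri, context)
    return f"{base.rstrip('/')}/{value}"


# "Context" as it might appear in dataset_description.json (proposed field).
context = {
    "thelab": "http://thelab.example.com/term/",
    "spdx": "http://spdx.org/licenses/",
}

license_uri = expand_context_uri("spdx:apache-2.0", context)
subject_uri = entity_uri("thelab:subject", context, "sub-01")
```

With this mapping, `"spdx:apache-2.0"` expands to `http://spdx.org/licenses/apache-2.0`, and a `participant_id` of `sub-01` under `"ContextURI": "thelab:subject"` becomes `http://thelab.example.com/term/subject/sub-01`.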