Added IRIs as extra field properties in datamodels #434

Open
wants to merge 8 commits into
base: master
Changes from 2 commits
9 changes: 5 additions & 4 deletions oteapi/models/datacacheconfig.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,13 +10,15 @@

class DataCacheConfig(AttrDict):
"""DataCache Configuration.

This class should not be used directly as a configuration object
quaat marked this conversation as resolved.
for a strategy object, but only as a configuration field inside
a configuration object.
"""

cacheDir: Path = Field(Path("oteapi"), description="Cache directory.")
cacheDir: Path = Field(
Path("oteapi"),
description="Cache directory.",
)
accessKey: Optional[str] = Field(
None,
description="Key with which the downloaded content can be accessed. "
Expand All @@ -36,6 +38,5 @@ class DataCacheConfig(AttrDict):
tag: Optional[str] = Field(
None,
description="Tag assigned to the downloaded content, typically "
"identifying a session. Used with the `evict()` method to clean up a "
"all cache entries with a given tag.",
"identifying a session. Used with the `evict()` method to clean up all cache entries with a given tag.",
Contributor

Please adhere to the PEP 8 Python conventions. Max line length is 80 characters.

)
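The line-length point can be met with Python's implicit concatenation of adjacent string literals, which is what the code did before this change. A minimal sketch, using the description text from the diff; the limit of 79 is PEP 8's default and is an assumption here, since the reviewer quotes 80 and the project's linter settings are not shown:

```python
# Adjacent string literals are joined at compile time, so a long
# description can be wrapped without changing its value.
description = (
    "Tag assigned to the downloaded content, typically "
    "identifying a session. Used with the `evict()` method to clean up "
    "all cache entries with a given tag."
)

# Assumed limit (PEP 8 default); the project may configure another value.
MAX_LINE_LENGTH = 79
```

Each fragment ends with a trailing space so that the joined string reads as one sentence.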
15 changes: 12 additions & 3 deletions oteapi/models/filterconfig.py
Contributor @jesper-friis, Feb 28, 2024

The suggested properties for the query, condition and limit fields seem arbitrarily chosen. Since no properties exist in DCAT and related W3C vocabularies that correspond to these fields, I think it is better that we define them ourselves in the OTE Interface Ontology (OTEIO).

Contributor Author

You are absolutely right about the IRIs being a bit random for certain properties, but I would much rather use more generic concepts from established ontologies than make up our own concepts in a custom ontology that nobody has adopted.
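To make the discussion concrete, here is a minimal, stdlib-only sketch of the underlying idea: attach an IRI to each field and collect a field-to-IRI mapping. It uses `dataclasses.field(metadata=...)` as a stand-in for the `IRI=` keyword that this PR passes to pydantic's `Field`. The class `FilterConfigSketch` and the helper `field_iris` are hypothetical names, and the IRIs are the ones proposed in the diff, shown for illustration rather than as a recommendation.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, Optional


@dataclass
class FilterConfigSketch:
    """Hypothetical stand-in for FilterConfig, not the oteapi class."""

    filterType: str = field(
        metadata={"IRI": "http://purl.org/dc/terms/type"},
    )
    # No IRI here: the thread has not settled on one for this field.
    query: Optional[str] = field(default=None)
    limit: Optional[int] = field(
        default=None,
        metadata={"IRI": "http://schema.org/Integer"},
    )


def field_iris(cls: type) -> Dict[str, str]:
    """Map field names to their IRI, skipping fields without one."""
    return {
        f.name: f.metadata["IRI"]
        for f in fields(cls)
        if "IRI" in f.metadata
    }


iris = field_iris(FilterConfigSketch)
```

Keeping the IRI in per-field metadata means a field can be left unannotated until a suitable term is agreed on, whichever vocabulary it ends up coming from.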

Original file line number Diff line number Diff line change
Expand Up @@ -11,13 +11,22 @@ class FilterConfig(GenericConfig):
"""Filter Strategy Data Configuration."""

filterType: str = Field(
..., description="Type of registered filter strategy. E.g., `filter/sql`."
...,
description="Type of registered filter strategy. E.g., `filter/sql`.",
IRI="http://purl.org/dc/terms/type", # type: ignore
quaat marked this conversation as resolved.
)
query: Optional[str] = Field(
None,
description="Define a query operation.",
IRI="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement", # type: ignore
quaat marked this conversation as resolved.
Contributor @jesper-friis, Feb 28, 2024

rdf:Statement refers to an RDF statement, i.e. a triple. That is not the meaning of the query field.

I haven't looked much into the details of the Data Cube vocabulary, but qb:slice seems to be a possible property one could refer to, since slicing is about selecting a subpart of a dataset, which is exactly the purpose of a filter strategy. However, a big issue with the Data Cube vocabulary is that it is not related to DCAT, so I would not recommend using it.

)
query: Optional[str] = Field(None, description="Define a query operation.")
condition: Optional[str] = Field(
None,
description="Logical statement indicating when a filter should be applied.",
IRI="http://www.w3.org/2000/01/rdf-schema#comment", # type: ignore
quaat marked this conversation as resolved.
)
limit: Optional[int] = Field(
None, description="Number of items remaining after a filter expression."
None,
description="Number of items remaining after a filter expression.",
IRI="http://schema.org/Integer", # type: ignore
quaat marked this conversation as resolved.
)
3 changes: 3 additions & 0 deletions oteapi/models/mappingconfig.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,15 +15,18 @@ class MappingConfig(GenericConfig):
mappingType: str = Field(
...,
description="Type of registered mapping strategy.",
IRI="http://purl.org/dc/terms/type", # type: ignore
)
prefixes: Optional[Dict[str, str]] = Field(
None,
description=(
"Dictionary of shortnames that expands to an IRI given as local "
"value/IRI-expansion-pairs."
),
IRI="http://www.w3.org/2004/02/skos/core#notation", # type: ignore
Contributor

If using skos:notation for prefixes, then we need to define a custom datatype, like oteio:prefixType, such that we can serialise a prefix as:

:mappingFilter skos:notation "rdfs: <http://www.w3.org/2000/01/rdf-schema#>"^^oteio:prefixType .

However, I think it would be simpler to define our own oteio:prefix, such that we can express the above as:

:mappingFilter oteio:prefix "rdfs: <http://www.w3.org/2000/01/rdf-schema#>" .
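
Both serialisations in this comment can be generated from a `prefixes` dictionary with plain string formatting. A minimal sketch: `oteio:prefixType` and `oteio:prefix` are the terms proposed in the comment and are not assumed to exist yet, and the function names and the `:mappingFilter` subject are placeholders.

```python
from typing import Dict, List


def as_skos_notation(subject: str, prefixes: Dict[str, str]) -> List[str]:
    """One skos:notation triple per prefix, typed with a custom datatype."""
    return [
        f'{subject} skos:notation "{name}: <{iri}>"^^oteio:prefixType .'
        for name, iri in prefixes.items()
    ]


def as_oteio_prefix(subject: str, prefixes: Dict[str, str]) -> List[str]:
    """One triple per prefix using a dedicated (proposed) property."""
    return [
        f'{subject} oteio:prefix "{name}: <{iri}>" .'
        for name, iri in prefixes.items()
    ]


prefixes = {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}
notation_lines = as_skos_notation(":mappingFilter", prefixes)
prefix_lines = as_oteio_prefix(":mappingFilter", prefixes)
```

The two functions differ only in the predicate and the datatype suffix, which is the trade-off described in the comment: a shared property plus a custom datatype, or a dedicated property with a plain literal.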

)
triples: Optional[Set[RDFTriple]] = Field(
None,
description="Set of RDF triples given as (subject, predicate, object).",
IRI="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement", # type: ignore
)
12 changes: 10 additions & 2 deletions oteapi/models/parserconfig.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,5 +8,13 @@
class ParserConfig(GenericConfig):
"""Parser Strategy Data Configuration."""

parserType: str = Field(..., description="Type of registered parser strategy.")
entity: AnyHttpUrl = Field(..., description="IRI to the entity or collection.")
parserType: str = Field(
...,
description="Type of registered parser strategy.",
IRI="http://purl.org/dc/terms/type",
) # type: ignore
entity: AnyHttpUrl = Field(
Contributor

Entity is a difficult name. It can mean anything. I am not fond of freely mixing different vocabularies, but if we really want to identify this with schema:url, then I think the field should be named url.

...,
description="IRI to the metadata (entity) or collection of entities.",
IRI="http://schema.org/URL",
) # type: ignore
11 changes: 10 additions & 1 deletion oteapi/models/resourceconfig.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,9 @@ class ResourceConfig(GenericConfig, SecretConfig):
"""

resourceType: Optional[str] = Field(
None, description="Type of registered resource strategy."
None,
description="Type of registered resource strategy.",
IRI="http://purl.org/dc/terms/type", # type: ignore
)

downloadUrl: Optional[HostlessAnyUrl] = Field(
Contributor

Please spell the field name as downloadURL to be consistent with DCAT.

Expand All @@ -32,6 +34,7 @@ class ResourceConfig(GenericConfig, SecretConfig):
" which this distribution is available directly, typically through a HTTPS"
" GET request or SFTP."
),
IRI="http://www.w3.org/ns/dcat#downloadURL", # type: ignore
)
mediaType: Optional[str] = Field(
None,
Expand All @@ -42,6 +45,7 @@ class ResourceConfig(GenericConfig, SecretConfig):
" type of the distribution is defined in IANA "
"[[IANA-MEDIA-TYPES](https://www.w3.org/TR/vocab-dcat-2/#bib-iana-media-types)]."
),
IRI="http://www.w3.org/ns/dcat#mediaType", # type: ignore
)
accessUrl: Optional[HostlessAnyUrl] = Field(
Contributor

Please spell the field name as accessURL to be consistent with DCAT.

None,
Expand All @@ -53,28 +57,33 @@ class ResourceConfig(GenericConfig, SecretConfig):
"query or API call.\n`downloadURL` is preferred for direct links to "
"downloadable resources."
),
IRI="http://www.w3.org/ns/dcat#accessURL", # type: ignore
)
accessService: Optional[str] = Field(
None,
description=(
"A data service that gives access to the distribution of the dataset."
),
IRI="http://www.w3.org/ns/dcat#accessService", # type: ignore
)
license: Optional[str] = Field(
None,
description=(
"A legal document under which the distribution is made available."
),
IRI="http://purl.org/dc/terms/license", # type: ignore
)
accessRights: Optional[str] = Field(
None,
description=(
"A rights statement that concerns how the distribution is accessed."
),
IRI="http://purl.org/dc/terms/accessRights", # type: ignore
)
publisher: Optional[str] = Field(
None,
description="The entity responsible for making the resource/item available.",
IRI="http://purl.org/dc/terms/publisher", # type: ignore
)

@model_validator(mode="after")
Expand Down
16 changes: 9 additions & 7 deletions oteapi/models/secretconfig.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,20 +23,22 @@ class SecretConfig(BaseModel):
"""Simple model for handling secret in other config-models."""

user: Optional[TogglableSecretStr] = Field(
None, description="User name for authentication."
None,
description="User name for authentication.",
)
password: Optional[TogglableSecretStr] = Field(
None, description="Password for authentication."
None,
description="Password for authentication.",
)
token: Optional[TogglableSecretStr] = Field(
None,
description=(
"An access token for providing access and meta data to an application."
),
description="An access token for providing access and meta data to an application.",
)
client_id: Optional[TogglableSecretStr] = Field(
None, description="Client ID for an OAUTH2 client."
None,
description="Client ID for an OAUTH2 client.",
)
client_secret: Optional[TogglableSecretStr] = Field(
None, description="Client secret for an OAUTH2 client."
None,
description="Client secret for an OAUTH2 client.",
)
30 changes: 24 additions & 6 deletions oteapi/models/transformationconfig.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,40 +39,58 @@ class TransformationConfig(GenericConfig, SecretConfig):
description=(
"Type of registered transformation strategy. E.g., `celery/remote`."
),
IRI="http://purl.org/dc/terms/type", # type: ignore
)
name: Optional[str] = Field(
None, description="Human-readable name of the transformation strategy."
None,
description="Human-readable name of the transformation strategy.",
IRI="http://purl.org/dc/elements/1.1/title", # type: ignore
quaat marked this conversation as resolved.
)
due: Optional[datetime] = Field(
None,
description=(
"Optional field to indicate a due date/time for when a transformation "
"should finish."
),
IRI="http://purl.org/dc/terms/date", # type: ignore
)
priority: Optional[ProcessPriority] = Field(
ProcessPriority.MEDIUM,
description="Define the process priority of the transformation execution.",
IRI="http://www.w3.org/ns/adms#status", # type: ignore
)


class TransformationStatus(BaseModel):
"""Return from transformation status."""

id: str = Field(..., description="ID for the given transformation process.")
id: str = Field(
...,
description="ID for the given transformation process.",
IRI="http://purl.org/dc/terms/identifier", # type: ignore
)
status: Optional[str] = Field(
None, description="Status for the transformation process."
None,
description="Status for the transformation process.",
IRI="http://www.w3.org/ns/adms#status", # type: ignore
)
messages: Optional[List[str]] = Field(
None, description="Messages related to the transformation process."
None,
description="Messages related to the transformation process.",
IRI="http://purl.org/dc/terms/description", # type: ignore
)
created: Optional[datetime] = Field(
None,
description="Time of creation for the transformation process. Given in UTC.",
IRI="http://purl.org/dc/terms/created", # type: ignore
)
startTime: Optional[datetime] = Field(
None, description="Time when the transformation process started. Given in UTC."
None,
description="Time when the transformation process started. Given in UTC.",
IRI="http://purl.org/dc/terms/date", # type: ignore
Contributor @jesper-friis, Feb 29, 2024

startTime and finishTime cannot have the same IRI. However, DCAT has dcat:startDate and dcat:endDate which are suitable here.

The only issue with dcat:startDate and dcat:endDate is that they have domain dcterms:PeriodOfTime. Since a TransformationStatus describes a time period, but isn't a time period itself, the RDF serialisation cannot be straightforward, like

:transformation_status1
  dcat:startDate "2024-02-29 08:45" ;
  dcat:endDate "2024-02-29 09:00" .

but has to be expressed as:

:transformation_status1
  dcterms:temporal [
    a dcterms:PeriodOfTime ;
    dcat:startDate "2024-02-29 08:45" ;
    dcat:endDate "2024-02-29 09:00" ;
  ] .
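
The nested form can be produced mechanically. A minimal sketch that renders a start and end time as a `dcterms:temporal` blank node: the function name and the subject are placeholders, it uses `dcat:startDate` and `dcat:endDate` as named in the comment, and the `xsd:dateTime` typing of the literals is an assumption that the comment does not make.

```python
from datetime import datetime


def period_of_time_turtle(
    subject: str, start: datetime, end: datetime
) -> str:
    """Render start/end as a dcterms:PeriodOfTime blank node in Turtle."""
    return (
        f"{subject}\n"
        "  dcterms:temporal [\n"
        "    a dcterms:PeriodOfTime ;\n"
        f'    dcat:startDate "{start.isoformat()}"^^xsd:dateTime ;\n'
        f'    dcat:endDate "{end.isoformat()}"^^xsd:dateTime ;\n'
        "  ] ."
    )


turtle = period_of_time_turtle(
    ":transformation_status1",
    datetime(2024, 2, 29, 8, 45),
    datetime(2024, 2, 29, 9, 0),
)
```

A serialiser built this way would need to treat `startTime` and `finishTime` as a pair rather than as two independent field-to-IRI mappings, which is the complication the comment points out.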

)
finishTime: Optional[datetime] = Field(
None, description="Time when the tranformation process finished. Given in UTC."
None,
description="Time when the transformation process finished. Given in UTC.",
IRI="http://purl.org/dc/terms/date", # type: ignore
)
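
A check along these lines could flag fields that share an IRI, such as `startTime` and `finishTime` in this hunk. A minimal sketch over a plain field-to-IRI dictionary; the helper name `shared_iris` is hypothetical, and the mapping is copied from the `TransformationStatus` fields in the diff as submitted.

```python
from collections import defaultdict
from typing import Dict, List


def shared_iris(field_iris: Dict[str, str]) -> Dict[str, List[str]]:
    """Return IRIs used by more than one field, with the field names."""
    by_iri: Dict[str, List[str]] = defaultdict(list)
    for name, iri in field_iris.items():
        by_iri[iri].append(name)
    return {iri: names for iri, names in by_iri.items() if len(names) > 1}


# Field IRIs of TransformationStatus as proposed in the diff.
status_iris = {
    "id": "http://purl.org/dc/terms/identifier",
    "status": "http://www.w3.org/ns/adms#status",
    "messages": "http://purl.org/dc/terms/description",
    "created": "http://purl.org/dc/terms/created",
    "startTime": "http://purl.org/dc/terms/date",
    "finishTime": "http://purl.org/dc/terms/date",
}
duplicates = shared_iris(status_iris)
```

Whether a shared IRI is always an error is a modelling decision; the sketch only reports the collisions so that they can be reviewed.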
1 change: 1 addition & 0 deletions oteapi/strategies/parse/application_json.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ class JSONParserConfig(ParserConfig):
"parser/json",
description=ParserConfig.model_fields["parserType"].description,
)

configuration: JSONConfig = Field(
..., description="JSON parse strategy-specific configuration."
)
Expand Down