-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trimming SHACL from the ontology graph #170
Comments
Hi @ajnelson-nist
There are no rules for this side of SHACL validation. This is operating outside of the SHACL spec document, so we can change the operation as we see fit, depending on what makes most sense. I don't know how other SHACL validator implementations handle it, but my understanding is PySHACL is the only validator that uses an ontology mix-in option. Other implementations require you to script in a pre-prep stage for your datagraph before the validation run to ensure it has all of your ontological definitions in place, to ensure validation is able to proceed as expected.
There is a very fine line that would need to be walked to determine what needs to be trimmed and what does not. If you consider the open-world assumption of linked-data, then nothing could safely be trimmed. I suppose there is certainly a best-effort approach, where we simply detect all instances of NodeShape, PropertyShape, and their constraints, and have an option to filter them from the ontology graph before performing the mix-in. It is also common for an SHACL NodeShape definition to be an instance of RDFS:Class (see Implicit Class Targeting). In that case, the RDF instance is both a SHACL statement, and an ontological definition, so it cannot be removed from the graph. So an alternative approach would be to discard everything except known ontological constructs, such as keep only RDFS:Class, OWL:Class, RDFS:subclassof, subpropertyof, domain, range, etc, from the ontology graph to mix in. That may be a better approach, except for the case where as you mentioned, as user could use SHACL-SPARQL and assume other triples are present in the datagraph. |
I have pushed a pretty big change to ontology graph behaviour in the latest (0.22.0) release. The feature now copies only what is required for RDFS and OWL inferencing, from the ontology graph into the datagraph. As you are a big user of this feature, are you able to test it (eg, with the problematic case above) and let me know if it: Thanks! |
Hi @ashleysommer , First, thank you for considering this issue further, and putting some effort into trying out a solution! Unfortunately, this solution has not panned out, and may need to be reverted. For further drafting on this, I am quite comfortable testing against non-release branches with Git submodule tracking, and I would be happy to exercise a branch to the fullest. The summary of impacts I've seen is:
So, unfortunately, the change in 0.22.0 introduces a new fail case, where |
This patch is implemented to skip the pySHACL 0.22.0 release due to an unforeseen bug with an individual node included in `case-corpora.ttl`. References: * RDFLib/pySHACL#170 (comment) Signed-off-by: Alex Nelson <[email protected]>
Hi @ajnelson-nist Your failing case indicates that I have not taken into account the full content of Thank you for volunteering to test feature branches before release. This is a case where I thought my own internal testing was sufficient, but I should have put it to you in a branch to test first. I will push the above mentioned |
@ajnelson-nist New PySHACL version v0.22.1 is release that should fix the |
This is to let the node be copied in pySHACL 0.22.1. References: * RDFLib/pySHACL#170 (comment) Signed-off-by: Alex Nelson <[email protected]>
@ashleysommer Thank you for the quick fix. I've confirmed it works in my local build environment. Unfortunately, a new issue is now presenting, either from your change for
Granted, that Turtle compacted into one line is usually barely legible to the eye. But it's better to see that than this new behavior which I suspect is unintended. I also see how this would be difficult to catch in a test framework and just took a manual run to see. |
Unfortunately, I think there's another issue that is more attributable to the cloning strategy. I just had CI fail when trying to evaluate a SHACL-SPARQL constraint. The constraint is here, and the failing data graph is here. The failure is that a certain prefix in the SPARQL query, That SHACL-SPARQL constraint should be moot from that data graph, because the data graph doesn't include any usage of the From a community member using another tool, I've come to understand that the SPARQL I linked above is a little lazy/lax in its spelling. For now, users who've been lax with prefixes are in for a mild surprise on their next |
References: * RDFLib/pySHACL#170 (comment) Signed-off-by: Alex Nelson <[email protected]>
That looks like a bug in the new default message behaviour I implemented for
Yeah, the SPARQL query runs on the DataGraph, so by default it only includes the Prefixes present on the datagraph. That is the correct behaviour according to the SHACL-SPARQL Spec. If you need extra prefixes present, you should use the It seems like the old ontology-mix-in behaviour did copy the namespace-prefix mapping from the ontology file into the working datagraph. This was for purposes of generating shortened prefixed stringification of nodes in the textual output and Turtle representation of the result graph, but seems like it had the side effect of allowing your SPARQL queries to work without prefix declarations too. However I will modify the new ontology inoculation feature to also copy over the namespace-prefix mapping from the ontology graph to the datagraph, because it seems like the sensible thing to do. That will likely fix the failure you are seeing, but I do recommend you update your SPARQL constraints to use the |
@ajnelson-nist PySHACL v0.22.2 is released now that address the two issues mentioned above. |
This is part of addressing what originally inspired pySHACL Issue 170. This patch modifies the behavior of `case_validate`, that reviews OWL syntax and OWL-SHACL interactions. With this patch, that functionality is now **opt-in** at call time. Further work on separating the OWL review shapes from UCO into a general CDO repository (originally started for CDOTSC-34) is currently believed to not have an impact on the user interface element where the user opts in to the more extensive review. References: * [CDOTSC-34] CDO should provide shapes for external ontologies and other RDF models, including OWL * RDFLib/pySHACL#170 Signed-off-by: Alex Nelson <[email protected]>
This is a port of a comment I'd made on Sep. 16 on Issue 89. I've migrated the comment here to avoid expanding the scope of that ticket.
I've come across undesired behavior related to pySHACL's ontology graph and shapes graph mix-in strategy. The short of it is, my community adopted a SHACL-based mechanism to review some OWL requirements and cross-check SHACL and OWL definitions with general patterns. (The shapes graph for this is here. We'd also appreciate hearing if you've seen something like this elsewhere; we couldn't find something like it.) My community's ontology is managed as Turtle files, one per namespace, and each Turtle file includes all OWL and SHACL statements for concepts in that namespace.
What we didn't appreciate is that this incurs a significant cost on reviewing all data graphs, due to the mix-in strategy, so our "TBox"-targeting tests also run on "ABox" data review because
pyshacl
mixes the "ontology" TBox and ABox stuff together. With our ontology (a few hundred classes and properties, each with associated SHACL shapes), this is a 20--30 second cost perpyshacl
run, guaranteed to be redundantly re-reviewing the ontology.So, I'd like to discuss whether the whole ontology mix-in is still the right approach in all cases.
The case my community is facing at the moment is our ontology is written including SHACL shapes and OWL definitions in the same graph file, so we feed SHACL statements into
.validate()
'sont_graph
keyword argument. Question: Would the ontology graph pySHACL uses (ont_graph
) ever be expected to include SHACL statements? My guess is no.I was considering whether there could be a further reasonable trim based on the user's inferencing capabilities request, but I think the answer there may be no. I suspect SHACL-SPARQL makes the whole ontology graph always potentially necessary, though I can't think of a good example at the moment - maybe linking via some annotation property only available in the ontology, or searching for deprecated classes?
For now, my one question is just the SHACL-in-ontology-graph question: Can SHACL be safely trimmed from the ontology graph?
The text was updated successfully, but these errors were encountered: