sh:targetClass does not use shacl/ont graph hierarchies #148

Closed
gtfierro opened this issue Jun 29, 2022 · 5 comments

@gtfierro (Contributor) commented Jun 29, 2022

Say I have a simple SHACL-based ontology as follows:

@prefix ex: <urn:ex#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:Class a owl:Class .
ex:SubClass a owl:Class ;
    rdfs:subClassOf ex:Class .
ex:SubSubClass a owl:Class ;
    rdfs:subClassOf ex:SubClass .

ex:FailedRule a sh:NodeShape ;
    sh:targetClass ex:Class ;
    sh:rule [
        a sh:TripleRule ;
        sh:object ex:Inferred ;
        sh:predicate ex:hasProperty ;
        sh:subject sh:this ;
    ] .

and a separate data graph:

@prefix ex: <urn:ex#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

ex:A a ex:SubSubClass .

I would expect that running pySHACL with advanced features enabled would generate the triple ex:A ex:hasProperty ex:Inferred on the graph. However, this only happens when I put the ontology/shapes and the data in the same graph object.

Succeeds:

def test_ruleTargetClass_onegraph():
    data_g = rdflib.Graph().parse(data=shapes_and_ontology_data, format='turtle').parse(data=model_data, format='turtle')

    conforms, results_graph, results_text = pyshacl.validate(
        data_graph=data_g, advanced=True
    )
    assert conforms
    assert (rdflib.URIRef("urn:ex#A"), rdflib.URIRef("urn:ex#hasProperty"), rdflib.URIRef("urn:ex#Inferred")) in data_g

Fails:

def test_ruleTargetClass_twograph():
    shape_g = rdflib.Graph().parse(data=shapes_and_ontology_data, format='turtle')
    data_g = rdflib.Graph().parse(data=model_data, format='turtle')

    conforms, results_graph, results_text = pyshacl.validate(
        data_graph=data_g, shacl_graph=shape_g, advanced=True
    )
    assert conforms
    assert (rdflib.URIRef("urn:ex#A"), rdflib.URIRef("urn:ex#hasProperty"), rdflib.URIRef("urn:ex#Inferred")) in data_g

I am not running RDFS or OWL inference in this scenario, and per the SHACL specification I shouldn't have to: the sh:targetClass property should be aware of the rdfs:subClassOf hierarchy. Indeed, this works as expected when the data graph contains the rdfs:subClassOf statements. However, as implemented, sh:targetClass only considers triples inside the data_graph argument to validate, not the shacl_graph or ont_graph arguments.

I believe there is a straightforward fix: pass the SHACL and ontology graphs into the apply_rules function within pySHACL. I've developed a reproducible test case (above) and will start looking at implementing a fix. How does the proposed approach sound?

@gtfierro (Contributor, Author)

This issue also occurs when I pass the SHACL graph as an argument to shacl_graph, ont_graph, or both.

@ashleysommer (Collaborator) commented Jun 30, 2022

Hi @gtfierro
It appears to me that you've come across two different common PySHACL stumbling points, and you're treating them as a single issue.

The first problem is this:

this seems to work great in the case where the data graph has the rdfs:subClassOf statements. However, the way sh:targetClass is implemented, it only considers triples inside the data graph argument to validate

That is correct. All OWL- and RDFS-defined relationships/axioms need to be part of the data graph at runtime. This comes from the SHACL spec document, section 2.1.3.2:

Note that, according to the SHACL instance definition, all the rdfs:subClassOf declarations needed to walk the class hierarchy need to exist in the data graph

This affects not only sh:targetClass but all SHACL constraints that operate on classes. And it is not specific to PySHACL; you'll find the same behaviour in any SHACL validator that adheres strictly to the spec.
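
Concretely, for the example in this issue that means the data graph itself must carry the class-hierarchy triples before validation. A minimal sketch, reusing model_data from above:

import rdflib

# Illustrative only: append the rdfs:subClassOf hierarchy to the data
# so a spec-compliant validator can walk from ex:SubSubClass up to ex:Class.
hierarchy = '''
@prefix ex: <urn:ex#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:SubSubClass rdfs:subClassOf ex:SubClass .
ex:SubClass rdfs:subClassOf ex:Class .
'''
data_g = rdflib.Graph().parse(data=model_data, format='turtle').parse(data=hierarchy, format='turtle')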

This is the most frequently asked question on the PySHACL issue tracker; see #142, #46, and #38 (the second part). The main conversation about this was in #6.

The solution is to use the ont_graph feature to mix the ontological definitions into your data graph at runtime; this issue is exactly why that feature exists. See this example for how that works.
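
For the two-graph test above, that looks something like the following sketch (here the shapes graph doubles as the ontology graph, since it already holds the rdfs:subClassOf statements):

import rdflib
import pyshacl

shape_g = rdflib.Graph().parse(data=shapes_and_ontology_data, format='turtle')
data_g = rdflib.Graph().parse(data=model_data, format='turtle')

# ont_graph is mixed into a clone of the data graph before validation,
# so sh:targetClass can see the class hierarchy without mutating data_g.
conforms, results_graph, results_text = pyshacl.validate(
    data_graph=data_g, shacl_graph=shape_g, ont_graph=shape_g, advanced=True
)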

Your second problem is:

I would expect that running pyshacl with advanced features will generate the ex:A ex:hasProperty ex:Inferred triple on the graph

This is a duplicate of #78 (and closely related to #20)

The SHACL spec specifies that the validator should not modify either the data graph or shapes graph at run time:

performed on the fly as part of SHACL processing (without modifying either data graph or shapes graph)

PySHACL creates a clone of your source data graph at runtime, and any operations on the data graph (e.g. inferencing/entailment and triple rules) are performed on that cloned graph. That is why the inferred triples do not exist on data_g after validation is run, even when the run succeeds.

Note, however, that you did find a bug! The example you labeled "Succeeds" should actually fail. PySHACL sometimes skips cloning the data graph when it thinks there are no modifications to make, e.g. when there is no ont_graph to mix in and RDFS/OWL inferencing is disabled. In that case it operates directly on the input graph, which is incorrect here, and is why the triples do exist on the graph in your test.
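
Once that clone bug is fixed, the spec-compliant behaviour looks like this sketch (same shape_g/data_g setup as above): validation conforms, but the inferred triple never lands on the input graph.

conforms, _, _ = pyshacl.validate(
    data_graph=data_g, shacl_graph=shape_g, ont_graph=shape_g, advanced=True
)
assert conforms
# The TripleRule fired on PySHACL's internal clone, not on data_g:
assert (rdflib.URIRef("urn:ex#A"), rdflib.URIRef("urn:ex#hasProperty"), rdflib.URIRef("urn:ex#Inferred")) not in data_g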

See the long-running discussion thread in #60 about a proposed alternative mode that would operate PySHACL as a new kind of inferencing engine, emitting the rule-generated triples back into the input data_graph.

In your PR #149, you've implemented an alternative solution to the first problem in an effort to solve the second. That is why you are getting inconsistent results in your tests when using ont_graph and when RDFS inferencing is enabled.

@ashleysommer (Collaborator) commented Jun 30, 2022

Note: there is a little hack/workaround you can use if you really do want PySHACL to modify your input data graph rather than creating a clone: the undocumented inplace switch, e.g.:

validate(data_g, shacl_graph=shapes_g, ont_graph=ont_g, advanced=True, inplace=True)

That puts the PySHACL validator into a non-spec-compliant mode where it skips the clone step and emits any changes directly back into the input graph. It is normally used when your data graph is not cloneable (e.g. you are using a SPARQL connector to a remote data graph, or the graph cannot fit in memory), but I've seen users use it to emulate the behaviour you're expecting in this issue.

See here for an example
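
Applied to the failing test from this issue, the inplace switch makes the original assertion pass. A sketch, assuming the same shape_g/data_g setup as above (note data_g is mutated, and will also contain the mixed-in ontology triples):

conforms, _, _ = pyshacl.validate(
    data_g, shacl_graph=shape_g, ont_graph=shape_g, advanced=True, inplace=True
)
assert conforms
# With inplace=True the rule output is written straight back into data_g:
assert (rdflib.URIRef("urn:ex#A"), rdflib.URIRef("urn:ex#hasProperty"), rdflib.URIRef("urn:ex#Inferred")) in data_g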

@gtfierro (Contributor, Author) commented Jul 4, 2022

Wow, thank you for the extremely thoughtful and detailed answer! I see now how my mental model of pySHACL was incorrect, and I've adjusted my code to access the inferred triples. My final solution is a little awkward because I have to subtract out the triples I don't want from my expanded data graph, but it works reliably and generates the same output as TopBraid Composer.
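
For reference, one possible shape of that workaround, sketched under the assumption that everything in shape_g counts as an unwanted ontology/shape triple:

# Merge data and shapes/ontology into one graph, expand it in place,
# then subtract the shapes/ontology triples so only the original data
# plus the rule-inferred triples remain.
expanded = rdflib.Graph()
for t in shape_g:
    expanded.add(t)
for t in data_g:
    expanded.add(t)

pyshacl.validate(expanded, advanced=True, inplace=True)

result = expanded - shape_g  # rdflib graph set-difference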

I can leave this open if you would like an open reminder of the small bug that I inadvertently found, or I can close this because my original issue is technically resolved. Let me know!

@ashleysommer (Collaborator)

I can leave this open if you would like an open reminder of the small bug that I inadvertently found

No need, a fix for that was already included in the v0.19.1 release.

I'm glad to know you've solved your problem with a workaround.
