Walk option "with_reverse" not honored in remote KG settings #106

Open
rgrenz opened this issue Jul 12, 2022 · 2 comments
Labels
bug Something isn't working


rgrenz commented Jul 12, 2022

🐛 Bug

Hi, thank you for making this library available to everyone! It is of great use to my university research project.
I believe I have spotted a bug concerning the with_reverse walk option:

Expected Behavior

When generating walks using the RandomWalker with with_reverse=True, each returned walk should contain zero or more predecessor triples, followed by the vertex of interest, followed by zero or more successor triples. In particular, it should be possible to read each returned walk from left to right as a valid traversal of the directed graph. This behavior should not depend on the source of the KG.
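The "valid traversal" requirement can be made concrete with a small check. This is only a sketch under stated assumptions: is_valid_traversal is a hypothetical helper (not part of pyRDF2Vec), a walk is assumed to be a flat sequence alternating vertex, predicate, vertex, and the ex: identifiers are invented toy data.

```python
from typing import Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def is_valid_traversal(walk: Sequence[str], triples: Set[Triple]) -> bool:
    """Return True if every (vertex, predicate, vertex) step of the walk,
    read left to right, is a triple that exists in the directed graph."""
    # A walk alternates vertex, predicate, vertex, ... so its length is odd.
    if len(walk) % 2 == 0:
        return False
    return all(
        (walk[i], walk[i + 1], walk[i + 2]) in triples
        for i in range(0, len(walk) - 2, 2)
    )


# Toy graph (invented identifiers, not real DBpedia data).
triples = {
    ("ex:Neo", "ex:appearsIn", "ex:Matrix"),
    ("ex:Matrix", "ex:writer", "ex:Wachowskis"),
    ("ex:Matrix", "ex:type", "ex:Film"),
}

# Predecessor triple, vertex of interest, successor triple.
good_walk = ["ex:Neo", "ex:appearsIn", "ex:Matrix", "ex:type", "ex:Film"]

# Mirrored successor part in front of the vertex of interest.
bad_walk = ["ex:Wachowskis", "ex:writer", "ex:Matrix", "ex:type", "ex:Film"]

print(is_valid_traversal(good_walk, triples))
print(is_valid_traversal(bad_walk, triples))
```

The second walk fails because ("ex:Wachowskis", "ex:writer", "ex:Matrix") is not in the toy graph; only its inverse is.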

Current Behavior

When using a local KG, the returned walks are well formed and follow the requirements above. If the KG instead uses a remote SPARQL source, the resulting walks are no longer legal traversals of the graph. Instead, the generated walks consist of a mirrored successor part, followed by the vertex of interest, followed by another successor part (in the correct order).

Steps to Reproduce

from pyrdf2vec import RDF2VecTransformer
from pyrdf2vec.embedders import Word2Vec
from pyrdf2vec.graphs import KG
from pyrdf2vec.walkers import RandomWalker

dbpedia = KG("https://dbpedia.org/sparql")

transformer = RDF2VecTransformer(
    Word2Vec(sg=0, vector_size=10),
    walkers=[RandomWalker(max_walks=1, max_depth=1, with_reverse=True, md5_bytes=None)],
    verbose=1
)

walks = transformer.get_walks(dbpedia, ["http://dbpedia.org/resource/The_Matrix"])
print(walks)

# Example output:
# [[('http://dbpedia.org/resource/The_Wachowskis',
#    'http://dbpedia.org/property/writer',
#    'http://dbpedia.org/resource/The_Matrix',
#    'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
#    'http://dbpedia.org/class/yago/Wikicat1990sScienceFictionFilms')]]
#
# Notice that the first triple does not exist on DBpedia, only its inverse does.

Environment

  • pyRDF2Vec version: 0.2.3
  • Python version: 3.8.13

Possible Solution

The fetch_hops() function linked below should support the with_reverse option, as its local counterpart _get_hops() does. However, this probably also requires modifications to the querying and caching code.
https://github.com/IBCNServices/pyRDF2Vec/blob/fb7da659f67b6486a403a46bc2d3c589b802304c/pyrdf2vec/graphs/kg.py#L241-L256
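As a rough illustration of what honoring the flag means for hop retrieval, here is a minimal sketch that uses an in-memory set of triples in place of a SPARQL endpoint. get_hops and build_walk are hypothetical helpers written for this example, not pyRDF2Vec functions, and the ex: identifiers are invented. The point is only that reverse hops must be looked up by object and prepended as (subject, predicate).

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def get_hops(
    triples: Set[Triple], vertex: str, is_reverse: bool = False
) -> List[Tuple[str, str]]:
    """Return sorted (predicate, neighbour) pairs for a vertex.

    Forward hops match the vertex as subject; reverse hops match it as object.
    """
    if is_reverse:
        return sorted((p, s) for s, p, o in triples if o == vertex)
    return sorted((p, o) for s, p, o in triples if s == vertex)


def build_walk(triples: Set[Triple], root: str) -> List[str]:
    """Build a depth-1 walk: one predecessor hop, the root, one successor hop.

    The first hop in sorted order is taken so the example is deterministic;
    a real walker would sample instead.
    """
    walk = [root]
    predecessors = get_hops(triples, root, is_reverse=True)
    if predecessors:
        pred, subj = predecessors[0]
        walk = [subj, pred] + walk  # reads left to right as subj -pred-> root
    successors = get_hops(triples, root)
    if successors:
        pred, obj = successors[0]
        walk += [pred, obj]  # reads left to right as root -pred-> obj
    return walk


# Toy graph (invented identifiers, not real DBpedia data).
triples = {
    ("ex:Neo", "ex:appearsIn", "ex:Matrix"),
    ("ex:Matrix", "ex:writer", "ex:Wachowskis"),
}

print(build_walk(triples, "ex:Matrix"))
```

If the reverse lookup were answered with forward hops instead, the walk would start with "ex:Wachowskis", "ex:writer", which is the mirrored successor part described in the report.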

rgrenz added the bug label Jul 12, 2022
rgrenz (Author) commented Jul 12, 2022

Just noticed that this may already be covered by your TODO note in #67.

GillesVandewiele (Collaborator) commented Jul 12, 2022

Thank you for reporting this, @rgrenz. You are correct that the behaviour of with_reverse seems faulty and should be fixed (it is on our roadmap). Unfortunately, our bandwidth is rather limited, so this might take some time. Feel free to open a PR if you fix it locally. You are spot on that fetch_hops needs to be extended to include reverse walking logic, which should use a different SPARQL query (with the object rather than the subject filled in). I think it can be fixed by extending only get_query (https://github.com/IBCNServices/pyRDF2Vec/blob/main/pyrdf2vec/connectors.py#L136) and fetch_hops as you suggested (the latter should do nothing more than pass with_reverse on to get_query).
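To illustrate the "object rather than subject filled in" idea, here is a minimal sketch of the two query shapes. build_hop_query is a hypothetical standalone function, not the library's get_query, and its signature and returned variable names (?s, ?p, ?o) are assumptions made for this example.

```python
def build_hop_query(entity: str, with_reverse: bool = False) -> str:
    """Return a SPARQL SELECT query for the one-hop neighbourhood of an entity.

    Forward: the entity is bound as the subject, so ?p ?o are its successors.
    Reverse: the entity is bound as the object, so ?s ?p are its predecessors.
    """
    if with_reverse:
        return f"SELECT ?s ?p WHERE {{ ?s ?p <{entity}> . }}"
    return f"SELECT ?p ?o WHERE {{ <{entity}> ?p ?o . }}"


entity = "http://dbpedia.org/resource/The_Matrix"
print(build_hop_query(entity))
print(build_hop_query(entity, with_reverse=True))
```

Because the two queries return different variable pairs, any result parsing and caching keyed on the entity alone would presumably also need to distinguish the direction.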
