Full-text search backend options #263

Open
edmondchuc opened this issue Aug 28, 2024 · 2 comments
@edmondchuc
Collaborator

Currently, Prez implements only the default search backend, which uses standard SPARQL regex.

It would be useful to have full-text search (FTS) options for different database backends such as Jena.

Some things to consider:

  • A way to configure the backend search, similar to how we can specify different database backend implementations
  • A way to specify the predicate to use in search:
    • the object type and its predicate preference
      • e.g. skos:Concept uses skos:prefLabel, skos:altLabel
      • e.g. sdo:Dataset uses sdo:name, sdo:alternateName
      • default to rdfs:label

Maybe we can utilise https://datashapes.org/propertyroles.html#LabelRole to indicate in the Prez profile which predicates are labels.
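
A minimal sketch of the per-type predicate preference described above, with the rdfs:label fallback. The class and predicate names come from the list; the mapping structure and the function name `search_predicates_for` are hypothetical and not part of Prez.

```python
# Hypothetical lookup table: object type -> label predicates, in preference order.
# The entries mirror the examples in this issue; the structure is an assumption.
LABEL_PREDICATES_BY_CLASS = {
    "skos:Concept": ["skos:prefLabel", "skos:altLabel"],
    "sdo:Dataset": ["sdo:name", "sdo:alternateName"],
}

# Fallback used when a type has no configured preference.
DEFAULT_LABEL_PREDICATES = ["rdfs:label"]


def search_predicates_for(class_curie: str) -> list[str]:
    """Return the predicates to search for a given object type (hypothetical helper)."""
    # Copy the list so callers cannot mutate the shared configuration.
    return list(LABEL_PREDICATES_BY_CLASS.get(class_curie, DEFAULT_LABEL_PREDICATES))
```

In practice the table would more likely be derived from the profile (for example from the LabelRole annotations) than hard-coded.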

@RDFLib RDFLib deleted a comment Aug 28, 2024
@recalcitrantsupplant
Collaborator

recalcitrantsupplant commented Oct 9, 2024

Rough steps for @lalewis1 to implement Fuseki FTS:

  1. Add a config setting called fts_variant: None | enum, with "fuseki" as the only enum option at present. When it is not set, regex search is used. Add a second config setting for the available FTS indexed fields (predicates).
  2. Create a new module in prez/services/query_generation called something like search_fts_fuseki.py, and maybe rename search.py to search_regex.py.
  3. Create a class, SearchQueryComponentsJenaFTS, which creates the FTS inputs for the listing query. NB I wouldn't subclass the ConstructQuery class in SPARQL Grammar as I did for the regex search. It will work, but I don't think it's necessary, as the whole FTS query is a one-liner. To do this:
    • Create the following triple pattern match using SPARQL grammar: `(?focus_node ?weight ?match ?g ?pred) text:query (property* 'query string' limit)`, where the items in the object position are created from inputs to the query. The list of properties should be supplied at runtime, otherwise default to the set of configured FTS indexed fields. To create the SPARQL Grammar, I believe you're after the second form of TriplesSameSubjectPath, going through to multiple GraphNodePaths under a Collection Path. I would create this and write tests for it. The rest of the search query you can reuse/copy from the regex one.
    • Wrap the TriplesSameSubjectPath in a GraphPatternNotTriples and expose this as a property on the class.
    • Expose the other required outputs listed here. The TSS/Construct TSS you can copy from the regex search query; the limit, order by, etc. can be derived from inputs.
  4. Add logic to the search query builder dependency (dependencies.py) to utilise the FTS method if enabled in the config.
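
The steps above can be sketched roughly as follows. This is an illustration only: it builds the `text:query` pattern as a plain string rather than with the SPARQL Grammar classes, and `SearchSettings`, `fuseki_text_query_pattern` and `use_fts` are hypothetical names, not Prez code. The rdfs:label default for the indexed fields is also an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Sequence


class FTSVariant(str, Enum):
    """FTS backends; "fuseki" is the only option at present."""
    FUSEKI = "fuseki"


@dataclass
class SearchSettings:
    # None means FTS is not configured, so regex search is used.
    fts_variant: Optional[FTSVariant] = None
    # Predicates covered by the FTS index (default value is an assumption).
    fts_indexed_predicates: list[str] = field(default_factory=lambda: ["rdfs:label"])


def escape_sparql_literal(term: str) -> str:
    """Escape backslashes and single quotes for a single-quoted SPARQL literal.

    Lucene query-syntax escaping is a separate concern and is not handled here.
    """
    return term.replace("\\", "\\\\").replace("'", "\\'")


def fuseki_text_query_pattern(
    term: str,
    settings: SearchSettings,
    predicates: Optional[Sequence[str]] = None,
    limit: int = 10,
) -> str:
    """Build the one-line Jena text:query triple pattern as a string."""
    # Properties supplied at runtime win; otherwise use the configured indexed fields.
    props = list(predicates) if predicates else list(settings.fts_indexed_predicates)
    object_list = " ".join([*props, f"'{escape_sparql_literal(term)}'", str(limit)])
    return f"(?focus_node ?weight ?match ?g ?pred) text:query ({object_list}) ."


def use_fts(settings: SearchSettings) -> bool:
    """Decide between FTS and regex search, as the dependency in step 4 would."""
    return settings.fts_variant == FTSVariant.FUSEKI
```

The generated pattern relies on the `text:` prefix being declared in the query as `PREFIX text: <http://jena.apache.org/text#>`.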

@lalewis1 lalewis1 self-assigned this Oct 14, 2024
@lalewis1
Collaborator

FTS query implemented on branch lawson/fts.

I just need to write tests and then I can submit a PR.
