Pluggability of indexers #3

Open
tomcrane opened this issue Sep 25, 2020 · 2 comments

Comments

@tomcrane

Not sure what a Pythonic approach to this is. It may also already work like this, in which case, close this issue.

  • A folder of optional *.py indexer impls, drop the one(s) you want in there, including the IIIF indexer
  • plus config to switch them on and off
  • plus config to control field selection, parsing strategy. Nothing too complicated though.
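
One way the "folder of indexers plus an on/off switch" idea could look, as a minimal sketch only. The `load_indexers` function, the `enabled` config key, and the convention that each module exposes an `index(resource)` function are all assumptions made for illustration, not anything madoc-search currently provides.

```python
import importlib.util
from pathlib import Path


def load_indexers(plugin_dir, config):
    """Load the enabled indexer modules found in plugin_dir.

    Assumed (hypothetical) convention: every *.py file in the folder
    exposes a module-level function index(resource) -> list of dicts.
    """
    enabled = config.get("enabled", [])
    indexers = {}
    for path in sorted(Path(plugin_dir).glob("*.py")):
        name = path.stem
        if name not in enabled:
            continue  # present in the folder but switched off in config
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Only register modules that follow the assumed convention.
        if callable(getattr(module, "index", None)):
            indexers[name] = module.index
    return indexers
```

With this shape, the IIIF indexer would be just another file in the folder, and leaving it out of `enabled` turns it off.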

I see the out-of-the-box IIIF indexer as the indexer of last resort, even if in practice it gets used 90% of the time as the only indexer.

  • As someone deploying madoc-search, I want to write an indexer for my known seeAlso format, and not enable the IIIF indexer.
  • As someone deploying madoc-search, I can't get a new custom seeAlso into my IIIF right now, but I know which metadata fields I want to index. I also know which are quantitative and which are dates, and I know how to parse them because I know how they went in there in the first place.
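
The second user story could be served by a small field config that maps metadata labels to a parsing strategy. This is a sketch under stated assumptions: `PARSERS`, `FIELD_CONFIG`, `index_metadata`, and the example labels are hypothetical, and the metadata is simplified to plain-string label/value pairs rather than full IIIF language maps.

```python
from datetime import date

# Hypothetical parsing strategies, keyed by the name used in config.
PARSERS = {
    "text": str,
    "number": float,
    "date": date.fromisoformat,
}

# Hypothetical field selection: metadata label -> parsing strategy.
FIELD_CONFIG = {
    "Title": "text",
    "Date created": "date",
    "Height (cm)": "number",
}


def index_metadata(metadata, field_config=FIELD_CONFIG):
    """Turn label/value pairs into typed indexables, keeping only configured fields."""
    indexables = []
    for pair in metadata:
        label, value = pair["label"], pair["value"]
        strategy = field_config.get(label)
        if strategy is None:
            continue  # field not selected for indexing
        try:
            parsed = PARSERS[strategy](value)
        except ValueError:
            # The value did not match the expected format; index it as text.
            parsed, strategy = str(value), "text"
        indexables.append({"field": label, "type": strategy, "value": parsed})
    return indexables
```

The deployer supplies only the mapping; values that fail to parse fall back to plain text instead of being dropped.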

etc - just make the pluggability of indexers a first-class concept, if it isn't already, and give the iiif-indexer a simple config mechanism

(Ignore if already handled)

@stephenwf
Member

Similar to the request for lower-level APIs. I think we need to be careful with extensibility - if we tie a whole host of integrations to specific Python structures / code, then we might find it really difficult to make breaking changes, and we would greatly widen our public API. If we use HTTP as the extension point and write our extensions against that, then we could change internal schemas, code and possibly even backends like Elasticsearch or Solr without breaking the integrations.

It would also open up other languages to provide the integrations, with the downside that these integrations would have to be hosted and would be less plug-and-play. Maybe a balance of both?

For config - I think if we do introduce configuration we should make sure it's conceptually compatible with the configuration API (with contexts / cascading config), perhaps with JSON/Schema that matches that. This is likely going to be a common pattern for both JS/TS and Python services, so pinning down how to manage config with an optional config server would be ideal!
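
To make "contexts / cascading config" concrete, here is a rough sketch of one possible reading: config is looked up through an ordered list of contexts, and more specific contexts override more general ones. The `resolve_config` function, the context URNs, and the config keys are all assumptions for illustration and do not describe the actual configuration API.

```python
from collections import ChainMap


def resolve_config(contexts, configs, defaults):
    """Merge config across contexts, most specific context winning.

    contexts: identifiers ordered from most general to most specific.
    configs:  mapping of context identifier -> partial config dict.
    defaults: config used when no context sets a key.
    """
    # ChainMap returns the value from the first mapping that has the key,
    # so the most specific context goes first and the defaults go last.
    layers = [configs[c] for c in reversed(contexts) if c in configs]
    return dict(ChainMap(*layers, defaults))
```

For example, a site-level context could set the language while a project-level context overrides which indexers are enabled, with everything else falling through to the defaults.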

@mattmcgrattan
Collaborator

I think the first step for me is probably to write the basic handlers as classes/functions, plus some basic config, just as static JSON, on the assumption that the base cases are:

  • IIIF descriptive properties and metadata
  • Madoc capture models and associated intermediate OCR formats
  • seeAlsos -- with profiles
  • otherContent lists (and equivalents in OA and/or W3C models)
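
As a sketch of what "handlers as classes plus static JSON config" might look like for two of the base cases above. Every name here (`Handler`, `to_indexables`, the handler keys, the example profile URL) is hypothetical, and `seeAlso` is simplified to a list of objects with `@id` and `profile`.

```python
import json

# Hypothetical static config; in practice this would live in a JSON file.
STATIC_CONFIG = json.loads("""
{
  "handlers": {
    "iiif_descriptive": {"enabled": true},
    "capture_model": {"enabled": false},
    "see_also": {"enabled": true, "profiles": ["http://example.org/profile/mods"]},
    "other_content": {"enabled": false}
  }
}
""")


class Handler:
    """Base class: one handler per kind of source."""
    name = ""

    def __init__(self, options):
        self.options = options

    def to_indexables(self, resource):
        raise NotImplementedError


class IIIFDescriptiveHandler(Handler):
    name = "iiif_descriptive"

    def to_indexables(self, resource):
        label = resource.get("label")
        return [{"type": "descriptive", "subtype": "label", "value": label}] if label else []


class SeeAlsoHandler(Handler):
    name = "see_also"

    def to_indexables(self, resource):
        # Only follow seeAlso entries whose profile is listed in config.
        profiles = set(self.options.get("profiles", []))
        return [
            {"type": "see_also", "subtype": entry["profile"], "value": entry["@id"]}
            for entry in resource.get("seeAlso", [])
            if entry.get("profile") in profiles
        ]


HANDLER_CLASSES = [IIIFDescriptiveHandler, SeeAlsoHandler]


def build_handlers(config):
    """Instantiate only the handlers that the config switches on."""
    settings = config["handlers"]
    return [
        cls(settings[cls.name])
        for cls in HANDLER_CLASSES
        if settings.get(cls.name, {}).get("enabled")
    ]


def index_resource(resource, handlers):
    """Run every enabled handler and concatenate their indexables."""
    indexables = []
    for handler in handlers:
        indexables.extend(handler.to_indexables(resource))
    return indexables
```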

Currently there's no public API on top of the converters, but the only converter that's fully functional is the base IIIF case, which I think is always going to be internal to the application.

But the others could all be written as API calls that basically go:

  • JSON in
  • List of indexables (as JSON) out

And then the option is there to provide those with another service, or in another language.
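
The "JSON in, list of indexables out" contract could be sketched with nothing but the standard library. This assumes a hypothetical indexable shape (`type`, `subtype`, `value`) and a hypothetical converter that emits one indexable per metadata pair; the real indexable schema may well differ.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def to_indexables(document):
    """Hypothetical converter: one indexable per metadata label/value pair."""
    return [
        {"type": "metadata", "subtype": pair["label"], "value": pair["value"]}
        for pair in document.get("metadata", [])
    ]


class IndexerHandler(BaseHTTPRequestHandler):
    """Accepts a JSON document via POST and replies with a JSON list of indexables."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            document = json.loads(self.rfile.read(length))
            status, payload = 200, to_indexables(document)
        except (ValueError, KeyError, AttributeError, TypeError) as exc:
            # Malformed JSON or an unexpected document shape.
            status, payload = 400, {"error": str(exc)}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the indexer as a standalone service (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), IndexerHandler).serve_forever()
```

Because the contract is only HTTP and JSON, the same endpoint could equally be implemented by another service or in another language.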

3 participants