Getty AAT not working as I expect/desire (may apply to all Getty?) #346

Open
jrochkind opened this issue Dec 8, 2021 · 2 comments

Comments

@jrochkind
Contributor

jrochkind commented Dec 8, 2021

I am exploring using questioning_authority with the Getty AAT vocabulary, for a use case involving auto-complete on a staff metadata editing form.

I am finding that the Getty AAT vocabulary does not perform the way I would need it to in order to be suitable for that use case.

There is another long-open Getty-related ticket at #84, but it's not clear what the actual goals/problems of that ticket are, so I'm filing another one. That ticket does have useful links to Getty documentation and a relevant listserv, plus suggestions from a Getty developer of possible implementation improvements that may be relevant here.

For my uses:

  • I want search to be "left-anchored"/truncated: a search for parch should find terms involving parchment. But it doesn't; only whole-word matches are currently returned.
  • The search right now is strangely case-sensitive in ways I don't expect. For instance, entering parchment also returns no hits, but entering Parchment (capital P) returns 4. They all come back in lowercase, yet only a search with an initial capital P can find them!
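To make the two requests above concrete, here is a minimal sketch of input normalization for a prefix-style lookup. `prefix_query` is a hypothetical helper, not part of questioning_authority, and it assumes the Getty full-text index accepts Lucene-style trailing wildcards (`parch*`), which should be verified against the Getty documentation. Lowercasing only makes the generated query independent of what the user typed; whether that resolves the case sensitivity I'm seeing depends on where it arises, which I haven't pinned down.

```ruby
# Hypothetical helper (not a questioning_authority API).
# Lowercases the input, keeps only alphanumeric tokens, and marks every
# token as a prefix so "parch" can match "parchment".
# ASSUMPTION: the target index understands Lucene-style "term*" wildcards
# and the uppercase AND operator.
def prefix_query(input)
  tokens = input.to_s.downcase.scan(/[[:alnum:]]+/)
  tokens.map { |t| "#{t}*" }.join(" AND ")
end

puts prefix_query("parch")          # parch*
puts prefix_query("Parchment Pap")  # parchment* AND pap*
```

Stripping everything except alphanumeric characters also keeps quotes and other query-syntax characters out of the generated string, at the cost of ignoring punctuation in the input.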

Those are the main barriers. As I investigate the code/implementation, I've found other issues where it's not doing what I want, but where I'm not certain what other people would want/expect:

  • Should the results be ranked by Lucene relevance/best match or alphabetically? Currently they are alphabetical; I think I may prefer relevance (I think FAST searches in qa are ranked by relevance).
  • Should the number of results be capped/limited? That is probably only feasible if results are sorted by relevance; otherwise it doesn't necessarily make sense to cut off an alphabetically ordered list after some number of entries. But with no limit, you can in some cases get so many results that they are difficult to present in a dropdown menu UX. (FAST search is, I believe, limited.)
  • What should happen with multi-word input? Currently it looks like the code tries to do an "AND" search for all terms, but I'm not sure if this is desired or consistent with other qa vocabularies. Should it be a phrase search instead, with only the last term allowed to be a right-truncated (prefix) partial-word match, for an auto-complete use case?
  • Should all Getty lookups behave the same, or is there a reason for them to differ? Currently the three supported Getty vocabularies all use slightly different queries; I'm not sure whether this is intentional/useful or whether they should all be made consistent. See [1] [2] [3]
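To illustrate the alternatives in the questions above, here is a small in-memory sketch. It does not talk to the Getty endpoint and is not how qa works today: the sample labels, the method names, and the crude ranking rule (a stand-in for Lucene relevance scoring) are all made up for illustration.

```ruby
# Illustrative only: sample labels invented for this sketch.
LABELS = ["paper", "paper pulp", "parchment", "parchment paper",
          "vegetable parchment"].freeze

def tokens(str)
  str.downcase.scan(/[[:alnum:]]+/)
end

# "AND" semantics: every input token must prefix-match some word of the
# label, in any order.
def match_all_terms?(label, input)
  terms = tokens(input)
  return false if terms.empty?
  words = tokens(label)
  terms.all? { |t| words.any? { |w| w.start_with?(t) } }
end

# Phrase semantics: input tokens must appear consecutively and in order;
# only the last token may be a partial (prefix) match.
def match_phrase?(label, input)
  terms = tokens(input)
  words = tokens(label)
  return false if terms.empty? || terms.size > words.size
  words.each_cons(terms.size).any? do |window|
    window[0...-1] == terms[0...-1] && window.last.start_with?(terms.last)
  end
end

# Stand-in for relevance ranking: labels beginning with the input first,
# then shorter labels, then alphabetical; capped at `limit` results.
def ranked(labels, input, limit: 10)
  needle = tokens(input).join(" ")
  labels.select { |l| match_all_terms?(l, input) }
        .sort_by { |l| [l.downcase.start_with?(needle) ? 0 : 1, l.length, l] }
        .first(limit)
end

p match_all_terms?("vegetable parchment", "parch veg")  # true
p match_phrase?("vegetable parchment", "parch veg")     # false
p ranked(LABELS, "parch", limit: 2)  # ["parchment", "parchment paper"]
```

With alphabetical ordering a cap simply drops whatever sorts last, whereas with a ranking like the one above the cap keeps the closest matches, which is why the limit question seems tied to the sort question.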

The hard part is knowing what the desired outcome is. Does my sense of what it "should" do match what other people think? Which parts of the current implementation would be considered bugs, and which are working as intended, even if "as intended" is not what I personally want/need for my use case?

I could maybe find some time to PR improvements here, perhaps as part of a maintenance hours pledge, if it were clear what the consensus is on "correct behavior" or what the community expects.

It might be helpful to look at how the most commonly used vocabularies in qa behave, assume that they are not buggy but working as designed, and then try to make Getty AAT and/or the other Getty vocabularies work similarly to the popular ones. I am not sure if anyone is currently using Getty in production in a way that works for their use cases, or if it's currently just buggy and unused.

@jrochkind
Contributor Author

jrochkind commented Dec 8, 2021

People in GH issues or commit history referring to Getty vocabs include: @jcoyne , @mbklein , @mikeapp , @kdid , @geekscruff , and @elrayle .

Are any of you currently using Getty vocabs in qa in deployed apps?

It would be helpful to have feedback from anyone currently using Getty vocabs via qa on whether AAT and the other Getty vocabs are working as anyone expects/desires, or are currently broken/unused, and then on what behavior would be desired/acceptable, whether to make them usable at all or as improvements.

@mbklein
Member

mbklein commented Dec 15, 2021

We're not using Getty via QA; we access it via authoritex, our Elixir module inspired by (but not a direct port of) QA.
