Interaction between expandTransformedQueryTerm and stemming #8

Open
adamgundry opened this issue May 17, 2022 · 0 comments

At the moment, client code specifies how to normalise/stem a term in the query via transformQueryTerm. When running a query, expandTransformedQueryTerm produces the list of distinct transformations of a term (across all fields), and every one of them is then looked up in the index, irrespective of which field it came from.

A consequence of this is that if any field is stemmed, the query will return documents that match stemmed terms from the query, even if the documents mention the term only in non-stemmed fields. For example, suppose our documents are users, who have a name and a biography, and we stem the biography but not the name. Now a query like "Peters" will match a user whose name is "Peter", because the biography transformation stems "Peters" to "Peter" and that stemmed form is then looked up without regard to field. This might be undesirable.
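
To make the behaviour concrete, here is a minimal toy model of it. It is not the library's implementation: `Field`, `toyStem`, `transformTerm`, `expandTerm`, `indexDoc`, `query` and `exampleIndex` are all hypothetical names, the "stemmer" just strips one trailing `s`, and the only assumption carried over from the description above is that the expanded terms are looked up without reference to the field that produced them.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Char (toLower)
import Data.List (nub)

-- Hypothetical two-field schema matching the users example.
data Field = Name | Biography
  deriving (Eq, Ord, Show, Enum, Bounded)

type Term  = String
type DocId = Int

-- Toy stand-in for a real stemmer: lower-case, then drop one trailing 's'.
toyStem :: Term -> Term
toyStem t = case reverse (map toLower t) of
  ('s' : rest@(_ : _)) -> reverse rest
  _                    -> map toLower t

-- Plays the role of transformQueryTerm (assumed shape):
-- names are only case-folded, biographies are also stemmed.
transformTerm :: Term -> Field -> Term
transformTerm t Name      = map toLower t
transformTerm t Biography = toyStem t

-- Distinct transformations of a term across all fields.
-- The field each transformation came from is discarded here.
expandTerm :: Term -> [Term]
expandTerm t = nub [transformTerm t f | f <- [minBound .. maxBound]]

-- Index keyed by term only; the field tag is stored but never
-- consulted by 'query' below.
type Index = Map.Map Term (Set.Set (DocId, Field))

indexDoc :: DocId -> [(Field, [Term])] -> Index -> Index
indexDoc d fields ix =
    foldr insertTerm ix
      [ (transformTerm t f, f) | (f, ts) <- fields, t <- ts ]
  where
    insertTerm (t, f) = Map.insertWith Set.union t (Set.singleton (d, f))

-- Field-agnostic lookup: every expanded term is matched against
-- postings from every field.
query :: Index -> Term -> Set.Set DocId
query ix t =
  Set.unions
    [ Set.map fst (Map.findWithDefault Set.empty t' ix)
    | t' <- expandTerm t ]

-- User 1 is named "Peter"; the biography never mentions that name.
exampleIndex :: Index
exampleIndex =
  indexDoc 1 [(Name, ["Peter"]), (Biography, ["likes", "cats"])] Map.empty
```

In GHCi, `expandTerm "Peters"` evaluates to `["peters","peter"]` and `query exampleIndex "Peters"` evaluates to `fromList [1]`. The match comes entirely from the biography's stemmed form `"peter"` hitting a posting that was indexed from the unstemmed name field.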

See also the TODO in query. I don't have a clear picture of how to resolve this, other than by simply not stemming at all in indexes where this issue might be relevant.
