Reuse Term queries among features #395

Open
SantaDiver opened this issue Oct 7, 2021 · 3 comments

Comments

@SantaDiver

A very common use case for the LTR plugin is a feature set in which several features share common parts. For example, one feature might match a specific document field (with a match query) while another matches multiple document fields (with a multi_match query, for example). In this case we have to traverse the posting list for the same field and term multiple times (and also initialize the corresponding data structures each time).

It is also known that at the Elasticsearch level, match and other similar queries are just combinations of Term queries.
The idea is: maybe we can somehow reuse Term queries during the advance phase? I had a look at the sources, and it seems this solution demands rewriting the Elasticsearch queries (or at least inheriting from them and rewriting the QueryBuilder along with advance).

Do you have any idea how we can achieve this without rewriting all the Elasticsearch queries? Or maybe there is another way to improve feature calculation performance?
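
To make the idea concrete, here is a minimal Python sketch of sharing term-level scores between two features. It is only an illustration under stated assumptions, not how the plugin or Lucene works: `POSTINGS`, `TermScoreCache`, `match_feature`, and `multi_field_feature` are hypothetical names, the index is a toy dictionary, and raw term frequency stands in for a real similarity such as BM25.

```python
from collections import defaultdict

# Toy inverted index: (field, term) -> {doc_id: term_frequency}.
# Hypothetical data standing in for real posting lists.
POSTINGS = {
    ("title", "fox"): {1: 2, 3: 1},
    ("body", "fox"): {1: 1, 2: 4},
}


class TermScoreCache:
    """Scores each (field, term) pair at most once per request."""

    def __init__(self, postings):
        self.postings = postings
        self.cache = {}
        self.traversals = 0  # how many posting lists were actually walked

    def scores(self, field, term):
        key = (field, term)
        if key not in self.cache:
            self.traversals += 1
            # Assumption: raw term frequency replaces a real similarity.
            plist = self.postings.get(key, {})
            self.cache[key] = {doc: float(tf) for doc, tf in plist.items()}
        return self.cache[key]


def match_feature(cache, field, term):
    """Feature 1: a match-like query on a single field."""
    return dict(cache.scores(field, term))


def multi_field_feature(cache, fields, term):
    """Feature 2: a multi_match-like query that sums per-field scores."""
    totals = defaultdict(float)
    for field in fields:
        for doc, score in cache.scores(field, term).items():
            totals[doc] += score
    return dict(totals)


cache = TermScoreCache(POSTINGS)
f1 = match_feature(cache, "title", "fox")
f2 = multi_field_feature(cache, ["title", "body"], "fox")
print(f1, f2, cache.traversals)
```

In this toy setup the two features together need three term lookups, but only two posting lists are walked because the `("title", "fox")` scores are reused. Doing the same inside real scorers is harder, since they advance through documents lazily rather than materializing all scores up front.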

@worleydl
Collaborator

worleydl commented Oct 7, 2021

I feel like there was some ideation around this a while back that never turned into anything. I think it would require some sort of top-level query wrapper that would keep the scores around and link them up to features, but nothing was ever fleshed out around that. Such a wrapper could utilize the features themselves in the top-level matching query, then keep scores around for the rescore phase.

We certainly welcome ideas around the topic but I don't know if anyone has been thinking about it recently. Maybe @nomoa? I should be digging into this project a little more next week to get us up to date on ES 7.15 and I'll see if I can find the previous discussion around performance optimizations.
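
A rough Python sketch of the wrapper idea, just to illustrate the shape of it. Everything here is hypothetical (`FeatureLoggingQuery`, `term_count`, the toy documents, and the linear `model`); it does not reflect any existing plugin or Elasticsearch API.

```python
class FeatureLoggingQuery:
    """Hypothetical top-level wrapper: matches with the feature queries
    and remembers their scores for a later rescore phase."""

    def __init__(self, features):
        # features: name -> function(doc) returning a float score,
        # or None when the document does not match that feature.
        self.features = features
        self.recorded = {}  # doc_id -> {feature_name: score}

    def search(self, docs):
        """Match phase: a doc is a hit if at least one feature matches."""
        hits = []
        for doc_id, doc in docs.items():
            scores = {}
            for name, fn in self.features.items():
                score = fn(doc)
                if score is not None:
                    scores[name] = score
            if scores:
                self.recorded[doc_id] = scores
                hits.append(doc_id)
        return hits

    def rescore(self, hits, model):
        """Rescore phase: reads recorded scores, runs no feature query."""
        return sorted(hits, key=lambda d: model(self.recorded[d]), reverse=True)


def term_count(field, term):
    """Toy feature: number of occurrences of `term` in `field`."""
    def score(doc):
        n = doc.get(field, "").split().count(term)
        return float(n) if n else None
    return score


docs = {
    1: {"title": "quick fox", "body": "fox fox"},
    2: {"title": "lazy dog", "body": "fox"},
    3: {"title": "cat", "body": "cat"},
}
query = FeatureLoggingQuery({
    "title_fox": term_count("title", "fox"),
    "body_fox": term_count("body", "fox"),
})
hits = query.search(docs)

# Assumed linear model; a real LTR model would be loaded from the feature store.
weights = {"title_fox": 2.0, "body_fox": 0.5}
ranked = query.rescore(hits, lambda s: sum(weights[n] * v for n, v in s.items()))
print(hits, ranked)
```

The point of the sketch is only that each feature is evaluated once per document, during matching, and the rescore step works from the recorded values.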

@worleydl
Collaborator

worleydl commented Oct 7, 2021

Found the previous issue: #11. A WIP branch is mentioned there.

@SantaDiver
Author

SantaDiver commented Oct 8, 2021

@worleydl thank you for the answer!
Reusing the query score in the rescore phase sounds great, but what I am really talking about is reusing scores between features within the featureset itself.
For example, somebody might want to use a featureset like this:

[
    {
        "name": "feature1",
        "params": ["query"],
        "template_language": "mustache",
        "template": {
            "multi_match": {
                "query": "{{query}}",
                "fields": ["field1", "field2"],
                "type": "most_fields"
            }
        }
    },
    {
        "name": "feature2",
        "params": ["query"],
        "template_language": "mustache",
        "template": {
            "multi_match": {
                "query": "{{query}}",
                "fields": ["field1", "field2"],
                "type": "best_fields"
            }
        }
    }
]

These two features differ only in how the per-field term scores are aggregated. My thought is that maybe we can reuse the intermediate results instead of calculating the same thing twice.
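
As a simplified Python sketch of what I mean: compute the per-field scores once, then derive both features from them. This assumes that most_fields roughly sums the per-field scores and that best_fields takes the best one plus `tie_breaker` times the rest (dis_max style); real Elasticsearch scoring has more details. The function names, the sample document, and the term-count score are all made up for illustration.

```python
def per_field_scores(doc, fields, term):
    """Score `term` in each field once (toy score: occurrence count)."""
    return {f: float(doc.get(f, "").split().count(term)) for f in fields}


def most_fields(field_scores):
    """most_fields-like aggregation: add up the per-field scores."""
    return sum(field_scores.values())


def best_fields(field_scores, tie_breaker=0.0):
    """best_fields-like aggregation: best field plus a share of the others."""
    best = max(field_scores.values())
    return best + tie_breaker * (sum(field_scores.values()) - best)


doc = {"field1": "fox jumps over fox", "field2": "fox"}

# The shared intermediate result: computed once, used by both features.
shared = per_field_scores(doc, ["field1", "field2"], "fox")

feature1 = most_fields(shared)
feature2 = best_fields(shared)
print(shared, feature1, feature2)
```

Here `shared` is the only part that touches the document; `feature1` and `feature2` are cheap aggregations on top of it.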
