Reuse Term queries among features #395

SantaDiver · 2021-10-07T22:53:55Z

A very common use case for the LTR plugin is adding several features that share common parts. For example, somebody may want one feature that matches a specific document field (which can be done with a match query) and another feature that matches multiple document fields (with a multi_match query, for example). In this case we have to traverse the postings list for the same field and term multiple times (and also initialize the corresponding data structures each time).

It is also known that, at the Elasticsearch level, match and other similar queries are just combinations of Term queries.
The idea is: maybe we can somehow reuse Term queries during the advance phase? I had a look at the sources, and it seems this solution requires rewriting Elasticsearch's queries (or at least inheriting from them and rewriting the QueryBuilder along with advance).

Do you have any idea how we can achieve this without rewriting all the Elasticsearch queries? Or maybe there is another way to improve feature calculation performance?

worleydl · 2021-10-07T23:16:49Z

I feel like there was some ideation around this a while back that never turned into anything. I think it would require some sort of top level query wrapper that would keep the scores around and link them up to features but nothing was ever fleshed out around that. Such a wrapper could utilize the features themselves in the top level matching query, then keep scores around for the rescore phase.

We certainly welcome ideas around the topic but I don't know if anyone has been thinking about it recently. Maybe @nomoa? I should be digging into this project a little more next week to get us up to date on ES 7.15 and I'll see if I can find the previous discussion around performance optimizations.

worleydl · 2021-10-07T23:22:45Z

Found the previous issue: #11. There is a WIP branch mentioned there.

SantaDiver · 2021-10-08T15:19:29Z

@worleydl thank you for the answer!
Reusing the query score in the rescore phase sounds great, but what I am really talking about is reusing scores between features within the featureset itself.
For example, somebody may want to use a featureset like this:

[
    {
        "name": "feature1",
        "params": ["query"],
        "template_language": "mustache",
        "template": {
            "multi_match": {
                "query": "{{query}}",
                "fields": ["field1", "field2"],
                "type": "most_fields"
            }
        }
    },
    {
        "name": "feature2",
        "params": ["query"],
        "template_language": "mustache",
        "template": {
            "multi_match": {
                "query": "{{query}}",
                "fields": ["field1", "field2"],
                "type": "best_fields"
            }
        }
    }
]

These two features differ only in how the per-field term scores are aggregated. My thought is that maybe we can reuse the intermediate results instead of calculating the same thing twice.
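
To make the idea concrete, here is a minimal sketch in Python of what sharing intermediate results could look like. It is not plugin or Elasticsearch code: `TermScoreCache`, `toy_scorer`, and the score values are all hypothetical, and the aggregation is simplified to "sum of per-field scores" for most_fields and "max plus tie_breaker times the rest" for best_fields, which only approximates what the real queries do.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str, int]  # (field, term, doc_id)


class TermScoreCache:
    """Hypothetical per-request cache of term scores shared by all features."""

    def __init__(self, scorer: Callable[[str, str, int], float]) -> None:
        self._scorer = scorer
        self._cache: Dict[Key, float] = {}
        self.computations = 0  # how many times the underlying scorer ran

    def score(self, field: str, term: str, doc_id: int) -> float:
        key = (field, term, doc_id)
        if key not in self._cache:
            # Stand-in for the expensive part: walking the postings list.
            self._cache[key] = self._scorer(field, term, doc_id)
            self.computations += 1
        return self._cache[key]


def field_scores(cache: TermScoreCache, fields: List[str],
                 terms: List[str], doc_id: int) -> List[float]:
    """Per-field score: sum of the cached term scores for that field."""
    return [sum(cache.score(f, t, doc_id) for t in terms) for f in fields]


def most_fields(cache: TermScoreCache, fields: List[str],
                terms: List[str], doc_id: int) -> float:
    """Simplified most_fields: add up the per-field scores."""
    return sum(field_scores(cache, fields, terms, doc_id))


def best_fields(cache: TermScoreCache, fields: List[str], terms: List[str],
                doc_id: int, tie_breaker: float = 0.0) -> float:
    """Simplified best_fields: best field plus tie_breaker times the others."""
    scores = field_scores(cache, fields, terms, doc_id)
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


# Toy data standing in for real term statistics (assumed values).
TOY_SCORES: Dict[Key, float] = {
    ("field1", "foo", 1): 1.5,
    ("field2", "foo", 1): 0.5,
}


def toy_scorer(field: str, term: str, doc_id: int) -> float:
    return TOY_SCORES.get((field, term, doc_id), 0.0)


cache = TermScoreCache(toy_scorer)
feature1 = most_fields(cache, ["field1", "field2"], ["foo"], doc_id=1)
feature2 = best_fields(cache, ["field1", "field2"], ["foo"], doc_id=1)
```

With this layout, `feature2` is computed entirely from cached values, so `cache.computations` stays at one scorer call per (field, term, document) triple no matter how many features aggregate them. Whether something equivalent can be wired into the scorers without rewriting the Elasticsearch queries is exactly the open question here.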
