Integrate explainability for hybrid query into RRF processor #1037

martin-gaievski · 2024-12-18T21:52:37Z

Description

Adding explainability support to RRF processor

This PR adds explainability functionality to the RRF (Reciprocal Rank Fusion) processor. It's being submitted to the feature branch while the application security review for the RRF feature is in progress.

Key changes:

Implemented explainability support for RRF
Changed explanation details for the case when sorting is enabled by a field other than score: we return 0.0 as the score in the explanation
Minor refactoring and more unit tests

PR for explainability feature #1014

Example response when RRF is part of the search pipeline and the 'explain' flag is set:

"hits": {
        "total": {
            "value": 5,
            "relation": "eq"
        },
        "max_score": 0.032522473,
        "hits": [
            {
                "_shard": "[index-test][0]",
                "_node": "oEdoEKSrReuk_oX9PJilBg",
                "_index": "index-test",
                "_id": "LJS935MBlqE6ecphe2i6",
                "_score": 0.032522473,
                "_source": {
                    "field1": 50,
                    "vector": [
                        4.2,
                        5.5,
                        8.9
                    ],
                    "name": "Why would he go to all that effort for a free pack of ranch dressing?",
                    "category": "story",
                    "price": 10
                },
                "_explanation": {
                    "value": 0.032522473,
                    "description": "rrf combination of:",
                    "details": [
                        {
                            "value": 0.016393442,
                            "description": "rrf, rank_constant [60] normalization of:",
                            "details": [
                                {
                                    "value": 1.0,
                                    "description": "field1:[20 TO 150]",
                                    "details": []
                                }
                            ]
                        },
                        {
                            "value": 0.016129032,
                            "description": "rrf, rank_constant [60] normalization of:",
                            "details": [
                                {
                                    "value": 0.019948136,
                                    "description": "within top 12",
                                    "details": []
                                }
                            ]
                        }
                    ]
                }
            },
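
For readers checking the numbers in the example: the two nested values are consistent with the usual reciprocal rank formula, score = 1 / (rank_constant + rank), with the document at rank 1 in the first sub-query and rank 2 in the second. The sketch below is only an illustration under that assumption. The class and method names are hypothetical and are not taken from the processor code.

```java
// Illustrative sketch only: reproduces the arithmetic behind the example response,
// assuming the standard RRF formula 1 / (rank_constant + rank).
// RrfScoreSketch and rrfScore are hypothetical names, not part of the plugin.
public class RrfScoreSketch {

    // Matches "rank_constant [60]" from the explanation description above.
    static final int RANK_CONSTANT = 60;

    // Contribution of one sub-query for a document at the given 1-based rank.
    static float rrfScore(int rank) {
        return 1.0f / (RANK_CONSTANT + rank);
    }

    public static void main(String[] args) {
        float firstSubQuery = rrfScore(1);   // assumed rank 1 in the first sub-query
        float secondSubQuery = rrfScore(2);  // assumed rank 2 in the second sub-query

        // The combined score is the sum of the per-sub-query contributions.
        float combined = firstSubQuery + secondSubQuery;

        System.out.println("sub-query 1: " + firstSubQuery);
        System.out.println("sub-query 2: " + secondSubQuery);
        System.out.println("combined:    " + combined);
    }
}
```

Under these assumptions the sum agrees with the top-level "value" in the explanation to within float precision.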

Check List

New functionality includes testing.
~~[ ] New functionality has been documented.~~
~~[ ] API changes companion pull request created.~~
Commits are signed per the DCO using --signoff.
~~[ ] Public documentation issue/PR created.~~

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov · 2024-12-18T22:04:18Z

Codecov Report

Attention: Patch coverage is 62.50000% with 12 lines in your changes missing coverage. Please review.

Project coverage is 78.53%. Comparing base (627fcb4) to head (8632f9c).

Files with missing lines	Patch %	Lines
...essor/normalization/RRFNormalizationTechnique.java	63.63%	7 Missing and 1 partial ⚠️
...processor/AbstractScoreHybridizationProcessor.java	60.00%	2 Missing ⚠️
...pensearch/neuralsearch/processor/RRFProcessor.java	75.00%	0 Missing and 1 partial ⚠️
...ssor/combination/RRFScoreCombinationTechnique.java	0.00%	1 Missing ⚠️

Additional details and impacted files

@@                           Coverage Diff                            @@
##             feature/rrf-score-normalization-v2    #1037      +/-   ##
========================================================================
+ Coverage                                 78.49%   78.53%   +0.03%     
- Complexity                                 1070     1075       +5     
========================================================================
  Files                                        90       91       +1     
  Lines                                      3729     3745      +16     
  Branches                                    619      619              
========================================================================
+ Hits                                       2927     2941      +14     
- Misses                                      541      543       +2     
  Partials                                    261      261


Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski · 2024-12-19T18:50:08Z

src/main/java/org/opensearch/neuralsearch/processor/ExplanationResponseProcessor.java

@@ -111,8 +111,9 @@ public SearchResponse processResponse(
                        );
                    }
                    // Create and set final explanation combining all components
+                    Float finalScore = Float.isNaN(searchHit.getScore()) ? 0.0f : searchHit.getScore();


When sorting by a field other than score:

searchHit.getScore() will return Float.NaN

the score in the search hit and in the actual response will be null

We can't pass null as a value for the explanation object. Therefore, I've set it to 0.0 in these cases. This ensures that we always have a valid numeric value for the explanation, even when the score isn't the primary sorting factor.
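
A minimal standalone sketch of that guard, for illustration only. The class and method names here are hypothetical; the actual change is the single line shown in the diff above.

```java
// Illustrative sketch only: isolates the NaN guard from the diff above so it can
// be run without OpenSearch on the classpath.
// ExplanationScoreSketch and explanationValue are hypothetical names.
public class ExplanationScoreSketch {

    // Returns a value that is always safe to use as an explanation score:
    // NaN (seen when hits are sorted by a field other than score) becomes 0.0.
    static float explanationValue(float hitScore) {
        return Float.isNaN(hitScore) ? 0.0f : hitScore;
    }

    public static void main(String[] args) {
        // Sorted by score: the hit score passes through unchanged.
        System.out.println(explanationValue(0.032522473f));

        // Sorted by another field: the hit score is NaN, so 0.0 is used instead.
        System.out.println(explanationValue(Float.NaN));
    }
}
```

Float.isNaN is used rather than a comparison with == because NaN never compares equal to anything, including itself.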

martin-gaievski · 2024-12-19T18:51:28Z

...ain/java/org/opensearch/neuralsearch/processor/combination/RRFScoreCombinationTechnique.java

    @ToString.Include
    public static final String TECHNIQUE_NAME = "rrf";

    // Not currently using weights for RRF, no need to modify or verify these params
-    public RRFScoreCombinationTechnique(final Map<String, Object> params, final ScoreCombinationUtil combinationUtil) {}


those parameters were never used

ryanbogan

LGTM!

heemin32 · 2024-12-20T19:18:48Z

Should we add more unit tests to increase test coverage?
Patch coverage is 62.50000% with 12 lines

martin-gaievski · 2024-12-20T20:02:09Z

Should we add more unit tests to increase test coverage? Patch coverage is 62.50000% with 12 lines

Actual coverage should be higher than that. I've added several unit tests, and the run currently shown in this PR is old (12/18, ~2pm PST). All subsequent runs were throttled because of the shared, limited quota for uploads (example)

martin-gaievski added skip-changelog hybrid search labels Dec 18, 2024

martin-gaievski force-pushed the integrate_explain_feature_with_rrf branch from 8632f9c to 3d67d74 Compare December 18, 2024 23:28

Integrate explainability for hybrid query into RRF processor

2c5e6d0

Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski force-pushed the integrate_explain_feature_with_rrf branch from 3d67d74 to 2c5e6d0 Compare December 19, 2024 00:24

Add case for null/NaN scores and minor refactoring

86c3263

Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski force-pushed the integrate_explain_feature_with_rrf branch from 06532d1 to 86c3263 Compare December 19, 2024 18:01

martin-gaievski marked this pull request as ready for review December 19, 2024 18:12

martin-gaievski requested review from heemin32, navneet1v, VijayanB, vamshin, jmazanec15, naveentatikonda, junqiu-lei, sean-zheng-amazon, model-collapse, zane-neo, vibrantvarun, zhichao-aws, yuye-aws and minalsha as code owners December 19, 2024 18:12

martin-gaievski commented Dec 19, 2024


ryanbogan approved these changes Dec 19, 2024


heemin32 approved these changes Dec 20, 2024


martin-gaievski merged commit 1d93192 into opensearch-project:feature/rrf-score-normalization-v2 Dec 23, 2024
55 checks passed
