Integrate explainability for hybrid query into RRF processor #1037

@martin-gaievski martin-gaievski commented Dec 18, 2024

Description

Adding explainability support to RRF processor

This PR adds explainability functionality to the RRF (Reciprocal Rank Fusion) processor. It's being submitted to the feature branch while the application security review for the RRF feature is in progress.

Key changes:

  • Implemented explainability support for RRF
  • Changed the explanation details for the case where sorting is by a field other than score: we now return 0.0 as the score in the explanation
  • Minor refactoring and more unit tests

PR for explainability feature #1014

Example response when RRF is part of the search pipeline and the 'explain' flag is set:

"hits": {
        "total": {
            "value": 5,
            "relation": "eq"
        },
        "max_score": 0.032522473,
        "hits": [
            {
                "_shard": "[index-test][0]",
                "_node": "oEdoEKSrReuk_oX9PJilBg",
                "_index": "index-test",
                "_id": "LJS935MBlqE6ecphe2i6",
                "_score": 0.032522473,
                "_source": {
                    "field1": 50,
                    "vector": [
                        4.2,
                        5.5,
                        8.9
                    ],
                    "name": "Why would he go to all that effort for a free pack of ranch dressing?",
                    "category": "story",
                    "price": 10
                },
                "_explanation": {
                    "value": 0.032522473,
                    "description": "rrf combination of:",
                    "details": [
                        {
                            "value": 0.016393442,
                            "description": "rrf, rank_constant [60] normalization of:",
                            "details": [
                                {
                                    "value": 1.0,
                                    "description": "field1:[20 TO 150]",
                                    "details": []
                                }
                            ]
                        },
                        {
                            "value": 0.016129032,
                            "description": "rrf, rank_constant [60] normalization of:",
                            "details": [
                                {
                                    "value": 0.019948136,
                                    "description": "within top 12",
                                    "details": []
                                }
                            ]
                        }
                    ]
                }
            },
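
The two nested values in the example are consistent with the standard Reciprocal Rank Fusion formula, where each sub-query contributes 1 / (rank_constant + rank) and the contributions are summed. Below is a minimal sketch of that arithmetic, assuming the document ranked 1st in the first sub-query and 2nd in the second (the ranks are inferred from the values, not stated in the response). The class and method names are hypothetical and are not taken from the plugin code.

```java
public class RrfExample {

    // Contribution of a single sub-query: 1 / (rankConstant + rank), with a 1-based rank.
    static double rrfContribution(int rankConstant, int rank) {
        return 1.0 / (rankConstant + rank);
    }

    // Combined RRF score: the sum of the per-sub-query contributions.
    static double rrfScore(int rankConstant, int... ranks) {
        double sum = 0.0;
        for (int rank : ranks) {
            sum += rrfContribution(rankConstant, rank);
        }
        return sum;
    }

    public static void main(String[] args) {
        int rankConstant = 60; // matches "rank_constant [60]" in the explanation
        System.out.println(rrfContribution(rankConstant, 1)); // assumed rank 1, about 0.0163934
        System.out.println(rrfContribution(rankConstant, 2)); // assumed rank 2, about 0.0161290
        System.out.println(rrfScore(rankConstant, 1, 2));     // about 0.0325225
    }
}
```

With these assumed ranks, the sum agrees with the top-level "rrf combination of:" value up to float rounding.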

Check List

  • [x] New functionality includes testing.
  • [ ] New functionality has been documented.
  • [ ] API changes companion pull request created.
  • [x] Commits are signed per the DCO using --signoff.
  • [ ] Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov bot commented Dec 18, 2024

Codecov Report

Attention: Patch coverage is 62.50000% with 12 lines in your changes missing coverage. Please review.

Project coverage is 78.53%. Comparing base (627fcb4) to head (8632f9c).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...essor/normalization/RRFNormalizationTechnique.java | 63.63% | 7 Missing and 1 partial ⚠️ |
| ...processor/AbstractScoreHybridizationProcessor.java | 60.00% | 2 Missing ⚠️ |
| ...pensearch/neuralsearch/processor/RRFProcessor.java | 75.00% | 0 Missing and 1 partial ⚠️ |
| ...ssor/combination/RRFScoreCombinationTechnique.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@                           Coverage Diff                            @@
##             feature/rrf-score-normalization-v2    #1037      +/-   ##
========================================================================
+ Coverage                                 78.49%   78.53%   +0.03%     
- Complexity                                 1070     1075       +5     
========================================================================
  Files                                        90       91       +1     
  Lines                                      3729     3745      +16     
  Branches                                    619      619              
========================================================================
+ Hits                                       2927     2941      +14     
- Misses                                      541      543       +2     
  Partials                                    261      261              


@martin-gaievski martin-gaievski force-pushed the integrate_explain_feature_with_rrf branch from 8632f9c to 3d67d74 Compare December 18, 2024 23:28
@martin-gaievski martin-gaievski force-pushed the integrate_explain_feature_with_rrf branch from 3d67d74 to 2c5e6d0 Compare December 19, 2024 00:24
@@ -111,8 +111,9 @@ public SearchResponse processResponse(
);
}
// Create and set final explanation combining all components
Float finalScore = Float.isNaN(searchHit.getScore()) ? 0.0f : searchHit.getScore();
Member Author


When sorting by a field other than score:

  1. searchHit.score will be Float.NaN
  2. the score in the search hit and in the actual response will be null

We can't pass null as a value for the explanation object. Therefore, I've set it to 0.0 in these cases. This ensures that we always have a valid numeric value for the explanation, even when the score isn't the primary sorting factor.
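
The fallback described above can be sketched in isolation as follows. This is a simplified illustration, not the processor code itself: the class and method names are hypothetical, and it only shows the NaN check from the diff applied to a plain float.

```java
public class ExplanationScore {

    // Returns a value that is safe to use in an explanation:
    // a NaN score (the hit was sorted by a field other than score) becomes 0.0f.
    static float explanationValue(float hitScore) {
        return Float.isNaN(hitScore) ? 0.0f : hitScore;
    }

    public static void main(String[] args) {
        System.out.println(explanationValue(Float.NaN));    // sorted by another field
        System.out.println(explanationValue(0.032522473f)); // regular scored hit is passed through
    }
}
```

Note that `Float.isNaN` is needed here because a direct comparison such as `score == Float.NaN` is always false in Java.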

@ToString.Include
public static final String TECHNIQUE_NAME = "rrf";

// Not currently using weights for RRF, no need to modify or verify these params
public RRFScoreCombinationTechnique(final Map<String, Object> params, final ScoreCombinationUtil combinationUtil) {}
Member Author


Those parameters were never used.

Member

@ryanbogan ryanbogan left a comment


LGTM!

@heemin32
Collaborator

Should we add more unit tests to increase test coverage?
Patch coverage is 62.50000% with 12 lines missing coverage.

@martin-gaievski
Member Author

> Should we add more unit tests to increase test coverage? Patch coverage is 62.50000% with 12 lines

Actual coverage should be higher than that. I've added several unit tests, and the Codecov run currently shown in this PR is old (12/18, ~2pm PST). All subsequent runs were throttled because of the shared, limited quota for uploads (example).

@martin-gaievski martin-gaievski merged commit 1d93192 into opensearch-project:feature/rrf-score-normalization-v2 Dec 23, 2024
55 checks passed