Implement Copeland Fusion for Hybrid Search #915

Open
wants to merge 11 commits into
base: mainline

OwenPendrighElliott
Contributor

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
This PR adds a new fusion method for disjunction retrieval named "copeland". Drawing on social choice theory, Copeland-based fusion guarantees that a Condorcet winner, if one exists, is ranked first. That is, Copeland-based fusion is a Condorcet method.

Copeland-based fusion was proposed at SIGIR 2024 by Liron Tyomkin et al.: https://dl.acm.org/doi/pdf/10.1145/3626772.3657912
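
For readers unfamiliar with the rule, here is a minimal sketch of a textbook Copeland fusion (pairwise wins minus losses). It is not the searcher code in this PR and may differ from the exact variant in the paper; the function name `copeland_fuse` and the choice to treat a document missing from a list as ranked below every document in that list are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence


def copeland_fuse(rankings: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Fuse ranked lists with a basic Copeland rule (wins minus losses)."""
    # Position of each document in each list (0 = best).
    positions = [{doc: i for i, doc in enumerate(r)} for r in rankings]
    # Unique documents in first-seen order.
    docs = list(dict.fromkeys(doc for r in rankings for doc in r))
    score: Dict[Hashable, int] = {doc: 0 for doc in docs}

    for a, b in combinations(docs, 2):
        a_pref = b_pref = 0
        for pos in positions:
            # Assumption: a document absent from a list ranks below all present ones.
            rank_a = pos.get(a, float("inf"))
            rank_b = pos.get(b, float("inf"))
            if rank_a < rank_b:
                a_pref += 1
            elif rank_b < rank_a:
                b_pref += 1
        # The pairwise majority winner gains a point and the loser drops one.
        if a_pref > b_pref:
            score[a] += 1
            score[b] -= 1
        elif b_pref > a_pref:
            score[b] += 1
            score[a] -= 1

    # sorted() is stable, so documents with equal scores keep first-seen order.
    return sorted(docs, key=lambda d: -score[d])


fused = copeland_fuse([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]])
print(fused)  # ['a', 'b', 'c']
```

In this example "a" beats both "b" and "c" in a majority of lists, so it is the Condorcet winner and ends up first. With only two input lists (for example one tensor list and one lexical list) pairwise ties are common, so the tie-breaking policy matters more than it does here.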

What is the current behavior? (You can also link to an open issue here)

We don't have any Condorcet fusion methods.

What is the new behavior (if this is a feature change)?

We have a Condorcet-based fusion method, Copeland fusion.

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

No

Have unit tests been run against this PR? (Has there also been any additional testing?)

Java tests on the searcher have been run, but not the full unit test suite.

Related Python client changes (link commit/PR here)

none

Related documentation changes (link commit/PR here)

Not done yet

Other information:

Please check if the PR fulfills these requirements

  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes/features)
  • Docs have been added / updated (for bug fixes / features)

@OwenPendrighElliott OwenPendrighElliott marked this pull request as ready for review July 29, 2024 06:35
@OwenPendrighElliott OwenPendrighElliott marked this pull request as draft July 29, 2024 06:40
@OwenPendrighElliott
Contributor Author

Converting back to draft, as I think I see some potential optimisations.

@OwenPendrighElliott OwenPendrighElliott marked this pull request as ready for review July 29, 2024 23:18
@farshidz farshidz temporarily deployed to marqo-test-suite August 23, 2024 00:43 — with GitHub Actions Inactive
src/marqo/README.md (resolved)
@@ -578,10 +578,10 @@ def _to_vespa_hybrid_query(self, marqo_query: MarqoHybridQuery) -> Dict[str, Any

         query = {k: v for k, v in query.items() if v is not None}

-        if marqo_query.hybrid_parameters.rankingMethod in {RankingMethod.RRF}:  # TODO: Add NormalizeLinear
+        if marqo_query.hybrid_parameters.rankingMethod in [RankingMethod.RRF]:  # TODO: Add NormalizeLinear
             query["marqo__hybrid.alpha"] = marqo_query.hybrid_parameters.alpha
Collaborator

If alpha and k aren't relevant for copeland, we need validation to catch this
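
One possible shape for such a check, as a sketch only: it assumes that alpha and k really are ignored by Copeland fusion, and every name here (`validate_hybrid_parameters`, the `rrf_k` argument, the `Copeland` enum member and its value) is hypothetical rather than taken from the codebase.

```python
from enum import Enum
from typing import Optional


class RankingMethod(str, Enum):
    # Stand-in for the project's enum; the Copeland member is hypothetical.
    RRF = "rrf"
    Copeland = "copeland"


def validate_hybrid_parameters(
    ranking_method: RankingMethod,
    alpha: Optional[float] = None,
    rrf_k: Optional[int] = None,
) -> None:
    """Reject parameters that the chosen ranking method would ignore."""
    if ranking_method != RankingMethod.Copeland:
        return
    # Collect the names of parameters the caller explicitly set.
    supplied = [
        name
        for name, value in (("alpha", alpha), ("rrf_k", rrf_k))
        if value is not None
    ]
    if supplied:
        raise ValueError(
            f"{', '.join(supplied)} cannot be set when rankingMethod is "
            f"'{ranking_method.value}'"
        )


validate_hybrid_parameters(RankingMethod.RRF, alpha=0.5, rrf_k=60)  # accepted
try:
    validate_hybrid_parameters(RankingMethod.Copeland, alpha=0.5)
except ValueError as err:
    print(err)  # alpha cannot be set when rankingMethod is 'copeland'
```

Failing loudly rather than silently dropping the values means a caller who tunes alpha under Copeland finds out that it has no effect.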

)

self.assertIn("hits", hybrid_res)
self.assertEqual(hybrid_res["hits"][0]["_id"], "hippo text")
Collaborator

Why are we not checking the score to detect score regression?
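
A score check along these lines could sit next to the `_id` assertion. This is a sketch under assumptions: the helper name `assert_top_hit` is hypothetical, the response dict is a stand-in rather than real output, the score value is a placeholder and not a measured result, and it assumes each hit carries a `_score` field.

```python
import math
from typing import Any, Dict


def assert_top_hit(
    res: Dict[str, Any],
    expected_id: str,
    expected_score: float,
    rel_tol: float = 1e-6,
) -> None:
    """Check both the identity and the score of the first hit."""
    top = res["hits"][0]
    assert top["_id"] == expected_id, f"unexpected top hit: {top['_id']}"
    # Compare with a tolerance so harmless float noise does not fail the test.
    assert math.isclose(top["_score"], expected_score, rel_tol=rel_tol), (
        f"score changed: got {top['_score']}, expected {expected_score}"
    )


# Stand-in response; the score is a placeholder, not a real measurement.
hybrid_res = {"hits": [{"_id": "hippo text", "_score": 2.0}]}
assert_top_hit(hybrid_res, "hippo text", 2.0)
```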

int finalLength = Math.max(hitsTensor.size(), hitsLexical.size());

// Combine hits from both lists and update the raw score attributes
Map<URI, Hit> combinedHitsMap = new LinkedHashMap<>();
Collaborator

Why do you need linked?

2 participants