Implement Copeland Fusion for Hybrid Search #915

OwenPendrighElliott · 2024-07-24T03:19:53Z

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
This PR adds a new fusion method, "copeland", for disjunction retrieval in hybrid search. Drawing on social choice theory, Copeland-based fusion guarantees that a Condorcet winner (a document that beats every other document in pairwise comparisons across the input rankings), if one exists, is ranked first; that is, Copeland-based fusion is a Condorcet method.

Copeland-based fusion was proposed at SIGIR 2024 by Liron Tyomkin et al.: https://dl.acm.org/doi/pdf/10.1145/3626772.3657912
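To make the Condorcet property concrete, here is a minimal Python sketch of Copeland fusion over ranked lists of document IDs. It is not the PR's Java implementation in `HybridSearcher`: the function names are hypothetical, and it assumes that a document missing from a list ranks below every document that list contains. Each pair of documents is compared by majority vote across the lists; the winner gains a point, the loser loses one, and documents are ordered by total score.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def copeland_scores(rankings: Sequence[List[str]]) -> Dict[str, int]:
    """Return Copeland scores (pairwise wins minus pairwise losses) per document."""
    # Unique documents in first-seen order (dicts preserve insertion order).
    docs = list(dict.fromkeys(doc for ranking in rankings for doc in ranking))
    # Position of each document in each list.
    positions = [{doc: i for i, doc in enumerate(ranking)} for ranking in rankings]
    scores = {doc: 0 for doc in docs}
    for a, b in combinations(docs, 2):
        prefer_a = prefer_b = 0
        for ranking, pos in zip(rankings, positions):
            # Assumption: a missing document sits just below the end of the list.
            pa = pos.get(a, len(ranking))
            pb = pos.get(b, len(ranking))
            if pa < pb:
                prefer_a += 1
            elif pb < pa:
                prefer_b += 1
        # Majority contest: winner +1, loser -1, a tie changes nothing.
        if prefer_a > prefer_b:
            scores[a] += 1
            scores[b] -= 1
        elif prefer_b > prefer_a:
            scores[b] += 1
            scores[a] -= 1
    return scores


def copeland_fuse(rankings: Sequence[List[str]]) -> List[str]:
    """Order documents by descending Copeland score."""
    scores = copeland_scores(rankings)
    # sorted() is stable, so equal scores keep first-seen order.
    return sorted(scores, key=lambda doc: -scores[doc])
```

With only two input lists (tensor and lexical), one document beats another only when both lists agree; any disagreement is a tie. A Condorcet winner beats all n - 1 other documents, so it reaches the maximum score of n - 1, which no other document can match because each of them loses to the winner; it is therefore placed first. The pairwise loop costs O(n²·m) for n unique documents and m lists.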

What is the current behavior? (You can also link to an open issue here)

We don't have any Condorcet fusion methods.

What is the new behavior (if this is a feature change)?

We now have a Condorcet fusion method: Copeland fusion ("copeland").

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

No

Have unit tests been run against this PR? (Has there also been any additional testing?)

Java tests were run on the searcher, but the full unit test suite was not.

Related Python client changes (link commit/PR here)

none

Related documentation changes (link commit/PR here)

Not done yet

Other information:

Please check if the PR fulfills these requirements

The commit message follows our guidelines
Tests for the changes have been added (for bug fixes/features)
Docs have been added / updated (for bug fixes / features)

OwenPendrighElliott · 2024-07-29T06:40:56Z

Converting back to draft as I think I see some potential optimisations

src/marqo/README.md

farshidz · 2024-08-23T01:15:45Z

src/marqo/core/structured_vespa_index/structured_vespa_index.py

@@ -578,10 +578,10 @@ def _to_vespa_hybrid_query(self, marqo_query: MarqoHybridQuery) -> Dict[str, Any

        query = {k: v for k, v in query.items() if v is not None}

-        if marqo_query.hybrid_parameters.rankingMethod in {RankingMethod.RRF}: # TODO: Add NormalizeLinear
+        if marqo_query.hybrid_parameters.rankingMethod in [RankingMethod.RRF]: # TODO: Add NormalizeLinear
            query["marqo__hybrid.alpha"] = marqo_query.hybrid_parameters.alpha


If alpha and k aren't relevant for Copeland, we need validation to catch this.
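One way to add the validation the reviewer asks for is to reject `alpha` and the RRF `k` parameter when the ranking method is Copeland. The sketch below is hypothetical: the `RankingMethod` members, the `HybridParameters` class, and the `rrfK` field name are assumptions standing in for Marqo's real models, and it assumes both parameters default to `None` so that an explicitly supplied value can be detected.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RankingMethod(str, Enum):
    # Hypothetical stand-in for Marqo's RankingMethod enum.
    RRF = "rrf"
    Copeland = "copeland"


@dataclass
class HybridParameters:
    # Hypothetical subset of the hybrid parameters; field names are assumptions.
    rankingMethod: RankingMethod = RankingMethod.RRF
    alpha: Optional[float] = None
    rrfK: Optional[int] = None

    def __post_init__(self) -> None:
        # Copeland fusion uses only rank order, so alpha and k have no effect.
        if self.rankingMethod == RankingMethod.Copeland:
            unsupported = [name for name in ("alpha", "rrfK")
                           if getattr(self, name) is not None]
            if unsupported:
                raise ValueError(
                    f"{', '.join(unsupported)} cannot be set when "
                    f"rankingMethod is 'copeland'"
                )
```

If the real parameters carry non-`None` defaults, the check would instead need to detect whether the user set them explicitly.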

farshidz · 2024-08-23T01:21:44Z

tests/tensor_search/integ_tests/test_hybrid_search.py

+                    )
+
+                    self.assertIn("hits", hybrid_res)
+                    self.assertEqual(hybrid_res["hits"][0]["_id"], "hippo text")


Why are we not checking the score to detect score regression?
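Asserting on the top hit's score as well as its ID would catch score regressions that leave the ordering unchanged. A minimal sketch of such a check, assuming each hit carries a numeric `_score`; the helper `assert_top_hit` is hypothetical, and `expected_score` has to be recorded from a known-good run rather than guessed.

```python
import math
from typing import Any, Dict


def assert_top_hit(result: Dict[str, Any], expected_id: str,
                   expected_score: float, rel_tol: float = 1e-6) -> None:
    """Check the top hit's ID and its score so score regressions fail the test."""
    assert result.get("hits"), "no hits returned"
    top = result["hits"][0]
    assert top["_id"] == expected_id, f"top hit was {top['_id']!r}"
    # Compare with a tolerance: floating-point scores may differ in the last bits.
    assert math.isclose(top["_score"], expected_score, rel_tol=rel_tol), (
        f"score {top['_score']} differs from expected {expected_score}"
    )
```

Inside a `unittest.TestCase`, `self.assertAlmostEqual(hit["_score"], expected, places=...)` achieves the same thing.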

farshidz · 2024-08-23T06:49:24Z

vespa/src/main/java/ai/marqo/search/HybridSearcher.java

+        int finalLength = Math.max(hitsTensor.size(), hitsLexical.size());
+
+        // Combine hits from both lists and update the raw score attributes
+        Map<URI, Hit> combinedHitsMap = new LinkedHashMap<>();


Why do you need a LinkedHashMap here?
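The question turns on iteration order: Java's `HashMap` makes no guarantee about iteration order, while `LinkedHashMap` iterates in insertion order. A Python analogue (Python dicts also preserve insertion order) shows where that matters: when hits from both lists are merged by ID, a later stable sort on fused scores breaks ties by first-seen order. The function and the `*_rank` field names are hypothetical.

```python
from typing import Dict, List


def combine_hits(tensor_hits: List[dict], lexical_hits: List[dict]) -> Dict[str, dict]:
    """Merge two hit lists keyed by document ID, keeping first-seen order."""
    combined: Dict[str, dict] = {}
    for source, hits in (("tensor", tensor_hits), ("lexical", lexical_hits)):
        for rank, hit in enumerate(hits):
            # setdefault keeps a repeated ID at its first insertion position.
            entry = combined.setdefault(hit["_id"], {"_id": hit["_id"]})
            # Record this list's rank on the merged hit.
            entry[f"{source}_rank"] = rank
    return combined
```

Whether `HybridSearcher` actually depends on this order is not stated in the thread; if the combined map's iteration order never reaches the output, a plain `HashMap` would suffice.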

implement copeland fusion

b33945d

OwenPendrighElliott temporarily deployed to marqo-test-suite July 24, 2024 03:21 — with GitHub Actions Inactive

add integration tests for copeland fusion and zookeeper port expose in developer guide

3c16561

OwenPendrighElliott requested review from vicilliar and farshidz July 29, 2024 06:34

OwenPendrighElliott marked this pull request as ready for review July 29, 2024 06:35

Merge branch 'mainline' into owen/copeland_fusion

0a465bf

OwenPendrighElliott temporarily deployed to marqo-test-suite July 29, 2024 06:36 — with GitHub Actions Inactive

OwenPendrighElliott marked this pull request as draft July 29, 2024 06:40

OwenPendrighElliott added 2 commits July 29, 2024 17:34

remove toStrings and improve O(n log(n)) loop to O(n) for creating unique hits

fc05e2e

Merge branch 'owen/copeland_fusion' of github.com:marqo-ai/marqo into owen/copeland_fusion

eeafa57

OwenPendrighElliott temporarily deployed to marqo-test-suite July 29, 2024 07:39 — with GitHub Actions Inactive

OwenPendrighElliott marked this pull request as ready for review July 29, 2024 23:18

Merge branch 'mainline' into owen/copeland_fusion

6edc454

OwenPendrighElliott temporarily deployed to marqo-test-suite July 30, 2024 22:58 — with GitHub Actions Inactive

OwenPendrighElliott and others added 3 commits August 15, 2024 12:23

updated with mainline latest

5c1f61b

Merge branch 'owen/copeland_fusion' of github.com:marqo-ai/marqo into owen/copeland_fusion

68d40bd

Merge branch 'mainline' into owen/copeland_fusion

3a75f91

OwenPendrighElliott had a problem deploying to marqo-test-suite August 15, 2024 02:28 — with GitHub Actions Failure

OwenPendrighElliott temporarily deployed to marqo-test-suite August 15, 2024 03:40 — with GitHub Actions Inactive

Merge branch 'mainline' into owen/copeland_fusion

20e8d34

OwenPendrighElliott had a problem deploying to marqo-test-suite August 19, 2024 02:54 — with GitHub Actions Failure

OwenPendrighElliott temporarily deployed to marqo-test-suite August 19, 2024 05:39 — with GitHub Actions Inactive

Merge branch 'mainline' into owen/copeland_fusion

6b05b79

farshidz temporarily deployed to marqo-test-suite August 23, 2024 00:43 — with GitHub Actions Inactive

farshidz reviewed Aug 23, 2024
