-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Pagination in hybrid query #963
[Feature] Pagination in hybrid query #963
Conversation
src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/HybridQuery.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/HybridQueryBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/HybridQueryBuilder.java
Outdated
Show resolved
Hide resolved
int paginationDepth; | ||
HybridQuery hybridQuery; | ||
Query query = searchContext.query(); | ||
if (query instanceof BooleanQuery) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when such scenario can happen? we should not allow hybrid if it's not the top clause
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the comment on top why it is written like that. Basically in case of nested fields and alias filter, hybrid query gets wrapped under bool query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simple case with nested fields is taken care of in QueryPhaseSearcher class. This logic doesn't belong here, unless this is a special scenario that we missed in that extract query method I mentioned above.
Query query = searchContext.query(); | ||
if (query instanceof BooleanQuery) { | ||
BooleanQuery booleanQuery = (BooleanQuery) query; | ||
hybridQuery = (HybridQuery) booleanQuery.clauses().get(0).getQuery(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks hacky, can we avoid this logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the comment on top why it is written like that. Basically in case of nested fields and alias filter, hybrid query gets wrapped under bool query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my previous comment, this should be handled in here
src/main/java/org/opensearch/neuralsearch/search/query/HybridCollectorManager.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/search/query/HybridCollectorManager.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/search/query/HybridCollectorManager.java
Outdated
Show resolved
Hide resolved
.github/CODEOWNERS
Outdated
@@ -1,2 +1,2 @@ | |||
# This should match the owning team set up in https://github.com/orgs/opensearch-project/teams | |||
* @heemin32 @navneet1v @VijayanB @vamshin @jmazanec15 @naveentatikonda @junqiu-lei @martin-gaievski @sean-zheng-amazon @model-collapse @zane-neo @ylwu-amzn @jngz-es @vibrantvarun @zhichao-aws @yuye-aws | |||
* @heemin32 @navneet1v @VijayanB @vamshin @jmazanec15 @naveentatikonda @junqiu-lei @martin-gaievski @sean-zheng-amazon @model-collapse @zane-neo @vibrantvarun @zhichao-aws @yuye-aws @minalsha |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like artifacts of improper rebase, can you please rebase on main properly
@@ -48,16 +50,23 @@ public final class HybridQueryBuilder extends AbstractQueryBuilder<HybridQueryBu | |||
public static final String NAME = "hybrid"; | |||
|
|||
private static final ParseField QUERIES_FIELD = new ParseField("queries"); | |||
private static final ParseField DEPTH_FIELD = new ParseField("pagination_depth"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private static final ParseField DEPTH_FIELD = new ParseField("pagination_depth"); | |
private static final ParseField PAGINATION_DEPTH_FIELD = new ParseField("pagination_depth"); |
public final class HybridQuery extends Query implements Iterable<Query> { | ||
|
||
private final List<Query> subQueries; | ||
private Integer paginationDepth; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this is not primitive int? Operating with wrapper class is potentially error prone when boxing/unboxing a null value.
querySearchResult.topDocs(updatedTopDocsAndMaxScore, querySearchResult.sortValueFormats()); | ||
} | ||
|
||
final int from = querySearchResults.get(0).from(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need final
?
querySearchResult.topDocs(updatedTopDocsAndMaxScore, querySearchResult.sortValueFormats()); | ||
} | ||
|
||
final int from = querySearchResults.get(0).from(); | ||
if (from > 0 && from > totalScoreDocsCount) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first check looks redundant, can't we rely only on from > totalScoreDocsCount
?
@@ -637,6 +639,7 @@ public void testWrappedQueryWithFilter_whenIndexAliasHasFilterAndIndexWithNested | |||
|
|||
HybridQueryBuilder hybridQueryBuilder = new HybridQueryBuilder(); | |||
hybridQueryBuilder.add(QueryBuilders.existsQuery(TEST_TEXT_FIELD_NAME_1)); | |||
// hybridQueryBuilder.paginationDepth(10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this intentional?
@SneakyThrows | ||
public void testHybridQuery_whenFromIsSetInSearchRequest_thenFail() { | ||
public void testPaginationDepth_whenSubqueriesCountIsGreaterThanFive_thenFail() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are we testing here? check for max sub-query limit is part of the HybridQueryBuilderTests.
@@ -97,6 +109,9 @@ protected void doXContent(XContentBuilder builder, Params params) throws IOExcep | |||
queryBuilder.toXContent(builder, params); | |||
} | |||
builder.endArray(); | |||
if (isClusterOnOrAfterMinReqVersionForPaginationInHybridQuery()) { | |||
builder.field(DEPTH_FIELD.getPreferredName(), paginationDepth == null ? DEFAULT_PAGINATION_DEPTH : paginationDepth); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we're doing today's logic? from this statement we'll either get user provided value or default which is 10. Today's logic is equivalent to PD equals to 1, I'm not exactly sure, but definitely not 10.
if (paginationDepth != null | ||
&& (paginationDepth < LOWER_BOUND_OF_PAGINATION_DEPTH || paginationDepth > UPPER_BOUND_OF_PAGINATION_DEPTH)) { | ||
throw new IllegalArgumentException( | ||
String.format(Locale.ROOT, "Pagination depth should lie in the range of 1-1000. Received: %s", paginationDepth) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please came up with better error message. The range is static here, but that's different in the logic where you have variables for lower and upper bounds, and we can drop the user-provided value as that's potentially unsafe, actual value should be obvious to user from the search request
} | ||
|
||
@SneakyThrows | ||
public void testPaginationOnSingleShard_whenConcurrentSearchEnabled_thenSuccessful() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you're basically put all tests for pagination to integ test. Can this be moved to unit tests, so we have only minimal increase in new integ tests. If that's not possible please consider moving them to a new class, this one because way overloaded, about 1000 lines.
Signed-off-by: Varun Jain <[email protected]>
Signed-off-by: Varun Jain <[email protected]>
Signed-off-by: Varun Jain <[email protected]>
Signed-off-by: Varun Jain <[email protected]>
Signed-off-by: Varun Jain <[email protected]>
Signed-off-by: Varun Jain <[email protected]>
Signed-off-by: Varun Jain <[email protected]>
3a9344f
to
4580337
Compare
Signed-off-by: Varun Jain <[email protected]> Signed-off-by: Martin Gaievski <[email protected]>
4580337
to
dc9020c
Compare
Signed-off-by: Martin Gaievski <[email protected]>
Description
This PR contains changes for enabling support for pagination in hybrid query.
The highlight of this PR are
Related Issues
#280
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.