Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Bulk Indexing POST API returns 200 and response.IsValid = true, but 1 document was not indexed. #16397

Open
kknd4eva opened this issue Oct 21, 2024 · 0 comments
Labels
bug Something isn't working Indexing Indexing, Bulk Indexing and anything related to indexing untriaged

Comments

@kknd4eva
Copy link

kknd4eva commented Oct 21, 2024

Describe the bug

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):

AWS OpenSearch Serverless
.NET8
OpenSearch.Client 1.8.0
OpenSearch.NET.Auth.AwsSigV4 1.8.0

Describe the issue:
We have an OpenSearch serverless collection which has 2 indexes:
salesorders
backorders

when we index, we receive a sales order document with 1 or more lines. The whole document goes to 'salesorders' and the individual lines are turned into documents and indexed to backorders. This works fine for 99.9999% of the millions of documents we have indexed. However we have 1 instance, where the sales order line got indexed, but the sales order document did not.

We combine our document indexing into a single bulk API call (POST). We log any exception even with an individually indexed document as part of the bulk call, by using Response.IsValid:

        if (response.IsValid is false)
        {
            throw response?.OriginalException ??
                throw new OpenSearchClientException($"OpenSearch - {nameof(_client.BulkAsync)} failed to process request");
        }

This is specified as the right property to use to catch anything failed from the bulk API request:

2024-10-21 15_06_10-Window

In addition we also log errors here, and we have seen this work with genuine errors:

Untitled

For this particular request, everything returned 200, no errors at all:

2024-10-19T01:13:25.974+11:00 START RequestId: d25c49fa-444d-567e-9f39-16762b7487ce Version: $LATEST
2024-10-19T01:13:25.974+11:00 {"order_reference":"1061207807","timestamp":"2024-10-18T14:13:25.9743234Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"Received 1 messages"}
2024-10-19T01:13:25.974+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:25.9744389Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"Processing event for message id: 3ba4b295-e319-44bc-b80a-013ae7618c74"}
2024-10-19T01:13:25.974+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:25.9744577Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"Received sales order reference 5003047545 with (1) line items"}
2024-10-19T01:13:25.974+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:25.9744726Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"Sending (1) sales order/s to be indexed with a total number of line items (1)"}
2024-10-19T01:13:26.011+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:26.0116322Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"HEAD https://kbpqoexy1rgxpdq6m1f6.ap-southeast-2.aoss.amazonaws.com/salesorders 200"}
2024-10-19T01:13:26.034+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:26.0343743Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"HEAD https://kbpqoexy1rgxpdq6m1f6.ap-southeast-2.aoss.amazonaws.com/backorders 200"}
2024-10-19T01:13:26.203+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:26.2037842Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"POST https://kbpqoexy1rgxpdq6m1f6.ap-southeast-2.aoss.amazonaws.com/_bulk 200"}
2024-10-19T01:13:26.203+11:00 {"order_reference":"5003047545","timestamp":"2024-10-18T14:13:26.2038542Z","level":"Information","service":"SOEB2B-ExportSalesOrderHistory","name":"AWS.Lambda.Powertools.Logging.Logger","message":"Bulk api response: [2] items processed successfully. api took 150 ms."}
2024-10-19T01:13:26.204+11:00 END RequestId: d25c49fa-444d-567e-9f39-16762b7487ce

Reports 200 responses across the board and both items indexed, but when we search for the specific sales order even after hours, it does not exist:

GET /salesorders/_doc/5003047545

{
  "_index": "salesorders",
  "_id": "5003047545",
  "found": false
}

But checking for the backorder index, its document from this bulk API call exists fine.

These are our connection settings:

return new ConnectionSettings(new Uri(openSearchDomain), connection)
                       .DisableDirectStreaming() // This will enable logging of request and response bodies
                       .OnRequestCompleted(apiCallDetails =>
                       {
                           if (apiCallDetails != null && apiCallDetails.Success)
                           {
                               logger.LogInformation(apiCallDetails.GetApiInfo());
                           }
                       });

Related component

Indexing

To Reproduce

Send a bulk API request to index multiple documents against up to 2 indexes:

private async Task SendBulkAsync(BulkRequest bulkRequest)
    {
        var response = await _retryPolicy.ExecuteAsync
            (
                async () => await _client.BulkAsync(bulkRequest, CancellationToken.None)
            );

        response.LogApiResponseMessage(_logger);

        if (response.IsValid is false)
        {
            throw response?.OriginalException ??
                throw new OpenSearchClientException($"OpenSearch - {nameof(_client.BulkAsync)} failed to process request");
        }
    }

Expected behavior

Response with 200 and IsValid = true from bulk API response. All documented indexed.

Additional Details

Plugins
Please list all plugins currently enabled.

Screenshots
If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

  • OS: [e.g. iOS]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

@kknd4eva kknd4eva added bug Something isn't working untriaged labels Oct 21, 2024
@github-actions github-actions bot added the Indexing Indexing, Bulk Indexing and anything related to indexing label Oct 21, 2024
@kknd4eva kknd4eva changed the title [BUG] <title> [BUG] Bulk Indexing POST API returns 200 and response.IsValid = true, but 1 document was not indexed. Oct 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Indexing Indexing, Bulk Indexing and anything related to indexing untriaged
Projects
None yet
Development

No branches or pull requests

1 participant