Ensure Consistent Document State During Replay with Conditional Update Transformations #1089

Open
sumobrian opened this issue Oct 22, 2024 · 0 comments
Labels
enhancement New feature or request

sumobrian (Collaborator) commented Oct 22, 2024

Is your feature request related to a problem?

In an eventually consistent system, the order in which requests are applied cannot always be guaranteed. This is a problem when the same traffic is replayed against a target and the target's state must stay consistent with the source. Even if high-frequency updates are not applied in the expected order on the source, the final state of each document can still be made identical on the source and the target: an incoming update is written only if the document currently stored is older than the incoming one. Some applications implement this logic in their own codebase. With capture and replay, the original request could instead be replayed through a transform that guarantees consistent state between the source and the target.
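
The guard described above can be illustrated with a small in-memory sketch. It is not part of the capture and replay tool; `apply_if_newer`, `replay`, and the `timestamp` field name are assumptions made for illustration. The sketch shows that when a write is applied only if it is newer than the stored document, two stores that receive the same updates in different orders end in the same state.

```python
from typing import Dict, List

Doc = Dict[str, object]


def apply_if_newer(store: Dict[str, Doc], doc_id: str, incoming: Doc) -> bool:
    """Write `incoming` only if nothing is stored yet or the stored doc is older.

    Returns True when the write was applied, False when it was skipped.
    """
    current = store.get(doc_id)
    if current is not None and current["timestamp"] >= incoming["timestamp"]:
        return False  # stored document is as new or newer; skip the stale write
    store[doc_id] = dict(incoming)
    return True


def replay(updates: List[Doc]) -> Dict[str, Doc]:
    """Apply a sequence of updates to a fresh store, in the order given."""
    store: Dict[str, Doc] = {}
    for update in updates:
        apply_if_newer(store, "doc-1", update)
    return store


updates = [
    {"value": "a", "timestamp": 1},
    {"value": "b", "timestamp": 2},
    {"value": "c", "timestamp": 3},
]

# Same updates, different arrival order on each side.
source_state = replay(updates)
target_state = replay([updates[2], updates[0], updates[1]])
```

Both `source_state` and `target_state` end with the document whose timestamp is 3. Updates with equal timestamps are skipped in this sketch, so ties would need a separate tie-breaker to converge.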

What solution would you like?

We propose adding a transformation feature within the capture and replay tool that verifies the timestamp of incoming updates against the stored version of the document. The transformation would ensure that an update is only applied if the timestamp of the incoming request is newer than the existing version on the target cluster. This would help maintain a consistent document state between the source and target, even in scenarios with high-frequency updates or out-of-order requests.

The solution could leverage OpenSearch’s existing document metadata and timestamp features, adding logic in the replay phase to enforce order based on timestamps.
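
As a rough sketch of what such a transformation might produce, the function below rewrites a captured document write into a scripted `_update` request that carries the timestamp guard. The function name `to_conditional_update` and its return shape are hypothetical and do not reflect the replayer's actual transform interface. It also assumes every document carries a numeric `timestamp` field, and it uses `putAll` to merge fields rather than replace the whole document.

```python
from typing import Any, Dict, Tuple

# Painless guard in the same spirit as the script in this issue: write only
# when the incoming timestamp is newer than the stored one. putAll merges the
# incoming fields into the stored document (a simplification, not a replace).
GUARD_SCRIPT = (
    "if (ctx._source.timestamp < params.doc.timestamp) "
    "{ ctx._source.putAll(params.doc); }"
)


def to_conditional_update(
    index: str, doc_id: str, doc: Dict[str, Any]
) -> Tuple[str, str, Dict[str, Any]]:
    """Turn a captured document write into a guarded _update request.

    Returns (method, path, body). Hypothetical helper for illustration only.
    """
    if "timestamp" not in doc:
        raise ValueError("document has no 'timestamp' field to compare against")
    body = {
        "script": {"source": GUARD_SCRIPT, "params": {"doc": doc}},
        # If the document does not exist on the target yet, insert it as-is.
        "upsert": doc,
    }
    return "POST", f"/{index}/_update/{doc_id}", body


method, path, body = to_conditional_update(
    "index_name",
    "document_id",
    {"value": "updated value", "timestamp": 1672531199000},
)
```

The `upsert` section covers the case where the target has not seen the document yet, so the first write for an ID is stored without running the script.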

What alternatives have you considered?

  1. Custom Application Logic: Applications could be modified to include timestamp-based checks or version control directly in the codebase. However, this approach requires developers to write and maintain custom code, and it doesn’t easily extend to scenarios involving legacy systems or third-party applications.
  2. Eventual Consistency Tuning: Alternatively, users could tune consistency settings within their clusters to reduce the impact of out-of-order requests. However, this often introduces trade-offs with latency and scalability, and may not resolve all cases of out-of-order updates.

Do you have any additional context?

In OpenSearch, a scripted request to the _update API can check conditions such as timestamps or version numbers before modifying a document, which gives precise control over when an update is applied. This kind of control is valuable for maintaining data consistency during capture and replay migrations.

For example, the following script can be used in OpenSearch to update a document only if the incoming timestamp is newer:

POST /index_name/_update/document_id
{
  "script": {
    "source": "if (ctx._source.timestamp < params.new_timestamp) { ctx._source.value = params.new_value; ctx._source.timestamp = params.new_timestamp; }",
    "params": {
      "new_value": "updated value",
      "new_timestamp": 1672531199000
    }
  }
}

This example shows how OpenSearch users today can use scripting within the _update API to manage document versions based on timestamps. A similar approach can be integrated into the capture and replay transformation logic to achieve consistent states across clusters.

Examples

In OpenSearch today, users often address this issue by employing scripting within the _update API to enforce constraints like timestamps or version numbers. For instance, using a script to compare timestamps before applying an update helps ensure that only the latest data is written, thereby maintaining consistency.

Another technique is OpenSearch’s optimistic concurrency control, which uses the if_seq_no and if_primary_term parameters. A write that specifies these parameters succeeds only if the document’s current sequence number and primary term match the given values; otherwise it is rejected with a version conflict. This method is commonly used to prevent conflicts when updating documents concurrently.
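
For illustration, the helper below builds the request path for a write guarded by these two parameters. The helper name `conditional_index_path` and the example values are assumptions; only the `if_seq_no` and `if_primary_term` query parameters come from the API itself.

```python
from urllib.parse import urlencode


def conditional_index_path(index: str, doc_id: str, seq_no: int, primary_term: int) -> str:
    """Build the path for an index request guarded by sequence number and primary term.

    Hypothetical helper: it only assembles the URL and does not send a request.
    """
    query = urlencode({"if_seq_no": seq_no, "if_primary_term": primary_term})
    return f"/{index}/_doc/{doc_id}?{query}"


path = conditional_index_path("index_name", "document_id", seq_no=10, primary_term=1)
# path == "/index_name/_doc/document_id?if_seq_no=10&if_primary_term=1"
```

Sequence numbers and primary terms are assigned by the cluster that stores the document, so values captured on the source would not necessarily match those on the target. A replay transform would likely need to read them from the target first, which is one reason the timestamp-based guard may fit this use case better.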

Summary

Adding this feature to the OpenSearch Migrations repository would help automate the enforcement of consistency rules during migrations. This is particularly beneficial when replaying captured traffic from an Elasticsearch source to an OpenSearch target, where differences in consistency guarantees or order of updates can lead to divergent document states.

@sumobrian added the enhancement (New feature or request) and untriaged labels and removed the untriaged label on Oct 22, 2024.
@sumobrian changed the title from "[FEATURE] Ensure Consistent Document State During Replay with Conditional Update Transformations" to "Ensure Consistent Document State During Replay with Conditional Update Transformations" on Oct 23, 2024.
Projects
Status: 6 Months - 1 Year