[RW Separation] Change search replica recovery flow #15952

Open · Tracked by #15306
mch2 opened this issue Sep 16, 2024 · 3 comments · May be fixed by #16760
Labels: v2.19.0 Issues and PRs related to version 2.19.0

mch2 (Member) commented Sep 16, 2024

Search replicas are currently configured to recover with the peer recovery flow. We need to change this so that these shards have reduced communication with primary shards during recovery (node-node replication) and zero communication (remote-backed storage).

In both cases, I think we can initialize the shards with store recovery steps and then run a round of replication before marking the shards as active. I think we may also need to filter these allocation IDs out of cluster manager updates to the primary, since the primary should not track them as in-sync copies.
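
To make the intent concrete, here is a minimal sketch of the routing-level decision this implies. The enum and picker method are hypothetical stand-ins, not OpenSearch APIs; the variant names only loosely mirror the real RecoverySource types in org.opensearch.cluster.routing.RecoverySource.

// Hypothetical sketch: choosing a recovery source for an unassigned
// replica. None of these types exist in OpenSearch as written.
enum RecoverySourceType { PEER, EXISTING_STORE, EMPTY_STORE, REMOTE_STORE }

final class RecoverySourcePicker {
    // Regular replicas keep peer recovery; search replicas recover from
    // a store (local or remote) and never sync directly off the primary.
    static RecoverySourceType pick(boolean searchOnly,
                                   boolean remoteStoreEnabled,
                                   boolean hasExistingLocalCopy) {
        if (!searchOnly) {
            return RecoverySourceType.PEER;
        }
        if (remoteStoreEnabled) {
            return RecoverySourceType.REMOTE_STORE;
        }
        return hasExistingLocalCopy
            ? RecoverySourceType.EXISTING_STORE
            : RecoverySourceType.EMPTY_STORE;
    }
}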

mch2 changed the title from "Change search replica recovery flow." to "[RW Separation] Change search replica recovery flow" Sep 16, 2024
mch2 added the v2.18.0 (Issues and PRs related to version 2.18.0) label and removed the untriaged label Sep 16, 2024
mch2 self-assigned this Sep 16, 2024
mch2 moved this from Todo to In Progress in Performance Roadmap Sep 16, 2024
prudhvigodithi (Member) commented
Hey @mch2, based on my understanding I have summarized below how we can remove peer recovery for search replicas; please check.

Filtering Allocation IDs in Cluster Updates:

In OpenSearch, the cluster manager tracks which shards are in sync with the primary shard (we can check using /_cluster/state). This helps ensure data consistency across the cluster. The proposal suggests filtering out the allocation IDs of search replicas from these updates: since search replicas are read-only and do not participate in writes, the cluster manager doesn't need to track them as in-sync copies the way it does for regular replicas.

We can create a separate child issue to track this?
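
As a rough illustration of the filtering, a self-contained sketch with stand-in types (ShardCopy and computeInSyncIds are hypothetical; the real code would work with ShardRouting entries and the primary's replication tracker):

import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Stand-in for a shard routing entry with a search-only flag.
record ShardCopy(String allocationId, boolean searchOnly) {}

final class InSyncFilter {
    // Allocation IDs the primary should track as in-sync copies,
    // excluding search replicas, which never receive writes.
    static Set<String> computeInSyncIds(List<ShardCopy> copies) {
        return copies.stream()
            .filter(c -> !c.searchOnly())
            .map(ShardCopy::allocationId)
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<ShardCopy> copies = List.of(
            new ShardCopy("alloc-primary", false),
            new ShardCopy("alloc-write-replica", false),
            new ShardCopy("alloc-search-replica", true));
        // Prints only the primary and write-replica IDs.
        System.out.println(computeInSyncIds(copies));
    }
}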

Search Replica Recovery

Check Local Storage and Recover from Local Disk

The replica checks its own disk to see if it already has the necessary segment files from previous operations or previous shard instantiations. If the required data is on the search replica’s local disk, it uses that to recover the shard, skipping any communication with the primary node.

Recover from Remote Storage (if configured):

If the data is not available on the local disk, the replica can pull the data from remote-backed storage to recover the shard. We could even skip the local disk check entirely when remote storage is configured.

Replication Process (optional): can be triggered after recovery, or can wait for the next refresh interval

After initializing the replica via store recovery (local or remote), a replication round can run to ensure the shard is synchronized with the latest segments; both variants are sketched below.
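
Here is a sketch of the two startup variants, using hypothetical method names (recoverFromStore, runReplicationRound, markActive are stand-ins, not real OpenSearch APIs):

// Hypothetical lifecycle for bringing a search replica online.
interface SearchReplicaShard {
    void recoverFromStore();     // local-disk or remote-backed store recovery
    void runReplicationRound();  // one round of segment replication
    void markActive();           // report STARTED to the cluster manager
}

final class SearchReplicaStartup {
    // Option A: catch up with the latest segments before activating.
    static void startEagerly(SearchReplicaShard shard) {
        shard.recoverFromStore();
        shard.runReplicationRound();
        shard.markActive();
    }

    // Option B: activate immediately and let the next scheduled
    // refresh/replication interval bring the shard up to date.
    static void startLazily(SearchReplicaShard shard) {
        shard.recoverFromStore();
        shard.markActive();
    }
}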

Thank you @getsaurabh02

prudhvigodithi (Member) commented

So in this example API call, curl -X GET "http://localhost:9200/_cluster/state/routing_table?filter_path=routing_table.indices.my_sample_index" -H 'Content-Type: application/json' | jq '.', I can see "state": "STARTED" for the copy with "searchOnly": true. Is the proposal to remove this so that the cluster manager need not track the searchOnly replicas?

{
  "routing_table": {
    "indices": {
      "my_sample_index": {
        "shards": {
          "0": [
            {
              "state": "STARTED",
              "primary": false,
              "searchOnly": false,
              "node": "BfhYEGSERsOMVxXa1BTFFA",
              "relocating_node": null,
              "shard": 0,
              "index": "my_sample_index",
              "allocation_id": {
                "id": "6TMmzm-6S0ip36g5VsWSog"
              }
            },
            {
              "state": "STARTED",
              "primary": false,
              "searchOnly": true,
              "node": "yOfxLMoGQPeVJ1lcHNPMrw",
              "relocating_node": null,
              "shard": 0,
              "index": "my_sample_index",
              "allocation_id": {
                "id": "icMnWUYjTReUN0J2H6UUlA"
              }
            },
            {
              "state": "STARTED",
              "primary": true,
              "searchOnly": false,
              "node": "KCszv1QKSZ-HHqzuAcoYbg",
              "relocating_node": null,
              "shard": 0,
              "index": "my_sample_index",
              "allocation_id": {
                "id": "Y6CRS10hT26SbpkdaiwMsQ"
              }
            }
          ]
        }
      }
    }
  }
}

mch2 added the v2.19.0 (Issues and PRs related to version 2.19.0) label and removed the v2.18.0 (Issues and PRs related to version 2.18.0) label Oct 24, 2024
mch2 (Member, Author) commented Dec 2, 2024

is the proposal to remove this so that cluster manager need not track the searchOnly replicas?

No, the shards should still be included in the routing table, just not tracked as in-sync at the primary.

@prudhvigodithi please see the linked draft for what I'm thinking here. To simplify, I've reduced the scope of separation to only remote-store-enabled domains, at least for now.
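
A small sketch of that scoping decision; the setting key matches OpenSearch's index.remote_store.enabled, but the surrounding class is a stand-in:

import java.util.Map;

final class SearchReplicaGate {
    static final String REMOTE_STORE_ENABLED = "index.remote_store.enabled";

    // Only remote-store-backed indices take the new store-based recovery
    // path for now; everything else keeps the existing peer recovery flow.
    static boolean useStoreBasedRecovery(Map<String, String> indexSettings,
                                         boolean searchOnly) {
        return searchOnly
            && Boolean.parseBoolean(
                indexSettings.getOrDefault(REMOTE_STORE_ENABLED, "false"));
    }
}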
