Solr Schema Upgrades - how to handle in Indexer? #52

Open
artntek opened this issue Nov 7, 2023 · 6 comments
@artntek
Collaborator

artntek commented Nov 7, 2023

We need to decide how to handle Solr schema upgrades and their associated reindex actions, and then implement that approach.

@artntek artntek added this to the 3.0.0 milestone Nov 7, 2023
@mbjones
Member

mbjones commented Nov 22, 2023

In the past, we have often upgraded a Solr schema.xml file without reindexing the entire corpus, especially for DataONE, which has millions of document versions. That said, Solr strongly recommends against this, particularly for major version upgrades. The Solr guide discusses this here, specifically covering schema changes, solrconfig changes, and version upgrades: https://solr.apache.org/guide/solr/latest/indexing-guide/reindexing.html

One mechanism to avoid downtime is to reindex into a new Solr collection, so that the old index continues to exist. Once the new collection is available, we can use a collection alias to atomically switch the service from the old index content to the new content.
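
As a rough illustration of the alias switch, here is a minimal sketch that builds a request for the Solr Collections API `CREATEALIAS` action, which re-points an alias if one with the same name already exists. The base URL, alias name, and collection names are hypothetical placeholders, not names used by this project, and `switch_alias` is only an assumed helper for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def create_alias_url(solr_base, alias, collection):
    """Build the Collections API URL that points `alias` at `collection`."""
    params = urlencode(
        {"action": "CREATEALIAS", "name": alias, "collections": collection}
    )
    return f"{solr_base.rstrip('/')}/admin/collections?{params}"


def switch_alias(solr_base, alias, collection, timeout=30):
    """Send the CREATEALIAS request and return the HTTP status code.

    Assumes an unauthenticated Solr endpoint; a real deployment would
    likely need credentials and error handling.
    """
    with urlopen(create_alias_url(solr_base, alias, collection), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical names: clients query the alias, never a collection directly.
    print(create_alias_url("http://localhost:8983/solr", "metacat-index", "metacat-index-v2"))
```

Because clients would only ever query the alias, rolling back could amount to sending the same request with the old collection name, provided the old collection has not yet been deleted.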

When thinking about a SolrCloud update, it would be good to consider a rolling update strategy in which 1) the existing Solr pods continue operating and serving the existing collection index, probably in read-only mode; 2) a new version of the service is rolled out, and the new pods start up and begin reindexing the content into a new collection; 3) when the reindex is complete, the old pods are brought down and the new pods begin serving requests, possibly after switching over with a collection alias. Extra bonus points for tying this seamlessly into Kubernetes rolling updates in a way that permits rollback to the original pods and indexed collection if the upgrade is not successful.

Of course, this is the ideal scenario; we can also manage this transition manually if building these features would significantly delay the release.

@mbjones
Member

mbjones commented Nov 22, 2023

@artntek
Collaborator Author

artntek commented Dec 1, 2023

Notes from a related discussion in the ESS-DIVE meeting:
we could use a sidecar container that periodically checks for a new index file, then triggers a reindex if changes are detected.
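
A minimal sketch of the change-detection step such a sidecar might run, assuming the "index file" is the Solr schema file and that a content hash is an acceptable way to detect changes. The paths, the state file, and the `trigger_reindex` callback are all hypothetical; how a reindex is actually triggered is not specified in this thread.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def check_and_trigger(schema_path, state_path, trigger_reindex):
    """Compare the schema's digest with the last recorded one.

    Returns True if a reindex was triggered. On the very first run there
    is nothing to compare against, so the digest is only recorded.
    """
    current = file_digest(schema_path)
    state = Path(state_path)
    previous = state.read_text().strip() if state.exists() else None
    if previous == current:
        return False
    if previous is not None:
        # Trigger before recording, so a failed trigger is retried next time.
        trigger_reindex()
    state.write_text(current)
    return previous is not None
```

A sidecar could call `check_and_trigger` in a loop with a sleep between checks. Keeping the state file on the same persistent volume as the index would let the recorded digest survive pod restarts.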

@artntek
Collaborator Author

artntek commented Dec 6, 2023

This is important for the 3.0.0 release, since that release involves a schema upgrade.

@artntek
Collaborator Author

artntek commented Feb 6, 2024

Example

  • If a helm upgrade changes the Solr schema, we need to reindex everything, because the Solr data on the PV conforms to the old schema.
  • The best way to do this is offline: build the new index while still using the old index, then switch over once it is finished.

For the 3.0.0 release, this will be manual, and we can choose not to reindex huge corpora. There is no end-user impact other than not being able to access the new information in MetacatUI (e.g. the new license field).

@artntek artntek modified the milestones: 3.0.0, 3.1.0 Feb 6, 2024
@artntek
Collaborator Author

artntek commented Feb 6, 2024

Manage manually for 3.0; automate for 3.1

@artntek artntek modified the milestones: 3.1.0, 3.2.0 Oct 15, 2024