POC: Feat/make indexing more resiliant #16546

bielu · 2024-06-03T12:26:42Z

Prerequisites

I have added steps to test this contribution in the description below

If there's an existing issue for this PR then this fixes

Description

This PR is a POC for solving the issue with indexes being rebuilt on startup. It also adds some flexibility around the page size used when reindexing: currently we pull 10k nodes per page, and if those nodes have more than 100 properties, pulling more than 1k at a time can make indexing really slow. I am marking this as a POC because I won't spend more time on this PR unless HQ confirms this is the right direction and that I should improve the code here.
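To illustrate the page-size idea, here is a minimal sketch, not the code in this PR. It assumes a hypothetical `fetchPage` delegate standing in for whatever service call loads nodes, and a hypothetical `ReadInPages` helper; the only point is that the page size becomes a parameter instead of a fixed 10k.

```csharp
using System;
using System.Collections.Generic;

public static class PagedReindexSketch
{
    // Hypothetical helper: reads items page by page with a configurable page size.
    // fetchPage(pageIndex, pageSize) is an assumed stand-in for the real data access call.
    public static IEnumerable<IReadOnlyList<T>> ReadInPages<T>(
        Func<long, int, IReadOnlyList<T>> fetchPage,
        int pageSize)
    {
        // Because this is an iterator, the check runs on first enumeration.
        if (pageSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(pageSize));
        }

        long pageIndex = 0;
        while (true)
        {
            IReadOnlyList<T> page = fetchPage(pageIndex, pageSize);
            if (page.Count == 0)
            {
                yield break; // nothing left to index
            }

            yield return page; // the caller indexes this batch

            if (page.Count < pageSize)
            {
                yield break; // a short page means this was the last one
            }

            pageIndex++;
        }
    }
}
```

With a smaller page size, each batch holds fewer property-heavy nodes in memory at once, at the cost of more round trips to the data source.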

github-actions · 2024-06-03T12:26:53Z

Hi there @bielu, thank you for this contribution! 👍

While we wait for one of the Core Collaborators team to have a look at your work, we wanted to let you know that we have a checklist for some of the things we will consider during review:

It's clear what problem this is solving, there's a connected issue or a description of what the changes do and how to test them
The automated tests all pass (see "Checks" tab on this PR)
The level of security for this contribution is the same or improved
The level of performance for this contribution is the same or improved
Avoids creating breaking changes; note that behavioral changes might also be perceived as breaking
If this is a new feature, Umbraco HQ provided guidance on the implementation beforehand
💡 The contribution looks original and the contributor is presumably allowed to share it

Don't worry if you got something wrong. We like to think of a pull request as the start of a conversation, we're happy to provide guidance on improving your contribution.

If you realize that you might want to make some changes then you can do that by adding new commits to the branch you created for this work and pushing new commits. They should then automatically show up as updates to this pull request.

Thanks, from your friendly Umbraco GitHub bot 🤖 🙂

nul800sebastiaan · 2024-06-07T06:32:28Z

Thanks @bielu - I have noted this PR and asked for the team to have a look and see if it's the right direction. Of course we have a "little" conference coming up next week so it's taking a bit longer to get to.

Shazwazza · 2024-06-07T15:29:41Z

src/Umbraco.Infrastructure/Examine/IndexRebuildStatusManager.cs

+/// <summary>
+///
+/// </summary>
+public class IndexRebuildStatusManager : IIndexRebuildStatusManager


My recommended approach would be to do this: Shazwazza/Examine#372 (comment)

The only real way to know that indexing completed in a resilient way would be to have an actual document in the index certifying that the rebuild was successful, instead of relying on an in-memory cache, which is problematic.

@Shazwazza I am not 100% convinced about using an additional index, as we both know that fewer indexes is actually better with Lucene. I am thinking maybe we should use an additional SQL table instead, as it will be equally resilient as using an index but will not require us to create one. What do you think?

@bielu Sorry, I probably wasn't clear in my suggestion. We don't want to use an extra index to store any data, we can just use a marker document within the index. For example:

Rebuilding an index deletes all data

The index is populated with the normal data

When the IndexPopulator is done populating the index, it then writes a special marker document signaling that the populator is done. Perhaps this document has a field like __Populated: y

Then the rebuild checker just checks whether the document count for __Populated: y == 1
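The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchIndex` is a hypothetical in-memory stand-in and not the Examine API, and `Rebuild` and `IsPopulated` are made-up names. Only the `__Populated: y` field comes from the suggestion itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for an index; NOT the Examine API.
public sealed class SketchIndex
{
    private readonly List<Dictionary<string, string>> _docs = new();

    public void DeleteAll() => _docs.Clear();

    public void Add(Dictionary<string, string> doc) => _docs.Add(doc);

    // Counts documents whose given field has exactly the given value.
    public int Count(string field, string value) =>
        _docs.Count(d => d.TryGetValue(field, out var v) && v == value);
}

public static class MarkerDocumentSketch
{
    public const string PopulatedField = "__Populated";
    public const string PopulatedValue = "y";

    public static void Rebuild(SketchIndex index, Action<SketchIndex> populate)
    {
        // 1. Rebuilding deletes all data, including any earlier marker document.
        index.DeleteAll();

        // 2. Populate with the normal data. If this throws, the marker is never written.
        populate(index);

        // 3. Write the marker document signaling that the populator is done.
        index.Add(new Dictionary<string, string> { [PopulatedField] = PopulatedValue });
    }

    // 4. The rebuild checker: exactly one marker document means the rebuild completed.
    public static bool IsPopulated(SketchIndex index) =>
        index.Count(PopulatedField, PopulatedValue) == 1;
}
```

Because the marker lives in the index itself, an interrupted rebuild leaves no marker behind, so the check still gives the right answer after a restart, which an in-memory flag cannot do.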

@Shazwazza that makes sense now! We can also extend it to check which populators are registered and show how many of them are done. I will update this PR.

@Shazwazza I started changing the implementation of this service to use Examine under the hood. Can you have a quick look and check whether that is what you had in mind?
This way we can now also repeat failed batches (but I think I will need to play around with it a little more).

chore: make indexing more resiliant

779242f

Shazwazza reviewed Jun 7, 2024


POC: add populator information to reindex model

ad5ccef

Shazwazza Jun 7, 2024

bielu Jun 10, 2024

Shazwazza Jun 10, 2024

bielu Jun 10, 2024

bielu Jun 15, 2024
