Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cursor based operations not working properly with ReadFrom = ANY #2926

Open
EMDavl opened this issue Jul 19, 2024 · 9 comments
Open

Cursor based operations not working properly with ReadFrom = ANY #2926

EMDavl opened this issue Jul 19, 2024 · 9 comments
Labels
status: waiting-for-feedback We need additional information before we can continue

Comments

@EMDavl
Copy link

EMDavl commented Jul 19, 2024

Bug Report

Current Behavior

Hello!

We have encountered the following problem - when using zscan with Redis Cluster and the ReadFrom parameter set to ANY, consecutive requests may be sent to different nodes. Because of this, the response does not comply with the guarantees provided by Redis regarding SCAN family operations. The response may contain duplicate values, and some values may be missing altogether.

Small example - https://github.com/EMDavl/cursor-bug

Expected behavior/code

SCAN family requests return correct values with the ReadFrom parameter set to ANY when the library is used with Redis Cluster. Requests are sent to the same node.

Environment

  • Lettuce version(s): 6.3.2.RELEASE
  • Redis version: 7+

Possible Solution

Send requests related to scan based operations to the same node untill iteration ends

@tishun tishun added for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage labels Jul 19, 2024
@EMDavl
Copy link
Author

EMDavl commented Sep 7, 2024

@tishun
Hi! Is there any updates regarding this topic?

@tishun
Copy link
Collaborator

tishun commented Sep 9, 2024

Will try to take a look this week, apologies for the delay

@tishun
Copy link
Collaborator

tishun commented Sep 13, 2024

Hey @EMDavl ,

    /**
     * Setting to read from any node.
     *
     * @since 5.2
     */
    public static final ReadFrom ANY = new ReadFromImpl.ReadFromAnyNode();

Am I correct to assume you want to both use ReadFrom.Any and in the same time (only for the SCAN family of commands) read from the same node? Can you elaborate more on the use case? I assume ReadFrom.LOWEST_LATENCY is - for some reason - not going to work for you?

@tishun tishun added status: waiting-for-feedback We need additional information before we can continue and removed for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage labels Sep 13, 2024
@tishun
Copy link
Collaborator

tishun commented Sep 13, 2024

Hello again,

I've taken the liberty of modifying your code and it now reads all the values:

        Set<String> set = new HashSet<>();
        ScoredValueScanCursor<String> cursor = null;
        do  {
            if (cursor == null) {
                cursor = sync.zscan(zsetName, ScanCursor.of("0"), ScanArgs.Builder.limit(1000));
            } else {
                cursor = sync.zscan(zsetName, ScanCursor.of(cursor.getCursor()), ScanArgs.Builder.limit(1000));
            }

            for (ScoredValue<String> value : cursor.getValues()) {
                set.add(value.getValue());
            }
            
        } while (!cursor.isFinished());

Does that help?

@EMDavl
Copy link
Author

EMDavl commented Sep 21, 2024

Hello!

Unfortunately, the solution you suggested didn't resolve the issue. I replaced it with what I had in my MRE, and in each of the three test runs, only part of the records were read. Perhaps you could share how you tested its functionality?

Can you elaborate more on the use case

Initially, our project was configured with the ReadFrom.ANY parameter to reduce the load on the nodes. After noticing the issue, we switched to the option you suggested, and so far, it has worked for us. The goal of this issue is to bring this problem to the attention of the development team, so that this behavior can either be fixed or documented.

you want to both use ReadFrom.ANY and at the same time (only for the SCAN family of commands) read from the same node

Not necessarily; the main requirement is that the SCAN family of commands, when used with ReadFrom.ANY and a Redis cluster, should return all values.

@tishun
Copy link
Collaborator

tishun commented Sep 25, 2024

Greetings again,

Unfortunately, the solution you suggested didn't resolve the issue. I replaced it with what I had in my MRE, and in each of the three test runs, only part of the records were read. Perhaps you could share how you tested its functionality?

What I did is deployed an new empty cluster and run the code, with the change I suggested EMDavl/cursor-bug#1

This resulted in the (for me) expected result being output:
Entries read: 10000

While without my change the result was random, but always less than 10000, for ex.:
Entries read: 7602

Can you elaborate more on the use case

The issue is that the order in which you check cursor.isFinished() and then call cursor = sync.zscan(zsetName, ScanCursor.of(cursor.getCursor()), ScanArgs.Builder.limit(1000)); is reversed - you need to check if the cursor is finished only after calling the zscan() on the new cursor, otherwise you are not following up on all the cursors. You can debug this by adding a new call just above the break statement and you will see that more results re returned, but due to the break statement they are not processed.

Initially, our project was configured with the ReadFrom.ANY parameter to reduce the load on the nodes. After noticing the issue, we switched to the option you suggested, and so far, it has worked for us. The goal of this issue is to bring this problem to the attention of the development team, so that this behavior can either be fixed or documented.

you want to both use ReadFrom.ANY and at the same time (only for the SCAN family of commands) read from the same node

Not necessarily; the main requirement is that the SCAN family of commands, when used with ReadFrom.ANY and a Redis cluster, should return all values.

You are correct! I was mistakingly assuming the driver does not follow up on the contract, but in fact it does, even when the ReadFrom.ANY is used. So you should be able to use ReadFrom.ANY and still be able to fetch the cursor from multiple different shards.

@EMDavl
Copy link
Author

EMDavl commented Sep 25, 2024

Thank you very much for your time and clarifications.
Unfortunately, even with the proposed solution, I am still getting inconsistent results. It might be due to my environment. I will try to carefully double-check everything again over the upcoming weekend and will get back to you with the results.

@EMDavl
Copy link
Author

EMDavl commented Oct 4, 2024

Apologies for the delay, the last few weeks have been quite busy, and the next couple don't promise to be any freer. I'll reply to the thread as soon as I'm able to double-check everything

@tishun
Copy link
Collaborator

tishun commented Oct 5, 2024

Sure, I will be happy to help if there is still something unclear or you discover an issue with the driver

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: waiting-for-feedback We need additional information before we can continue
Projects
None yet
Development

No branches or pull requests

2 participants