Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry locking if wait fails within timeout. Reset ttl after wait. #3

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

dreusel
Copy link

@dreusel dreusel commented Dec 13, 2020

As mentioned in the other PRs discussion, the library could re-attempt to get a lock if synchronization fails within replicationTimeout. This is implemented in this PR.
Also I realized there was a flaw, which is that while waiting for replication, the already acquired lock in the primary is loosing it's time-to-live and may actually already have completely expired before the WAIT finishes. To mitigate that:

  • There is now a requirement that the TTL is 'significantly longer' (currently 1.5x) than replicationTimeout. So that when WAIT takes nearly replicationTimeout, we can be sure the lock is actually still alive
  • After waiting the lock is immediately 'extended', ie gets it's full TTL again.

This implies:

  • Resetting the TTL requires knowing the key before we enter the
    pipeline, which means we can't have it generated by redis during the
    aquireLock anymore. It needs to be generated client-side now, which I do by a cheap time + random.
  • the acquireScript needs to extend the TTL if it's already owned, since
    we may be re-trying after a failed WAIT. At which point we may or may
    not have the lock in the current master already. And we'll need to be
    sure that the ttl will survive the next WAIT. This implicitly gives
    aquireLock the ability to extend a lock.

Because acquireLock now actually extends a previously held lock, the same implementation is now used for the extendLock function, the difference being that extendLock requires to have the key in advance.

This implies:
- Resetting the TTL requires knowing the key before we enter the
pipeline, which means we cant have it generated by redis during the
aquireLock anymore. So that needs to be generated client-side now.
- the acquireScript needs to extend the TTL if it's already owned, since
we may be re-trying after a failed WAIT. At which point we may or may
not have the lock in the current master already. And we'll need to be
sure that the ttl will survive the next WAIT. This implicitly gives
aquireLock the ability to extend a lock.
Also use a unique key for each test, so that one test failing does not
break the others
@andris9
Copy link
Member

andris9 commented Dec 14, 2020

Hey, sorry, I already reverted the readme etc changes but forgot to push and now it is out of sync. Could you fix the conflicts? Closing #2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants