Retry locking if wait fails within timeout. Reset ttl after wait. #3

dreusel · 2020-12-13T18:25:20Z

As mentioned in the other PR's discussion, the library could re-attempt to acquire a lock if synchronization fails within replicationTimeout. This PR implements that.
I also realized there was a flaw: while waiting for replication, the lock already acquired on the primary keeps losing its time-to-live and may have expired completely before the WAIT finishes. To mitigate that:

- There is now a requirement that the TTL is 'significantly longer' (currently 1.5x) than replicationTimeout, so that when WAIT takes nearly replicationTimeout we can be sure the lock is still alive.
- After waiting, the lock is immediately 'extended', i.e. it gets its full TTL again.
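
To make the two mitigations concrete, here is a rough sketch of the flow. It is an illustration under assumptions, not the code in this PR: `assertTtlCoversWait`, `lockWithReplication` and the `client` object with its `acquireOrExtend` and `wait` methods are hypothetical names. Only the 1.5x factor, replicationTimeout and the order of steps (acquire, WAIT, retry, extend) come from the description above.

```javascript
// Hypothetical sketch; function names and the client interface are assumptions.
const TTL_FACTOR = 1.5; // the 'significantly longer' factor quoted in the description

function assertTtlCoversWait(ttl, replicationTimeout) {
  if (ttl < TTL_FACTOR * replicationTimeout) {
    throw new RangeError(
      `ttl (${ttl} ms) must be at least ${TTL_FACTOR}x replicationTimeout (${replicationTimeout} ms)`
    );
  }
}

// client.acquireOrExtend(name, key, ttl) -> Promise<boolean>: set or refresh the lock on the primary
// client.wait(timeoutMs) -> Promise<boolean>: true once the write has reached the replicas
async function lockWithReplication(client, name, key, ttl, replicationTimeout) {
  assertTtlCoversWait(ttl, replicationTimeout);
  const deadline = Date.now() + replicationTimeout;
  do {
    if (!(await client.acquireOrExtend(name, key, ttl))) {
      return false; // another owner holds the lock
    }
    const remaining = deadline - Date.now();
    if (remaining <= 0) break;
    if (await client.wait(remaining)) {
      // WAIT consumed part of the TTL, so give the lock its full TTL again.
      return client.acquireOrExtend(name, key, ttl);
    }
    // WAIT failed: retry while there is still time left within replicationTimeout.
  } while (Date.now() < deadline);
  return false;
}
```

With a TTL of 1500 ms and a replicationTimeout of 1000 ms, a WAIT that takes almost the full second still leaves roughly 500 ms of lock lifetime in which to send the extension. The sketch does not release the lock when the deadline passes without successful replication. Whether the real implementation does so is not stated here.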

This implies:

- Resetting the TTL requires knowing the key before we enter the pipeline, which means we can't have it generated by Redis during acquireLock anymore. It needs to be generated client-side now, which I do with a cheap time + random.
- The acquireScript needs to extend the TTL if the lock is already owned, since we may be retrying after a failed WAIT, at which point we may or may not already have the lock in the current master. We also need to be sure that the TTL will survive the next WAIT. This implicitly gives acquireLock the ability to extend a lock.
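
The semantics of the two points above can be modelled in a few lines. This is an in-memory stand-in for illustration only, not the actual acquireScript: `generateKey`, `acquireOrExtend` and the `Map`-based store are hypothetical, and the exact key format is an assumption based on the "time + random" remark.

```javascript
// Hypothetical in-memory model of the behaviour described above; not the library's script.
function generateKey() {
  // "cheap time + random": a timestamp plus a random suffix, both in base 36.
  // Not cryptographically strong; it only needs to be unlikely to collide.
  return Date.now().toString(36) + "-" + Math.random().toString(36).slice(2);
}

// store: Map<name, { key, expiresAt }>; `now` is injectable so the model is testable.
function acquireOrExtend(store, name, key, ttl, now = Date.now()) {
  const current = store.get(name);
  const free = !current || current.expiresAt <= now;
  if (free || current.key === key) {
    // Either the lock is free, or we already own it (e.g. a retry after a failed WAIT):
    // in both cases the lock ends up ours with a full TTL.
    store.set(name, { key, expiresAt: now + ttl });
    return true;
  }
  return false; // held by a different key
}
```

Because the caller picks the key up front, a retry with the same key is idempotent: it either takes a free lock or refreshes the one it already holds, and never fails against itself.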

Because acquireLock now actually extends a previously held lock, the same implementation is now used for the extendLock function. The difference is that extendLock requires the caller to supply the key in advance.
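
Roughly, the relationship between the two functions looks like this. The wiring below is a hypothetical sketch: `createLockApi`, the injected `lockOrExtend` and `generateKey` functions, and the shape of the returned object are assumptions, not the library's actual signatures.

```javascript
// Hypothetical wiring; lockOrExtend and generateKey are injected stand-ins.
function createLockApi(lockOrExtend, generateKey) {
  return {
    // New lock: the key is generated client-side before the shared routine runs.
    async acquireLock(name, ttl) {
      const key = generateKey();
      const success = await lockOrExtend(name, key, ttl);
      return { success, name, key };
    },
    // Existing lock: the caller supplies the key obtained from an earlier acquireLock.
    async extendLock(lock, ttl) {
      const success = await lockOrExtend(lock.name, lock.key, ttl);
      return { success, name: lock.name, key: lock.key };
    },
  };
}
```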

Also use a unique key for each test, so that one test failing does not break the others

andris9 · 2020-12-14T08:01:35Z

Hey, sorry, I already reverted the README etc. changes but forgot to push, and now it is out of sync. Could you fix the conflicts? Closing #2

dreusel added 3 commits December 13, 2020 19:13

Apply same mechanics for extendLock()

dc055dd

Update tests

050f68e

Also use a unique key for each test, so that one test failing does not break the others
