Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed wait_for_group logic #1

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

baratrion
Copy link

Previously, the logic for waiting for tasks within a group had a small bug: it was calculating remaining time before brpoping a task, effectively only getting result of a single task that is close to timeout.

It's better to go over it with an example:

def task(x):
    time.sleep(x)
    return x

l = [1.2, 2.3, 3.1, 4.6, 4.7, 4.8, 5, 6, 7]
tasks_args = [(x,) for x in l]
task_ids = fifo.queue_tasks('example.tasks.task', tasks_args,
                            max_wait=30, result_timeout=30)


results = fifo.wait_for_group(task_ids, timeout=5)

Results in:

('2b39d9d7e53a477bbdc88b42a3ceba75', 'timeout', None)
('fff77873d32c403ba67bbb948653d52b', 'timeout', None)
('f12674188d0347ce896862c0ebdaded0', 'timeout', None)
('0751786d6f584ea78ad312879b556ec8', 'timeout', None)
('a9a69cca2cb7406fa62246648cd00f10', 'timeout', None)
('9786029043ba4cd9b74318998138f14d', 'completed', 2.3)
('b2c8df2304d64d258da1384039c8f82a', 'completed', 3.1)
('a2bad899fbea4bb395eb82468dd4996b', 'completed', 4.6)
('7c5ed51ca6b94828a6b7937b955cf654', 'completed', 1.2)

After this change the result is as follows:

('c933d3443fdd4d6fa13b3260ebb7ad4f', 'completed', 3.1)
('be707f648da54cf2a5e4747bd54fb6d1', 'timeout', None)
('28f64c68df9f412dbabfa1ea2b459908', 'completed', 1.2)
('28f0ece03f214f6c8690406ebcb9d664', 'completed', 2.3)
('c014f3c8178145829e665c940442de9c', 'completed', 5)
('0577ba31d591415fb3163d355c8982ea', 'completed', 4.6)
('d1d2983173104fe98d0289594cb72fcc', 'timeout', None)
('3bb4b55e08b147ebadcd82304adaf680', 'completed', 4.7)
('e3875a3de1be4568a22018484557314f', 'completed', 4.8)

@fergalwalsh
Copy link
Owner

Hi @baratrion , thanks, for the bug report. I have ran your test code and confirmed there is a problem but I think the source is a bit different from what you describe.

I have rechecked the logic and it seems correct for how brpop works: When any items are in the list the command will immediately return with a single item. If there are many items in the list it still only returns one each time brpop is called. If there are no items in the list then it blocks/waits until there is an item or until timeout seconds have passed.

The problem I see is that timeout is an integer (must be for redis) and the condition for continuing is if timeout > 0:. If the remaining time (deadline - time.time()) is < 1s but > 0s it doesn't wait so effectively the timeout is 1s less than the specified timeout argument.

The solution I see is to change the condition to if timeout >= 0: or to move the int casting after the condition. If you try either of these your example works fine as far as I can see. I will open an issue and make a fix for this bug.

@baratrion
Copy link
Author

Hi @fergalwalsh, yes, you've elaborated the problem nicely. However, without trying out your suggestion first, wouldn't if timeout >= 0 cause brpop to block indefinitely when timeout = 0?

Then, it'd cause it to block at least until there's an item, so wait_for_group wouldn't respect its timeout at all, no?

@fergalwalsh
Copy link
Owner

@baratrion yup right you are. That must be why I used > not >= in the first place.

Unfortunately the timeout must be an int for redis. If we round up the remaining time to 1s then it can end up waiting for more than the specified timeout in total which isn't really acceptable. On the other hand in the current situation it may not wait for the full timeout which also isn't good.

So at the moment I'm not really sure what the best solution is. Any ideas?

@baratrion
Copy link
Author

Probably the best solution would be Redis having floating timeouts, but I'd suggest taking a look at my changes also.

Do you have any objections using an interval (default 1 secs) for popping results and calculating total_elapsed every time?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants