Design and implement a proper way to handle Pending objects when a module is not replying. #388

Open
mardim91 opened this issue Jun 7, 2024 · 0 comments
Labels
bug Something isn't working

mardim91 commented Jun 7, 2024

When an object (a VRF, an LB, or any other object) is created and one of the modules is stuck and does not return with success or error, the UpdateStatus function is never called, and as a result the task in the task manager is requeued immediately. This approach presents several problems, listed below:

  1. The Publish function uses a blocking channel. This means that if the module cannot empty the channel where the notification is sent, the task manager cannot publish the next notification when the task expires and is requeued. One solution is to make the Publish function non-blocking; how to do that properly still needs to be designed. This is a serious bug because it is effectively a deadlock: the task manager is stuck in the Publish function, so it cannot move forward and empty the TaskStatus channel, and the module in turn cannot call the UpdateStatus function and return. This can make opi-evpn-bridge unresponsive, at which point we need to restart it.

  2. When a pending task expires, we requeue it immediately. That means that if the module is stuck, the task will expire and be requeued many times without any exponential backoff timer. This is not good because we can overload the Publish function and the module itself. If we implement an exponential backoff timer for these Pending tasks, then we also need to make the TaskStatus channel non-blocking. Otherwise we can end up in a situation where the module unsticks itself after a long time and calls the UpdateStatus function, but because the task has not been requeued yet (it is still waiting for the timer to expire), the queue is empty and the task manager does not read the TaskStatus channel to pick up whatever status the module has sent.
