Describe the bug
If I kill a worker pod and then launch a subsequent task, hoping that the system will start up a new worker pod, my task instead progresses only as far as `Task is pending due to waiting-for-nodes` and no new worker pod is launched. This looks like it's because the fork of the Kubernetes provider in funcX does not check Kubernetes for pod status, and continues to claim that the worker pod exists -- this was fixed in the fork of the Kubernetes provider in Parsl in early 2021; see Parsl/parsl#1740.
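For illustration only, here is a rough sketch of the kind of liveness check I mean, using the kubernetes Python client. The function name, namespace, and label selector are placeholders of mine, not code from either the funcX or Parsl provider:

```python
# Sketch only: ask the Kubernetes API which worker pods actually exist,
# rather than trusting the provider's in-memory list of launched pods.
from kubernetes import client, config

def worker_pod_phases(namespace="default", label_selector="app=funcx-worker"):
    """Return {pod_name: phase} for the worker pods Kubernetes knows about.
    A pod that has been deleted simply won't appear here, so the caller can
    mark it as gone and allow a replacement to be scheduled."""
    config.load_incluster_config()  # or config.load_kube_config() outside the cluster
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace, label_selector=label_selector)
    return {p.metadata.name: p.status.phase for p in pods.items}
```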
I thought I'd already opened a funcX GitHub issue about this, but I can't find it.
Restarting the endpoint clears the list of disappeared pods.
This second issue is somewhat disguised by scaling: once I have blocked the first missing container with enough hung tasks, the endpoint scales out a new pod to take on the excess work, which then succeeds in executing any new work. So a user who accepts that "often funcX doesn't run very well, I should just keep retrying and not report a problem" will trigger that scale-out without ever reporting the problem.
To Reproduce
Delete a worker pod
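For concreteness, a sketch of the reproduction step using the kubernetes Python client; the namespace and label selector are assumptions about my dev environment, and `kubectl delete pod <name>` does the same thing:

```python
# Sketch: delete one funcX worker pod to reproduce the problem.
# "default" namespace and the label selector are assumptions about my setup.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
workers = v1.list_namespaced_pod("default", label_selector="app=funcx-worker")
victim = workers.items[0].metadata.name
v1.delete_namespaced_pod(victim, "default")
print(f"Deleted worker pod {victim}")
```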
Expected behavior
Something more like the Parsl fork of the Kubernetes provider; see Parsl/parsl#1740.
Environment
My Kubernetes dev environment, main branches as of 2022-02-28.