
CloudWatchLogsLogGroup - stuck waiting #331

Open
martivo opened this issue Sep 27, 2024 · 12 comments · May be fixed by #332

Comments

@martivo

martivo commented Sep 27, 2024

Creating the same issue in this fork, since it was closed in the old repo: rebuy-de/aws-nuke#500

The problem still exists, tested with ghcr.io/ekristen/aws-nuke:v3.23.0
nuke.log

When I re-run aws-nuke, the deletion is successful (only the CloudWatchLogsLogGroup is deleted).

Do you want to continue? Enter account alias to continue.
> XXX

eu-north-1 - CloudWatchLogsLogGroup - /aws/eks/runner-main-pri/cluster - [CreatedTime: "1727419532008", LastEvent: "2024-09-27T06:45:43Z", logGroupName: "/aws/eks/runner-main-pri/cluster"] - triggered remove

Removal requested: 1 waiting, 0 failed, 113 skipped, 0 finished

eu-north-1 - CloudWatchLogsLogGroup - /aws/eks/runner-main-pri/cluster - [CreatedTime: "1727419532008", LastEvent: "2024-09-27T06:45:43Z", logGroupName: "/aws/eks/runner-main-pri/cluster"] - waiting

Removal requested: 1 waiting, 0 failed, 113 skipped, 0 finished

eu-north-1 - CloudWatchLogsLogGroup - /aws/eks/runner-main-pri/cluster - [CreatedTime: "1727419532008", LastEvent: "2024-09-27T06:45:43Z", logGroupName: "/aws/eks/runner-main-pri/cluster"] - removed

Removal requested: 0 waiting, 0 failed, 113 skipped, 1 finished

Nuke complete: 0 failed, 113 skipped, 1 finished.
@ekristen
Owner

Just so I understand: it does get deleted, but it comes back, simply due to the order in which deletions happen and how long it takes for things like clusters to get removed?

@martivo
Author

martivo commented Sep 28, 2024

I don't think it comes back. It seems like it never triggers the remove, or when it was triggering the remove, the resource was still being used and the remove failed.
In the attached log you can also see that it will stay waiting forever for the resource to be deleted.

eu-north-1 - CloudWatchLogsLogGroup - /aws/eks/runner-main-pri/cluster - [CreatedTime: "1727413603921", LastEvent: "2024-09-27T06:38:09Z", logGroupName: "/aws/eks/runner-main-pri/cluster", tag:Name: "/aws/eks/runner-main-pri/cluster", tag:Prefix: "runner-main", tag:Terraform: "true", tag:Workspace: "pri"] - waiting

Removal requested: 1 waiting, 0 failed, 113 skipped, 305 finished

This message is repeated until aws-nuke gives up.
When I start the program again, it will find the same CloudWatchLogsLogGroup resource and immediately delete it successfully.
EKS cluster removal usually takes around 5-15 minutes, so quite some time. EKS uses the CloudWatchLogsLogGroup resource, so the LogGroup cannot be deleted before EKS itself is removed. It seems to me that the nuke program does not even try to remove the LogGroup; it seems to just wait for it to be removed. If there were a way to trigger the removal of the LogGroup only after the EKS cluster is deleted, I am sure it would fix the issue.

@ekristen
Owner

Do you ever see a "triggered remove"?

@martivo
Author

martivo commented Sep 29, 2024

Yes:

eu-north-1 - CloudWatchLogsLogGroup - /aws/eks/runner-main-pri/cluster - [CreatedTime: "1727413603921", LastEvent: "2024-09-27T06:38:09Z", logGroupName: "/aws/eks/runner-main-pri/cluster", tag:Name: "/aws/eks/runner-main-pri/cluster", tag:Prefix: "runner-main", tag:Terraform: "true", tag:Workspace: "pri"] - triggered remove

Also, I found that EKS cluster removal was triggered after the CloudWatchLogsLogGroup removal (here you can see the LogGroup was already in the waiting status):

eu-north-1 - CloudWatchLogsLogGroup - /aws/eks/runner-main-pri/cluster - [CreatedTime: "1727413603921", LastEvent: "2024-09-27T06:38:09Z", logGroupName: "/aws/eks/runner-main-pri/cluster", tag:Name: "/aws/eks/runner-main-pri/cluster", tag:Prefix: "runner-main", tag:Terraform: "true", tag:Workspace: "pri"] - waiting
eu-north-1 - EKSCluster - runner-main-pri - [CreatedAt: "2024-09-27T05:07:01Z", tag:Prefix: "runner-main", tag:Terraform: "true", tag:Workspace: "pri"] - triggered remove

@ekristen
Owner

What is happening is that it is getting deleted, but it gets recreated by the cluster before the tool detects it is gone, so the tool is waiting for a removal that has technically already happened.

Not sure at the moment what the best way to handle it is. Need to give this some thought.

@ekristen
Owner

I actually think this can be solved very simply by a patch to libnuke. I'll put a PR together and make the binaries available.

Ultimately, what is happening is that the CloudWatch LogGroup is being deleted, but it comes back. However, due to how the resource matching currently works, the String() result is matched over Properties(), which in this case means the same log group name is found, and that results in waiting forever.

There are two fixes here: a) we need some sort of upper threshold on "waiting" resources, and b) we need to prioritize property matching over the stringer. The CloudWatch Log Group properties would mismatch due to the LastEvent and CreatedAt times, thus invalidating the match and triggering the removal again.
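
Roughly, the property-over-stringer idea would look something like this (a minimal sketch, not the actual libnuke code; the logGroup type and sameResource helper are made up purely to illustrate the idea):

```go
package main

import "fmt"

// logGroup is a hypothetical stand-in for the CloudWatchLogsLogGroup resource:
// a stringer identity (the name) plus a property map (CreatedTime, LastEvent, ...).
type logGroup struct {
	name  string
	props map[string]string
}

func (l logGroup) String() string                { return l.name }
func (l logGroup) Properties() map[string]string { return l.props }

// sameResource prefers property matching over the stringer: a recreated log
// group keeps the same name but gets a new CreatedTime, so the properties
// mismatch and it is treated as a new resource to remove, instead of being
// waited on forever.
func sameResource(a, b logGroup) bool {
	pa, pb := a.Properties(), b.Properties()
	if len(pa) > 0 && len(pb) > 0 {
		if len(pa) != len(pb) {
			return false
		}
		for k, v := range pa {
			if pb[k] != v {
				return false
			}
		}
		return true
	}
	// Fall back to the stringer only when no properties are available.
	return a.String() == b.String()
}

func main() {
	deleted := logGroup{"/aws/eks/runner-main-pri/cluster", map[string]string{"CreatedTime": "1727413603921"}}
	recreated := logGroup{"/aws/eks/runner-main-pri/cluster", map[string]string{"CreatedTime": "1727419532008"}}
	fmt.Println(sameResource(deleted, recreated)) // false: same name, different CreatedTime
}
```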

However, the ultimate fix is going to be some sort of DAG for deletions.
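
For reference, a very rough sketch of what dependency-ordered deletion could eventually look like (nothing like this exists in aws-nuke today; deleteAfter and order are hypothetical names just to illustrate the DAG idea):

```go
package main

import "fmt"

// deleteAfter is a hypothetical dependency map: each resource type lists the
// types that must be fully removed before it is deleted, so that a recreating
// writer (like an EKS cluster re-creating its log group) is gone first.
var deleteAfter = map[string][]string{
	"CloudWatchLogsLogGroup": {"EKSCluster"},
	"EKSCluster":             {},
}

// order returns a deletion order that respects deleteAfter (a topological
// sort; cycle handling omitted for brevity).
func order(deps map[string][]string) []string {
	var out []string
	visited := map[string]bool{}
	var visit func(string)
	visit = func(r string) {
		if visited[r] {
			return
		}
		visited[r] = true
		for _, d := range deps[r] {
			visit(d) // remove dependencies (e.g. the EKS cluster) first
		}
		out = append(out, r)
	}
	for r := range deps {
		visit(r)
	}
	return out
}

func main() {
	fmt.Println(order(deleteAfter)) // [EKSCluster CloudWatchLogsLogGroup]
}
```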

@ekristen
Owner

This might fix the issue -- #332

Builds should be available here -- https://github.com/ekristen/aws-nuke/actions/runs/11097585762

If you can run and test it, that would be appreciated. It's the best fix that can be done at the moment; the real fix will be a dependency graph, but that's a ways off.

@martivo
Author

martivo commented Sep 30, 2024

> This might fix the issue -- #332
>
> Builds should be available here -- https://github.com/ekristen/aws-nuke/actions/runs/11097585762
>
> If you can run and test that would be appreciated. It's the best fix that can be done at the moment, the real fix will be dependency graph, but that's a ways off.

I will give it a try on Wednesday evening - I have an account lined up for that time that needs to be nuked.

@martivo
Author

martivo commented Oct 3, 2024

It seems it did not help. With the build you referenced in #332, it now just said the log group was removed, but it was actually not removed.
I still had to run the nuke twice.

@ekristen
Owner

ekristen commented Oct 3, 2024

So it is removed, it just comes back; this at least fixed the problem of getting stuck waiting.

The problem of it coming back is only ever going to be solved by a dependency based delete.

@ekristen
Owner

@martivo I think we should merge #332, as it fixes the infinite waiting problem, and then close this and track it against the DAG feature issue I have open, as I believe that's the only way to truly solve this problem. Otherwise, it's pretty much a have-to-run-it-twice sort of scenario.

Thoughts?

@martivo
Author

martivo commented Oct 15, 2024

I guess it's ok to merge, but then you will most surely get another issue that the nuke is not actually deleting all the resources. From my perspective this behaviour is ok - at least I can now decently run it twice without having to wait a long time.
