Investigate leaking allocation storage data #615
Comments
Let's have a look at recent events.
Flux Query
To identify the allocations that have been created but not deleted, we can create a Flux query.
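The comparison behind such a query can be sketched outside of Flux as well. The following is a minimal Python sketch, not the query used here: it assumes the events have already been exported as records with a hypothetical `alloc_id` field and a hypothetical `type` field whose values are `"created"` or `"deleted"`. The actual measurement and field names in the InfluxDB data may differ.

```python
from collections import Counter
from typing import Iterable, Mapping, Set


def find_leaked_allocations(events: Iterable[Mapping[str, str]]) -> Set[str]:
    """Return the IDs of allocations that were created more often than deleted.

    The keys "alloc_id" and "type" and the values "created"/"deleted" are
    assumptions made for this sketch, not the real schema.
    """
    balance: Counter = Counter()
    for event in events:
        if event["type"] == "created":
            balance[event["alloc_id"]] += 1
        elif event["type"] == "deleted":
            balance[event["alloc_id"]] -= 1
        # Any other event type is ignored in this sketch.
    return {alloc_id for alloc_id, count in balance.items() if count > 0}
```

An empty result corresponds to the "no mismatch" outcome; any returned ID is a candidate for a leaked storage entry.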
No allocation data has leaked since the last Poseidon restart today at 09:04 AM UTC.
We checked again today and did not find any mismatches between the number of created and deleted allocations. Hence, we assume that the issue has been fixed and is no longer occurring. Closing it. 🙂
On the 12th, we saw 16922 objects in the `nomad_allocations` storage. The case of `29-33eaa850-28a3-11ef-920d-fa163efe023e` is one example of a runner that was added to this storage but never removed. The runner is used multiple times by a user and then, after the inactivity timer, destroyed.
time="2024-06-12T11:08:04.171686Z" level=debug msg="Destroying Runner" destroy_reason="runner inactivity timeout exceeded" package=runner runner_id=29-33eaa850-28a3-11ef-920d-fa163efe023e
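Log lines like the one above can be cross-checked against the storage contents. The following is a minimal Python sketch, assuming the logs use the `key=value` text format shown here. `storage_keys` stands for a hypothetical snapshot of the IDs in the `nomad_allocations` storage, and the helper names are made up for illustration.

```python
import shlex
from typing import Dict, Iterable, Set


def parse_log_line(line: str) -> Dict[str, str]:
    """Split a key=value log line into a dict, honouring quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


def destroyed_runner_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the runner IDs of all "Destroying Runner" log entries."""
    ids = set()
    for line in lines:
        fields = parse_log_line(line)
        if fields.get("msg") == "Destroying Runner" and "runner_id" in fields:
            ids.add(fields["runner_id"])
    return ids


def still_stored(lines: Iterable[str], storage_keys: Set[str]) -> Set[str]:
    """Runners that were destroyed according to the logs but remain in storage."""
    return destroyed_runner_ids(lines) & storage_keys
```

Every ID returned by `still_stored` was logged as destroyed but still has a storage entry, which is the pattern described for `29-33eaa850-28a3-11ef-920d-fa163efe023e`.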
The Nomad `Allocation` events, however, don't contain any hint that the allocation got removed.
InfluxDB Allocation Events
Only the `Job` events contain the hint that the Job got deregistered.
InfluxDB Job events
This raises the question of whether the Sentry issue (see #406) can be seen as an indicator of a changed allocation ID when both Nomad and Poseidon crashed in a migration. Or maybe we ignored an important event:
time="2024-06-12T10:59:59.280162Z" level=debug msg="Ignoring duplicate event" allocID=54750d38-7bb8-978c-1f0a-1ca64f1c70b4 package=nomad
This should be fixed together with #602 and #612.
Another case is `29-f6160f46-0e6b-11ef-97ca-fa163e7afdf8` on the 10th of May.