k3s server create 100% CPU load #11251
-
Environmental Info:
Node(s) CPU architecture, OS, and Version:
Cluster Configuration:
Describe the bug: NewRelic CPU usage for 3 days (screenshot)
Steps To Reproduce: As the 3-day graph suggests, the CPU load increases day by day (while the application load stays the same), so I tried running the suggestion from this topic, #6095 (comment), but it didn't help. I ended up reinstalling the whole cluster on my new machine, only to find myself in the same situation 5 days later.
Expected behavior:
Actual behavior:
Additional context / logs:
Replies: 8 comments 20 replies
-
I'd probably look at what the cluster is doing? Check the logs for errors, do The fact that CPU utilization rises and falls over time suggests that the load is related to external factors, and is not just an issue with K3s itself.
-
Sorry, I'm not that familiar with CPU profiling. Could you suggest a tool for that?
-
Btw, I've removed all the agents and load from the server, but it still consumes about 50% of the CPU.
-
OK yeah so your compact is regularly failing with "database is locked" errors.
When the compact only partially completes, it skips the next cycle, and then there is even more work to do the next time, so it is less likely to catch up. This also leaves more data in the database for it to sort through, which is probably the cause of your excessive load.
You might consider switching to etcd? If you have a single server, you can just convert in-place by adding
While we can try to handle the "database is locked" issue better, sqlite is really tuned for smaller use cases where low overhead and low disk IO are more important than raw throughput. Etcd will probably suit your use case better.
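For anyone unfamiliar with where a "database is locked" error comes from, here is a minimal sketch in Python using only the standard sqlite3 module. It is not K3s or kine code: the table name, the DELETE statement standing in for a compaction pass, and the zero busy timeout are all assumptions made for illustration, and the demo uses SQLite's default journal mode rather than whatever K3s configures. It only shows that a second writer fails this way when another connection is holding the write lock.

```python
import os
import sqlite3
import tempfile


def demo_locked_write() -> str:
    """Return the error text a second writer gets while the write lock is held."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None means autocommit: transactions are started explicitly.
    holder = sqlite3.connect(path, isolation_level=None)
    holder.execute("CREATE TABLE kv (id INTEGER PRIMARY KEY, value TEXT)")

    # BEGIN IMMEDIATE takes the write lock now and keeps it until COMMIT/ROLLBACK.
    holder.execute("BEGIN IMMEDIATE")
    holder.execute("INSERT INTO kv (value) VALUES ('held')")

    # timeout=0: do not wait for the lock, fail straight away (illustrative choice).
    blocked = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        # Hypothetical stand-in for a compaction pass that deletes old rows.
        blocked.execute("DELETE FROM kv WHERE id < 100")
        message = "no error"
    except sqlite3.OperationalError as exc:
        message = str(exc)
    finally:
        holder.execute("ROLLBACK")
        holder.close()
        blocked.close()
    return message


if __name__ == "__main__":
    print(demo_locked_write())
```

With a non-zero timeout the second writer would wait for the lock instead of failing immediately, so under sustained write pressure the same error shows up once the wait runs out.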
-
@allnightlong what filesystem and disk type are you running k3s on? I am trying to reproduce the
-
@brandond I just noticed I have the same issue again, this time on a stale single-node cluster where only Rancher is running.