Memory leak #470

Open
EmptyLungs opened this issue Mar 1, 2023 · 2 comments

EmptyLungs commented Mar 1, 2023

I'm starting the sneakers (2.12.0) process like this:

CMD ["bundle", "exec", "rails", "sneakers:run"]

with this configuration:

  Sneakers.configure(
    amqp: RabbitClient::CONFIG.amqp,
    vhost: RabbitClient::CONFIG.vhost,
    heartbeat: 5,
    workers: 24,
    threads: 1,
    prefetch: 24,
    durable: true,
    log: $stdout,
    env: Rails.env,
    ack: true
  )

Even though I specify 1 thread per worker, I still see 19 processes per worker:

ps huH p 381 | wc -l
19

As a result, we get a huge memory leak after the first job is started:

$ ps axo rss,comm,pid | awk '{ proc_list[$2] += $1; } END { for (proc in proc_list) { printf("%d\t%s\n", proc_list[proc],proc); }}' | sort -n | tail -n 10 | sort -rn | awk '{$1/=1024;printf "%.0fMB\t",$1}{print $2}'

23257MB	ruby
4MB	bash

I'm not familiar with Ruby, so could you please tell me how to profile such an issue?
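
A minimal sketch of what such profiling could look like, assuming the memory_profiler gem is available; the profiled block below is only a stand-in, not code from this project:

# Gemfile: gem 'memory_profiler'
require 'memory_profiler'

# Wrap the suspect code (e.g. the body of a worker's #work method) in a
# report block; the loop below is a hypothetical allocation-heavy stand-in.
report = MemoryProfiler.report do
  10_000.times { "payload" * 100 }
end

# Prints allocated vs. retained objects grouped by gem, file and location.
report.pretty_print(scale_bytes: true)

# Cheap sanity check over time: live heap slots after a forced GC.
GC.start
puts GC.stat(:heap_live_slots)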

@michaelklishin (Collaborator)

You have 24 workers, so why do you expect fewer than that number of processes?

I don't really see any proof of a memory leak. Peak memory use is not the same thing as a leak.
You have 24 workers and a prefetch value of 24, which means you can have up to 24 * 24 = 576 messages delivered and unacknowledged at any given moment.

Depending on their size, this can have a massive effect on the amount of memory used.

Perhaps try using fewer workers, and if your messages can be large (say, in megabytes), a lower prefetch of 8-16.
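
As a rough illustration of that suggestion, the configuration above could be scaled down along these lines; the exact numbers are only examples and would need tuning against real message sizes:

  Sneakers.configure(
    amqp: RabbitClient::CONFIG.amqp,
    vhost: RabbitClient::CONFIG.vhost,
    heartbeat: 5,
    workers: 8,    # fewer OS processes than the original 24
    threads: 1,
    prefetch: 8,   # fewer unacked messages held in memory per consumer
    durable: true,
    log: $stdout,
    env: Rails.env,
    ack: true
  )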

@EmptyLungs (Author)

@michaelklishin thanks for your reply!
I've added monitoring with Node Exporter and Prometheus; here are some metrics:
[screenshot: container memory usage over time]

To elaborate on the screenshot: we rebooted the sneakers Docker container at around 19:00, and it had some minor tasks to process. Then, just past 00:00, there was a major workload that lasted roughly 45 minutes. After it finished, the memory was not released back.

Could you please explain how a Worker class instance handles tasks? Does it get destroyed after messages are acked/nacked? If not, it looks like we should avoid storing any data in those workers.
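
For reference, a bare-bones Sneakers worker might look like the sketch below; the class, queue, and helper names are hypothetical. Keeping per-message data in locals inside work, rather than in instance variables, avoids retention regardless of how worker instances are reused:

require 'sneakers'
require 'json'

class OrdersWorker
  include Sneakers::Worker
  from_queue 'orders'            # hypothetical queue name

  def work(msg)
    order = JSON.parse(msg)      # locals only; nothing kept in @ivars,
    handle(order)                # so nothing is retained between messages
    ack!                         # or reject!/requeue! on failure
  rescue StandardError
    reject!
  end

  private

  def handle(order)
    # hypothetical business logic
  end
end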
