Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: avoid contention on scheduler execution #103

Merged
merged 7 commits into from
Feb 16, 2022

Conversation

kares
Copy link
Contributor

@kares kares commented Feb 10, 2022

What we're after here is to avoid Thread.list blocking operation, which in JRuby means walking a synchronized Map.

A lot of Ruby threads in the system trying to Thread.list will lead to thread contending (blocked) for execution ...

Rufus::Scheduler uses Thread.list when figuring out which are the worker/scheduler threads it started.
This is working but a bit unfortunate and might be the cause for issues such as #88 (if there are multiple scheduler instances started from plugins - e.g. multiple jdbc inputs or http_poller or elasticsearch inputs with a schedule => ...).

Implementation wise there's 3 changes that improve performance:

  1. less scheduler thread ticks (sleep for 1.0 instead of 0.3)
    Screenshot from 2022-02-09 14-29-08

  2. avoid timeout_jobs doing a Thread.list if the timeout feature is not used (the plugin does not use timeouts)
    Screenshot from 2022-02-09 14-45-33
    NOTE: this accounts for most of the improvements ...

  3. finally there's still a list operation whenever the job is about to be executed
    Screenshot from 2022-02-09 15-48-48
    NOTE: the reason we re-def worker_threads and cache them - there's no more locking operations (the remaining Random contention is due to using Kernel.rand in a local testing scenario and does not happen in production).

The recording screenshots are from a local emulated run, which is a bit far fetched: 600 schedulers with a 10s interval cron schedule. Still do not have a good enough explanation for a complete scheduler stop, what I was able to accomplish locally is some schedulers skipping one, two, sometimes three four (10 second) runs.


For now we're not sure whether there is another bug with Rufus::Scheduler (3.0.9) that would cause #88.

The plan is to have the patch tested with potentially many JDBC inputs, if the input stopping issue persist we might need to investigate further. Upon succession the < Scheduler sub-class should be extracted into it's own mixin to be usable by all plugins (not just JDBC), ideally the rufus-scheduler would also be updated to latest version.
The overrides should be iterated upon to avoid using Thread.list in the library completely and submitting a patch upstream.

@kares kares marked this pull request as ready for review February 10, 2022 11:32
@kares kares requested a review from andsel February 10, 2022 11:38
Copy link
Contributor

@andsel andsel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the ideas LGTM, have left only a (maybe) stylistic concern about the usage of nil as special case param.


# @overload
def work_threads(query = :all)
if query.nil? # our special case from JobDecorator#start_work_thread
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are using the nil value as a special value, we are assigning to it a meaning. work_threads(nil) is invoked only by the patched methods in JobDecorator, could we use something really specific and explicit for this?
Maybe adding a new label :os_list_threads or something similar instead of nil ?

Copy link
Contributor Author

@kares kares Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, only used from JobDecorator - otherwise it's usually called with no args or a selector.
I wanted to have smt of a different type than the :symbol it supports, will try smt better than nil.

@kares kares requested a review from andsel February 13, 2022 17:04
Copy link
Contributor

@andsel andsel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants