goschedstats: add cluster setting to always do short sampling #133459

Open
wants to merge 1 commit into
base: master

sumeerbhola
Collaborator

The adaptive sampling, whose long period is 250ms, can make the number of admission control (AC) slots slow to adjust, resulting in unnecessary queueing. The 1ms sampling is cheap, even on an idle roachprod node.

Effect of enabling this on an idle node -- the CPU utilization does not change
[Screenshot, 2024-10-25 12:18:47 PM: CPU utilization on the idle node]

A 15s cpu profile on this 8vCPU node shows only 20ms in goschedstats and callbacks.
[Screenshot, 2024-10-25 12:19:54 PM: CPU profile of the idle node]

Fixes #131766

Epic: none

Release note (ops change): The goschedstats.always_use_short_sample_period.enabled setting should be set to true for any serious production cluster, to prevent unnecessary queuing in admission control CPU queues.

@cockroach-teamcity
Member

This change is Reviewable

@sumeerbhola added the backport-23.2.x, backport-24.1.x, backport-24.2.x, and backport-24.3.x labels on Oct 25, 2024
Collaborator

@aadityasondhi aadityasondhi left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @sumeerbhola)


pkg/util/goschedstats/runnable.go line 88 at r1 (raw file):

	"goschedstats.always_use_short_sample_period.enabled",
	"when set to true, the system always does 1ms sampling of runnable queue lengths",
	false)

Any reason why we don't just make this true by default, considering the CPU profile shows that this is cheap to do?

Development

Successfully merging this pull request may close these issues.

goschedstats,admission: add cluster setting to turn off samplePeriodLong
3 participants