Do conditional locking of pgsk->lock LWLock #39

Draft · wants to merge 2 commits into master

Conversation

@x4m commented Apr 21, 2023

This lock might be the source of the lock contention.

A way to reproduce the problem is described here.

@NikolayS asked me to create a prototype for benchmarking. I hope he will provide some feedback on testing this.
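
For context, the idea is to use PostgreSQL's LWLockConditionalAcquire() on the path that creates a new hash-table entry: if the lock is busy, the sample is simply dropped instead of the backend queueing behind the lock. A minimal sketch of that pattern (the struct layout and the pgsk_entry_alloc() helper are illustrative placeholders, not the actual diff):

```c
#include "postgres.h"
#include "storage/lwlock.h"

/* Illustrative shared state; the real pg_stat_kcache layout differs. */
typedef struct pgskSharedState
{
	LWLock	   *lock;			/* protects the shared hash table */
} pgskSharedState;

extern pgskSharedState *pgsk;
extern void pgsk_entry_alloc(const void *key);	/* hypothetical insert helper */

static void
pgsk_store_new_entry(const void *key)
{
	/*
	 * Creating a new entry requires the lock in exclusive mode.  With a
	 * conditional acquire we skip this sample when another backend (e.g.
	 * a monitoring query holding the lock in shared mode) is in the way,
	 * instead of stalling the query path until the lock frees up.
	 */
	if (!LWLockConditionalAcquire(pgsk->lock, LW_EXCLUSIVE))
		return;					/* lock busy: silently drop this sample */

	pgsk_entry_alloc(key);

	LWLockRelease(pgsk->lock);
}
```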

Andrey M. Borodin and others added 2 commits April 21, 2023 16:57
This lock might be the source of the lock contention.
@rjuju self-requested a review on Apr 21, 2023
@rjuju self-assigned this on Apr 21, 2023
@rjuju (Member) commented Apr 21, 2023

Hi,

I still think that this would be better solved by allowing extensions to store custom data in the new stats-in-dynamic-shared-memory infrastructure, rather than trying to add workarounds that silently ignore some activity in each and every extension.

That's especially true for pg_stat_kcache: this extension requires pg_stat_statements to be loaded and inherits the same number of entries, so even if this patch somehow helps, you would still get a similar eviction pattern and therefore a similar need to take the exclusive lwlock on the pg_stat_statements side.

More to the point: if your monitoring system is so heavy that you regularly need to ignore many metrics, what value does this monitoring system have? You will only see the data when everything runs smoothly and ignore data during the incident, leading to totally unreliable metrics.

@NikolayS commented Apr 21, 2023

@rjuju thank you for your very quick response!

...You will only see the data when everything runs smoothly and ignore data during the incident, leading to totally unreliable metrics.

"Totally" here seems to be a very strong word.

A new entry is not recorded only if someone (monitoring or human) is concurrently reading the stats, which should be rare (but when it happens, it causes bad latency spikes for dozens, sometimes hundreds of backends – I'm talking about heavily loaded cases, 10-50k TPS).

But other entries continue receiving updates – their metrics keep increasing, since updating an existing entry doesn't require an exclusive lock.

Currently, for both pgss and pgsk, we already cannot claim that we have fully reliable stats – we are limited by pg_stat_statements.max, whose default of 5000 is far from enough to cover all normalized queries in some systems, especially if we have a lot of different DBs and apps in a PG cluster, and/or track_utility is on and some Java app floods our stats table by emitting set application_name to $session_id...; and so on. Some entries are evicted and recorded again, back and forth, all the time. In other words, we currently cannot guarantee totally reliable metrics.

For example, if we consider sum(some metric), we should always keep in mind that it covers only part of the whole.

I would prefer the "recording" moment to be less invasive. These entries are already "in the tail" of my stats, which get truncated anyway (sometimes even to a lower number – say, if you use Prometheus, you are probably truncating pgss / pgsk to as few as the top 100–500 entries). So if entries that were evicted and want to jump back in slow me down, I'd prefer them to try again a bit later.

Maybe we could have a new GUC for this, allowing the admin to decide what to prefer (perhaps without changing the default behavior)? Personally, I would definitely turn this approach on in the most loaded systems.
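
For illustration, such a switch could be declared as a custom GUC in the extension's _PG_init(); the sketch below assumes a hypothetical pg_stat_kcache.conditional_locking setting (the name and default are made up, not part of this patch):

```c
#include "postgres.h"
#include "utils/guc.h"

/* Hypothetical toggle: drop new entries instead of waiting for the lock. */
static bool pgsk_conditional_locking = false;

static void
pgsk_define_gucs(void)
{
	DefineCustomBoolVariable("pg_stat_kcache.conditional_locking",
							 "Skip recording new entries when the shared "
							 "hash table lock is contended.",
							 NULL,
							 &pgsk_conditional_locking,
							 false,		/* keep the current behavior by default */
							 PGC_SIGHUP,
							 0,
							 NULL, NULL, NULL);
}
```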

@rjuju (Member) commented Apr 22, 2023

A new entry is not recorded only if someone (monitoring or human) is concurrently reading the stats, which should be rare

No, it could also happen if any other backend is holding the lwlock in any mode for long enough. So any other backend concurrently updating an existing entry, adding a new entry or trying to evict some will lead to new entries not being recorded.

Currently, for both pgss and pgsk, we already cannot claim that we have fully reliable stats – we are limited by pg_stat_statements.max

Yes, it's known that there's a limit on the total number of entries that can be recorded. But at least you have the guarantee that the current entries are the most recent ones, which should also be the most interesting ones in case of a performance issue. With this patch that guarantee disappears and you can't even rely on seeing the activity during the performance problem.

Now if you take a snapshot of the entries, say, every 5 minutes and during that time you generate more than pg_stat_statements.max new entries, then you clearly have a problem, and you should either increase the number of entries, change your app, or switch to pgbadger or similar.

which get truncated anyway (sometimes even to a lower number – say, if you use Prometheus, you are probably truncating pgss / pgsk to as few as the top 100–500 entries

If you mean calling pg_stat_statements_reset() or pg_stat_kcache_reset(), that sounds like a really bad idea. Those functions are expensive, especially for pg_stat_statements due to the query text handling.

Maybe we could have a new GUC for this, allowing the admin to decide what to prefer (perhaps without changing the default behavior)? Personally, I would definitely turn this approach on in the most loaded systems.

How will that help since pg_stat_statements will still have the same bottleneck?

Again, the new pgstat infrastructure is designed to address such problems, and that would give a way to fix all similar extensions.

@vitabaks commented
I take it this commit refers to this issue.

@rjuju (Member) commented Jan 19, 2024

@vitabaks I thought about that too earlier, but looking back at the linked thread, it's a different issue. The underlying issue was mostly an escalation of lock conflicts due to querying a quite big pg_stat_statements view very often, until at some point an exclusive lwlock is needed (eviction, new entry, reset...) and then all hell breaks loose.
