Consider avoiding the "stable/volatile facts merge" in puppetdb SQL queries #3955

Closed
rbrw opened this issue Apr 4, 2024 · 1 comment
rbrw commented Apr 4, 2024

In an attempt to decrease the steady-state Postgres write load, PuppetDB currently stores each factset in two separate jsonb fragments, stable and volatile [1]. Whenever a fact changes, the top-level subtree that contains it is moved from stable to volatile (if it wasn't already there) and is never moved back. (See also: #3956)

When generating queries, PuppetDB refers to the full factset as (stable || volatile), using the jsonb || operator to reconstruct it, and that merge (it turns out) can be notably expensive. During some scale testing, a simple inventory query effectively filtering on "somefact = x" was observed to take about five seconds with 100k nodes. During investigation, it looked like most of the time was being spent examining the fact value. Rewriting the SQL from roughly (stable || volatile)->somefact to stable->somefact or volatile->somefact decreased the execution time to less than half a second.
Charlie also determined via perf that much of the extra time in the original version was being spent in the stable/volatile merge.
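
To make the rewrite concrete, here is a minimal Python model of the two lookup strategies. It is only a sketch of the semantics, not PuppetDB code: the function names and sample facts are hypothetical, dicts stand in for jsonb objects, and it assumes each top-level fact lives in exactly one fragment (which is what the move-on-change behaviour described above implies). SQL NULL handling is not modelled.

```python
from typing import Any, Dict

Facts = Dict[str, Any]


def merged_lookup(stable: Facts, volatile: Facts, fact: str) -> Any:
    """Model of (stable || volatile)->fact.

    The whole factset is rebuilt before the lookup. Like jsonb ||,
    the right-hand operand wins if a key appears on both sides.
    """
    merged = {**stable, **volatile}
    return merged.get(fact)


def split_matches(stable: Facts, volatile: Facts, fact: str, expected: Any) -> bool:
    """Model of stable->fact = x OR volatile->fact = x.

    Each fragment is probed directly; no merged object is built.
    """
    return stable.get(fact) == expected or volatile.get(fact) == expected


# Hypothetical factset split across the two fragments.
stable = {"kernel": "Linux", "os": {"family": "RedHat"}}
volatile = {"uptime_seconds": 1234}

for name, value in [("kernel", "Linux"), ("uptime_seconds", 1234), ("kernel", "Darwin")]:
    same = (merged_lookup(stable, volatile, name) == value) == split_matches(
        stable, volatile, name, value
    )
    print(name, value, "agree" if same else "differ")
```

If a key could ever appear in both fragments with different values, the two forms would no longer agree, since the merge only sees the volatile value; in that case the rewrite would need to prefer volatile explicitly.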

Since the stable/volatile split may be worth preserving [1], consider adjusting PuppetDB to work with the stable and volatile fragments directly (as in the "or" conversion above). This will also require the creation of independent stable and volatile indexes, and may or may not allow dropping the existing, combined (stable||volatile) expression index.
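
As a rough illustration of what "independent stable and volatile indexes" could look like, the sketch below renders one CREATE INDEX statement per fragment. Everything specific here is an assumption: the index names are invented, the table name is taken from the "factsets" wording in the footnote, and GIN is just a common choice for jsonb columns. Which index type and operator class would actually serve the rewritten queries depends on the operators PuppetDB ends up generating.

```python
from typing import List


def split_index_ddl(table: str = "factsets", method: str = "gin") -> List[str]:
    """Render one CREATE INDEX statement per jsonb fragment.

    Index names are hypothetical; table and method are assumptions
    that should be checked against the real schema and query shapes.
    """
    return [
        f"CREATE INDEX idx_{table}_{column} ON {table} USING {method} ({column})"
        for column in ("stable", "volatile")
    ]


for statement in split_index_ddl():
    print(statement + ";")
```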

Footnotes

  1. The stable/volatile split relies on an assumption that in typical installations there will be a substantial set of facts that never change. When true, those facts will end up in the stable jsonb factsets column, and if they're also large enough to be TOASTed, then the value in the factset row should just be an integer TOAST table rowid, shrinking the size of the row with respect to future rewrites during factset updates. From the PostgreSQL TOAST documentation: "During an UPDATE operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if none of the out-of-line values change."
