Add Extended statistics advisor (PoC) #33

Open: wants to merge 1 commit into base: master
yamatattsu

Hi Julien!

I created a PoC patch for the extended statistics advisor, as I mentioned at PGCon 2020. :-D
The test case and result are below:

CREATE TABLE t (a INT, b INT);
INSERT INTO t SELECT i % 100, i % 100 FROM generate_series(1, 10000) s(i);
ANALYZE t;

EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;

                                 QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..195.00 rows=1 width=8) (actual rows=100 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 9900
 Planning Time: 0.122 ms
 Execution Time: 5.226 ms
(5 rows)
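For reference, the `rows=1` estimate above is what you get when the planner treats `a = 1` and `b = 1` as independent and multiplies their selectivities. The Python sketch below reproduces that arithmetic from the test data. It assumes ANALYZE saw the whole 10,000-row table, so the per-column selectivities are exact; `clamp_row_est` only mimics the PostgreSQL function of the same name (round, never below one row).

```python
N_ROWS = 10_000
# Same data as the test case: INSERT ... SELECT i % 100, i % 100.
rows = [(i % 100, i % 100) for i in range(1, N_ROWS + 1)]


def clamp_row_est(n: float) -> int:
    # Mimics PostgreSQL's clamp_row_est(): round, and never below 1 row.
    return max(1, round(n))


p_a = sum(1 for a, _ in rows if a == 1) / N_ROWS  # P(a = 1) = 0.01
p_b = sum(1 for _, b in rows if b == 1) / N_ROWS  # P(b = 1) = 0.01

# Independence assumption: P(a = 1 AND b = 1) = P(a = 1) * P(b = 1).
est_rows = clamp_row_est(N_ROWS * p_a * p_b)
actual_rows = sum(1 for a, b in rows if a == 1 and b == 1)
print(est_rows, actual_rows)  # 1 100
```

The 100x gap between the estimate and the actual row count is exactly the kind of selectivity estimation error an extended statistics advisor is meant to catch.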

SELECT v
FROM json_array_elements(
     pg_qualstats_index_advisor(min_filter => 50)->'extstats') v
ORDER BY v::text COLLATE "C";

                          v
-----------------------------------------------------
 "CREATE STATISTICS t_b_a_ext ON b, a FROM public.t"
(1 row)

CREATE STATISTICS t_b_a_ext ON b, a FROM public.t;
CREATE STATISTICS

ANALYZE t;
EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..195.00 rows=100 width=8) (actual rows=100 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 9900
 Planning Time: 0.182 ms
 Execution Time: 4.430 ms
(5 rows)
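With the extended statistics in place the estimate becomes 100 rows. One way to see where that number can come from is the functional-dependency estimate PostgreSQL describes for extended statistics, roughly P(a, b) = P(a) * [d + (1 - d) * P(b)], where d is the degree of the dependency a => b. The sketch below uses a simplified degree (share of rows in groups of `a` where `b` takes a single value) and is not PostgreSQL's exact code. Note that `CREATE STATISTICS` without an explicit kind list builds all supported kinds, so the planner may instead have used the multivariate MCV list; for this data the (1, 1) combination has frequency 0.01, which also gives 100 rows.

```python
from collections import Counter, defaultdict

N_ROWS = 10_000
# Same data as the test case: a = b = i % 100.
rows = [(i % 100, i % 100) for i in range(1, N_ROWS + 1)]


def dependency_degree(rows) -> float:
    """Simplified degree of a => b: share of rows in groups of `a`
    where `b` has a single value (PostgreSQL works on an ANALYZE sample)."""
    groups = defaultdict(Counter)
    for a, b in rows:
        groups[a][b] += 1
    supporting = sum(sum(c.values()) for c in groups.values() if len(c) == 1)
    return supporting / len(rows)


p_a = sum(1 for a, _ in rows if a == 1) / N_ROWS  # P(a = 1) = 0.01
p_b = sum(1 for _, b in rows if b == 1) / N_ROWS  # P(b = 1) = 0.01
d = dependency_degree(rows)  # 1.0: b is fully determined by a

# Dependency-aware estimate instead of the plain product P(a) * P(b).
est_rows = round(N_ROWS * p_a * (d + (1 - d) * p_b))
print(d, est_rows)  # 1.0 100
```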

Thanks,
Tatsuro Yamada

rjuju self-requested a review August 9, 2020 07:09
rjuju self-assigned this Aug 9, 2020
rjuju (Member) commented Aug 9, 2020

Thanks a lot for working on that @yamatattsu !

First, it should be done as part of a new version, probably 2.1.0, as it's adding a new feature rather than fixing a bug. I'll go ahead and commit the preliminary work for that, so you don't have to deal with it in this patchset.

Then, regarding the feature itself: I think it'd be better to create a new function, as the two advisors will probably end up with a lot of differences. The most obvious ones are probably the thresholds and filters. If you're using the same function, you'll discard quals that are used as part of an index scan, which is probably a bad idea. Also, we should have some threshold based on the selectivity estimation error here.

If needed, we could of course factor some parts of the existing index advisor out into functions / views so we don't have to copy/paste too much.

The way I imagined that feature, it would work on all compound predicates that have at least 2 simple predicates and a large enough selectivity estimation error (ratio and/or raw number). Then, maybe optionally, it would try to validate that those errors are really due to correlated columns. This could be done by checking that those simple predicates, when used alone, have good selectivity estimates. Of course there'll always be cases where compound predicates use simple predicates that are never used alone, so we can't make the availability of those estimates a hard requirement for validation.
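The selection logic described above could look roughly like the following Python sketch. Everything in it is hypothetical: `Qual`, `QualStats`, `extstats_candidates` and the `min_ratio` / `min_num` thresholds are illustrative names, not pg_qualstats functions, and in the extension the estimated vs. actual rows would come from the collected qual statistics. Both thresholds must be met here; dropping either one gives the "ratio or raw number" variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qual:
    """A simple predicate such as t.a = ? (hypothetical representation)."""
    table: str
    column: str
    op: str = "="


@dataclass(frozen=True)
class QualStats:
    """Estimated vs. actual rows for an ANDed group of simple predicates."""
    quals: tuple[Qual, ...]
    est_rows: float
    actual_rows: float


def estimate_error(s: QualStats) -> tuple[float, float]:
    """Return (ratio, raw) selectivity estimation error; ratio is >= 1."""
    est, act = max(s.est_rows, 1.0), max(s.actual_rows, 1.0)
    return max(est, act) / min(est, act), abs(s.actual_rows - s.est_rows)


def misestimated(s: QualStats, min_ratio: float, min_num: float) -> bool:
    ratio, num = estimate_error(s)
    return ratio >= min_ratio and num >= min_num


def extstats_candidates(stats, min_ratio=10.0, min_num=50.0, validate=True):
    """Return sorted (table, columns) pairs worth a CREATE STATISTICS."""
    # Stats for simple predicates used alone, for the optional validation.
    alone = {s.quals[0]: s for s in stats if len(s.quals) == 1}
    found = set()
    for s in stats:
        # Only compound predicates with at least 2 simple predicates.
        if len(s.quals) < 2 or not misestimated(s, min_ratio, min_num):
            continue
        # Extended statistics are defined on a single table.
        tables = {q.table for q in s.quals}
        if len(tables) != 1:
            continue
        # Optional validation: if a simple predicate is already badly
        # estimated on its own, the error is probably not due to correlation.
        # Predicates never used alone cannot be checked, so they don't block.
        if validate and any(
            q in alone and misestimated(alone[q], min_ratio, min_num)
            for q in s.quals
        ):
            continue
        columns = tuple(sorted({q.column for q in s.quals}))
        if len(columns) >= 2:
            found.add((tables.pop(), columns))
    return sorted(found)


stats = [
    QualStats((Qual("t", "a"), Qual("t", "b")), est_rows=1, actual_rows=100),
    QualStats((Qual("t", "a"),), est_rows=100, actual_rows=100),
    QualStats((Qual("t", "b"),), est_rows=100, actual_rows=100),
]
print(extstats_candidates(stats))  # [('t', ('a', 'b'))]
```

With validation enabled, a compound predicate is skipped when one of its simple predicates is itself badly estimated when used alone, while simple predicates that are never seen alone don't block the suggestion.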
