Add Extended statistics advisor (Poc) #33

yamatattsu · 2020-08-09T06:43:41Z

Hi Julien!

I created PoC patch for Exnteded statistics advisor as I mentioned on PGCon2020 work. :-D
The test case and result are below:

CREATE TABLE t (a INT, b INT);
INSERT INTO t SELECT i % 100, i % 100 FROM generate_series(1, 10000) s(i);
ANALYZE t;

EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;

                                 QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..195.00 rows=1 width=8) (actual rows=100 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 9900
 Planning Time: 0.122 ms
 Execution Time: 5.226 ms
(5 rows)

SELECT v
FROM json_array_elements(
     pg_qualstats_index_advisor(min_filter => 50)->'extstats') v
ORDER BY v::text COLLATE "C";

                          v
-----------------------------------------------------
 "CREATE STATISTICS t_b_a_ext ON b, a FROM public.t"
(1 row)

CREATE STATISTICS t_b_a_ext ON b, a FROM public.t;
CREATE STATISTICS

ANALYZE t;
EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..195.00 rows=100 width=8) (actual rows=100 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 9900
 Planning Time: 0.182 ms
 Execution Time: 4.430 ms
(5 rows)

Thanks,
Tatsuro Yamada

rjuju · 2020-08-09T10:34:59Z

Thanks a lot for working on that @yamatattsu !

First, it should be done as of a new version, probably a 2.1.0, as it's adding some new feature rather than bugfix. I'll go ahead and commit preliminary work for that, so you don't have to deal with it in this patchset.

Then, regarding the feature in itself. I think it'd be better to create a new function, as they'll probably end up with a lot of differences. The most obvious one is probably the thresholds and filters. If you're using the same function, you'll discard quals that are used as part of index scan, which is probably a bad idea. Also, we should have some threshold based on selectivity estimation error here.

If needed we could of course pull out some part of the existing index advisors in functions / views so we don't have to copy/paste too many things.

The way I imagined that feature would be to work based on all compound predicates that have at least 2 simple predicates, with enough selectivity estimation error (ratio and/or raw num). Then, maybe optionally, try to validate that those errors are. really due to correlated columns. This could be done by checking that those simple predicates when used alone have good selectivity estimates. Of course there'll always be cases where compound predicates use simple predicates that are never used alone, so we can't make a hard requirement on those being available for a validation.

Add Extended statistics advisor (Poc)

e37d21e

rjuju self-requested a review August 9, 2020 07:09

rjuju self-assigned this Aug 9, 2020

rjuju added the enhancement label Aug 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Extended statistics advisor (Poc) #33

Add Extended statistics advisor (Poc) #33

yamatattsu commented Aug 9, 2020

rjuju commented Aug 9, 2020

Add Extended statistics advisor (Poc) #33

Are you sure you want to change the base?

Add Extended statistics advisor (Poc) #33

Conversation

yamatattsu commented Aug 9, 2020

rjuju commented Aug 9, 2020