Skip to content

[Perf]: Planning time increases linearly with number of stripes #188

@leoyvens

Description

@leoyvens

What's wrong?

I noticed this because I have two tables with the same data, but one was built with the default 150k rows per stripe and the other with 10k rows per stripe. The table has about 1 billion rows. Both tables have a BTree index on vid.

This is an explain analyze for the table with 150k rows per stripe:

explain analyze select s.amount_in from swap_columnar_vid s where vid in (899171129, 999171129);
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_swap_columnar_vid on swap_columnar_vid s  (cost=0.57..1945.61 rows=2 width=10) (actual time=21.079..21.093 rows=1 loops=1)
   Index Cond: (vid = ANY ('{899171129,999171129}'::bigint[]))
 Planning Time: 11.635 ms
 Execution Time: 21.899 ms
(4 rows)

And now for the table with 10k rows per stripe:

d30750a14969=> explain analyze select s.amount_in from swap_columnar_vid_stripe_10k s where vid in (899171129, 999171129);
                                                                                QUERY PLAN                                                                     
            
---------------------------------------------------------------------------------------------------------------------------------------------------------------
------------
 Index Scan using idx_vid_on_swap_columnar_vid_stripe_10k on swap_columnar_vid_stripe_10k s  (cost=0.57..142.02 rows=2 width=10) (actual time=6.422..6.435 rows
=1 loops=1)
   Index Cond: (vid = ANY ('{899171129,999171129}'::bigint[]))
 Planning Time: 142.906 ms
 Execution Time: 6.474 ms
(4 rows)

Notice how the query planning increased proportionally to the increase in number of stripes, and to an unreasonable 140ms. And the plan isn't even a columnar scan. This can be worked around to some extent by using prepared statements to take advantage of plan caching, but still this is a performance issue.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions