Partition Pruning During Execution

Partitioning in Postgresql eases handling large volumes of data. This feature has greatly improved with the introduction of declarative partitioning in PostgreSQL 10, paving way for better query optimization and execution techniques on partitioned tables. PostgreSQL 11 extended query optimization by enabling partition elimination strategies during query execution. It also added a parameter enable_partition_pruning to control the executor’s ability to prune partitions which is on by default.

When does the runtime pruning occur?

The first attempt at pruning occurs at the planning stage for the quals using partition key with constants and then for the volatile params runtime pruning can be done at two stages of execution - at executor startup or initialization and during actual execution.

1. Executor Initialization

In some cases, as in the execution of the prepared query, we can know of the parameters called the external params during the initialization and hence avoid initializing the unwanted sub-plans. In this case, EXPLAIN outputs will not list the eliminated sub-plans but only give a number of the sub-plans removed.

=# prepare tprt_q1 (int, int, int) as select * from tprt where a between $1 and $2 and b <= $3; =# explain execute tprt_q1 (25000, 30000, 20000); Append (cost=0.00..2007.54 rows=153 width=8) Subplans Removed: 7 -> Seq Scan on tprt_a3_b1 (cost=0.00..222.98 rows=17 width=8) Filter: ((a >= $1) AND (a <= $2) AND (b <= $3)) -> Seq Scan on tprt_a3_b2 (cost=0.00..222.98 rows=17 width=8) Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))

The EXPLAIN output states that 7 sub-plans have been removed; which implies that the corresponding partitions were not required and hence not even initialized.

2. Actual Execution

As in the case of subqueries and parameterized nested loop joins; the parameters called exec params are only available at the time of actual execution. In this case, all the partitions are initialized and then the executor will determine which partitions need to be scanned depending on the parameters. In case any of the partitions are not required throughout the runtime then it is marked with “never executed” in the EXPLAIN ANALYZE output.

The following is an example of a parameterized nested loop join between two tables with 5000 rows. The outer table has values from 2001 to 7000  and the partitioned table has values from 1 to 5000. The partitioned table has 5 partitions each with the capacity of 1000 values.

EXPLAIN ANALYZE output with enable_partition_pruning = off
Nested Loop (actual rows=3000 loops=1) -> Seq Scan on t1 (actual rows=5000 loops=1) -> Append (actual rows=1 loops=5000) -> Index Scan using tp_a1_idx on tp_a1 (actual rows=0 loops=5000) Index Cond: (a = t1.col1) -> Index Scan using tp_a2_idx on tp_a2 (actual rows=0 loops=5000) Index Cond: (a = t1.col1) -> Index Scan using tp_a3_idx on tp_a3 (actual rows=0 loops=5000) Index Cond: (a = t1.col1) -> Index Scan using tp_a4_idx on tp_a4 (actual rows=0 loops=5000) Index Cond: (a = t1.col1) -> Index Scan using tp_a5_idx on tp_a5 (actual rows=0 loops=5000) Index Cond: (a = t1.col1) Planning Time: 0.319 ms Execution Time: 114.823 ms

EXPLAIN ANALYZE output with enable_partition_pruning=on
Nested Loop (actual rows=3000 loops=1) -> Seq Scan on t1 (actual rows=5000 loops=1) -> Append (actual rows=1 loops=5000) -> Index Scan using tp_a1_idx on tp_a1 (never executed) Index Cond: (a = t1.col1) -> Index Scan using tp_a2_idx on tp_a2 (never executed) Index Cond: (a = t1.col1) -> Index Scan using tp_a3_idx on tp_a3 (actual rows=1 loops=1000) Index Cond: (a = t1.col1) -> Index Scan using tp_a4_idx on tp_a4 (actual rows=1 loops=1000) Index Cond: (a = t1.col1) -> Index Scan using tp_a5_idx on tp_a5 (actual rows=1 loops=1000) Index Cond: (a = t1.col1) Planning Time: 0.384 ms Execution Time: 36.572 ms

Reasons for Performance Improvement

There is a definite improvement in the performance of queries involving partitioned tables but the extent of it is determined by partition key parameters which controls how many scans of the partitions can be skipped.

Considering the nested loop join case above, with pruning disabled, all the partitions are scanned for each of the 5000 values from the outer table t1 (loops=5000). With pruning enabled, only the appropriate partitions are scanned for each value from the outer table (loops=1000). In two partitions there are no scans performed at all (never executed) since the outer table does not have the values that match entries with these partitions (1-2000). Since the number of scans on each partition is reduced substantially, we can see an improvement of 67% in execution time from 115 ms to 37 ms.
When the amount of data in the outer table is doubled to 10000 rows (values 2001 - 12000), the behavior is similar except in the non-pruned case where the number of scans made in each partition is 10000 instead of 5000 but the difference in performance is better at 83% from 239 ms to 40 ms.

Under the Hood

The partitions are internally sorted and stored by the increasing order of the values that they can hold. Initially, in V10 the undesirable partitions were eliminated by a tedious linear search in the planner but with V11, this has been updated to quicker a binary search of the list.

To perform a scan on a partitioned table, an Append node is used with the scan on each of the leaf partitions being a sub-plan under it. Each of these sub-plans is indexed and the executor internally accesses them by this index.  

To help the executor select the correct Append sub-plans, a map of partition with the corresponding sub-plan index is used. The planner first creates this map handling all the partitions pruned by it. When the executor detects that pruning is possible, it fetches the list of partitions that can satisfy the given param and figures out the corresponding sub-plan indexes from the map. If there is no sub-plan index, it indicates that the partition has been already pruned in the previous stage (planner or executor initialization).

If the pruning has taken place during the executor startup, then the map is updated because the rejected partitions are not initialized, changing the sub-plan indexes of the retained ones. This is necessary so that the map is valid for pruning to be done by the executor later.
Supporting runtime partition pruning is just one of the few performance improvements and there is more to be expected in the upcoming versions.


--

This blog is also posted on postgres rocks.

Sharing Folder from Mac to Cent OS VM

Install VMware Tools

  1. Start up VM.
  2. Goto Virtual Machine -> Install VMware Tools
  3. There will be a pop up window. Click Install to connect the VMware Tools installer CD to this virtual machine.
  4. From the CD, Extract the VMwareTools-<version>.tar.gz
  5. Goto the folder
  6. Run the following command:
./vmware-install.pl -d --clobber-kernel-modules=vmhgfs

Share a Folder

  1. Goto Virtual Machine -> Settings
  2. In the pop up window choose Sharing
  3. Check the Enable Shared Folders option
  4. Click [+]  and select the folder on Mac which is to be shared
  5. The folder will be visible under /mnt/hgfs/

PostgreSQL Datatypes

SQL datatypes and internal datatypes

SQL type Internal type
smallint int8
int int32
bigint int64

PostgreSQL : Get total pages of an object

SELECT pg_relation_size(<object_id>, 'main')/8192);
The pg_relation_size gives the disk space occupied by the object. Since each page is of 8kB, the pg_relation_size size is divided by 8192 to give the page count.

Vim - Highlight OverLength


Add the following code to /etc/vimrc if we want to highlight the characters that go beyond column 80.
highlight OverLength ctermbg=red ctermfg=white guibg=#592929
match OverLength /\%81v.\+ 

PostgreSQL : Data Object sizes

 Size of User Tables

=# SELECT
   relname as "Table",
   pg_size_pretty(pg_total_relation_size(relid)) as "Total Size",
   pg_size_pretty(pg_relation_size(relid)) as "Relation Size",
   pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as "External Size"
   FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC
;
      Table       | Total Size | Relation Size | External Size
------------------+------------+---------------+---------------
 load_test        | 1883 MB    | 1776 MB       | 107 MB
 pgbench_accounts | 1121 MB    | 961 MB        | 161 MB
 pgbench_history  | 8656 kB    | 8656 kB       | 0 bytes
 pgbench_tellers  | 80 kB      | 40 kB         | 40 kB
 pgbench_branches | 24 kB      | 8192 bytes    | 16 kB
(5 rows)

 Size of User Indexes

=# SELECT
   indexrelname as "Index",
   pg_size_pretty(pg_total_relation_size(indexrelid)) as "Size",
   pg_size_pretty(pg_total_relation_size(indexrelid) - pg_relation_size(indexrelid)) as "External Size"
   FROM pg_catalog.pg_statio_user_indexes ORDER BY pg_total_relation_size(indexrelid) DESC;
         Index         |  Size  | External Size
-----------------------+--------+---------------
 pgbench_accounts_pkey | 161 MB | 0 bytes
 load_testidx          | 107 MB | 0 bytes
 pgbench_tellers_pkey  | 40 kB  | 0 bytes
 pgbench_branches_pkey | 16 kB  | 0 bytes
(4 rows)

 Functions

pg_relation_size(oid) - returns disk space occupied by the table or index of the given oid.

pg_total_relation_size(oid) - returns total disk space of the table of given oid, including indexes and toasted data

Clearing OS cache

Run the following command as root
echo 3 > /proc/sys/vm/drop_caches