
Multicolumn Partitioning


First published here.

Multi-column partitioning allows us to specify more than one column as the partition key. Currently, multi-column partitioning is possible only for the range and hash partitioning types. Range partitioning was introduced in PostgreSQL 10 and hash partitioning was added in PostgreSQL 11.

Creating Partitions

To create a multi-column partitioned table, state the columns as a comma-separated list when defining the partition key in the CREATE TABLE command. You can specify a maximum of 32 columns.
CREATE TABLE tbl_range (id int, col1 int, col2 int, col3 int)
 PARTITION BY RANGE (col1, col2, col3);
CREATE TABLE tbl_hash (id int, col1 int, col2 int, col3 int)
 PARTITION BY HASH (col1, col2, col3);

Range

When specifying the partition bounds for a partition of a multi-column range partitioned table, we need to give the bound for each column of the partition key in the CREATE TABLE ... PARTITION OF or the ALTER TABLE ... ATTACH PARTITION command.
CREATE TABLE p1 PARTITION OF tbl_range
 FOR VALUES FROM (1, 110, 50) TO (20, 200, 200);
ALTER TABLE tbl_range ATTACH PARTITION r1
 FOR VALUES FROM (1, 110, 50) TO (20, 200, 200);
The tuple routing section explains how these bounds work for the partition.
Please note that if the unbounded value -- MINVALUE or MAXVALUE -- is used for one of the columns, then all subsequent columns must also use the same unbounded value.
CREATE TABLE r2 PARTITION OF tbl_range 
 FOR VALUES FROM (900, MINVALUE, MINVALUE) TO (1020, 200, 200);
ALTER TABLE tbl_range ATTACH PARTITION r3
 FOR VALUES FROM (1, 110, 50) TO (MAXVALUE, MAXVALUE, MAXVALUE);

Hash

When specifying the partition bounds for a partition of a multi-column hash partitioned table, we need to specify only one modulus and remainder pair, irrespective of the number of columns used.
CREATE TABLE p1 PARTITION OF tbl_hash
 FOR VALUES WITH (MODULUS 100, REMAINDER 20);
ALTER TABLE tbl_hash ATTACH PARTITION h1
 FOR VALUES WITH (MODULUS 100, REMAINDER 20);

Tuple Routing

The partitioned parent table does not store any rows itself; it routes every inserted row to one of the partitions based on the value of the partition key. This section explains how tuple routing takes place for range and hash multi-column partition keys.

Range

In a range partitioned table, the lower bound is included in the partition but the upper bound is excluded. In a partition with bounds 0 to 100, rows with partition key value 0 are permitted in the partition but rows with value 100 are not.

For a multi-column range partition, the row comparison operator is used for tuple routing, which means the columns are compared left to right, stopping at the first unequal pair of values. If the partition key value is equal to the upper bound of a column, then the next column is considered.

Consider a partition with bound (0,0) to (100, 50). This would accept a row with the partition key value (0, 100) because the value of the first column satisfies the partition bound of the first column which is 0 to 100 and in this case, the second column is not considered.

The partition key value (100, 49) would also be accepted because the first column value is equal to the upper bound specified and so the second column is considered here and it satisfies the restriction 0 to 50.

On the same grounds, rows with value (100, 50) or (101, 10) will not be accepted in the said partition.

Note that if any of the partition key column values is NULL, the row can be routed only to the default partition; if no default partition exists, an error is thrown.
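A minimal sketch of these routing rules, mirroring the (0,0) to (100,50) example above (the table and partition names rt and rt_p1 are illustrative):
CREATE TABLE rt (a int, b int) PARTITION BY RANGE (a, b);
CREATE TABLE rt_p1 PARTITION OF rt FOR VALUES FROM (0, 0) TO (100, 50);
INSERT INTO rt VALUES (0, 100);   -- accepted: 0 < 100, so the second column is not considered
INSERT INTO rt VALUES (100, 49);  -- accepted: a equals the upper bound, so b is checked and 49 < 50
INSERT INTO rt VALUES (100, 50);  -- ERROR: no partition of relation "rt" found for row
INSERT INTO rt VALUES (NULL, 1);  -- also an ERROR, since rt has no default partition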

Hash

In the hash partitioned case, the hash of each column value that is part of the partition key is individually calculated and then combined to get a single 64-bit hash value. The modulus operation is performed on this hash value and the remainder is used to determine the partition for the inserted row.
There is no special handling for NULL values; the hash value is generated and combined as explained above to find the partition for the row to be inserted.
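For instance, using the tbl_hash table from above, a row with NULL in a partition key column is still routed to one of the existing partitions rather than raising an error; no default partition is needed:
INSERT INTO tbl_hash VALUES (1, NULL, 200, 300);  -- routed to whichever partition the combined hash selects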

Partition Pruning

One of the main reasons to use partitioning is the improved performance achieved by partition pruning. Pruning in a multi-column partitioned table has a few restrictions, which are explained below.
For simplicity, all examples in this section showcase only plan-time pruning using constants. The same pruning capability applies wherever pruning is feasible, such as runtime pruning, partition-wise aggregation, etc.

Query using all the partition key columns

When the query uses all the partition key columns in its WHERE clause or JOIN clause, partition pruning is possible.
Consider the following multi-column range partitioned table.
                           Partitioned table "public.tbl_range"

 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 id     | integer |           |          |         | plain   |              | 
 col1   | integer |           |          |         | plain   |              | 
 col2   | integer |           |          |         | plain   |              | 
 col3   | integer |           |          |         | plain   |              | 

Partition key: RANGE (col1, col2, col3)

Partitions: r1 FOR VALUES FROM (MINVALUE, MINVALUE, MINVALUE) TO 
                               (1000, 2000, 3000),
            r2 FOR VALUES FROM (1000, 2000, 3000) TO
                               (5000, 6000, 7000),
            r3 FOR VALUES FROM (5000, 6000, 7000) TO
                               (10000, 11000, 12000),
            r4 FOR VALUES FROM (10000, 11000, 12000) TO
                               (15000, 16000, 17000),
            r5 FOR VALUES FROM (15000, 16000, 17000) TO
                               (MAXVALUE, MAXVALUE, MAXVALUE)
The following two queries show partition pruning when using all the columns in the partition key.
postgres=# EXPLAIN SELECT * FROM tbl_range WHERE col1 = 5000 
           AND col2 = 12000 AND col3 = 14000;

                           QUERY PLAN                            
-----------------------------------------------------------------
 Seq Scan on r3 tbl_range  (cost=0.00..230.00 rows=1 width=16)
   Filter: ((col1 = 5000) AND (col2 = 12000) AND (col3 = 14000))
(2 rows)


postgres=# EXPLAIN SELECT * FROM tbl_range WHERE col1 < 5000
                               AND col2 = 12000 AND col3 = 14000;

                              QUERY PLAN                               
-----------------------------------------------------------------------
 Append  (cost=0.00..229.99 rows=2 width=16)
   -> Seq Scan on r1 tbl_range_1  (cost=0.00..45.98 rows=1 width=16)
       Filter: ((col1 < 5000) AND (col2 = 12000) AND (col3 = 14000))
   -> Seq Scan on r2 tbl_range_2  (cost=0.00..184.00 rows=1 width=16)
       Filter: ((col1 < 5000) AND (col2 = 12000) AND (col3 = 14000))
(5 rows)
Similarly, for a hash partitioned table with multiple columns in the partition key, partition pruning is possible when all columns of the partition key are used in a query.

Consider the following multi-column hash partitioned table.
                           Partitioned table "public.tbl_hash"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 id     | integer |           |          |         | plain   |              | 
 col1   | integer |           |          |         | plain   |              | 
 col2   | integer |           |          |         | plain   |              | 
 col3   | integer |           |          |         | plain   |              | 
Partition key: HASH (col1, col2, col3)
Partitions: h1 FOR VALUES WITH (modulus 5, remainder 0),
            h2 FOR VALUES WITH (modulus 5, remainder 1),
            h3 FOR VALUES WITH (modulus 5, remainder 2),
            h4 FOR VALUES WITH (modulus 5, remainder 3),
            h5 FOR VALUES WITH (modulus 5, remainder 4)
Query:
postgres=# EXPLAIN SELECT * FROM tbl_hash WHERE col1 = 5000 AND col2 = 12000 AND col3 = 14000;

                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Gather  (cost=1000.00..7285.05 rows=1 width=16)
   Workers Planned: 1
   -> Parallel Seq Scan on h4 tbl_hash  (cost=0.00..6284.95 rows=1 width=16)
        Filter: ((col1 = 5000) AND (col2 = 12000) AND (col3 = 14000))
(4 rows)
Unlike the range partitioned case, only equality operators support partition pruning here; the < and > operators scan all the partitions because of the way tuples are distributed in a hash-partitioned table.
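For instance, a query like the following (a hypothetical variation of the query above, output not shown) would be expected to list all five partitions in its plan:
EXPLAIN SELECT * FROM tbl_hash
 WHERE col1 < 5000 AND col2 = 12000 AND col3 = 14000;
-- the < condition defeats pruning; expect scans of h1 through h5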

Queries using a set of partition key columns

Since a multi-column hash partition uses a combined hash value, partition pruning is not applicable when queries use only a subset of the partition key columns.
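As a sketch, the following query on tbl_hash uses only two of the three key columns, so its plan can be expected to include all five partitions (output not shown):
EXPLAIN SELECT * FROM tbl_hash WHERE col1 = 5000 AND col2 = 12000;
-- without col3 no combined hash can be computed, so no partition is pruned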

For a range multi-column partition, however, if the query uses the first few columns of the partition key, then partition pruning is still feasible. The tbl_range table described above is used here as well.

The query below only uses the first two out of the three partition key columns.
postgres=# EXPLAIN SELECT * FROM tbl_range WHERE col1 = 5000 AND col2 = 12000;

                          QUERY PLAN                           
---------------------------------------------------------------
 Seq Scan on r3 tbl_range  (cost=0.00..205.00 rows=1 width=16)
   Filter: ((col1 = 5000) AND (col2 = 12000))
(2 rows)
The query below uses only the first partition key column.
postgres=# EXPLAIN SELECT * FROM tbl_range WHERE col1 < 2000;

                                QUERY PLAN                                
--------------------------------------------------------------------------
 Append  (cost=0.00..199.97 rows=3997 width=16)
   ->  Seq Scan on r1 tbl_range_1  (cost=0.00..35.99 rows=1999 width=16)
         Filter: (col1 < 2000)
   ->  Seq Scan on r2 tbl_range_2  (cost=0.00..144.00 rows=1998 width=16)
         Filter: (col1 < 2000)
(5 rows)

Conclusion

To determine the candidates for a multi-column partition key, we should look for columns that are frequently used together in queries. For range partitioning, order the columns from the most frequently queried to the least frequently queried to enjoy the benefits of partition pruning in most cases. The order of columns does not matter in hash partitioning, as it does not support pruning for a subset of the partition key columns.



Using PQtrace

To enable PQtrace, add the following code to the client-side source, in the function where the connection with the server is established.

FILE *trace_file;
PGconn *conn;
...
conn = PQconnectdb(conninfo);              /* establish the connection */
trace_file = fopen("/tmp/trace.out", "w");
PQtrace(conn, trace_file);                 /* start tracing */
...
PQuntrace(conn);                           /* optional: stop tracing */
fclose(trace_file);                        /* close the file before returning */
return;

First, declare the file variable, and just after the connection is established on the client side (by PQconnectdb), open the file with write permissions and start the trace. Do not forget to close the file before you return from the function where you added this code.

From the file specified, we can get all the messages exchanged between the client and the server.

If you need to further debug the source of the messages being passed, run the client command from gdb with a breakpoint at PQconnectdb, where it connects to the server. When the process breaks, attach another gdb session to the server process that was created and set breakpoints on the libpq functions.

On the client side, put breakpoints on the following:
b pqPutc
b pqPuts
b pqPutnchar
b pqPutInt
b pqPutMsgStart
b pqPutMsgEnd

On the server side, put a breakpoint on the following:
b socket_putmessage

Now continue, and you can easily monitor step by step how the messages are passed from both sides as the breakpoints above are hit.

Investigating bulk load operation in partitioned tables

This blog is published on the EDB website.

pgbench partitions pgbench_accounts, which is its largest table, and uses the bulk-load command COPY to populate it. The time taken to run COPY on the pgbench_accounts table is logged separately. This blog explores how this operation is affected by table partitioning.

How to benchmark partition table performance

This blog is published on the EDB website.

This blog describes the new pgbench options to partition the default pgbench table pgbench_accounts, and discusses the outcome of OLTP point queries and range queries for the two partition types, range and hash, across various data sizes and partition counts.
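For reference, an initialization run using these options might look like the following (scale factor, partition count, and database name here are arbitrary):
pgbench -i --scale=100 --partitions=10 --partition-method=hash postgres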



PostgreSQL : Test Coverage

Install lcov

Install Dependencies:
yum install perl-devel
yum install perl-Digest-MD5
yum install perl-GD

Download and install lcov
rpm -U lcov-1.13-1.el7.noarch.rpm


Run Test

Configure and make
Use the --enable-coverage configure flag
./configure --enable-coverage
make -j 4

Run make check
cd src/
make check -i

A file with the .gcno extension is created for each source file at build time, and another with the .gcda extension is generated when we run the tests.


Check Coverage

HTML output

make coverage-html

A folder named 'coverage' is generated, containing index.html and the other data required to display the coverage information. The HTML pages show a summary of the coverage for each folder, then recursively for each file, and then for each line.


Text output

make coverage

A .gcov and a .gcov.out file is created for each source file, containing the coverage information.


Reset

make coverage-clean

This resets the execution counts by removing all the generated .gcda files.


Output files

<file>.gcov.out

This lists the details for each function in the corresponding source file. An example output for a function is shown below:
Function 'heap_sync'
Lines executed:100.00% of 10
Branches executed:100.00% of 4
Taken at least once:75.00% of 4
Calls executed:100.00% of 6

<file>.gcov

This displays the original file in its entirety, along with the line numbers and the count of the number of times each line was executed during the test run. Lines that were never executed are marked with hashes '#####', and '-' indicates that the line is not executable.
        -: 9258:    /* main heap */
       50: 9259:    FlushRelationBuffers(rel);
call    0 returned 100%

.
. <more lines>
.
    #####: 9283:    Page        page = (Page) pagedata;
        -: 9284:    OffsetNumber off;
        -: 9285:
    #####: 9286:    mask_page_lsn_and_checksum(page);
call    0 never executed

index.html

The home page:
This lists all the subdirectories along with their coverage data.


Per directory info:
On clicking a particular directory, we get the coverage info of each file in the selected directory.

Select a file:
This gives the per-line hit count of the selected file. Lines highlighted in blue were hit, and those in red were never executed during the test run.

Postgres Crash: Segmentation Fault

Sometimes the postgres server crashes while running some command, and in this blog we shall see how to check whether the crash was caused by a segmentation fault.

Problem:

The server crashed while I was running a command.

 server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
 The connection to the server was lost.
 Attempting reset: Failed.
!>

The postgres logfile showed:
LOG: server process (PID 2779) was terminated by signal 11: Segmentation fault

Debug:

Attach gdb to the core dump generated, and it will show the location that threw the segmentation fault. Here, core.2779 is the name of my core dump file.

$ gdb postgres core.2779
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: emerson postgres [local] CREATE INDEX '.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000059487a in function (arguments) at file_name.c:527

527 bool hasnulls = TupleHasNulls(tuple);

From here we can determine what caused the error.

Postgres Crash: OOM error debug

Sometimes the postgres server crashes while running some command, and in this blog we shall see how to check whether the crash was caused by an OOM (Out of Memory) error.

Problem:

The server crashed while I was running a command.

server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
psql: FATAL:  the database system is in recovery mode

The postgres logfile showed:
2019-02-19 17:34:12.074 IST [24391] LOG: server process (PID 24403) was terminated by signal 9: Killed

dmesg revealed that the process was killed because of an OOM error:
$ dmesg
.
.
Out of memory: Kill process 24403 (postgres) score 832 or sacrifice child
[20631.325314] Killed process 24403 (postgres) total-vm:5252708kB, anon-rss:1605692kB, file-rss:0kB, shmem-rss:940kB


Debug:

Open a new psql session and get the backend process id.
postgres=# SELECT pg_backend_pid();
 pg_backend_pid
----------------
           5379
(1 row)
Attach gdb to the process, set a breakpoint at AllocSetAlloc, and ignore some 100000 hits of that breakpoint.
gdb -p 5379
(gdb) b AllocSetAlloc
Breakpoint 1 at 0xab6f49: file aset.c, line 716.
(gdb) ignore 1 99999
Will ignore next 99999 crossings of breakpoint 1.

Run the command that caused the crash and when it breaks in gdb, call MemoryContextStats.
(gdb) call MemoryContextStats(TopMemoryContext)
The output of MemoryContextStats is seen in the server logfile. A snippet is shown below:
TopPortalContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536 used
  PortalContext: 5102824 total in 626 blocks; 64424 free (626 chunks); 5038400 used:
    ExecutorState: 8192 total in 1 blocks; 7152 free (0 chunks); 1040 used
      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      TupleSort main: 32832 total in 2 blocks; 6800 free (1 chunks); 26032 used
        Caller tuples: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      TupleSort main: 1581120 total in 2 blocks; 6800 free (8 chunks); 1574320 used
        Caller tuples: 2097152 total in 9 blocks; 783776 free (2 chunks); 1313376 used
As seen, the PortalContext seems to be filling up. First check whether the current context where gdb has stopped is PortalContext.
(gdb) p *context
$2 = {type = T_AllocSetContext, isReset = false, allowInCritSection = false,
  methods = 0xd11840, parent = 0x1fa58c0, firstchild = 0x203d120, prevchild = 0x0,
  nextchild = 0x0, name = 0xd14150 "PortalContext", ident = 0x1fa9400 "", reset_cbs = 0x0}
Since I am already in the intended context, I can simply use the gdb command backtrace to check from where the memory is being allocated, and then take the necessary actions, such as using pfree on variables, switching to a temporary context, or resetting the current context.

If the current MemoryContext is different, then we can set a conditional breakpoint for the intended context and get the backtrace when gdb halts.
(gdb) break aset.c:717 if $_streq(context->name, "PortalContext")
Breakpoint 2 at 0xab6f47: file aset.c, line 717.

(AllocSetAlloc starts at line number 716 in aset.c, so this breakpoint specifies the line just after it.)

Partition Pruning During Execution

Partitioning in PostgreSQL eases the handling of large volumes of data. This feature greatly improved with the introduction of declarative partitioning in PostgreSQL 10, paving the way for better query optimization and execution techniques on partitioned tables. PostgreSQL 11 extended query optimization by enabling partition elimination strategies during query execution. It also added a parameter, enable_partition_pruning, to control the executor's ability to prune partitions, which is on by default.

When does the runtime pruning occur?

The first attempt at pruning occurs at planning time, for quals that compare the partition key with constants. For params whose values are not known at plan time, runtime pruning can be done at two stages of execution: at executor startup (initialization) and during actual execution.

1. Executor Initialization

In some cases, as in the execution of a prepared query, the parameter values -- the external params -- are already known at executor initialization, and hence the unwanted sub-plans are never initialized. In this case, the EXPLAIN output will not list the eliminated sub-plans but only give the number of sub-plans removed.

=# prepare tprt_q1 (int, int, int) as
     select * from tprt where a between $1 and $2 and b <= $3;
=# explain execute tprt_q1 (25000, 30000, 20000);
 Append  (cost=0.00..2007.54 rows=153 width=8)
   Subplans Removed: 7
   ->  Seq Scan on tprt_a3_b1  (cost=0.00..222.98 rows=17 width=8)
         Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on tprt_a3_b2  (cost=0.00..222.98 rows=17 width=8)
         Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))

The EXPLAIN output states that 7 sub-plans have been removed, which implies that the corresponding partitions were not required and hence not even initialized.

2. Actual Execution

As in the case of subqueries and parameterized nested loop joins, the parameters called exec params are only available at the time of actual execution. In this case, all the partitions are initialized, and the executor then determines which partitions need to be scanned depending on the parameter values. If a partition is not required at any point during the run, it is marked with "never executed" in the EXPLAIN ANALYZE output.

The following is an example of a parameterized nested loop join between two tables of 5000 rows each. The outer table has values from 2001 to 7000, and the partitioned table has values from 1 to 5000, spread over 5 partitions each holding 1000 values.
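A sketch of a setup matching the plans below, with the table names, partition bounds, and index names inferred from the EXPLAIN output:
CREATE TABLE tp (a int) PARTITION BY RANGE (a);
CREATE TABLE tp_a1 PARTITION OF tp FOR VALUES FROM (1) TO (1001);
CREATE TABLE tp_a2 PARTITION OF tp FOR VALUES FROM (1001) TO (2001);
CREATE TABLE tp_a3 PARTITION OF tp FOR VALUES FROM (2001) TO (3001);
CREATE TABLE tp_a4 PARTITION OF tp FOR VALUES FROM (3001) TO (4001);
CREATE TABLE tp_a5 PARTITION OF tp FOR VALUES FROM (4001) TO (5001);
CREATE INDEX tp_a1_idx ON tp_a1 (a);
CREATE INDEX tp_a2_idx ON tp_a2 (a);
CREATE INDEX tp_a3_idx ON tp_a3 (a);
CREATE INDEX tp_a4_idx ON tp_a4 (a);
CREATE INDEX tp_a5_idx ON tp_a5 (a);
CREATE TABLE t1 (col1 int);
INSERT INTO tp SELECT generate_series(1, 5000);
INSERT INTO t1 SELECT generate_series(2001, 7000);
SET enable_partition_pruning = off;  -- or on, to compare the two plans below
EXPLAIN ANALYZE SELECT * FROM t1 JOIN tp ON tp.a = t1.col1;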

EXPLAIN ANALYZE output with enable_partition_pruning = off
Nested Loop (actual rows=3000 loops=1)
  ->  Seq Scan on t1 (actual rows=5000 loops=1)
  ->  Append (actual rows=1 loops=5000)
        ->  Index Scan using tp_a1_idx on tp_a1 (actual rows=0 loops=5000)
              Index Cond: (a = t1.col1)
        ->  Index Scan using tp_a2_idx on tp_a2 (actual rows=0 loops=5000)
              Index Cond: (a = t1.col1)
        ->  Index Scan using tp_a3_idx on tp_a3 (actual rows=0 loops=5000)
              Index Cond: (a = t1.col1)
        ->  Index Scan using tp_a4_idx on tp_a4 (actual rows=0 loops=5000)
              Index Cond: (a = t1.col1)
        ->  Index Scan using tp_a5_idx on tp_a5 (actual rows=0 loops=5000)
              Index Cond: (a = t1.col1)
Planning Time: 0.319 ms
Execution Time: 114.823 ms

EXPLAIN ANALYZE output with enable_partition_pruning=on
Nested Loop (actual rows=3000 loops=1)
  ->  Seq Scan on t1 (actual rows=5000 loops=1)
  ->  Append (actual rows=1 loops=5000)
        ->  Index Scan using tp_a1_idx on tp_a1 (never executed)
              Index Cond: (a = t1.col1)
        ->  Index Scan using tp_a2_idx on tp_a2 (never executed)
              Index Cond: (a = t1.col1)
        ->  Index Scan using tp_a3_idx on tp_a3 (actual rows=1 loops=1000)
              Index Cond: (a = t1.col1)
        ->  Index Scan using tp_a4_idx on tp_a4 (actual rows=1 loops=1000)
              Index Cond: (a = t1.col1)
        ->  Index Scan using tp_a5_idx on tp_a5 (actual rows=1 loops=1000)
              Index Cond: (a = t1.col1)
Planning Time: 0.384 ms
Execution Time: 36.572 ms

Reasons for Performance Improvement

There is a definite improvement in the performance of queries involving partitioned tables, but the extent of it is determined by the partition key parameters, which control how many partition scans can be skipped.

Considering the nested loop join case above: with pruning disabled, all the partitions are scanned for each of the 5000 values from the outer table t1 (loops=5000). With pruning enabled, only the appropriate partition is scanned for each value from the outer table (loops=1000). On two partitions no scans are performed at all (never executed), since the outer table has no values that match entries in these partitions (1-2000). Since the number of scans on each partition is reduced substantially, we see an improvement of 67% in execution time, from 115 ms to 37 ms.
When the amount of data in the outer table is doubled to 10000 rows (values 2001-12000), the behavior is similar, except that in the non-pruned case the number of scans made on each partition is 10000 instead of 5000; the difference in performance is even better at 83%, from 239 ms to 40 ms.

Under the Hood

The partitions are internally sorted and stored in increasing order of the values they can hold. Initially, in V10, the undesirable partitions were eliminated by a tedious linear search in the planner; in V11 this was updated to a quicker binary search of the list.

To perform a scan on a partitioned table, an Append node is used with the scan on each of the leaf partitions being a sub-plan under it. Each of these sub-plans is indexed and the executor internally accesses them by this index.  

To help the executor select the correct Append sub-plans, a map from each partition to the corresponding sub-plan index is used. The planner first creates this map, accounting for all the partitions it has pruned. When the executor detects that pruning is possible, it fetches the list of partitions that can satisfy the given param and figures out the corresponding sub-plan indexes from the map. If there is no sub-plan index, it indicates that the partition has already been pruned in a previous stage (planning or executor initialization).

If pruning has taken place during executor startup, then the map is updated, because the rejected partitions are not initialized, which changes the sub-plan indexes of the retained ones. This is necessary so that the map remains valid for pruning done later during execution.
Supporting runtime partition pruning is just one of many performance improvements for partitioned tables, and there is more to be expected in upcoming versions.


--

This blog is also posted on postgres rocks.

PostgreSQL Datatypes

SQL datatypes and internal datatypes

 SQL type | Internal type
----------+---------------
 smallint | int16
 int      | int32
 bigint   | int64
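The byte widths behind this mapping can be confirmed from SQL with pg_column_size:
SELECT pg_column_size(1::smallint) AS smallint_bytes,
       pg_column_size(1::int)      AS int_bytes,
       pg_column_size(1::bigint)   AS bigint_bytes;
-- returns 2, 4, 8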

PostgreSQL : Get total pages of an object

SELECT pg_relation_size(<object_id>, 'main') / 8192;
pg_relation_size gives the disk space occupied by the 'main' fork of the object. Since each page is 8 kB, the size is divided by 8192 to give the page count.
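For example, with an illustrative table name, and a variant that derives the divisor from the server's block_size setting in case of a non-default build:
SELECT pg_relation_size('pgbench_accounts', 'main') / 8192 AS pages;
SELECT pg_relation_size('pgbench_accounts', 'main')
       / current_setting('block_size')::int AS pages;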

PostgreSQL : Data Object sizes

 Size of User Tables

=# SELECT
   relname as "Table",
   pg_size_pretty(pg_total_relation_size(relid)) as "Total Size",
   pg_size_pretty(pg_relation_size(relid)) as "Relation Size",
   pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as "External Size"
   FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC
;
      Table       | Total Size | Relation Size | External Size
------------------+------------+---------------+---------------
 load_test        | 1883 MB    | 1776 MB       | 107 MB
 pgbench_accounts | 1121 MB    | 961 MB        | 161 MB
 pgbench_history  | 8656 kB    | 8656 kB       | 0 bytes
 pgbench_tellers  | 80 kB      | 40 kB         | 40 kB
 pgbench_branches | 24 kB      | 8192 bytes    | 16 kB
(5 rows)

 Size of User Indexes

=# SELECT
   indexrelname as "Index",
   pg_size_pretty(pg_total_relation_size(indexrelid)) as "Size",
   pg_size_pretty(pg_total_relation_size(indexrelid) - pg_relation_size(indexrelid)) as "External Size"
   FROM pg_catalog.pg_statio_user_indexes ORDER BY pg_total_relation_size(indexrelid) DESC;
         Index         |  Size  | External Size
-----------------------+--------+---------------
 pgbench_accounts_pkey | 161 MB | 0 bytes
 load_testidx          | 107 MB | 0 bytes
 pgbench_tellers_pkey  | 40 kB  | 0 bytes
 pgbench_branches_pkey | 16 kB  | 0 bytes
(4 rows)

 Functions

pg_relation_size(oid) - returns the disk space occupied by the table or index with the given oid.

pg_total_relation_size(oid) - returns the total disk space used by the table with the given oid, including indexes and TOAST data.

Adding libpq in Contrib module

To include libpq functions in the contrib module the following header should be included:
#include "libpq-fe.h"
However, it throws the following error:
error: libpq-fe.h: No such file or directory
This can be solved by adding the following to the Makefile of that contrib module:
PG_CPPFLAGS = -I$(libpq_srcdir)
This adds the option -I../../src/interfaces/libpq during make, and hence the header file becomes accessible.

PostgreSQL - Set timing on by default

For the postgres user, create a file in the home folder:
vim ~/.psqlrc
and add the following to the file:
\timing
When you connect to a database and run a query, the elapsed time is also displayed:

postgres=# SELECT current_timestamp;
               now              
----------------------------------
 2014-03-11 10:34:40.378455+05:30
(1 row)
Time: 43.821 ms

PostgreSQL : Install using RPM

1. Download rpm

Site: http://yum.postgresql.org/repopackages.php

Download the required rpm from the site. 
Ex. PostgreSQL 9.1 for CentOS 6 - x86_64 (http://yum.postgresql.org/9.1/redhat/rhel-6-x86_64/pgdg-centos91-9.1-4.noarch.rpm)

2. Install rpm

$ cd <rpm_download_dir>
$ su
Password: 
# rpm -ivh pgdg-centos91-9.1-4.noarch.rpm
# yum install postgresql91-server postgresql91
If yum install fails with 'PG key retrieval failed: [Errno 14]', then perform the steps mentioned at http://www.rackspace.com/knowledge_center/article/installing-rhel-epel-repo-on-centos-5x-or-6x before trying again.
If it is unable to find /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle then run the command: wget https://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle --no-check-certificate

3. Run PostgreSQL

# /etc/init.d/postgresql-9.1 initdb
# service postgresql-9.1 start
# su postgres
bash-4.1$ psql
psql (9.1.11)
Type "help" for help.

4. Load Extension

Download the required extension rpm from their site.

$ cd <rpm_download_dir>
$ su
Password: 
# rpm -ivh <rpm_name>.rpm
Login to Postgres and load the extension via CREATE EXTENSION command.

5. Uninstall

Remove the installation and the database cluster.
# rpm -e postgresql91-server postgresql91
# rm -rf /var/lib/pgsql/

PostgreSQL - Connect using SSL

Used software: PostgreSQL 9.2.5

1. Install Postgres

Use the --with-openssl option with ./configure while installing postgres
$ CFLAGS="-g -O0" ./configure --enable-debug --enable-cassert --enable-depend --prefix=<install_path> --with-openssl
$ make
$ make install

2. Create a Data Directory

$ cd <postgres_install_path>/bin
$ ./initdb -D <cluster_path>

3. Modify the postgresql.conf file

ssl  = on

4. Create a self-signed certificate

[Ref: http://www.postgresql.org/docs/current/static/ssl-tcp.html#SSL-CERTIFICATE-CREATION]
$ cd <cluster_path>
$ openssl req -new -text -out server.req
Enter a pass phrase (at least 4 char long).
Fill in other details if required.
Challenge password can be left blank.
$ openssl rsa -in privkey.pem -out server.key
Enter the same pass phrase entered before.
$ openssl req -x509 -in server.req -text -key server.key -out server.crt
$ chmod og-rwx server.key

5. Start postgres

$ cd <postgres_install_path>/bin
$ ./pg_ctl -D <cluster_path> start

6. Connect using ssl

$ ./psql "sslmode=require host=localhost dbname=postgres"
psql (9.2.5)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.
postgres=#



Inaugural Indian PostgreSQL User Group Meetup!


This was a memento given at the first meetup of Postgres users in and around Pune, held at BMC Software, Pune.


It was a success. Around 30 people from different companies, from newbies to experienced developers, came. There were presentations on Streaming Replication, HA, and Postgres XC. The content was good and I learnt quite a lot!


This was a great experience; it encouraged and motivated me to continue contributing to the community! I hope there are more such meetups and that I get to present my contributions too. :-)