Using PQtrace

To enable PQtrace, we need to add the following code to the client-side source, in the function that establishes the connection with the server.

FILE *trace_file;
.
.
.
conn = PQconnectdb(conninfo);
trace_file = fopen("/tmp/trace.out", "w");
PQtrace(conn, trace_file);
.
.
.
PQuntrace(conn);
fclose(trace_file);
return;

First, declare the file variable, and just after the connection is established on the client side (by PQconnectdb), open the file with write permission and start the trace. Do not forget to stop the trace (PQuntrace) and close the file before you return from the function where you added this code.

The file will then contain all the messages exchanged between the client and the server.

If you need to debug the source of the messages further, run the client command from gdb with a breakpoint at PQconnectdb, where it connects to the server. When the process breaks there, attach a second gdb instance to the backend process created for this connection and set breakpoints on the message-passing functions.

On the client side, set breakpoints on the following functions:
b pqPutc
b pqPuts
b pqPutnchar
b pqPutInt
b pqPutMsgStart
b pqPutMsgEnd

On the server side, set a breakpoint on the following function:
b socket_putmessage

Now continue in both sessions; as the breakpoints above are hit, you can monitor step by step how the messages are passed on each side.
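Putting the steps together, the two-terminal session can be sketched as follows (the client binary name and the backend PID are placeholders, not details from the text above):

```shell
# Terminal 1: run the client under gdb (./client is a placeholder name)
$ gdb ./client
(gdb) b PQconnectdb
(gdb) run
# Once PQconnectdb has returned, the server has forked a backend for
# this connection; find its PID, e.g. with `ps -ef | grep postgres`.

# Terminal 2: attach to that backend process
$ gdb -p <backend_pid>
(gdb) b socket_putmessage
(gdb) c
```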

Investigating bulk load operation in partitioned tables

This blog is published on the EDB website.

pgbench partitions pgbench_accounts, which is the largest table, and uses the bulk-load command COPY to populate it. The time taken to run COPY on the pgbench_accounts table is logged separately. This blog explores how that operation is affected by table partitioning.

How to benchmark partition table performance

This blog is published on the EDB website.

This blog describes the new pgbench options that partition the default pgbench table pgbench_accounts, and discusses the performance of OLTP point queries and range queries for the two partition types, range and hash, across various data sizes and partition counts.
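For illustration, an invocation using those options might look like the following (this assumes a pgbench version that has the partitioning options and a database named `bench`; both are assumptions, not details from the text):

```shell
# Initialize the pgbench tables, splitting pgbench_accounts into
# 10 hash partitions (use --partition-method=range for range partitioning)
pgbench -i -s 100 --partitions=10 --partition-method=hash bench

# Run the default OLTP workload against the partitioned tables
pgbench -c 8 -T 60 bench
```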



PostgreSQL: Test Coverage

Install lcov

Install Dependencies:
yum install perl-devel
yum install perl-Digest-MD5
yum install perl-GD

Download and install lcov
rpm -U lcov-1.13-1.el7.noarch.rpm


Run Test

Configure and make
Use the --enable-coverage configure flag
./configure --enable-coverage
make -j 4

Run make check
cd src/
make check -i

A file with the .gcno extension is created for each source file during compilation, and a corresponding .gcda file is generated when we run the tests.


Check Coverage

HTML output

make coverage-html

A folder named 'coverage' is generated, containing index.html and the other data required to display the coverage information. The HTML pages show a summary of the coverage for each folder, then for each file, and then for each line.


Text output

make coverage

A .gcov and a .gcov.out file are created for each source file; these contain the coverage information.


Reset

make coverage-clean

This resets the execution count by removing all the .gcda files generated.


Output files

<file>.gcov.out

This lists the details for each function in the corresponding source file. Example output for one function is shown below:

Function 'heap_sync'
Lines executed:100.00% of 10
Branches executed:100.00% of 4
Taken at least once:75.00% of 4
Calls executed:100.00% of 6

<file>.gcov

This displays the entire original file along with the line numbers and a count of how many times each line was executed during the test run. Lines that were never executed are marked with hashes ('#####'), and '-' indicates that the line is not executable.
        -: 9258:    /* main heap */
       50: 9259:    FlushRelationBuffers(rel);
call    0 returned 100%

.
. <more lines>
.

#####: 9283:    Page        page = (Page) pagedata;

        -: 9284:    OffsetNumber off;

        -: 9285:

    #####: 9286:    mask_page_lsn_and_checksum(page);
call    0 never executed

index.html

The home page:
This lists all the subdirectories along with their coverage data.


Per directory info:
Clicking a particular directory shows the coverage info for each file in it.

Select a file:
This shows the per-line hit counts for the selected file. Lines highlighted in blue were hit; those in red were never executed during the test run.

Postgres Crash: Segmentation Fault

Sometimes the postgres server crashes while running some command. In this blog we shall see how to check whether the crash was caused by a segmentation fault.

Problem:

The server crashed while I was running a command.

 server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
 The connection to the server was lost.
 Attempting reset: Failed.!>

The postgres logfile showed:
LOG: server process (PID 2779) was terminated by signal 11: Segmentation fault

Debug:

Attach gdb to the core dump that was generated; it will show the location that threw the segmentation fault. Here, core.2779 is the name of my core dump file.

$ gdb postgres core.2779
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: emerson postgres [local] CREATE INDEX '.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000059487a in function (arguments) at file_name.c:527

527 bool hasnulls = TupleHasNulls(tuple);

From here we can determine what has caused the error.
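If no core file is found after a crash, the core-file size limit in the shell that started the server may be zero. A minimal sketch of raising it, assuming a bash-like shell (restart the server from the same shell afterwards so the setting takes effect):

```shell
# Allow core dumps of unlimited size in this shell session
ulimit -c unlimited
# Confirm the new limit; prints "unlimited"
ulimit -c
```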

Postgres Crash: OOM error debug

Sometimes the postgres server crashes while running some command. In this blog we shall see how to check whether the crash was caused by an OOM (Out of Memory) kill.

Problem:

The server crashed while I was running a command.

server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
psql: FATAL:  the database system is in recovery mode

The postgres logfile showed:
2019-02-19 17:34:12.074 IST [24391] LOG: server process (PID 24403) was terminated by signal 9: Killed

dmesg revealed that the process was killed by the OOM killer:

$ dmesg
.
.
Out of memory: Kill process 24403 (postgres) score 832 or sacrifice child
[20631.325314] Killed process 24403 (postgres) total-vm:5252708kB, anon-rss:1605692kB, file-rss:0kB, shmem-rss:940kB


Debug:

Open a new psql session and get the backend process id.
postgres=# SELECT pg_backend_pid();
 pg_backend_pid
----------------
           5379
(1 row)
Attach gdb to the process. Set a breakpoint at AllocSetAlloc and ignore, say, the next 100000 hits of that breakpoint.

$ gdb -p 5379
(gdb) b AllocSetAlloc
Breakpoint 1 at 0xab6f49: file aset.c, line 716.
(gdb) ignore 1 99999
Will ignore next 99999 crossings of breakpoint 1.

Run the command that caused the crash and when it breaks in gdb, call MemoryContextStats.
(gdb) call MemoryContextStats(TopMemoryContext)
The output of MemoryContextStats is seen in the server logfile. A snippet is shown below:
TopPortalContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536 used
  PortalContext: 5102824 total in 626 blocks; 64424 free (626 chunks); 5038400 used:
    ExecutorState: 8192 total in 1 blocks; 7152 free (0 chunks); 1040 used
      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      TupleSort main: 32832 total in 2 blocks; 6800 free (1 chunks); 26032 used
        Caller tuples: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      TupleSort main: 1581120 total in 2 blocks; 6800 free (8 chunks); 1574320 used
        Caller tuples: 2097152 total in 9 blocks; 783776 free (2 chunks); 1313376 used
As seen, the PortalContext seems to be filling up. First check whether the current context where gdb had stopped is PortalContext.
(gdb) p *context
$2 = {type = T_AllocSetContext, isReset = false, allowInCritSection = false,
      methods = 0xd11840, parent = 0x1fa58c0, firstchild = 0x203d120,
      prevchild = 0x0, nextchild = 0x0, name = 0xd14150 "PortalContext",
      ident = 0x1fa9400 "", reset_cbs = 0x0}
Since I am already in the intended context, I can simply use the gdb command backtrace to check where the memory is being allocated from, and then take the necessary action, such as calling pfree on variables, switching to a temporary context, or resetting the current context.

If the current MemoryContext is different, we can set a conditional breakpoint for the intended context and get the backtrace when gdb halts.
(gdb) break aset.c:717 if $_streq(context->name, "PortalContext")
Breakpoint 2 at 0xab6f47: file aset.c, line 717.

(AllocSetAlloc starts at line 716 in aset.c, so this breakpoint specifies the line just after it.)