
Using PQtrace

To enable PQtrace, we need to add the following code to the client-side source, in the function that establishes the connection with the server.

FILE *trace_file;
...
conn = PQconnectdb(conninfo);              /* connection is established here */
trace_file = fopen("/tmp/trace.out", "w");
PQtrace(conn, trace_file);
...
fclose(trace_file);
return;

First, declare the file variable and, just after the connection is established on the client side (by PQconnectdb), open the file with write permission and start the trace. Do not forget to close the file before you return from the function where you have added this code.

From the file specified, we can get all the messages exchanged between the client and the server.
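If you want a complete program to experiment with, a minimal sketch could look like the one below. The connection string, query, and file path are just placeholders; adjust them to your setup.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* placeholder connection string */
    PGconn *conn = PQconnectdb("dbname=postgres");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* start tracing right after the connection is established */
    FILE *trace_file = fopen("/tmp/trace.out", "w");
    PQtrace(conn, trace_file);

    /* every message for this query is now written to /tmp/trace.out */
    PGresult *res = PQexec(conn, "SELECT 1");
    PQclear(res);

    /* stop tracing and clean up before returning */
    PQuntrace(conn);
    fclose(trace_file);
    PQfinish(conn);
    return 0;
}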

If you need to further debug the source of the messages being passed, run the client command from gdb with a breakpoint at PQconnectdb, where it connects to the server. When the process breaks, attach another gdb session to the backend process that was created and set the breakpoints listed below.

On the client side, put breakpoints on the following:
b pqPutc
b pqPuts
b pqPutnchar
b pqPutInt
b pqPutMsgStart
b pqPutMsgEnd

On the server side, put a breakpoint on the following:
b socket_putmessage

Now continue in both sessions and you can monitor, step by step, how the messages are passed from each side as the breakpoints above are hit.
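As a rough sketch, the two sessions might look like the following; psql is used as an example client and 12345 stands for whatever backend PID was forked for it (you can find it with SELECT pg_backend_pid() or ps).

$ gdb --args psql -d postgres          # client-side session
(gdb) b PQconnectdb
(gdb) run

$ gdb -p 12345                         # server-side session, in another terminal
(gdb) b socket_putmessage
(gdb) c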

Postgres Crash: Segmentation Fault

Sometimes the postgres server crashes while running some command, and in this blog we shall see how to check whether it was caused by a segmentation fault.

Problem:

The server crashed while I was running a command.

 server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
 The connection to the server was lost.
 Attempting reset: Failed.!>

The postgres logfile showed:
LOG: server process (PID 2779) was terminated by signal 11: Segmentation fault

Debug:

Attach gdb to the core dump that was generated and it will show the location that threw the segmentation fault. Here, core.2779 is the name of my core dump file.

$ gdb postgres core.2779
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: emerson postgres [local] CREATE INDEX '.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000059487a in function (arguments) at file_name.c:527

527 bool hasnulls = TupleHasNulls(tuple);

From here we can determine what has caused the error.
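The usual gdb commands can then be used to dig further, for example:

(gdb) bt                 # full call stack that led to the crash
(gdb) frame 0            # select the crashing frame
(gdb) list               # show the surrounding source
(gdb) print tuple        # inspect the variable being dereferenced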

Postgres Crash: OOM error debug

Sometimes the postgres server crashes while running some command, and in this blog we shall see how to check whether it was caused by an OOM (Out of Memory) error.

Problem:

The server crashed while I was running a command.

server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
psql: FATAL:  the database system is in recovery mode

The postgres logfile showed:
2019-02-19 17:34:12.074 IST [24391] LOG: server process (PID 24403) was terminated by signal 9: Killed

dmesg revealed that the process was killed because of an OOM error:
$ dmesg
.
.
Out of memory: Kill process 24403 (postgres) score 832 or sacrifice child
[20631.325314] Killed process 24403 (postgres) total-vm:5252708kB, anon-rss:1605692kB, file-rss:0kB, shmem-rss:940kB


Debug:

Open a new psql session and get the backend process id.
postgres=# SELECT pg_backend_pid();
 pg_backend_pid
----------------
           5379
(1 row)
Attach gdb to the process, set a breakpoint at AllocSetAlloc, and ignore around 100,000 hits of that breakpoint.
gdb -p 5379
(gdb) b AllocSetAlloc
Breakpoint 1 at 0xab6f49: file aset.c, line 716.
(gdb) ignore 1 99999
Will ignore next 99999 crossings of breakpoint 1.

Run the command that caused the crash and when it breaks in gdb, call MemoryContextStats.
(gdb) call MemoryContextStats(TopMemoryContext)
The output of MemoryContextStats is seen in the server logfile. A snippet is shown below:
TopPortalContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536 used
  PortalContext: 5102824 total in 626 blocks; 64424 free (626 chunks); 5038400 used:
    ExecutorState: 8192 total in 1 blocks; 7152 free (0 chunks); 1040 used
      ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      TupleSort main: 32832 total in 2 blocks; 6800 free (1 chunks); 26032 used
        Caller tuples: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
      TupleSort main: 1581120 total in 2 blocks; 6800 free (8 chunks); 1574320 used
        Caller tuples: 2097152 total in 9 blocks; 783776 free (2 chunks); 1313376 used
As seen, PortalContext seems to be filling up. First, check whether the context in which gdb has stopped is PortalContext.
(gdb) p *context
$2 = {type = T_AllocSetContext, isReset = false, allowInCritSection = false,
  methods = 0xd11840, parent = 0x1fa58c0, firstchild = 0x203d120, prevchild = 0x0,
  nextchild = 0x0, name = 0xd14150 "PortalContext", ident = 0x1fa9400 "", reset_cbs = 0x0}
Since I am already in the intended context, I can simply use the gdb command backtrace to check where the memory is being allocated from, and then take the necessary actions, such as using pfree on variables, switching to a temporary context, or resetting the current context.
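To give an idea of what such fixes look like, here is a hypothetical sketch of a per-row pattern in backend code; the function, context name, and sizes are made up, but AllocSetContextCreate, MemoryContextSwitchTo, MemoryContextReset, palloc and pfree are the standard PostgreSQL memory APIs.

#include "postgres.h"
#include "utils/memutils.h"

/* Hypothetical helper: allocate short-lived data in a temporary context and
 * reset it every iteration instead of letting it pile up in PortalContext. */
static void
process_rows_without_leaking(int nrows)
{
    MemoryContext tmpctx = AllocSetContextCreate(CurrentMemoryContext,
                                                 "scratch context",
                                                 ALLOCSET_DEFAULT_SIZES);

    for (int i = 0; i < nrows; i++)
    {
        MemoryContext oldctx = MemoryContextSwitchTo(tmpctx);
        char *scratch = palloc(1024);   /* short-lived, per-row allocation */

        /* ... do per-row work with scratch ... */

        pfree(scratch);                 /* free explicitly, or ...             */
        MemoryContextSwitchTo(oldctx);
        MemoryContextReset(tmpctx);     /* ... reset the whole scratch context */
    }

    MemoryContextDelete(tmpctx);
}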

If the current MemoryContext is different, then we can set a conditional breakpoint for the intended context and get the backtrace when gdb halts.
(gdb) break aset.c:717 if $_streq(context->name, "PortalContext")
Breakpoint 2 at 0xab6f47: file aset.c, line 717.

(AllocSetAlloc starts at line 716 in aset.c, so this breakpoint specifies the line just after it.)

gdb commands

Future breakpoints

Set a breakpoint on a function in a shared library that has not been loaded yet:
set breakpoint pending on
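For example, to break on a function coming from an extension library that the server will load later (the function name here is only an example):

(gdb) set breakpoint pending on
(gdb) b my_extension_function
Function "my_extension_function" not defined.
Breakpoint 1 (my_extension_function) pending.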

Debug fork

To make gdb keep hold of the forked child process instead of letting it run independently:
set detach-on-fork off
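A rough sketch of how this is typically used, for example when the postmaster forks a new backend (the inferior numbers are illustrative):

(gdb) set detach-on-fork off
(gdb) set follow-fork-mode parent    # keep debugging the parent; the child is held
(gdb) continue
...
(gdb) info inferiors                 # list the parent and the held child
(gdb) inferior 2                     # switch to the forked child process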