Description of problem: Several times a day our postgresql 8.4.5 process segfaults with the following error log:

2011-03-07 02:36:37 PSTLOG: connection authorized: user=user database=db
2011-03-07 02:36:37 PSTLOG: disconnection: session time: 0:00:00.189 user=user database=db host=host.example.com
2011-03-07 02:37:03 PSTLOG: server process (PID 20636) was terminated by signal 11: Segmentation fault
2011-03-07 02:37:03 PSTLOG: terminating any other active server processes
2011-03-07 02:37:03 PSTWARNING: terminating connection because of crash of another server process
2011-03-07 02:37:03 PSTDETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2011-03-07 02:37:03 PSTHINT: In a moment you should be able to reconnect to the database and repeat your command.
2011-03-07 02:37:03 PSTWARNING: terminating connection because of crash of another server process
2011-03-07 12:34:23 PSTLOG: connection received: host=host.example.com port=56659
2011-03-07 12:34:23 PSTFATAL: no pg_hba.conf entry for host "123.123.123.123", user "user2", database "db", SSL off
2011-03-07 12:34:23 PSTLOG: connection received: host=host.example.com port=56660
2011-03-07 12:34:23 PSTLOG: connection authorized: user=user2 database=db
2011-03-07 12:34:24 PSTLOG: server process (PID 25430) was terminated by signal 11: Segmentation fault
2011-03-07 12:34:24 PSTLOG: terminating any other active server processes
2011-03-07 12:34:24 PSTWARNING: terminating connection because of crash of another server process
2011-03-07 12:34:24 PSTDETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Version-Release number of selected component (if applicable):
Name : postgresql84-server Relocations: (not relocatable)
Version : 8.4.5 Vendor: Red Hat, Inc.
Release : 1.el5_5.1 Build Date: Mon 04 Oct 2010 09:59:48 AM PDT

How reproducible: The segmentation fault occurs several times a day. However, we have not yet been able to develop a test case to reproduce it. Any advice to narrow down the cause would be appreciated.

Additional info:
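One way to start narrowing down the cause is to pull every crash entry out of the server log, so that each crash time and PID can be matched against the connections logged just before it. The following is only a minimal sketch under stated assumptions: the function name find_crashes is made up for illustration, and the pattern assumes the log prefix shown in the log above (a timestamp followed by the time zone, optionally fused with the severity as in "PSTLOG:").

```python
import re

# Matches crash entries such as:
#   2011-03-07 02:37:03 PSTLOG: server process (PID 20636) was terminated by signal 11: ...
# Assumption: the log prefix is "<date> <time> <TZ>" with the severity either
# fused to the time zone ("PSTLOG:") or separated by a space ("PST LOG:").
CRASH_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) [A-Z]*\s*LOG:\s*"
    r"server process \(PID (?P<pid>\d+)\) was terminated by signal (?P<sig>\d+)"
)


def find_crashes(log_text):
    """Return a list of (timestamp, pid, signal) tuples, one per crash entry."""
    return [
        (m.group("ts"), int(m.group("pid")), int(m.group("sig")))
        for m in CRASH_RE.finditer(log_text)
    ]
```

Applied to the log excerpt in this report, it would return the two signal 11 entries, for PID 20636 at 02:37:03 and PID 25430 at 12:34:24. The search works on one string, so it does not matter whether the log entries are on separate lines or run together.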
A stack trace from the core dump (with postgresql84-debuginfo installed) would help. If it's not producing core dumps, try adding "ulimit -c unlimited" to /var/lib/pgsql/.bash_profile and restarting the database.
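To check whether the "ulimit -c unlimited" setting is actually in effect, the core file size limit can be read back from inside a process. This is a rough sketch, not a definitive diagnostic: the function name core_limit_report is made up for illustration, and it only reports the limits of the process it runs in. The result therefore reflects the server only if the script is started in the same environment (same user, same login profile) as the database.

```python
import resource


def core_limit_report():
    """Return the soft and hard core-file size limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

    def fmt(value):
        # RLIM_INFINITY is how the resource module represents "unlimited".
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return str(value)

    return {
        "soft": fmt(soft),
        "hard": fmt(hard),
        # A soft limit of 0 means the kernel will not write a core file.
        "core_dumps_possible": soft != 0,
    }


if __name__ == "__main__":
    print(core_limit_report())
```

If the soft limit is reported as 0 when the script runs as the postgres user from a login shell, the .bash_profile change has not taken effect for that environment.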
We will upgrade to 8.4.7 since there are fixes in 8.4.6 and 8.4.7 to address postmaster crashes.
After upgrading to 8.4.7 we still see the fault. We downgraded to 8.4.4 on another machine running the same OS and see it there too. The positive news is that we have SQL that will produce the segfault on demand. I added "ulimit -c unlimited" to /var/lib/pgsql/.bash_profile and restarted the database, but no core files are being produced (at least none that I can find). Please advise on how to proceed.
Please provide the reproducer script, then.
Some investigation on SDSC's test database eventually identified the problem. It's explained here: http://archives.postgresql.org/pgsql-hackers/2011-04/msg00689.php and a fix has been committed upstream here: http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=1de8584fb1b71c98138b1f23808a4f01ab7566cd We'll absorb this fix automatically whenever we rebase to 8.4.8 or later, but don't know how soon that will be.
I forgot to include this bug in the erratum paperwork, but it should be fixed in 8.4.9, which was just pushed as a security erratum.