Bug 683212

Summary: Postgresql84 segfault during GEQO planning
Product: Red Hat Enterprise Linux 5 Reporter: Doug Weimer <dougw>
Component: postgresql84 Assignee: Tom Lane <tgl>
Status: CLOSED CURRENTRELEASE QA Contact: qe-baseos-daemons
Severity: medium Docs Contact:
Priority: unspecified    
Version: 5.5 CC: hhorak, pascal.depuis
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: postgresql84-8.4.9-1.el5_7.1 Doc Type: Bug Fix
Last Closed: 2011-10-18 16:53:50 UTC Type: ---

Description Doug Weimer 2011-03-08 20:37:10 UTC
Description of problem:

Several times a day one of our PostgreSQL 8.4.5 server processes segfaults, leaving the following in the server log:

2011-03-07 02:36:37 PSTLOG:  connection authorized: user=user database=db
2011-03-07 02:36:37 PSTLOG:  disconnection: session time: 0:00:00.189 user=user database=db host=host.example.com
2011-03-07 02:37:03 PSTLOG:  server process (PID 20636) was terminated by signal 11: Segmentation fault
2011-03-07 02:37:03 PSTLOG:  terminating any other active server processes
2011-03-07 02:37:03 PSTWARNING:  terminating connection because of crash of another server process
2011-03-07 02:37:03 PSTDETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2011-03-07 02:37:03 PSTHINT:  In a moment you should be able to reconnect to the database and repeat your command.
2011-03-07 02:37:03 PSTWARNING:  terminating connection because of crash of another server process

2011-03-07 12:34:23 PSTLOG:  connection received: host=host.example.com port=56659
2011-03-07 12:34:23 PSTFATAL:  no pg_hba.conf entry for host "123.123.123.123", user "user2", database "db", SSL off
2011-03-07 12:34:23 PSTLOG:  connection received: host=host.example.com port=56660
2011-03-07 12:34:23 PSTLOG:  connection authorized: user=user2 database=db
2011-03-07 12:34:24 PSTLOG:  server process (PID 25430) was terminated by signal 11: Segmentation fault
2011-03-07 12:34:24 PSTLOG:  terminating any other active server processes
2011-03-07 12:34:24 PSTWARNING:  terminating connection because of crash of another server process
2011-03-07 12:34:24 PSTDETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Version-Release number of selected component (if applicable):
Name        : postgresql84-server          Relocations: (not relocatable)
Version     : 8.4.5                             Vendor: Red Hat, Inc.
Release     : 1.el5_5.1                     Build Date: Mon 04 Oct 2010 09:59:48 AM PDT

How reproducible:
The segmentation fault occurs several times a day. However, we have not yet been able to develop a test case to reproduce it. Any advice to narrow down the cause would be appreciated.
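
One way to start narrowing this down is to pull the crash timestamps and PIDs out of the server log so they can be matched against application activity. The following is a minimal sketch in Python, assuming log lines in the same format as the excerpt above (timestamp, time zone fused to the severity, then the message); the function names `find_crashes` and `crashes_per_day` are hypothetical and not part of any PostgreSQL tooling.

```python
import re
from collections import Counter

# Matches lines such as:
# 2011-03-07 02:37:03 PSTLOG:  server process (PID 20636) was terminated by signal 11: Segmentation fault
# The time zone may be fused to "LOG:" (as in the excerpt) or separated by a space.
CRASH_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) [A-Z]*\s*LOG:\s+"
    r"server process \(PID (?P<pid>\d+)\) was terminated by signal (?P<signal>\d+)"
)


def find_crashes(lines):
    """Return a list of (date, time, pid, signal) tuples, one per crash line."""
    crashes = []
    for line in lines:
        m = CRASH_RE.match(line)
        if m:
            crashes.append(
                (m.group("date"), m.group("time"),
                 int(m.group("pid")), int(m.group("signal")))
            )
    return crashes


def crashes_per_day(crashes):
    """Count crashes per calendar date."""
    return Counter(date for date, _time, _pid, _signal in crashes)
```

Feeding the log file's lines to `find_crashes` gives the PID of each crashed backend; searching the log for earlier lines from the same session may show which user and database it was serving, provided the configured log prefix includes the PID (the excerpt above does not).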

Additional info:

Comment 1 Tom Lane 2011-03-08 21:20:24 UTC
A stack trace from the core dump (with postgresql84-debuginfo installed) would help.  If it's not producing core dumps, try adding "ulimit -c unlimited" to /var/lib/pgsql/.bash_profile and restarting the database.
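
To confirm that the limit is actually in effect, the soft core-size limit can be read back from a process started in the same environment (for example, from a login shell of the postgres user). This is a rough sketch using Python's standard `resource` module; it reports the limit of the process running the script, not that of an already running postmaster, and the helper name `describe_core_limit` is hypothetical.

```python
import resource


def describe_core_limit(soft):
    """Turn a soft RLIMIT_CORE value (in bytes) into a readable string."""
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    if soft == 0:
        return "disabled"
    return "%d bytes" % soft


if __name__ == "__main__":
    # getrlimit returns a (soft, hard) pair; the soft limit is the one enforced.
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    print("core file size limit (soft):", describe_core_limit(soft))
```

If this prints "disabled", the "ulimit -c unlimited" setting has not reached that environment and no core file will be written when a backend crashes.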

Comment 2 pascal.depuis 2011-03-09 20:12:30 UTC
We will upgrade to 8.4.7 since there are fixes in 8.4.6 and 8.4.7 to address postmaster crashes.

Comment 3 pascal.depuis 2011-04-01 18:49:47 UTC
After upgrading to 8.4.7 we still see the fault. We also downgraded to 8.4.4 on another machine running the same OS and see it there too.

The good news is that we now have SQL that produces the segfault on demand.

I added "ulimit -c unlimited" to /var/lib/pgsql/.bash_profile and restarted the database, but no core files are being produced (at least none that I can find).

Please advise on how to proceed.

Comment 4 Tom Lane 2011-04-02 05:11:32 UTC
Please provide the reproducer script, then.

Comment 5 Tom Lane 2011-04-13 23:03:10 UTC
Some investigation on SDSC's test database eventually identified the problem.  It's explained here:
http://archives.postgresql.org/pgsql-hackers/2011-04/msg00689.php
and a fix has been committed upstream here:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=1de8584fb1b71c98138b1f23808a4f01ab7566cd

We'll absorb this fix automatically whenever we rebase to 8.4.8 or later, but I don't know how soon that will be.

Comment 6 Tom Lane 2011-10-18 16:53:50 UTC
I forgot to include this bug in the erratum paperwork, but it should be fixed in 8.4.9, which was just pushed as a security erratum.