Bug 683212 - Postgresql84 segfault during GEQO planning
Summary: Postgresql84 segfault during GEQO planning
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: postgresql84
Version: 5.5
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Tom Lane
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-03-08 20:37 UTC by Doug Weimer
Modified: 2013-07-03 03:35 UTC
CC List: 2 users

Fixed In Version: postgresql84-8.4.9-1.el5_7.1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-10-18 16:53:50 UTC
Target Upstream Version:


Description Doug Weimer 2011-03-08 20:37:10 UTC
Description of problem:

Several times a day, our PostgreSQL 8.4.5 server crashes with a segmentation fault and logs the following:

2011-03-07 02:36:37 PSTLOG:  connection authorized: user=user database=db
2011-03-07 02:36:37 PSTLOG:  disconnection: session time: 0:00:00.189 user=user database=db host=host.example.com
2011-03-07 02:37:03 PSTLOG:  server process (PID 20636) was terminated by signal 11: Segmentation fault
2011-03-07 02:37:03 PSTLOG:  terminating any other active server processes
2011-03-07 02:37:03 PSTWARNING:  terminating connection because of crash of another server process
2011-03-07 02:37:03 PSTDETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2011-03-07 02:37:03 PSTHINT:  In a moment you should be able to reconnect to the database and repeat your command.
2011-03-07 02:37:03 PSTWARNING:  terminating connection because of crash of another server process

2011-03-07 12:34:23 PSTLOG:  connection received: host=host.example.com port=56659
2011-03-07 12:34:23 PSTFATAL:  no pg_hba.conf entry for host "123.123.123.123", user "user2", database "db", SSL off
2011-03-07 12:34:23 PSTLOG:  connection received: host=host.example.com port=56660
2011-03-07 12:34:23 PSTLOG:  connection authorized: user=user2 database=db
2011-03-07 12:34:24 PSTLOG:  server process (PID 25430) was terminated by signal 11: Segmentation fault
2011-03-07 12:34:24 PSTLOG:  terminating any other active server processes
2011-03-07 12:34:24 PSTWARNING:  terminating connection because of crash of another server process
2011-03-07 12:34:24 PSTDETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
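
When no reproducer is available yet, one place to start is matching each crash in the postmaster log to the PID of the backend that died, so the crash time and any core file can be tied back to what that session was running. Below is a minimal Python 3 sketch of that, assuming the log line format shown above (timestamp, time-zone abbreviation, then the message level with no space in between, as in `PSTLOG:`). `Crash`, `CRASH_RE`, and `find_crashes` are hypothetical names, not part of PostgreSQL.

```python
import re
from typing import List, NamedTuple

# Assumed format, taken from the excerpt above, e.g.
# "2011-03-07 02:37:03 PSTLOG:  server process (PID 20636) was terminated by signal 11: Segmentation fault"
CRASH_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \w+LOG:\s+"
    r"server process \(PID (?P<pid>\d+)\) was terminated by signal (?P<sig>\d+)"
)


class Crash(NamedTuple):
    timestamp: str  # local time as logged, without the time-zone abbreviation
    pid: int        # PID of the backend that died
    signal: int     # 11 is SIGSEGV on Linux


def find_crashes(lines: List[str]) -> List[Crash]:
    """Return one Crash per 'terminated by signal' line, in log order."""
    crashes = []
    for line in lines:
        m = CRASH_RE.match(line)
        if m:
            crashes.append(Crash(m["ts"], int(m["pid"]), int(m["sig"])))
    return crashes
```

Lines such as "terminating any other active server processes" do not match the pattern, so only the backend that actually faulted is reported for each crash.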

Version-Release number of selected component (if applicable):
Name        : postgresql84-server          Relocations: (not relocatable)
Version     : 8.4.5                             Vendor: Red Hat, Inc.
Release     : 1.el5_5.1                     Build Date: Mon 04 Oct 2010 09:59:48 AM PDT

How reproducible:
The segmentation fault occurs several times a day. However, we have not yet been able to develop a test case to reproduce it. Any advice on narrowing down the cause would be appreciated.

Additional info:

Comment 1 Tom Lane 2011-03-08 21:20:24 UTC
A stack trace from the core dump (with postgresql84-debuginfo installed) would help.  If it's not producing core dumps, try adding "ulimit -c unlimited" to /var/lib/pgsql/.bash_profile and restarting the database.
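
Once core files are being written, a backtrace can be pulled from them non-interactively with gdb. The Python 3 sketch below only builds the gdb command line. It assumes, rather than knows, that the server binary is `/usr/bin/postgres`, that cores land in the data directory `/var/lib/pgsql/data` (the backend's working directory), and that they are named `core.<pid>` (as with `kernel.core_uses_pid = 1`). `core_path` and `gdb_backtrace_cmd` are hypothetical helpers.

```python
import os
from typing import List

# Assumptions, not taken from this bug: binary path, data directory, and
# "core.<pid>" naming for core files written into the data directory.
POSTGRES_BIN = "/usr/bin/postgres"
DATA_DIR = "/var/lib/pgsql/data"


def core_path(pid: int, data_dir: str = DATA_DIR) -> str:
    """Expected location of the core file left by the backend with this PID."""
    return os.path.join(data_dir, "core.%d" % pid)


def gdb_backtrace_cmd(pid: int, data_dir: str = DATA_DIR) -> List[str]:
    """argv for a batch-mode gdb run that prints a full backtrace.

    Symbol names require postgresql84-debuginfo to be installed.
    """
    return [
        "gdb", "-batch",
        "-ex", "bt full",       # backtrace including local variables
        POSTGRES_BIN,           # program that produced the core
        core_path(pid, data_dir),
    ]
```

For example, `subprocess.run(gdb_backtrace_cmd(20636), check=True)` would print the backtrace for the PID 20636 crash logged above, if that core file exists. Building the argv as a list avoids shell quoting problems around `bt full`.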

Comment 2 pascal.depuis 2011-03-09 20:12:30 UTC
We will upgrade to 8.4.7 since there are fixes in 8.4.6 and 8.4.7 to address postmaster crashes.

Comment 3 pascal.depuis 2011-04-01 18:49:47 UTC
After upgrading to 8.4.7, we still see the fault. We also downgraded to 8.4.4 on another machine running the same OS and see it there too.

The positive news is that we have SQL that produces the segfault on demand.

I added "ulimit -c unlimited" to /var/lib/pgsql/.bash_profile and restarted the database, but it is not producing core files (at least none that I can find).

Please advise on how to proceed.

Comment 4 Tom Lane 2011-04-02 05:11:32 UTC
Please provide the reproducer script, then.

Comment 5 Tom Lane 2011-04-13 23:03:10 UTC
Some investigation on SDSC's test database eventually identified the problem.  It's explained here:
http://archives.postgresql.org/pgsql-hackers/2011-04/msg00689.php
and a fix has been committed upstream here:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=1de8584fb1b71c98138b1f23808a4f01ab7566cd

We'll absorb this fix automatically whenever we rebase to 8.4.8 or later, but don't know how soon that will be.

Comment 6 Tom Lane 2011-10-18 16:53:50 UTC
I forgot to include this bug in the erratum paperwork, but it should be fixed in 8.4.9, which was just pushed as a security erratum.
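
To check whether an installed build already carries the fix, its version can be compared against the Fixed In Version recorded on this bug (postgresql84-8.4.9-1.el5_7.1). A minimal Python 3 sketch, assuming the version string comes from something like `rpm -q --qf '%{VERSION}-%{RELEASE}' postgresql84-server` and that only the upstream part (before the first `-`) matters. `upstream_version` and `has_geqo_fix` are hypothetical names.

```python
from typing import Tuple

# Fixed In Version for this bug's Red Hat package (postgresql84-8.4.9-1.el5_7.1).
FIXED_IN = (8, 4, 9)


def upstream_version(rpm_version: str) -> Tuple[int, ...]:
    """'8.4.9-1.el5_7.1' -> (8, 4, 9): keep only the part before the first '-'."""
    upstream = rpm_version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def has_geqo_fix(rpm_version: str) -> bool:
    # Tuple comparison is element-wise, so (8, 4, 10) correctly sorts after (8, 4, 9).
    return upstream_version(rpm_version) >= FIXED_IN
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where "8.4.10" would sort before "8.4.9" lexically.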

