Bug 125508 - database system shutdown was interrupted
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: rh-postgresql
Version: 3.0
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Tom Lane
Reported: 2004-06-08 06:32 EDT by Need Real Name
Modified: 2013-07-02 23:01 EDT
Doc Type: Bug Fix
Last Closed: 2007-10-19 15:24:48 EDT
Description Need Real Name 2004-06-08 06:32:05 EDT
Description of problem:
EL AS3, kernel-smp-2.4.21-9.0.1EL

Version-Release number of selected component (if applicable):
postgres version 7.3.6-1 

How reproducible:
not reproducible

Steps to Reproduce:
1.
2.
3.
  
Actual results:
rhdb could not be started after a system shutdown. The error:
Jun  1 10:43:55 linux708 postgres[5537]: [30] LOG:  database system shutdown was interrupted at 2004-05-28 16:32:08 BST
Jun  1 10:43:55 linux708 postgres[5537]: [31] LOG:  open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun  1 10:43:55 linux708 postgres[5537]: [32] LOG:  invalid primary checkpoint record
Jun  1 10:43:55 linux708 postgres[5537]: [33] LOG:  open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun  1 10:43:55 linux708 postgres[5537]: [34] LOG:  invalid secondary checkpoint record
Jun  1 10:43:55 linux708 postgres[5537]: [35] PANIC:  unable to locate a valid checkpoint record
Jun  1 10:43:55 linux708 postgres[5534]: [31] LOG:  startup process (pid 5537) was terminated by signal 6
Jun  1 10:43:55 linux708 postgres[5534]: [32] LOG:  aborting startup due to startup process failure
Jun  1 10:43:56 linux708 rhdb: Starting PostgreSQL - Red Hat Edition service:  failed
Jun  1 10:44:00 linux708 su(pam_unix)[5554]: session opened for user postgres by (uid=0)
Jun  1 10:44:00 linux708 su(pam_unix)[5554]: session closed for user postgres
Jun  1 10:44:00 linux708 postgres[5595]: [30] LOG:  database system shutdown was interrupted at 2004-05-28 16:32:08 BST
Jun  1 10:44:00 linux708 postgres[5595]: [31] LOG:  open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun  1 10:44:00 linux708 postgres[5595]: [32] LOG:  invalid primary checkpoint record
Jun  1 10:44:00 linux708 postgres[5595]: [33] LOG:  open of /var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) failed: No such file or directory
Jun  1 10:44:00 linux708 postgres[5595]: [34] LOG:  invalid secondary checkpoint record
Jun  1 10:44:00 linux708 postgres[5595]: [35] PANIC:  unable to locate a valid checkpoint record
Jun  1 10:44:00 linux708 postgres[5592]: [31] LOG:  startup process (pid 5595) was terminated by signal 6
Jun  1 10:44:00 linux708 postgres[5592]: [32] LOG:  aborting startup due to startup process failure
Jun  1 10:44:01 linux708 rhdb: Starting PostgreSQL - Red Hat Edition service:  failed
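
For anyone comparing their own logs against this failure signature, here is a minimal Python sketch that pulls the missing xlog file names and the PANIC messages out of syslog output. It assumes each entry sits on a single line, as syslog writes it (not wrapped as in a pasted report), and that the line layout matches the entries above. The function name summarize_startup_failure and the regular expressions are illustrative assumptions, not part of PostgreSQL or rhdb.

```python
import re

# Assumed layout, taken from the entries above:
# "<Mon> <day> <time> <host> postgres[<pid>]: [<n>] <LEVEL>:  <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+"
    r"postgres\[(?P<pid>\d+)\]:\s+\[(?P<seq>\d+)\]\s+"
    r"(?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)
# Matches the "open of <path> ... failed: No such file or directory" message.
MISSING_RE = re.compile(r"open of (\S+) .*failed: No such file or directory")


def summarize_startup_failure(lines):
    """Collect missing file paths and PANIC messages from postgres log lines."""
    missing, panics = [], []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a postgres entry in the assumed layout
        msg = m.group("msg")
        found = MISSING_RE.search(msg)
        if found and found.group(1) not in missing:
            missing.append(found.group(1))
        if m.group("level") == "PANIC":
            panics.append((m.group("pid"), msg))
    return {"missing_files": missing, "panics": panics}


sample = [
    "Jun  1 10:43:55 linux708 postgres[5537]: [31] LOG:  open of "
    "/var/lib/pgsql/data/pg_xlog/0000000000000000 (log file 0, segment 0) "
    "failed: No such file or directory",
    "Jun  1 10:43:55 linux708 postgres[5537]: [35] PANIC:  unable to locate "
    "a valid checkpoint record",
]
print(summarize_startup_failure(sample))
```

The sketch only summarizes what the log already says; it does not explain why the xlog segment is missing.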



Expected results:
rhdb starts up without error

Additional info:
We recently encountered a serious database crash that resulted in a 
significant loss of data… 

We took down the database server, and when we restarted the backend 
we got an error 'database system shutdown was interrupted' … 'invalid 
checkpoint' etc… with missing xlog files (I've appended the log to 
the end of this post)…

I've been trawling list-archives for a few days and this issue has 
cropped up a number of times, but I've found it hard to identify a 
single post - or set of posts - that explain the cause of such a 
crash…

Hopefully I'll be able to bring together the results of this trawl 
through the archives in this post - but I'd really appreciate any 
help or suggestions people have - we currently have a slightly uneasy 
feeling because we've not quite got to the bottom of the issues, and 
it would be nice to set our minds at rest! :-)

So far I've identified three possible causes of the crash - I've 
listed them below, and wonder whether people have any comments on 
them:

1) We were running postgres version 7.3.6-1 (which is the version in 
RedHat AS3)…
The following post suggests that this is a known issue in 7.3.3, but 
7.3.4 is safe? I assume, therefore, that 7.3.6-1 is also safe...
http://archives.postgresql.org/pgsql-general/2003-09/msg01086.php
 
2) We're running the database in conjunction with JBoss, connecting 
to the database server from a different machine via JDBC. The 
database was taken down *without* stopping JBoss first. 

3) We may have run out of space briefly on the database's disk stack…
This post suggests that this can cause such a crash: 
http://archives.postgresql.org/pgsql-general/2003-09/msg01059.php

I suspect that (3) didn't occur in our case, however there may have 
been some temporary interruptions to the file-store just before 
things died…

Any thoughts would be much appreciated!

Below is the relevant bit of the log,

shutdown log (/var/log/messages): 
May 28 15:43:35  shutdown: shutting down for system halt
May 28 15:43:35  init: Switching to runlevel: 0
May 28 15:43:36 server rhnsd[1694]: Exiting
May 28 15:43:36 server rhnsd: rhnsd shutdown succeeded
May 28 15:43:36 server atd: atd shutdown succeeded
May 28 15:43:36 server cups: cupsd shutdown succeeded
May 28 15:43:36 server xfs[1643]: terminating 
May 28 15:43:36 server xfs: xfs shutdown succeeded
May 28 15:43:36 server mysqld: Stopping MySQL: succeeded
May 28 15:43:36 server gpm: gpm shutdown succeeded
May 28 15:43:37 server rhdb: Stopping PostgreSQL - Red Hat Edition service: 
May 28 15:43:37 server su(pam_unix)[12400]: session opened for user postgres by (uid=0)
May 28 15:43:40 server su(pam_unix)[12400]: session closed for user postgres
May 28 15:43:40 server rhdb: ^[[60G[ 
May 28 15:43:40 server rhdb: 
May 28 15:43:40 server rc: Stopping rhdb: succeeded 
... 
May 28 15:43:44 server kernel: Kernel logging (proc) stopped.
May 28 15:43:44 server kernel: Kernel log daemon terminating.
May 28 15:43:45 server syslog: klogd shutdown succeeded
May 28 15:43:45 server exiting on signal 15
May 28 16:13:35 server syslogd 1.4.1: restart.
Comment 1 Tom Lane 2004-06-08 10:08:14 EDT
I'm afraid you won't find any more clues here than you did on the PG
mailing lists ;-).  There isn't any known cause for this (and I don't
believe any of your three theories).  I would like just as much as you
to identify for sure just what did happen, but I fear the chance is
gone now --- pg_resetxlog would have done a pretty good job of
destroying the evidence.  (I don't suppose you took a filesystem-level
backup before doing that, did you?)
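
As an illustration of the kind of filesystem-level copy meant here, the following Python sketch archives a data directory before any repair tool touches it. It assumes the postmaster is stopped so the files are not changing. The function name snapshot_data_dir and the destination directory are hypothetical; /var/lib/pgsql/data is the path that appears in the log above.

```python
import tarfile
import time
from pathlib import Path


def snapshot_data_dir(data_dir, dest_dir):
    """Archive data_dir into a timestamped .tar.gz under dest_dir.

    Assumes the database server is not running, so the copy is consistent.
    Returns the path of the archive that was written.
    """
    data_dir = Path(data_dir)
    dest_dir = Path(dest_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"data directory not found: {data_dir}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{data_dir.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Store entries under the directory's own name, not its full path.
        tar.add(str(data_dir), arcname=data_dir.name)
    return archive


# Hypothetical usage; the destination path is an assumption:
# snapshot_data_dir("/var/lib/pgsql/data", "/var/tmp/pg-evidence")
```

Keeping such an archive preserves pg_control and whatever is left of pg_xlog for later inspection, whatever is done to the live directory afterwards.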

Looking again at your log traces, it seems interesting that the PG log
trace says "shutdown was interrupted at 2004-05-28 16:32:08 BST"
(which would have been the status written to pg_control at the start
of a shutdown sequence) whereas the shutdown log claims this was all
going on at 15:43:37.  Do you have any idea about the cause of that
discrepancy?
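
To put a number on that discrepancy, here is a small Python calculation using only timestamps quoted in this report. It assumes all of them are in the same local time zone (the PG message says BST; the syslog lines do not state a zone), so treat the result as indicative only. The variable and function names are illustrative.

```python
from datetime import datetime

# Timestamps quoted in the report; assumed to share one local time zone.
init_script_stop = datetime(2004, 5, 28, 15, 43, 37)  # rhdb "Stopping" entry
syslogd_restart = datetime(2004, 5, 28, 16, 13, 35)   # "syslogd 1.4.1: restart."
pg_control_stamp = datetime(2004, 5, 28, 16, 32, 8)   # "shutdown was interrupted at"


def gap(earlier, later):
    """Return the elapsed time between two timestamps as a timedelta."""
    return later - earlier


print(gap(init_script_stop, pg_control_stamp))  # 0:48:31
print(gap(syslogd_restart, pg_control_stamp))   # 0:18:33
```

Under that assumption the pg_control timestamp is about 48 minutes later than the logged init-script stop, and about 18 minutes later than the syslogd restart that ends the shutdown log.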

I am not certain how much I trust the rhdb init script to report
failures occurring during shutdown --- it looks like it should work,
but it seems possible that PG reported a problem that did not get
reflected in the shutdown log.  It would be more useful to look at the
postmaster error log from the shutdown time, if you have it.
Comment 2 RHEL Product and Program Management 2007-10-19 15:24:48 EDT
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
