Bug 126885 - startup subprocess does not go away - hangs system shutdown
Status: CLOSED WORKSFORME
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2004-06-28 15:01 EDT by Bob Gustafson
Modified: 2007-11-30 17:10 EST (History)
Doc Type: Bug Fix
Last Closed: 2004-07-03 14:28:34 EDT
Description Bob Gustafson 2004-06-28 15:01:45 EDT
Description of problem:

  Things seem a bit flaky with my current 'yum install all'.

   I tried to '/sbin/shutdown -r now' and it never went down. I could
log in from a remote terminal and after some looking around, I found
that shutdown seemed to be waiting for the postgresql subprocess to
exit. This never happened. I was able to kill -9 the subprocess and
then my remote terminal went dead - but the system was still not
rebooting. Finally just did a power cycle to regain control.

Version-Release number of selected component (if applicable):

[user1@hoho2 user1]$ rpm -q postgresql
postgresql-7.4.3-1
[user1@hoho2 user1]$

How reproducible:

  Once was enough. The postgresql startup subprocess is running now;
I will kill it and then try to reboot.

Steps to Reproduce:

1. see below

Actual results:

[root@hoho2 root]# ps ax | grep post
 2921 ?        S      0:00 /usr/bin/postmaster -p 5432 -D
/var/lib/pgsql/data
 2923 ?        S      0:00 postgres: stats buffer process
 2924 ?        S      0:00 postgres: stats collector process
 2925 ?        D      0:00 postgres: startup subprocess
 5159 tty1     S+     0:00 grep post
[root@hoho2 root]#

Expected results:


Additional info:

[user1@hoho2 user1]$ cat /proc/version
Linux version 2.6.7-1.457smp (bhcompile@porky.build.redhat.com) (gcc
version 3.4.0 20040621 (Red Hat Linux 3.4.0-7)) #1 SMP Sun Jun 27
13:16:48 EDT 2004
[user1@hoho2 user1]$
Comment 1 Bob Gustafson 2004-06-28 15:04:36 EDT
I tried to kill -9, but it did not disappear

[root@hoho2 user1]# kill -9 2925
[root@hoho2 user1]# ps ax | grep post
 2921 ?        S      0:00 /usr/bin/postmaster -p 5432 -D
/var/lib/pgsql/data
 2923 ?        S      0:00 postgres: stats buffer process
 2924 ?        S      0:00 postgres: stats collector process
 2925 ?        D      0:00 postgres: startup subprocess
 5570 pts/0    S+     0:00 grep post

After a pause

[root@hoho2 user1]# ps ax | grep post
 2921 ?        S      0:00 /usr/bin/postmaster -p 5432 -D
/var/lib/pgsql/data
 2923 ?        S      0:00 postgres: stats buffer process
 2924 ?        S      0:00 postgres: stats collector process
 2925 ?        D      0:00 postgres: startup subprocess
 5572 pts/0    R+     0:00 grep post
[root@hoho2 user1]#
Comment 2 Bob Gustafson 2004-06-28 15:16:23 EDT
[root@hoho2 user1]# cd /etc/init.d
[root@hoho2 init.d]# ./postgresql stop
Stopping postgresql service:                               [FAILED]
[root@hoho2 init.d]#

This next section is from a different terminal - before the [FAILED]
message occurred above

[user1@hoho2 user1]$ date
Mon Jun 28 14:06:39 CDT 2004
[user1@hoho2 user1]$ ps ax | grep post
 2921 ?        S      0:00 /usr/bin/postmaster -p 5432 -D
/var/lib/pgsql/data
 2923 ?        S      0:00 postgres: stats buffer process
 2924 ?        S      0:00 postgres: stats collector process
 2925 ?        D      0:00 postgres: startup subprocess
 5579 pts/0    S+     0:00 /bin/sh ./postgresql stop
 5586 pts/0    S+     0:00 su -l postgres -c /usr/bin/pg_ctl stop -D
/var/lib/pgsql/data -s -m fast
 5725 pts/1    S+     0:00 grep post
[user1@hoho2 user1]$

After the FAILED message appeared, the ./postgresql stop went away

I am running with boot parameters 'selinux=1 enforcing=0', so selinux
should not be getting in the way, however, there are lots of avc
messages in my /var/log/messages file - see below

audit(1088449566.922:0): avc:  denied  { search } for  pid=5587
exe=/bin/su name=pgsql dev=sda2 ino=1642533
scontext=root:system_r:initrc_su_t
tcontext=system_u:object_r:postgresql_db_t tclass=dir
audit(1088449566.926:0): avc:  denied  { search } for  pid=5588
exe=/bin/bash name=pgsql dev=sda2 ino=1642533
scontext=user_u:user_r:user_t
tcontext=system_u:object_r:postgresql_db_t tclass=dir
audit(1088449566.926:0): avc:  denied  { getattr } for  pid=5588
exe=/bin/bash path=/var/lib/pgsql dev=sda2 ino=1642533
scontext=user_u:user_r:user_t
tcontext=system_u:object_r:postgresql_db_t tclass=dir
audit(1088449566.943:0): avc:  denied  { read } for  pid=5587
exe=/bin/bash name=pgsql dev=sda2 ino=1642533
scontext=user_u:user_r:user_t
tcontext=system_u:object_r:postgresql_db_t tclass=dir
audit(1088449566.964:0): avc:  denied  { read } for  pid=5587
exe=/bin/bash name=.bash_profile dev=sda2 ino=1642162
scontext=user_u:user_r:user_t
tcontext=system_u:object_r:postgresql_db_t tclass=file
audit(1088449566.964:0): avc:  denied  { getattr } for  pid=5587
exe=/bin/bash path=/var/lib/pgsql/.bash_profile dev=sda2 ino=1642162
scontext=user_u:user_r:user_t
tcontext=system_u:object_r:postgresql_db_t tclass=file
audit(1088449566.988:0): avc:  denied  { getattr } for  pid=5587
exe=/bin/bash path=/var/lib/pgsql/data/postmaster.pid dev=sda2
ino=1641743 scontext=user_u:user_r:user_t
tcontext=user_u:object_r:postgresql_db_t tclass=file
audit(1088449566.990:0): avc:  denied  { read } for  pid=5615
exe=/bin/sed name=postmaster.pid dev=sda2 ino=1641743
scontext=user_u:user_r:user_t tcontext=user_u:object_r:postgresql_db_t
tclass=file
audit(1088449585.545:0): avc:  denied  { create } for  pid=5407
exe=/usr/bin/gnome-session scontext=user_u:user_r:user_t
tcontext=user_u:user_r:user_t tclass=netlink_route_socket
audit(1088449586.157:0): avc:  denied  { getattr } for  pid=5587
exe=/bin/bash path=/var/lib/pgsql/data/postmaster.pid dev=sda2
ino=1641743 scontext=user_u:user_r:user_t
tcontext=user_u:object_r:postgresql_db_t tclass=file
[root@hoho2 init.d]#
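
To see which kinds of denials dominate a log like the one above, the messages can be tallied by permission and target class. This is only a rough sketch: the function name summarize_avc is made up for illustration, and it assumes one audit message per line, as in /var/log/messages itself rather than the wrapped copy pasted here.

```shell
#!/bin/sh
# Count AVC denials by permission and target class.
# Reads syslog/audit lines on stdin, one message per line.
# summarize_avc is a hypothetical helper name, not part of any SELinux tool.
summarize_avc() {
    # Keep only "denied" messages and reduce each to "<permission(s)> <tclass>".
    sed -n 's/.*avc:  *denied  *{ \([^}]*\) }.*tclass=\([a-z_]*\).*/\1 \2/p' |
        sort | uniq -c | sort -rn
}

# Typical use (as root):
#   grep 'avc:' /var/log/messages | summarize_avc
```

With enforcing=0 these denials are logged but not enforced, so a summary like this mainly shows how noisy the policy is rather than what is actually being blocked.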

---

Looks like I will have to power cycle again

I am setting SELINUXTYPE=permissive in /etc/selinux/config, just in
case the boot parameter syntax has changed (and is therefore not being
recognized).
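
A note on the config file: in /etc/selinux/config the SELINUX= line sets the mode (enforcing, permissive or disabled), while SELINUXTYPE= names the policy to load. A small sketch for reading the configured mode follows; the function name selinux_config_mode is hypothetical and the file path is passed in explicitly so nothing is assumed about the system layout.

```shell
#!/bin/sh
# Print the value of the SELINUX= line from an SELinux config file.
# The pattern does not match SELINUXTYPE=, which selects the policy
# rather than the mode.
# selinux_config_mode is an illustrative name, not a standard tool.
selinux_config_mode() {
    sed -n 's/^[[:space:]]*SELINUX=//p' "$1"
}

# Typical use:
#   selinux_config_mode /etc/selinux/config
```

The result can be compared with the output of getenforce, which reports the mode the running kernel is actually using.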
Comment 3 Bob Gustafson 2004-06-28 15:33:04 EDT
The system came up with fewer avc messages from SELinux, but the
startup subprocess is still there.

Seems to be unkillable

[user1@hoho2 user1]$ ps ax | grep post
 3177 ?        S      0:00 /usr/bin/postmaster -p 5432 -D
/var/lib/pgsql/data
 3179 ?        S      0:00 postgres: stats buffer process
 3180 ?        S      0:00 postgres: stats collector process
 3181 ?        D      0:00 postgres: startup subprocess
 5275 pts/0    S+     0:00 grep post

[user1@hoho2 user1]$ kill -9 3181
bash: kill: (3181) - Operation not permitted

[user1@hoho2 user1]$ su
Password:

[root@hoho2 user1]# kill -9 3181

[root@hoho2 user1]# ps ax | grep post
 3177 ?        S      0:00 /usr/bin/postmaster -p 5432 -D
/var/lib/pgsql/data
 3179 ?        S      0:00 postgres: stats buffer process
 3180 ?        S      0:00 postgres: stats collector process
 3181 ?        D      0:00 postgres: startup subprocess
 5298 pts/0    R+     0:00 grep post
[root@hoho2 user1]#


Comment 4 Bob Gustafson 2004-06-28 15:36:28 EDT
Odd

[root@hoho2 user1]# su - postgres
-bash-2.05b$ psql -l
psql: FATAL:  the database system is starting up
Comment 5 Bob Gustafson 2004-06-28 16:28:23 EDT
Cannot kill off the startup subprocess..

[root@hoho2 user1]# ps ax | grep post
 3177 ?        S      0:00 /usr/bin/postmaster -p 5432 -D
/var/lib/pgsql/data
 3179 ?        S      0:00 postgres: stats buffer process
 3180 ?        S      0:00 postgres: stats collector process
 3181 ?        D      0:00 postgres: startup subprocess
 5681 pts/0    S+     0:00 grep post

[root@hoho2 user1]# kill -9 3177
[root@hoho2 user1]# kill -9 3179
bash: kill: (3179) - No such process
[root@hoho2 user1]# kill -9 3180
bash: kill: (3180) - No such process
[root@hoho2 user1]# kill -9 3181
[root@hoho2 user1]# ps ax | grep post
 3181 ?        D      0:00 postgres: startup subprocess
 5683 pts/0    S+     0:00 grep post
[root@hoho2 user1]#

Comment 6 Tom Lane 2004-06-28 21:23:41 EDT
I don't think this is a Postgres issue per se.  A process stuck in an
unkillable disk I/O wait is a symptom of a kernel-level problem, or at
least a kernel-level failure to recover from an even-lower-level
problem.  Exactly what sort of disk is the Postgres database sitting
on?  Have you had any reason to suspect hardware trouble with that
disk?  Can you still access the disk while this is going on?  The
symptoms are consistent with the disk freezing up when asked to read
some block that the startup process needs to read during database
recovery processing.

You might try running a disk diagnostic such as badblocks to see if
that turns up anything.
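
For reference, a process in state "D" (uninterruptible sleep) does not act on signals, SIGKILL included, until the kernel operation it is blocked in returns, which matches the kill -9 attempts earlier in this report. Below is a minimal sketch for picking such processes out of ps output. The function name filter_d_state and the column list in the usage comment are illustrative assumptions, not something taken from this report.

```shell
#!/bin/sh
# Print only the rows whose STAT column (field 2) starts with "D",
# i.e. tasks in uninterruptible sleep. Reads ps-style rows on stdin.
# filter_d_state is a hypothetical helper name.
filter_d_state() {
    awk '$2 ~ /^D/'
}

# Typical use with procps ps (pid, state, wait channel, command line):
#   ps -eo pid,stat,wchan:32,args | filter_d_state
```

Where the wchan column is available, it names the kernel function the task is sleeping in, which may hint at whether a disk or driver path is involved.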
Comment 7 Bob Gustafson 2004-07-02 08:03:16 EDT
I think you are correct in saying that it is a Kernel problem.

I am stuck on kernel 2.6.7-1.457 - higher numbers over the last few
days do not even boot for me.

-----

On your disk question, I am running a pair of Seagate SCSI drives - no
RAID or anything fancy.

In the /var/log/messages file, I see a couple of lines that I haven't
seen before:

Jul  1 21:03:07 hoho2 kernel: Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0,  type 0
Jul  1 21:03:07 hoho2 kernel: Attached scsi generic sg1 at scsi0, channel 0, id 3, lun 0,  type 0
Jul  1 21:03:07 hoho2 kernel: kudzu: Using deprecated /dev/sg mechanism instead of SG_IO on the actual device
Jul  1 21:03:07 hoho2 kernel: kudzu: Using deprecated /dev/sg mechanism instead of SG_IO on the actual device

-----

I think the problem is more a shutdown problem. If you want to shift
this bug queue over to shutdown - it would be fine with me.

-----

I can kill processes from a terminal up to the time I say
'/sbin/shutdown -r now'. After that I am able to log in from another
terminal, but cannot kill processes (this of course depends on how
many critical processes I have killed prior to the shutdown attempt).

I am currently preventing postgresql from coming up during boot - to
simplify things for the moment.
Comment 8 Tom Lane 2004-07-02 19:54:54 EDT
Hmm, I'd wonder about a SCSI driver issue.  But rather than
prejudging, I'll just bounce it over to the general "kernel" category
and let Arjan decide who gets the blame ...

I'll stay cc'd to this in case any Postgres knowledge proves helpful.
Comment 9 Arjan van de Ven 2004-07-03 02:54:13 EDT
Can you do
echo t > /proc/sysrq-trigger
and get me the backtrace that belongs to the postgres task in "D" state?
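
Writing "t" to /proc/sysrq-trigger asks the kernel to dump the state and stack of every task into the kernel log; this needs root and a kernel built with magic SysRq support. The following is a hedged sketch of that step with a basic permission check. The function name request_task_dump is hypothetical, and the optional path argument exists only so the function can be exercised against an ordinary file.

```shell
#!/bin/sh
# Ask the kernel to log the state and stack of all tasks (SysRq "t").
# $1: trigger file, defaults to /proc/sysrq-trigger.
# request_task_dump is an illustrative name, not an existing command.
request_task_dump() {
    trigger=${1:-/proc/sysrq-trigger}
    if [ ! -w "$trigger" ]; then
        echo "cannot write to $trigger (are you root?)" >&2
        return 1
    fi
    echo t > "$trigger"
}

# Typical use (as root), saving the kernel log afterwards;
# the output file name here is only an example:
#   request_task_dump && dmesg > /tmp/sysrq-t.txt
```

The saved log can then be searched for the entry belonging to the postgres task shown in "D" state by ps.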
Comment 10 Bob Gustafson 2004-07-03 14:28:34 EDT
See bug 126947

Going to single processor kernel seems to clear up both my no boot and
no shutdown problems.

Works (sort of) for me.
