Bug 490688 - cman init script needs to stop qdisk
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Depends On:
Reported: 2009-03-17 12:42 EDT by Corey Marthaler
Modified: 2009-05-12 16:36 EDT
CC List: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2009-03-17 13:55:03 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---

Attachments: None
Description Corey Marthaler 2009-03-17 12:42:55 EDT
Description of problem:
The cman init script currently starts qdiskd, but never stops it.

[root@taft-04 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   0   M      0   2009-03-17 11:29:18  /dev/disk/by-id/scsi-3600805f3000a05b0000000005022000e-part1
   1   M  32716   2009-03-17 11:29:07  taft-01-bond
   2   M  32716   2009-03-17 11:29:07  taft-02-bond
   3   M  32716   2009-03-17 11:29:07  taft-03-bond
   4   M  32716   2009-03-17 11:29:07  taft-04-bond
[root@taft-04 ~]# service cman stop
Stopping cluster: 
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
                                                           [  OK  ]

Mar 17 11:41:13 taft-03 qdiskd[8114]: <info> Node 4 shutdown 
Mar 17 11:41:13 taft-03 qdiskd[8114]: <err> cman_dispatch: Host is down 
Mar 17 11:41:13 taft-03 qdiskd[8114]: <err> Halting qdisk operations 

[root@taft-04 ~]# service qdiskd status
qdiskd dead but pid file exists

Version-Release number of selected component (if applicable):

How reproducible:
Comment 1 Lon Hohberger 2009-03-17 13:50:43 EDT
This behavior is expected, and is due to an init script ordering problem in RHEL5.

In RHEL4, all of the cluster components were started separately, and the qdiskd init script fell between 'cman' and 'fenced'.

On RHEL5, the cman init script now starts 'fenced' as well, while qdiskd is started afterward. This causes a quorum problem: in order for fencing to occur (and therefore complete), quorum must first be formed.

So, what we really want is qdiskd starting before CMAN.

Unfortunately, we are not allowed to reorder init scripts after a release, so there is a hack in the cman init script which performs the following check:

  "Does the administrator have qdiskd enabled for the current runlevel?"

If qdiskd is enabled on startup for the current runlevel, the cman init script starts the quorum disk daemon, and the subsequent start of qdiskd by init becomes a no-op.  Without this hack, the cman init script will hang during startup.
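The runlevel check described above can be sketched in shell as follows. This is not the code in the RHEL5 cman init script; it is one way to ask "is qdiskd enabled for the current runlevel?" by looking for the `S` (start) link that enabling a service with chkconfig creates. `RC_BASE`, `qdiskd_enabled_in`, and `maybe_start_qdiskd` are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Sketch only: NOT the actual RHEL5 cman init script. It illustrates the
# "is qdiskd enabled for the current runlevel?" check.
#
# RC_BASE is a hypothetical override so the check can be run against a
# scratch directory; on RHEL5 the runlevel link directories live in /etc/rc.d.
RC_BASE="${RC_BASE:-/etc/rc.d}"

# `runlevel` prints "<previous> <current>", e.g. "N 3"; keep the current one.
current_runlevel() {
    runlevel | awk '{ print $2 }'
}

# Succeed if runlevel $1 has a start link for qdiskd (rcN.d/SNNqdiskd),
# which is what enabling the service with chkconfig creates.
qdiskd_enabled_in() {
    local link
    for link in "$RC_BASE/rc$1.d/"S[0-9][0-9]qdiskd; do
        # Check -L as well as -e so a dangling link still counts as enabled.
        if [ -e "$link" ] || [ -L "$link" ]; then
            return 0
        fi
    done
    return 1
}

# Start qdiskd from cman's start path only if the admin enabled it;
# init's own later "qdiskd start" is then a no-op.
maybe_start_qdiskd() {
    if qdiskd_enabled_in "$(current_runlevel)"; then
        service qdiskd start
    fi
}
```

On a real host, `chkconfig qdiskd` with no `on`/`off` argument performs an equivalent query against the current runlevel and returns true or false, which avoids reading the link directories directly.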

During system shutdown or reboot, the cman init script does not need to stop qdiskd: the qdiskd init script is likewise enabled, so init correctly stops qdiskd before CMAN.
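When leaving a runlevel, the RHEL5 rc script runs that runlevel's `K` (kill) links in sorted glob order, so "qdiskd is stopped before CMAN" amounts to qdiskd's `K` link sorting ahead of cman's in the rc0.d and rc6.d directories. A sketch of a check for that follows; the function names are hypothetical, and the link numbers used when exercising it are illustrative, not the priorities RHEL5 actually ships.

```shell
#!/bin/bash
# Sketch only: checks whether, in one runlevel directory, qdiskd's kill link
# sorts ahead of cman's, i.e. whether rc would stop qdiskd first.

# Print the name of the first K<NN><service> link in directory $1, if any.
first_kill_link() {
    local l
    for l in "$1"/K[0-9][0-9]"$2"; do
        # Check -L too, so a dangling symlink still counts.
        if [ -e "$l" ] || [ -L "$l" ]; then
            basename "$l"
            return 0
        fi
    done
    return 1
}

# Succeed if directory $1 (e.g. /etc/rc.d/rc6.d) stops qdiskd before cman.
qdiskd_stops_before_cman() {
    local q c
    q="$(first_kill_link "$1" qdiskd)" || return 1
    c="$(first_kill_link "$1" cman)" || return 1
    # Both names are K + two digits + service name, so comparing the
    # strings compares the kill priorities first.
    [[ "$q" < "$c" ]]
}
```

For example, `qdiskd_stops_before_cman /etc/rc.d/rc6.d && echo ok` would report whether reboot stops the two services in the order described above.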

So, the most correct way to start qdiskd and cman manually is:

  service qdiskd start; service cman start

For a "clean" shutdown, stop them in the same order:

  service qdiskd stop; service cman stop

This will avoid the dead PID file and the strange errors in the logs.
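The two manual sequences above can be wrapped in a pair of shell functions. This is a hypothetical helper, not something shipped with RHEL5; `cluster_start`, `cluster_stop`, and the `SERVICE` override (which allows a dry run with `SERVICE=echo`) are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper, not shipped with RHEL5: wraps the manual start and
# stop sequences. Source this file, then call one of the functions.
#
# SERVICE defaults to the real `service` command; set SERVICE=echo to print
# the commands instead of running them.
SERVICE="${SERVICE:-service}"

# Same as "service qdiskd start; service cman start": qdiskd first, then cman.
cluster_start() {
    "$SERVICE" qdiskd start
    "$SERVICE" cman start
}

# Same as "service qdiskd stop; service cman stop": stopping qdiskd before
# cman avoids the stale pid file and the qdiskd errors in the logs.
cluster_stop() {
    "$SERVICE" qdiskd stop
    "$SERVICE" cman stop
}
```

For example, `SERVICE=echo cluster_stop` prints `qdiskd stop` followed by `cman stop` without touching the running cluster. Like the original one-liners, each function runs the second command even if the first fails.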

We may be able to fix this in RHEL6 (and remove the hack) simply by changing the init script start ordering, but for RHEL5 we can't change the ordering.

Adding the hacks to cman's shutdown path that would be needed to detect whether the cman init script started qdiskd (and, if so, stop it; otherwise, expect the admin to use the qdiskd init script) is not worth the maintenance burden, and would likely introduce more bugs than it fixes.
Comment 2 RHEL Product and Program Management 2009-03-17 13:55:03 EDT
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.
