Description of problem:
It currently starts qdiskd, but never stops it.

[root@taft-04 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   0   M      0   2009-03-17 11:29:18  /dev/disk/by-id/scsi-3600805f3000a05b0000000005022000e-part1
   1   M  32716   2009-03-17 11:29:07  taft-01-bond
   2   M  32716   2009-03-17 11:29:07  taft-02-bond
   3   M  32716   2009-03-17 11:29:07  taft-03-bond
   4   M  32716   2009-03-17 11:29:07  taft-04-bond

[root@taft-04 ~]# service cman stop
Stopping cluster:
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
[ OK ]

Mar 17 11:41:13 taft-03 qdiskd[8114]: <info> Node 4 shutdown
Mar 17 11:41:13 taft-03 qdiskd[8114]: <err> cman_dispatch: Host is down
Mar 17 11:41:13 taft-03 qdiskd[8114]: <err> Halting qdisk operations

[root@taft-04 ~]# service qdiskd status
qdiskd dead but pid file exists

Version-Release number of selected component (if applicable):
cman-2.0.99-1.el5

How reproducible:
Every time
This behavior is expected, and is due to an init script ordering problem in RHEL5.

In RHEL4, all of the cluster components were started separately, and the qdiskd init script fell between 'cman' and 'fenced'. On RHEL5, the cman init script now starts 'fenced' as well, while qdiskd is started afterwards. This causes a quorum problem: in order for fencing to occur (and therefore complete), quorum must first be formed. So what we really want is for qdiskd to start before CMAN.

Unfortunately, we are not allowed to reorder init scripts after a release, so there is a hack in the cman init script which performs the following check: "Does the administrator have qdiskd enabled for the current runlevel?" If qdiskd is enabled on startup for the current runlevel, the cman init script starts the quorum disk daemon, and the subsequent start of qdiskd by init becomes a no-op. Without this hack, the cman init script will hang during startup.

During system shutdown or reboot, the cman init script does not stop qdiskd, because the qdiskd init script is still enabled and init correctly stops qdiskd before CMAN.

So effectively, the most correct method to start qdiskd+cman when starting manually is the following:

    service qdiskd start; service cman start

And the stop for a "clean" shutdown is actually the same order:

    service qdiskd stop; service cman stop

This will avoid the dead PID file and the strange errors in the logs.

We may be able to fix this in RHEL6 (and remove the hack) by simply changing the init script start ordering, but for RHEL5 we can't change the ordering. Adding the hacks to the cman shutdown path that would be needed to detect whether the cman init script started qdiskd (and if so, stop it; otherwise expect the admin to use the qdiskd init script) is not worth the maintenance, and is likely to introduce more bugs than it fixes.
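For admins who start and stop the cluster by hand, the ordering described above could be wrapped in a small helper so the qdiskd step is not forgotten. This is only a sketch, not part of the cman package: the function names `cluster_start` and `cluster_stop` and the `SERVICE` override variable are hypothetical and introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with cman: wraps the manual start/stop
# order recommended in this comment. SERVICE is an assumed override so the
# sequence can be dry-run (for example SERVICE=echo) without touching daemons.
SERVICE=${SERVICE:-service}

cluster_start() {
    # qdiskd first, so the quorum disk is available before cman/fenced start.
    $SERVICE qdiskd start
    $SERVICE cman start
}

cluster_stop() {
    # Same order for a clean shutdown: stop qdiskd explicitly, then cman.
    # This avoids the "dead but pid file exists" state seen in the report.
    $SERVICE qdiskd stop
    $SERVICE cman stop
}
```

After sourcing the file, setting `SERVICE=echo` and calling `cluster_start` prints the two commands in order instead of running them, which is a cheap way to check the sequence before using it on a live node.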
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.