Bug 490688

Summary: cman init script needs to stop qdisk
Product: Red Hat Enterprise Linux 5 Reporter: Corey Marthaler <cmarthal>
Component: cman  Assignee: Lon Hohberger <lhh>
Status: CLOSED WONTFIX QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: low    
Version: 5.3  CC: cfeist, cluster-maint, edamato
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-03-17 17:55:03 UTC Type: ---

Description Corey Marthaler 2009-03-17 16:42:55 UTC
Description of problem:
The cman init script currently starts qdiskd but never stops it.

[root@taft-04 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   0   M      0   2009-03-17 11:29:18  /dev/disk/by-id/scsi-3600805f3000a05b0000000005022000e-part1
   1   M  32716   2009-03-17 11:29:07  taft-01-bond
   2   M  32716   2009-03-17 11:29:07  taft-02-bond
   3   M  32716   2009-03-17 11:29:07  taft-03-bond
   4   M  32716   2009-03-17 11:29:07  taft-04-bond
[root@taft-04 ~]# service cman stop
Stopping cluster: 
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
                                                           [  OK  ]


Mar 17 11:41:13 taft-03 qdiskd[8114]: <info> Node 4 shutdown 
Mar 17 11:41:13 taft-03 qdiskd[8114]: <err> cman_dispatch: Host is down 
Mar 17 11:41:13 taft-03 qdiskd[8114]: <err> Halting qdisk operations 

[root@taft-04 ~]# service qdiskd status
qdiskd dead but pid file exists


Version-Release number of selected component (if applicable):
cman-2.0.99-1.el5

How reproducible:
Every time

Comment 1 Lon Hohberger 2009-03-17 17:50:43 UTC
This behavior is expected, and is due to an init script ordering problem in RHEL5.

In RHEL4, all of the cluster components were started separately, and the qdiskd init script fell between 'cman' and 'fenced'.

On RHEL5, the cman init script now starts 'fenced' as well, while qdiskd is started afterward. This causes a quorum problem: for fencing to occur (and therefore complete), quorum must first be formed.

So what we really want is for qdiskd to start before CMAN.

Unfortunately, we are not allowed to reorder init scripts after a release, so there is a hack in the cman init script which performs the following check:

  "Does the administrator have qdiskd enabled for the current runlevel?"

If qdiskd is enabled on startup for the current runlevel, the cman init script starts the quorum disk daemon, and the subsequent start of qdiskd by init becomes a no-op.  Without this hack, the cman init script will hang during startup.
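As a rough illustration of the check described above, here is a minimal shell sketch. It is not the actual cman init script, and the function names `qdiskd_enabled_here` and `maybe_start_qdiskd` are hypothetical. It relies on documented `chkconfig` behavior: given only a service name, `chkconfig` exits successfully if that service is configured to start in the current runlevel.

```shell
#!/bin/bash
# Hypothetical sketch of the startup hack described above; not the real
# RHEL5 cman init script.

# Succeeds if qdiskd is configured to start in the current runlevel.
# With only a service name, chkconfig queries the current runlevel.
qdiskd_enabled_here() {
    chkconfig qdiskd >/dev/null 2>&1
}

# Called from cman's start path before fenced is launched. If the admin
# enabled qdiskd, start it now; init's later "service qdiskd start" then
# finds it already running and does nothing.
maybe_start_qdiskd() {
    if qdiskd_enabled_here; then
        service qdiskd start
    fi
}
```

If qdiskd is disabled for the runlevel, `maybe_start_qdiskd` does nothing and returns success, so cman's own startup proceeds unchanged.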

During system shutdown or reboot, the cman init script does not stop qdiskd, because the qdiskd init script is likewise enabled and correctly stops qdiskd before CMAN.
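One way to confirm this shutdown ordering on a node is to compare the SysV kill-script names. The sketch below assumes the standard `/etc/rc.d/rcN.d` layout with `K<NN><service>` links and that `rc` runs them in lexical order; `kill_order_ok` is a hypothetical helper, not part of any package, and the priority numbers in the comments are illustrative only.

```shell
#!/bin/bash
# Sketch: check that qdiskd's kill script sorts before cman's in a SysV
# runlevel directory, i.e. init stops qdiskd first on shutdown/reboot.
# Assumes K<two-digit priority><service> links, run in lexical order.

kill_order_ok() {
    local dir=${1:-/etc/rc.d/rc0.d}
    # Unmatched globs stay literal, so the -e tests below catch missing links.
    local q=( "$dir"/K[0-9][0-9]qdiskd )
    local c=( "$dir"/K[0-9][0-9]cman )
    [ -e "${q[0]}" ] && [ -e "${c[0]}" ] || return 1
    # Compare bare link names, e.g. K22qdiskd vs K78cman (numbers illustrative).
    [[ "${q[0]##*/}" < "${c[0]##*/}" ]]
}
```

For example, `kill_order_ok /etc/rc.d/rc6.d && echo ok` would check the reboot runlevel instead of the default halt runlevel.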

So effectively, the most correct way to start qdiskd and cman manually is:

  service qdiskd start; service cman start

And for a "clean" shutdown, stop them in the same order:

  service qdiskd stop; service cman stop

This will avoid the dead PID file and the strange errors in the logs.
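The manual procedure above could be wrapped in two small shell functions. These are hypothetical helpers, not part of cman; the only real commands they call are the `service` invocations from this comment. Starting cman only if qdiskd started successfully is a design choice of this sketch, since bringing up cman without the quorum disk's votes may leave it waiting for quorum. Stopping runs both commands unconditionally so that cman is still stopped even if qdiskd was already down.

```shell
#!/bin/bash
# Hypothetical wrappers for manual operation on RHEL5: qdiskd first,
# then cman, for both start and stop.

cluster_up() {
    # Only start cman if qdiskd came up.
    service qdiskd start && service cman start
}

cluster_down() {
    # Stop qdiskd first; stop cman regardless of qdiskd's exit status.
    service qdiskd stop
    service cman stop
}
```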

We may be able to fix this in RHEL6 (and remove the hack) by simply changing the init script start ordering, but for RHEL5 we can't change the ordering.

Adding the logic to cman's shutdown path that would be needed to detect whether the cman init script started qdiskd (and, if so, stop it; otherwise, expect the administrator to use the qdiskd init script) is not worth the maintenance cost, and is likely to introduce more bugs than it fixes.

Comment 2 RHEL Program Management 2009-03-17 17:55:03 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.