490688 – cman init script needs to stop qdisk

Summary: cman init script needs to stop qdisk

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	cman
Sub Component:
Version:	5.3
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Lon Hohberger
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-03-17 16:42 UTC by Corey Marthaler
Modified:	2009-05-12 20:36 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-03-17 17:55:03 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Corey Marthaler 2009-03-17 16:42:55 UTC

Description of problem:
The cman init script currently starts qdisk, but never stops it.

[root@taft-04 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   0   M      0   2009-03-17 11:29:18  /dev/disk/by-id/scsi-3600805f3000a05b0000000005022000e-part1
   1   M  32716   2009-03-17 11:29:07  taft-01-bond
   2   M  32716   2009-03-17 11:29:07  taft-02-bond
   3   M  32716   2009-03-17 11:29:07  taft-03-bond
   4   M  32716   2009-03-17 11:29:07  taft-04-bond
[root@taft-04 ~]# service cman stop
Stopping cluster: 
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
                                                           [  OK  ]


Mar 17 11:41:13 taft-03 qdiskd[8114]: <info> Node 4 shutdown 
Mar 17 11:41:13 taft-03 qdiskd[8114]: <err> cman_dispatch: Host is down 
Mar 17 11:41:13 taft-03 qdiskd[8114]: <err> Halting qdisk operations 

[root@taft-04 ~]# service qdiskd status
qdiskd dead but pid file exists


Version-Release number of selected component (if applicable):
cman-2.0.99-1.el5

How reproducible:
Every time

Comment 1 Lon Hohberger 2009-03-17 17:50:43 UTC

This behavior is expected, and is due to an init script ordering problem in RHEL5.

In RHEL4, all of the cluster components were started separately, and the qdiskd init script fell between 'cman' and 'fenced'.

On RHEL5, the cman init script now starts 'fenced' as well, while qdiskd is started afterwards. This causes a quorum problem: in order for fencing to occur (and therefore complete), quorum must first be formed.

So, what we really want is qdiskd starting before CMAN.

Unfortunately, we are not allowed to reorder init scripts after a release, so there is a hack in the cman init script which performs the following check:

"Does the administrator have qdiskd enabled for the current runlevel?"

If qdiskd is enabled on startup for the current runlevel, the cman init script starts the quorum disk daemon, and the subsequent start of qdiskd by init becomes a no-op. Without this hack, the cman init script will hang during startup.
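
The check described above could be sketched roughly as follows. This is an illustration only, not the code from the actual cman init script (which is not quoted in this report). It assumes that `chkconfig <service>`, given only a service name, exits 0 when the service is enabled for the current runlevel. The function names and the `ENABLED_CHECK` / `SERVICE_CMD` variables are hypothetical and exist only so the sketch can be exercised without touching a real cluster.

```shell
#!/bin/sh
# Hypothetical sketch of the startup hack; NOT the real cman init script.

# Commands are overridable so the logic can be tried with stubs.
# Assumption: "chkconfig NAME" exits 0 if NAME is enabled in the
# current runlevel.
ENABLED_CHECK="${ENABLED_CHECK:-chkconfig}"
SERVICE_CMD="${SERVICE_CMD:-service}"

# Returns 0 if qdiskd is enabled for the current runlevel.
qdiskd_enabled() {
    "$ENABLED_CHECK" qdiskd >/dev/null 2>&1
}

# Start qdiskd early only when the admin has enabled it; the later
# start of qdiskd by init then becomes a no-op.
start_qdiskd_if_enabled() {
    if qdiskd_enabled; then
        "$SERVICE_CMD" qdiskd start
    else
        echo "qdiskd not enabled; skipping"
    fi
}
```

Note that nothing in this sketch records whether cman or the admin started qdiskd, which is why the matching stop cannot be done safely from the cman init script.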

During system shutdown or reboot, the cman init script does not stop qdiskd, because the qdiskd init script is still enabled and init correctly stops qdiskd before CMAN.

So effectively, the most correct method to start qdiskd+cman when starting manually is the following:

service qdiskd start; service cman start

And the stop for a "clean" shutdown is actually the same order:

service qdiskd stop; service cman stop

This will avoid the dead PID file and the strange errors in the logs.

We can maybe fix this in RHEL6 (and remove the hack) by simply changing the init script start ordering, but for RHEL5, we can't change the ordering.

The hacks needed in the cman shutdown path to detect whether the cman init script started qdiskd (and if so, stop it; otherwise expect the admin to use the qdiskd init script) are not worth the maintenance, and are likely to introduce more bugs than they fix.

Comment 2 RHEL Program Management 2009-03-17 17:55:03 UTC

Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.
