Bug 216092 - node without disk quorum rejoined after being fenced
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cman
Version: 4
Hardware: i386 Linux
Priority: medium   Severity: medium
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Reported: 2006-11-17 03:07 EST by Ken Chan
Modified: 2009-04-16 16:21 EDT
CC: 6 users

Fixed In Version: RHBA-2007-0134
Doc Type: Bug Fix
Last Closed: 2007-05-10 17:04:29 EDT


Attachments
cluster.conf (1.81 KB, text/plain) - 2006-11-17 03:09 EST, Ken Chan
station55 (good node) /var/log/messages (252.21 KB, text/plain) - 2006-11-17 03:38 EST, Ken Chan
station58 (fenced node) /var/log/messages (826.12 KB, text/plain) - 2006-11-17 03:39 EST, Ken Chan

Description Ken Chan 2006-11-17 03:07:10 EST

Description of problem:
Two-node cluster using iscsi-target as shared storage and the quorum disk.

When the network cable carrying the iSCSI traffic is disconnected, the failed
node is fenced, rebooted, and then rejoins the cluster.

If the failed node is set as the preferred node, the service relocates back to
it and fails due to the loss of shared storage.

qdiskd failed to start while the fenced node was restarting, but rgmanager
still started.

Environment:

1 x RHEL AS 4 32bit with iscsi-target 0.4.5
	- station41.example.com 2.6.9-42.ELsmp
2 x RHEL AS 4 32bit with RHCS
	- station55.example.com 2.6.9-42.0.3.ELsmp
	- station58.example.com 2.6.9-42.0.3.ELsmp



Version-Release number of selected component (if applicable):
cman-1.0.11-0

How reproducible:
Always


Steps to Reproduce:
1. Disconnect the network cable used solely for iSCSI traffic on station58
2. station58 is fenced and power-reset
3. station58 rejoins the cluster after reboot even though it cannot reach the quorum disk
4. If station58 is the preferred node for the service, the service relocates back to station58 and fails (it cannot mount the shared storage)

Actual Results:
The failed node (station58) rejoined the cluster and clustat reported it online.
Reading further into the log:
ccsd - started
cman - started
qdisk - failed
fenced - started
rgmanager -started

Expected Results:
A node without access to the quorum disk and shared storage should not be allowed to rejoin.

Additional info:
/var/log/messages of the fenced node during restart:


Nov 16 17:46:13 station58 ccsd[3861]:  Local version # : 6
Nov 16 17:46:13 station58 ccsd[3861]:  Remote version #: 6
Nov 16 17:46:13 station58 kernel: CMAN: Waiting to join or form a Linux-cluster
Nov 16 17:46:14 station58 ccsd[3861]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.7.1
Nov 16 17:46:14 station58 ccsd[3861]: Initial status:: Inquorate
Nov 16 17:46:15 station58 iscsid[3773]: cannot make connection to 192.168.2.10:3260: No route to host
Nov 16 17:46:15 station58 iscsid[3773]: Connection to Discovery Address 192.168.2.10 failed
Nov 16 17:46:18 station58 kernel: CMAN: sending membership request
Nov 16 17:46:18 station58 kernel: CMAN: got node station55.example.com
Nov 16 17:46:18 station58 kernel: CMAN: quorum regained, resuming activity
Nov 16 17:46:18 station58 ccsd[3861]: Cluster is quorate.  Allowing connections.
Nov 16 17:46:18 station58 cman: startup succeeded
Nov 16 17:46:18 station58 lock_gulmd: no <gulm> section detected in /etc/cluster/cluster.conf succeeded
Nov 16 17:46:18 station58 qdiskd[4041]: <crit> Unable to match label 'raw1' to any device
Nov 16 17:46:18 station58 qdiskd: [4041] crit: Unable to match label 'raw1' to any device
Nov 16 17:46:18 station58 qdiskd: Starting the Quorum Disk Daemon: failed
Nov 16 17:46:19 station58 fenced: startup succeeded
Nov 16 17:46:24 station58 clurgmgrd[4934]: <notice> Resource Group Manager Starting
Nov 16 17:46:24 station58 clurgmgrd[4934]: <info> Loading Service Data
Nov 16 17:46:24 station58 rgmanager: clurgmgrd startup succeeded
Nov 16 17:46:24 station58 clurgmgrd[4934]: <info> Initializing Services
Nov 16 17:46:24 station58 clurgmgrd: [4934]: <info> Executing /etc/init.d/httpd stop
Nov 16 17:46:24 station58 httpd: httpd shutdown failed
Nov 16 17:46:24 station58 clurgmgrd: [4934]: <err> stop: Could not match /dev/sdc1 with a real device
Nov 16 17:46:24 station58 clurgmgrd[4934]: <notice> stop on fs "shared_fs" returned 2 (invalid argument(s))
Nov 16 17:46:24 station58 clurgmgrd[4934]: <info> Services Initialized
Nov 16 17:46:24 station58 clurgmgrd[4934]: <info> Logged in SG "usrm::manager"
Nov 16 17:46:24 station58 clurgmgrd[4934]: <info> Magma Event: Membership Change
Nov 16 17:46:24 station58 clurgmgrd[4934]: <info> State change: Local UP
Nov 16 17:46:24 station58 clurgmgrd[4934]: <info> State change: station55.example.com UP
Comment 1 Ken Chan 2006-11-17 03:09:58 EST
Created attachment 141455 [details]
cluster.conf
Comment 2 Ken Chan 2006-11-17 03:38:35 EST
Created attachment 141459 [details]
station55 (good node) /var/log/messages
Comment 3 Ken Chan 2006-11-17 03:39:23 EST
Created attachment 141460 [details]
station58 (fenced node) /var/log/messages
Comment 5 Lon Hohberger 2006-11-27 11:57:20 EST
Ok, so as it's currently implemented, quorum is a property of the cluster as a
whole, not of individual nodes.  A node can therefore rejoin a quorate cluster
even if it can't access shared storage.  To reverse this priority, qdisk needs
to be altered to:

(a) start before CMAN

(b) allow the master to assign a new state to a node which is quorate on disk
but not yet quorate according to CMAN (in this case, because CMAN has not started)

The 'down' case (where a node loses qdisk access) is interesting because it's
apparently a bug in how CMAN operates.  That is, when qdisk tells CMAN that
qdisk is not accessible on a given node, that node (and only that node) loses
quorum, but the other nodes in the cluster do nothing at all.  (We recently
worked around the lack of fencing by rebooting the node when it loses qdisk
access, which forces the node to be fenced later.)

There may be other solutions to this problem; I've CC'd Patrick in case he has
anything further to add.
Comment 6 Lon Hohberger 2006-11-27 11:58:13 EST
forgot (c): When qdisk reaches the state noted in (b), start CMAN.
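
A rough sketch of the start ordering proposed in (a)-(c) above, expressed as
boot-time shell steps (illustrative only; this is not what was ultimately
implemented, and the state check is a hypothetical placeholder, not a real
command):

        # Proposed ordering (sketch): prove quorum-disk access before joining CMAN.
        service qdiskd start                  # (a) qdiskd starts first
        # (b) wait until the qdisk master has granted this node a disk-quorate
        #     state -- "qdisk_state_approved" is a hypothetical check only
        until qdisk_state_approved; do sleep 1; done
        service cman start                    # (c) only then join the CMAN cluster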

Comment 9 Lon Hohberger 2006-12-08 09:57:48 EST
I can make the change; it will only take a day or two.  If your customer(s) are
willing to test it (to make sure it does what they want it to do, or what they
think it should do), it would go a long way.

For RHEL5, the change would be to start qdiskd in the cman script.  Looking into
my crystal ball, I do not see this change occurring before 5.1.
Comment 10 Lon Hohberger 2006-12-08 10:03:27 EST
Note: there are implications to doing this which prevent it from being used with
CLVM volumes; merely changing the behavior will not be enough, it will have to
be backwards-compatible.
Comment 11 Lon Hohberger 2007-01-22 16:31:51 EST
Ok, since qdisk isn't the primary membership determinant (CMAN is, and that will
not change), qdiskd will now optionally (with stop_cman="1" set in the <quorumd>
tag) make the node leave the cluster immediately on boot if the qdisk device is
not available.

Here is the log of the booting node which has been cut off from the qdisk device:

CMAN 2.6.9-45.8 (built Oct 11 2006 15:50:50) installed
NET: Registered protocol family 30
DLM 2.6.9-44.3 (built Oct 11 2006 17:46:32) installed
CMAN: Waiting to join or form a Linux-cluster
CMAN: sending membership request
CMAN: got node red.lab.boston.redhat.com
CMAN: quorum regained, resuming activity
CMAN: we are leaving the cluster. Membership rejected
WARNING: dlm_emergency_shutdown
WARNING: dlm_emergency_shutdown

On the other node (red.lab):

CMAN: node green.lab.boston.redhat.com rejoining
CMAN: removing node green.lab.boston.redhat.com from the cluster : Membership
rejected

So, since qdiskd isn't the primary quorum determinant:

(a) it does NOT control whether or not CMAN can join the cluster, and
(b) the requested behavior must be explicitly enabled.

My quorumd tag looks like this for my two-node cluster:

        <quorumd votes="1" interval="1" tko="10" log_level="4" 
         log_facility="local4" status_file="/var/run/qdiskd_status"
         label="foo" stop_cman="1">
                <heuristic program="ping 192.168.79.254 -c1 -t1" score="10" 
                 interval="2" tko="3"/>
        </quorumd>

(note: you do not need the "status_file" attribute).
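
For context, a minimal sketch of where the <quorumd> section sits in
cluster.conf for a two-node cluster that uses a quorum disk (node names, label,
and vote counts here are illustrative, not taken from the attached cluster.conf):

        <?xml version="1.0"?>
        <cluster name="example" config_version="1">
                <!-- With a quorum disk, two_node mode is normally disabled and
                     expected_votes counts the node votes plus the qdisk vote. -->
                <cman two_node="0" expected_votes="3"/>
                <quorumd votes="1" interval="1" tko="10" label="foo" stop_cman="1">
                        <heuristic program="ping 192.168.79.254 -c1 -t1"
                         score="10" interval="2" tko="3"/>
                </quorumd>
                <clusternodes>
                        <clusternode name="node1.example.com" votes="1"/>
                        <clusternode name="node2.example.com" votes="1"/>
                </clusternodes>
        </cluster>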
Comment 12 Lon Hohberger 2007-01-22 16:35:41 EST
The net effect is the same: CMAN is no longer running on the node.  The fact
that it joins for an instant and subsequently leaves is not a bug, but expected
behavior.  During that time in the boot process on RHEL4, nothing is using CMAN,
so the node ought not to be fenced again.  The cluster is stopped by qdiskd
before fenced or GFS is started if the qdisk device is not available.

Note that this obviously requires qdiskd to be started on boot (e.g. chkconfig
--level 345 qdisk on) as well.
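
For reference, a minimal sketch of enabling the daemon at boot on RHEL4,
following the service name used in the comment above (verify the init script
name on your system; it may be installed as qdiskd):

        # enable the quorum disk daemon in runlevels 3, 4 and 5, then confirm it
        chkconfig --level 345 qdisk on
        chkconfig --list qdisk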
Comment 15 Lon Hohberger 2007-01-22 17:52:32 EST
Fixes in CVS; RHEL4 / RHEL5 / head branches.
Comment 18 Red Hat Bugzilla 2007-05-10 17:04:29 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0134.html
