Red Hat Bugzilla – Bug 442898
QDisk freezes cluster when FC is disconnected
Last modified: 2009-04-16 16:22:57 EDT
+++ This bug was initially created as a clone of Bug #442541 +++

Description of problem:
In a 2-node cluster using qdisk as a tie-breaker, disconnecting all Fibre Channel cables from one node (let's call it "node1") has two results:
1) node1's cman gets killed by node2
2) node2 is stuck and does not take over the service

Version-Release number of selected component (if applicable):
cman-2.0.73-1.el5_1.5-i386

How reproducible:
Always

Steps to Reproduce:
1. Configure a simple 2-node cluster with Quorum Disk enabled.
2. Unplug the FC cables connecting the quorum disk to one node (let's say node1).

Actual results:
On node1 (from /var/log/messages):
openais[2870]: [CMAN ] cman killed by node 2 because we were killed by cman_tool or other application

On node2:
qdiskd[2973]: <notice> Writing eviction notice for node 2
qdiskd[2973]: <notice> Node 2 evicted
qdiskd[2973]: <crit> Node 2 is undead.
qdiskd[2973]: <alert> Writing eviction notice for node 2
qdiskd[2973]: <crit> Node 2 is undead.
qdiskd[2973]: <alert> Writing eviction notice for node 2
qdiskd[2973]: <crit> Node 2 is undead.
...and here it gets stuck forever.

Expected results:
node2 should have fenced node1 and brought up the service.

Additional info:
Reconnecting FC and manually resetting both nodes produces a clean start and a working cluster. However, after unplugging the cables again, the problem is always reproducible.

-- Additional comment from lhh@redhat.com on 2008-04-15 10:58 EST --
Created an attachment (id=302467)
Fix.

-- Additional comment from lhh@redhat.com on 2008-04-15 13:51 EST --
Note: the fix is not in 5.2.

-- Additional comment from lhh@redhat.com on 2008-04-17 10:18 EST --
The fix is in the RHEL5 branch of git and had already been applied to stable2 and master.

=== Clone for RHEL4 ===
Bug is fixed in the RHEL4 (e.g. 4.8) branch.
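The hang on node2 shows up in its log as qdiskd alternating between writing an eviction notice and reporting "Node 2 is undead." without ever making progress. As a hypothetical aid for spotting that state, here is a minimal Python sketch that counts the "is undead" messages per node in syslog lines. The regular expression is based only on the log excerpt in this report; the function names `undead_counts` and `stuck_nodes` and the default threshold of 3 are assumptions for illustration and are not part of qdiskd or cman.

```python
import re
from collections import Counter

# Matches qdiskd lines like the ones in this report, e.g.
#   qdiskd[2973]: <crit> Node 2 is undead.
# re.search() is used so a syslog timestamp/hostname prefix is skipped.
UNDEAD_RE = re.compile(r"qdiskd\[\d+\]: <crit> Node (\d+) is undead\.")


def undead_counts(lines):
    """Return a Counter mapping node ID -> number of 'is undead' messages."""
    counts = Counter()
    for line in lines:
        m = UNDEAD_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts


def stuck_nodes(lines, threshold=3):
    """Node IDs reported undead at least `threshold` times (hypothetical cutoff)."""
    return sorted(n for n, c in undead_counts(lines).items() if c >= threshold)


if __name__ == "__main__":
    # The node2 excerpt from this report.
    sample = [
        "qdiskd[2973]: <notice> Writing eviction notice for node 2",
        "qdiskd[2973]: <notice> Node 2 evicted",
        "qdiskd[2973]: <crit> Node 2 is undead.",
        "qdiskd[2973]: <alert> Writing eviction notice for node 2",
        "qdiskd[2973]: <crit> Node 2 is undead.",
        "qdiskd[2973]: <alert> Writing eviction notice for node 2",
        "qdiskd[2973]: <crit> Node 2 is undead.",
    ]
    print(stuck_nodes(sample))  # → [2]
```

A repeated "undead" count only indicates the symptom described here; it says nothing about whether fencing is configured correctly, so it is at most a hint for where to look.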
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Appears to be fixed in rhel47 branch: http://sources.redhat.com/git/?p=cluster.git;a=commit;h=5eec9c0832cd1c91d00d2f3e4bd42389a5cbc7bb
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0799.html