Bug 442898 - QDisk freezes cluster when FC is disconnected
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cman
Version: 4
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Reported: 2008-04-17 10:21 EDT by Lon Hohberger
Modified: 2009-04-16 16:22 EDT
Fixed In Version: RHBA-2008-0799
Doc Type: Bug Fix
Last Closed: 2008-07-25 15:07:10 EDT

Description Lon Hohberger 2008-04-17 10:21:19 EDT
+++ This bug was initially created as a clone of Bug #442541 +++

Description of problem:
In a two-node cluster using qdisk as a tie-breaker, disconnecting all Fibre
Channel cables from one node (let's call it "node1") has two results:
1) node1's cman gets killed by node2
2) node2 gets stuck and does not take over the service

Version-Release number of selected component (if applicable):
cman-2.0.73-1.el5_1.5-i386

How reproducible:
Always

Steps to Reproduce:
1. Configure a simple two-node cluster with a quorum disk (qdisk) enabled.
2. Unplug the FC cables connecting one node (say, node1) to the quorum disk.
  
Actual results:
On node1 (from /var/log/messages):
openais[2870]: [CMAN ] cman killed by node 2 because we were killed by cman_tool or other application
On node2:
qdiskd[2973]: <notice> Writing eviction notice for node 2
qdiskd[2973]: <notice> Node 2 evicted
qdiskd[2973]: <crit> Node 2 is undead.
qdiskd[2973]: <alert> Writing eviction notice for node 2
qdiskd[2973]: <crit> Node 2 is undead.
qdiskd[2973]: <alert> Writing eviction notice for node 2
qdiskd[2973]: <crit> Node 2 is undead.
...and here it gets stuck forever
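
For anyone triaging a similar hang, the repeating "is undead" pattern above can be picked out of a saved log mechanically. The following is only a rough sketch and is not part of cman or qdiskd: the function names and the `threshold` cutoff are made up for illustration, and the regular expression assumes the message format shown in the excerpt above.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   qdiskd[2973]: <crit> Node 2 is undead.
# The pattern is an assumption based on the log excerpt in this report.
UNDEAD_RE = re.compile(r"qdiskd\[\d+\]:\s*<\w+>\s*Node (\d+) is undead\.")


def count_undead(lines):
    """Return a Counter mapping node ID -> number of 'is undead' messages."""
    counts = Counter()
    for line in lines:
        match = UNDEAD_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def stuck_nodes(lines, threshold=3):
    """Return sorted node IDs reported undead at least `threshold` times.

    `threshold` is an arbitrary illustrative cutoff, not a qdiskd setting.
    """
    return sorted(n for n, c in count_undead(lines).items() if c >= threshold)


if __name__ == "__main__":
    # Sample taken from the node2 log excerpt in this report.
    sample = [
        "qdiskd[2973]: <notice> Writing eviction notice for node 2",
        "qdiskd[2973]: <notice> Node 2 evicted",
        "qdiskd[2973]: <crit> Node 2 is undead.",
        "qdiskd[2973]: <alert> Writing eviction notice for node 2",
        "qdiskd[2973]: <crit> Node 2 is undead.",
        "qdiskd[2973]: <alert> Writing eviction notice for node 2",
        "qdiskd[2973]: <crit> Node 2 is undead.",
    ]
    print(stuck_nodes(sample))
```

In practice the lines would come from /var/log/messages on the surviving node. A node that keeps being reported undead after its eviction notice has been written matches the stuck state described here.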

Expected results:
node2 should have fenced node1 and should have brought up the service

Additional info:
Reconnecting FC and manually resetting both nodes produces a clean start and a
working cluster. However, unplugging the cables again always reproduces the
problem.

-- Additional comment from lhh@redhat.com on 2008-04-15 10:58 EST --
Created an attachment (id=302467)
Fix.


-- Additional comment from lhh@redhat.com on 2008-04-15 13:51 EST --
Note - Fix is not in 5.2.

-- Additional comment from lhh@redhat.com on 2008-04-17 10:18 EST --
Fix is in RHEL5 branch of git and had already been applied to stable2 and master.




=== Clone for RHEL4 ===
Bug is fixed in RHEL4 (e.g. 4.8) branch.
Comment 2 RHEL Product and Program Management 2008-04-17 10:40:14 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 4 Lon Hohberger 2008-04-18 11:33:10 EDT
Appears to be fixed in rhel47 branch:

http://sources.redhat.com/git/?p=cluster.git;a=commit;h=5eec9c0832cd1c91d00d2f3e4bd42389a5cbc7bb
Comment 7 errata-xmlrpc 2008-07-25 15:07:10 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0799.html
