Bug 442541 - QDisk freezes cluster when FC is disconnected
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.1
Hardware: i386 Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Lon Hohberger
QA Contact: GFS Bugs
Depends On:
Blocks:
Reported: 2008-04-15 10:05 EDT by Mattia Gandolfi
Modified: 2013-02-12 15:31 EST

Doc Type: Bug Fix
Last Closed: 2009-01-20 16:51:34 EST


Attachments
First node's sosreport (497.52 KB, application/x-bzip2)
2008-04-15 10:07 EDT, Mattia Gandolfi
Second node's sosreport (524.95 KB, application/x-bzip2)
2008-04-15 10:08 EDT, Mattia Gandolfi
Fix. (1.70 KB, patch)
2008-04-15 10:58 EDT, Lon Hohberger


External Trackers
Tracker: Red Hat Knowledge Base (Solution)
ID: 29611
Priority: None
Status: None
Summary: None
Last Updated: Never
Description Mattia Gandolfi 2008-04-15 10:05:34 EDT
Description of problem:
In a two-node cluster using qdisk as a tie-breaker, disconnecting all Fibre
Channel cables from one node (let's call it "node1") has two results:
1) node1's cman gets killed by node2
2) node2 is stuck, and does not take over the service
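
For readers unfamiliar with the tie-breaker role, the vote arithmetic can be sketched as below. This is only an illustration: the vote values (one per node, one for the quorum disk) are assumed and not taken from the attached cluster.conf, and the helper names are hypothetical. It uses the usual majority rule (more than half of the expected votes).

```python
# Sketch of majority-quorum arithmetic for a two-node cluster with a quorum disk.
# Vote values are ASSUMED for illustration; they are not read from cluster.conf.

NODE_VOTES = {"node1": 1, "node2": 1}  # assumed: one vote per node
QDISK_VOTES = 1                        # assumed: one vote for the quorum disk


def quorum_threshold(expected_votes: int) -> int:
    """Return the smallest vote count that is a strict majority."""
    return expected_votes // 2 + 1


EXPECTED_VOTES = sum(NODE_VOTES.values()) + QDISK_VOTES
THRESHOLD = quorum_threshold(EXPECTED_VOTES)


def is_quorate(live_nodes, holds_qdisk: bool) -> bool:
    """True if the surviving partition holds enough votes to keep quorum."""
    votes = sum(NODE_VOTES[name] for name in live_nodes)
    if holds_qdisk:
        votes += QDISK_VOTES
    return votes >= THRESHOLD


if __name__ == "__main__":
    # node2 alone, still able to reach the quorum disk
    print(is_quorate(["node2"], holds_qdisk=True))
    # node1 alone, cut off from the quorum disk
    print(is_quorate(["node1"], holds_qdisk=False))
```

Under these assumed values, the node that keeps access to the quorum disk stays quorate on its own, which is why node2 would be expected to fence node1 and take over the service.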

Version-Release number of selected component (if applicable):
cman-2.0.73-1.el5_1.5-i386

How reproducible:
Always

Steps to Reproduce:
1. Configure a simple two-node cluster with a quorum disk (qdisk) enabled
2. Unplug the FC cables connecting one node (say node1) to the qdisk
  
Actual results:
On node1 (from /var/log/messages):
openais[2870]: [CMAN ] cman killed by node 2 because we were killed by cman_tool or other application
On node2:
qdiskd[2973]: <notice> Writing eviction notice for node 2
qdiskd[2973]: <notice> Node 2 evicted
qdiskd[2973]: <crit> Node 2 is undead.
qdiskd[2973]: <alert> Writing eviction notice for node 2
qdiskd[2973]: <crit> Node 2 is undead.
qdiskd[2973]: <alert> Writing eviction notice for node 2
qdiskd[2973]: <crit> Node 2 is undead.
...and here it gets stuck forever
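
A rough way to spot this state in a log is to count repeated "is undead" messages per node. The sketch below is only an illustration: the regular expression is derived from the qdiskd lines quoted above, while the function names and the threshold of three repeats are hypothetical choices, not anything qdiskd itself defines.

```python
# Sketch: flag nodes that qdiskd keeps reporting as "undead".
# The pattern mirrors the log lines quoted in this report; the threshold
# of 3 repeats is an arbitrary, hypothetical cut-off.
import re
from collections import Counter

UNDEAD_RE = re.compile(r"qdiskd\[\d+\]: <crit> Node (\d+) is undead\.")


def undead_counts(lines):
    """Count 'is undead' messages per node ID."""
    counts = Counter()
    for line in lines:
        match = UNDEAD_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def stuck_nodes(lines, threshold=3):
    """Return node IDs reported undead at least `threshold` times."""
    return sorted(node for node, n in undead_counts(lines).items() if n >= threshold)


# Sample taken verbatim from the messages quoted in this report.
SAMPLE = [
    "qdiskd[2973]: <notice> Writing eviction notice for node 2",
    "qdiskd[2973]: <notice> Node 2 evicted",
    "qdiskd[2973]: <crit> Node 2 is undead.",
    "qdiskd[2973]: <alert> Writing eviction notice for node 2",
    "qdiskd[2973]: <crit> Node 2 is undead.",
    "qdiskd[2973]: <alert> Writing eviction notice for node 2",
    "qdiskd[2973]: <crit> Node 2 is undead.",
]

if __name__ == "__main__":
    print(stuck_nodes(SAMPLE))
```

To check a real log, the same function could be fed the lines of /var/log/messages, for example `stuck_nodes(open("/var/log/messages"))`.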

Expected results:
node2 should have fenced node1 and should have brought up the service

Additional info:
Reconnecting the FC cables and manually resetting both nodes produces a clean
start and a working cluster. However, unplugging the cables again always
reproduces the problem.
Comment 1 Mattia Gandolfi 2008-04-15 10:05:34 EDT
Created attachment 302455
Cluster.conf
Comment 4 Lon Hohberger 2008-04-15 10:58:45 EDT
Created attachment 302467
Fix.
Comment 5 Lon Hohberger 2008-04-15 13:51:31 EDT
Note - Fix is not in 5.2.
Comment 6 Lon Hohberger 2008-04-17 10:18:30 EDT
Fix is in RHEL5 branch of git and had already been applied to stable2 and master.

Comment 10 errata-xmlrpc 2009-01-20 16:51:34 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0189.html
