Bug 442541

Summary: QDisk freezes cluster when FC is disconnected
Product: Red Hat Enterprise Linux 5 Reporter: Mattia Gandolfi <mgandolf>
Component: cman Assignee: Lon Hohberger <lhh>
Status: CLOSED ERRATA QA Contact: GFS Bugs <gfs-bugs>
Severity: high Docs Contact:
Priority: medium    
Version: 5.1 CC: clasohm, cluster-maint, edamato, tao, tmarshal
Target Milestone: rc   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-01-20 21:51:34 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description              Flags
First node's sosreport   none
Second node's sosreport  none
Fix.                     none

Description Mattia Gandolfi 2008-04-15 14:05:34 UTC
Description of problem:
In a two-node cluster using qdisk as a tie-breaker, disconnecting all Fibre
Channel cables from one node (let's call it "node1") has two results:
1) node1's cman gets killed by node2
2) node2 is stuck, and does not take over the service

Version-Release number of selected component (if applicable):
cman-2.0.73-1.el5_1.5-i386

How reproducible:
Always

Steps to Reproduce:
1. Configure a simple 2-node cluster with Quorum Disk enabled
2. Unplug the FC cables connecting to the qdisk from one node (let's say node1)
  
Actual results:
on node1 (from /var/log/messages):
openais[2870]: [CMAN ] cman killed by node 2 because we were killed by cman_tool or other application
on node2:
qdiskd[2973]: <notice> Writing eviction notice for node 2
qdiskd[2973]: <notice> Node 2 evicted
qdiskd[2973]: <crit> Node 2 is undead.
qdiskd[2973]: <alert> Writing eviction notice for node 2
qdiskd[2973]: <crit> Node 2 is undead.
qdiskd[2973]: <alert> Writing eviction notice for node 2
qdiskd[2973]: <crit> Node 2 is undead.
...and here it gets stuck forever
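
For anyone triaging similar logs, the repeating "is undead" messages above can be counted with a small script. This is only a minimal sketch, not part of the fix: the function names and the threshold of 3 are hypothetical, and the pattern assumes the exact qdiskd message format quoted in this report.

```python
import re
from collections import Counter

# Matches the qdiskd line quoted in this report, e.g.
# "qdiskd[2973]: <crit> Node 2 is undead."
# Assumption: other cman versions may word this message differently.
UNDEAD_RE = re.compile(r"qdiskd\[\d+\]: <crit> Node (\d+) is undead\.")


def count_undead(lines):
    """Return a Counter mapping node ID to its number of 'is undead' messages."""
    counts = Counter()
    for line in lines:
        match = UNDEAD_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def stuck_nodes(lines, threshold=3):
    """Return sorted node IDs reported undead at least `threshold` times.

    The threshold is a hypothetical cutoff chosen for illustration only.
    """
    return sorted(n for n, c in count_undead(lines).items() if c >= threshold)


# The node2 log excerpt from this report, used as sample input.
SAMPLE = [
    "qdiskd[2973]: <notice> Writing eviction notice for node 2",
    "qdiskd[2973]: <notice> Node 2 evicted",
    "qdiskd[2973]: <crit> Node 2 is undead.",
    "qdiskd[2973]: <alert> Writing eviction notice for node 2",
    "qdiskd[2973]: <crit> Node 2 is undead.",
    "qdiskd[2973]: <alert> Writing eviction notice for node 2",
    "qdiskd[2973]: <crit> Node 2 is undead.",
]
```

With the excerpt above, stuck_nodes(SAMPLE) returns [2]. To check a real log, pass an open file object for /var/log/messages instead of SAMPLE.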

Expected results:
node2 should have fenced node1 and should have brought up the service

Additional info:
Reconnecting FC and manually resetting both nodes produces a clean start and a
working cluster. However, when the cables are unplugged again, the problem is
always reproducible.

Comment 1 Mattia Gandolfi 2008-04-15 14:05:34 UTC
Created attachment 302455 [details]
Cluster.conf

Comment 4 Lon Hohberger 2008-04-15 14:58:45 UTC
Created attachment 302467 [details]
Fix.

Comment 5 Lon Hohberger 2008-04-15 17:51:31 UTC
Note - Fix is not in 5.2.

Comment 6 Lon Hohberger 2008-04-17 14:18:30 UTC
The fix is in the RHEL5 branch in git and had already been applied to stable2 and master.



Comment 10 errata-xmlrpc 2009-01-20 21:51:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0189.html