Description of problem:
If two nodes start up in a partitioned network - ie they can't see each other, often because the switch separating them hasn't sorted out its multicast routes yet - and have the dirty flags set, when the network connection is restored the two nodes don't see each other. No node is fenced or even marked "disallowed".

Version-Release number of selected component (if applicable):
5.0+

How reproducible:
Every time

Steps to Reproduce:
1. Separate the two nodes eg:
   on 131: iptables -A INPUT -s 10.15.84.132 -p udp -j DROP
   on 132: iptables -A INPUT -s 10.15.84.131 -p udp -j DROP
2. Join them to the cluster.
3. Set the dirty flag, I think fenced does this.
4. Join the cluster nodes with
   iptables -D INPUT 1

Actual results:
cman_tool nodes on both systems shows only the local node, and not the other. syslog shows that openais/Clm can see both nodes.

Expected results:
The other node shows up as "disallowed" or there is a fence race to kill one node.

Additional info:
This bug is similar to bz#443358
See also bz#460190
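
For anyone re-running the steps to reproduce, the partition and heal commands can be wrapped up roughly as below. This is only a sketch: the function names and the DRY_RUN switch are made up for illustration and are not part of cman. It also deletes the rule by its full specification rather than by position (iptables -D INPUT 1), on the assumption that the DROP rule may not be rule 1 on every test box.

```shell
#!/bin/sh
# Sketch of steps 1 and 4 of the reproducer. Run on each node with the
# OTHER node's address (10.15.84.132 on 131, 10.15.84.131 on 132).
# DRY_RUN=1 (the default here) only prints the command; set DRY_RUN=0
# and run as root to actually change the firewall.

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: drop all UDP from the peer so the nodes cannot see each other.
partition_from() {
    run iptables -A INPUT -s "$1" -p udp -j DROP
}

# Step 4: remove exactly the rule added above, matched by specification.
heal_partition_from() {
    run iptables -D INPUT -s "$1" -p udp -j DROP
}
```

On 131 that would be partition_from 10.15.84.132, then join the cluster and set the dirty flag (steps 2 and 3, e.g. with the attached program), then heal_partition_from 10.15.84.132 and check cman_tool nodes on both systems.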
Created attachment 315566 [details] Small program to set the dirty flag in cman
Created attachment 315567 [details]
Proposed patch

This patch should fix the problem. I'd like to see it tested rather a lot before releasing it though.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
On RHEL5:

commit 74721309f73dc6dc38abd07dc7c08e0ecb8ec602
Author: Christine Caulfield <ccaulfie>
Date:   Wed Sep 10 09:06:25 2008 +0100

    cman: honour the dirty flag on a node we haven't seen before

and STABLE2
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0189.html