Bug 443358 - merge of openais partitions and disallowed cman nodes
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Hardware: All Linux
Priority: urgent, Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Christine Caulfield
QA Contact: GFS Bugs
Depends On: 251966 460190
Blocks: 391501
Reported: 2008-04-21 01:18 EDT by Andrew Ryan
Modified: 2009-04-16 18:17 EDT

Doc Type: Bug Fix
Last Closed: 2008-08-01 05:10:40 EDT

Attachments
allow a clean node to merge with a dirty node (733 bytes, patch)
2008-04-21 01:18 EDT, David Robinson

Description David Robinson 2008-04-21 01:18:11 EDT
+++ This bug was initially created as a clone of Bug #251966 +++

Description of problem:

Customer reports a possible split-brain condition caused by a fencing
malfunction. Under some conditions the fence cluster group (the one displayed
by the "group_tool -v" command) can enter the JOIN_START_WAIT state and stay
there forever. When a fence action is then required, it is silently discarded
and the other cluster groups are allowed to perform their recovery steps. This
can easily lead to a split-brain condition in a two-node cluster, where the
fence action is not performed and both nodes may recover the same GFS journal
or mount the same ext3 filesystem.

How reproducible:

Steps to Reproduce:
1) Configure post_join_delay="60". This is not required, but it makes the
problem easier to reproduce.
2) Start both nodes at the same time, but keep the network interface for the
heartbeat channel disconnected.
3) When both nodes are waiting at the fencing startup, wait a few seconds, then
connect the network interface.

(This is simple to reproduce with 2 xen guests. Configure the 2 xen guests as a
cluster. Shut down the network bridge on the host, then boot both guests. While
fenced is waiting, bring the bridge back up.)

Actual results:
4) The nodes have fencing stuck in JOIN_START_WAIT using two distinct IDs. From
now on every fence action will be silently discarded, and the other clustered
services will perform their recovery actions as if the fence action had been
performed.

The output of "group_tool -v" shows the fence group stuck in JOIN_START_WAIT
with a different group ID on each node:

Node 1:
type             level name     id       state node id local_done
fence            0     default  00010001 JOIN_START_WAIT 2 200020001 1
[1 2]
dlm              1     clvmd    00020001 none
[1 2]

Node 2:
type             level name     id       state node id local_done
fence            0     default  00010002 JOIN_START_WAIT 1 100020001 1
[1 2]
dlm              1     clvmd    00020001 none
[1 2]
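
The mismatch shows in the fourth column of the fence rows (00010001 on node 1,
00010002 on node 2), while the dlm rows agree. The comparison can be sketched
in C as below. This is only an illustration, assuming rows in the
whitespace-separated layout shown above; the helper names parse_group_id and
group_ids_differ are hypothetical and are not part of group_tool or cman.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the group id (4th column) from a
 * "group_tool -v" row such as
 *   "fence 0 default 00010001 JOIN_START_WAIT 2 200020001 1".
 * Assumes the id is at most 8 characters, as in the output above.
 * Returns true when the first four columns were parsed. */
static bool parse_group_id(const char *row, char id_out[9])
{
    char type[32], name[32];
    int level;

    return sscanf(row, "%31s %d %31s %8s", type, &level, name, id_out) == 4;
}

/* Hypothetical helper: true when both rows parse and carry different ids.
 * The caller is expected to pass the rows for the same group (same type
 * and name) as seen from two different nodes. */
static bool group_ids_differ(const char *row_a, const char *row_b)
{
    char id_a[9], id_b[9];

    if (!parse_group_id(row_a, id_a) || !parse_group_id(row_b, id_b))
        return false;
    return strcmp(id_a, id_b) != 0;
}
```

Given the two fence rows above, group_ids_differ() returns true. Given the two
dlm rows, it returns false.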

Expected results:
4) either the cluster should not form, or the two clusters should be merged

Additional info:
When the nodes start up, they each form a 1-node openais cluster independent of
the other. fence_tool join is run on each node which creates group state in both
clusters. In the situation this bug describes, the dirty flag will not prevent
the clusters from merging because NODE_FLAGS_BEENDOWN is not set:

if (msg->flags & NODE_FLAGS_DIRTY && node->flags & NODE_FLAGS_BEENDOWN)
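
To make the behaviour of that test concrete, here is a minimal standalone
sketch of the same condition. The bit values are made up for the illustration
(the real definitions are in the cman source), and merge_is_refused is a
hypothetical name, not a cman function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define NODE_FLAGS_BEENDOWN 0x01
#define NODE_FLAGS_DIRTY    0x10

/* Mirrors the condition quoted above: the merge is refused only when the
 * incoming message is marked dirty AND the local record for that node has
 * NODE_FLAGS_BEENDOWN set. */
static bool merge_is_refused(uint8_t msg_flags, uint8_t node_flags)
{
    return (msg_flags & NODE_FLAGS_DIRTY) &&
           (node_flags & NODE_FLAGS_BEENDOWN);
}
```

In the scenario described here NODE_FLAGS_BEENDOWN is not set, so
merge_is_refused(NODE_FLAGS_DIRTY, 0) is false and two partitions that both
hold group state are allowed to merge.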

The attached patch modifies the dirty flag test so that it is possible for a
"clean" node (one without state) to join a dirty node regardless of whether its
Comment 1 David Robinson 2008-04-21 01:18:11 EDT
Created attachment 303098
allow a clean node to merge with a dirty node
Comment 3 Christine Caulfield 2008-04-21 06:42:21 EDT
That patch looks good to me. I've committed it to the master and STABLE branches.
Comment 5 Christine Caulfield 2008-04-28 11:08:29 EDT
Committed to RHEL5 branch

commit 4cd89a0d7eef3c0a8f02517957b393a5be736f46
Author: Christine Caulfield <ccaulfie@redhat.com>
Date:   Mon Apr 28 16:07:08 2008 +0100
Comment 8 Christine Caulfield 2008-08-01 05:10:40 EDT
I'm going to close this NOTABUG and revert the commit as it causes a serious bug
(see 457107). 

There's a misunderstanding of how TRANSITION messages work for a start (which I
should have spotted before I applied the change). And if the DLM can start
without fencing then that's a different problem (if it IS a problem at all)
which isn't related.
Comment 9 Kiersten (Kerri) Anderson 2008-08-29 14:41:01 EDT
Further update: while this bug is now closed, the story continues in bug 460190. We now believe there are network switches that delay initial connections by 60 seconds or more. The result is split fence domains during initial cluster startup.
