Description of problem:
When operating two clusters on a single LAN segment, cman_tool displays an empty cluster name field. This results in identical cluster IDs (0) and default multicast addresses, causing node membership to "leak" from one cluster to the other.

For example, a two-node cluster (clu01):

# cman_tool status
Version: 6.0.1
Config Version: 12
Cluster Name:
Cluster Id: 0
Cluster Member: Yes
Cluster Generation: 8
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 5
Flags: 2node
Ports Bound: 0
Node name: clu01n01.example.com
Node ID: 1
Multicast addresses: 239.192.0.0
Node addresses: 10.0.0.1

And a three-node cluster on the same LAN (clu02):

# cman_tool status
Version: 6.0.1
Config Version: 3
Cluster Name:
Cluster Id: 0
Cluster Member: Yes
Cluster Generation: 12
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Quorum: 2
Active subsystems: 6
Flags:
Ports Bound: 0
Node name: clu02n01.example.com
Node ID: 1
Multicast addresses: 239.192.0.0
Node addresses: 10.0.0.11

Version-Release number of selected component (if applicable):
Both clusters form correctly when only one is started at a time, but attempts to start both simultaneously result in nodes joining the wrong clusters.

How reproducible:
Unclear; it has been reported twice but not yet reproduced.

Steps to Reproduce:
1. Configure a pair of clusters on a single LAN segment, e.g. with the above addresses in a single 10.0.0.0/16 network.
2. Do not specify an explicit multicast address/port in cluster.conf.
3. Allow both clusters to run at the same time.

Actual results:
One or more nodes join the wrong cluster.

Expected results:
Nodes all join the correct cluster.

Additional info:
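To make the collision in the two status dumps above explicit, here is a minimal Python sketch that parses the "Key: value" lines printed by cman_tool status and reports which settings two clusters share. The helper names (parse_status, shared_settings) are hypothetical and not part of cman; the sample text is abbreviated from the output in this report.

```python
def parse_status(text):
    """Parse 'Key: value' lines (as printed by cman_tool status) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def shared_settings(status_a, status_b,
                    keys=("Cluster Id", "Multicast addresses")):
    """Return the keys whose values are identical in both status dumps."""
    a, b = parse_status(status_a), parse_status(status_b)
    return [k for k in keys if k in a and a.get(k) == b.get(k)]


# Abbreviated status output from clu01 and clu02 in this report.
CLU01 = """\
Cluster Name:
Cluster Id: 0
Multicast addresses: 239.192.0.0
Node addresses: 10.0.0.1
"""

CLU02 = """\
Cluster Name:
Cluster Id: 0
Multicast addresses: 239.192.0.0
Node addresses: 10.0.0.11
"""

if __name__ == "__main__":
    print(shared_settings(CLU01, CLU02))
```

Any key reported here for two clusters that share a LAN segment suggests they cannot tell each other's membership traffic apart.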
Created attachment 160553 [details] cluster.conf for clu01
Created attachment 160554 [details] cluster.conf for clu02
It seems that this happens if the cluster name is passed to the cman_tool command-line:

# cman_tool join -c chrissie
# cman_tool status
Version: 6.0.1
Config Version: 39
Cluster Name:
Cluster Id: 0
...

# cman_tool join
# cman_tool status
Version: 6.0.1
Config Version: 39
Cluster Name: chrissie
Cluster Id: 26347
...

Does that sound like what might be happening in this case?

The fix is simple and has been applied to CVS head:

Checking in cmanccs.c;
/cvs/cluster/cluster/cman/daemon/cmanccs.c,v  <--  cmanccs.c
new revision: 1.29; previous revision: 1.28
done
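The two IDs shown above (0 for an empty name, 26347 for "chrissie") are consistent with a 16-bit shift-and-add hash of the cluster name. The Python sketch below assumes that scheme; it is an illustration that reproduces the numbers in this report, not a quotation of cmanccs.c, and the function name is hypothetical.

```python
def generate_cluster_id(name):
    """Assumed scheme: shift left by one, add each byte, keep the low 16 bits."""
    value = 0
    for byte in name.encode("ascii"):
        value = (value << 1) + byte
    return value & 0xFFFF


if __name__ == "__main__":
    # An empty name hashes to 0, so every cluster whose name is lost
    # ends up with the same ID.
    print(generate_cluster_id(""))
    print(generate_cluster_id("chrissie"))
```

Under this assumption, any two clusters whose names come through empty collide on ID 0 regardless of what cluster.conf says.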
I've added this fix to the RHEL5 branch for 5.2.

Checking in cmanccs.c;
/cvs/cluster/cluster/cman/daemon/cmanccs.c,v  <--  cmanccs.c
new revision: 1.21.2.5; previous revision: 1.21.2.4
done
In my case, the client is just booting up the cluster and letting it form automatically in accordance with its cluster.conf.

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by mbelangia
issue 127532
But does the customer have CLUSTERNAME defined in /etc/sysconfig/cman? That's what passes the cluster name to cman_tool join.
Customer problem; setting the blocker flag for 5.1 so we pick up the fix.
Put this back to assigned as I'm pretty sure it's fixing the problem. Now all we need is all the ACKS I think.
Please provide the status of this bug, as I have been in a down state since 7/21/2007. I am happy to assist by testing any patches that fix this problem prior to the official public release of the patch.

Regards,
Ken
Created attachment 161634 [details]
Patch to fix

Here's the patch that's in head of CVS.
On the RHEL51 branch:

Checking in cmanccs.c;
/cvs/cluster/cluster/cman/daemon/cmanccs.c,v  <--  cmanccs.c
new revision: 1.21.2.4.2.1; previous revision: 1.21.2.4
done
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0575.html