Description of problem:

When using a private crossover cable for heartbeating with clumanager 1.2, you have a choice between broadcast and multicast modes.

In multicast mode (the recommended mode), the cluster software sends heartbeats over the primary NIC (defined as the NIC whose IP address matches the cluster member's name). The primary NIC can be a bonded Ethernet device, which provides higher availability. The downside of multicast is that it requires a switch or router that understands multicast to be in the mix.

The alternative method of heartbeating is broadcast, which works by sending heartbeat packets out of every physical interface. This includes loopback devices and all other network devices that have a non-virtual IP configured (e.g. eth0, but not eth0:0). This is partly necessary because a node does not necessarily "see" its own broadcast traffic, so 'lo' is needed for a node to declare itself alive.

This works well, but it is really brute force: ALL NICs send the broadcast packets. If a cluster has a public and a private network (the latter used only for cluster communications) and multicast is not an option, the public network gets unnecessary broadcast traffic.

The request is this: provide an option to use only 'lo' and the primary NIC (as defined above) for heartbeating.
Created attachment 103170
Patch which implements behavior

To turn on "primary-nic + lo only":

    cludb -p clumembd%broadcast_primary_only yes

To turn it off:

    cludb -r clumembd%broadcast_primary_only
1.2.18pre1 patch (unsupported; test only, etc.):

http://people.redhat.com/lhh/clumanager-1.2.16-1.2.18pre1.patch

This includes the fix for this bug and a few others.
New behavior with feature: Member binds to only the loopback address and the 'primary NIC', regardless of how many physical NICs exist.
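For readers who want the selection rule spelled out, here is a minimal Python sketch of the behavior described above. It is not the clumanager code (the actual patch is in the attachment); the `Interface` type, the `heartbeat_interfaces` function, and the rule of treating any name containing ":" as a virtual alias are all assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interface:
    """Hypothetical record for one configured interface."""
    name: str      # e.g. "lo", "eth0", "eth0:0"
    address: str   # IPv4 address configured on the interface


def is_virtual(iface: Interface) -> bool:
    # Assumption: alias interfaces such as "eth0:0" are the "virtual IP" case.
    return ":" in iface.name


def heartbeat_interfaces(interfaces: List[Interface],
                         member_address: str,
                         primary_only: bool) -> List[Interface]:
    """Return the interfaces that would carry broadcast heartbeats.

    member_address is the IP that the cluster member's name resolves to;
    the interface holding it is the "primary NIC".
    """
    selected = []
    for iface in interfaces:
        if is_virtual(iface):
            continue  # virtual IPs are never used for heartbeating
        loopback = ipaddress.ip_address(iface.address).is_loopback
        primary = iface.address == member_address
        if primary_only and not (loopback or primary):
            continue  # new option: skip every other physical NIC
        selected.append(iface)
    return selected


if __name__ == "__main__":
    nics = [
        Interface("lo", "127.0.0.1"),
        Interface("eth0", "10.1.1.1"),
        Interface("eth0:0", "10.1.1.50"),
        Interface("eth1", "192.168.0.1"),
    ]
    for flag in (False, True):
        names = [i.name for i in heartbeat_interfaces(nics, "10.1.1.1", flag)]
        print("broadcast_primary_only =", flag, "->", names)
```

With the flag off, the sketch keeps every non-virtual interface, which matches the original broadcast behavior; with it on, only the loopback and the interface holding the member's address remain.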
Tested this by setting up 2 eth cards, eth0 (primary) and eth1 (secondary). With default setting (broadcast_primary_only off) verified that we were sending broadcast packets on eth1:

    [root@link-01 root]# tcpdump -i eth1
    tcpdump: listening on eth1
    10:13:18.927517 10.1.1.1.1228 > 10.1.3.255.1228: udp 16 (DF)
    10:13:19.677554 10.1.1.1.1228 > 10.1.3.255.1228: udp 16 (DF)

Then shut down the cluster and issued `cludb -p clumembd%broadcast_primary_only yes`, wrote to shared config storage, and restarted the cluster. Ran tcpdump on eth1 again and confirmed that this interface was not issuing broadcast packets.

Version: clumanager-1.2.22-2

This is ready for RHEL3-U4.
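The before/after check on eth1 could be scripted roughly as follows. This is only a sketch: the `heartbeat_broadcasts` helper is hypothetical, UDP port 1228 is taken from the capture in this comment rather than from any documented constant, and treating a destination ending in ".255" as a broadcast address is a simplifying assumption that fits this capture but not every netmask.

```python
import re

# Matches the old-style tcpdump line format seen in the capture:
#   <time> <src-ip>.<src-port> > <dst-ip>.<dst-port>: udp <len> (DF)
LINE_RE = re.compile(
    r"^\S+\s+(?P<src>\d+(?:\.\d+){3})\.(?P<sport>\d+)\s+>\s+"
    r"(?P<dst>\d+(?:\.\d+){3})\.(?P<dport>\d+):\s+udp\b"
)


def heartbeat_broadcasts(lines, port=1228):
    """Return (src, dst) pairs for UDP packets sent to a broadcast address.

    port=1228 is the port observed in the capture (an assumption, not a
    documented value). Lines that do not look like packet lines are ignored.
    """
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        if int(m.group("dport")) == port and m.group("dst").endswith(".255"):
            hits.append((m.group("src"), m.group("dst")))
    return hits


if __name__ == "__main__":
    capture = [
        "tcpdump: listening on eth1",
        "10:13:18.927517 10.1.1.1.1228 > 10.1.3.255.1228: udp 16 (DF)",
        "10:13:19.677554 10.1.1.1.1228 > 10.1.3.255.1228: udp 16 (DF)",
    ]
    found = heartbeat_broadcasts(capture)
    print("heartbeat broadcasts seen:", len(found))
```

An empty result for a capture taken on eth1 after enabling broadcast_primary_only would correspond to the "not issuing broadcast packets" observation above.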
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-491.html