Red Hat Cluster Manager uses either broadcast or multicast for heartbeat transmission and reception (it can use both, but that configuration is unsupported). Currently, if a member cannot join a multicast group, the heartbeat transmission thread tries to send anyway, even though no heartbeat file descriptors are active. After the configured failover time, the member reboots, complaining that it could not send a heartbeat within the failover interval. This problem is common on clusters connected via an inexpensive hub or switch that doesn't handle multicast traffic. The membership daemon should not run at all if it cannot join the multicast group.
Created attachment 97517: patch to fix the behavior and make the code clearer
Patch is against 1.2.9
Testing:
(1) Stop the cluster software on one member.
(2) Run ifconfig eth0 and record the IP address, netmask, and broadcast address.
(3) Run ifdown eth0 on that member, then unplug eth0 on the same member.
(4) rmmod <ethernet module> (e100?)
(5) ifconfig eth0 <ip> netmask <netmask> broadcast <broadcast> up
(6) Start the cluster software on the member.
With the old version, the node should reboot after the failover interval.
Verified.
*** Bug 136553 has been marked as a duplicate of this bug. ***
Fixing product name. Clumanager on RHEL3 was part of RHCS3, not RHEL3.