Red Hat Bugzilla – Bug 114653
clumanager should not run if joining the multicast group fails
Last modified: 2009-04-16 16:35:27 EDT
Red Hat Cluster Manager uses either broadcast or multicast for
heartbeat transmission/reception (and can use both; just in an
Currently, if a member cannot join a multicast group, the heartbeat
transmission thread tries to send anyway, even though no heartbeat
file descriptors are active.
After the configured failover time, the member reboots, complaining
that it could not send a heartbeat within the failover interval. This
problem is common on clusters connected via an inexpensive hub or
switch which doesn't handle multicast traffic.
The membership daemon should not run at all if it cannot join the
multicast group.
Created attachment 97517
Patch to fix behavior and make code more clear
Patch is against 1.2.9
(1) Stop cluster software on one member
(2) ifconfig eth0 - record IP address & netmask & broadcast
(3) ifdown eth0 on that member. Unplug eth0 on the same member.
(4) rmmod <ethernet module> (e100?)
(5) ifconfig eth0 <ip> netmask <netmask> broadcast <broadcast> up
(6) Start cluster software on member
On the old version, the node should reboot after the failover interval.
*** Bug 136553 has been marked as a duplicate of this bug. ***
Fixing product name. Clumanager on RHEL3 was part of RHCS3, not RHEL3.