Bug 114653

Summary: clumanager should not run if joining the multicast group fails
Product: [Retired] Red Hat Cluster Suite Reporter: Lon Hohberger <lhh>
Component: clumanager Assignee: Lon Hohberger <lhh>
Status: CLOSED ERRATA QA Contact:
Severity: high Docs Contact:
Priority: low    
Version: 3 CC: benoit_arthur, cluster-maint
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-03-19 19:27:54 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Attachments:
Patch to fix behavior and make code more clear (flags: none)

Description Lon Hohberger 2004-01-30 18:26:26 UTC
Red Hat Cluster Manager uses either broadcast or multicast for
heartbeat transmission/reception (it can also use both, but that
configuration is unsupported).

Currently, if a member cannot join a multicast group, the heartbeat
transmission thread tries to send anyway, even though no heartbeat
file descriptors are active.
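To illustrate the failure mode, here is a minimal sketch of a transmit routine that reports when no heartbeat descriptors are active instead of quietly sending nothing until the failover timer expires. The name hb_send_all, the descriptor array, and the "-1 marks an unused slot" convention are hypothetical, not clumanager's actual code; it also assumes each descriptor has already been connect()ed to its broadcast or multicast destination so plain send() can be used.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical: send one heartbeat packet on every active descriptor.
 * A slot holding -1 is treated as unused.
 * Returns the number of descriptors the packet was fully sent on,
 * or -1 if no descriptor is active at all, so the caller can report
 * the real problem (e.g. the multicast join failed) immediately. */
static int hb_send_all(const int *fds, size_t nfds, const void *msg, size_t len)
{
    int sent = 0;
    int active = 0;

    for (size_t i = 0; i < nfds; i++) {
        if (fds[i] < 0)
            continue;               /* slot not in use */
        active++;
        if (send(fds[i], msg, len, 0) == (ssize_t)len)
            sent++;
    }
    return active == 0 ? -1 : sent;
}
```

With this check, a member whose multicast join failed gets an explicit "no active heartbeat descriptors" result on its first transmit attempt rather than an unexplained reboot later.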

After the configured failover time, the member reboots, complaining
that it could not send a heartbeat within the failover interval.  This
problem is common on clusters connected via an inexpensive hub or
switch that doesn't handle multicast traffic.

The membership daemon should not run at all if it can not join the
multicast group.
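A minimal sketch of the startup check described above, written against the standard BSD sockets API (IP_ADD_MEMBERSHIP with struct ip_mreq). This is not the attached patch; hb_join_group and hb_open_multicast are hypothetical names, and the caller is assumed to exit when the open fails.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: join the IPv4 multicast group `group` on the
 * interface with address `iface` (INADDR_ANY if iface is NULL).
 * Returns 0 on success, -1 with errno set on failure. */
static int hb_join_group(int fd, const char *group, const char *iface)
{
    struct ip_mreq mreq;
    memset(&mreq, 0, sizeof(mreq));

    if (inet_pton(AF_INET, group, &mreq.imr_multiaddr) != 1) {
        errno = EINVAL;             /* not a dotted-quad address */
        return -1;
    }
    /* Multicast groups live in 224.0.0.0/4; reject anything else early. */
    if ((ntohl(mreq.imr_multiaddr.s_addr) & 0xf0000000u) != 0xe0000000u) {
        errno = EINVAL;
        return -1;
    }
    if (iface == NULL) {
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    } else if (inet_pton(AF_INET, iface, &mreq.imr_interface) != 1) {
        errno = EINVAL;
        return -1;
    }
    /* setsockopt() fails if the kernel cannot join the group. */
    return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
}

/* Hypothetical startup step: open the heartbeat socket and join the group.
 * Returns the socket on success. On failure it logs, closes the socket,
 * and returns -1 so the daemon can refuse to run instead of starting a
 * heartbeat thread with no usable descriptor. */
static int hb_open_multicast(const char *group, const char *iface)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    if (hb_join_group(fd, group, iface) < 0) {
        int saved = errno;
        fprintf(stderr, "heartbeat: cannot join multicast group %s: %s\n",
                group, strerror(saved));
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A membership daemon structured this way would call hb_open_multicast() before starting its heartbeat threads and exit with a nonzero status when it returns -1, so the problem is reported at startup rather than as a reboot after the failover interval.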

Comment 1 Lon Hohberger 2004-02-06 17:28:15 UTC
Created attachment 97517 [details]
Patch to fix behavior and make code more clear

Comment 3 Lon Hohberger 2004-02-06 17:29:42 UTC
Patch is against 1.2.9

Comment 4 Lon Hohberger 2004-03-19 18:50:09 UTC
Testing:

(1) Stop the cluster software on one member.
(2) Run ifconfig eth0 and record the IP address, netmask, and broadcast address.
(3) ifdown eth0 on that member, then unplug eth0 on the same member.
(4) rmmod <ethernet module> (e100?)
(5) ifconfig eth0 <ip> netmask <netmask> broadcast <broadcast> up
(6) Start the cluster software on the member.

On the old version, the node should reboot after the failover interval.



Comment 5 Suzanne Hillman 2004-03-19 19:27:54 UTC
Verified.

Comment 6 Lon Hohberger 2005-01-10 18:00:57 UTC
*** Bug 136553 has been marked as a duplicate of this bug. ***

Comment 7 Lon Hohberger 2007-12-21 15:10:20 UTC
Fixing product name. Clumanager on RHEL3 was part of RHCS3, not RHEL3.