Bug 114653 - clumanager should not run if joining the multicast group fails
Summary: clumanager should not run if joining the multicast group fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: clumanager
Version: 3
Hardware: i386
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact:
URL:
Whiteboard:
Duplicates: 136553
Depends On:
Blocks:
Reported: 2004-01-30 18:26 UTC by Lon Hohberger
Modified: 2009-04-16 20:35 UTC (History)
CC List: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2004-03-19 19:27:54 UTC
Embargoed:


Attachments
Patch to fix behavior and make code more clear (8.96 KB, patch)
2004-02-06 17:28 UTC, Lon Hohberger


Links
System: Red Hat Product Errata
ID: RHBA-2004:122
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated clumanager and redhat-config-cluster packages fix various bugs
Last Updated: 2004-05-12 04:00:00 UTC

Description Lon Hohberger 2004-01-30 18:26:26 UTC
Red Hat Cluster Manager uses either broadcast or multicast for
heartbeat transmission/reception (and can use both; just in an
unsupported fashion).  

Currently, if a member cannot join a multicast group, the heartbeat
transmission thread tries to send anyway, even though no heartbeat
file descriptors are active.

After the configured failover time, the member reboots, complaining
that it could not send a heartbeat within the failover interval.  This
problem is common on clusters connected via an inexpensive hub or
switch that does not handle multicast traffic.

The membership daemon should not run at all if it cannot join the
multicast group.
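
For illustration only, the requested fail-fast behavior can be sketched
as a pre-flight check. This is not the clumanager code (the attached
patch is the actual fix); the function names, the default interface
address, and the error message are hypothetical. The idea is simply to
attempt the multicast join before any heartbeat thread starts and to
refuse to run if the join fails.

```python
import socket
import struct
import sys


def open_heartbeat_socket(group, port, interface_ip="0.0.0.0"):
    """Open a UDP socket and join the multicast group.

    Raises OSError if the address is invalid or the join is rejected.
    (Hypothetical helper, not part of clumanager.)
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        # struct ip_mreq: multicast group address, then local interface address
        mreq = struct.pack(
            "4s4s", socket.inet_aton(group), socket.inet_aton(interface_ip)
        )
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    except OSError:
        sock.close()
        raise
    return sock


def can_start_membership(group, port):
    """Return True only if the multicast join succeeds.

    A caller would exit instead of starting heartbeat threads when this
    returns False, rather than running with no usable descriptors.
    """
    try:
        sock = open_heartbeat_socket(group, port)
    except OSError as exc:
        print(f"refusing to start: cannot join {group}: {exc}", file=sys.stderr)
        return False
    sock.close()
    return True
```

A daemon built along these lines would call can_start_membership() once
during startup and exit with an error if it returns False, instead of
discovering the problem only when the failover interval expires.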

Comment 1 Lon Hohberger 2004-02-06 17:28:15 UTC
Created attachment 97517
Patch to fix behavior and make code more clear

Comment 3 Lon Hohberger 2004-02-06 17:29:42 UTC
Patch is against 1.2.9

Comment 4 Lon Hohberger 2004-03-19 18:50:09 UTC
Testing:

(1) Stop cluster software on one member
(2) ifconfig eth0 - record IP address & netmask & broadcast
(3) ifdown eth0 on that member.  Unplug eth0 on the same member.
(4) rmmod <ethernet module> (e100?)
(5) ifconfig eth0 <ip> netmask <netmask> broadcast <broadcast> up
(6) Start cluster software on member

On the old version, the node should reboot after the failover interval.



Comment 5 Suzanne Hillman 2004-03-19 19:27:54 UTC
Verified.

Comment 6 Lon Hohberger 2005-01-10 18:00:57 UTC
*** Bug 136553 has been marked as a duplicate of this bug. ***

Comment 7 Lon Hohberger 2007-12-21 15:10:20 UTC
Fixing product name.  Clumanager on RHEL3 was part of RHCS3, not RHEL3

