114653 – clumanager should not run if joining the multicast group fails

Summary: clumanager should not run if joining the multicast group fails

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	clumanager
Sub Component:
Version:	3
Hardware:	i386
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Lon Hohberger
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	136553
Depends On:
Blocks:

Reported:	2004-01-30 18:26 UTC by Lon Hohberger
Modified:	2009-04-16 20:35 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-03-19 19:27:54 UTC
Embargoed:

Attachments
Patch to fix behavior and make code more clear (8.96 KB, patch) 2004-02-06 17:28 UTC, Lon Hohberger

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2004:122	0	normal	SHIPPED_LIVE	Updated clumanager and redhat-config-cluster packages fix various bugs	2004-05-12 04:00:00 UTC

Description Lon Hohberger 2004-01-30 18:26:26 UTC

Red Hat Cluster Manager uses either broadcast or multicast for
heartbeat transmission and reception.  It can use both at once, but
that configuration is unsupported.

Currently, if a member cannot join a multicast group, the heartbeat
transmission thread tries to send anyway, even though no heartbeat
file descriptors are active.

After the configured failover time, the member reboots, complaining
that it could not send a heartbeat within the failover interval.  This
problem is common on clusters connected via an inexpensive hub or
switch that doesn't handle multicast traffic.

The membership daemon should not run at all if it cannot join the
multicast group.
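
For illustration only, the requested behavior can be sketched as
follows.  This is not the clumanager code (which is C) and not the
attached patch; it is a minimal Python sketch of the general idea,
and the names join_multicast_group, start_membership and
run_heartbeat are hypothetical.  The join uses the standard
IP_ADD_MEMBERSHIP socket option, and startup returns a non-zero
status instead of launching the heartbeat loop when the join fails.

```python
import socket
import struct
import sys


def join_multicast_group(group, port, iface="0.0.0.0"):
    """Open a UDP socket and join `group`. Raises OSError if the join fails.

    Hypothetical helper for illustration; not a clumanager function.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        # struct ip_mreq: group address followed by local interface address.
        mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    except OSError:
        sock.close()  # do not leak the descriptor on failure
        raise
    return sock


def start_membership(join, run_heartbeat):
    """Run the heartbeat loop only if the multicast join succeeded.

    `join` is a callable returning a joined socket (or raising OSError);
    `run_heartbeat` is the loop that would send and receive heartbeats.
    Both are assumptions made for this sketch.
    Returns a process exit status: 0 on normal exit, 1 if the join failed.
    """
    try:
        sock = join()
    except OSError as exc:
        # Refuse to start rather than run with no active heartbeat descriptor.
        print(f"cannot join multicast group: {exc}", file=sys.stderr)
        return 1
    run_heartbeat(sock)
    return 0
```

A caller would pass something like
`lambda: join_multicast_group(group, port)` as `join`, with the group
and port taken from the cluster configuration, and exit with the
returned status.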

Comment 1 Lon Hohberger 2004-02-06 17:28:15 UTC

Created attachment 97517
Patch to fix behavior and make code more clear

Comment 3 Lon Hohberger 2004-02-06 17:29:42 UTC

Patch is against 1.2.9

Comment 4 Lon Hohberger 2004-03-19 18:50:09 UTC

Testing:

(1) Stop cluster software on one member
(2) ifconfig eth0 - record IP address & netmask & broadcast
(3) ifdown eth0 on that member.  Unplug eth0 on the same member.
(4) rmmod <ethernet module> (e100?)
(5) ifconfig eth0 <ip> netmask <netmask> broadcast <broadcast> up
(6) Start cluster software on member

On the old version, the node should reboot after the failover interval.

Comment 5 Suzanne Hillman 2004-03-19 19:27:54 UTC

Verified.

Comment 6 Lon Hohberger 2005-01-10 18:00:57 UTC

*** Bug 136553 has been marked as a duplicate of this bug. ***

Comment 7 Lon Hohberger 2007-12-21 15:10:20 UTC

Fixing product name.  Clumanager on RHEL3 was part of RHCS3, not RHEL3.
