Bug 114653

Summary:

clumanager should not run if joining the multicast group fails

Product:

[Retired] Red Hat Cluster Suite

Reporter:

Lon Hohberger <lhh>

Component:

clumanager

Assignee:

Lon Hohberger <lhh>

Status:

CLOSED ERRATA

Severity:

high

Priority:

low

CC:

benoit_arthur, cluster-maint

Hardware:

i386

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2004-03-19 19:27:54 UTC

Attachments:

Description	Flags
Patch to fix behavior and make code more clear	none

Description Lon Hohberger 2004-01-30 18:26:26 UTC

Red Hat Cluster Manager uses either broadcast or multicast for
heartbeat transmission/reception (it can use both, but only in an
unsupported fashion).

Currently, if a member cannot join a multicast group, the heartbeat
transmission thread tries to send anyway, even though no heartbeat
file descriptors are active.

After the configured failover time, the member reboots, complaining
that it could not send a heartbeat within the failover interval.  This
problem is common on clusters connected via an inexpensive hub or
switch that doesn't handle multicast traffic.

The membership daemon should not run at all if it cannot join the
multicast group.

Comment 1 Lon Hohberger 2004-02-06 17:28:15 UTC

Created attachment 97517
Patch to fix behavior and make code more clear

Comment 3 Lon Hohberger 2004-02-06 17:29:42 UTC

Patch is against 1.2.9

Comment 4 Lon Hohberger 2004-03-19 18:50:09 UTC

Testing:

(1) Stop cluster software on one member
(2) ifconfig eth0 - record IP address & netmask & broadcast
(3) ifdown eth0 on that member.  Unplug eth0 on the same member.
(4) rmmod <ethernet module> (e100?)
(5) ifconfig eth0 <ip> netmask <netmask> broadcast <broadcast> up
(6) Start cluster software on member

On the old version, the node should reboot after the failover interval.

Comment 5 Suzanne Hillman 2004-03-19 19:27:54 UTC

Verified.

Comment 6 Lon Hohberger 2005-01-10 18:00:57 UTC

*** Bug 136553 has been marked as a duplicate of this bug. ***

Comment 7 Lon Hohberger 2007-12-21 15:10:20 UTC

Fixing product name.  Clumanager on RHEL3 was part of RHCS3, not RHEL3