Bug 163636 - RFE: Provide iproute-based workaround of e1000 problem for service IPs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: clumanager
Version: 3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Duplicates: 184039
Depends On:
Blocks: 161773 162166
Reported: 2005-07-19 18:23 UTC by Lon Hohberger
Modified: 2009-04-16 20:17 UTC (History)

Fixed In Version: RHBA-2005-676
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-09-30 14:58:00 UTC
Embargoed:


Attachments
Patch implementing RFE (12.40 KB, patch)
2005-07-27 16:23 UTC, Lon Hohberger


Links
System                  ID             Private  Priority  Status        Summary                    Last Updated
Red Hat Product Errata  RHBA-2005:676  0        normal    SHIPPED_LIVE  clumanager bug fix update  2005-09-30 04:00:00 UTC

Description Lon Hohberger 2005-07-19 18:23:41 UTC
Description of problem:

RHEL3 (up to U5) has an issue where, if multiple instances of the "ifconfig"
utility are started at the same time (or in quick succession), some calls
fail and produce no output.  As far as we can tell, this only happens on
machines with four or more e1000-based NICs, at least two bonded interfaces
configured, and multiple (>2) service IPs configured for use by Red Hat
Cluster Manager.

This problem should be addressed in the next update of the RHEL3 kernel.
However, it may be beneficial to provide this workaround in addition to the
kernel fix for several reasons (no need to upgrade the kernel, preference for
iproute over ifconfig, etc.).


Background:

Cluster Manager on RHEL3 (1.2.x) parses the output of "ifconfig" to monitor
cluster service IP addresses, and has done so since 2002, when Cluster Manager
was first introduced with RHEL 2.1.
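
To illustrate why an ifconfig call that fails silently looks like a missing
address, here is a minimal sketch of an output-parsing check.  It is not
Cluster Manager's actual parser, which is not shown in this report; the
function name ip_present and the sample text are hypothetical, and the sample
assumes the classic net-tools "inet addr:" output layout.

```python
import re


def ip_present(ifconfig_output: str, service_ip: str) -> bool:
    """Return True if service_ip appears as an 'inet addr:' entry.

    Hypothetical helper; assumes the classic net-tools output format.
    """
    # Collect every address that follows an "inet addr:" label.
    found = re.findall(r"inet addr:(\d+\.\d+\.\d+\.\d+)", ifconfig_output)
    return service_ip in found


# Illustrative sample only; the addresses are made up.
SAMPLE = (
    "eth0:0    Link encap:Ethernet  HWaddr 00:11:22:33:44:55\n"
    "          inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0\n"
)

print(ip_present(SAMPLE, "192.168.1.10"))  # address is listed
print(ip_present("", "192.168.1.10"))      # empty output looks like a lost IP
```

A check of this kind cannot tell "the address is gone" apart from "ifconfig
printed nothing", which would explain why a failed call leads to a service
restart.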


What is proposed:

The enhancement here is to incorporate an already-written, field-tested
workaround in an update of Cluster Manager for RHEL3 as a configuration option.


How reproducible:

Fewer than 0.1% of calls to ifconfig fail.  On a system with multiple service
IP addresses and 10-second check intervals, the problem occurs approximately
4-8 times per day.  When it occurs, services are restarted on the same node,
causing several seconds of downtime.
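
As a rough sanity check on these figures, the sketch below converts daily
failure counts into a per-call rate.  It assumes one ifconfig call per service
IP per check and three service IPs; both are assumptions made for
illustration, not numbers taken from this report.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def calls_per_day(service_ips: int, interval_s: int) -> int:
    """ifconfig calls per day, assuming one call per IP per check."""
    return service_ips * SECONDS_PER_DAY // interval_s


def failure_rate(failures_per_day: int, service_ips: int, interval_s: int) -> float:
    """Fraction of calls that fail, under the same assumption."""
    return failures_per_day / calls_per_day(service_ips, interval_s)


# Assumed setup: 3 service IPs checked every 10 seconds.
print(calls_per_day(service_ips=3, interval_s=10))
for failures in (4, 8):
    print(f"{failures}/day -> {failure_rate(failures, 3, 10):.4%}")
```

Under these assumptions there are 25920 calls per day, so 4-8 failures per day
corresponds to roughly 0.015-0.03% of calls, which is consistent with the
"fewer than 0.1%" estimate.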


Additional info:

This workaround will be a user-configurable option from the command line, and
will not be present in the GUI (redhat-config-cluster).  It will not be enabled
by default.  There are significant behavioral changes in how service IP
addresses are handled, namely:

(a) No more virtual interfaces.  Previously, a service IP address would end up
as "eth0:0" or "eth0:1", etc. in the output of "ifconfig".  The new iproute
utilities, rather than assigning virtual devices (eth0:x), simply assign
multiple IPs to the same NIC at once.  Consequently, the service IP addresses
will no longer show up in the output of the "ifconfig" utility.  Instead,
administrators should use "ip addr list" to view them.

(b) The workaround completely ignores user-specified broadcast and netmask
addresses.  Generally, this is fine, as the service IPs must match the
broadcast/netmask addresses of existing system IPs anyway prior to being
brought up.  Bugs reported against this behavior will be closed "NOTABUG".

(c) The configuration change must only be made when ALL SERVICES are in the
DISABLED state and ALL NODES are online.
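
Point (b) can be illustrated with a small sketch: if user-specified netmask
and broadcast values are ignored, they have to come from an existing system
address on the same subnet.  The code below is not taken from the attached
patch; match_system_address and ip_add_command are hypothetical names, the
addresses are made up, and the resulting command line is only one plausible
iproute invocation.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple


def match_system_address(service_ip: str, system_addrs: Dict[str, List[str]]
                         ) -> Optional[Tuple[str, ipaddress.IPv4Network]]:
    """Find the (device, network) of an existing address covering service_ip.

    system_addrs maps a device name to its addresses in CIDR form.
    """
    ip = ipaddress.ip_address(service_ip)
    for dev, addrs in system_addrs.items():
        for addr in addrs:
            # The network of the existing address supplies netmask/broadcast.
            net = ipaddress.ip_interface(addr).network
            if ip in net:
                return dev, net
    return None


def ip_add_command(service_ip: str, system_addrs: Dict[str, List[str]]) -> List[str]:
    """Build an 'ip addr add' argv using the matching system address."""
    match = match_system_address(service_ip, system_addrs)
    if match is None:
        raise ValueError(f"no system address on the same subnet as {service_ip}")
    dev, net = match
    return ["ip", "addr", "add", f"{service_ip}/{net.prefixlen}",
            "broadcast", str(net.broadcast_address), "dev", dev]


# Hypothetical system addresses on two bonded interfaces.
SYSTEM = {"bond0": ["192.168.1.2/24"], "bond1": ["10.0.0.2/8"]}
print(" ".join(ip_add_command("192.168.1.10", SYSTEM)))
```

An address added this way has no "eth0:x" alias, which is why point (a)
recommends "ip addr list" for viewing service IPs.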

Comment 1 Lon Hohberger 2005-07-27 16:18:08 UTC
Workaround:

   cludb -p clusvcmgrd%use_netlink yes

Above caveats apply (all nodes up, but all services disabled).
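
The caveat can be made explicit in a script.  The sketch below only returns
the command above after checking the precondition; how the service and node
states are collected is left out, and the state strings "disabled" and
"online", like the function name, are hypothetical placeholders rather than
actual Cluster Manager output.

```python
from typing import Iterable, List


def netlink_enable_command(service_states: Iterable[str],
                           node_states: Iterable[str]) -> List[str]:
    """Return the cludb argv only if the documented precondition holds.

    The state strings are hypothetical placeholders for this sketch.
    """
    if any(state != "disabled" for state in service_states):
        raise RuntimeError("all services must be disabled first")
    if any(state != "online" for state in node_states):
        raise RuntimeError("all nodes must be online")
    # Same command as given in this comment.
    return ["cludb", "-p", "clusvcmgrd%use_netlink", "yes"]


print(netlink_enable_command(["disabled", "disabled"], ["online", "online"]))
```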


Comment 4 Lon Hohberger 2005-07-27 16:23:38 UTC
Created attachment 117193
Patch implementing RFE

Comment 9 Lon Hohberger 2005-08-08 16:45:57 UTC
Note: The original summary implies that this issue is related to Red Hat Cluster
Manager.  In fact, it is not related to Cluster Manager at all; it is merely
exacerbated by Cluster Manager, because the ifconfig utility is often called
in quick succession while checking the status of services.


Comment 12 Red Hat Bugzilla 2005-09-30 14:58:01 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-676.html


Comment 13 Lon Hohberger 2006-03-06 15:07:29 UTC
*** Bug 184039 has been marked as a duplicate of this bug. ***

