Red Hat Bugzilla – Bug 163636
RFE: Provide iproute-based workaround of e1000 problem for service IPs
Last modified: 2009-04-16 16:17:48 EDT
Description of problem:
RHEL3 (up to U5) has an issue where, if multiple instances of the "ifconfig"
utility are started at the same time (or in quick succession), some calls
fail and produce no output. As far as we can tell, this only happens on
machines with 4 or more e1000-based NICs, at least two bonded interfaces
configured, and multiple (>2) service IPs configured for use by Red Hat
Cluster Manager.
This problem should be addressed in the next update of the RHEL3 kernel.
However, it may be beneficial to provide this workaround in addition to the
kernel fix for several reasons (no need to upgrade kernel, preference for
iproute over ifconfig, etc...).
Cluster Manager on RHEL3 (1.2.x) parses the output of "ifconfig" to monitor
cluster service IP addresses, and has done so since 2002, when Cluster Manager
was first introduced with RHEL 2.1.
What is proposed:
The enhancement here is to incorporate an already-written, field-tested
workaround in an update of Cluster Manager for RHEL3 as a configuration option.
The failure affects <0.1% of the calls to ifconfig. On a system with multiple
service IP addresses with 10 second check intervals, the problem will occur
approximately 4-8 times per day. When it occurs, services will be restarted on
the same node, causing downtime for several seconds.
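As a rough consistency check on these figures, the sketch below derives the
per-call failure rate implied by a given number of failures per day. It
assumes, hypothetically, one ifconfig call per service IP per check interval
and three service IPs; neither number is stated in this report, and the
function name is illustrative only.

```shell
# implied_failure_pct FAILURES_PER_DAY SERVICE_IPS INTERVAL_SECONDS
# Prints the percentage of ifconfig calls that would have to fail to
# produce the given number of failures per day.
# ASSUMPTION: exactly one ifconfig call per service IP per check interval.
implied_failure_pct() {
    awk -v failures="$1" -v ips="$2" -v interval="$3" 'BEGIN {
        calls_per_day = ips * 86400 / interval   # 86400 seconds in a day
        printf "%.3f\n", 100 * failures / calls_per_day
    }'
}

# Example (3 service IPs is an assumed value, 10 s is from the report):
#   implied_failure_pct 4 3 10
#   implied_failure_pct 8 3 10
```

Under those assumptions, 4-8 failures per day correspond to roughly
0.015%-0.031% of calls, which is consistent with the <0.1% figure.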
This workaround will be a user-configurable option from the command line, and
will not be present in the GUI (redhat-config-cluster). It will not be enabled
by default. There are significant behavioral changes in how service IP
addresses are handled, namely:
(a) No more virtual interfaces. Previously, a service IP address would appear
as "eth0:0", "eth0:1", etc. in the output of "ifconfig". The new iproute
utilities, rather than creating virtual devices (eth0:x), simply assign
multiple IPs to the same NIC. Consequently, the service IP addresses will no
longer show up in the output of the "ifconfig" utility. Instead, administrators
should use "ip addr list" to view them.
(b) The workaround completely ignores user-specified broadcast and netmask
addresses. Generally, this is fine, as the service IPs must match
broadcast/netmask addresses of existing system IPs anyway prior to being brought
up. Bugs reported about this fact will be closed "NOTABUG".
(c) The configuration change must be made only when ALL SERVICES are in the
DISABLED state and ALL NODES are online.
To enable the workaround, run:
cludb -p clusvcmgrd%use_netlink yes
The above caveats apply (all nodes up, but all services disabled).
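Because service IPs added this way no longer appear in ifconfig output, any
script that checks for them has to read "ip addr list" instead. A minimal
sketch of such a check follows; the function name and the sample addresses are
hypothetical, and this is not the code from the attached patch.

```shell
# has_service_ip ADDRESS
# Reads "ip addr list"-style output on stdin and returns 0 if ADDRESS is
# assigned as an IPv4 address, 1 otherwise. The prefix length (/24 etc.)
# is stripped before comparing, so the match is on the exact address.
has_service_ip() {
    awk -v want="$1" '
        $1 == "inet" {
            split($2, parts, "/")        # "192.168.1.50/24" -> "192.168.1.50"
            if (parts[1] == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (eth0 and the address are placeholders):
#   ip addr list dev eth0 | has_service_ip 192.168.1.50 && echo "service IP is up"
```

Comparing the whole address rather than using a substring match avoids
confusing, for example, 192.168.1.5 with 192.168.1.50.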
Created attachment 117193
Patch implementing RFE
Note: The original summary implies that this issue is related to Red Hat Cluster
Manager. In fact, it is not related to Cluster Manager at all, and is merely
exacerbated by using Cluster Manager due to the fact that the ifconfig utility
is often called in quick succession while checking the status of services.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
*** Bug 184039 has been marked as a duplicate of this bug. ***