Description of problem:

RHEL3 (up to U5) has an issue where, if multiple instances of the "ifconfig" utility are started at the same time (or in quick succession), some calls fail and produce no output. As far as we can tell, this only happens on machines with four or more e1000-based NICs, at least two bonded interfaces configured, and multiple (>2) service IPs configured for use by Red Hat Cluster Manager. This problem should be addressed in the next update of the RHEL3 kernel. However, it may be beneficial to provide this workaround in addition to the kernel fix for several reasons (no need to upgrade the kernel, preference for iproute over ifconfig, etc.).

Background:

Cluster Manager on RHEL3 (1.2.x) parses the output of "ifconfig" to monitor cluster service IP addresses, and has done so since 2002, when Cluster Manager was first introduced with RHEL 2.1.

What is proposed:

The enhancement here is to incorporate an already-written, field-tested workaround into an update of Cluster Manager for RHEL3 as a configuration option.

How reproducible:

<0.1% of the calls to ifconfig. On a system with multiple service IP addresses checked at 10-second intervals, the problem occurs approximately 4-8 times per day. When it occurs, services are restarted on the same node, causing several seconds of downtime.

Additional info:

This workaround will be a user-configurable option from the command line and will not be present in the GUI (redhat-config-cluster). It will not be enabled by default. There are significant behavioral changes in how service IP addresses are handled, namely:

(a) No more virtual interfaces. Previously, a service IP address would end up as "eth0:0", "eth0:1", etc. in the output of "ifconfig". The iproute utilities, rather than assigning virtual devices (eth0:x), simply assign multiple IPs to the same NIC at once. Consequently, the service IP addresses will no longer show up in the output of the "ifconfig" utility; administrators should use "ip addr list" to view them instead.

(b) The workaround completely ignores user-specified broadcast and netmask addresses. Generally, this is fine, as the service IPs must match the broadcast/netmask addresses of existing system IPs before being brought up anyway. Bugs reported about this fact will be closed NOTABUG.

(c) The configuration change must only be made when ALL SERVICES are in the DISABLED state, but ALL NODES are online.
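To illustrate point (a), a rough sketch of the difference between the two approaches (the interface name "eth0" and the addresses here are placeholders, not values from this report):

```shell
# Old behavior: ifconfig creates a labeled virtual interface per service IP,
# which then appears as "eth0:0" in ifconfig output.
ifconfig eth0:0 10.0.0.50 netmask 255.255.255.0 up

# iproute behavior: additional addresses are attached directly to eth0
# (no eth0:x alias), so ifconfig will not list them.
ip addr add 10.0.0.50/24 dev eth0

# View all addresses on the NIC, including secondary/service IPs:
ip addr list dev eth0
```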
Workaround:

  cludb -p clusvcmgrd%use_netlink yes

The caveats above apply (all nodes up, but all services disabled).
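A possible sequence for applying the setting safely, given the caveat that all services must be disabled while all nodes remain online (the service name "my_service" is a placeholder; verify the clusvcadm flags against the clumanager version in use):

```shell
# 1. With every cluster node online, disable ALL cluster services
#    (repeat for each configured service):
clusvcadm -d my_service

# 2. Enable the netlink/iproute workaround in the cluster database:
cludb -p clusvcmgrd%use_netlink yes

# 3. Re-enable the services:
clusvcadm -e my_service
```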
Created attachment 117193 [details] Patch implementing RFE
Note: The original summary implies that this issue is related to Red Hat Cluster Manager. In fact, it is not related to Cluster Manager at all; it is merely exacerbated by Cluster Manager because the ifconfig utility is called in quick succession while checking the status of services.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-676.html
*** Bug 184039 has been marked as a duplicate of this bug. ***