Bug 429154 - bonding + arp monitoring + xen cause interface flapping
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.1
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Assignee: Daniel Berrangé
QA Contact: Yulia Kopkova
URL:
Whiteboard:
Depends On:
Blocks: 448899 RHEL5u3_relnotes 462680
Reported: 2008-01-17 16:45 UTC by Vince Worthington
Modified: 2018-10-20 00:36 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When setting up interface bonding on dom0, the default 'network-bridge' script may cause bonded network interfaces to alternately switch between 'unavailable' and 'available'. This occurrence is commonly known as flapping.

To prevent this, replace the standard network-script line in /etc/xen/xend-config.sxp with the following line:

 (network-script network-bridge-bonding netdev=bond0)

Doing so will disable the netloop device, which prevents Address Resolution Protocol (ARP) monitoring from failing during the address transfer process.
Clone Of:
Environment:
Last Closed: 2009-01-20 21:11:25 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to network-bridge to add scope host IP back to physical bond (1.60 KB, patch)
2008-01-17 16:45 UTC, Vince Worthington
modified 3.0.3-41 network-bridge script (9.45 KB, text/plain)
2008-01-17 17:42 UTC, Vince Worthington
sysreport (9.05 MB, application/octet-stream)
2008-01-22 18:02 UTC, Vince Worthington
network-bridge script based on upstream unstable dated 1/30/08 but modified to support bond devices (8.90 KB, patch)
2008-01-30 23:24 UTC, Vince Worthington
goes along with modified upstream unstable network-bridge script (has also changed since 3.0.3-41) (3.08 KB, patch)
2008-01-30 23:26 UTC, Vince Worthington
Alternate network script for binding (15.00 KB, patch)
2008-12-01 16:16 UTC, Daniel Berrangé
Patch that fixes the problem described in the previous command (530 bytes, patch)
2008-12-31 14:10 UTC, Sadique Puthen


Links
System: Red Hat Product Errata
ID: RHBA-2009:0118
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: xen bug fix and enhancement update
Last Updated: 2009-01-20 16:04:49 UTC

Description Vince Worthington 2008-01-17 16:45:32 UTC
Description of problem:
Setting up interface bonding on a Xen dom0 server and enabling ARP monitoring
with arp_interval and arp_ip_target causes flapping between the bonded slave
interfaces and excessive failures in the bonding configuration.

Version-Release number of selected component (if applicable):
confirmed on 3.0.3-41

How reproducible:
always

Steps to Reproduce:
1. Set up Xen server using eth0 and eth1 bonded to bond0 such that bond0 is an
interface on xenbr0 as pbond0 by updating (network-script network-bridge) to
(network-script 'network-bridge netdev=bond0')

2. configure bond0 to use arp monitoring by setting arp_interval=500 and
arp_ip_target=<default gateway, for example> in BONDING_OPTS

3. reboot server for configuration to take effect
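
For illustration, step 2 might look like the following
/etc/sysconfig/network-scripts/ifcfg-bond0. This is only a sketch: the
addresses come from the 192.0.2.0/24 documentation range, the gateway is
assumed to be the ARP target, and mode=active-backup is an assumption, since
the report does not state which bonding mode was used.

```shell
# ifcfg-bond0 (sketch; addresses and bonding mode are assumptions)
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.0.2.10
NETMASK=255.255.255.0
GATEWAY=192.0.2.1
# ARP monitoring: probe the target every 500 ms, as in step 2
BONDING_OPTS="mode=active-backup arp_interval=500 arp_ip_target=192.0.2.1"
```

The slave interfaces (eth0 and eth1 here) would each carry MASTER=bond0 and
SLAVE=yes in their own ifcfg files.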
  
Actual results:
Get excessive failovers between interfaces

Expected results:
Network stability, provided the arp_ip_target and the network between it and
the host are stable.

Additional info:
(from customer):
Please note that if the network bridge is configured to connect to eth0, this
problem does not happen so it has something to do with the interface renaming
that happens when bond0 is attached to xenbr0.
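
One way to quantify the flapping is to read the per-slave "Link Failure
Count" values that the bonding driver reports in /proc/net/bonding/bond0. The
helper below is a minimal sketch; the function name is made up for this
example and it only assumes the driver's usual "Slave Interface:" and "Link
Failure Count:" lines.

```shell
# Print "<slave> <link failure count>" for each slave of a bond.
# Reads the bonding driver's status text on stdin, so it can be fed
# /proc/net/bonding/bond0 or a copy saved in a sysreport.
bond_failure_counts() {
    awk -F': ' '
        /^Slave Interface:/    { slave = $2 }
        /^Link Failure Count:/ { print slave, $2 }
    '
}

# Example: bond_failure_counts < /proc/net/bonding/bond0
```

Running it twice a few seconds apart and comparing the counts should show
whether the slaves are still failing over.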

Comment 1 Vince Worthington 2008-01-17 16:45:32 UTC
Created attachment 292029 [details]
Patch to network-bridge to add scope host IP back to physical bond

Comment 3 RHEL Program Management 2008-01-17 16:56:39 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Vince Worthington 2008-01-17 17:42:21 UTC
Created attachment 292034 [details]
modified 3.0.3-41 network-bridge script

Comment 9 Vince Worthington 2008-01-22 17:58:52 UTC
Attaching sysreport in case it is useful.

--vince

Comment 10 Vince Worthington 2008-01-22 18:02:52 UTC
Created attachment 292542 [details]
sysreport

Comment 12 Herbert Xu 2008-01-24 04:13:15 UTC
OK I've had a look at the problem and it seems that it's all down to the netloop
device.  Please note that upstream has already obsoleted the netloop device. 
Unfortunately we still enable it by default in RHEL5.  However, by modifying the
network-bridge script you can operate without the netloop device.

The reason the netloop device needs to go with bonding is because you can't just
take away the MAC address from the bonding constituents as you can with a normal
Ethernet NIC.

So if you modify the scripts such that the netloop device is gone (by simply
renaming xenbr0 to bond0 and using that in place of the old netloop device), the
problem should go away.

You also need to make sure that you don't assign the FF:* MAC address to either
xenbr0 or pbond0.
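
A quick check for the last point could look like the sketch below. It assumes
that "FF:*" refers to the FE:FF:FF:FF:FF:FF placeholder address that the
network-bridge scripts assign to bridge ports; the function name is
hypothetical.

```shell
# Succeeds if the MAC address given as $1 is the fe:ff:ff:ff:ff:ff
# placeholder (assumed to be what "FF:*" means above).
is_placeholder_mac() {
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    [ "$mac" = "fe:ff:ff:ff:ff:ff" ]
}

# Example: report each device that carries the placeholder address.
# for dev in xenbr0 pbond0; do
#     is_placeholder_mac "$(cat /sys/class/net/$dev/address)" &&
#         echo "$dev has the placeholder MAC"
# done
```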

Comment 13 Jason Willeford 2008-01-25 18:30:05 UTC
Herbert,
Can you please expand on your modification of the scripts?  The GPS consultant
at the customer site did not have any luck with his interpretation of your
suggestion.

>Calvin,
>Do the engineer's comments make sense to you?

Sadly, no.  I tried setting (network-script 'network-bridge=bond0') in the
configuration file but that didn't work. I also couldn't understand whether
xenbr0 still exists at the end of the changes or not, since I either rename it
to bond0 or make sure that it doesn't get a FF:* mac address, depending on your
reading of the note.

If I could get some clarification or even better a sample script I might be able
to make progress.


Comment 21 Vince Worthington 2008-01-30 23:24:57 UTC
Created attachment 293521 [details]
network-bridge script based on upstream unstable dated 1/30/08 but modified to support bond devices

Comment 22 Vince Worthington 2008-01-30 23:26:03 UTC
Created attachment 293522 [details]
goes along with modified upstream unstable network-bridge script (has also changed since 3.0.3-41)

Comment 28 Daniel Berrangé 2008-01-31 14:30:05 UTC
NB, the upstream network-bridge script has changed beyond all recognition when
compared to the RHEL-5 network-bridge script. This has significantly changed the
way networking is set up in a Dom0 host, and as such is not suitable for patching
into RHEL-5.  If any network-bridge script changes are required they need to be
small & clear patches against the current RHEL-5 version of network-bridge, not
xen-unstable's version.


Comment 38 RHEL Program Management 2008-03-11 19:36:55 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 41 RHEL Program Management 2008-07-24 16:11:09 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 43 Daniel Berrangé 2008-10-07 09:27:04 UTC
FYI, current plan for addressing this bug is to try and provide an alternative "network-bridge-bonding" script, rather than changing existing network-bridge script. The alternate script will be written to not use netloop and match latest upstream. This will hopefully make bonding work, and not risk regressions for users of the existing network-bridge script.

Comment 46 Daniel Berrangé 2008-12-01 16:16:52 UTC
Created attachment 325247 [details]
Alternate network script for binding

We cannot fix the main 'network-bridge' script for use with bonding, since that would require removal of netloop, which is a user-visible change we cannot make in an update.

Thus, this patch introduces an alternate script, which must be explicitly requested in /etc/xen/xend-config.sxp:

 (network-script network-bridge-bonding netdev=bond0)

This script eliminates the netloop device as per Herbert's recommendation, and takes care not to break ARP monitoring during the address transfer process.
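
The configuration change could be scripted roughly as follows. This is a
sketch only: the function name is hypothetical, and it assumes the file has a
single active (uncommented) network-script line with no trailing whitespace.

```shell
# Point the active network-script line of a xend config file at the
# network-bridge-bonding script. $1 is the path to the file; the
# original is kept as "$1.bak". Commented-out lines are left untouched.
use_bonding_bridge_script() {
    conf="$1"
    cp -p "$conf" "$conf.bak" || return 1
    sed 's/^(network-script .*)$/(network-script network-bridge-bonding netdev=bond0)/' \
        "$conf.bak" > "$conf"
}

# Example: use_bonding_bridge_script /etc/xen/xend-config.sxp
```

Running it a second time leaves the line unchanged, but it overwrites the
.bak copy, so keep a separate backup of the original file if one is needed.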

Comment 47 Bill Burns 2008-12-02 15:43:01 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The default Xen 'network-bridge' does not work properly with bonding. When used it can cause the network interfaces to flap.

This patch introduces an alternate script, which must be explicitly requested in the /etc/xen/xend-config.sxp file by replacing the
standard network-script line with:

 (network-script network-bridge-bonding netdev=bond0)

This script eliminates the netloop device and
takes care not to break ARP monitoring during the address transfer process.

Comment 50 Don Domingo 2008-12-03 00:12:23 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,9 +1,7 @@
-The default Xen 'network-bridge' does not work properly with bonding. When used it can cause the network interfaces to flap.
+When setting up interface bonding on dom0, the default 'network-bridge' script may cause bonded network interfaces to alternately switch between 'unavailable' and 'available'. This occurrence is commonly known as flapping.
 
-This patch introduces a alternate script which must be explicitlyvrequested in the /etc/xen/xend-config.sxp file by replacing the
-standard network-script line with:
+To prevent this, replace the standard network-script line in /etc/xen/xend-config.sxp with the following line:
 
  (network-script network-bridge-bonding netdev=bond0)
 
-This script eliminates the netloop device and
-takes care not to break ARP monitoring during the address transfer process.
+Doing so will disable the netloop device, which prevents Address Resolution Protocol (ARP) monitoring from failing during the address transfer process.

Comment 51 Daniel Berrangé 2008-12-03 20:44:57 UTC
Built into xen-3.0.3-78.el5

Comment 61 errata-xmlrpc 2009-01-20 21:11:25 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0118.html

