Description of problem:
When interface bonding is set up on a Xen dom0 server with ARP monitoring enabled (arp_interval and arp_ip_target), the bond flaps between its bonded slave interfaces and there are excessive failures in the bonding configuration.

Version-Release number of selected component (if applicable):
confirmed on 3.0.3-41

How reproducible:
always

Steps to Reproduce:
1. Set up a Xen server with eth0 and eth1 bonded to bond0, such that bond0 is an interface on xenbr0 as pbond0, by changing (network-script network-bridge) to (network-script 'network-bridge netdev=bond0').
2. Configure bond0 to use ARP monitoring by setting arp_interval=500 and arp_ip_target=<default gateway, for example> in BONDING_OPTS.
3. Reboot the server for the configuration to take effect.

Actual results:
Excessive failovers between the interfaces.

Expected results:
Network stability, provided the arp_ip_target and the network between it and the host are stable.

Additional info (from customer):
Please note that if the network bridge is configured to connect to eth0, this problem does not happen, so it has something to do with the interface renaming that happens when bond0 is attached to xenbr0.
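For reference, step 2 above might look roughly like the following in the bond's interface configuration file. This is only a sketch: the file location (/etc/sysconfig/network-scripts/ifcfg-bond0), the mode=active-backup setting, and the 192.0.2.1 target address are assumptions for illustration and are not taken from the reporter's system. Only arp_interval=500, arp_ip_target and BONDING_OPTS come from the report.

```shell
# Hypothetical /etc/sysconfig/network-scripts/ifcfg-bond0 (path assumed)
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
# mode=active-backup is an assumption; the report does not state the mode.
# 192.0.2.1 is a documentation placeholder standing in for the default gateway.
BONDING_OPTS="mode=active-backup arp_interval=500 arp_ip_target=192.0.2.1"
```

The file consists of plain shell variable assignments, so it can be sourced to check for syntax errors before rebooting.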
Created attachment 292029 [details] Patch to network-bridge to add scope host IP back to physical bond
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 292034 [details] modified 3.0.3-41 network-bridge script
Attaching sysreport in case it is useful. --vince
Created attachment 292542 [details] sysreport
OK I've had a look at the problem and it seems that it's all down to the netloop device. Please note that upstream has already obsoleted the netloop device. Unfortunately we still enable it by default in RHEL5. However, by modifying the network-bridge script you can operate without the netloop device. The reason the netloop device needs to go with bonding is because you can't just take away the MAC address from the bonding constituents as you can with a normal Ethernet NIC. So if you modify the scripts such that the netloop device is gone (by simply renaming xenbr0 to bond0 and using that in place of the old netloop device), the problem should go away. You also need to make sure that you don't assign the FF:* MAC address to either xenbr0 or pbond0.
Herbert,

Can you please expand on your modification of the scripts? The GPS consultant at the customer site did not have any luck with his interpretation of your suggestion.

> Calvin,
> Does the engineers comments make sense to you?

Sadly, no. I tried setting (network-script 'network-bridge=bond0') in the configuration file but that didn't work. I also couldn't understand whether xenbr0 still exists at the end of the changes or not, since I either rename it to bond0 or make sure that it doesn't get a FF:* MAC address, depending on your reading of the note. If I could get some clarification, or even better a sample script, I might be able to make progress.
Created attachment 293521 [details] network-bridge script based on upstream unstable dated 1/30/08 but modified to support bond devices
Created attachment 293522 [details] goes along with modified upstream unstable network-bridge script (has also changed since 3.0.3-41)
NB, the upstream network-bridge script has changed beyond all recognition when compared to the RHEL-5 network-bridge script. This has significantly changed the way networking is set up in a Dom0 host, and as such it is not suitable for patching into RHEL-5. If any network-bridge script changes are required, they need to be small and clear patches against the current RHEL-5 version of network-bridge, not xen-unstable's version.
This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.
FYI, current plan for addressing this bug is to try and provide an alternative "network-bridge-bonding" script, rather than changing existing network-bridge script. The alternate script will be written to not use netloop and match latest upstream. This will hopefully make bonding work, and not risk regressions for users of the existing network-bridge script.
Created attachment 325247 [details] Alternate network script for binding

We cannot fix the main 'network-bridge' script for use with bonding, since that would require removal of netloop, which is a user-visible change we cannot make in an update. Thus, this patch introduces an alternate script which must be explicitly requested in /etc/xen/xend-config.sxp:

(network-script network-bridge-bonding netdev=bond0)

This script eliminates the netloop device as per Herbert's recommendation, and takes care not to break ARP monitoring during the address transfer process.
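One possible way to check whether switching scripts reduces the flapping is to compare the per-slave failure counters that the bonding driver exposes in /proc/net/bonding/bond0 before and after the change. The helper below is a minimal sketch and not part of the attached patch: the function name bond_failures is hypothetical, and it assumes the usual "Slave Interface:" and "Link Failure Count:" lines appear in that file.

```shell
# bond_failures [FILE]
# Print "<slave> <link failure count>" for each slave of a bond.
# FILE defaults to /proc/net/bonding/bond0 (bond name assumed from this report).
bond_failures() {
    file="${1:-/proc/net/bonding/bond0}"
    # Remember the most recent slave name, then print it next to its counter.
    awk -F': ' '
        /^Slave Interface:/    { slave = $2 }
        /^Link Failure Count:/ { print slave, $2 }
    ' "$file"
}

# Example usage (run on dom0, a few minutes apart, and compare the numbers):
#   bond_failures
```

Counters that keep climbing while the ARP target is reachable would suggest the failovers are still occurring.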
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
The default Xen 'network-bridge' script does not work properly with bonding. When used, it can cause the network interfaces to flap.

This patch introduces an alternate script which must be explicitly requested in the /etc/xen/xend-config.sxp file by replacing the standard network-script line with:

(network-script network-bridge-bonding netdev=bond0)

This script eliminates the netloop device and takes care not to break ARP monitoring during the address transfer process.
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,9 +1,7 @@
-The default Xen 'network-bridge' does not work properly with bonding. When used it can cause the network interfaces to flap.
+When setting up interface bonding on dom0, the default 'network-bridge' script may cause bonded network interfaces to alternately switch between 'unavailable' and 'available'. This occurrence is commonly known as flapping.
-This patch introduces a alternate script which must be explicitlyvrequested in the /etc/xen/xend-config.sxp file by replacing the
-standard network-script line with:
+To prevent this, replace the standard network-script line in /etc/xen/xend-config.sxp with the following line:
 (network-script network-bridge-bonding netdev=bond0)
-This script eliminates the netloop device and
+Doing so will disable the netloop device, which prevents Address Resolution Protocol (ARP) monitoring from failing during the address transfer process.
-takes care not to break ARP monitoring during the address transfer process.
Built into xen-3.0.3-78.el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0118.html