Bug 213559

Summary: bonding does not work with xen kernel
Product: Red Hat Enterprise Linux 5
Reporter: jordan hargrave <jordan_hargrave>
Component: xen
Assignee: Herbert Xu <herbert.xu>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: urgent
Priority: high
Version: 5.0
CC: jfeeney, mbrodeur, wwlinuxengineering, xen-maint
Hardware: All
OS: Linux
Fixed In Version: kernel-xen-2.6.18-1.2747.el5
Doc Type: Bug Fix
Last Closed: 2006-11-29 14:33:38 UTC
Bug Blocks: 189473, 200812

Description jordan hargrave 2006-11-01 23:25:49 UTC
Description of problem:
When booting the xen kernel, bonding configuration does not work


Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-1.2714.el5xen

How reproducible:
Always


Steps to Reproduce:
1. Install RHEL5B1-x86_64 + xen + dependencies (kernel-xen-2.6.18-1.2714.el5xen)
2. Set up channel bonding (any mode; I use mode=1).
3. Edit /etc/xen/xend-config.sxp to set netdev=bond0.
4. Reboot the system and try to ping another system or the gateway.
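
For reference, a minimal sketch of the configuration that steps 2 and 3 describe, using the usual RHEL 5 file locations. The miimon value, the IP address (a documentation-range placeholder), and the choice of eth0/eth1 as slaves are assumptions, not details from this report:

```text
# /etc/modprobe.conf
alias bond0 bonding
options bond0 mode=1 miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.0.2.10
NETMASK=255.255.255.0

# /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 is analogous)
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes

# /etc/xen/xend-config.sxp
(network-script 'network-bridge netdev=bond0')
```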

Actual results:
Ping does not work

Expected results:


Additional info:
Came across this thread on xen bonding:
   1. Sébastien Cramatte Says:
      August 28th, 2006 at 16:20

      I’ve tested your configuration like this and it works well.

      I’m trying to set up bond0 instead of eth0, but it’s a little bit more
      complicated. For example, you can’t enslave eth0+eth1 before running Xend,
      because to rename an interface, the interface must be down. So you should
      enslave directly in the network-bridge script …

      I will post more details ASAP!
   2. Sébastien Cramatte Says:
      August 28th, 2006 at 17:02

      It seems that only one change is required in the network-bridge script.

      You just need to enslave manually a second time. But be careful: you must
      bring up “bond0” and enslave “eth0+eth1” correctly in your main network
      config.

      After that, your main xend-config.sxp should have a line like this:

      (network-script my-network-bridge netdev=bond0)

      In the file “my-network-bridge”, which is just a copy of the original
      “network-bridge” script, you must add, just after:

      ip link set ${pdev} up

      this line:

      ifenslave ${pdev} eth0 eth1

      That seems to work for me with VLANs inside the domU …

      If anyone could test it …
   3. Sébastien Cramatte Says:
      August 29th, 2006 at 01:24

      After many tests …
      bonding does not work well!

      It’s very, very slow or loses packets …

      So I’ve decided to use only eth0.

      With your config I must force the VLAN interface MTU to 1496 in each domU.
      My e1000 cards handle packets > 1500 bytes, but the virtual Xen network
      driver apparently does not …
   4. Alejandro anadon Says:
      September 10th, 2006 at 19:53

      Nice document … but the original problem could be solved by running:
      ethtool -K eth0 tx off
      on each machine (domain0 and domainU). It seems to be a Xen bug. The
      problem with this is that I am not sure whether checksum verification is
      still done on each machine.
      See:
      http://wiki.xensource.com/xenwiki/XenFaq#head-4ce9767df34fe1c9cf4f85f7e07cb10110eae9b7
   5. Felipe Alfaro Solana Says:
      October 14th, 2006 at 09:51

      With respect to this subject of Xen, VLANs, and bonding, Sébastien
      CRAMATTE pointed me to the following bug on XenSource’s Bugzilla site:

      http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=753

      In this bug report, several users report problems when trying to use
      bonding and VLANs. I’m not completely sure where the problem lies, but it
      could possibly be related to the Linux kernel itself.
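
The two workarounds mentioned in the thread above (re-enslaving after the rename, and turning off TX checksum offload) can be sketched as a small shell helper. This is only an illustration: the device name pbond0 assumes the network-bridge script renames the physical device to "p" plus the netdev name, the helper function names are hypothetical, and the script defaults to a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints commands rather than running them.
DRY_RUN="${DRY_RUN:-1}"

# Print or execute a command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring the renamed bond device up and re-attach its slaves,
# as suggested in the second quoted comment.
reenslave() {
    pdev="$1"
    shift
    run ip link set "$pdev" up
    run ifenslave "$pdev" "$@"
}

# Turn off TX checksum offload on each given device,
# as suggested in the fourth quoted comment.
disable_tx_csum() {
    for dev in "$@"; do
        run ethtool -K "$dev" tx off
    done
}

# Assumed device names; adjust to the actual setup.
reenslave pbond0 eth0 eth1
disable_tx_csum eth0
```

Running it with DRY_RUN=0 as root would execute the commands for real; neither workaround is confirmed as the fix for this bug.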

Comment 1 jordan hargrave 2006-11-01 23:28:26 UTC
Another xen bonding thread:

http://lists.xensource.com/archives/html/xen-bugs/2006-09/msg00000.html

Comment 3 Amit Bhutani 2006-11-03 08:39:27 UTC
Critical bug. Changing severity.

Comment 4 Samuel Benjamin 2006-11-03 15:32:19 UTC
Raised priority and added ACK flags for approval into rhel5.

Comment 5 RHEL Program Management 2006-11-03 16:00:29 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 7 Samuel Benjamin 2006-11-15 21:28:21 UTC
Dell wants to make sure we do not come back later and tell them that we do not
support bonding with the Xen kernel. I have assured them that this is being
looked at and will be assigned to an engineer for evaluation into RHEL5.1.

Comment 8 Amit Bhutani 2006-11-15 21:39:23 UTC
> will be assigned to an engineer for evaluation into RHEL5.1.
Sammy, you surely mean RHEL 5.0?

Comment 9 Samuel Benjamin 2006-11-15 21:46:52 UTC
Yes, I was looking for this comment and I see that Amit has caught this
mistyping :-).  RHEL 5.0 it is.

Comment 10 John Feeney 2006-11-21 18:47:10 UTC
Perhaps because I am extremely literal, I got confused when I read the entire
bug description because I thought it was describing multiple bugs. I checked
with Dell and determined that the scope of this bugzilla should be limited to
just the behavior described in the "Description of problem" and perhaps the
first two "threads" listed under "Additional Info". 

Thus the bug description should be "ping does not work when you bond and bridge
due to a problem with the default bridging script". 

All additional "threads on xen bonding" (from Sébastien Cramatte et al.) should
be considered superfluous to the issue at hand. I hope this clarifies the issue.

Comment 11 Jay Turner 2006-11-21 19:24:18 UTC
Deferring QE ack until we scope the necessary work.

Comment 12 Herbert Xu 2006-11-22 03:10:01 UTC
Could you please show me the output of ifconfig before you start xend and the
output afterwards? Thanks.
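
One way to collect the requested output is sketched below. It assumes xend is not yet running (for example, after disabling it at boot with chkconfig and rebooting); the function names and file paths are hypothetical, and IFCMD is overridable only so the helper can be tried without touching real interfaces.

```shell
#!/bin/sh
# Sketch only. IFCMD is the command used to dump interface state.
IFCMD="${IFCMD:-ifconfig -a}"

# Save the current interface state to the given file.
snapshot() {
    $IFCMD > "$1" 2>&1
}

# Show what changed between two snapshots (exit status 1 means they differ).
compare() {
    diff -u "$1" "$2"
}

# Intended use, as root:
#   snapshot /tmp/ifconfig.before
#   service xend start
#   snapshot /tmp/ifconfig.after
#   compare /tmp/ifconfig.before /tmp/ifconfig.after
```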

Comment 13 Amit Bhutani 2006-11-28 23:54:25 UTC
It appears that this issue has been resolved with the Beta2 code drop. All of the
preliminary testing is positive. The issue can be marked as MODIFIED. Thanks!


Comment 14 Samuel Benjamin 2006-11-29 13:43:08 UTC
Dell verified fix. Requesting RH to close.



Comment 15 Axel Thimm 2007-03-04 09:17:35 UTC
I wonder whether this is really fixed. There is no mention of (bond|enslave)
anywhere in beta2's xen, and renaming the bond0 device will drop the slaves and
require re-enslaving. Unfortunately my RHEL5 test systems are not capable of
bonding, but the issue persists on FC6 (bug #189473) and also rawhide.

The only fix I can imagine that wouldn't involve re-enslaving in the xen
scripts is that renaming the bond device no longer loses its slaves. Is
that the case? If so, where exactly is the fix, so I can use it in Fedora?
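
A rough way to check whether slaves survive a rename is to read the bonding driver's sysfs slave list before and after. The sketch below assumes the kernel exposes /sys/class/net/<dev>/bonding/slaves; the function name is hypothetical, and SYSFS_ROOT is overridable only so the helper can be exercised against a fake directory tree.

```shell
#!/bin/sh
# Sketch only. SYSFS_ROOT normally stays at /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the slaves currently attached to a bonding device, one per line.
list_slaves() {
    f="$SYSFS_ROOT/class/net/$1/bonding/slaves"
    if [ -r "$f" ]; then
        tr ' ' '\n' < "$f" | sed '/^$/d'
    else
        echo "no bonding info for $1" >&2
        return 1
    fi
}

# Intended use, as root (the rename mirrors what network-bridge is described as doing):
#   list_slaves bond0
#   ip link set bond0 down
#   ip link set bond0 name pbond0
#   list_slaves pbond0
```

If the second call prints nothing, the slaves were dropped by the rename and re-enslaving is still needed.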