Description of problem:
When booting the xen kernel, the bonding configuration does not work.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-1.2714.el5xen

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL5B1-x86_64 + xen + dependencies (kernel-xen-2.6.18-1.2714.el5xen)
2. Set up channel bonding (any mode; I use mode=1)
3. Edit /etc/xen/xend-config.sxp to set netdev=bond0
4. Reboot the system and try to ping another system, or the gateway

Actual results:
Ping does not work

Expected results:

Additional info:
Came across this thread on xen bonding:

1. Sébastien Cramatte says (August 28th, 2006 at 16:20):
I've tested your configuration as is and it works well. I'm trying to set up bond0 instead of eth0, but it's a little more complicated. For example, you can't enslave eth0+eth1 before running xend, because to rename an interface, the interface must be down. So you should enslave directly in the network-bridge script ... I will post more details ASAP!

2. Sébastien Cramatte says (August 28th, 2006 at 17:02):
It seems that only one change is required in the network-bridge script: you just need to enslave manually a second time. But be careful: you must bring up "bond0" and enslave "eth0+eth1" correctly in your main network config. After that, your main xend-config.sxp should have a line like this:

(network-script my-network-bridge netdev=bond0)

In the file "my-network-bridge", which is just a copy of the original "network-bridge" script, you must add, just after:

ip link set ${pdev} up

this line:

ifenslave ${pdev} eth0 eth1

This seems to work for me with VLANs inside the domU ... If anyone could test it ...

3. Sébastien Cramatte says (August 29th, 2006 at 01:24):
After many tests ... bonding does not work well! It's very, very slow, or loses packets ... so I've decided to use only eth0. With your config I must force the vlan interface MTU to 1496 in each domU. My e1000 NICs are compatible with packets > 1500, but the virtual Xen network driver seems not to be ...

4.
Alejandro Anadon says (September 10th, 2006 at 19:53):
Nice document ... but the original problem could be solved if you run:

ethtool -K eth0 tx off

on each machine (domain0 and domainU). It seems to be a bug in Xen. The problem with this is that I am not sure whether the checksum verification is then done on each machine or not. See:
http://wiki.xensource.com/xenwiki/XenFaq#head-4ce9767df34fe1c9cf4f85f7e07cb10110eae9b7

5. Felipe Alfaro Solana says (October 14th, 2006 at 09:51):
With respect to this subject of Xen, VLANs, and bonding, Sébastien Cramatte pointed me to the following bug on XenSource's Bugzilla site:
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=753
In this bug report, several users express their concerns about problems when trying to use bonding and VLANs. I'm not completely sure where the problem lies, but it could possibly be related to the Linux kernel itself.
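The modification described in comment 2 above can be sketched as follows. This is a hypothetical excerpt, not the shipped script: the file name my-network-bridge, the exact insertion point, and the slave names eth0/eth1 are assumptions taken from that comment.

```shell
# /etc/xen/scripts/my-network-bridge -- a copy of the stock network-bridge
# script with one line added (sketch only; names follow comment 2 above).

# ... the existing network-bridge code renames the physical device to
# ${pdev} and brings it up:
ip link set ${pdev} up

# Added line: renaming an interface requires it to be down, and downing
# a bond releases its slaves, so re-enslave them here:
ifenslave ${pdev} eth0 eth1
```

xend-config.sxp then points at the copied script, e.g. (network-script my-network-bridge netdev=bond0), so the stock script stays untouched.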
Another xen bonding thread: http://lists.xensource.com/archives/html/xen-bugs/2006-09/msg00000.html
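The workaround from comment 4 disables TX checksum offloading so that packets leaving the virtual interfaces carry checksums the peer will accept. A minimal sketch, assuming the bond slaves are eth0 and eth1; per the comment, this would be run in dom0 and in each domU:

```shell
# Disable TX checksum offloading (workaround cited from the XenFaq link).
ethtool -K eth0 tx off
ethtool -K eth1 tx off

# Inspect the current offload settings to confirm the change took effect:
ethtool -k eth0
```

Note the comment's caveat stands: with offload disabled it is unclear where (or whether) checksum verification still happens on each host.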
Critical bug. Changing severity.
Raised priority and added ACK flags for approval into rhel5.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.
Dell wants to make sure we do not come back later and tell them that we do not support bonding with the Xen kernel. I have assured them that this is being looked at and will be assigned to an engineer for evaluation into RHEL5.1.
> will be assigned to an engineer for evaluation into RHEL5.1.

Sammy - you surely mean RHEL 5.0??
Yes, I was looking for this comment, and I see that Amit has caught this typo :-). RHEL 5.0 it is.
Perhaps because I am extremely literal, I got confused when I read the entire bug description, because I thought it was describing multiple bugs. I checked with Dell and determined that the scope of this bugzilla should be limited to just the behavior described in the "Description of problem" and perhaps the first two "threads" listed under "Additional info". Thus the bug description should be "ping does not work when you bond and bridge, due to a problem with the default bridging script". All additional "threads on xen bonding" (from Sébastien Cramatte et al.) should be considered superfluous to the issue at hand. I hope this clarifies the issue.
Deferring QE ack until we scope the necessary work.
Could you please show me the output of ifconfig before you start xend and the output afterwards? Thanks.
It appears that this issue has been resolved with the Beta2 code drop. All of the preliminary testing is positive. The issue can be marked as MODIFIED. Thanks!
Dell verified fix. Requesting RH to close.
I wonder whether this is really fixed. There is no mention of (bond|enslave) anywhere in beta2's xen, and renaming the bond0 device drops its slaves, which then need re-enslaving. Unfortunately my RHEL5 test systems are not capable of bonding, but the issue persists on FC6 (bug #189473) and also on rawhide. The only fix I can imagine that wouldn't involve re-enslaving in the xen scripts is that renaming the bond device no longer loses its slaves. Is that the case? If so, where exactly is the fix, so I can use it in Fedora?
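One way to check the question raised here — whether beta2's scripts re-enslave, or whether renaming a bond no longer drops its slaves — is to inspect the shipped script and the bond's slave list directly. A sketch; the script path and bond0/slave names are assumptions:

```shell
# Does the shipped bridge script mention bonding at all?
grep -nE 'bond|enslave' /etc/xen/scripts/network-bridge

# Before and after starting xend, list the bond's slaves; if the slave
# list is empty once the bridge is set up, the rename dropped them:
cat /proc/net/bonding/bond0
cat /sys/class/net/bond0/bonding/slaves
```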