Created attachment 1012410 [details]
Host Traceback if guest NIC is e1000

Description of problem:
If you use VLANs on top of an 802.3ad bond of ixgbe network interfaces, packets are lost and skb_warn_bad_offload is reported by the host or guest kernel.

Version-Release number of selected component (if applicable):
3.5-0.201502231653.el7

How reproducible:
Use VLANs on an 802.3ad bond that consists of Intel network interfaces, e.g. ixgbe (maybe e1000e is also affected?).

Steps to Reproduce:
1. Install oVirt Node
2. Create an 802.3ad bond with ixgbe slaves
3. Use VLANs

Actual results:
Every packet with certain parameters (probably size) is lost, and warnings appear in the guest or host kernel log.

Expected results:
Proper transmission of the packets.

Additional info:
If virtio_net is used for the guest NICs, the errors are reported by the guest kernel; if emulated NICs such as e1000 are used, the errors are reported by the host kernel.

The bug has been reported to the Intel network team here: http://sourceforge.net/p/e1000/bugs/434/
Created attachment 1012411 [details]
Guest Traceback if guest NIC is virtio_net
oVirt Node always pulls in the latest kernel from CentOS, so a fix here depends on when CentOS resolves this issue.
Have you checked whether one of the current builds fixes this issue?
I couldn't try it yet; http://resources.ovirt.org/pub/ovirt-3.5/iso/ currently yields a 500 Internal Server Error.
Right, there was a reorganization, please try the latest build from this CI job: http://jenkins.ovirt.org/job/ovirt-node_ovirt-3.5_create-iso-el7_merged/
I have the same problem and tried the latest build from Comment 5. It did not resolve the problem for me.
I got my hands back on a machine with the affected hardware. The problem is still there.
Okay, thanks for testing. This may actually be a duplicate of bug 1217848. We need to see when it gets resolved there, or in the upstream kernel.
I'm not authorized for the bug you referenced.
Yesterday, somebody reported something interesting in the referenced kernel bug:
https://bugzilla.kernel.org/show_bug.cgi?id=82471

The bug can be avoided by disabling Large Receive Offload (LRO). I was able to confirm this on our machines by disabling it manually. Unfortunately, I don't know how to make this change persist across reboots with oVirt Node, or how to apply it to a fleet of machines.

On a related note, the ixgbe README contains an important warning; maybe LRO should be disabled completely in the oVirt Node ixgbe module:

Important Note
--------------

WARNING: The ixgbe driver compiles by default with the Large Receive Offload (LRO) feature enabled. This option offers the lowest CPU utilization for receives but is completely incompatible with *routing/ip forwarding* and *bridging*. If enabling ip forwarding or bridging is a requirement, it is necessary to disable LRO using compile time options as noted in the LRO section later in this document. The result of not disabling LRO when combined with ip forwarding or bridging can be low throughput or even a kernel panic
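For anyone who wants to script the manual workaround, here is a minimal sketch. It assumes the bond is managed by the Linux bonding driver (so its slaves are listed in sysfs), that ethtool is installed, and that LRO is turned off per slave with `ethtool -K <iface> lro off`. The function names and the bond name `bond0` are hypothetical, and this does not solve persistence on oVirt Node: the setting is lost on reboot.

```python
import subprocess
from pathlib import Path
from typing import List


def bond_slaves(bond: str, sysfs_root: str = "/sys/class/net") -> List[str]:
    """Return the slave interface names of a bonding device, read from sysfs."""
    slaves_file = Path(sysfs_root) / bond / "bonding" / "slaves"
    # The file holds a whitespace-separated list of interface names.
    return slaves_file.read_text().split()


def lro_off_command(iface: str) -> List[str]:
    """Build the ethtool invocation that turns LRO off for one interface."""
    return ["ethtool", "-K", iface, "lro", "off"]


def disable_lro(bond: str, dry_run: bool = True,
                sysfs_root: str = "/sys/class/net") -> List[List[str]]:
    """Disable LRO on every slave of the given bond.

    With dry_run=True (the default) the commands are only returned,
    not executed, so they can be reviewed first.
    """
    commands = [lro_off_command(s) for s in bond_slaves(bond, sysfs_root)]
    if not dry_run:
        for cmd in commands:
            # Requires root; raises CalledProcessError if ethtool fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `disable_lro("bond0")` only returns the commands it would run; `disable_lro("bond0", dry_run=False)` executes them as root. The current state can be checked afterwards with `ethtool -k <iface>` (look for large-receive-offload).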
In the future it will be easier to make this kind of change, so I am deferring this bug.