Bug 1210086 - skb_warn_bad_offload if a bond of Intel interfaces and VLANs are used
Summary: skb_warn_bad_offload if a bond of Intel interfaces and VLANs are used
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: ovirt-node
Classification: oVirt
Component: General
Version: ---
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Fabian Deutsch
QA Contact: bugs@ovirt.org
URL:
Whiteboard: node
Depends On:
Blocks:
 
Reported: 2015-04-08 21:26 UTC by Sebastian Schrader
Modified: 2019-04-28 13:22 UTC (History)
12 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-10-22 13:05:31 UTC
oVirt Team: Node
Embargoed:
ylavi: ovirt-3.6.0?
ylavi: planning_ack?
ylavi: devel_ack?
ylavi: testing_ack?


Attachments
Host Traceback if guest NIC is e1000 (4.42 KB, text/plain)
2015-04-08 21:26 UTC, Sebastian Schrader
Guest Traceback if guest NIC is virtio_net (4.49 KB, text/plain)
2015-04-08 21:27 UTC, Sebastian Schrader


Links
Linux Kernel bug 82471 (Last Updated: 2019-05-14 07:01:22 UTC)

Description Sebastian Schrader 2015-04-08 21:26:38 UTC
Created attachment 1012410 [details]
Host Traceback if guest NIC is e1000

Description of problem:
If you use VLANs on top of an 802.3ad bond of ixgbe network interfaces, packets are lost and skb_warn_bad_offload warnings are reported by the host or guest kernel.

Version-Release number of selected component (if applicable):
3.5-0.201502231653.el7

How reproducible:
Use VLANs on an 802.3ad bond of Intel network interfaces, e.g. ixgbe (e1000e may also be affected).

Steps to Reproduce:
1. Install oVirt Node
2. Create an 802.3ad bond with ixgbe slaves
3. Use VLANs
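The steps above correspond roughly to the following manual setup, sketched with iproute2. The interface names (eth0, eth1) and the VLAN ID (100) are illustrative placeholders, not values taken from this report; oVirt normally performs the equivalent configuration itself.

```shell
# Create an 802.3ad (LACP) bond from two ixgbe ports.
# "eth0" and "eth1" are placeholders; substitute the real slave interfaces.
ip link add bond0 type bond mode 802.3ad
ip link set eth0 down
ip link set eth1 down
ip link set eth0 master bond0
ip link set eth1 master bond0
ip link set bond0 up

# Add a VLAN interface on top of the bond; VLAN ID 100 is arbitrary here.
ip link add link bond0 name bond0.100 type vlan id 100
ip link set bond0.100 up
```

With this configuration in place, traffic through the VLAN interface triggers the skb_warn_bad_offload warnings described below.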

Actual results:
All packets with certain parameters (probably related to size) are lost, and warnings appear in the guest or host kernel log.


Expected results:
Proper transmission of the packets.

Additional info:
If virtio_net is used for the guests' NICs, the errors are reported by the guest kernel; if emulated NICs like e1000 are used, the errors are reported by the host kernel.

The bug has been reported to the Intel network team here:
http://sourceforge.net/p/e1000/bugs/434/

Comment 1 Sebastian Schrader 2015-04-08 21:27:28 UTC
Created attachment 1012411 [details]
Guest Traceback if guest NIC is virtio_net

Comment 2 Fabian Deutsch 2015-04-13 11:20:44 UTC
oVirt Node always pulls in the latest kernel from CentOS, which means we depend on when CentOS fixes this issue.

Comment 3 Fabian Deutsch 2015-04-27 14:27:36 UTC
Could you check whether one of the current builds fixes this issue?

Comment 4 Sebastian Schrader 2015-04-27 22:12:10 UTC
I couldn't try it yet; http://resources.ovirt.org/pub/ovirt-3.5/iso/ currently yields a 500 Internal Server Error.

Comment 5 Fabian Deutsch 2015-04-29 12:19:51 UTC
Right, there was a reorganization, please try the latest build from this CI job:

http://jenkins.ovirt.org/job/ovirt-node_ovirt-3.5_create-iso-el7_merged/

Comment 6 Jurriën Bloemen 2015-05-11 11:43:38 UTC
I have the same problem, and I tried the latest build from comment 5.
It has not resolved the problem (for me).

Comment 7 Sebastian Schrader 2015-05-11 14:20:36 UTC
I got my hands back on a machine with the affected hardware. The problem is still there.

Comment 8 Fabian Deutsch 2015-05-28 07:04:53 UTC
Okay, thanks for testing. This may actually be a dupe of bug 1217848; we need to see when it gets resolved there, or in the upstream kernel.

Comment 9 Sebastian Schrader 2015-05-29 13:12:44 UTC
I'm not authorized for the bug you referenced.

Comment 10 Sebastian Schrader 2015-06-03 22:50:30 UTC
Yesterday, somebody reported something interesting in the referenced Kernel bug.
https://bugzilla.kernel.org/show_bug.cgi?id=82471

The bug can be avoided by disabling Large Receive Offload (LRO). I could confirm this on our machines by manually disabling it. Unfortunately, I don't know how to make this change persist across reboots on oVirt Node, or how to apply it to a fleet of machines.
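The runtime workaround can be sketched as follows, assuming the standard ethtool offload interface; the slave interface names are placeholders, not taken from this report:

```shell
# Disable Large Receive Offload on each ixgbe slave of the bond.
# "eth0" and "eth1" are placeholder names; substitute the real slaves.
for iface in eth0 eth1; do
    ethtool -K "$iface" lro off
done

# Verify that the setting took effect.
ethtool -k eth0 | grep large-receive-offload
```

Note this only changes the running state; how to persist it across reboots on oVirt Node (e.g. via the node's persistence mechanism or an ifcfg option) is exactly the open question above and is not confirmed here.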

On a related note the ixgbe README contains an important warning, maybe LRO should be disabled completely in the oVirt Node ixgbe module:

Important Note
--------------

WARNING: The ixgbe driver compiles by default with the Large Receive Offload
(LRO) feature enabled. This option offers the lowest CPU utilization for
receives but is completely incompatible with *routing/ip forwarding* and
*bridging*. If enabling ip forwarding or bridging is a requirement, it is
necessary to disable LRO using compile time options as noted in the LRO
section later in this document. The result of not disabling LRO when combined
with ip forwarding or bridging can be low throughput or even a kernel panic.

Comment 11 Fabian Deutsch 2015-10-22 13:05:31 UTC
In the future it will be easier to make this kind of change; deferring this bug.

