Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1210086

Summary:

skb_warn_bad_offload if a bond of Intel interfaces and VLANs are used

Product:

[oVirt] ovirt-node

Reporter:

Sebastian Schrader <sebastian.schrader+bugzilla.redhat.com>

Component:

General

Assignee:

Fabian Deutsch <fdeutsch>

Status:

CLOSED DEFERRED

QA Contact:

bugs <bugs>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

---

CC:

bugs, ecohen, fdeutsch, jbloemen, lsurette, mgoldboi, ovirt-bugs, pablo.iranzo, parsonsa, rbalakri, sebastian.schrader+bugzilla.redhat.com, yeylon

Target Milestone:

---

Keywords:

TestOnly

Target Release:

---

Flags:

ylavi: ovirt-3.6.0?
ylavi: planning_ack?
ylavi: devel_ack?
ylavi: testing_ack?

Hardware:

x86_64

OS:

Linux

Whiteboard:

node

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2015-10-22 13:05:31 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

Node

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
Host Traceback if guest NIC is e1000	none
Guest Traceback if guest NIC is virtio_net	none

Description Sebastian Schrader 2015-04-08 21:26:38 UTC

Created attachment 1012410
Host Traceback if guest NIC is e1000

Description of problem:
If you use VLANs on top of an 802.3ad bond of ixgbe network interfaces, packets are lost and skb_warn_bad_offload is reported by the host or guest kernel.

Version-Release number of selected component (if applicable):
3.5-0.201502231653.el7

How reproducible:
Use VLANs on an 802.3ad bond that consists of Intel network interfaces, e.g. ixgbe (maybe e1000e is also affected?).

Steps to Reproduce:
1. Install oVirt Node
2. Create an 802.3ad bond with ixgbe slaves
3. Use VLANs

Actual results:
Loss of every packet with certain parameters (probably size), and warnings in the guest or host kernel log.


Expected results:
Proper transmission of the packets.

Additional info:
If virtio_net is used for the guests' NICs, the errors are reported by the guest kernel; if emulated NICs like e1000 are used, the errors are reported by the host kernel.
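
To tell which side reports the warning, the kernel log on the host and in the guest can be searched for the function name. The following is only a rough sketch: the helper name `find_bad_offload_lines` is made up for illustration, and it assumes the warning lines contain the literal string `skb_warn_bad_offload`, as in the attached tracebacks.

```python
import subprocess


def find_bad_offload_lines(log_text):
    """Return the log lines that mention skb_warn_bad_offload."""
    return [line for line in log_text.splitlines()
            if "skb_warn_bad_offload" in line]


def scan_kernel_log():
    """Read the kernel ring buffer via dmesg and return matching lines.

    Assumption: dmesg is available and the caller may read the ring buffer
    (this usually requires root).
    """
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return find_bad_offload_lines(result.stdout)
```

Running `scan_kernel_log()` once on the host and once inside a guest should show where the warning ends up for a given guest NIC type.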

The bug has been reported to the Intel network team here:
http://sourceforge.net/p/e1000/bugs/434/

Comment 1 Sebastian Schrader 2015-04-08 21:27:28 UTC

Created attachment 1012411
Guest Traceback if guest NIC is virtio_net

Comment 2 Fabian Deutsch 2015-04-13 11:20:44 UTC

oVirt Node always pulls in the latest kernel from CentOS, so we depend on when CentOS fixes this issue.

Comment 3 Fabian Deutsch 2015-04-27 14:27:36 UTC

Did you check whether one of the current builds fixes this issue?

Comment 4 Sebastian Schrader 2015-04-27 22:12:10 UTC

I couldn't try it yet; http://resources.ovirt.org/pub/ovirt-3.5/iso/ currently returns a 500 Internal Server Error.

Comment 5 Fabian Deutsch 2015-04-29 12:19:51 UTC

Right, there was a reorganization. Please try the latest build from this CI job:

http://jenkins.ovirt.org/job/ovirt-node_ovirt-3.5_create-iso-el7_merged/

Comment 6 Jurriën Bloemen 2015-05-11 11:43:38 UTC

I have the same problem and tried the latest build from Comment 5.
It did not resolve the problem for me.

Comment 7 Sebastian Schrader 2015-05-11 14:20:36 UTC

I got my hands back on a machine with the affected hardware. The problem is still there.

Comment 8 Fabian Deutsch 2015-05-28 07:04:53 UTC

Okay, thanks for testing. This may actually be a duplicate of bug 1217848. We need to see when it gets resolved there or in the upstream kernel.

Comment 9 Sebastian Schrader 2015-05-29 13:12:44 UTC

I'm not authorized to view the bug you referenced.

Comment 10 Sebastian Schrader 2015-06-03 22:50:30 UTC

Yesterday, somebody reported something interesting in the referenced kernel bug:
https://bugzilla.kernel.org/show_bug.cgi?id=82471

The bug can be avoided by disabling Large Receive Offload (LRO). I confirmed this on our machines by disabling it manually. Unfortunately, I don't know how to make this change persist across reboots on oVirt Node or how to apply it to a fleet of machines.
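
For reference, a minimal sketch of the manual workaround, assuming it is done with `ethtool -K <iface> lro off` on each slave of the bond. The bond name `bond0` and the helper names are placeholders, not something taken from oVirt Node. The slave list is read from the bonding driver's sysfs file.

```python
import subprocess
from pathlib import Path


def bond_slaves(bond, sysfs_root="/sys/class/net"):
    """Return the slave interface names of a bond, read from sysfs."""
    slaves_file = Path(sysfs_root) / bond / "bonding" / "slaves"
    return slaves_file.read_text().split()


def lro_off_command(iface):
    """Build the ethtool command that disables LRO on one interface."""
    return ["ethtool", "-K", iface, "lro", "off"]


def disable_lro(bond, dry_run=True, sysfs_root="/sys/class/net"):
    """Disable LRO on every slave of the bond.

    With dry_run=True the commands are only printed, not executed.
    Executing them requires root.
    """
    commands = [lro_off_command(iface) for iface in bond_slaves(bond, sysfs_root)]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # "bond0" is a placeholder; substitute the actual bond name.
    disable_lro("bond0", dry_run=True)
```

This only changes the running configuration. The setting is lost on reboot, and the sketch does not address how to persist it on oVirt Node.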

On a related note, the ixgbe README contains an important warning; maybe LRO should be disabled completely in the oVirt Node ixgbe module:

Important Note
--------------

WARNING: The ixgbe driver compiles by default with the Large Receive Offload
(LRO) feature enabled. This option offers the lowest CPU utilization for
receives but is completely incompatible with *routing/ip forwarding* and
*bridging*. If enabling ip forwarding or bridging is a requirement, it is
necessary to disable LRO using compile time options as noted in the LRO
section later in this document. The result of not disabling LRO when combined
with ip forwarding or bridging can be low throughput or even a kernel panic

Comment 11 Fabian Deutsch 2015-10-22 13:05:31 UTC

In the future it will be easier to make this kind of change; deferring this bug.