Bug 430722

Summary: [RHEL5 U2] e1000e network issues while running kernel-xen variant
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: rc
Reporter: Jeff Burke <jburke>
Assignee: Andy Gospodarek <agospoda>
QA Contact: Martin Jenner <mjenner>
CC: auke-jan.h.kok, dzickus, jesse.brandeburg, peterm
Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Last Closed: 2008-05-21 15:08:17 UTC
Attachments: e1000e-crc-revert.patch (flags: none)

Description Jeff Burke 2008-01-29 16:22:45 UTC
Description of problem:
 When trying to install a Xen guest over NFS, the installation hangs. File
transfers with wget also fail.

Version-Release number of selected component (if applicable):
 2.6.18-72.el5

How reproducible:
 Always

Steps to Reproduce:
1. Install rhel5.U1 on a system with e1000 or e1000e
2. Install the kernel-xen 2.6.18-72.el5 kernel
3. tcpdump -e -v -p -n -i peth0 icmp&
4. ping -s 1472 bigpapi.boston.redhat.com
   If the rx length is 1514, the frame is fine; if it is 1518, the 4-byte FCS
   is still present.
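
The 1514 versus 1518 check in step 4 follows from header arithmetic. A minimal sketch, assuming an untagged IPv4 frame with no IP options (the function name and constants are illustrative and do not come from the report):

```python
# Header sizes for an untagged Ethernet II frame carrying IPv4 + ICMP echo.
ETH_HEADER = 14   # dst MAC (6) + src MAC (6) + ethertype (2)
IP_HEADER = 20    # IPv4 header without options (assumption)
ICMP_HEADER = 8   # ICMP echo header
FCS_LEN = 4       # Ethernet frame check sequence (CRC-32)


def expected_frame_length(ping_size, fcs_present=False):
    """Return the frame length tcpdump -e should report for `ping -s ping_size`."""
    length = ETH_HEADER + IP_HEADER + ICMP_HEADER + ping_size
    if fcs_present:
        length += FCS_LEN
    return length


print(expected_frame_length(1472))                    # FCS stripped
print(expected_frame_length(1472, fcs_present=True))  # FCS left on the frame
```

With `-s 1472` the IP datagram is exactly 1500 bytes, so any FCS bytes left on the frame push the reported length past 1514.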
  
Actual results:
 The install fails and large files cannot be transferred.

Additional info:
 This has been discussed between Herbert, Andy and Don.

Comment 1 Andy Gospodarek 2008-01-29 18:45:28 UTC
Created attachment 293315 [details]
e1000e-crc-revert.patch

Herbert deserves the credit on this one since he seems to have figured out what
the problem was.  If we revert this patch, we are back to stripping the crc in
software.

Running the commands from the description yields the following output:

13:42:17.075335 00:16:e6:8c:55:1e > 00:d0:01:25:30:0a, ethertype IPv4 (0x0800),
length 1514: (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto: ICMP (1),
length: 1500) 10.12.4.139 > 10.13.255.101: ICMP echo request, id 10767, seq 16,
length 1480
13:42:17.083581 00:d0:01:25:30:0a > 00:16:e6:8c:55:1e, ethertype IPv4 (0x0800),
length 1514: (tos 0x0, ttl 254, id 31830, offset 0, flags [DF], proto: ICMP
(1), length: 1500) 10.13.255.101 > 10.12.4.139: ICMP echo reply, id 10767, seq
16, length 1480
1480 bytes from ntap-storage0-b.boston.redhat.com (10.13.255.101): icmp_seq=16
ttl=254 time=8.25 ms
13:42:18.075355 00:16:e6:8c:55:1e > 00:d0:01:25:30:0a, ethertype IPv4 (0x0800),
length 1514: (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto: ICMP (1),
length: 1500) 10.12.4.139 > 10.13.255.101: ICMP echo request, id 10767, seq 17,
length 1480
13:42:18.081458 00:d0:01:25:30:0a > 00:16:e6:8c:55:1e, ethertype IPv4 (0x0800),
length 1514: (tos 0x0, ttl 254, id 32086, offset 0, flags [DF], proto: ICMP
(1), length: 1500) 10.13.255.101 > 10.12.4.139: ICMP echo reply, id 10767, seq
17, length 1480
1480 bytes from ntap-storage0-b.boston.redhat.com (10.13.255.101): icmp_seq=17
ttl=254 time=6.11 ms
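
Checking output like the above by eye gets tedious over a long capture. A rough sketch of pulling the link-level length out of `tcpdump -e` text, assuming the output format shown in this comment (the regular expression and function name are illustrative):

```python
import re

# Matches the link-level length that tcpdump -e prints after the ethertype.
LINK_LEN = re.compile(r"ethertype \S+ \(0x[0-9a-f]+\), length (\d+):")


def link_lengths(tcpdump_text):
    """Return the link-level frame lengths found in tcpdump -e output."""
    # Collapse wrapped lines so a record split across lines still matches.
    flat = " ".join(tcpdump_text.split())
    return [int(m) for m in LINK_LEN.findall(flat)]


sample = """13:42:17.075335 00:16:e6:8c:55:1e > 00:d0:01:25:30:0a, ethertype IPv4 (0x0800),
length 1514: (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto: ICMP (1),
length: 1500) 10.12.4.139 > 10.13.255.101: ICMP echo request, id 10767, seq 16,
length 1480"""

for length in link_lengths(sample):
    verdict = "FCS present" if length == 1518 else "ok" if length == 1514 else "unexpected"
    print(length, verdict)
```

The pattern keys on the `ethertype` prefix so that the IP and ICMP `length` fields later in each record are ignored.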

Comment 2 Andy Gospodarek 2008-01-29 19:06:34 UTC
Auke,

We are considering reverting this patch for e1000e since it doesn't play well
with xen bridging:

commit 140a74802894e9db57e5cd77ccff77e590ece5f3
Author: Auke Kok <auke-jan.h.kok>
Date:   Thu Oct 25 13:57:58 2007 -0700

    e1000e: Re-enable SECRC - crc stripping

    This workaround code performed software stripping instead of the
    hardware which can do it much faster. None of the e1000e target
    hardware has issues with this feature and should work fine. This
    gives us some performance back on receive, and removes some
    kludging stripping the 4 bytes.

    Signed-off-by: Auke Kok <auke-jan.h.kok>
    Signed-off-by: Jeff Garzik <jeff>

From the description it seems this will only affect performance, not specific
functionality.  Do you agree with that statement?



Comment 3 Auke Kok 2008-01-29 19:46:25 UTC
correct, however I wonder why this breaks Xen - it sounds like a similar problem
we had a while ago. Jesse, do you remember?

Comment 4 RHEL Program Management 2008-01-29 21:46:38 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Andy Gospodarek 2008-01-29 22:22:29 UTC
It seems very odd to me that this would make a difference at all, but when
frames are received on the bridge without this patch, they come through 4
bytes larger.

Is there any way that the stripping done by the e1000 hardware strips the FCS
(makes it all zeros), but doesn't change the received length?  Maybe the
subtraction of the length is still needed?  Or maybe this just happens on our
particular hardware?

07:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet
Controller (Copper) (rev 01)
07:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet
Controller (Copper) (rev 01)
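
One way to tell the cases raised here apart is to inspect the last 4 bytes of a received buffer. The sketch below is a heuristic under stated assumptions: the buffer starts at the destination MAC, and the helper names are hypothetical. It uses the fact that the Ethernet FCS is the standard CRC-32, which `zlib.crc32` computes, stored least significant byte first.

```python
import struct
import zlib

FCS_LEN = 4


def ethernet_fcs(frame_without_fcs):
    """Compute the Ethernet FCS (CRC-32) as the 4 bytes seen on the wire."""
    return struct.pack("<I", zlib.crc32(frame_without_fcs) & 0xFFFFFFFF)


def classify_trailer(buf):
    """Describe what the last 4 bytes of a received buffer look like."""
    body, trailer = buf[:-FCS_LEN], buf[-FCS_LEN:]
    if trailer == ethernet_fcs(body):
        return "valid FCS"      # hardware did not strip anything
    if trailer == b"\x00" * FCS_LEN:
        return "zeroed FCS"     # possibly stripped in place, length unchanged
    return "no FCS"             # trailer looks like ordinary payload


def strip_fcs(buf):
    """Software stripping: drop the trailing 4 bytes from the buffer."""
    return buf[:-FCS_LEN]


frame = bytes(range(60))                 # stand-in for a 60-byte frame
print(classify_trailer(frame + ethernet_fcs(frame)))
print(classify_trailer(frame + b"\x00" * 4))
print(classify_trailer(frame))
```

A payload that genuinely ends in four zero bytes would be misread as "zeroed FCS", so the result should be compared against the expected frame length as well.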


Comment 7 Jesse Brandeburg 2008-01-30 00:10:36 UTC
our older drivers used SECRC with no apparent issues.  This functionality should
work.  However there really is no harm in doing it in software if you're looking
for a quick fix.

here is an excerpt from the manual:
The SECRC bit controls whether the hardware strips the Ethernet CRC from the
received packet.  This stripping occurs prior to any checksum calculations.  The
stripped CRC is not DMA’d to host memory and is not included in the length
reported in the descriptor.
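
The behaviour described in the excerpt can be modelled in a few lines. This is only an illustration of the documented semantics, not driver code; the bit value for SECRC is an assumption recalled from the Linux e1000 headers and should be checked against the hardware manual.

```python
# Bit position of SECRC in the receive control register (RCTL).
# Assumption: value recalled from the Linux e1000 headers; verify before use.
RCTL_SECRC = 0x04000000
FCS_LEN = 4


def descriptor_length(wire_length, rctl):
    """Length the descriptor reports for a frame of wire_length bytes (FCS included)."""
    if rctl & RCTL_SECRC:
        return wire_length - FCS_LEN   # CRC stripped, not DMA'd, not counted
    return wire_length                 # CRC DMA'd and counted; driver must subtract


print(descriptor_length(1518, RCTL_SECRC))
print(descriptor_length(1518, 0))
```

If the hardware behaves as documented, a full-size frame is reported as 1514 bytes with SECRC set and as 1518 bytes without it, in which case the driver has to subtract the 4 bytes itself.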

Can someone explain what the actual expected result is (besides the obvious
"install over nfs should work") vs what was observed?

I'm referring specifically to the tcpdump/ping command output as it seems
everything is fine.

Comment 8 Jeff Burke 2008-01-30 02:54:33 UTC
It was also seen with the e1000e driver using this hardware:
01:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet
Controller (Copper) (rev 03)


Comment 9 Andy Gospodarek 2008-01-30 15:56:11 UTC
My test kernels have been updated to include a patch for this bugzilla.

http://people.redhat.com/agospoda/#rhel5

Please test them and report back your results.

Comment 10 Don Zickus 2008-02-01 21:11:12 UTC
Included in 2.6.18-77.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 13 errata-xmlrpc 2008-05-21 15:08:17 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html