Bug 656450

Summary: [e1000e] probe fails with error -5 (-EIO) - The NVM Checksum Is Not Valid
Product: Red Hat Enterprise Linux 5
Reporter: Flavio Leitner <fleitner>
Component: kernel
Assignee: Flavio Leitner <fleitner>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: high
Version: 5.4
CC: agospoda, bruce.w.allan, carolyn.wyborny, jesse.brandeburg, pcfe, tushar.n.dave
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-12-27 12:46:25 UTC
Type: ---
Attachments:
Description                                   Flags
194.el5-e1000e-disables-NVR-check.patch       none
dmesg while loading the patched driver        none

Description Flavio Leitner 2010-11-23 18:55:06 UTC
Description of problem:

The interface cannot be used because the e1000e driver fails to probe the NIC.
See the messages log below:

Sep  6 15:30:52 kernel: ACPI: PCI interrupt for device 0000:0d:00.0 disabled
Sep  6 15:30:52 kernel: e1000e: probe of 0000:0d:00.0 failed with error -5
Sep  6 15:30:52 kernel: PCI: Enabling device 0000:0d:00.1 (0040 -> 0043)
Sep  6 15:30:52 kernel: ACPI: PCI Interrupt 0000:0d:00.1[B] -> GSI 31 (level, low) -> IRQ 193
Sep  6 15:30:52 kernel: 0000:0d:00.1: 0000:0d:00.1: Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
Sep  6 15:30:55 kernel: 0000:0d:00.1: 0000:0d:00.1: The NVM Checksum Is Not Valid
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep  6 15:30:55 kernel: ACPI: PCI interrupt for device 0000:0d:00.1 disabled
Sep  6 15:30:55 kernel: e1000e: probe of 0000:0d:00.1 failed with error -5

HP ProLiant DL785 G6
https://hardware.redhat.com/show.cgi?id=538598

Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 05)
8086:105e 

Looking at the source code, the probe routine does the following:

drivers/net/e1000e/netdev.c:
5079          * systems with ASPM and others may see the checksum fail on the first
5080          * attempt. Let's give it a few tries
5081          */
5082         for (i = 0;; i++) {
5083                 if (e1000_validate_nvm_checksum(&adapter->hw) >= 0)
5084                         break;
5085                 if (i == 2) {
5086                         e_err("The NVM Checksum Is Not Valid\n");
5087                         err = -EIO;
5088                         goto err_eeprom;
5089                 }
5090         }
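
For context, my understanding of the generic e1000e NVM code is that the check amounts to a 16-bit sum over NVM words 0x00 through 0x3F, which must equal 0xBABA. The sketch below is a userspace illustration of that rule, not the driver code: the helper names are hypothetical, and it assumes the NVM contents are already available as an array of 16-bit words in host byte order (for example, converted from a raw EEPROM dump).

```c
#include <stddef.h>
#include <stdint.h>

/* Constants as I understand them from the generic e1000e NVM code:
 * the 16-bit sum of words 0x00..0x3F must equal 0xBABA. */
#define NVM_CHECKSUM_REG 0x3F
#define NVM_SUM          0xBABA

/* Hypothetical helper: 16-bit wrapping sum of words 0x00..0x3F. */
static uint16_t nvm_sum_words(const uint16_t *words)
{
    uint16_t sum = 0;

    for (size_t i = 0; i <= NVM_CHECKSUM_REG; i++)
        sum = (uint16_t)(sum + words[i]);
    return sum;
}

/* Hypothetical helper: returns 1 if the image passes the check, else 0. */
static int nvm_checksum_ok(const uint16_t *words)
{
    return nvm_sum_words(words) == NVM_SUM;
}

/* Hypothetical helper: the value word 0x3F would need to hold so that
 * the sum over words 0x00..0x3F comes out to NVM_SUM. */
static uint16_t nvm_expected_checksum_word(const uint16_t *words)
{
    uint16_t sum = 0;

    for (size_t i = 0; i < NVM_CHECKSUM_REG; i++)
        sum = (uint16_t)(sum + words[i]);
    return (uint16_t)(NVM_SUM - sum);
}
```

Running nvm_checksum_ok() over a dump from the affected card would show whether the stored contents are actually corrupt or whether only the read path is failing. Comparing word 0x3F against nvm_expected_checksum_word() shows how far off the stored checksum is.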

To get the ethtool -e output, a test kernel with lines 5087 and 5088 commented out was provided. Unfortunately, that kernel hangs and the soft lockup watchdog prints the following:

BUG: soft lockup - CPU#45 stuck for 10s! [insmod:14756]
...
Pid: 14756, comm: insmod Tainted: P      2.6.18-194.el5.gss00320226.1 #1
RIP: 0010:[<ffffffff887edcd9>]  [<ffffffff887edcd9>] :e1000e:e1000e_poll_eerd_eewr_done+0x17/0x43
...
Call Trace:
 [<ffffffff887edcf5>] :e1000e:e1000e_poll_eerd_eewr_done+0x33/0x43
 [<ffffffff887ef13c>] :e1000e:e1000e_read_nvm_eerd+0x52/0x85
 [<ffffffff887ee2b4>] :e1000e:e1000e_validate_nvm_checksum_generic+0x26/0x50
 [<ffffffff887e9273>] :e1000e:e1000_validate_nvm_checksum_82571+0x8d/0x94
 [<ffffffff887e9c73>] :e1000e:e1000_reset_hw_82571+0xd6/0x145
 [<ffffffff887f9eb5>] :e1000e:e1000_probe+0x554/0xb7f
 [<ffffffff8015e733>] pci_device_probe+0x104/0x184
 [<ffffffff801c8873>] driver_probe_device+0x52/0xaa
 [<ffffffff801c89a2>] __driver_attach+0x65/0xb6
 [<ffffffff801c893d>] __driver_attach+0x0/0xb6
 [<ffffffff801c817a>] bus_for_each_dev+0x43/0x6e
 [<ffffffff801c7db6>] bus_add_driver+0x76/0x110
 [<ffffffff8015ea4f>] __pci_register_driver+0x51/0xa6
 [<ffffffff800a7fe0>] sys_init_module+0xaf/0x1f2
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

The kernel and the patch (linux-kernel-test.patch) are in CVS.

Version-Release number of selected component (if applicable):
kernel-2.6.18-164.el5 (rhel-x86_64-server-5)

How reproducible:
Always

Steps to Reproduce:
1. Not known

Comment 3 Flavio Leitner 2010-11-23 19:24:47 UTC
Created attachment 462421 [details]
194.el5-e1000e-disables-NVR-check.patch

Comment 4 Flavio Leitner 2010-11-23 19:27:04 UTC
Created attachment 462422 [details]
dmesg while loading the patched driver

Comment 5 Flavio Leitner 2010-11-24 14:30:32 UTC
We will try to get a dump using the flashrom utility.

Comment 7 Tushar Dave 2010-12-22 20:09:11 UTC
Flavio,
Sorry for the delayed response.
Do you still have this issue?

Comment 8 Flavio Leitner 2010-12-27 12:46:25 UTC
Hi Tushar,

The customer has replaced the card and it is working for now.

Flavio