Bug 656450 - [e1000e] probe fails with error -5 (-EIO) - The NVM Checksum Is Not Valid
Summary: [e1000e] probe fails with error -5 (-EIO) - The NVM Checksum Is Not Valid
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Flavio Leitner
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-11-23 18:55 UTC by Flavio Leitner
Modified: 2018-10-27 13:19 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-12-27 12:46:25 UTC
Target Upstream Version:
Embargoed:


Attachments
194.el5-e1000e-disables-NVR-check.patch (461 bytes, patch)
2010-11-23 19:24 UTC, Flavio Leitner
dmesg while loading the patched driver (122.64 KB, text/plain)
2010-11-23 19:27 UTC, Flavio Leitner

Description Flavio Leitner 2010-11-23 18:55:06 UTC
Description of problem:

The interface cannot be used because the e1000e driver fails to probe the NIC.
See the messages log below:

Sep  6 15:30:52 kernel: ACPI: PCI interrupt for device 0000:0d:00.0 disabled
Sep  6 15:30:52 kernel: e1000e: probe of 0000:0d:00.0 failed with error -5
Sep  6 15:30:52 kernel: PCI: Enabling device 0000:0d:00.1 (0040 -> 0043)
Sep  6 15:30:52 kernel: ACPI: PCI Interrupt 0000:0d:00.1[B] -> GSI 31 (level, low) -> IRQ 193
Sep  6 15:30:52 kernel: 0000:0d:00.1: 0000:0d:00.1: Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
Sep  6 15:30:55 kernel: 0000:0d:00.1: 0000:0d:00.1: The NVM Checksum Is Not Valid
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep  6 15:30:55 kernel: ACPI: PCI interrupt for device 0000:0d:00.1 disabled
Sep  6 15:30:55 kernel: e1000e: probe of 0000:0d:00.1 failed with error -5

HP ProLiant DL785 G6
https://hardware.redhat.com/show.cgi?id=538598

Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 05)
8086:105e 

Looking at the source code, the driver does the following:

drivers/net/e1000e/netdev.c:
5079          * systems with ASPM and others may see the checksum fail on the first
5080          * attempt. Let's give it a few tries
5081          */
5082         for (i = 0;; i++) {
5083                 if (e1000_validate_nvm_checksum(&adapter->hw) >= 0)
5084                         break;
5085                 if (i == 2) {
5086                         e_err("The NVM Checksum Is Not Valid\n");
5087                         err = -EIO;
5088                         goto err_eeprom;
5089                 }
5090         }
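
For reference, the generic e1000e check sums the 16-bit NVM words 0x00 through 0x3F and expects the result to equal 0xBABA (NVM_CHECKSUM_REG and NVM_SUM in the driver headers). If a raw dump of the EEPROM can be obtained, the same sum can be checked offline. The following is only a minimal user-space sketch of that idea, not the driver code: the helper names are hypothetical, and it assumes the dump stores each word in little-endian byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in the e1000e driver headers. */
#define NVM_CHECKSUM_REG 0x003F  /* word offset that holds the checksum */
#define NVM_SUM          0xBABA  /* expected 16-bit sum of words 0x00..0x3F */

/*
 * Hypothetical helper: convert a raw byte dump into 16-bit words.
 * Assumes the dump is little-endian, two bytes per word.
 */
static void nvm_bytes_to_words(const uint8_t *bytes, uint16_t *words,
                               size_t nwords)
{
    size_t i;

    assert(bytes != NULL && words != NULL);
    for (i = 0; i < nwords; i++)
        words[i] = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
}

/*
 * Sum words 0x00..0x3F modulo 2^16.
 * `words` must hold at least NVM_CHECKSUM_REG + 1 entries.
 */
static uint16_t nvm_sum_words(const uint16_t *words)
{
    uint16_t sum = 0;
    size_t i;

    assert(words != NULL);
    for (i = 0; i <= NVM_CHECKSUM_REG; i++)
        sum = (uint16_t)(sum + words[i]);
    return sum;
}

/* Non-zero when the image passes the 0xBABA check. */
static int nvm_checksum_ok(const uint16_t *words)
{
    return nvm_sum_words(words) == NVM_SUM;
}

/*
 * Value that word 0x3F would need to hold for the image to pass,
 * given the current contents of words 0x00..0x3E.
 */
static uint16_t nvm_expected_checksum_word(const uint16_t *words)
{
    uint16_t sum = 0;
    size_t i;

    assert(words != NULL);
    for (i = 0; i < NVM_CHECKSUM_REG; i++)
        sum = (uint16_t)(sum + words[i]);
    return (uint16_t)(NVM_SUM - sum);
}
```

Comparing nvm_expected_checksum_word() against the stored word 0x3F would show whether only the checksum word is off or whether the dump looks corrupted more broadly (for example all 0x0000 or all 0xFFFF words).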

To obtain the ethtool -e output, a test kernel with lines #5087 and #5088 commented out was provided. Unfortunately, that kernel hangs and the watchdog prints the following:

BUG: soft lockup - CPU#45 stuck for 10s! [insmod:14756]
...
Pid: 14756, comm: insmod Tainted: P      2.6.18-194.el5.gss00320226.1 #1
RIP: 0010:[<ffffffff887edcd9>]  [<ffffffff887edcd9>] :e1000e:e1000e_poll_eerd_eewr_done+0x17/0x43
...
Call Trace:
 [<ffffffff887edcf5>] :e1000e:e1000e_poll_eerd_eewr_done+0x33/0x43
 [<ffffffff887ef13c>] :e1000e:e1000e_read_nvm_eerd+0x52/0x85
 [<ffffffff887ee2b4>] :e1000e:e1000e_validate_nvm_checksum_generic+0x26/0x50
 [<ffffffff887e9273>] :e1000e:e1000_validate_nvm_checksum_82571+0x8d/0x94
 [<ffffffff887e9c73>] :e1000e:e1000_reset_hw_82571+0xd6/0x145
 [<ffffffff887f9eb5>] :e1000e:e1000_probe+0x554/0xb7f
 [<ffffffff8015e733>] pci_device_probe+0x104/0x184
 [<ffffffff801c8873>] driver_probe_device+0x52/0xaa
 [<ffffffff801c89a2>] __driver_attach+0x65/0xb6
 [<ffffffff801c893d>] __driver_attach+0x0/0xb6
 [<ffffffff801c817a>] bus_for_each_dev+0x43/0x6e
 [<ffffffff801c7db6>] bus_add_driver+0x76/0x110
 [<ffffffff8015ea4f>] __pci_register_driver+0x51/0xa6
 [<ffffffff800a7fe0>] sys_init_module+0xaf/0x1f2
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

The kernel and the patch (linux-kernel-test.patch) are in CVS.

Version-Release number of selected component (if applicable):
kernel-2.6.18-164.el5 (rhel-x86_64-server-5)

How reproducible:
Always

Steps to Reproduce:
1. Not known

Comment 3 Flavio Leitner 2010-11-23 19:24:47 UTC
Created attachment 462421
194.el5-e1000e-disables-NVR-check.patch

Comment 4 Flavio Leitner 2010-11-23 19:27:04 UTC
Created attachment 462422
dmesg while loading the patched driver

Comment 5 Flavio Leitner 2010-11-24 14:30:32 UTC
We will try to get a dump using the flashrom utility.

Comment 7 Tushar Dave 2010-12-22 20:09:11 UTC
Flavio,
Sorry for the delayed response.
Do you still have this issue?

Comment 8 Flavio Leitner 2010-12-27 12:46:25 UTC
Hi Tushar,

The customer has replaced the card and it is working for now.

Flavio

