Bug 463503

Summary: EEPROM/NVM of the e1000e becomes corrupted
Product: Red Hat Enterprise Linux 5
Reporter: Andy Gospodarek <agospoda>
Component: kernel
Assignee: Andy Gospodarek <agospoda>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: medium
Priority: medium
Version: 5.3
CC: bruce.w.allan, ddomingo, djuran, dkovalsk, gbeshers, jane.lv, jcm, jesse.brandeburg, john.ronciak, jvillalo, keve.a.gabbert, ltroan, martinez, peterm, rpacheco, tao, youquan.song
Target Milestone: beta
Hardware: All
OS: Linux
URL: http://bugzilla.kernel.org/show_bug.cgi?id=11382
Doc Type: Bug Fix
Doc Text:
The e1000e driver for Intel(R) PRO/1000 ethernet devices has been updated to the upstream version 0.3.3.3-k2. With this update, the EEPROM and NVM of supported devices are now write-protected.
Last Closed: 2009-01-20 20:06:21 UTC
Bug Depends On: 459202    
Bug Blocks: 432382, 436045, 454962    

Description Andy Gospodarek 2008-09-23 18:46:48 UTC
+++ This bug was initially created as a clone of Bug #459202 +++

Description of problem:
I am unable to use my Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 03). The system does not see it. Please find the dmesg output below.


e1000e: Intel(R) PRO/1000 Network Driver - 0.2.0
e1000e: Copyright (c) 1999-2007 Intel Corporation.
ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:19.0 to 64
iTCO_vendor_support: vendor-support=0
0000:00:19.0: The NVM Checksum Is Not Valid
ACPI: PCI interrupt for device 0000:00:19.0 disabled
e1000e: probe of 0000:00:19.0 failed with error -5

Version-Release number of selected component (if applicable):
Driver version 0.2.0

How reproducible:
Happens every time.

Steps to Reproduce:
1. Boot the computer.

--- Additional comment from yaneti on 2008-08-14 20:44:06 EDT ---

What kernel version is this? Has this adapter ever worked under Fedora? If so, when did it stop?

--- Additional comment from klich.michal on 2008-08-15 19:36:51 EDT ---

I am sorry, I totally forgot about these details.
Kernels which I have:
2.6.25.11-97.fc9.x86_64
2.6.25.14-108.fc9.x86_64

I guess it stopped shortly after I upgraded to F9. It must have been one of the first kernel updates. I am not sure it ever worked in F9.

Strangely, I cannot use it on Ubuntu either. I do not have the dmesg output from there yet; I will try and see if it matches.

--- Additional comment from cebbert on 2008-08-21 21:42:03 EDT ---

Can you post the output of 'lspci -nn -s 0000:00:19.0'?

--- Additional comment from klich.michal on 2008-08-22 02:41:39 EDT ---

Output you have requested:

00:19.0 Ethernet controller [0200]: Intel Corporation 82566DC Gigabit Network Connection [8086:104b] (rev 03)

--- Additional comment from jesse.brandeburg on 2008-09-11 13:27:22 EDT ---

The driver you have supports your hardware, but is erroring out on load.
The "NVM checksum is not valid" means that something corrupted your system BIOS flash.

Can you please give us details about the hardware in your system, attach the output of 
# lspci -vvv > lspci.txt

# dmidecode > dmiout.txt

We have reports that many Lenovo systems are starting to have this issue.

Please DO NOT run ibautil, as some sites on the web suggest, to try to fix this issue. It will likely force you to replace your motherboard to get LAN functionality back.

--- Additional comment from klich.michal on 2008-09-11 18:31:41 EDT ---

Created an attachment (id=316491)
dmiout.txt

--- Additional comment from klich.michal on 2008-09-11 18:32:07 EDT ---

Created an attachment (id=316492)
lspci.txt

--- Additional comment from klich.michal on 2008-09-12 02:28:13 EDT ---

I have messed around a little with my card. I just wanted to check some of the suggestions pointed out here: http://www.thinkwiki.org/wiki/Problem_with_e1000:_EEPROM_Checksum_Is_Not_Valid#Solutions

The little orange LED on my Ethernet port is constantly flashing; unloading the e1000e module did not change anything. When I plugged in a cable, it stopped and the green LED came on, meaning that the link is OK, though the driver still failed to load.
If you need any other info, I will gladly help.

--- Additional comment from jesse.brandeburg on 2008-09-12 12:28:34 EDT ---

Okay, so you have an HP machine with an ICH8 chipset. I don't know what the little orange LED flashing means; I will have to check on that.

Can you get into the iAMT setup by pressing CTRL-P just after the BIOS completes?
I'm not sure whether that will help you or not.

If I attach a debug driver here would you be willing to compile and run it?

--- Additional comment from klich.michal on 2008-09-14 10:42:31 EDT ---

I am not able to open the iAMT setup. I believe I do not have that option: from what I have found, it has to be enabled in the Power section of the BIOS settings, and that setting is not there.

Yes, please attach driver.

--- Additional comment from jesse.brandeburg on 2008-09-22 19:08:54 EDT ---

Created an attachment (id=317425)
driver with csum check bypass

Here is a driver that just prints the message but doesn't error out if the checksum validation fails.

This should allow you to run ethtool -e ethX after loading the driver.
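A minimal sketch of the check that is failing here, assuming the e1000e convention that NVM words 0x00 through 0x3F (the last being the checksum word itself) must sum, modulo 2^16, to 0xBABA. It works on a raw byte dump such as `ethtool -e ethX raw on` would produce, and assumes little-endian words as on x86; `nvm_checksum_ok` is a hypothetical helper, not a driver function.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed e1000e layout: word 0x3F holds the checksum, and words
 * 0x00..0x3F must sum (mod 2^16) to 0xBABA. */
#define NVM_CHECKSUM_REG 0x3F
#define NVM_SUM 0xBABA

/* Checks a raw NVM dump. Returns 1 if the checksum is valid, 0 if it is
 * not, and -1 if the dump is too short to contain the checksummed words.
 * Words are assumed to be stored little-endian (low byte first). */
int nvm_checksum_ok(const uint8_t *dump, size_t len)
{
    uint16_t sum = 0;

    if (len < 2u * (NVM_CHECKSUM_REG + 1))
        return -1;

    for (size_t i = 0; i <= NVM_CHECKSUM_REG; i++) {
        /* Reassemble each 16-bit word from its two bytes. */
        uint16_t word = (uint16_t)(dump[2 * i] | (dump[2 * i + 1] << 8));
        sum = (uint16_t)(sum + word); /* wraps modulo 2^16 */
    }
    return sum == NVM_SUM;
}
```

A result of 0 on a dump taken with the bypass driver would be consistent with the probe error above; it does not by itself say how the contents were corrupted.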

--- Additional comment from jesse.brandeburg on 2008-09-22 19:10:02 EDT ---

The difference in the driver I just attached is:
diff -rup e1000e-0.4.1.7.orig/src/netdev.c e1000e-0.4.1.7/src/netdev.c
--- e1000e-0.4.1.7.orig/src/netdev.c    2008-06-23 09:27:33.000000000 -0700
+++ e1000e-0.4.1.7/src/netdev.c 2008-09-22 16:06:59.000000000 -0700
@@ -56,7 +56,7 @@

 #define DRV_DEBUG

-#define DRV_VERSION "0.4.1.7" DRV_NAPI DRV_DEBUG
+#define DRV_VERSION "0.4.1.7_nocsum" DRV_NAPI DRV_DEBUG
 char e1000e_driver_name[] = "e1000e";
 const char e1000e_driver_version[] = DRV_VERSION;

@@ -5309,8 +5309,10 @@ static int __devinit e1000_probe(struct
                        break;
                if (i == 2) {
                        e_err("The NVM Checksum Is Not Valid\n");
+                       /* JJJ skip around error path
                        err = -EIO;
                        goto err_eeprom;
+                        JJJ end */
                }
        }
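The probe-time check this patch touches can be modelled roughly as below. This is a sketch, not the driver source: `check_nvm_at_probe`, `bypass_csum`, `struct fake_hw` and `fake_validate` are hypothetical names. If the loop header is the unbounded `for (i = 0;; i++)` form used in the upstream driver, the bypass also needs a way out of the loop once validation keeps failing, so the sketch uses an explicit `break`.

```c
#include <errno.h>
#include <stdio.h>

/* Validation callback: returns >= 0 if the NVM checksum is good,
 * negative otherwise (hypothetical signature). */
typedef int (*nvm_validate_fn)(void *hw);

/* Hypothetical model of the probe-time check: try up to three times, and
 * on the third failure log the error, then either fail with -EIO (stock
 * behaviour) or carry on (the _nocsum debug behaviour, via bypass_csum). */
int check_nvm_at_probe(nvm_validate_fn validate, void *hw, int bypass_csum)
{
    for (int i = 0;; i++) {
        if (validate(hw) >= 0)
            break;
        if (i == 2) {
            fprintf(stderr, "The NVM Checksum Is Not Valid\n");
            if (!bypass_csum)
                return -EIO;
            break; /* explicit exit so the bypass cannot loop forever */
        }
    }
    return 0;
}

/* Hypothetical fake device for trying the model: fails the first
 * failures_left validations, then succeeds; counts every call. */
struct fake_hw {
    int failures_left;
    int calls;
};

int fake_validate(void *hw)
{
    struct fake_hw *f = hw;

    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        return -1;
    }
    return 0;
}
```

The retry matters because a transient failure on the first read (as on the "flaky" fake) still lets probe succeed, while a persistently bad checksum fails after exactly three attempts.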

--- Additional comment from jesse.brandeburg on 2008-09-22 19:35:50 EDT ---

Also, a whole pile of reports is now starting to converge; many of them are linked here:

http://bugzilla.kernel.org/show_bug.cgi?id=11382

I'm working on a plan to help address this as soon as possible.

--- Additional comment from cebbert on 2008-09-22 21:40:51 EDT ---

Michal, have you ever booted a Fedora 10 Alpha or rawhide disk on that system?

--- Additional comment from klich.michal on 2008-09-23 02:51:35 EDT ---

Yes, I have rawhide on my system.
The last two kernels I have are:
2.6.27-0.226.rc1.git5.fc10.i686
2.6.27-0.244.rc2.git1.fc10.i686

I do not know which one killed my port. If you want me to run one of them, note that I have no Internet connection on those kernels: Wi-Fi does not work, and Ethernet, as you know, does not either.

--- Additional comment from wtogami on 2008-09-23 10:16:33 EDT ---

Does this mean Fedora 9 is not to blame for killing e1000e?

Slashdot reported that Fedora 9 and 10 are affected, but it sounds like only rawhide has the problem.

--- Additional comment from jcm on 2008-09-23 11:16:21 EDT ---

FWIW, I've heard of similar problems with recent -RT kernels.

Comment 4 Ronald Pacheco 2008-09-30 19:59:07 UTC
John,

Can you update this BZ with your latest updates?

Comment 5 John Ronciak 2008-09-30 20:22:03 UTC
We have been submitting patches to the e1000e driver to protect the NVM. Later today we will submit another patch that makes the NVM read-only. If you want to write to the NVM for any reason, you will need to reload the driver with a load-time parameter that enables NVM writes. We will be pushing this patch as _the_ mechanism to protect the NVM for the e1000e driver, so with this fix added we see no reason to hold up the e1000e driver update. None of these patches fixes the cause of the problem, but they protect the NVM from being corrupted, so the LOM will still be seen on the PCI bus after the next reboot.
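A minimal model of the behaviour described above: NVM writes are refused unless the driver was loaded with write protection turned off. All names here (`struct nvm_dev`, `write_protect_nvm`, `nvm_write`, `NVM_WORDS`) and the `-EPERM` return are hypothetical. On ICH-based parts the real protection is believed to be applied in the flash controller (protected-range registers) rather than by a software flag, and the upstream load-time option is believed to be the `WriteProtectNVM` module parameter; both should be checked against the driver actually in use.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NVM_WORDS 64 /* hypothetical NVM size for the model */

/* Hypothetical device model; write_protect_nvm stands in for a load-time
 * module parameter that defaults to "protected". */
struct nvm_dev {
    bool write_protect_nvm;
    uint16_t words[NVM_WORDS];
};

/* Writes count words at offset unless the NVM is write-protected.
 * Returns 0 on success, -EPERM if protected, -EINVAL if out of range. */
int nvm_write(struct nvm_dev *dev, size_t offset,
              const uint16_t *data, size_t count)
{
    if (dev->write_protect_nvm)
        return -EPERM;
    if (offset > NVM_WORDS || count > NVM_WORDS - offset)
        return -EINVAL;
    for (size_t i = 0; i < count; i++)
        dev->words[offset + i] = data[i];
    return 0;
}
```

Defaulting to "protected" is the point of the design: a stray write from elsewhere in the kernel, or from a user tool, is rejected unless someone deliberately reloads the driver to allow it.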

Comment 8 RHEL Program Management 2008-10-02 14:44:26 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 12 Don Zickus 2008-10-06 15:56:28 UTC
in kernel-2.6.18-118.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 14 Jesse Brandeburg 2008-10-21 20:46:00 UTC
Since the root cause was traced to CONFIG_DYNAMIC_FTRACE, and I don't think EL5 has ftrace enabled (correct me if I'm wrong), I don't think there is anything to worry about for EL5.
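Whether a given kernel build enables the option blamed here can be checked from its config file (on Fedora and RHEL, typically `/boot/config-$(uname -r)`). A minimal sketch that scans config text for the exact line `CONFIG_DYNAMIC_FTRACE=y`; `config_has_dynamic_ftrace` is a hypothetical helper, and `grep '^CONFIG_DYNAMIC_FTRACE=y$'` over the same file does the same job.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the kernel .config text enables CONFIG_DYNAMIC_FTRACE.
 * Only a whole line "CONFIG_DYNAMIC_FTRACE=y" counts, so the
 * "# CONFIG_DYNAMIC_FTRACE is not set" form and longer option names that
 * merely share the prefix are not matched. */
bool config_has_dynamic_ftrace(const char *config)
{
    static const char want[] = "CONFIG_DYNAMIC_FTRACE=y";
    const size_t n = sizeof want - 1;
    const char *line = config;

    while (line && *line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len == n && strncmp(line, want, n) == 0)
            return true;
        line = end ? end + 1 : NULL; /* advance to the next line, if any */
    }
    return false;
}
```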

Comment 15 Andy Gospodarek 2008-10-22 13:59:33 UTC
You are correct, Jesse.  We will ship with the protection anyway since it's already in there and will ease the concerns of some of our customers.

Comment 18 Don Domingo 2008-11-21 03:48:55 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The e1000e driver for Intel(R) PRO/1000 ethernet devices has been updated to the upstream version 0.3.3.3-k2. With this update, the EEPROM and NVM of supported devices are now write-protected.

Comment 21 errata-xmlrpc 2009-01-20 20:06:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html