Bug 220725

Summary: e1000 driver thinks EEPROM checksum is invalid
Product: [Fedora] Fedora
Component: kernel
Version: 9
Hardware: i386
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Ralf Ertzinger <redhat-bugzilla>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Brian Brock <bbrock>
CC: jesse.brandeburg, mhuhtala, wtogami
Fixed In Version: F9
Doc Type: Bug Fix
Last Closed: 2008-06-04 19:29:34 UTC

Description Ralf Ertzinger 2006-12-24 17:22:21 UTC
Description of problem:
Since kernel-2.6.19-1.2891.fc7 the e1000 driver thinks that the EEPROM checksum
of my notebook's network controller is invalid. kernel-2.6.19-1.2890.fc7 detects
and drives the card just fine.

2890 boot log:
e1000: 0000:02:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:16:d3:32:bc:a6
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Half Duplex
e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO

2891 boot log:
e1000: 0000:02:00.0: e1000_probe: The EEPROM Checksum Is Not Valid
e1000: probe of 0000:02:00.0 failed with error -5


Version-Release number of selected component (if applicable):
kernel-2.6.19-1.2891.fc7

How reproducible:
Always

Steps to Reproduce:
1. Update to 2891, boot

Actual results:
No network

Expected results:
Network

Additional info:
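For reference, a minimal sketch of the check behind the "EEPROM Checksum Is Not Valid" message, written in Python rather than the driver's C. It assumes the e1000 convention that the first 64 16-bit EEPROM words (0x00 through 0x3F) must sum to 0xBABA modulo 2**16. The function names and the sample image are made up for illustration and are not taken from the driver. Error -5 in the log is -EIO.

```python
EEPROM_CHECKSUM_REG = 0x3F  # index of the word that stores the checksum
EEPROM_SUM = 0xBABA         # value the 16-bit sum of words 0x00..0x3F must equal


def eeprom_checksum_ok(words):
    """Return True if the first 64 16-bit words sum to 0xBABA (mod 2**16)."""
    if len(words) <= EEPROM_CHECKSUM_REG:
        raise ValueError("need at least 64 EEPROM words")
    total = sum(words[: EEPROM_CHECKSUM_REG + 1]) & 0xFFFF
    return total == EEPROM_SUM


def checksum_word_for(words):
    """Compute the checksum word that makes words 0x00..0x3E validate."""
    partial = sum(words[:EEPROM_CHECKSUM_REG]) & 0xFFFF
    return (EEPROM_SUM - partial) & 0xFFFF


if __name__ == "__main__":
    # Hypothetical EEPROM image: 63 data words plus the computed checksum word.
    image = [0x1600, 0x32D3, 0xA6BC] + [0] * 60
    image.append(checksum_word_for(image))
    print(eeprom_checksum_ok(image))   # intact image

    image[5] ^= 0x0001                 # a single flipped bit in one word
    print(eeprom_checksum_ok(image))   # the same check now fails
```

If the words read back from the device are wrong for any reason, this check fails even when the contents stored in the EEPROM are intact.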

Comment 1 Ralf Ertzinger 2006-12-24 18:01:40 UTC
Turns out this is not deterministic. The driver has found the card after a reboot.

Comment 2 Tom London 2006-12-26 01:11:28 UTC
I see this problem on Thinkpad X60 with various kernels; 
but this only happens on a 'restart'.

Once this occurs, it repeats until I do a poweroff/poweron cycle. [Last night
had about 5 restarts in a row with this problem.]

lspci entry:
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller

Any other info that would be useful?

Comment 3 Ralf Ertzinger 2006-12-26 13:19:40 UTC
Thinkpad X60s here. The lspci output is the same.

Comment 4 Ralf Ertzinger 2007-02-04 11:23:48 UTC
Most probably the BIOS does not fully reset the card on reboot (or botches
something up).
Maybe the driver could do an additional hard reset of the card.

Comment 5 Chuck Ebbert 2007-02-05 16:52:08 UTC
There is some good information about this problem at:

http://www.thinkwiki.org/wiki/Problem_with_e1000:_EEPROM_Checksum_Is_Not_Valid



Comment 6 Tom London 2007-02-23 15:32:22 UTC
Problem still exists with kernel-2.6.20-1.2940.fc7

Comment 7 Tom London 2007-03-10 19:25:37 UTC
Running the script described on the page linked in comment #5 seems to have
fixed this for me.

Comment 8 Mikko Huhtala 2007-05-18 13:18:59 UTC
I'm using a ThinkPad R50 and kernel 2.6.21-1.3163.fc7. Cold start (power off and
on) works ok, but after issuing the command 'reboot', booting ends with a kernel
panic and the following message:

  EIP: [<f8a8944f>] e1000_clean+0x1d6/0x23a [e1000] SS:ESP 0068:c0762fa8
  Kernel panic - not syncing: Fatal exception in interrupt

This is reproducible: cold start works every time and reboot fails every time.

lspci says that the NIC is Intel 82540P, rev 03

I'm not sure if this is the same problem as the one #5 refers to. Will try the
Lenovo script.



Comment 9 Mikko Huhtala 2007-05-18 13:24:06 UTC
I had a typo in #8, the NIC is 82540EP, rev 03

The Lenovo script says it does not apply to this NIC.

Should I file a separate bug?


Comment 10 Chuck Ebbert 2007-05-18 22:43:25 UTC
(In reply to comment #9)
> I had a typo in #8, the NIC is 82540EP, rev 03
> 
> The Lenovo script says it does not apply to this NIC.
> 
> Should I file a separate bug?
> 

See #240339

Comment 11 Lubomir Kundrak 2007-10-04 08:07:05 UTC
This morning my T60 claimed that its internal e1000 was gone and that it would
delay initialization. It turned out to be exactly this problem:


Oct  4 09:37:28 localhost kernel: Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Oct  4 09:37:28 localhost kernel: Copyright (c) 1999-2006 Intel Corporation.
Oct  4 09:37:28 localhost kernel: ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
Oct  4 09:37:28 localhost kernel: e1000: 0000:02:00.0: e1000_probe: The EEPROM Checksum Is Not Valid
Oct  4 09:37:28 localhost kernel: ACPI: PCI interrupt for device 0000:02:00.0 disabled
Oct  4 09:37:28 localhost kernel: e1000: probe of 0000:02:00.0 failed with error -5

I was not able to do anything about that. I did both warm and cold reboots -- no
success. In a panic I wanted to search the internet for something that would
repair the EEPROM, but when I tried to get online from my wife's machine I found
that the network was down. It was chuck [1] who had disconnected the network
switch's power cord.

[1] http://skosi.org/~lkundrak/misc/cica.jpeg

When I connected the switch back, the driver no longer refused to load (I did
not reboot, just rmmod/modprobe). I am not able to reproduce that, but looking
at the log I see that this happened once before, a couple of days back (I was
running the same kernel then as now).

My kernel is:

Linux gopher 2.6.22.9-91.fc7 #1 SMP Thu Sep 27 23:10:59 EDT 2007 i686 i686 i386 GNU/Linux

And the ethernet adapter is:

02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
02:00.0 0200: 8086:109a

Comment 12 Jesse Brandeburg 2008-01-08 22:34:30 UTC
This is the ASPM issue; it is well documented and a fix has been submitted upstream.

https://bugzilla.redhat.com/show_bug.cgi?id=400561

Comment 13 Jesse Brandeburg 2008-01-08 22:37:00 UTC
Oops, I was referring to comment 10 in bug #400561, which has this patch:
https://bugzilla.redhat.com/attachment.cgi?id=272011

It should fix this issue on the T60/T61; the two bugs are otherwise unrelated.
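
The patch is not quoted in this report, so the following is only a rough illustration of what ASPM state looks like at the register level, not a rendering of the fix. It assumes the standard PCI Express Link Control register layout, in which bits 1:0 hold the ASPM control field (00 disabled, 01 L0s, 10 L1, 11 both). The function names and sample register values are hypothetical.

```python
ASPM_MASK = 0x0003  # bits 1:0 of the PCIe Link Control register

ASPM_STATES = {
    0b00: "disabled",
    0b01: "L0s",
    0b10: "L1",
    0b11: "L0s and L1",
}


def aspm_state(link_control):
    """Describe the ASPM setting encoded in a 16-bit Link Control value."""
    return ASPM_STATES[link_control & ASPM_MASK]


def disable_aspm(link_control):
    """Return the Link Control value with both ASPM bits cleared."""
    return link_control & ~ASPM_MASK & 0xFFFF


if __name__ == "__main__":
    value = 0x0042  # made-up register contents with the L1 bit set
    print(aspm_state(value))
    print(aspm_state(disable_aspm(value)))
```

Only the two ASPM bits are touched; the remaining Link Control bits are passed through unchanged.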

Comment 14 Chuck Ebbert 2008-01-08 22:47:12 UTC
davej reports getting the invalid checksum error even after the ASPM disable
patch had been applied.

Comment 15 Bug Zapper 2008-05-14 02:31:35 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 16 Dave Jones 2008-05-29 01:35:51 UTC
This should be fixed now, without the need for module options or the like. Ralf?

Comment 17 Ralf Ertzinger 2008-05-29 08:22:19 UTC
It definitely works for me. I don't know which change fixed it,
though, since I also once tried Lenovo's solution (which, I think,
used ethtool to fiddle with the card's EEPROM).