Bug 187237 - e1000: EEPROM Checksum Is Not Valid if PXE booted
Summary: e1000: EEPROM Checksum Is Not Valid if PXE booted
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386
OS: Linux
medium
high
Target Milestone: ---
: ---
Assignee: John W. Linville
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-03-29 14:18 UTC by Richard Monk
Modified: 2007-11-30 22:07 UTC
3 users

Fixed In Version: RHEL4 U4 Beta
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-06-29 18:10:58 UTC
Target Upstream Version:
Embargoed:


Attachments
output of /proc/interrupts in FC5 (543 bytes, text/plain)
2006-03-29 17:29 UTC, Richard Monk

Description Richard Monk 2006-03-29 14:18:03 UTC
Description of problem:
When booting RHEL4-U3 from PXE on a Lenovo T60p, I get:

<3>e1000: 0000:02:00.0: e1000_probe: The EEPROM Checksum Is Not Valid
<4>e1000: probe of 0000:02:00.0 failed with error -5

If I use Fedora 5, it works just fine.

on RHEL4, `lspci -v -d 8086:109a` gives me
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
        Subsystem: Lenovo Unknown device 2001
        Flags: bus master, fast devsel, latency 0, IRQ 11
        Memory at ee000000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 3000 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [e0] Express Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 77-d9-0b-ff-ff-58-15-00

on FC5, `lspci -v -d 8086:109a` gives me:
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
        Subsystem: Lenovo Unknown device 2001
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at ee000000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 3000 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [e0] Express Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 77-d9-0b-ff-ff-58-15-00

I notice the IRQs are different.  Is this expected?

Also, it works fine in RHEL4-U3 rescue mode, but not when booted from PXE. Is this PXE-related?

Summary of Tests:
RHEL4-U3 PXE: Checksum error
RHEL4-U3 CD: works
Fedora PXE: works

So I think PXE is doing something to the NIC's firmware that's making RHEL4
very unhappy.

Version-Release number of selected component (if applicable):
kernel 2.6.9-34 (U3 stock)


How reproducible:
Attempt to install RHEL4-U3 over PXE

Actual results:
e1000 driver reports checksum is invalid

Expected results:
e1000 driver operates properly

Additional info:

Comment 1 Konrad Rzeszutek 2006-03-29 14:32:55 UTC
I have a hunch that this might be related to the legacy PIC controller behaving
badly (something that it does with the tg3 driver in RHEL4 U4). Can you attach
the /proc/interrupts output from while you are installing RHEL4 and Fedora Core?

Having different IRQs is perfectly normal. The RHEL4 install kernel boots without
initializing the APIC, while the FC install kernel does. The APIC is the device
that routes IRQs from devices to the CPU. If it is not initialized, the
backward-compatible legacy PIC is used, which has a limited number of IRQs.

Comment 2 John W. Linville 2006-03-29 16:14:45 UTC
Have you tried booting w/ acpi=off or similar? I know turning ACPI off on a
laptop is bad news, but it would be worth knowing what effect (if any) it has
on the issue.

Comment 3 Richard Monk 2006-03-29 17:29:30 UTC
Created attachment 127006
output of /proc/interrupts in FC5

Comment 4 Konrad Rzeszutek 2006-03-29 17:46:05 UTC
Another thought: under FC, can you try installing with the 'noapic' boot
parameter? That should make the FC install kernel boot using the legacy
controller (as RHEL4 does). This can shed some light on whether the PIC or the
e1000 driver revision is at fault.

Comment 5 Richard Monk 2006-03-29 17:56:57 UTC
I'm now very confused.

After several reboots, trying various combinations of acpi= and apic options to no
avail, I also did a reset of the system (this involves turning off the machine,
unplugging it, removing the battery, and holding the power button for 10 seconds).  This
did nothing to help.  I then told the BIOS to load defaults, rebooted, and PXE
would not work.  I would put in the image I wanted to boot, and it would stick at

Loading

and hang.  I then rebooted again and PXE worked again, and lo and behold, the
card is working perfectly in RHEL4 and FC5 with no special boot parameters.  FC
with noapic works fine either way.

I was having the same issue with a T60 and T60p, and after this strange set of
trials, both are in the same state of working.  Could there be something
fundamentally "stuck" or wrong with the hardware?  I hate something that just
"goes away" after some fiddling, even after resetting things.

I'm going to try and do a full reset of the BIOS and system, and see if I can
get it to start again.

Comment 6 Konrad Rzeszutek 2006-03-29 18:13:22 UTC
What version of PXELINUX are you loading? I know that version 3.07 had problems
with some of the IBM machines. Upgrading PXELINUX to 3.11 in the lab made
those problems go away.

Comment 7 Richard Monk 2006-03-29 18:35:48 UTC
I'm running PXELINUX 3.11.  I can replicate this now with a CD, so I don't think
it's PXE related.

FC5 always works, even with noapic nolapic acpi=off
RHEL4 works, but read below:

Here's what I can do to replicate it:

1. Reset system (power off, battery out, hold power for 20 secs)
2. RHEL4 PXE: Checksum error
3. Reboot, go into the BIOS, and change ANY network-related setting.  You can even
change it back; it just has to rewrite the BIOS info.
4. RHEL4 PXE: Works perfectly

So this appears to be a perfectly valid error: something is corrupting the
checksum, and changing a setting resets it.  I'll talk with Lenovo about a BIOS
fix.

However, why does it work with FC and not RHEL?  Does e1000 in FC5 ignore the
checksum?  I'll have to investigate the e1000 code in FC5.


Comment 8 Richard Monk 2006-04-07 11:36:12 UTC
Lenovo got back to me and told me that it's not a hardware problem.  I'm
suspicious, but if it is hardware, I doubt they're going to fix it.

Comment 9 John W. Linville 2006-05-16 20:32:22 UTC
Is this still an issue w/ RHEL4 U4 Beta?

Comment 10 Richard Monk 2006-06-29 11:11:10 UTC
In our tests with RHEL4 U4 Beta, it works just fine (updated e1000).  Consider
this bug closed for me.

