Description of problem:
When booting RHEL4-U3 from PXE on a Lenovo T60p, I get:

<3>e1000: 0000:02:00.0: e1000_probe: The EEPROM Checksum Is Not Valid
<4>e1000: probe of 0000:02:00.0 failed with error -5

If I use Fedora Core 5, it works just fine.

On RHEL4, `lspci -v -d 8086:109a` gives me:

02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
        Subsystem: Lenovo Unknown device 2001
        Flags: bus master, fast devsel, latency 0, IRQ 11
        Memory at ee000000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 3000 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [e0] Express Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 77-d9-0b-ff-ff-58-15-00

On FC5, `lspci -v -d 8086:109a` gives me:

02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
        Subsystem: Lenovo Unknown device 2001
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at ee000000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 3000 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [e0] Express Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 77-d9-0b-ff-ff-58-15-00

I notice the IRQs are different (11 on RHEL4, 16 on FC5). Is this expected?

Also, it works just fine from RHEL4-U3 rescue mode, but not from PXE. Is this PXE related?

Summary of tests:
RHEL4-U3 PXE: Checksum error
RHEL4-U3 CD: works
Fedora PXE: works

So I think PXE is doing something to the NIC's firmware that's making RHEL4 very unhappy.

Version-Release number of selected component (if applicable):
kernel 2.6.9-34 (U3 stock)

How reproducible:
Attempt to install RHEL4-U3 over PXE

Actual results:
e1000 driver reports checksum is invalid

Expected results:
e1000 driver operates properly

Additional info:
I have a hunch that this might be related to the legacy PIC behaving badly (something that it does with the tg3 driver in RHEL4 U4). Can you attach what the /proc/interrupts output looks like when you are installing RHEL4 and Fedora Core?

Having different IRQs is perfectly normal. The RHEL4 install kernel boots up without initializing the APIC, while the FC install kernel does initialize it. The APIC is the device that routes IRQs from devices to the CPU. If it is not initialized, the backwards-compatible legacy PIC is used, which has a limited number of IRQs.
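To make the PIC versus APIC comparison concrete, here is a minimal Python sketch that reads /proc/interrupts text and reports which interrupt controller each device is attached to. It assumes the 2.6-era column layout (IRQ number, one count per CPU, controller name, device list); the function name and the sample text are made up for illustration and are not taken from the attachments on this bug.

```python
def irq_controllers(proc_interrupts_text):
    """Map device name -> (irq, controller) from /proc/interrupts text.

    Assumes the 2.6-era layout:
      <irq>: <count per CPU>... <controller> <dev>[, <dev>...]
    """
    result = {}
    for line in proc_interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        # Skip the CPU header line and non-numeric rows such as NMI/LOC/ERR.
        if not sep or not head.strip().isdigit():
            continue
        fields = rest.split()
        # Leading numeric fields are the per-CPU interrupt counts.
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1
        if i >= len(fields):
            continue  # no controller column on this row
        controller = fields[i]
        devices = " ".join(fields[i + 1:])
        for dev in devices.split(","):
            dev = dev.strip()
            if dev:
                result[dev] = (int(head.strip()), controller)
    return result


# Made-up sample in the style of a kernel booted on the legacy PIC.
SAMPLE_PIC = """\
           CPU0
  0:     123456          XT-PIC  timer
 11:       4321          XT-PIC  eth0, uhci_hcd
NMI:          0
"""

if __name__ == "__main__":
    for dev, (irq, ctrl) in sorted(irq_controllers(SAMPLE_PIC).items()):
        print(dev, irq, ctrl)
```

If eth0 shows up under XT-PIC in the RHEL4 installer and under an IO-APIC entry in the FC5 installer, that would match the IRQ 11 versus IRQ 16 difference reported above.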
Have you tried booting w/ acpi=off or similar? I know turning acpi off on a laptop is bad news, but it would be worth knowing what effect (if any) it has on the issue.
Created attachment 127006 [details] output of /proc/interrupts in FC5
Another thought: under FC, can you try installing with the 'noapic' boot parameter? That should make the FC install kernel boot up using the legacy controller (as RHEL4 does). This could shed some light on whether the PIC or the e1000 driver revision is at fault.
I'm now very confused. After several reboots, trying various combinations of acpi= and apic settings to no avail, I also did a reset of the system (this involves turning off the machine, unplugging it, removing the battery, and holding the power button for 10 seconds). This did nothing to help.

I then told the BIOS to load defaults, rebooted, and PXE would not work. I would enter the image I wanted to boot, and it would stick at "Loading" and hang. I then rebooted again and PXE worked again, and lo and behold, the card is working perfectly in RHEL4 and FC5 with no special boot parameters. FC with noapic works fine either way.

I was having the same issue with a T60 and a T60p, and after this strange set of trials, both are in the same working state. Could there be something fundamentally "stuck" or wrong with the hardware? I hate something that just "goes away" after some fiddling, even after resetting things. I'm going to try to do a full reset of the BIOS and system, and see if I can get the problem to start again.
What version of PXELINUX are you loading? I know that version 3.07 had problems with some of the IBM machines. Upgrading to PXELINUX 3.11 in the lab made those problems go away.
I'm running PXELINUX 3.11. I can replicate this now with a CD, so I don't think it's PXE related.

FC5 always works, even with noapic nolapic acpi=off.
RHEL4 works, but read below.

Here's what I can do to replicate it:
1. Reset system (power off, battery out, hold power for 20 secs)
2. RHEL4 PXE: Checksum error
3. Reboot, go into BIOS, and change ANY network-related setting. You can even change it back; it just has to rewrite the BIOS info
4. RHEL4 PXE: Works perfectly

So this appears to be a perfectly valid error: something is corrupting the checksum, and changing a setting resets it. I'll talk with Lenovo about a fix to the BIOS.

However, why does it work with FC and not RHEL? Does e1000 in FC5 ignore the checksum? I'll have to investigate the e1000 code in FC5.
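For anyone following along, here is a rough Python sketch of the kind of check behind the "EEPROM Checksum Is Not Valid" message. It relies on my understanding of Intel's e1000 convention, which is an assumption here and not confirmed against the 2.6.9-34 source: the first 64 16-bit EEPROM words (0x00 through 0x3F) must sum to 0xBABA modulo 2^16, with word 0x3F holding the value that makes this true. The function names are hypothetical, and the word list would have to come from a real EEPROM dump.

```python
EEPROM_SUM = 0xBABA    # target sum (assumed e1000 convention)
CHECKSUM_WORD = 0x3F   # index of the word that stores the checksum


def checksum_is_valid(words):
    """Return True if words 0x00..0x3F sum to EEPROM_SUM modulo 2**16."""
    if len(words) <= CHECKSUM_WORD:
        raise ValueError("need at least 64 EEPROM words")
    return (sum(words[:CHECKSUM_WORD + 1]) & 0xFFFF) == EEPROM_SUM


def expected_checksum_word(words):
    """Compute the value word 0x3F must hold for the checksum to validate."""
    if len(words) < CHECKSUM_WORD:
        raise ValueError("need at least 63 EEPROM words")
    partial = sum(words[:CHECKSUM_WORD]) & 0xFFFF
    return (EEPROM_SUM - partial) & 0xFFFF


if __name__ == "__main__":
    # Placeholder contents, not a real NIC image.
    words = [0x1234] * 63 + [0x0000]
    words[CHECKSUM_WORD] = expected_checksum_word(words)
    print(checksum_is_valid(words))
    words[5] ^= 0x0001  # a single flipped bit is enough to break the check
    print(checksum_is_valid(words))
```

Under that assumption, any change to the stored words that is not matched by an updated checksum word would fail the probe, which would fit with a BIOS rewrite of the network settings making the error disappear.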
Lenovo got back to me and told me that it's not a hardware problem. I'm suspicious, but I doubt they're going to fix it, if it is hardware.
Is this still an issue w/ RHEL4 U3 Beta?
In our tests with RHEL4 U4 Beta, it works just fine (updated e1000). Consider this bug closed for me.