Bug 187237
Summary: | e1000: EEPROM Checksum Is Not Valid if PXE booted | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Richard Monk <rmonk>
Component: | kernel | Assignee: | John W. Linville <linville>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock>
Severity: | high | Priority: | medium
Version: | 4.0 | CC: | cweyl, jbaron, konradr
Hardware: | i386 | OS: | Linux
Fixed In Version: | RHEL4 U4 Beta | Doc Type: | Bug Fix
Last Closed: | 2006-06-29 18:10:58 UTC | |
Description
Richard Monk
2006-03-29 14:18:03 UTC
I have a hunch that this might be related to the legacy PIC behaving badly (something that it does with the tg3 driver in RHEL4 U4). Can you attach what the /proc/interrupts output looks like when you are installing RHEL4 and Fedora Core?

Having different IRQs is perfectly normal. The RHEL4 install kernel boots up without initializing the APIC, while the FC install kernel does. The APIC is the device for routing IRQs from devices to the CPU. If it is not initialized, the backwards-compatible legacy PIC is used, which has a limited number of IRQs.

Have you tried booting w/ acpi=off or similar? I know turning acpi off on a laptop is bad news, but it would be worth knowing what effect (if any) it has on the issue.

Created attachment 127006: output of /proc/interrupts in FC5
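To make the PIC vs. APIC comparison above easier to do across two installs, here is a rough sketch that pulls the IRQ number and interrupt controller for each device out of saved /proc/interrupts text. The function name `irq_controllers` and the sample text are made up for illustration (the sample is not the content of attachment 127006), and the parser assumes the 2.6-era layout of one count column per CPU followed by the controller name and the device list.

```python
def irq_controllers(text):
    """Map device name -> (irq number, controller) from /proc/interrupts text.

    Assumes the 2.6-era layout: a header row naming the CPUs, then rows of
    'IRQ: count-per-CPU... controller device[, device...]'.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row lists CPU0, CPU1, ...
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip non-numeric rows such as NMI, LOC, ERR
        fields = rest.split()
        if len(fields) < ncpus + 2:
            continue  # no controller/device columns on this row
        controller = fields[ncpus]
        devices = " ".join(fields[ncpus + 1:])
        for dev in devices.split(","):
            result[dev.strip()] = (int(label), controller)
    return result


# Illustrative sample only; real output will differ per machine.
SAMPLE_LEGACY = """\
           CPU0
  0:     123456          XT-PIC  timer
 11:       5000          XT-PIC  eth0, uhci_hcd:usb1
NMI:          0
"""

if __name__ == "__main__":
    for dev, (irq, ctrl) in sorted(irq_controllers(SAMPLE_LEGACY).items()):
        print(f"{dev}: IRQ {irq} via {ctrl}")
```

Running it on output saved from each installer and comparing the controller column for eth0 (XT-PIC for the legacy PIC, an IO-APIC variant otherwise) shows which routing each kernel ended up using.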
Another thought: under FC, can you try installing with the 'noapic' boot parameter? That should make the FC install kernel boot up using the legacy controller (as RHEL4 does). This can shed some light on whether the PIC or the e1000 driver revision is at fault.

I'm now very confused. After several reboots, trying various combinations of acpi= and apic to no avail, I also did a reset of the system (this involves turning off the machine, unplugging, removing the battery, and holding the power for 10 seconds). This did nothing to help. I then told the BIOS to load defaults, rebooted, and PXE would not work. I would put in the image I wanted to boot, and it would stick at Loading and hang. I then rebooted again and PXE worked again, and lo and behold, the card is working perfectly in RHEL4 and FC5 with no special boot parameters. FC with noapic works fine either way. I was having the same issue with a T60 and T60p, and after this strange set of trials, both are in the same working state. Could there be something fundamentally "stuck" or wrong with the hardware? I hate something that just "goes away" after some fiddling, even after resetting things. I'm going to try to do a full reset of the BIOS and system, and see if I can get it to start again.

What version of PXELINUX are you loading? I know that version 3.07 had problems with some of the IBM machines. Upgrading in the lab to 3.11 of PXELINUX took those problems away.

I'm running PXELINUX 3.11. I can replicate this now with a CD, so I don't think it's PXE related. FC5 always works, even with noapic nolapic acpi=off. RHEL4 works, but read below. Here's what I can do to replicate it:

1. Reset system (power off, battery out, hold power for 20 secs)
2. RHEL4 PXE: Checksum error
3. Reboot, go into BIOS, and change ANY network-related setting. You can even change it back, it just has to rewrite the BIOS info
4. RHEL4 PXE: Works perfectly

So, this appears to be a perfectly valid error: something is corrupting the checksum, and changing a setting resets it. I'll talk with Lenovo about a fix to the BIOS. However, why does it work with FC and not RHEL? Does e1000 in FC5 ignore the checksum?

I'll have to investigate the e1000 code in FC5.

Lenovo got back to me and told me that it's not a hardware problem. I'm suspicious, but I doubt they're going to fix it, if it is hardware.

Is this still an issue w/ RHEL4 U3 Beta?

In our tests with RHEL4 U4 Beta, it works just fine (updated e1000). Consider this bug closed for me.
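For anyone wondering what the "EEPROM Checksum Is Not Valid" message in this thread is actually testing, here is a minimal sketch of the check as I understand it from the upstream e1000 driver: the 16-bit sum of EEPROM words 0x00 through 0x3F has to come out to 0xBABA. The constant names mirror the driver's, but the Python helpers and the sample image are hypothetical and only illustrate the arithmetic. Nothing in this thread establishes how the FC5 or RHEL4 U4 driver differs from the failing one.

```python
EEPROM_CHECKSUM_REG = 0x3F  # index of the checksum word
EEPROM_SUM = 0xBABA         # expected 16-bit sum of words 0x00..0x3F


def checksum_is_valid(words):
    """Return True if the first 64 EEPROM words sum to 0xBABA (mod 2**16)."""
    if len(words) <= EEPROM_CHECKSUM_REG:
        raise ValueError("need at least 64 EEPROM words")
    total = sum(words[:EEPROM_CHECKSUM_REG + 1]) & 0xFFFF
    return total == EEPROM_SUM


def checksum_word_for(words):
    """Compute the value word 0x3F must hold for the image to validate."""
    if len(words) < EEPROM_CHECKSUM_REG:
        raise ValueError("need at least 63 EEPROM words")
    partial = sum(words[:EEPROM_CHECKSUM_REG]) & 0xFFFF
    return (EEPROM_SUM - partial) & 0xFFFF


if __name__ == "__main__":
    # Made-up image contents, not a dump from a real T60.
    image = [0x1234] * 63 + [0x0000]
    image[EEPROM_CHECKSUM_REG] = checksum_word_for(image)
    print("fresh image valid:", checksum_is_valid(image))

    image[5] ^= 0x0001  # a single flipped bit in any covered word
    print("after corruption valid:", checksum_is_valid(image))
```

Any change to one of the first 63 words without a matching update to word 0x3F breaks the sum. That would be consistent with the replication steps above if rewriting the BIOS network settings also regenerates the checksum word, though that is only a guess.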