Red Hat Bugzilla – Bug 459747
Last modified: 2014-06-29 19:00:32 EDT
IT#189898: ice-csn nodes with RHEL5.2 won't PXE boot unless power-cycled
Description of problem:
PXE boot on ice-csn nodes works only once after the
AC power cords are plugged in.
Steps to Reproduce:
The leader nodes are configured to have PXE as the first
boot option in their boot order. The same results are
seen when using F12 to trigger a PXE boot.
* halt system
* unplug system
* plug in system
* power up
* PXE configured to boot from local disk works
* system starts up
* after system is up, issue a reboot command
* system reboots, BIOS comes up, etc.
* PXE boot is attempted but fails:
-> PXE moving dash appears on the console
-> DHCP server sees the DHCP requests from the node
-> PXE can't hear the DHCP server replies
-> PXE fails saying it didn't hear valid responses
* repeat the cycle from the top (including
unplugging AC power) and PXE works again
It appears that the driver puts the NIC into some weird
state where it cannot PXE boot after the kernel is
If the e1000 driver is replaced with one built from
sourceforge and the steps above repeated, the PXE boot
does not fail. There's no need to unplug the node to
get PXE to work. The sourceforge driver used was
e1000-22.214.171.124; it was only tested on SLES, but should
presumably work on RHEL too.
Actual results:
PXE boot works only once after AC power cords are
plugged in.
Expected results:
PXE boot works every time.
rhel5.2 and rhel5.1 have the problem.
rhel5.0 does not have the problem (but it also wasn't using e1000e, as far as
I can tell).
Here is the driver information I have for what's in rhel5.2 (broken for
pxe after reboots on ice-csn hardware) and what is in the SF version:
old driver from rhel5.2:
new driver from sourceforge (works):
lsmod confirms that e1000 not e1000e is loaded in 5.0.
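For anyone repeating this check on other nodes, here is a minimal sketch of the same test done against /proc/modules (the first field of each line is the module name, same as in lsmod output). The function name loaded_e1000_drivers is made up for illustration and is not part of any tool mentioned in this report.

```python
def loaded_e1000_drivers(modules_text):
    """Return the e1000-family module names found in lsmod or
    /proc/modules style text, sorted alphabetically."""
    wanted = {"e1000", "e1000e"}
    found = set()
    for line in modules_text.splitlines():
        fields = line.split()
        # The module name is always the first whitespace-separated field.
        if fields and fields[0] in wanted:
            found.add(fields[0])
    return sorted(found)
```

On a live system the input would come from reading /proc/modules; a result containing only e1000 matches what was seen on 5.0.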
Linus' tree has drivers/net/e1000e/netdev.c:#define DRV_VERSION "0.3.3.3-k2"
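To compare trees quickly, the version string can be pulled out of netdev.c with a small script. This is only a sketch: find_drv_version is a hypothetical helper, and it assumes DRV_VERSION is defined with a quoted string literal on a single line, as in the line quoted above.

```python
import re

# Matches a line such as: #define DRV_VERSION "0.3.3.3-k2"
DRV_VERSION_RE = re.compile(
    r'^\s*#\s*define\s+DRV_VERSION\s+"([^"]+)"', re.MULTILINE
)


def find_drv_version(source_text):
    """Return the first quoted DRV_VERSION string in the given C source
    text, or None if no such define is present."""
    match = DRV_VERSION_RE.search(source_text)
    return match.group(1) if match else None
```

Running it over the contents of drivers/net/e1000e/netdev.c from each tree gives two strings that can be compared directly.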
There are substantial differences between upstream
(Linus' tree) and what's in 5.3 for the files under
drivers/net/e1000e. This is good news in that a
backport of a specific patch may be all that is
Alright John, I will get back to other work :).
Has it been confirmed that this is still broken on the latest 5.3 development kernels? There have been some significant changes to the e1000e driver for 5.3 (we updated to 0.3.3.3-k2).
I'd love to know if this has been fixed upstream and in rhel5.3 or if we need to add this to the list of bugs-fixed-in-sourceforge-driver-but-not-upstream.
Andy, I think this was fixed upstream (if it was the same problem) and I think you already backported the change. Can somebody at RH try it with your 5.3 test kernel?
George, any chance you can test the latest rhel5.3 beta kernel?
I will tomorrow morning. I got a different behavior Thursday afternoon
with the patch from rhkernel list applied, but I am not sure if the problem
I am seeing a different behavior, but I am seeing another
problem, probably unrelated. I get beyond the point where
PXE won't load anaconda, but anaconda crashes after asking
for the nfs mount point. This is rebooting from the
I also had a problem with a failure to format the disk
drive on one attempt.
I need to try this on other machines.
George, that is interesting. It seems to me that this might indicate that the PXE failure from the description is fixed, though. Do you think that's a fair statement?
Yes, I have tried the beta on a couple of systems
and it is working fine. The snapshot1 is currently
being worked on by SGI QE and when they confirm it
working I will flag this as verified.
closing based on comment #9