Bug 459747 - PXE boot
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Andy Gospodarek
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks:
Reported: 2008-08-21 16:11 EDT by George Beshers
Modified: 2014-06-29 19:00 EDT
CC List: 13 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-02-24 08:29:54 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description George Beshers 2008-08-21 16:11:02 EDT
IT#189898: ice-csn nodes with RHEL5.2 won't PXE boot unless power-cycled

Description of problem:

 PXE boot on ice-csn nodes works only once after the
 AC power cords are plugged in.

How reproducible:

 Always.

Steps to Reproduce:

 The leader nodes are configured to have PXE as the first
 boot option in their boot order.  The same results are
 seen when using F12 to trigger a PXE boot.

   * halt system
   * unplug system
   * plug in system
   * power up
   * PXE configured to boot from local disk works
   * system starts up
   * after system is up, issue a reboot command
   * system reboots, BIOS comes up, etc.
   * PXE boot is attempted but fails:
     -> PXE moving dash appears on the console
     -> DHCP server sees the DHCP requests from the node
     -> PXE can't hear the DHCP server replies
     -> PXE fails saying it didn't hear valid responses
   * repeat the cycle from the top (including
     unplugging AC power) and PXE works again

 It appears that the driver puts the NIC into some weird
 state where it cannot PXE boot after the kernel is
 loaded.

 If the e1000 driver is replaced with one built from
 sourceforge and the steps above repeated, the PXE boot
 does not fail.  There's no need to unplug the node to
 get PXE to work.  The sourceforge driver used was
 e1000-7.6.15.5; it was only tested on SLES, but should
 presumably work on RHEL too.

Actual results:

 PXE boot works only once after AC power cords are
 plugged in.

Expected results:

 PXE boot works every time.
Comment 1 George Beshers 2008-08-25 14:38:14 EDT
From erikj@sgi.com:

rhel5.2 and rhel5.1 have the problem.
rhel5.0 does not have the problem (but also wasn't using e1000e as far as
I can tell).

Here is the driver information I have for what's in rhel5.2 (broken for
pxe after reboots on ice-csn hardware) and what is in the SF version:

old driver from rhel5.2:
driver: e1000e
version: 0.2.0
firmware-version: 2.1-12
bus-info: 0000:04:00.0

new driver from sourceforge (works):
driver: e1000e
version: 0.4.1.7-NAPI
firmware-version: 2.1-12
bus-info: 0000:04:00.0
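
The two reports above are in the "key: value" form that `ethtool -i` prints. A minimal sketch, assuming that format, of parsing both reports and listing only the fields that differ; the function names `parse_driver_info` and `diff_driver_info` are hypothetical helpers, not part of any tool mentioned in this bug:

```python
def parse_driver_info(text):
    """Parse 'key: value' lines (the form `ethtool -i` prints) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as the PCI
        # address "0000:04:00.0" keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def diff_driver_info(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    fields = sorted(set(old) | set(new))
    return {f: (old.get(f), new.get(f)) for f in fields if old.get(f) != new.get(f)}


# Driver reports copied from this comment.
RHEL52 = """\
driver: e1000e
version: 0.2.0
firmware-version: 2.1-12
bus-info: 0000:04:00.0
"""

SOURCEFORGE = """\
driver: e1000e
version: 0.4.1.7-NAPI
firmware-version: 2.1-12
bus-info: 0000:04:00.0
"""

if __name__ == "__main__":
    print(diff_driver_info(parse_driver_info(RHEL52), parse_driver_info(SOURCEFORGE)))
```

For the two reports in this comment, the only differing field is the driver version; firmware and bus address are identical.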

==================================

lsmod confirms that e1000 not e1000e is loaded in 5.0.

Linus' tree has drivers/net/e1000e/netdev.c:#define DRV_VERSION "0.3.3.3-k2"
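
A rough sketch of how the `DRV_VERSION` string could be pulled out of a `netdev.c` source text and ordered against the other versions quoted in this comment. The helper names are hypothetical, and comparing only the numeric prefix (ignoring suffixes such as `-k2` or `-NAPI`) is an assumption: in-tree and sourceforge version numbers are not guaranteed to be directly comparable.

```python
import re

# Matches a line such as: #define DRV_VERSION "0.3.3.3-k2"
DRV_VERSION_RE = re.compile(
    r'^\s*#\s*define\s+DRV_VERSION\s+"([^"]+)"', re.MULTILINE
)


def find_drv_version(source_text):
    """Return the DRV_VERSION string found in the source text, or None."""
    match = DRV_VERSION_RE.search(source_text)
    return match.group(1) if match else None


def version_key(version):
    """Numeric prefix as a tuple, e.g. '0.3.3.3-k2' -> (0, 3, 3, 3)."""
    numeric = version.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


if __name__ == "__main__":
    upstream = find_drv_version('#define DRV_VERSION "0.3.3.3-k2"\n')
    versions = ["0.4.1.7-NAPI", "0.2.0", upstream]
    # Oldest first, by numeric prefix only.
    print(sorted(versions, key=version_key))
```

Under that assumption the upstream driver sorts between the rhel5.2 driver and the sourceforge one.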

George
Comment 2 George Beshers 2008-08-25 14:55:04 EDT

There are substantial differences between upstream
(Linus' tree) and what's in 5.3 for the files under
drivers/net/e1000e.  This is good news in that a
backport of a specific patch may be all that is
required.

Alright John, I will get back to other work :).

George
Comment 3 Andy Gospodarek 2008-09-23 11:38:23 EDT
Has it been confirmed that this is still broken on the latest 5.3 development kernels?  There have been some significant changes to the e1000e driver for 5.3 (we updated to 0.3.3.3-k2).

I'd love to know if this has been fixed upstream and in rhel5.3 or if we need to add this to the list of bugs-fixed-in-sourceforge-driver-but-not-upstream.
Comment 4 John Ronciak 2008-09-23 11:52:57 EDT
Andy, I think this was fixed upstream (if it was the same problem) and I think you already backported the change.  Can somebody at RH try it with your 5.3 test kernel?
Comment 5 Andy Gospodarek 2008-10-06 09:15:28 EDT
George, any chance you can test the latest rhel5.3 beta kernel?
Comment 6 George Beshers 2008-10-06 10:02:31 EDT
I will tomorrow morning.  I got a different behavior Thursday afternoon
with the patch from rhkernel list applied, but I am not sure if the problem
is solved.

George
Comment 7 George Beshers 2008-10-07 16:18:10 EDT
Gospo,

I am seeing a different behavior, but I am seeing another
problem, probably unrelated.  I get beyond the point where
PXE won't load anaconda, but anaconda crashes after asking
for the NFS mount point.  This is rebooting from the
2.6.18-118 kernel.

I also had a problem with a failure to format the disk
drive on one attempt.

I need to try this on other machines.

George
Comment 8 Andy Gospodarek 2008-10-08 16:32:44 EDT
George, that is interesting.  It seems to me that this might indicate that the PXE failure from the description is fixed, though.  Do you think that's a fair statement?
Comment 9 George Beshers 2008-11-06 14:49:25 EST
Hi Andy,

Yes, I have tried the beta on a couple of systems
and it is working fine.  The snapshot1 is currently
being worked on by SGI QE and when they confirm it
working I will flag this as verified.

Thanks,
George
Comment 11 Andy Gospodarek 2009-02-24 08:29:54 EST
closing based on comment #9
