459747 – PXE boot

Summary: PXE boot

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Andy Gospodarek
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-08-21 20:11 UTC by George Beshers
Modified:	2018-10-19 19:22 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-02-24 13:29:54 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description George Beshers 2008-08-21 20:11:02 UTC

IT#189898: ice-csn nodes with RHEL5.2 won't PXE boot unless power-cycled

Description of problem:

 PXE boot on ice-csn nodes works only once after the
 AC power cords are plugged in.

How reproducible:

 Always.

Steps to Reproduce:

 The leader nodes are configured to have PXE as the first
 boot option in their boot order.  The same results are
 seen when using F12 to trigger a PXE boot.

   * halt system
   * unplug system
   * plug in system
   * power up
   * PXE configured to boot from local disk works
   * system starts up
   * after system is up, issue a reboot command
   * system reboots, BIOS comes up, etc.
   * PXE boot is attempted but fails:
     -> PXE moving dash appears on the console
     -> DHCP server sees the DHCP requests from the node
     -> PXE can't hear the DHCP server replies
     -> PXE fails saying it didn't hear valid responses
   * repeat the cycle from the top (including
     unplugging AC power) and PXE works again

 It appears that the driver puts the NIC into some weird
 state where it cannot PXE boot after the kernel is
 loaded.

 If the e1000 driver is replaced with one built from
 sourceforge and the steps above repeated, the PXE boot
 does not fail.  There's no need to unplug the node to
 get PXE to work.  The sourceforge driver used was
 e1000-7.6.15.5; it was only tested on SLES, but should
 presumably work on RHEL too.

Actual results:

 PXE boot works only once after AC power cords are
 plugged in.

Expected results:

 PXE boot works every time.

Comment 1 George Beshers 2008-08-25 18:38:14 UTC

From erikj:

rhel5.2 and rhel5.1 have the problem.
rhel5.0 does not have the problem (but also wasn't using e1000e as far as
I can tell).

Here is the driver information I have for what's in rhel5.2 (broken for
pxe after reboots on ice-csn hardware) and what is in the SF version:

old driver from rhel5.2:
driver: e1000e
version: 0.2.0
firmware-version: 2.1-12
bus-info: 0000:04:00.0

new driver from sourceforge (works):
driver: e1000e
version: 0.4.1.7-NAPI
firmware-version: 2.1-12
bus-info: 0000:04:00.0
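
For anyone repeating this comparison on other nodes, here is a minimal Python sketch that parses the two driver listings above and compares their versions. It assumes the listings are in the "key: value" format printed by `ethtool -i`. The helper names (`parse_ethtool_info`, `version_tuple`) are hypothetical, and treating everything after the "-" in a version string as a non-numeric suffix is an assumption, not a documented versioning rule.

```python
def parse_ethtool_info(text):
    """Parse "key: value" lines (ethtool -i style) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so "0000:04:00.0" stays intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def version_tuple(version):
    """Turn "0.4.1.7-NAPI" into (0, 4, 1, 7); the "-suffix" is ignored (assumption)."""
    numeric = version.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


# Listings copied from this comment.
RHEL52 = """driver: e1000e
version: 0.2.0
firmware-version: 2.1-12
bus-info: 0000:04:00.0"""

SOURCEFORGE = """driver: e1000e
version: 0.4.1.7-NAPI
firmware-version: 2.1-12
bus-info: 0000:04:00.0"""

old = parse_ethtool_info(RHEL52)
new = parse_ethtool_info(SOURCEFORGE)

print("driver:", old["driver"], old["version"], "->", new["version"])
print("firmware unchanged:", old["firmware-version"] == new["firmware-version"])
print("sourceforge newer:", version_tuple(new["version"]) > version_tuple(old["version"]))
```

The firmware version and bus address are identical in both listings, so the only difference this comparison shows is the driver version.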

==================================

lsmod confirms that e1000 not e1000e is loaded in 5.0.
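
The lsmod check can be scripted along these lines. This is a sketch only: the `SAMPLE` text is an illustrative stand-in for lsmod output (sizes and use counts are made up, not taken from an ice-csn node), and the helper name `loaded_modules` is hypothetical.

```python
def loaded_modules(lsmod_text):
    """Return the set of module names from lsmod-style output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header row.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


# Illustrative sample only; not captured from a real system.
SAMPLE = """Module                  Size  Used by
e1000                 100000  0
ext3                  100000  2
"""

modules = loaded_modules(SAMPLE)
for driver in ("e1000", "e1000e"):
    state = "loaded" if driver in modules else "not loaded"
    print(driver, state)
```

Matching on the whole module name matters here, because a substring search for "e1000" would also match "e1000e".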

Linus' tree has drivers/net/e1000e/netdev.c:#define DRV_VERSION "0.3.3.3-k2"

George

Comment 2 George Beshers 2008-08-25 18:55:04 UTC


There are substantial differences between upstream
(Linus' tree) and what's in 5.3 for the files under
drivers/net/e1000e.  This is good news in that a
backport of a specific patch may be all that is
required.

Alright John, I will get back to other work :).

George

Comment 3 Andy Gospodarek 2008-09-23 15:38:23 UTC

Has it been confirmed that this is still broken on the latest 5.3 development kernels?  There have been some significant changes to the e1000e driver for 5.3 (we updated to 0.3.3.3-k2).

I'd love to know if this has been fixed upstream and in rhel5.3 or if we need to add this to the list of bugs-fixed-in-sourceforge-driver-but-not-upstream.

Comment 4 John Ronciak 2008-09-23 15:52:57 UTC

Andy, I think this was fixed upstream (if it was the same problem) and I think you already backported the change.  Can somebody at RH try it with your 5.3 test kernel?

Comment 5 Andy Gospodarek 2008-10-06 13:15:28 UTC

George, any chance you can test the latest rhel5.3 beta kernel?

Comment 6 George Beshers 2008-10-06 14:02:31 UTC

I will tomorrow morning.  I got a different behavior Thursday afternoon
with the patch from rhkernel list applied, but I am not sure if the problem
is solved.

George

Comment 7 George Beshers 2008-10-07 20:18:10 UTC

Gospo,

I am seeing a different behavior, but I am seeing another
problem, probably unrelated.  I get beyond the point where
PXE won't load anaconda, but anaconda crashes after asking
for the nfs mount point.  This is rebooting from the
2.6.18-118 kernel.

I also had a problem with a failure to format the disk
drive on one attempt.

I need to try this on other machines.

George

Comment 8 Andy Gospodarek 2008-10-08 20:32:44 UTC

George, that is interesting.  It seems to me that this might indicate that the PXE failure from the description is fixed, though.  Do you think that's a fair statement?

Comment 9 George Beshers 2008-11-06 19:49:25 UTC

Hi Andy,

Yes, I have tried the beta on a couple of systems
and it is working fine.  The snapshot1 is currently
being worked on by SGI QE and when they confirm it
working I will flag this as verified.

Thanks,
George

Comment 11 Andy Gospodarek 2009-02-24 13:29:54 UTC

closing based on comment #9
