Bug 694324

Summary: RHEL 6.0 pxeboot.0 fails to boot
Product: Red Hat Enterprise Linux 6 Reporter: IBM Bug Proxy <bugproxy>
Component: syslinux Assignee: Peter Jones <pjones>
Status: CLOSED CANTFIX QA Contact: Release Test Team <release-test-team>
Severity: high Docs Contact:
Priority: high    
Version: 6.0 CC: balkov, hpa, jeder, jkachuck, rwilliam
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-08-02 20:22:25 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 684953, 705163    

Description IBM Bug Proxy 2011-04-07 01:01:31 UTC
---Problem Description---
Attempting to PXE boot a machine using pxelinux.0 from RHEL 6.0's
syslinux package fails with the following message:

  "It appears your computer has less than 268K of low ("DOS") RAM...."

However, the machine has many GB of memory.

Substituting pxelinux.0 from RHEL 5.5 causes the machine to boot.
Therefore, this can be considered a regression from RHEL 5.5.

This looks to have something to do with uEFI.  The machines in
question are running uEFI version 1.10 (build ID D6E150AUS).

It appears that something has changed in pxelinux.0 that is stopping
it from working in some sort of legacy mode.  I note that the upstream
pxelinux.0 maintainer expects uEFI support some time mid-2011, so I'm
guessing RHEL 6.0 does not support uEFI booting using pxelinux.0.

This behaves the same way using pxelinux.0 from the latest RHEL 6.1 snapshot.

In all cases the machines are set up to do a legacy boot rather than a uEFI boot.  Therefore,
uEFI probably isn't the cause.

It is hard to find information on this so if there's no fix then I
would appreciate some useful advice.  :-)

In the meantime I'll check that the uEFI build being used is the
latest available and that the PXE configuration is set to use legacy
support.
 
Machine Type = 3650 M2 
  
---Steps to Reproduce---
1. Set up PXE network boot.
2. Attempt to boot machine.

Comment 2 RHEL Program Management 2011-04-07 01:04:51 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 3 H. Peter Anvin 2011-05-24 16:44:58 UTC
Could someone try pxelinux.0 from Syslinux 4.05-pre1 (just created to debug this situation)?  I don't expect it to solve the problem, just give enough information to actually make this problem debuggable.

Comment 4 H. Peter Anvin 2011-05-24 22:24:33 UTC
For what it's worth, I have gotten a *lot* of problem reports about IBM servers lately.  IBM might want to get in touch with me directly about this and several other issues.

Comment 6 IBM Bug Proxy 2011-06-15 02:40:34 UTC
------- Comment From martins.com 2011-06-14 22:34 EDT-------
(In reply to comment #7)
> Could someone try pxelinux.0 from Syslinux 4.05-pre1 (just created to debug
> this situation)?  I don't expect it to solve the problem, just give enough
> information to actually make this problem debuggable.

I'm the middle-man here and I don't have a good enough understanding
of pxelinux or how it is being used here for me to make sense of the
following.  Perhaps you can, Peter?  ;-)

I'm pasting 2 sets of comments here.  The 1st one is about local boot
and the other says that network booting now works with 4.05-pre1!  Do
you need any extra debug info?  If so, is there a configuration option
that will switch it on?  Should the behaviour have changed this much?

--------8<---------8<-------- CUT HERE --------8<---------8<--------

I gave it a shot.  It did behave differently this time.  It seemed to
have no problem retrieving the pxelinux.cfg file from the tftp server
but just seemed to keep pxebooting off of the adapter over and over
again.  That was convenient in that it let me capture all the output
when it loaded:

PXELINUX 4.05 0x4de70367 (Copyright notice)
!PXE entry point found (we hope) at 96B1:00D6 via plan A
UNDI code segment at 96B1 len 532C
UNDI data segment at 90EA len 5C70
Getting cached packet 01 02 03
My IP address seems to be AC1F1045 172.31.16.69
ip=172.31.16.69:172.31.8.1:172.31.0.1:255.255.128.0
BOOTIF=01-e4-1f-13-b5-77-e0
SYSUUID=4ec33eee-ed56-3677-bd4e-36303a1285d3
TFTP prefix:
Trying to load: pxelinux.cfg/01-e4-1f-13-b5-77-e0               ok
Booting from local disk...
PXE-M0F: Exiting Broadcom PXE ROM
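
For anyone reading the log above, here is a minimal sketch of how two of
its values relate to each other.  The hex word after "My IP address seems
to be" is the IPv4 address as a 32-bit number, and the config file name is
the ARP hardware type (01 for Ethernet) followed by the MAC address in
lowercase hex with dashes.  The function names hex_to_ipv4 and
mac_config_name are made up for this illustration and are not part of
Syslinux.

```python
import ipaddress


def hex_to_ipv4(hex_word: str) -> str:
    """Convert the 8-digit hex word from the log into dotted-quad form."""
    return str(ipaddress.IPv4Address(int(hex_word, 16)))


def mac_config_name(mac: str, hw_type: int = 1) -> str:
    """Build the MAC-based config file name.

    hw_type is the ARP hardware type; 1 means Ethernet (assumed here).
    Accepts MACs written with either ':' or '-' separators.
    """
    octets = mac.lower().replace("-", ":").split(":")
    return "-".join([f"{hw_type:02x}"] + octets)


print(hex_to_ipv4("AC1F1045"))                                  # 172.31.16.69
print("pxelinux.cfg/" + mac_config_name("e4:1f:13:b5:77:e0"))   # pxelinux.cfg/01-e4-1f-13-b5-77-e0
```

Both results match the "My IP address", BOOTIF and "Trying to load" lines
in the captured output.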

Once I believed I had captured everything printed to the screen, I
copied the old pxelinux.0 from RHEL 5.5 back to see if it could
continue to boot.  It didn't; it got stuck when trying to perform the
local boot, which I assume was due to whatever the 4.05 file had been
doing.  Here is the output that 3.10 gives:

PXELINUX 3.10 2005-08-24 (copyright notice)
UNDI data segment at:   00090EA0
UNDI data segment size: 5C70
UNDI data segment at:   00096B10
UNDI data segment size: 532C
PXE entry point found (we hope) at 96B1:00D6
My IP address seems to be AC1F1049 172.31.16.73
ip=172.31.16.73:172.31.8.1:172.31.0.1:255.255.128.0
TFTP prefix:
Trying to load: pxelinux.cfg/01-e4-1f-13-b5-77-e2
Booting from local disk...
PXE-M0F: Exiting Broadcom PXE ROM.
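
The two logs print the UNDI locations in different forms: 4.05 shows
real-mode segment values (96B1, 90EA) while 3.10 shows linear addresses
(00096B10, 00090EA0).  A small sketch, assuming the usual real-mode rule
that linear address = segment * 16 + offset, shows that they describe the
same memory.  The helper name linear_address is made up for this
illustration.

```python
def linear_address(segment: int, offset: int = 0) -> int:
    """Real-mode address translation: segment shifted left 4 bits, plus offset."""
    return (segment << 4) + offset


# Segment values taken from the 4.05 log.
print(f"{linear_address(0x96B1):08X}")          # 00096B10
print(f"{linear_address(0x90EA):08X}")          # 00090EA0

# The !PXE entry point 96B1:00D6 as a linear address.
print(f"{linear_address(0x96B1, 0x00D6):08X}")  # 00096BE6
```

The first two values are exactly the addresses in the 3.10 output, and the
lengths (532C and 5C70) also agree between the two logs, so both versions
appear to see the same UNDI layout from the Broadcom ROM.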

Now, we have both internal network adapters set to pxeboot.  Normally,
when the node is installed, we send the command to boot locally.  When
this happens it will pxeboot from the first nic, then the next nic,
and then continue with the boot sequence until it boots to the hard
disk.  When I copied the older pxelinux.0 I did see it booted from the
first nic, and the second nic, and then got stuck.  The above output
is for the second nic.  After rebooting the server it was able to boot
just fine.

From the screen it looks like the nics boot UNDI PXE-2.1 v5.2.7.  Our
pxelinux.cfg files are pretty simple; for local booting we just have:

default local

label local
localboot 1

Is there anything we can send it to print out any extra debugging
information that may be useful?  It does seem to read the pxelinux.cfg
file so we might be able to give it something else to do besides
"localboot 1".  Also, the system I am using gets reloaded frequently
so I would like to try this file and see how it behaves when we do a
network install of the node.  I will let you know what it does then.

--------8<---------8<-------- CUT HERE --------8<---------8<--------

Network installs seem to work just fine with this pxelinux file.  It
went by too fast to capture the output, but the node started installing
Red Hat without complaining.  Our pxelinux.cfg file for the node looks
like this:

# cat 01-e4-1f-13-b5-77-e0
default network

label local
localboot 1

label network
kernel vmlinuz
append ksdevice=e4:1f:13:b5:77:e0 load_ramdisk=1 initrd=initrd.img ks=http://172.31.8.2:54080/kickstart/ks_node_net.cfg node_type=node node=78810XB nompath blacklist=qla2xxx
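
The local and network files shown in this comment differ only in the
"default" line, so switching a node between the two modes amounts to
rewriting that one line.  A minimal sketch of that idea follows; the
function render_node_config is hypothetical (it is not the tooling used
here and not part of Syslinux), and the append string in the example is
abbreviated.

```python
def render_node_config(default_label: str, append_args: str) -> str:
    """Render a per-node pxelinux.cfg file with the given default label.

    default_label must be "local" or "network", the two labels used above.
    append_args is passed through unchanged to the network label.
    """
    if default_label not in ("local", "network"):
        raise ValueError(f"unknown label: {default_label}")
    lines = [
        f"default {default_label}",
        "",
        "label local",
        "localboot 1",
        "",
        "label network",
        "kernel vmlinuz",
        f"append {append_args}",
    ]
    return "\n".join(lines) + "\n"


# Abbreviated append line, for illustration only.
print(render_node_config("network", "initrd=initrd.img"))
```

Writing the result to pxelinux.cfg/01-e4-1f-13-b5-77-e0 with "network"
selects the install path; using "local" gives the "localboot 1" behaviour
that is looping on this machine.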

Given that this file does not produce the weird 268K of low DOS RAM
message, I think we're looking somewhat good.  Do you know if the
localboot behavior changed?  If we can simply supply a different
instruction to have it continue the boot sequence from pxe then we
should be good.

--------8<---------8<-------- CUT HERE --------8<---------8<--------

Comment 8 H. Peter Anvin 2011-06-16 00:36:44 UTC
"If we can simply supply a different
instruction to have it continue the boot sequence from pxe then we
should be good."

There is no such instruction.  All pxelinux *can* do is hand control back to the BIOS.  There are two ways defined for returning to the BIOS: far return and INT 18h.  pxelinux has historically used far return, but INT 18h was used in some versions if an EFI CSM was detected, in an attempt to work around a bug... unfortunately the workaround broke other machines, so it was reverted.

Since you're not getting the low DOS RAM warning anymore I have no way to know what is going on there.

In your case, it sounds like the BIOS is simply booting the same boot alternative over and over again.  There isn't really anything that pxelinux can do about that.

Comment 9 David Cantrell 2011-08-02 20:22:25 UTC
CLOSED CANTFIX per comment #8.