Bug 728010

Summary: KVM guests cannot boot PXE "local"
Product: Red Hat Enterprise Linux 5
Reporter: Brian Cook <bcook>
Component: kvm
Assignee: Gleb Natapov <gleb>
Status: CLOSED WORKSFORME
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: high
Version: 5.5.z
CC: anton, berrange, bkearney, bpeck, clalance, crobinso, cvantuin, dwmw2, ewan, gcosta, gfa, itamar, jaswinder, jforbes, jlaska, juzhang, k.georgiou, knoel, lars, mgregg, mhlavink, mkenneth, ondrejj, quintela, rhod, shawn.starr, shuang, sputhenp, tburke, virt-maint, virt-maint
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 472236
Environment:
Last Closed: 2011-10-17 14:31:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 580949

Description Brian Cook 2011-08-03 22:27:29 UTC
+++ This bug was initially created as a clone of Bug #472236 +++

Description of problem:

All QA automated systems rely on PXE local booting for proper provisioning and testing.  All systems are configured in the BIOS to boot PXE first.

When we want to provision the systems, we modify the PXE target (using RHTS or, now, Cobbler).

When we want to boot locally to run tests, we set the default PXE target to "local".

KVM guests do not honor the PXE "local" target.  It seems that once you boot from PXE, KVM doesn't attach the already installed disks.

Version-Release number of selected component (if applicable):

kernel-2.6.27.5-113.fc10.x86_64
libvirt-0.4.6-3.fc10.x86_64
kvm-74-5.fc10.x86_64

How reproducible:

Every time.

Steps to Reproduce:
1. Set KVM guest PXE target to "Network Boot" using virt-manager
2. Boot the KVM guest.
3. In the PXE menu, type "local"
  
Actual results:

 * See the attached screenshot, XML, and libvirt log file.

Expected results:

The system should behave as a "real" system behaves and boot the local disk.

Additional info:

 * This makes adding KVM guests into test automation a bit funky, since we'll need a workaround that involves:

When you want to reprovision a guest:
 1) virsh destroy $GUEST
 2) virsh undefine $GUEST
 3) Edit the XML to boot off the network
 4) virsh define $XMLFILE
 5) virsh start $GUEST

We'd then need to repeat to have it boot to local disk.
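Step 3 is the only manual part of that cycle. A minimal Python sketch of it, assuming the domain XML was saved with `virsh dumpxml $GUEST` and is fed back through `virsh define`; the helper names `set_boot_order` and `boot_devices` are hypothetical, not part of virsh or libvirt:

```python
import xml.etree.ElementTree as ET


def set_boot_order(domain_xml: str, devices: list[str]) -> str:
    """Return a copy of a libvirt domain XML whose <os> boot list is `devices`.

    Hypothetical helper: it only rewrites the <boot dev='...'/> children of
    <os> and leaves the rest of the document untouched.
    """
    root = ET.fromstring(domain_xml)
    os_elem = root.find("os")
    if os_elem is None:
        raise ValueError("domain XML has no <os> element")
    # Remove every existing <boot> entry under <os>.
    for boot in os_elem.findall("boot"):
        os_elem.remove(boot)
    # Append the requested devices; the BIOS tries them in the order listed.
    for dev in devices:
        ET.SubElement(os_elem, "boot", dev=dev)
    return ET.tostring(root, encoding="unicode")


def boot_devices(domain_xml: str) -> list[str]:
    """List the <boot dev=...> values under <os>, in document order."""
    return [b.get("dev") for b in ET.fromstring(domain_xml).findall("os/boot")]
```

In place of the hand edit in step 3, `set_boot_order(xml, ["network"])` prepares a reinstall and `set_boot_order(xml, ["hd"])` switches back to the local disk; the destroy/define/start steps stay the same.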

--- Additional comment from jlaska on 2008-11-19 10:08:23 EST ---

Created attachment 324048 [details]
Screenshot

--- Additional comment from jlaska on 2008-11-19 10:08:44 EST ---

Created attachment 324049 [details]
Guest XML configuration

--- Additional comment from jlaska on 2008-11-19 10:09:06 EST ---

Created attachment 324050 [details]
/var/log/libvirt/qemu/vguest2.log

--- Additional comment from mdehaan on 2008-11-19 10:25:27 EST ---

Being able to boot KVM-via-PXE statefully would be highly useful for my testing in Cobbler land as well, and would help with virtual deployment (and re-deployment) of non-Linux guests.

--- Additional comment from berrange on 2008-11-19 10:33:38 EST ---

The XML only specifies a single device for booting. Can you try setting multiple devices:

    <boot dev='network'/>
    <boot dev='cdrom'/>
    <boot dev='hd'/>

This should tell the BIOS to try to boot from the network, then the CD-ROM, then the hard disk, in that order.
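The qemu-kvm command lines that libvirt logs in this report show how the `<boot>` list is passed down: network, cdrom, hd becomes `-boot ndc`, and network, hd becomes `-boot nc`. A small Python sketch of that translation, assuming qemu's legacy `-boot` letters (a = floppy, c = hard disk, d = CD-ROM, n = network); it mirrors the logged output but is not libvirt's own code:

```python
# Legacy qemu "-boot" drive letters, keyed by libvirt <boot dev=...> value.
QEMU_BOOT_LETTERS = {"fd": "a", "hd": "c", "cdrom": "d", "network": "n"}


def qemu_boot_arg(devices: list[str]) -> str:
    """Translate an ordered list of libvirt boot devices into a qemu -boot string."""
    letters = []
    for dev in devices:
        if dev not in QEMU_BOOT_LETTERS:
            raise ValueError(f"unknown boot device: {dev!r}")
        letters.append(QEMU_BOOT_LETTERS[dev])
    return "".join(letters)
```

The translation itself is trivial; the "Too many option ROMS" failures in this report occur with a correct `-boot` string, so the mapping is not the culprit.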

--- Additional comment from jlaska on 2008-11-19 10:47:41 EST ---

Using ...

  <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <boot dev='network'/>
    <boot dev='cdrom'/>
    <boot dev='hd'/>
  </os>

Results in ...

# cat /var/log/libvirt/qemu/vguest2.log 
/usr/bin/qemu-kvm -S -M pc -m 1024 -smp 2 -name vguest2 -monitor pty -boot ndc -drive file=/dev/VolGroup00/vguest2,if=virtio,index=0,boot=on -net nic,macaddr=54:52:00:29:89:e5,vlan=0,model=virtio -net tap,fd=16,script=,vlan=0,ifname=vnet0 -serial pty -parallel none -usb -vnc 127.0.0.1:1 -k en-us 
char device redirected to /dev/pts/3
char device redirected to /dev/pts/4
Too many option ROMS

Am I doing that right?

--- Additional comment from crobinso on 2008-11-19 10:50:57 EST ---

Wow! I didn't know you could specify multiple boot devs. Using

    <boot dev='network'/>
    <boot dev='hd'/>

And then pressing 'q' to skip the network boot successfully boots from disk. James, try just the above and see if it does the job for you.

--- Additional comment from mdehaan on 2008-11-19 11:13:53 EST ---

Cole, what we are looking for is when the bootloader is fed the following PXE configuration it should boot from the local disk:

DEFAULT local
PROMPT 0
TIMEOUT 0
TOTALTIMEOUT 0
ONTIMEOUT local

LABEL local
        LOCALBOOT 0


This will enable us to create a KVM "empty shell" that we can assign what OS it is running just based on changing the PXE configuration.

Pressing "q" would be interactive and less useful -- you'd have to catch it really really quickly or you'd be reinstalling.
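Switching a guest between "boot local" and "reinstall" then only means rewriting this file. A minimal Python sketch that renders it, reproducing the `local` entry above verbatim; the `install` entry and its `images/vmlinuz` and `images/initrd.img` paths are hypothetical placeholders, and in practice Cobbler or RHTS generates these files itself:

```python
LOCAL_LABEL = "LABEL local\n        LOCALBOOT 0\n"

# Hypothetical install entry; kernel and initrd paths are placeholders.
INSTALL_LABEL = (
    "\n"
    "LABEL install\n"
    "        KERNEL images/vmlinuz\n"
    "        APPEND initrd=images/initrd.img\n"
)


def render_pxe_config(default_label: str, extra_labels: str = "") -> str:
    """Render the PXELINUX config above with `default_label` as DEFAULT and ONTIMEOUT."""
    header = (
        f"DEFAULT {default_label}\n"
        "PROMPT 0\n"
        "TIMEOUT 0\n"
        "TOTALTIMEOUT 0\n"
        f"ONTIMEOUT {default_label}\n"
        "\n"
    )
    return header + LOCAL_LABEL + extra_labels
```

`render_pxe_config("local")` yields the configuration quoted above, and `render_pxe_config("install", INSTALL_LABEL)` flips the same guest to a reinstall. Generating the file is not what fails here: the guest fetches it, and the problem is what happens after `LOCALBOOT 0`.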

--- Additional comment from jlaska on 2008-11-19 11:18:52 EST ---

(In reply to comment #7)
> Wow! I didn't know you could specify multiple boot devs. Using
> 
>     <boot dev='network'/>
>     <boot dev='hd'/>
> 
> James, try just the above and see if it does the job for you.

With those options in my XML ... my guest fails to start.

# virsh dumpxml vguest2 | grep -C2 "<boot"
  <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <boot dev='network'/>
    <boot dev='hd'/>
  </os>
  <features>

# virsh start vguest2
libvir: QEMU error : internal error QEMU quit during monitor startup
error: Failed to start domain vguest2

# tail /var/log/libvirt/qemu/vguest2.log 
/usr/bin/qemu-kvm -S -M pc -m 1024 -smp 2 -name vguest2 -monitor pty -boot nc -drive file=/dev/VolGroup00/vguest2,if=virtio,index=0,boot=on -net nic,macaddr=54:52:00:29:89:e5,vlan=0,model=virtio -net tap,fd=12,script=,vlan=0,ifname=vnet0 -serial pty -parallel none -usb -vnc 127.0.0.1:1 -k en-us 
char device redirected to /dev/pts/3
char device redirected to /dev/pts/4
Too many option ROMS

What am I missing?

--- Additional comment from crobinso on 2008-11-19 11:26:01 EST ---

jlaska: hmm, works on F9. sounds like a bug.

mdehaan: you may just have to test it and see what happens. I let the guest boot to our PXE server, which doesn't seem to have an explicit 'local' option. Hitting enter without a selection seems to imply local, but qemu then prompts to boot from (n)etwork or (q)uit.

Maybe qemu is smart enough to notice a 'boot from local' directive from the PXE server, and won't prompt. You'll just have to test it since I'm not sure how to go about it.

--- Additional comment from mdehaan on 2008-11-19 11:29:08 EST ---

Cole, that's what James was trying to do above when he filed the bug, and I watched it happen.

"""
KVM guests do not honor the PXE "local" target.  It seems that once you boot
PXE, KVM doesn't attach the already installed disks.
"""

What specifically should I test?

--- Additional comment from crobinso on 2008-11-19 11:40:55 EST ---

I just wasn't sure if:

not entering a selection on my PXE server & pressing enter == deliberately selecting 'boot from local' on another PXE server == having the PXE server tell the machine/VM 'hey, boot from local' (which is what I understand RHTS does).

If those are all equivalent, then it sounds like qemu needs fixing so that it does not prompt when the PXE server requests a local boot.

--- Additional comment from jlaska on 2008-11-19 11:47:41 EST ---

My take on this bug is that the F10 kvm/libvirt doesn't let me specify multiple <boot> options.  If that were fixed, I suspect it would open the door for PXE "local" booting.

--- Additional comment from berrange on 2008-11-19 12:05:05 EST ---

Yes, this is a bug in KVM. The trouble is that the new -drive flag and its boot=on syntax are broken with respect to the normal -boot arg. We need to use boot=on for VirtIO-based disks, but when we do that, it conflicts with the option ROM for PXE boot. This is a big mess and I'm not sure how to fix it, but it certainly needs addressing somehow, because this is a valid use case.

--- Additional comment from triage.org on 2008-11-26 00:36:24 EST ---


This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

--- Additional comment from gcosta on 2008-11-26 08:24:49 EST ---

James,

Do you still have this problem if you switch from virtio to e1000?

You should use this XML excerpt:
    <boot dev='network'/>
    <boot dev='hd'/>

--- Additional comment from jlaska on 2008-11-26 09:29:52 EST ---

Created attachment 324720 [details]
vguest1.xml (w/ multiple <boot> and dev="virtio")

Glauber, 

Yeah, I still seem to have this problem using virtio.

# virsh start vguest1
libvir: QEMU error : internal error QEMU quit during monitor startup
error: Failed to start domain vguest1

# cat /var/log/libvirt/qemu/vguest1.log 
/usr/bin/qemu-kvm -S -M pc -m 1024 -smp 2 -name vguest1 -monitor pty -boot nc -drive file=/dev/VolGroup00/vguest1,if=ide,index=0,boot=on -drive file=,if=ide,media=cdrom,index=2 -net nic,macaddr=54:52:00:55:c8:17,vlan=0,model=virtio -net tap,fd=14,script=,vlan=0,ifname=vnet2 -serial pty -parallel none -usb -vnc 127.0.0.1:3 -k en-us 
char device redirected to /dev/pts/8
char device redirected to /dev/pts/9
Too many option ROMS

# virsh dumpxml vguest1
 <!-- see attachment -->

--- Additional comment from jlaska on 2008-11-26 09:32:04 EST ---

Created attachment 324721 [details]
vguest1.xml (w/ multiple <boot> and dev="e1000")

Now with dev="e1000"

# virsh start vguest1
libvir: QEMU error : internal error QEMU quit during monitor startup
error: Failed to start domain vguest1

# cat /var/log/libvirt/qemu/vguest1.log 
/usr/bin/qemu-kvm -S -M pc -m 1024 -smp 2 -name vguest1 -monitor pty -boot nc -drive file=/dev/VolGroup00/vguest1,if=ide,index=0,boot=on -drive file=,if=ide,media=cdrom,index=2 -net nic,macaddr=54:52:00:55:c8:17,vlan=0,model=e1000 -net tap,fd=19,script=,vlan=0,ifname=vnet2 -serial pty -parallel none -usb -vnc 127.0.0.1:3 -k en-us 
char device redirected to /dev/pts/8
char device redirected to /dev/pts/9
Too many option ROMS

--- Additional comment from gcosta on 2008-11-26 12:57:58 EST ---

I believe the problem itself is very simple (although I don't really know a good solution without thinking a little bit...)

There's only 64k of memory available for option ROMs, and the virtio ROM that ships with our packages is... 64k in size! So after loading the virtio PXE option ROM, we're unable to load any more option ROMs, in particular the extboot option ROM we need to kick off virtio boots. ;-(

James said he could boot with an older ROM I handed to him, which is 32k in size,
and the "Too many option ROMS" problem went away.

However, he was still unable to boot from the local target, despite the fact that he could do a local boot by pressing "q".

So we really have two problems in here:

The first one is that we cannot boot from our current virtio ROM, because it is too large. We can try a quick fix by building smaller images. This should be a new BZ against the etherboot package.

The second is that the ROMs do not honor the local target. For that, I believe we can keep using this BZ.
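The first problem is plain arithmetic. A simplified Python model, taking the 64k option ROM window from the comment at face value and ignoring other ROMs and alignment; the 64k and 32k ROM sizes are the figures quoted above, while the 512-byte extboot size is an assumption (the smallest possible option ROM, one 512-byte block):

```python
OPTION_ROM_SPACE = 64 * 1024  # bytes, as quoted in the comment above
EXTBOOT_SIZE = 512  # assumption: smallest possible option ROM


def load_option_roms(roms: list[tuple[str, int]], space: int = OPTION_ROM_SPACE) -> list[str]:
    """Load ROMs in order and return the names that fit.

    Stops at the first ROM that would overflow the window, modelling the
    point where qemu gives up with "Too many option ROMS".
    """
    loaded: list[str] = []
    used = 0
    for name, size in roms:
        if used + size > space:
            break
        loaded.append(name)
        used += size
    return loaded


print(load_option_roms([("virtio-pxe", 64 * 1024), ("extboot", EXTBOOT_SIZE)]))  # ['virtio-pxe']
print(load_option_roms([("virtio-pxe", 32 * 1024), ("extboot", EXTBOOT_SIZE)]))  # ['virtio-pxe', 'extboot']
```

With a 64k PXE ROM, no extboot ROM of any size fits; halving the PXE ROM leaves room, which matches James's result with the older 32k image.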

--- Additional comment from jlaska on 2008-11-26 13:35:06 EST ---

(In reply to comment #19)
> So we really have two problems in here:
> 
> The first one is that we cannot boot from our current virtio ROM, because it is
> too large. We can try to quick fix it by building smaller images. This should
> be a new BZ against the etherboot package.

Filed this as bug#473137

--- Additional comment from markmc on 2009-10-13 02:40:37 EDT ---

Apparently this is still a problem with gPXE:

http://www.redhat.com/archives/fedora-virt/2009-October/msg00052.html

Glauber - please take a look

--- Additional comment from fedora-admin-xmlrpc on 2010-03-09 11:54:05 EST ---

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

--- Additional comment from fedora-admin-xmlrpc on 2010-03-09 12:19:54 EST ---

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

--- Additional comment from triage.org on 2010-03-15 08:09:58 EDT ---


This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle.
Changing version to '13'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

--- Additional comment from ondrejj on 2010-05-21 06:43:31 EDT ---

Still a problem on Fedora 13 final + updates-testing. Any chance of a fix?

--- Additional comment from ondrejj on 2010-05-21 06:56:35 EDT ---

I have had some success booting via PXE by booting manually. Maybe the default timeout for the DHCP request is too short. Try this:

1. start virtual machine
2. when prompted to press CTRL-B, do so
3. try to get a DHCP address by running this command: dhcp net0
4. repeat step 3 until you get an address (reply "ok")
5. boot using command: autoboot

If you run the "dhcp net0" command immediately, it fails the first time, but the second run gets an IP address. Then I am able to boot from PXE.
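The manual procedure boils down to "retry DHCP until it succeeds, then boot". A generic Python sketch of that loop; `try_dhcp` and `boot` are hypothetical callables standing in for typing `dhcp net0` and `autoboot` at the gPXE prompt, and the attempt limit is an arbitrary assumption:

```python
from typing import Callable


def dhcp_then_boot(try_dhcp: Callable[[], bool], boot: Callable[[], None], max_attempts: int = 5) -> int:
    """Retry DHCP until it succeeds, then boot; return the attempt that succeeded."""
    for attempt in range(1, max_attempts + 1):
        if try_dhcp():  # stands in for "dhcp net0" answering "ok"
            boot()  # stands in for "autoboot"
            return attempt
    raise RuntimeError(f"no DHCP lease after {max_attempts} attempts")
```

With the behaviour described above (first attempt fails, second succeeds), the function boots on attempt 2.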

--- Additional comment from ondrejj on 2010-08-18 14:53:20 EDT ---

I think local boot works well on current fedora 13 stable. Do you still have this problem?

But the other problem described here (the PXE boot timeout) is still present. Should I open a new bug for it? It looks like it's enough to increase the PXE network timeout by approx. 3 seconds. The simplest workaround is to select "Send Key -> Ctrl-Alt-Del" from the menu immediately (or 1-3 seconds) after the guest starts.

--- Additional comment from mgregg on 2010-09-29 15:18:05 EDT ---

I'm still having this dhcp timeout issue on f13. 

Opened https://bugzilla.redhat.com/show_bug.cgi?id=638735 to track it.

--- Additional comment from triage.org on 2011-06-02 14:23:57 EDT ---


This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

--- Additional comment from triage.org on 2011-06-27 10:02:15 EDT ---


Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

--- Additional comment from shawn.starr on 2011-06-27 19:30:42 EDT ---

Reopen, bump to rawhide, I haven't been able to test this recently.

Comment 1 Brian Cook 2011-08-03 22:29:11 UTC
This bug also affects RHEV-H for RHEV 2.2, which is based on RHEL 5.5.

Comment 2 Michael Gregg 2011-08-03 23:37:16 UTC
To reiterate, the work-around is to run the following as root:

brctl setfd <bridge device> 3 

This lowers the bridge forwarding delay to 3 seconds from the default of 15 or 30 seconds.

This lets the bridge forward traffic while the KVM guest is still attempting to PXE boot.
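The 15 vs 30 second figures are consistent with how a bridge port comes up. Under 802.1D spanning tree, a new port spends one forward delay in listening and another in learning before it forwards frames; the single-interval case without STP is an assumption inferred from the 15 second figure. A Python sketch of the resulting wait, where `forward_delay` is the value `brctl setfd` sets:

```python
def seconds_until_forwarding(forward_delay: float, stp_enabled: bool) -> float:
    """Seconds a newly attached bridge port waits before it forwards frames.

    With STP: listening + learning, one forward delay each (802.1D).
    Without STP: assumed here to be a single forward delay.
    """
    return forward_delay * (2 if stp_enabled else 1)


for delay in (15, 3):  # Linux default forward delay vs. the brctl setfd value above
    print(delay, seconds_until_forwarding(delay, False), seconds_until_forwarding(delay, True))
```

A PXE ROM that gives up on DHCP before this wait expires never gets a lease, which matches the timeout symptom; it does not address the LOCALBOOT problem this bug tracks.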

Comment 3 Lars Kellogg-Stedman 2011-08-04 00:28:04 UTC
That's not a workaround at all.  The problem has nothing to do with bridging or network availability in general; the problem is that after *successfully* loading a PXE image, there's no way to instruct the system to boot using the local hard drive.

That is, if your PXE configuration looks like this:

  LABEL local
    LOCALBOOT 0

It will never work.

Comment 4 Brian Cook 2011-08-04 00:34:20 UTC
Correct: the PXE configuration is successfully retrieved from the server, but the configuration Lars quotes above does not work, though it should.

This configuration is useful because it lets an administrator reimage a machine by adjusting the PXE boot config applied to its MAC address and rebooting it, without changing the boot order.
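Assuming the PXE server runs PXELINUX (the `LABEL`/`LOCALBOOT` syntax in this report is PXELINUX syntax), the per-MAC config is a file under `pxelinux.cfg/` named after the client's MAC address: the ARP hardware type `01` (Ethernet), then the address as lowercase hex bytes joined by dashes. A Python sketch of that naming rule; `pxelinux_cfg_name` is a hypothetical helper, and tools such as Cobbler write these files themselves:

```python
import re


def pxelinux_cfg_name(mac: str) -> str:
    """Return the per-MAC pxelinux.cfg/ file name for an Ethernet MAC address."""
    octets = re.split(r"[:-]", mac.strip())
    if len(octets) != 6 or not all(re.fullmatch(r"[0-9A-Fa-f]{2}", o) for o in octets):
        raise ValueError(f"not a 6-octet MAC address: {mac!r}")
    # "01" is the ARP hardware type for Ethernet; octets are lowercased and dash-joined.
    return "pxelinux.cfg/01-" + "-".join(o.lower() for o in octets)
```

For the guest in the logs above, `pxelinux_cfg_name("54:52:00:29:89:e5")` gives `pxelinux.cfg/01-54-52-00-29-89-e5`; pointing that file at a `LOCALBOOT 0` entry or at an install entry is the reimage switch described here.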

Comment 5 Michael Gregg 2011-08-04 00:47:49 UTC
It looks like the bug https://bugzilla.redhat.com/show_bug.cgi?id=638735 was closed as a duplicate of the bug that this bug is supposed to be a clone of. 

Contrary to what I thought earlier, this bug and #638735 appear to be different issues.

Comment 6 Brian Cook 2011-08-05 01:49:31 UTC
This bug is a clone of Bug #472236. Bug 638735 is a dup of 586324; there is no relation.