Bug 980882

Summary: PXE boot unusably slow after update/upgrade
Product: [Fedora] Fedora
Reporter: Vratislav Podzimek <vpodzime>
Component: ipxe
Assignee: Fedora Virtualization Maintainers <virt-maint>
Status: CLOSED CANTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 19
CC: amit.shah, bcl, berrange, cfergeau, clalancette, crobinso, dwmw2, gansalmon, itamar, jforbes, jonathan, jthompso, jyang, kernel-maint, laine, libvirt-maint, madhu.chinakonda, mkolman, pbonzini, rjones, scottt.tw, veillard, virt-maint, vpodzime
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1036792 (view as bug list) Environment:
Last Closed: 2014-03-05 13:32:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1036792    
Attachments:
Description Flags
kvm_intel_params from the not working (slow) none

Description Vratislav Podzimek 2013-07-03 12:32:06 UTC
Description of problem:
After updating to the recent versions of libvirt etc. in F18, booting a virtual machine over PXE is unusably slow. Upgrading to F19 doesn't solve the issue. It was okay with the older versions.

Version-Release number of selected component (if applicable):
libvirt-daemon-1.0.5.2-1.fc19.x86_64

How reproducible:
100%

Steps to Reproduce:
1. try to boot a virtual machine over PXE (typically via bridge)

Actual results:
iPXE (TFTP communication) is unusably slow to initialize and fetch kernel and initrd

Expected results:
fast initialization and data fetching

Comment 1 Daniel Berrangé 2013-07-03 12:33:44 UTC
Did you upgrade iPXE / QEMU around the time you noticed the slowdown? I'm pretty doubtful that libvirt itself can cause guest PXE performance to change in any measurable way.

Comment 2 Vratislav Podzimek 2013-07-03 12:36:42 UTC
There was no iPXE update/upgrade, but there were some QEMU updates/upgrades, I think. I don't know what the right component for this bug is, so I started with libvirt and I hope you guys will reassign it properly. Thanks

Comment 3 Vratislav Podzimek 2013-07-08 11:47:26 UTC
This seems to be a kernel issue -- works well with kernel-3.8.13-100.fc17 -- reassigning. Did something change in the TFTP stack recently?

Comment 4 Josh Boyer 2013-09-18 20:47:38 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 5 Vratislav Podzimek 2013-09-24 15:37:39 UTC
Still not resolved, maybe even worse than with older kernels. This really is an annoying bug preventing F19 usage for virtualization in combination with PXE.

Comment 6 Vratislav Podzimek 2013-11-06 08:50:16 UTC
This very probably turns out to be a bug in iPXE. If I boot the virtual machine via PXE (which takes ages) and then run a tftp client in the booted system and fetch vmlinuz and initrd.img from there, everything works as expected.

However, it works well with old kernels so it looks like a kernel change not reflected in iPXE or something like that.

Comment 7 Paolo Bonzini 2013-11-30 09:05:27 UTC
Vratislav, can you please run

grep . /sys/module/kvm_intel/parameters/*

I suspect this is caused by iPXE's usage of big real mode.

Comment 8 Vratislav Podzimek 2013-12-02 10:22:44 UTC
Created attachment 831511 [details]
kvm_intel_params from the not working (slow)

Comment 9 Paolo Bonzini 2013-12-02 10:29:05 UTC
Indeed:

/sys/module/kvm_intel/parameters/emulate_invalid_guest_state:Y
/sys/module/kvm_intel/parameters/unrestricted_guest:N

This is a pretty old machine BTW:

/sys/module/kvm_intel/parameters/flexpriority:N

Comment 10 Vratislav Podzimek 2013-12-02 10:30:19 UTC
Comment on attachment 831511 [details]
kvm_intel_params from the not working (slow)

wrong attachment, will put the proper one in comments

Comment 11 Vratislav Podzimek 2013-12-02 10:32:21 UTC
==not working (slow) F19==
3.11.1-200.fc19.x86_64

/sys/module/kvm_intel/parameters/emulate_invalid_guest_state:Y
/sys/module/kvm_intel/parameters/enable_apicv:N
/sys/module/kvm_intel/parameters/enable_shadow_vmcs:N
/sys/module/kvm_intel/parameters/ept:N
/sys/module/kvm_intel/parameters/eptad:N
/sys/module/kvm_intel/parameters/fasteoi:Y
/sys/module/kvm_intel/parameters/flexpriority:N
/sys/module/kvm_intel/parameters/nested:N
/sys/module/kvm_intel/parameters/ple_gap:0
/sys/module/kvm_intel/parameters/ple_window:4096
/sys/module/kvm_intel/parameters/unrestricted_guest:N
/sys/module/kvm_intel/parameters/vmm_exclusive:Y
/sys/module/kvm_intel/parameters/vpid:N

==working F19 with F17 kernel==
3.8.13-100.fc17.x86_64

/sys/module/kvm_intel/parameters/emulate_invalid_guest_state:Y
/sys/module/kvm_intel/parameters/ept:N
/sys/module/kvm_intel/parameters/eptad:N
/sys/module/kvm_intel/parameters/fasteoi:Y
/sys/module/kvm_intel/parameters/flexpriority:N
/sys/module/kvm_intel/parameters/nested:N
/sys/module/kvm_intel/parameters/ple_gap:0
/sys/module/kvm_intel/parameters/ple_window:4096
/sys/module/kvm_intel/parameters/unrestricted_guest:N
/sys/module/kvm_intel/parameters/vmm_exclusive:Y
/sys/module/kvm_intel/parameters/vpid:N

==not working (slow) F20==
3.11.7-300.fc20.x86_64

/sys/module/kvm_intel/parameters/emulate_invalid_guest_state:Y
/sys/module/kvm_intel/parameters/enable_apicv:N
/sys/module/kvm_intel/parameters/enable_shadow_vmcs:N
/sys/module/kvm_intel/parameters/ept:Y
/sys/module/kvm_intel/parameters/eptad:N
/sys/module/kvm_intel/parameters/fasteoi:Y
/sys/module/kvm_intel/parameters/flexpriority:Y
/sys/module/kvm_intel/parameters/nested:N
/sys/module/kvm_intel/parameters/ple_gap:0
/sys/module/kvm_intel/parameters/ple_window:4096
/sys/module/kvm_intel/parameters/unrestricted_guest:N
/sys/module/kvm_intel/parameters/vmm_exclusive:Y
/sys/module/kvm_intel/parameters/vpid:Y
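
For anyone comparing these dumps, here is a minimal Python sketch that diffs two of them. The function names (parse_dump, diff_params) are made up for illustration and do not come from any existing tool; the data is copied from the two F19 listings above. Under that assumption, the only difference it reports is the two parameters that appear only in the 3.11 dump, so the parameter values alone do not seem to distinguish the slow kernel from the working one.

```python
PREFIX = "/sys/module/kvm_intel/parameters/"


def parse_dump(text):
    """Turn 'path:value' lines into a {parameter_name: value} dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue  # skip headings, kernel versions and blank lines
        name, _, value = line[len(PREFIX):].partition(":")
        params[name] = value
    return params


def diff_params(old, new):
    """Return (removed, added, changed) between two parameter dicts."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = {
        name: (old[name], new[name])
        for name in sorted(set(old) & set(new))
        if old[name] != new[name]
    }
    return removed, added, changed


# Copied from the "working F19 with F17 kernel" listing (3.8.13-100.fc17).
WORKING_3_8 = """
/sys/module/kvm_intel/parameters/emulate_invalid_guest_state:Y
/sys/module/kvm_intel/parameters/ept:N
/sys/module/kvm_intel/parameters/eptad:N
/sys/module/kvm_intel/parameters/fasteoi:Y
/sys/module/kvm_intel/parameters/flexpriority:N
/sys/module/kvm_intel/parameters/nested:N
/sys/module/kvm_intel/parameters/ple_gap:0
/sys/module/kvm_intel/parameters/ple_window:4096
/sys/module/kvm_intel/parameters/unrestricted_guest:N
/sys/module/kvm_intel/parameters/vmm_exclusive:Y
/sys/module/kvm_intel/parameters/vpid:N
"""

# Copied from the "not working (slow) F19" listing (3.11.1-200.fc19).
SLOW_3_11 = """
/sys/module/kvm_intel/parameters/emulate_invalid_guest_state:Y
/sys/module/kvm_intel/parameters/enable_apicv:N
/sys/module/kvm_intel/parameters/enable_shadow_vmcs:N
/sys/module/kvm_intel/parameters/ept:N
/sys/module/kvm_intel/parameters/eptad:N
/sys/module/kvm_intel/parameters/fasteoi:Y
/sys/module/kvm_intel/parameters/flexpriority:N
/sys/module/kvm_intel/parameters/nested:N
/sys/module/kvm_intel/parameters/ple_gap:0
/sys/module/kvm_intel/parameters/ple_window:4096
/sys/module/kvm_intel/parameters/unrestricted_guest:N
/sys/module/kvm_intel/parameters/vmm_exclusive:Y
/sys/module/kvm_intel/parameters/vpid:N
"""

if __name__ == "__main__":
    removed, added, changed = diff_params(
        parse_dump(WORKING_3_8), parse_dump(SLOW_3_11)
    )
    print("only in 3.8 dump: ", removed)
    print("only in 3.11 dump:", added)
    print("changed values:   ", changed)
```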

Comment 12 Vratislav Podzimek 2013-12-02 10:35:20 UTC
Is there any boot option or something like that I could use to disable the problematic feature?

Comment 13 Paolo Bonzini 2013-12-02 15:41:01 UTC
Yes, you can load KVM with "emulate_invalid_guest_state=N" to work around the problem.

I reproduced your scenario with a 2MB vmlinuz and 12MB initrd.img.  Here is my pxelinux.cfg/default file:

default anaconda
prompt 1
timeout 10
label local
  localboot 1
label anaconda
  kernel kernels/vmlinuz
  initrd kernels/initrd.img

I have:
unrestricted_guest=Y => ~1s
emulate_invalid_guest_state=N unrestricted_guest=N => ~1s
emulate_invalid_guest_state=Y unrestricted_guest=N => 10s

So it's not _unbearably slow_, but it is pretty slow indeed.
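
To check whether a host is in the slow combination from the three measurements above, something like the following Python sketch could be used. It is only a heuristic that encodes those three data points; the function names are illustrative, and it assumes the kvm_intel parameters are exposed under /sys/module/kvm_intel/parameters as in the earlier listings.

```python
from pathlib import Path

PARAM_DIR = Path("/sys/module/kvm_intel/parameters")


def read_params(param_dir=PARAM_DIR):
    """Read every readable parameter file into a {name: value} dict."""
    params = {}
    for entry in sorted(Path(param_dir).iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            continue  # unreadable entry; skip it
    return params


def uses_slow_emulation(params):
    """True only for the combination measured as slow above.

    Heuristic: emulate_invalid_guest_state=Y together with
    unrestricted_guest=N (or unrestricted_guest not reported).
    """
    unrestricted = params.get("unrestricted_guest") == "Y"
    emulate = params.get("emulate_invalid_guest_state") == "Y"
    return emulate and not unrestricted


if __name__ == "__main__":
    if not PARAM_DIR.is_dir():
        print("kvm_intel does not appear to be loaded on this host")
    elif uses_slow_emulation(read_params()):
        print("emulate_invalid_guest_state=Y with unrestricted_guest=N: "
              "the slow combination measured above")
    else:
        print("not the slow combination measured above")
```

Run on the host (not in the guest) after loading kvm_intel; it only reads the parameter values and changes nothing.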

Comment 14 Vratislav Podzimek 2013-12-02 16:39:08 UTC
Is there a way to use some boot options instead of reloading the modules after each boot?

And compared to your numbers, we have seen much worse cases -- loading vmlinuz and initrd.img in minutes instead of (tens of) seconds.

Comment 15 Joe Thompson 2014-01-04 01:07:58 UTC
I have observed this as well trying to install RHEL 5.8 in a guest under Fedora 20.  Symptoms were that the 5.8 ISO would boot in the guest and allow selecting the install mode, but the initrd would take many many minutes to begin to load, and after the install the system would not accept keyboard input after boot (at least not for a long time -- I eventually got tired of waiting and trying).  The same workaround seems to have fixed it: I added 

options kvm_intel emulate_invalid_guest_state=N

to /etc/modprobe.d/kvm_intel.conf, removed and reinserted the module and the guest now appears to install normally.

Comment 16 Cole Robinson 2014-03-05 13:10:41 UTC
There's a new ipxe update available, ipxe-20140303-1.gitff1e7fc7.fc20, but I'm guessing it doesn't help any.

Paolo, is this truly an ipxe issue, or a kernel/kvm issue due to the slowness of emulating big real mode?

Comment 17 Paolo Bonzini 2014-03-05 13:32:34 UTC
The kernel/KVM issue cannot be really solved except by upgrading the host.

For Fedora I think this is CANTFIX.

Comment 18 Vratislav Podzimek 2014-03-11 08:11:25 UTC
(In reply to Paolo Bonzini from comment #17)
> The kernel/KVM issue cannot be really solved except by upgrading the host.
> 
> For Fedora I think this is CANTFIX.
How can this be a CANTFIX if it is a regression? Isn't the big real mode emulation just broken?

Comment 19 Paolo Bonzini 2014-03-11 09:20:23 UTC
> Isn't the big real mode emulation just broken?

The older code was not accurate and it broke in other cases.  In most cases we could live with the inaccuracy, but not always.  I can look at it as an upstream project, but it's not something that the Fedora project can fix with a downstream-only patch.

In any case, there's nothing to be fixed in iPXE.

Comment 20 Vratislav Podzimek 2014-03-11 14:34:17 UTC
(In reply to Paolo Bonzini from comment #19)
> > Isn't the big real mode emulation just broken?
> 
> The older code was not accurate and it broke in other cases.  In most cases
> we could live with the inaccuracy, but not always.  I can look at it as an
> upstream project, but it's not something that the Fedora project can fix
> with a downstream-only patch.
Agreed.

> 
> In any case, there's nothing to be fixed in iPXE.
I still don't get why a TFTP transfer in the iPXE "session" takes longer than the same transfer in the running system, but I'm not an expert.