Bug 1036792
Summary: | PXE boot 5-10x slower in RHEL due to invalid guest state emulation | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Paolo Bonzini <pbonzini> |
Component: | kernel | Assignee: | Bandan Das <bdas> |
kernel sub component: | KVM | QA Contact: | Virtualization Bugs <virt-bugs> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | urgent | CC: | amit.shah, bcl, bdas, berrange, cfergeau, clalancette, dwmw2, gansalmon, hhuang, itamar, jforbes, jonathan, juzhang, kernel-maint, knoel, laine, lersek, libvirt-maint, madhu.chinakonda, michen, mkolman, pbonzini, qiguo, rjones, scottt.tw, veillard, virt-maint, vpodzime, xwei |
Version: | 7.0 | Keywords: | TestOnly |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | kernel-3.10.0-258.el7 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | 980882 | Environment: | |
Last Closed: | 2015-11-19 20:01:06 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 980882, 1209995 | ||
Bug Blocks: | 1205790 |
Description
Paolo Bonzini
2013-12-02 15:49:59 UTC
I managed to get a 1.6-3x speedup in invalid guest state microbenchmarks. We could consider moving this to kernel if that speedup is enough. (In reply to Paolo Bonzini from comment #3) > I managed to get a 1.6-3x speedup in invalid guest state microbenchmarks. > We could consider moving this to kernel if that speedup is enough. That would be nice. Any progress? Btw, I still see no reason for the PXE boot in a VM being drastically slower than on a real HW with trivial PXE stack firmware. What's the limitation? - iPXE is anything but trivial - when you have suitable hardware that supports the "unrestricted guest" feature, the big real mode code of iPXE runs transparently on the VCPU - when you have old hardware without this feature, then KVM has to emulate such code manually (implement the instructions in software) -- hence "emulate_invalid_guest_state". It is slow, but Paolo and Bandan are optimizing it. - when you have old hardware (no unrestricted guest host CPU feature), and *also* turn off emulation, then... iPXE shouldn't actually run (because the code shouldn't be possible to execute in any way). Qemu should bomb out with an emulation failure from KVM. Hpwever, according to comment 0, under this setup iPXE runs in fact. I don't understand how that's possible, so I'll leave its explanation to Paolo & Bandan. In non-emulate_invalid_guest_state mode, KVM tries to make invalid state work by approximating it within the constraints of the processor. If iPXE works, it means that it doesn't *really* need big real mode; the usage is only there to comply with the protected mode memory allocator spec (PMM). IIUC iPXE has to do a lot of mode transitions because it has to work together with the BIOS. Each time you go out of iPXE and back you get 4 transition (because SeaBIOS also goes to protected mode and back). On a real processor (or a Westmere or more recent) this costs only a few hundred cycles times 4. On older processors the cost, incurred a lot of times per second, is multiplied by 3000 or more, until the point that not only you saturate the CPU, you also start getting slower. See bug 1088784 too, exposed by the many transitions. It's actually Stefan who probably knows more about it than everyone else. Patch(es) available on kernel-3.10.0-258.el7 on system with packages: [root@dhcp-11-118 ~]# rpm -qa | grep -i ipxe ipxe-bootimgs-20130517-6.gitc4bce43.el7.noarch ipxe-roms-20130517-6.gitc4bce43.el7.noarch ipxe-roms-qemu-20130517-6.gitc4bce43.el7.noarch [root@dhcp-11-118 ~]# rpm -q qemu-kvm-rhev qemu-kvm-rhev-2.3.0-1.el7.x86_64 [root@dhcp-11-118 ~]# define a VM with net boot as 1st priority, measuring time from start loading kernel, to the anaconda interface appearing [root@dhcp-11-118 ~]# uname -r 3.10.0-287.el7.x86_64 emulate_invalid_guest_state=Y unrestricted_guest=N => 43 emulate_invalid_guest_state=N unrestricted_guest=N => 24 ========== [root@dhcp-11-118 ~]# uname -r 3.10.0-247.el7.x86_64 emulate_invalid_guest_state=Y unrestricted_guest=N => 57 emulate_invalid_guest_state=N unrestricted_guest=N => 24 [root@dhcp-11-118 ~]# lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 1 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 42 Model name: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz Stepping: 7 CPU MHz: 3374.035 BogoMIPS: 6185.72 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 6144K NUMA node0 CPU(s): 0-3 hardware features exposed by default(modprobe kvm_intel) [root@dhcp-11-118 ~]# grep . /sys/module/kvm_intel/parameters/* /sys/module/kvm_intel/parameters/emulate_invalid_guest_state:Y /sys/module/kvm_intel/parameters/enable_apicv:N /sys/module/kvm_intel/parameters/enable_shadow_vmcs:N /sys/module/kvm_intel/parameters/ept:Y /sys/module/kvm_intel/parameters/eptad:N /sys/module/kvm_intel/parameters/fasteoi:Y /sys/module/kvm_intel/parameters/flexpriority:Y /sys/module/kvm_intel/parameters/nested:Y /sys/module/kvm_intel/parameters/ple_gap:0 /sys/module/kvm_intel/parameters/ple_window:4096 /sys/module/kvm_intel/parameters/ple_window_grow:2 /sys/module/kvm_intel/parameters/ple_window_max:1073741823 /sys/module/kvm_intel/parameters/ple_window_shrink:0 /sys/module/kvm_intel/parameters/unrestricted_guest:Y /sys/module/kvm_intel/parameters/vmm_exclusive:Y /sys/module/kvm_intel/parameters/vpid:Y so on fixed version: the buggy configuration could save 14 seconds emulate_invalid_guest_state=Y unrestricted_guest=N while the un-buggy one didn't impacted. Hi Bandan, could you pls have a check whether above test sufficient to verify this bug ? Thanks, Xiaoqing. (In reply to Xiaoqing Wei from comment #15) > so on fixed version: > the buggy configuration could save 14 seconds > emulate_invalid_guest_state=Y unrestricted_guest=N > > while the un-buggy one didn't impacted. > > > Hi Bandan, > > could you pls have a check whether above test sufficient to verify this bug ? Looks good, Additionally, you could run realmode.flat from kvm-unit-tests as another test point. Make sure to set emulate_invalid_guest_state. > Thanks, > Xiaoqing. git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git [root@dhcp-11-118 kvm-unit-tests]# uname -r 3.10.0-287.el7.x86_64 [root@dhcp-11-118 kvm-unit-tests]# grep . /sys/module/kvm_intel/parameters/* /sys/module/kvm_intel/parameters/emulate_invalid_guest_state:Y /sys/module/kvm_intel/parameters/enable_apicv:N /sys/module/kvm_intel/parameters/enable_shadow_vmcs:N /sys/module/kvm_intel/parameters/ept:Y /sys/module/kvm_intel/parameters/eptad:N /sys/module/kvm_intel/parameters/fasteoi:Y /sys/module/kvm_intel/parameters/flexpriority:Y /sys/module/kvm_intel/parameters/nested:Y /sys/module/kvm_intel/parameters/ple_gap:0 /sys/module/kvm_intel/parameters/ple_window:4096 /sys/module/kvm_intel/parameters/ple_window_grow:2 /sys/module/kvm_intel/parameters/ple_window_max:1073741823 /sys/module/kvm_intel/parameters/ple_window_shrink:0 /sys/module/kvm_intel/parameters/pml:N /sys/module/kvm_intel/parameters/unrestricted_guest:N /sys/module/kvm_intel/parameters/vmm_exclusive:Y /sys/module/kvm_intel/parameters/vpid:Y ============================= [root@dhcp-11-118 kvm-unit-tests]# ./run_tests.sh x86/realmode.flat PASS apic PASS ioapic PASS smptest PASS smptest3 PASS vmexit_cpuid PASS vmexit_vmcall PASS vmexit_mov_from_cr8 PASS vmexit_mov_to_cr8 PASS vmexit_inl_pmtimer PASS vmexit_ipi PASS vmexit_ipi_halt PASS vmexit_ple_round_robin PASS access PASS emulator PASS eventinj PASS hypercall PASS idt_test PASS msr skip pmu (/proc/sys/kernel/nmi_watchdog not equal to 0) PASS port80 PASS realmode ^^^^^ the test stops output since this line, for hours. ------------------------------------------- [root@dhcp-11-118 kvm-unit-tests]# tail test.log 1540 cycles/emulated memory RMW instruction Return value from qemu: 1 qemu-kvm -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -kernel x86/s3.flat -smp 1 enabling apic RSDP is at f1e70 RSDT is at 7fe1657 FADT is at 7fe0bda FACS is at 7fe0000 resume vector addr is 7fe000c copy resume code from 400348 ^^^^ test log also stop output for hours. Bandan, any idea ? (In reply to Xiaoqing Wei from comment #17) > git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git ... > > [root@dhcp-11-118 kvm-unit-tests]# tail test.log > 1540 cycles/emulated memory RMW instruction > Return value from qemu: 1 > qemu-kvm -enable-kvm -device pc-testdev -device > isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device > pci-testdev -kernel x86/s3.flat -smp 1 > enabling apic > RSDP is at f1e70 > RSDT is at 7fe1657 > FADT is at 7fe0bda > FACS is at 7fe0000 > resume vector addr is 7fe000c > copy resume code from 400348 runtests.sh doesn't take an argument, it just runs all the tests. This behavior in the resume test is expected. You can just run realmode.flat test by executing: ./x86-run x86/realmode.flat Then, check the number of cycles per emulated instruction. This should be smaller for build 287. > > ^^^^ test log also stop output for hours. > > > > Bandan, > any idea ? [root@dhcp-11-118 kvm-unit-tests]# tail -n 20 247 287 ==> 247 <== PASS: movsx ah PASS: movzx ah PASS: bswap PASS: aad PASS: aam PASS: xlat PASS: salc (1) PASS: salc (2) PASS: fninit PASS: mov dr with mod bits FAIL: smsw PASS: nopl FAIL: xadd 2129 cycles/emulated jump instruction 2301 cycles/emulated move instruction 2215 cycles/emulated arithmetic instruction 2496 cycles/emulated memory load instruction 2607 cycles/emulated memory store instruction 3040 cycles/emulated memory RMW instruction Return value from qemu: 3 ==> 287 <== PASS: movsx ah PASS: movzx ah PASS: bswap PASS: aad PASS: aam PASS: xlat PASS: salc (1) PASS: salc (2) PASS: fninit PASS: mov dr with mod bits PASS: smsw PASS: nopl PASS: xadd 774 cycles/emulated jump instruction 870 cycles/emulated move instruction 835 cycles/emulated arithmetic instruction 1076 cycles/emulated memory load instruction 1155 cycles/emulated memory store instruction 1539 cycles/emulated memory RMW instruction Return value from qemu: 1 [root@dhcp-11-118 kvm-unit-tests]# the cycles per instruction for build 287 is smaller then 247, as expected. but the return value from qemu both non-zero on both kernel versions, # tail -n5 x86/README Legacy notes: The exit status of the binary (and the script) is inconsistent: with qemu-system, after the unit-test is done, the exit status of qemu is 1, different from the 'old style' qemu-kvm, whose exit status in successful completion is 0. so I think the value of 1 on 287(the patched kernel) should be OK, Bandan, could you please confirm ? or correct me if I'm wrong. Thanks, Xiaoqing. (In reply to Xiaoqing Wei from comment #19) > [root@dhcp-11-118 kvm-unit-tests]# tail -n 20 247 287 > ==> 247 <== > PASS: movsx ah > PASS: movzx ah > PASS: bswap > PASS: aad > PASS: aam > PASS: xlat > PASS: salc (1) > PASS: salc (2) > PASS: fninit > PASS: mov dr with mod bits > FAIL: smsw > PASS: nopl > FAIL: xadd > 2129 cycles/emulated jump instruction > 2301 cycles/emulated move instruction > 2215 cycles/emulated arithmetic instruction > 2496 cycles/emulated memory load instruction > 2607 cycles/emulated memory store instruction > 3040 cycles/emulated memory RMW instruction > Return value from qemu: 3 > > ==> 287 <== > PASS: movsx ah > PASS: movzx ah > PASS: bswap > PASS: aad > PASS: aam > PASS: xlat > PASS: salc (1) > PASS: salc (2) > PASS: fninit > PASS: mov dr with mod bits > PASS: smsw > PASS: nopl > PASS: xadd > 774 cycles/emulated jump instruction > 870 cycles/emulated move instruction > 835 cycles/emulated arithmetic instruction > 1076 cycles/emulated memory load instruction > 1155 cycles/emulated memory store instruction > 1539 cycles/emulated memory RMW instruction > Return value from qemu: 1 > [root@dhcp-11-118 kvm-unit-tests]# > > > the cycles per instruction for build 287 is smaller then 247, as expected. > but the return value from qemu both non-zero on both kernel versions, > > > # tail -n5 x86/README > Legacy notes: > The exit status of the binary (and the script) is inconsistent: with > qemu-system, after the unit-test is done, the exit status of qemu is 1, > different from the 'old style' qemu-kvm, whose exit status in successful > completion is 0. > > > so I think the value of 1 on 287(the patched kernel) should be OK, > > Bandan, > > could you please confirm ? > or correct me if I'm wrong. Yes, this looks good and the results confirm that the bug has been fixed. You are right about the return value, you can safely ignore it. Thank you for the thorough testing! Bandan > Thanks, > Xiaoqing.
> Yes, this looks good and the results confirm that the bug has been fixed.
> You are right about the return value, you can safely ignore it. Thank you
> for the thorough testing!
>
> Bandan
thx for the confirm,
moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-2152.html |