Bug 1088784
Summary: | qemu 'KVM internal error. Suberror: 1' when query cpu frequently during pxe boot in Intel "Q95xx" host
---|---
Product: | Red Hat Enterprise Linux 7
Reporter: | Qian Guo <qiguo>
Component: | kernel
Assignee: | Paolo Bonzini <pbonzini>
Status: | CLOSED ERRATA
QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | medium
Priority: | urgent
Version: | 7.0
CC: | alex.williamson, bdas, hhuang, juzhang, knoel, lersek, michen, mtosatti, pbonzini, qiguo
Target Milestone: | rc
Target Release: | ---
Hardware: | Unspecified
OS: | Unspecified
Fixed In Version: | kernel-3.10.0-143.el7
Doc Type: | Bug Fix
Story Points: | ---
: | 1097363
Last Closed: | 2015-03-05 11:55:54 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | ---
Cloudforms Team: | ---
Bug Depends On: | 1116936
Bug Blocks: | 1078775, 1097363, 1113511
Attachments:

Created attachment 887067
/proc/cpuinfo of host with q9500
> So according to above, this bug is a regression bug of ipxe, hit with
> ipxe-roms-qemu-20130517-5.gitc4bce43.el7.noarch and can not hit with
> ipxe-roms-qemu-20130517-4.gitc4bce43.el7.noarch.

According to this comment, adding the regression keyword.

> HIGHLIGHT: this bug only can reproduce on hosts with cpu intel q9500/q9550
> serials.

Please note that QE tested several Intel hosts, and this issue only happens on Q9500/Q9550 so far.

Setting the priority to urgent since this is a regression. Setting the severity to medium since the issue only happens on Q9500/Q9550 so far.

100% reproducible indeed, even with the Fedora iPXE. The end of the trace is as follows:

    kvm_emulate_insn: 9c7a0:20e: 0f 20 c0
    kvm_entry: vcpu 0
    kvm_emulate_insn: 9c7a0:211: 0c 01
    kvm_entry: vcpu 0
    kvm_emulate_insn: 9c7a0:213: 0f 22 c0
    kvm_userspace_exit: reason KVM_EXIT_INTR (10)
    kvm_entry: vcpu 0
    kvm_emulate_insn: 9c7a0:216: 0f 22 c0
    kvm_emulate_insn: 9c7a0:216: 0f 22 c0 FAIL

From a first look, the KVM_EXIT_INTR causes the VM to re-enter with the wrong instruction pointer.

The repeated dump at offset 0x216 is a bug in the kvm plugin of trace-cmd. Disabling it (trace-cmd report -N) shows that even the first byte of the instruction fails to be fetched:

    kvm_emulate_insn: 9c7a0:216: (prot16) failed

The reason is that "info cpus" causes the KVM_SET_SREGS ioctl to be triggered at exactly the wrong time, when CR0.PE = 0 but the real mode segment is still in CS. The KVM_SET_SREGS ioctl resets the cached CPL value (which is 0), and the next call to vmx_get_cpl thinks that the CPL is 2 in my case or 3 in RHEL (that's bits 0-1 of CS). Thus the bug is sensitive to the code size. If it happens that CS's bits 0-1 are 0, the bug doesn't show up.

Fixing it is not exactly trivial, but not too hard either. We need to hijack the cs.padding field of kvm_segment to host the CPL, and QEMU needs to get and set the CPL too (which it stores in bits 0-1 of hflags).
The padding is currently ignored, so we also need a new VM capability that can be enabled with KVM_ENABLE_CAP. In addition, vmx_set_cr0 must force CPL=0 always when CR0.PE=0, not just if VM86 mode is in use.

The last sentence should have been "In addition, vmx_set_cr0 must force CPL=0 always when CR0.PE becomes 1, not just if VM86 mode is in use".

Simpler patch at http://article.gmane.org/gmane.comp.emulators.kvm.devel/121884/raw

Patch(es) available on kernel-3.10.0-143.el7

Reproduced this bug with kernel-3.10.0-140.el7.x86_64.

Steps:
1. Boot the guest on a Q9500 host:
    /usr/libexec/qemu-kvm -cpu Penryn -m 4G -smp 4,sockets=1,cores=4,threads=1 -M pc -enable-kvm -device piix3-usb-uhci,id=usb -name rhel7 -nodefaults -nodefconfig -device virtio-balloon-pci,id=balloon0 -vnc :10 -vga std -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio -drive file=test,if=none,media=disk,format=raw,rerror=stop,werror=stop,aio=native,id=scsi-disk0 -device virtio-scsi-pci,id=bus2 -device scsi-hd,bus=bus2.0,drive=scsi-disk0,id=disk0 -netdev tap,id=netdev0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=netdev0,id=vn1,mac=52:54:a0:0b:00:01 -boot menu=on -monitor unix:/tmp/m1,server,nowait -S
2. Repeatedly run "info cpus":
    while true; do echo "info cpus" | nc -U /tmp/m1; done
3. Continue the guest:
    (qemu) c

Result: qemu crashed:

    KVM internal error. Suberror: 1
    emulation failure
    EAX=00000011 EBX=00010063 ECX=00000030 EDX=00002ca8
    ESI=401a7f78 EDI=b10a0000 EBP=00009cf2 ESP=00002ca8
    EIP=00000213 EFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
    CS =9c7b 0009c7b0 ffffffff 00809b00 DPL=0 CS16 [-RA]
    SS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
    DS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
    FS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
    GS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
    LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
    TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
    GDT= 0009cf30 00000037
    IDT= 00000000 0000ffff
    CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
    DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
    DR6=00000000ffff0ff0 DR7=0000000000000400
    EFER=0000000000000000
    Code=66 0f 01 16 10 00 66 0f 01 1e 48 00 0f 20 c0 0c 01 0f 22 c0 <66> ea a4 00 00 00 08 00 0f 20 c0 24 fe 0f 22 c0 ff 2e 4e 00 2e a1 be 06 8e d8 8e c0 8e e0
    .....

So this bug is reproduced.

Verified this bug with kernel-3.10.0-196.el7.x86_64.

Steps: as above.

Result: qemu works well, and the guest can access the BIOS and the boot device. So this bug is fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0290.html
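The CPL values quoted in the root-cause analysis above (2 for the trace, 3 on RHEL) can be checked against the CS values that appear in this report. The following is only an arithmetic sketch in Python, not KVM code. The helper names are hypothetical, and it assumes that the first field of each kvm_emulate_insn trace line is the CS base and that a real-mode segment base is the selector shifted left by 4.

```python
# Illustrative arithmetic only. The helper names are hypothetical and are
# not part of KVM or QEMU.

CR0_PE = 1 << 0  # protection-enable bit of CR0


def real_mode_selector(cs_base: int) -> int:
    """Recover a real-mode selector from its segment base (base = selector << 4)."""
    return cs_base >> 4


def selector_low_bits(selector: int) -> int:
    """Return bits 0-1 of a selector, which the analysis says were taken as the CPL."""
    return selector & 0x3


# Assumption: "9c7a0" in "kvm_emulate_insn: 9c7a0:216" is the CS base.
trace_selector = real_mode_selector(0x9C7A0)

# CS selector from the RHEL register dump line "CS =9c7b 0009c7b0 ...".
rhel_selector = 0x9C7B

print(hex(trace_selector), selector_low_bits(trace_selector))  # 0x9c7a 2
print(hex(rhel_selector), selector_low_bits(rhel_selector))    # 0x9c7b 3

# CR0=00000011 in the register dump has the PE bit set.
print(bool(0x00000011 & CR0_PE))  # True
```

A selector whose low two bits happen to be zero (for example 0x9c78) would give 0 here, which matches the remark that the bug depends on the code size and therefore on where the segment ends up.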
Created attachment 887066
dmidecode of my host with Q9500

Description of problem:
When the CPUs are queried frequently while the guest PXE boots, qemu crashes. This only occurs on hosts with the following CPUs (which I used when I hit this bug):
'Intel(R) Core(TM)2 Quad CPU Q9500 @ 2.83GHz'
'Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz'

Version-Release number of selected component (if applicable):
ipxe-roms-qemu-20130517-5.gitc4bce43.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with network:
    # /usr/libexec/qemu-kvm -cpu Penryn -m 4G -smp 4,sockets=1,cores=4,threads=1 -M pc -enable-kvm -device piix3-usb-uhci,id=usb -name rhel7 -nodefaults -nodefconfig -device virtio-balloon-pci,id=balloon0 -vnc :10 -vga std -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio -drive file=test,if=none,media=disk,format=raw,rerror=stop,werror=stop,aio=native,id=scsi-disk0 -device virtio-scsi-pci,id=bus2 -device scsi-hd,bus=bus2.0,drive=scsi-disk0,id=disk0 -netdev tap,id=netdev0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=netdev0,id=vn1,mac=52:54:a0:0b:00:01 -boot menu=on -monitor unix:/tmp/m1,server,nowait -S
2. In another host session, query the CPUs frequently:
    # while true; do echo "info cpus" | nc -U /tmp/m1; done
3. Start the guest so that it boots.

Actual results:
qemu prints the following:

    (qemu) KVM internal error. Suberror: 1
    emulation failure
    EAX=00000011 EBX=e5f8dfff ECX=00000030 EDX=00002ca8
    ESI=40176888 EDI=00000000 EBP=00009cf2 ESP=00002ca8
    EIP=00000213 EFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
    CS =9c7b 0009c7b0 ffffffff 00809b00 DPL=0 CS16 [-RA]
    SS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
    DS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
    FS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
    GS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
    LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
    TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
    GDT= 0009cf30 00000037
    IDT= 00000000 0000ffff
    CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
    DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
    DR6=00000000ffff0ff0 DR7=0000000000000400
    EFER=0000000000000000
    Code=66 0f 01 16 10 00 66 0f 01 1e 48 00 0f 20 c0 0c 01 0f 22 c0 <66> ea a4 00 00 00 08 00 0f 20 c0 24 fe 0f 22 c0 ff 2e 4e 00 2e a1 be 06 8e d8 8e c0 8e e0

The same failure is printed repeatedly.

Expected results:
qemu-kvm works well.

Additional info:
1. If the "info cpus" loop is stopped at this point and system_reset is issued in HMP, the guest reboots successfully, and HMP reports the guest status as running.
2. I tested this case on several hosts. Only the hosts with the CPUs 'Intel(R) Core(TM)2 Quad CPU Q9500 @ 2.83GHz' and 'Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz' hit this issue. The following is the host information for 'Intel(R) Core(TM)2 Quad CPU Q9500 @ 2.83GHz'.
    # lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 4
    On-line CPU(s) list: 0-3
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s): 1
    NUMA node(s): 1
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 23
    Model name: Intel(R) Core(TM)2 Quad CPU Q9500 @ 2.83GHz
    Stepping: 10
    CPU MHz: 2833.000
    BogoMIPS: 5653.07
    Virtualization: VT-x
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 3072K
    NUMA node0 CPU(s): 0-3

I will attach the dmidecode and /proc/cpuinfo output of the host to this bug.
3. The other hosts I tested, which do not hit the issue, have the following CPUs:
3.1. Model name: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
3.2. Model name: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
4. This bug is not related to the qemu or kernel version. I tested qemu-kvm-1.5.3-60.el7.x86_64, qemu-kvm-1.5.3-50.el7.x86_64 and qemu-kvm-1.5.3-49.el7.x86_64, with kernel-3.10.0-95.el7.x86_64 and kernel-3.10.0-121.el7.x86_64. With all of these builds, the bug cannot be reproduced when ipxe-roms-qemu-20130517-4.gitc4bce43.el7.noarch is installed.

So according to the above, this bug is a regression in ipxe: it is hit with ipxe-roms-qemu-20130517-5.gitc4bce43.el7.noarch and is not hit with ipxe-roms-qemu-20130517-4.gitc4bce43.el7.noarch.

HIGHLIGHT: this bug can only be reproduced on hosts with CPUs of the Intel Q9500/Q9550 series.
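Step 2 of the reproducer above (the nc loop) could also be scripted, for example to bound the number of queries. The following is a minimal Python sketch, assuming the HMP monitor listens on the UNIX socket /tmp/m1 as in the qemu-kvm command line. The function names and the 0.1 second timeout are illustrative choices and do not come from this report, and the query rate it achieves may differ from the nc loop.

```python
import socket


def query_monitor(path, command="info cpus", timeout=0.1):
    """Send one HMP command over a UNIX socket and return whatever text comes back.

    Reading stops when the peer closes the connection or when no data
    arrives within `timeout` seconds (an arbitrary, illustrative value).
    """
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(command.encode() + b"\n")
        try:
            while True:
                data = sock.recv(4096)
                if not data:
                    break  # peer closed the connection
                chunks.append(data)
        except socket.timeout:
            pass  # the monitor kept the connection open, stop reading
    return b"".join(chunks).decode(errors="replace")


def poll_monitor(path, iterations, command="info cpus"):
    """Issue `command` `iterations` times, reconnecting each time like the nc loop."""
    return [query_monitor(path, command) for _ in range(iterations)]
```

For instance, `poll_monitor("/tmp/m1", 1000)` could be started in a second host session before the guest is continued with `c`. Whether that many queries are enough to hit the race is not something this report states.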