Bug 1088784

Summary:

qemu ' KVM internal error. Suberror: 1' when query cpu frequently during pxe boot in Intel "Q95xx" host

Product:

Red Hat Enterprise Linux 7

Reporter:

Qian Guo <qiguo>

Component:

kernel

Assignee:

Paolo Bonzini <pbonzini>

Status:

CLOSED ERRATA

QA Contact:

Virtualization Bugs <virt-bugs>

Severity:

medium

Docs Contact:

Priority:

urgent

Version:

7.0

CC:

alex.williamson, bdas, hhuang, juzhang, knoel, lersek, michen, mtosatti, pbonzini, qiguo

Target Milestone:

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

kernel-3.10.0-143.el7

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Clones:

1097363 (view as bug list)

Environment:

Last Closed:

2015-03-05 11:55:54 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

1116936

Bug Blocks:

1078775, 1097363, 1113511

Attachments:

Description	Flags
dmidecode of my host with Q9500	none
/proc/cpuinfo of host with q9500	none

Description Qian Guo 2014-04-17 07:44:39 UTC

Created attachment 887066 [details]
dmidecode of my host with Q9500

Description of problem:
When the CPUs are queried frequently while the guest PXE-boots, qemu crashes. This occurs only on hosts with the following CPUs (which are the ones I used to hit this bug):
'Intel(R) Core(TM)2 Quad CPU    Q9500  @ 2.83GHz' 
'Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz'

Version-Release number of selected component (if applicable):
ipxe-roms-qemu-20130517-5.gitc4bce43.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with a network device:
# /usr/libexec/qemu-kvm -cpu Penryn -m 4G -smp 4,sockets=1,cores=4,threads=1 -M pc -enable-kvm  -device piix3-usb-uhci,id=usb -name rhel7 -nodefaults -nodefconfig  -device virtio-balloon-pci,id=balloon0  -vnc :10 -vga std -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0   -monitor stdio     -drive file=test,if=none,media=disk,format=raw,rerror=stop,werror=stop,aio=native,id=scsi-disk0 -device virtio-scsi-pci,id=bus2 -device scsi-hd,bus=bus2.0,drive=scsi-disk0,id=disk0 -netdev tap,id=netdev0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=netdev0,id=vn1,mac=52:54:a0:0b:00:01 -boot menu=on -monitor unix:/tmp/m1,server,nowait -S

2. In another host session, query the CPUs frequently:
# while true; do echo "info cpus" |nc -U /tmp/m1 ; done

3. Start the guest so that it boots.
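
Step 2 can also be scripted without nc. The following is a minimal Python sketch, assuming the monitor socket at /tmp/m1 from step 1 accepts one HMP command per line. The names query_monitor and poll_cpus are illustrative only and do not come from QEMU.

```python
import socket


def query_monitor(path: str, command: str, timeout: float = 0.5) -> str:
    """Send one command line to a monitor UNIX socket and return whatever is read back."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        try:
            while True:
                data = sock.recv(4096)
                if not data:  # peer closed the connection
                    break
                chunks.append(data)
        except socket.timeout:  # nothing more arrived within the timeout
            pass
    return b"".join(chunks).decode(errors="replace")


def poll_cpus(path: str = "/tmp/m1", rounds: int = 1000) -> None:
    """Issue "info cpus" repeatedly, like the shell loop in step 2 (but bounded)."""
    for _ in range(rounds):
        print(query_monitor(path, "info cpus"))
```

Calling poll_cpus() in a second session while the guest boots should exercise the same monitor path as the nc loop. The short timeout is a design choice so that each query returns quickly even if the monitor keeps the connection open.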

Actual results:
qemu prints the following:

(qemu) KVM internal error. Suberror: 1
emulation failure
EAX=00000011 EBX=e5f8dfff ECX=00000030 EDX=00002ca8
ESI=40176888 EDI=00000000 EBP=00009cf2 ESP=00002ca8
EIP=00000213 EFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
CS =9c7b 0009c7b0 ffffffff 00809b00 DPL=0 CS16 [-RA]
SS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
DS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
FS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
GS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     0009cf30 00000037
IDT=     00000000 0000ffff
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=66 0f 01 16 10 00 66 0f 01 1e 48 00 0f 20 c0 0c 01 0f 22 c0 <66> ea a4 00 00 00 08 00 0f 20 c0 24 fe 0f 22 c0 ff 2e 4e 00 2e a1 be 06 8e d8 8e c0 8e e0


The same failure is printed repeatedly.
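
As a reading aid for the dump above, two of its fields can be checked with a few lines of Python. This is only a sketch: the helper names real_mode_base and cr0_flags are made up for illustration, and the constants are copied from the CS and CR0 lines of the dump. It shows that CS still holds a real-mode style segment (base equal to selector << 4) while CR0 already has the PE bit set.

```python
def real_mode_base(selector: int) -> int:
    """Segment base as computed in real mode: the selector shifted left by 4 bits."""
    return selector << 4


def cr0_flags(cr0: int) -> dict:
    """Extract a few architectural CR0 bits: PE (bit 0), ET (bit 4), PG (bit 31)."""
    return {
        "PE": bool(cr0 & (1 << 0)),
        "ET": bool(cr0 & (1 << 4)),
        "PG": bool(cr0 & (1 << 31)),
    }


# Values copied from the dump: "CS =9c7b 0009c7b0 ..." and "CR0=00000011".
cs_selector, cs_base, cr0 = 0x9C7B, 0x0009C7B0, 0x00000011

print(real_mode_base(cs_selector) == cs_base)  # True: CS base is selector << 4
print(cr0_flags(cr0))  # PE and ET are set, PG is clear
```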

Expected results:
qemu-kvm works well.

Additional info:
1. If at this point the query cpus loop is stopped and a system reset is issued under hmp, the guest reboots successfully, and checking the guest status under hmp shows that it is running.

2. I tested this case on several hosts. Only the hosts with the CPUs 'Intel(R) Core(TM)2 Quad CPU    Q9500  @ 2.83GHz' and 'Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz' hit this issue. The following is the host information for 'Intel(R) Core(TM)2 Quad CPU    Q9500  @ 2.83GHz'.

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 23
Model name:            Intel(R) Core(TM)2 Quad CPU    Q9500  @ 2.83GHz
Stepping:              10
CPU MHz:               2833.000
BogoMIPS:              5653.07
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              3072K
NUMA node0 CPU(s):     0-3


I will attach the dmidecode output and /proc/cpuinfo of the host to this bug.

3. The other hosts I tested, which do not hit the issue, have the following CPUs:
3.1.Model name:            Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
3.2.Model name:            Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz


4. This bug is not related to qemu or the kernel.

I tested qemu-kvm-1.5.3-60.el7.x86_64, qemu-kvm-1.5.3-50.el7.x86_64 and qemu-kvm-1.5.3-49.el7.x86_64, with kernel-3.10.0-95.el7.x86_64 and kernel-3.10.0-121.el7.x86_64.
With all of the above builds, this bug cannot be reproduced when ipxe-roms-qemu-20130517-4.gitc4bce43.el7.noarch is installed.

So, according to the above, this bug is a regression in ipxe: it is hit with ipxe-roms-qemu-20130517-5.gitc4bce43.el7.noarch and not with ipxe-roms-qemu-20130517-4.gitc4bce43.el7.noarch.

HIGHLIGHT: this bug can only be reproduced on hosts with Intel Q9500/Q9550 series CPUs.

Comment 1 Qian Guo 2014-04-17 07:45:26 UTC

Created attachment 887067 [details]
/proc/cpuinfo of host with q9500

Comment 4 juzhang 2014-04-17 08:24:56 UTC

> 
> So according to above, this bug is a regression bug of ipxe, hit with
> ipxe-roms-qemu-20130517-5.gitc4bce43.el7.noarch and can not hit with
> ipxe-roms-qemu-20130517-4.gitc4bce43.el7.noarch.

Adding the Regression keyword according to this comment.

> 
> HIGHLIGHT: this bug only can reproduce on hosts with cpu intel q9500/q9550
> serials.

Please note: QE tested several Intel hosts, and so far this issue happens only on Q9500/Q9550.

Setting the priority to urgent since this is a regression. Setting the severity to medium since the issue happens only on Q9500/Q9550 so far.

Comment 16 Paolo Bonzini 2014-05-12 15:07:53 UTC

100% reproducible indeed even with the Fedora iPXE.  The end of the trace is as follows:

kvm_emulate_insn:     9c7a0:20e: 0f 20 c0
kvm_entry:            vcpu 0
kvm_emulate_insn:     9c7a0:211: 0c 01
kvm_entry:            vcpu 0
kvm_emulate_insn:     9c7a0:213: 0f 22 c0
kvm_userspace_exit:   reason KVM_EXIT_INTR (10)
kvm_entry:            vcpu 0
kvm_emulate_insn:     9c7a0:216: 0f 22 c0
kvm_emulate_insn:     9c7a0:216: 0f 22 c0 FAIL

From a first look, the KVM_EXIT_INTR causes the VM to re-enter with the wrong instruction pointer.

Comment 17 Paolo Bonzini 2014-05-13 11:52:10 UTC

The repeated dump at offset 0x216 is a bug in the kvm plugin of trace-cmd.  Disabling it (trace-cmd report -N) shows that even the first byte of the instruction fails to be fetched:

 kvm_emulate_insn:     9c7a0:216: (prot16) failed

The reason is that "info cpus" causes the KVM_SET_SREGS ioctl to be triggered at exactly the wrong time, when CR0.PE = 0 but the real mode segment is still in CS.  KVM_SET_SREGS ioctl resets the cached CPL value (which is 0), and the next call to vmx_get_cpl thinks that the CPL is 2 in my case or 3 in RHEL (that's bits 0-1 of CS).  Thus the bug is sensitive to the code size.  If it happens that CS's bits 0-1 are 0, the bug doesn't show up.
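
For illustration, the selector arithmetic above can be checked with a few lines of Python. This is a sketch only, not KVM code. selector_low_bits is a hypothetical helper, 0x9c7b is the CS value from the RHEL register dump, and 0x9c7a is inferred from the 9c7a0 base in the trace in comment 16, assuming the real-mode rule base = selector << 4. The value 0x9c78 is an invented example of a selector whose low bits happen to be zero.

```python
def selector_low_bits(selector: int) -> int:
    """Return bits 0-1 of a segment selector, the value mistaken for the CPL here."""
    return selector & 0x3


for cs in (0x9C7B, 0x9C7A, 0x9C78):
    print(f"CS={cs:#06x} -> bits 0-1 = {selector_low_bits(cs)}")
```

With these inputs the low bits are 3, 2 and 0 respectively, which matches the statement that the wrong CPL is 3 in RHEL and 2 with the Fedora iPXE, and that a code layout giving a CS with zero low bits would hide the bug.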

Fixing it is not exactly trivial, but not too hard either.  We need to hijack the cs.padding field of kvm_segment to host the CPL, and QEMU needs to get and set the CPL too (which it stores in bits 0-1 of hflags).  The padding is currently ignored, so we also need a new VM capability that can be enabled with KVM_ENABLE_CAP.

In addition, vmx_set_cr0 must force CPL=0 always when CR0.PE=0, not just if VM86 mode is in use.

Comment 18 Paolo Bonzini 2014-05-13 11:52:52 UTC

The last sentence should have been "In addition, vmx_set_cr0 must force CPL=0 always when CR0.PE becomes 1, not just if VM86 mode is in use".

Comment 19 Paolo Bonzini 2014-05-14 14:40:25 UTC

Simpler patch at http://article.gmane.org/gmane.comp.emulators.kvm.devel/121884/raw

Comment 21 Jarod Wilson 2014-08-07 20:54:38 UTC

Patch(es) available on kernel-3.10.0-143.el7

Comment 24 Qian Guo 2014-10-30 06:24:32 UTC

Reproduced this bug with kernel-3.10.0-140.el7.x86_64

Steps
1. Boot the guest on a Q9500 host
/usr/libexec/qemu-kvm -cpu Penryn -m 4G -smp 4,sockets=1,cores=4,threads=1 -M pc -enable-kvm  -device piix3-usb-uhci,id=usb -name rhel7 -nodefaults -nodefconfig  -device virtio-balloon-pci,id=balloon0  -vnc :10 -vga std -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0   -monitor stdio     -drive file=test,if=none,media=disk,format=raw,rerror=stop,werror=stop,aio=native,id=scsi-disk0 -device virtio-scsi-pci,id=bus2 -device scsi-hd,bus=bus2.0,drive=scsi-disk0,id=disk0 -netdev tap,id=netdev0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=netdev0,id=vn1,mac=52:54:a0:0b:00:01 -boot menu=on -monitor unix:/tmp/m1,server,nowait -S

2. Repeatedly run info cpus
while true; do echo "info cpus" |nc -U /tmp/m1 ; done

3. Continue the guest
(qemu) c

Result: qemu crashed:
KVM internal error. Suberror: 1
emulation failure
EAX=00000011 EBX=00010063 ECX=00000030 EDX=00002ca8
ESI=401a7f78 EDI=b10a0000 EBP=00009cf2 ESP=00002ca8
EIP=00000213 EFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
CS =9c7b 0009c7b0 ffffffff 00809b00 DPL=0 CS16 [-RA]
SS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
DS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
FS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
GS =9cf2 0009cf20 ffffffff 00809300 DPL=0 DS16 [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     0009cf30 00000037
IDT=     00000000 0000ffff
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=66 0f 01 16 10 00 66 0f 01 1e 48 00 0f 20 c0 0c 01 0f 22 c0 <66> ea a4 00 00 00 08 00 0f 20 c0 24 fe 0f 22 c0 ff 2e 4e 00 2e a1 be 06 8e d8 8e c0 8e e0
.....


So this bug is reproduced.

Verified this bug with kernel-3.10.0-196.el7.x86_64

Steps as above

Result: qemu works well and the guest can access the BIOS and boot device.

So this bug is fixed.

Comment 26 errata-xmlrpc 2015-03-05 11:55:54 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0290.html