Bug 1421430 - Emulation of nested 'Westmere' CPU fails to boot on a host with Broadwell CPU
Status: CLOSED DEFERRED
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 24
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
Reported: 2017-02-12 09:52 UTC by Nadav Goldin
Modified: 2017-03-14 20:13 UTC

Last Closed: 2017-03-14 20:13:40 UTC
Type: Bug


Attachments
Output of 'virsh capabilities' and /proc/cpuinfo for the guest and host (17.08 KB, text/plain)
2017-02-12 09:52 UTC, Nadav Goldin

Description Nadav Goldin 2017-02-12 09:52:13 UTC
Created attachment 1249481
Output of 'virsh capabilities' and /proc/cpuinfo for the guest and host

Description of problem: 

Starting a VM with the 'Westmere' CPU model inside a VM that was itself created with the 'Westmere' CPU model fails on a host that has a (real) Broadwell CPU:

qemu-kvm: /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1736: kvm_put_msrs: Assertion `ret == n' failed.

The setup used:

Layer 0(host):
Intel Broadwell CPU
4.8.15-200.fc24.x86_64
qemu-kvm-2.6.2-5.fc24.x86_64

Layer 1(guest):
started with: '-cpu Westmere,+vmx'

cat /proc/cpuinfo reports:
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc rep_good nopl pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm arat tpr_shadow vnmi flexpriority ept vpid
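For anyone comparing flag lists between layers, here is a minimal sketch that parses /proc/cpuinfo text and reports which flags of interest are missing. The function names and the default choice of flags to check ('vmx' and 'ept') are illustrative assumptions, not something taken from qemu or libvirt.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=("vmx", "ept")):
    """List the required flags that are absent (the default list is an assumption)."""
    present = cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in present]


if __name__ == "__main__":
    # Reads the real file when run directly inside the guest.
    with open("/proc/cpuinfo") as handle:
        print(missing_flags(handle.read()))
```

With the L1 flags quoted above, both 'vmx' and 'ept' are present, so the list would be empty.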

Attempting to start a VM inside the guest, again with '-cpu Westmere,+vmx', fails with:

2017-02-12 09:34:35.729+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.4 (CentOS BuildSystem <http://bugs.centos.org>, 2017-01-17-23:37:48, c1bm.rdu2.centos.org), qemu version: 2.6.0 (qemu-kvm-ev-2.6.0-28.el7_3.3.1), hostname: vm-el73.lago.local
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=a84c9822-vm-el73,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-a84c9822-vm-el73/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Westmere,+vmx -m 1024 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -object iothread,id=iothread1 -uuid 1c162b0a-3da3-4f6b-a80a-1d2fc615e00a -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-a84c9822-vm-el73/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot menu=off,strict=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/home/ngoldin/.lago/default/images/vm-el73_root.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,serial=1,discard=unmap -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=54:52:c0:a5:00:02,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-3-a84c9822-vm-el73/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x9 -msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
qemu-kvm: /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1736: kvm_put_msrs: Assertion `ret == n' failed.
2017-02-12 09:34:36.071+0000: shutting down



Version-Release number of selected component (if applicable):

Host:
Linux 4.8.15-200.fc24.x86_64
qemu-kvm-2.6.2-5.fc24.x86_64
libvirt-1.3.3.2-1.fc24.x86_64

Guest:
Linux 3.10.0-514.2.2.el7.x86_64
qemu-kvm-ev-2.6.0-28.el7_3.3.1.x86_64
libvirt-2.0.0-10.el7_3.4.x86_64


How reproducible:
100% (with the described setup)


Steps to Reproduce:
1. Start a VM on a Broadwell host, with '-cpu Westmere,+vmx'
2. Attempt to start another VM inside the VM created in step 1, again with '-cpu Westmere,+vmx'
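Before running the steps above it may be worth confirming that nested virtualization is enabled on the L0 host. The sketch below reads the kvm_intel 'nested' module parameter from sysfs; the helper names are hypothetical, and it assumes an Intel host where the parameter reads 'Y' or '1' when enabled.

```python
from pathlib import Path

# sysfs location of the kvm_intel module parameter on an Intel host.
NESTED_PARAM = Path("/sys/module/kvm_intel/parameters/nested")


def nested_enabled(raw):
    """Interpret the raw parameter text; 'Y' or '1' means enabled."""
    return raw.strip() in ("Y", "1")


def read_nested(path=NESTED_PARAM):
    """Return True/False, or None if the parameter cannot be read
    (for example when kvm_intel is not loaded)."""
    try:
        return nested_enabled(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_nested())
```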



Actual results:
qemu-kvm fails to start with:
qemu-kvm: /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1736: kvm_put_msrs: Assertion `ret == n' failed.
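When collecting this failure from many logs, the assertion line can be split into its parts. This is a minimal sketch that assumes the glibc assert message layout seen above (program, file, line, function, expression); the regular expression and function name are illustrative only.

```python
import re

# Matches: prog: file:line: func: Assertion `expr' failed.
ASSERT_RE = re.compile(
    r"^(?P<prog>[^:]+): (?P<file>[^:]+):(?P<line>\d+): "
    r"(?P<func>\w+): Assertion `(?P<expr>.*)' failed\.$"
)


def parse_assert(line):
    """Return a dict describing the failed assertion, or None if no match."""
    match = ASSERT_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields
```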


Expected results:
qemu-kvm will start the VM properly.


Additional info:
1. In the attached files are the outputs of 'virsh capabilities' and 'cat /proc/cpuinfo' for the host and guest.

2. Also, libguestfs fails to start an appliance using KVM in the guest with the same error (from that I assume this is not related to libvirt).

3. We had reports of the same issue with the Skylake CPU family.

Comment 1 Nadav Goldin 2017-02-16 16:22:28 UTC
Was able to replicate this upstream:
https://bugs.launchpad.net/qemu/+bug/1665389

After some more trials, I don't think it's related to the CPU model of the last layer; other CPU models fail too:

[root@vm-el73 qemu]# /usr/libexec/qemu-kvm -s -machine q35,accel=kvm,usb=off -cpu host
qemu-kvm: /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1736: kvm_put_msrs: Assertion `ret == n' failed.
Aborted
[root@vm-el73 qemu]# /usr/libexec/qemu-kvm -s -machine q35,accel=kvm,usb=off -cpu qemu64
qemu-kvm: /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1736: kvm_put_msrs: Assertion `ret == n' failed.
Aborted

Comment 2 Cole Robinson 2017-02-16 18:43:59 UTC
Thanks for reporting it upstream as well. As you mention in the upstream bug, the assertion is different. This assertion has a related RHEL bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1339196

dgilbert, you looked into that issue, maybe additional changes are needed? See nadav's upstream report as well:

https://bugs.launchpad.net/qemu/+bug/1665389

Comment 3 Dr. David Alan Gilbert 2017-02-16 19:00:06 UTC
(In reply to Cole Robinson from comment #2)
> Thanks for reporting it upstream as well. As you mention in the upstream
> bug, the assertion is different. This assertion has a related RHEL bug:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1339196
> 
> dgilbert, you looked into that issue, maybe additional changes are needed?
> See nadav's upstream report as well:
> 
> https://bugs.launchpad.net/qemu/+bug/1665389

Yes, comment added on the LP bug. Note it may be a different bug: that assert fires when any of a zillion MSRs fails, so we just need to find which one. The other case was perf counters; that was also nested, but with a VMware L0.
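To illustrate the "find which one" step: the KVM_SET_MSRS ioctl returns the number of MSRs it set successfully, so when the return value is smaller than the number of entries, the entry at that index is the first one the kernel rejected. The sketch below only models that bookkeeping in plain Python. It does not call KVM, the function name is hypothetical, and the MSR indices in the usage example are illustrative, not the MSR that fails in this bug.

```python
def first_failed_msr(entries, ret):
    """Given the (index, value) pairs passed to KVM_SET_MSRS and its return
    value, return the first rejected entry, or None if all were set."""
    if ret < 0:
        raise ValueError("ioctl itself failed; no per-MSR information")
    if ret >= len(entries):
        return None  # ret == n, the condition the assertion checks
    return entries[ret]


if __name__ == "__main__":
    # Illustrative entries only: IA32_SYSENTER_CS/ESP/EIP.
    entries = [(0x174, 0), (0x175, 0), (0x176, 0)]
    failed = first_failed_msr(entries, 1)
    if failed is not None:
        print("failed to set MSR 0x%x to 0x%x" % failed)
```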

Comment 4 Paolo Bonzini 2017-02-20 14:28:33 UTC
Nested is currently supported only with -cpu host. Kernel 4.9 has the necessary support but QEMU doesn't.

Comment 5 Cole Robinson 2017-03-14 20:13:40 UTC
Since this bug was also reported in upstream Launchpad, which has more discussion, I'm closing this. If a backportable patch lands we can reopen this to track it; otherwise, when qemu.git is fixed, the changes will eventually be rebased into Fedora.

