Description of problem:
It appears that on the 4.19.x series, nested virtualization on my Intel 7820x CPU doesn't work anymore. On the same hardware with the previous kernel, I was using nested virtualization happily, and now I get:

2018-12-06 23:52:05.167+0000: starting up libvirt version: 4.7.0, package: 1.fc29 (Fedora Project, 2018-09-04-10:29:06, ), qemu version: 3.0.0qemu-3.0.0-2.fc29, kernel: 4.18.16-300.fc29.x86_64, hostname: openshift-libvirt.lan
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/bin/qemu-kvm -name guest=test1-bootstrap,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-test1-bootstrap/master-key.aes -machine pc-i440fx-3.0,accel=kvm,usb=off,dump-guest-core=off -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 5ba43ae2-c1ff-483e-bdbe-ea5b8c5a5bfb -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=26,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/lib/libvirt/images/test1-bootstrap,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=35 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=3a:b9:7b:c3:28:a4,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charchannel0 -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -fw_cfg name=opt/com.coreos/config,file=/var/lib/libvirt/images/test1-bootstrap.ign -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
2018-12-06 23:52:05.167+0000: Domain id=1 is tainted: custom-argv
2018-12-06T23:52:05.225201Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/3 (label charserial0)
2018-12-06T23:52:05.225464Z qemu-system-x86_64: -chardev pty,id=charchannel0: char device redirected to /dev/pts/4 (label charchannel0)
KVM: entry failed, hardware error 0x7
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000663
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=06 66 05 00 00 01 00 8e c1 26 66 a3 74 f7 66 5b 66 5e 66 c3 <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Version-Release number of selected component (if applicable):
Linux xaos.lan 4.19.6-300.fc29.x86_64 #1 SMP Sun Dec 2 17:33:14 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Always

Steps to Reproduce:
1. Enable nested virtualization via options kvm_intel nested=1
2. Start a VM using virt-manager, choosing the "host-passthrough" CPU model.
3. Try booting up another VM inside the VM; the nested VM gets paused.

Actual results:
Nested VM doesn't start and fails with the above error.

Expected results:
Nested VM should start.
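
For anyone retracing step 1, the following is a minimal Python sketch for confirming that the kvm_intel module actually picked up the nested=1 option. It assumes the module parameters are exposed under /sys/module/kvm_intel/parameters, which is the usual sysfs location but is not stated in this report; the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: kvm_intel exposes its parameters at this sysfs path.
PARAM_DIR = Path("/sys/module/kvm_intel/parameters")


def param_is_enabled(raw: str) -> bool:
    """Interpret a boolean module parameter; kernels report either Y/N or 1/0."""
    return raw.strip().upper() in {"Y", "1"}


def read_param(name: str, param_dir: Path = PARAM_DIR):
    """Return True/False for a boolean parameter, or None if it cannot be read
    (for example when kvm_intel is not loaded)."""
    try:
        return param_is_enabled((param_dir / name).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    # "nested" is the parameter named in the steps to reproduce.
    print("nested:", read_param("nested"))
```

A result of None most likely means the module is not loaded on that machine; False means the option did not take effect.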
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Core(TM) i7-7820X CPU @ 3.60GHz
stepping : 4
microcode : 0x200004d
cpu MHz : 1200.206
cache size : 11264 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 7200.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Core(TM) i7-7820X CPU @ 3.60GHz
stepping : 4
microcode : 0x200004d
cpu MHz : 1200.124
cache size : 11264 KB
physical id : 0
siblings : 16
core id : 1
cpu cores : 8
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 7200.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
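
The CPU information above lists vmx among the flags. With the "host-passthrough" CPU model, the same flag would be expected to show up inside the L1 guest as well. A minimal sketch for checking that from /proc/cpuinfo text follows; the function names and the default list of wanted flags are illustrative assumptions, not something prescribed in this report.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the flags of the first processor entry in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str, wanted=("vmx",)) -> list:
    """Return the wanted flags that are absent; an empty list means all are present."""
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    # Run inside the L1 guest to see whether vmx was passed through.
    with open("/proc/cpuinfo") as handle:
        print("missing:", missing_flags(handle.read()))
```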
I am also experiencing this issue. Some variations I have tried:

1) Linux 4.19.16 and 4.19.17, built from the Fedora 29 SRPM as a reference, both always fail exactly as originally reported: "KVM: entry failed, hardware error 0x7"
2) Linux 4.14.79, 4.14.93, and several other updates all work.
3) Using QEMU 2.12, QEMU 3.0, or QEMU 3.1 on the L0 host does not change the outcome.
4) Using QEMU 2.10 or QEMU 2.12 on the L1 host does not change the outcome.
5) If I disable "enable_apicv" in the L1 host, it starts working. I can "virsh destroy" the L2 guest and switch this value on or off and see it work and fail without restarting the L1 host. However, I don't know that this proves it is the virtual APIC support that is broken, as this could just change the timing and avoid a race condition of some sort.
6) If I disable KVM and use QEMU software emulation, it starts working.

In my case, it happens to fail during virt-install right after starting the L2 guest.
Also: 7) The behaviour stays the same whether the guest is running an RHEL 7.6 kernel, or Linux 4.14.93, or Linux 4.19.17. The guest kernel doesn't seem to be a factor. All of this suggests to me a problem introduced somewhere between Linux 4.14 and 4.19.
Another report of this same issue: https://stackoverflow.com/questions/54110025/nested-virtualization-kvm-entry-failed-hardware-error-0x7 , as well as a follow up here: https://superuser.com/questions/1395752/nested-virtualization-kvm-entry-failed-hardware-error-0x7
I tried Fedora 29's Linux kernel configuration for 4.20.5, and nested virtualization worked on the first try. If somebody else can verify that 4.20.x works, then I suppose this isn't a Fedora 29 issue but a Linux 4.19.y issue. However, I can still reproduce the failure on 4.19.18. If somebody could direct me on how to debug this, I would like to provide feedback to the kernel developers so that it can be fixed in the 4.19.y LTS.
After reviewing the changes in 4.20.5 that are not in 4.19.18, I was able to identify the patch that fixes the problem for my use case:

https://github.com/torvalds/linux/commit/22a7cdcae6a4a3c8974899e62851d270956f58ce

After applying this patch, virt-install within L1 works again. As Fedora 29 has now moved on to 4.20.3+, which includes this patch, I think this issue could be closed once the original author confirms that installing 4.20.3+ fixes the problem for them as well.

However, I would like to get this patch into 4.19.y LTS. Can somebody familiar with how to do this please let me know what needs to be done to start this process?
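
To check quickly whether a machine is already on a kernel at or beyond 4.20.3, the version comparison can be sketched as below. This is a minimal sketch under stated assumptions: the 4.20.3 threshold is taken from the comment above rather than independently verified, the function names are hypothetical, and a 4.19.y kernel that carries a backport of the patch would still report False here.

```python
import platform
import re

# Threshold taken from the comment above; an assumption, not a verified boundary.
FIXED_IN = (4, 20, 3)


def kernel_version(release: str) -> tuple:
    """Extract (major, minor, patch) from a release string
    such as '4.19.6-300.fc29.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def at_or_after_fix(release: str) -> bool:
    """True if the release is numerically at or beyond FIXED_IN."""
    return kernel_version(release) >= FIXED_IN


if __name__ == "__main__":
    running = platform.release()
    print(running, "at or after 4.20.3:", at_or_after_fix(running))
```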
Thanks for the bisect. If you want that to go in 4.19, send an e-mail to stable.org with the commit hash, which branches you want it applied to and a short summary of the bug it fixes.
Thanks, Laura! The change is queued for 4.19.y. Hemant: Does 4.20.3+ resolve your symptoms as well?
Yeah, I can confirm that this is fixed in the 4.20.3+ kernel on Fedora 29.
Thanks for letting us know. I'm going to close the bug. Please open a new bug if the problem shows up again.