Bug 1657296 - Nested virtualization broken on latest Fedora29 kernel
Summary: Nested virtualization broken on latest Fedora29 kernel
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 29
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-12-07 15:52 UTC by Hemant Kumar
Modified: 2019-01-29 05:30 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-01-29 05:30:44 UTC
Type: Bug
Embargoed:


Description Hemant Kumar 2018-12-07 15:52:03 UTC
Description of problem:

It appears that nested virtualization on my Intel i7-7820X CPU no longer works on the 4.19.x kernel series.

On the same hardware with the previous kernel, nested virtualization worked fine. Now I get:

2018-12-06 23:52:05.167+0000: starting up libvirt version: 4.7.0, package: 1.fc29 (Fedora Project, 2018-09-04-10:29:06, ), qemu version: 3.0.0qemu-3.0.0-2.fc29, kernel: 4.18.16-300.fc29.x86_64, hostname: openshift-libvirt.lan
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/bin/qemu-kvm -name guest=test1-bootstrap,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-test1-bootstrap/master-key.aes -machine pc-i440fx-3.0,accel=kvm,usb=off,dump-guest-core=off -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 5ba43ae2-c1ff-483e-bdbe-ea5b8c5a5bfb -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=26,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/lib/libvirt/images/test1-bootstrap,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=35 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=3a:b9:7b:c3:28:a4,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charchannel0 -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -fw_cfg name=opt/com.coreos/config,file=/var/lib/libvirt/images/test1-bootstrap.ign -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
2018-12-06 23:52:05.167+0000: Domain id=1 is tainted: custom-argv
2018-12-06T23:52:05.225201Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/3 (label charserial0)
2018-12-06T23:52:05.225464Z qemu-system-x86_64: -chardev pty,id=charchannel0: char device redirected to /dev/pts/4 (label charchannel0)
KVM: entry failed, hardware error 0x7
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000663
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=06 66 05 00 00 01 00 8e c1 26 66 a3 74 f7 66 5b 66 5e 66 c3 <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00



Version-Release number of selected component (if applicable):

Linux xaos.lan 4.19.6-300.fc29.x86_64 #1 SMP Sun Dec 2 17:33:14 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:

Always


Steps to Reproduce:
1. Enable nested virtualization on the host via "options kvm_intel nested=1".
2. Start a VM using virt-manager, choosing the "host-passthrough" CPU model.
3. Try booting another VM inside that VM; the nested VM gets paused.
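
For anyone reproducing step 1, here is a minimal Python sketch (an illustration, not part of the original report) that extracts the options set for a module from modprobe.d-style text. The function name parse_modprobe_options and the sample text are hypothetical; only the "options kvm_intel nested=1" line comes from the steps above.

```python
def parse_modprobe_options(text, module="kvm_intel"):
    """Collect key=value options set for `module` in modprobe.d-style text."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # Expected shape: options <module> key=value [key=value ...]
        if len(parts) >= 3 and parts[0] == "options" and parts[1] == module:
            for item in parts[2:]:
                key, _, value = item.partition("=")
                options[key] = value
    return options


# Hypothetical file content; only the options line is taken from the report.
sample = "# enable nested guests\noptions kvm_intel nested=1\n"
print(parse_modprobe_options(sample))  # prints {'nested': '1'}
```

This only inspects configuration text. It does not tell you whether the running module actually picked up the setting.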

Actual results:

The nested VM doesn't start and fails with the above error.


Expected results:

The nested VM should start.

Comment 1 Hemant Kumar 2018-12-07 15:53:52 UTC
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Core(TM) i7-7820X CPU @ 3.60GHz
stepping        : 4
microcode       : 0x200004d
cpu MHz         : 1200.206
cache size      : 11264 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 7200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Core(TM) i7-7820X CPU @ 3.60GHz
stepping        : 4
microcode       : 0x200004d
cpu MHz         : 1200.124
cache size      : 11264 KB
physical id     : 0
siblings        : 16
core id         : 1
cpu cores       : 8
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 7200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Comment 2 Mark Mielke 2019-01-25 03:20:04 UTC
I am also experiencing this issue. Some variations I have tried:

1) Linux 4.19.16 and 4.19.17, both built from the Fedora 29 SRPM as a reference, always fail identically to the original report: "KVM: entry failed, hardware error 0x7"

2) Linux 4.14.79, 4.14.93, and several other updates all work.

3) Using Qemu 2.12, Qemu 3.0, and Qemu 3.1 on the L0 host does not change the outcome.

4) Using Qemu 2.10, Qemu 2.12 on the L1 host does not change the outcome.

5) If I disable "enable_apicv" in the L1 host, it starts working. I can "virsh destroy" the L2 guest, switch this value on or off, and see it work or fail accordingly without restarting the L1 host. However, I don't know that this proves the virtual APIC support is what is broken, since it could just alter the timing and avoid a race condition of some sort.

6) If I disable KVM, and use Qemu software emulation, it starts working.

In my case, it happens to fail during virt-install right after starting the L2 guest.
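
To check the settings mentioned in this thread on a given machine, a minimal Python sketch along these lines may help. It assumes the kvm_intel module exposes its parameters under /sys/module/kvm_intel/parameters and that enabled boolean parameters read back as "Y" or "1". The helper names read_module_param and is_enabled are hypothetical.

```python
from pathlib import Path


def read_module_param(name, module="kvm_intel", sysfs_root="/sys/module"):
    """Return the current value of a module parameter, or None if unreadable."""
    path = Path(sysfs_root) / module / "parameters" / name
    try:
        return path.read_text().strip()
    except OSError:
        # Module not loaded, parameter missing, or no permission to read it.
        return None


def is_enabled(value):
    """Interpret a sysfs parameter value; assumes 'Y' or '1' means enabled."""
    return value in ("Y", "1")


for name in ("nested", "enable_apicv"):
    value = read_module_param(name)
    print(f"{name}: raw={value!r} enabled={is_enabled(value)}")
```

Run it on L0 to check "nested" and on L1 to check "enable_apicv". A value of None means the parameter could not be read, not that the feature is off.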

Comment 3 Mark Mielke 2019-01-25 03:23:04 UTC
Also:

7) The behaviour stays the same whether the guest is running a RHEL 7.6 kernel, Linux 4.14.93, or Linux 4.19.17. The guest kernel doesn't seem to be a factor.

All of this suggests to me a problem introduced somewhere between Linux 4.14 and 4.19.

Comment 5 Mark Mielke 2019-01-27 20:44:44 UTC
I tried Linux 4.20.5 with Fedora 29's kernel configuration, and nested worked on the first try. If somebody else can verify that 4.20.x works, then I suppose this isn't a Fedora 29 issue but a Linux 4.19.y issue. However, I can still reproduce the failure on 4.19.18. If somebody could direct me on how to debug this, I would like to provide feedback to the kernel developers so that it gets fixed in the 4.19.y LTS.

Comment 6 Mark Mielke 2019-01-28 03:09:16 UTC
After reviewing the changes in 4.20.5 that are not in 4.19.18, I was able to identify the patch that fixes the problem for my use case:

https://github.com/torvalds/linux/commit/22a7cdcae6a4a3c8974899e62851d270956f58ce

After applying this patch, virt-install within L1 works again.

As Fedora 29 has now moved on to 4.20.3+, which includes this patch, I think this issue could be closed once the original author confirms that installing 4.20.3+ fixes the problem for them as well.

However, I would like to get this patch into 4.19.y LTS. Can somebody familiar with how to do this please let me know what needs to be done to start this process?
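
As a rough aid for checking a machine against the versions discussed here, the following Python sketch parses a kernel release string and compares it with 4.20.3, the first release this thread reports as working on Fedora 29. The names kernel_version, confirmed_fixed, and FIRST_CONFIRMED_GOOD are hypothetical. The thread does not say which 4.19.y release receives the backport, so any 4.19.y kernel is treated as not confirmed.

```python
import re

# First release reported as working in this thread (Fedora 29, 4.20.3+).
FIRST_CONFIRMED_GOOD = (4, 20, 3)


def kernel_version(release):
    """Parse '4.19.6-300.fc29.x86_64' into the tuple (4, 19, 6)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def confirmed_fixed(release):
    """True only if the release is at or above the thread's confirmed version."""
    return kernel_version(release) >= FIRST_CONFIRMED_GOOD


for release in ("4.19.6-300.fc29.x86_64", "4.19.18", "4.20.5"):
    print(release, kernel_version(release), confirmed_fixed(release))
```

On a live system the release string can be obtained with platform.release(). A False result here only means "not confirmed by this thread"; older series such as 4.14.x were reported to work as well.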

Comment 7 Laura Abbott 2019-01-28 05:28:46 UTC
Thanks for the bisect. If you want that to go into 4.19, send an e-mail to stable@vger.kernel.org with the commit hash, the branches you want it applied to, and a short summary of the bug it fixes.

Comment 8 Mark Mielke 2019-01-28 14:20:49 UTC
Thanks, Laura! The change is queued for 4.19.y.

Hemant: Does 4.20.3+ resolve your symptoms as well?

Comment 9 Hemant Kumar 2019-01-28 17:42:17 UTC
Yeah, I can confirm that this is fixed in the 4.20.3+ kernel on Fedora 29.

Comment 10 Laura Abbott 2019-01-29 05:30:44 UTC
Thanks for letting us know. I'm going to close the bug. Please open a new bug if the problem shows up again.
