Bug 1657296
Summary: | Nested virtualization broken on latest Fedora29 kernel | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Hemant Kumar <hekumar> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 29 | CC: | airlied, bskeggs, ewk, hdegoede, ichavero, itamar, jarodwilson, jglisse, john.j5live, jonathan, josef, kernel-maint, labbott, linville, mark, mchehab, mjg59, steved |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2019-01-29 05:30:44 UTC | Type: | Bug |
Description
Hemant Kumar
2018-12-07 15:52:03 UTC
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Core(TM) i7-7820X CPU @ 3.60GHz
stepping : 4
microcode : 0x200004d
cpu MHz : 1200.206
cache size : 11264 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 7200.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Core(TM) i7-7820X CPU @ 3.60GHz
stepping : 4
microcode : 0x200004d
cpu MHz : 1200.124
cache size : 11264 KB
physical id : 0
siblings : 16
core id : 1
cpu cores : 8
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 7200.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

I am also experiencing this issue. Some variations I have tried:

1) Linux 4.19.16 and 4.19.17, built from the Fedora 29 SRPM as a reference, both always fail identically to the original report: "KVM: entry failed, hardware error 0x7"
2) Linux 4.14.79, 4.14.93, and several other updates all work.
3) Using Qemu 2.12, Qemu 3.0, or Qemu 3.1 on the L0 host does not change the outcome.
4) Using Qemu 2.10 or Qemu 2.12 on the L1 host does not change the outcome.
5) If I disable "enable_apicv" in the L1 host, it starts working. I can "virsh destroy" the L2 guest, switch this value on or off, and see it work and fail without restarting the L1 host. However, I don't know that this proves it is the virtual APIC support that is broken, as this could just tickle the timing and avoid a race condition of some sort.
6) If I disable KVM and use Qemu software emulation, it starts working.

In my case, it happens to fail during virt-install right after starting the L2 guest. Also:

7) The behaviour stays the same whether the guest is running an RHEL 7.6 kernel, Linux 4.14.93, or Linux 4.19.17. The guest kernel doesn't seem to be a factor.

All of this suggests to me a problem introduced somewhere between Linux 4.14 and 4.19.
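For anyone repeating the checks from the comment above, the following is a minimal Python sketch, assuming an Intel host with the kvm_intel module loaded and its parameters exposed under /sys/module/kvm_intel/parameters. It looks for the vmx flag in cpuinfo text and reads the nested and enable_apicv parameters. The function names (cpu_flags, read_param, is_enabled, report) are illustrative and do not come from this report.

```python
from pathlib import Path

# Assumed location of the kvm_intel module parameters (Intel hosts only).
PARAM_DIR = Path("/sys/module/kvm_intel/parameters")


def cpu_flags(cpuinfo_text):
    """Return the flag set of the first processor found in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def read_param(name, param_dir=PARAM_DIR):
    """Return a module parameter as a stripped string, or None if unreadable."""
    try:
        return (Path(param_dir) / name).read_text().strip()
    except OSError:
        return None


def is_enabled(value):
    """Treat Y or 1 as enabled; anything else (including None) as not enabled."""
    return value in ("Y", "y", "1")


def report(cpuinfo_path="/proc/cpuinfo"):
    """Collect the three values discussed in the comment into a dict."""
    flags = cpu_flags(Path(cpuinfo_path).read_text())
    return {
        "vmx": "vmx" in flags,
        "nested": is_enabled(read_param("nested")),
        "enable_apicv": is_enabled(read_param("enable_apicv")),
    }
```

Calling report() on the L1 host before and after changing enable_apicv would show whether the setting actually took effect. The sketch only reads values and does not change any module parameter.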
Another report of this same issue: https://stackoverflow.com/questions/54110025/nested-virtualization-kvm-entry-failed-hardware-error-0x7 , as well as a follow up here: https://superuser.com/questions/1395752/nested-virtualization-kvm-entry-failed-hardware-error-0x7

I tried Fedora 29's Linux kernel configuration for 4.20.5, and nested worked on the first try. If somebody else can verify that 4.20.x works, then I suppose this isn't a Fedora 29 issue but a Linux 4.19.y issue. However, I can still reproduce the failure on 4.19.18. If somebody could direct me on how to debug this, I would like to provide feedback to the kernel developers so it can be fixed in the 4.19.y LTS.

After reviewing the changes in 4.20.5 that are not in 4.19.18, I was able to identify the patch that fixes the problem for my use case: https://github.com/torvalds/linux/commit/22a7cdcae6a4a3c8974899e62851d270956f58ce

After applying this patch, virt-install within L1 works again. As Fedora 29 has now moved on to 4.20.3+, which includes this patch, I think this issue could be closed once the original author confirms that installing 4.20.3+ fixes the problem for them as well. However, I would like to get this patch into 4.19.y LTS. Can somebody familiar with how to do this please let me know what needs to be done to start this process?

Thanks for the bisect. If you want that to go in 4.19, send an e-mail to stable.org with the commit hash, which branches you want it applied to and a short summary of the bug it fixes.

Thanks, Laura! The change is queued for 4.19.y. Hemant: Does 4.20.3+ resolve your symptoms as well?

Yeah, I can confirm that this is fixed in the 4.20.3+ kernel on Fedora 29.

Thanks for letting us know. I'm going to close the bug. Please open a new bug if the problem shows up again.
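The version findings in this thread can be summarised as a small lookup. This is only a sketch in Python: it encodes the kernel versions named in the comments above and nothing more. The names parse_release and classify are hypothetical, and treating every version from 4.20.3 upward as working is an assumption based on the reporter's confirmation for Fedora 29 kernels. The thread does not say which 4.19.y release received the backport, so later 4.19.y versions are deliberately left unclassified.

```python
import re

# Kernel versions named in this thread, grouped by the reported outcome.
REPORTED_FAILING = [(4, 19, 16), (4, 19, 17), (4, 19, 18)]
REPORTED_WORKING = [(4, 14, 79), (4, 14, 93), (4, 20, 5)]
# Assumption: Fedora 29 kernels from 4.20.3 on carry the fix, per the reporter.
FEDORA_FIXED_FROM = (4, 20, 3)


def parse_release(release):
    """Extract (major, minor, patch) from e.g. '4.19.17-300.fc29.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def classify(release):
    """Map a kernel release string to what this thread reported about it."""
    version = parse_release(release)
    if version in REPORTED_FAILING:
        return "reported failing"
    if version in REPORTED_WORKING or version >= FEDORA_FIXED_FROM:
        return "reported working"
    return "not covered by this report"
```

On a running system, the string returned by platform.release() from the Python standard library could be passed to classify().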