Bug 1853447
Summary: | Guest IA32_SPEC_CTRL wrmsr failure on AMD processors that support STIBP but don't support IBRS | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Maxim Levitsky <mlevitsk> |
Component: | kernel | Assignee: | Maxim Levitsky <mlevitsk> |
kernel sub component: | KVM | QA Contact: | Yumei Huang <yuhuang> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | ||
Priority: | high | CC: | chayang, dgilbert, nanliu, virt-maint, wei.huang2, yuhuang |
Version: | 8.2 | ||
Target Milestone: | rc | ||
Target Release: | 8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | kernel-4.18.0-233.el8 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-11-04 01:24:04 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1858147, 1869125 |
Description
Maxim Levitsky
2020-07-02 17:26:06 UTC
Patch posted upstream: https://lore.kernel.org/kvm/20200702174455.282252-1-mlevitsk@redhat.com/T/#u

Hi Maxim,

I have been trying to reproduce the issue with nested vm, but there is no such error in L2 guest. Would you please help check my steps and find out if there is something wrong? Thanks in advance.

Test details:

- Host:
kernel-4.18.0-222.el8.x86_64
amd-daytona-03.khw1.lab.eng.bos.redhat.com

# cpuid -r -1 | grep 0x80000008
0x80000008 0x00: eax=0x00003030 ebx=0x018cf757 ecx=0x0000707f edx=0x00010000

# lscpu | grep stibp | grep ibrs
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

Update qemu source code (qemu-kvm-4.2.0-30.module+el8.3.0+7298+c26a06b8) as below:

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a343de0c9d..81aa354021 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1133,7 +1133,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
             "clzero", NULL, "xsaveerptr", NULL,
             NULL, NULL, NULL, NULL,
             NULL, "wbnoinvd", NULL, NULL,
-            "ibpb", NULL, NULL, NULL,
+            "ibpb", "amd-ibrs", NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
             "amd-ssbd", "virt-ssbd", "amd-no-ssb", NULL,
Boot L1 guest with the updated qemu with "-cpu host,+topoext", get cpuid and cpu flags in L1 guest as below.

# cpuid -r -1 | grep 0x80000008
0x80000008 0x00: eax=0x00003030 ebx=0x02001205 ecx=0x00000007 edx=0x00000000

# lscpu | grep stibp | grep ibrs
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd virt_ssbd arat npt nrip_save umip rdpid arch_capabilities

Then boot L2 guest inside L1 guest with "-smp 16,maxcpus=16,cores=4,threads=2,dies=1,sockets=2 -cpu host,+topoext". The L2 guest boots without error, and there is no call trace in dmesg.

It looks like your host does support

Please disregard the patch I asked you to apply to qemu; it should work without it as well. (My mistake: I hadn't wrapped my head around how qemu deals with unknown cpuid bits. Now I think I understand it correctly: it just zeros them in the guest cpuid, which is what we want anyway in this case.)

On top of that, the change you applied is incorrect, since it 'inserts' a new flag in the list, while the position of each flag must match the bit index used in hardware. So just try the same but without any patches to qemu.

In addition to that, your host supports both IBRS and STIBP, thus the bug can't be reproduced on the host. To reproduce it in the nested setup, you need your L1 guest to support STIBP but not IBRS, which isn't currently the case.
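As an illustration of the remark that each flag's position must match the hardware bit index, here is a minimal sketch. It assumes the AMD bit layout of CPUID leaf 0x80000008 EBX for the few bits discussed in this thread (IBPB at bit 12, IBRS at bit 14, STIBP at bit 15, SSBD at bit 24, VIRT_SSBD at bit 25); the helper name `decode_ebx` and the Python list `PATCHED_NAMES` are hypothetical and only mirror the order of entries in the diff quoted above. It is not qemu code.

```python
# Named bits of CPUID 0x80000008 EBX (AMD layout); only bits relevant to
# this thread are listed, everything else is ignored by the decoder.
AMD_8000_0008_EBX = {
    0: "clzero",
    2: "xsaveerptr",
    9: "wbnoinvd",
    12: "ibpb",
    14: "amd-ibrs",
    15: "amd-stibp",
    24: "amd-ssbd",
    25: "virt-ssbd",
    26: "amd-no-ssb",
}


def decode_ebx(ebx):
    """Return the known feature names whose bit is set in ebx, lowest bit first."""
    return [
        name
        for bit, name in sorted(AMD_8000_0008_EBX.items())
        if (ebx >> bit) & 1
    ]


# The name table as it reads after the experimental diff (index == bit number).
# None stands for the NULL entries in the C array.
PATCHED_NAMES = [
    "clzero", None, "xsaveerptr", None,
    None, None, None, None,
    None, "wbnoinvd", None, None,
    "ibpb", "amd-ibrs", None, None,
    None, None, None, None,
    None, None, None, None,
    "amd-ssbd", "virt-ssbd", "amd-no-ssb", None,
]

# Index at which the diff placed "amd-ibrs", versus the bit assumed above.
patched_bit = PATCHED_NAMES.index("amd-ibrs")
expected_bit = next(b for b, n in AMD_8000_0008_EBX.items() if n == "amd-ibrs")

host_features = decode_ebx(0x018CF757)  # host EBX value from the thread
l1_features = decode_ebx(0x02001205)    # L1 guest EBX value from the thread

if __name__ == "__main__":
    print("host:", host_features)
    print("L1:  ", l1_features)
    print("patched index:", patched_bit, "expected bit:", expected_bit)
```

Under these assumptions the name lands on a different bit than the one the hardware uses for IBRS, which is consistent with the advice to drop the qemu change.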
To achieve this I think you should boot the L1 guest with

-cpu host,+topoext,-spec-ctrl

(This disables the Intel-specific CPUID bit for IBRS, which we set when the host supports IBRS. In addition, the AMD-specific bit 'amd-ibrs', which your qemu doesn't yet support anyway, is zeroed, so it is not passed to the guest.)

Once you have verified that the L1 guest supports STIBP and doesn't support IBRS, boot the L2 guest as you did, and you should hit the issue during the L2 guest boot.

And finally a note that if we verify that server AMD cpus (e.g. EPYCs) do support both IBRS and STIBP, then we can lower the priority of this bug. It is especially interesting to see the CPU flags of an EPYC ROME CPU to see if it is affected by this.

Looks like EPYC ROME cpus are not affected by this (they support both mitigations), from one report I got.

$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7402 24-Core Processor
stepping : 0
microcode : 0x830101c
cpu MHz : 3267.374
cache size : 512 KB
physical id : 0
siblings : 48
core id : 0
cpu cores : 24
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips : 5600.02
TLB size : 3072 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

Thanks Maxim. I updated the test steps as you suggested:

1. Boot L1 guest with "-cpu host,+topoext,-spec-ctrl"; there is no ibrs flag in the L1 guest.

# lscpu | grep Flags
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt nrip_save umip rdpid arch_capabilities

2. Boot L2 guest with "-smp 16,maxcpus=16,cores=4,threads=2,dies=1,sockets=2 -cpu host,+topoext"; got the following call trace inside the L2 guest.

[ 8.373253] unchecked MSR access error: WRMSR to 0x48 (tried to write 0x0000000000000006) at rIP: 0xffffffffad063f34 (native_write_msr+0x4/0x20)
[ 8.373255] Call Trace:
[ 8.373305] speculation_ctrl_update+0x78/0x1f0
[ 8.373310] speculation_ctrl_update_current+0x1b/0x20
[ 8.373313] ssb_prctl_set+0xb2/0xd0
[ 8.373315] arch_seccomp_spec_mitigate+0x27/0x40
[ 8.373319] do_seccomp+0x691/0x6e0
[ 8.373323] do_syscall_64+0x5b/0x1a0
[ 8.373329] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 8.373332] RIP: 0033:0x7fca8c24978d

The call trace is different from the one in comment 0, but the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1808996#c0. However, the written value (0x0000000000000006) is the same as yours.
Do you think it's reproduced?

The call trace is most likely different because the kernel only complains once per MSR address about faulting accesses; the new call trace is known to me too.

The MSR value is good: it indicates that you hit the issue in this bug and not the issue in the original bug. For reference, 0x4 is the SSBD mitigation and 0x2 is the STIBP mitigation. In the original bug report, which was related to SSBD only, the write was 0x4, but here it is 0x6, indicating that the guest wants to enable both.

Thanks for checking the EPYC processors. This means that this issue is low priority since it only seems to affect desktop Ryzens. It should still be fixed though, and I have meanwhile posted another patch upstream for it, as was suggested by upstream developers.

In fact I see now that my patch is accepted upstream (in the kvm/queue branch of kvm.git).

Thanks Maxim, I'm lowering the priority.

What's the state of this bz? Is this the patch that went in downstream as 5d8739265af9c - if so I think that means it's in current downstream? (I ask because of bz 1858147)

It is in the kvm/next branch and will soon be merged upstream IMHO.

I talked about this bug with Paulo and we sort of reached the decision to leave it to be backported as a part of the 'mass' backport of KVM code which will happen eventually, since this issue seems to be specific to desktop processors.

Best regards,
Maxim Levitsky

(In reply to Maxim Levitsky from comment #11)
> It is in the kvm/next branch and will soon be merged upstream IMHO.
>
> I talked about this bug with Paulo and we sort of reached the decision to
> leave it to be backported as a part of the 'mass' backport of KVM code
> which will happen eventually, since this issue seems to be specific to
> desktop processors.
>
> Best regards,
> Maxim Levitsky

Do you have a link to the exact patch? I'm curious whether it's related to the windows case because it feels very close.
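For reference, the MSR values discussed in this exchange (0x4 in the original report, 0x6 here) can be decoded with a short sketch. It assumes the architectural bit layout of IA32_SPEC_CTRL (MSR 0x48): bit 0 IBRS, bit 1 STIBP, bit 2 SSBD. The function name `decode_spec_ctrl` is made up for illustration.

```python
# Bit assignments of IA32_SPEC_CTRL (MSR 0x48) used in this thread.
SPEC_CTRL_IBRS = 1 << 0   # 0x1
SPEC_CTRL_STIBP = 1 << 1  # 0x2
SPEC_CTRL_SSBD = 1 << 2   # 0x4

_NAMES = [
    (SPEC_CTRL_IBRS, "IBRS"),
    (SPEC_CTRL_STIBP, "STIBP"),
    (SPEC_CTRL_SSBD, "SSBD"),
]


def decode_spec_ctrl(value):
    """Return the names of the mitigation bits set in a SPEC_CTRL value."""
    return [name for mask, name in _NAMES if value & mask]


if __name__ == "__main__":
    # Value from the original SSBD-only report, then the value seen here.
    for written in (0x4, 0x6):
        print(hex(written), decode_spec_ctrl(written))
```

A write of 0x6 therefore asks for STIBP and SSBD together, which matches the reading given in the comment above.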
https://lkml.org/lkml/2020/7/8/597

As I said in this bug description, this bug is the same bug for Linux and Windows. The only difference is that on Linux it was always possible to trigger it, since Linux will start using STIBP as soon as SMT is enabled and it detects Intel's or AMD's CPUID bit for it, but Windows running on AMD only checks the AMD-specific bit, which got exposed with the EPYC ROME enablement patch via the 'amd-stibp' cpuid feature.

The root cause for both is that the guest uses STIBP (which is OK), but we give it a nice #GP when it does, because IBRS is not indicated as supported and we were checking it for the #GP condition. It is probably just another microcode bug btw - I bet that my CPU does support IBRS, but the bit in CPUID is just not set for some reason.

Believe it or not, on my CPU the bit for x2apic is also not set, but it does have it - I patched the kernel to ignore the cpuid bit and it works. Not that I need x2apic that much since AVIC doesn't support it, but still nice to have.

(In reply to Maxim Levitsky from comment #13)
> https://lkml.org/lkml/2020/7/8/597
>
> As I said in this bug description, this bug is the same bug for Linux and
> Windows. The only difference is that on Linux it was always possible to
> trigger it, since Linux will start using STIBP as soon as SMT is enabled
> and it detects Intel's or AMD's CPUID bit for it, but Windows running on
> AMD only checks the AMD-specific bit, which got exposed with the EPYC ROME
> enablement patch via the 'amd-stibp' cpuid feature.
>
> The root cause for both is that the guest uses STIBP (which is OK), but we
> give it a nice #GP when it does, because IBRS is not indicated as supported
> and we were checking it for the #GP condition.
> It is probably just another microcode bug btw - I bet that my CPU does
> support IBRS, but the bit in CPUID is just not set for some reason.
>
> Believe it or not, on my CPU the bit for x2apic is also not set, but it
> does have it - I patched the kernel to ignore the cpuid bit and it works.
> Not that I need x2apic that much since AVIC doesn't support it, but still
> nice to have.

In https://lkml.org/lkml/2020/7/8/597, you said:

"One such case is when host CPU supports STIBP mitigation but doesn't support IBRS
(as is the case with some Zen2 AMD cpus), "
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Did you get this info based on AMD Public PPR?

> Did you get this info based on AMD Public PPR?
Nope, this is just an observation from my system, plus multiple complaints on reddit about the same issue.
But woah, since this happens on EPYC, then I'll backport this fix right away!
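The root cause described in comment 13 can be modelled with a small sketch. This is a simplified model written for this discussion, not KVM source: the function names, the feature-set representation, and the "intended" variant are all assumptions. In particular, the second function only expresses the behaviour the thread asks for (a STIBP write is accepted when STIBP is advertised); the actual upstream patch may achieve that differently.

```python
IBRS, STIBP, SSBD = 0x1, 0x2, 0x4  # IA32_SPEC_CTRL bits


def write_faults_as_described(value, guest_features):
    """Model of the buggy check: STIBP is only accepted when IBRS is advertised."""
    valid = 0
    if "ibrs" in guest_features:
        valid |= IBRS | STIBP  # STIBP tied to IBRS: the condition the thread identifies
    if "ssbd" in guest_features:
        valid |= SSBD
    return bool(value & ~valid)  # True means the guest gets #GP


def write_faults_intended(value, guest_features):
    """Hypothetical intended check: each bit is gated on its own feature flag."""
    valid = 0
    if "ibrs" in guest_features:
        valid |= IBRS
    if "stibp" in guest_features:
        valid |= STIBP
    if "ssbd" in guest_features:
        valid |= SSBD
    return bool(value & ~valid)


if __name__ == "__main__":
    # L1 configuration from the reproduction: STIBP and SSBD, but no IBRS.
    features = {"stibp", "ssbd", "ibpb"}
    print("0x6, as described:", write_faults_as_described(0x6, features))
    print("0x6, intended:    ", write_faults_intended(0x6, features))
    print("0x4, as described:", write_faults_as_described(0x4, features))
```

In this model the SSBD-only write (0x4) is accepted either way, while the combined write (0x6) faults only under the first check, which is consistent with the 0x6 write seen in the L2 call trace.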
Setting dev_ack+ and ITR=8.3.0 due to comment 17 and adding bug 1858147 to blocks based on comment 15

Patch(es) available on kernel-4.18.0-233.el8

*** Bug 1858147 has been marked as a duplicate of this bug. ***

Since your host system seems to support both STIBP and IBRS it won't trigger the bug. I would suggest you take any AMD system that has working nesting and reproduce there.

The way I set up the test is that L1 creates the STIBP+!IBRS configuration, thus simulating the real hardware that has it, and L2 shows the WRMSR error (or not, after my fix) on that.

(In reply to Maxim Levitsky from comment #27)
> Since your host system seems to support both STIBP and IBRS it won't trigger
> the bug.
> I would suggest you take any AMD system that has working nesting and
> reproduce there.
>
> The way I set up the test is that L1 creates the STIBP+!IBRS configuration,
> thus simulating the real hardware that has it, and L2 shows the WRMSR error
> (or not, after my fix) on that.

Hi Maxim,

I searched in beaker; we don't have AMD systems which support stibp and don't support ibrs. We only have three kinds of AMD systems - Milan, Rome and Ryzen 5 Pro - and they support both stibp and ibrs. However, they all have the issue with nesting (bug 1878097).

As the errata deadline is approaching, I guess we have two options here.

1. Verify this bug as the test for duplicated bug 1858147 has passed in comment 25.
2. Wait for the nesting bug to be fixed, then we need to drop this bug from errata.

Which one do you prefer?

Let's see if I can "answer" for Maxim here...

Personally, I prefer option 1 as we are fixing something that occurred and we have 2 examples of that.

AIUI, the idea is that it is not possible to validate now because bug 1878097 is blocking that (a nesting bug).

So let's ask this - if you tested using the steps from bug 1858147, then would that bug be fixed now?

Similarly, what about bug 1869125 - if you re-ran those steps, would that bug be fixed now?
If the answer to both is yes they're fixed, then I believe we declare success for this too using those two as the reference.

Neither of those, from my reading, notes anything about nesting. The concept of nesting was there to provide a mechanism to "show" how to see the problem with STIBP and IBRS without having any other means.

IIRC, the reason why we kept this bug and made it the blocker for the other two is that for those two scenarios it was determined that by fixing this bug, those two would also be resolved. Carrying this bug until the newer one is fixed just because we were using nesting to validate this one doesn't feel right. I see in bug 1878097 that stibp and ibrs are set, so it's not even the same core condition.

Yumei: Please try using https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=31523332 as your L1 kernel - that should help you work around bz 1878097 (only very lightly tested) to enable you to verify this one.

(In reply to Dr. David Alan Gilbert from comment #31)
> Yumei: Please try using
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=31523332 as your
> L1 kernel - that should help you work around bz 1878097 (only very lightly
> tested) to enable you to verify this one.

Thanks David, this build works well without the nesting bug; the L2 guest can boot up without the MSR access error on a Milan system.

But normally we are supposed to use an official downstream build when it comes to bug verification. This bug is against the L1 kernel, so we can't verify it with a temporary build as the L1 kernel.

(In reply to John Ferlan from comment #30)
> Let's see if I can "answer" for Maxim here...
>
> Personally, I prefer option 1 as we are fixing something that occurred and
> we have 2 examples of that.
>
> AIUI, the idea is that it is not possible to validate now because bug
> 1878097 is blocking that (a nesting bug).
>
> So let's ask this - if you tested using the steps from bug 1858147, then
> would that bug be fixed now?
>
> Similarly, what about bug 1869125 - if you re-ran those steps, would that
> bug be fixed now?

Yes, bug 1858147 and 1869125 share the same scenario, and have been verified in comment 25.

> If the answer to both is yes they're fixed, then I believe we declare
> success for this too using those two as the reference.
>
> Neither of those, from my reading, notes anything about nesting. The concept
> of nesting was there to provide a mechanism to "show" how to see the
> problem with STIBP and IBRS without having any other means.
>
> IIRC, the reason why we kept this bug and made it the blocker for the other
> two is that for those two scenarios it was determined that by fixing this
> bug, those two would also be resolved. Carrying this bug until the
> newer one is fixed just because we were using nesting to validate this one
> doesn't feel right. I see in bug 1878097 that stibp and ibrs are set, so
> it's not even the same core condition.

I agree.

I'm moving to verified, thanks!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4431