Bug 1592276
Summary: | EPYC-IBPB not working with Windows 1803 | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Michael Lipp <mnl> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | rawhide | CC: | agedosier, airlied, berrange, bskeggs, clalancette, crobinso, ehabkost, ewk, hdegoede, ichavero, itamar, jarodwilson, jcm, jeremy, jforbes, jglisse, john.j5live, jonathan, josef, kernel-maint, laine, lersek, libvirt-maint, linville, mchehab, mikhail.v.gavrilov, mjg59, mnl, steved, veillard, yakman2020 |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-02-18 18:12:24 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Michael Lipp
2018-06-18 10:24:21 UTC
(In reply to Michael Lipp from comment #0)
>
> The virtual machine created when selecting Windows 10 as guest configures
> cpu EPYC-IBPB. This does not work.
>
> Strange enough, providing the EPYC-IBPB configuration as a delta to an
> Opteron_G5 (using the information from cpu.xml) works, i.e. this boots:

This changes the CPU family reported to the guest, which suggests the guest OS has some specific logic it's applying for the EPYC family that it doesn't apply for the Opteron_G5 family.

There was someone on the QEMU IRC channel last week complaining of the same problem. They suggested that adding the 'virt-ssbd' feature fixed the problem, e.g.:

    <cpu mode='custom'>
      <model fallback='forbid'>EPYC</model>
      <feature policy='require' name='virt-ssbd'/>
    </cpu>

This 'virt-ssbd' feature is for fixing a recent CPU vulnerability on x86. Unfortunately that feature is not yet available in the Fedora RPMs, so it's not easy to test it right now.

(In reply to Michael Lipp from comment #0)
> Description of problem:
>
> Windows 1803 does not boot from the (downloaded) installation DVD with the
> predefined configuration for a Windows 10 guest (BSOD "System Thread
> exception not handled").
>
> Version-Release number of selected component (if applicable):
>
> libvirt is 3.7.0-4 running on fc27.
[snip]
> The virtual machine created when selecting Windows 10 as guest configures
> cpu EPYC-IBPB. This does not work.

Are you sure that is correct - neither libvirt 3.7.0, or QEMU in Fedora 27 provide an EPYC CPU, nor a EPYC-IBPB CPU, so I'm not sure how you can be using them with standard Fedora packages.

Have you installed newer libvirt/qemu by chance ?

Can you confirm a baseline EPYC without the -IBPB variant is booting ok?

(In reply to Daniel Berrange from comment #2)
> Are you sure that is correct - neither libvirt 3.7.0, or QEMU in Fedora 27
> provide an EPYC CPU, nor a EPYC-IBPB CPU, so I'm not sure how you can be
> using them with standard Fedora packages.
>
> Have you installed newer libvirt/qemu by chance ?

Very sorry about that. The server is newly installed and *FC28*. libvirt/qemu are the vanilla fc28 packages.

(My everyday working-horse guest system running on the server is still fc27, must be the reason for my confusion.)

(In reply to Jon Masters from comment #3)
> Can you confirm a baseline EPYC without the -IBPB variant is booting ok?

Quite the opposite. Configuring EPYC instead of EPYC-IBPB was the first thing I tried to work around the problem. Doesn't work with EPYC either.

Michael, can you check your host dmesg when the guest crashes? Also, can you try to load the "kvm" module with "ignore_msrs=1"? (See this thread on vfio-users: <https://www.redhat.com/archives/vfio-users/2018-May/msg00004.html>.) Thanks.

(In reply to Michael Lipp from comment #4)
> (In reply to Daniel Berrange from comment #2)
> > Are you sure that is correct - neither libvirt 3.7.0, or QEMU in Fedora 27
> > provide an EPYC CPU, nor a EPYC-IBPB CPU, so I'm not sure how you can be
> > using them with standard Fedora packages.
> >
> > Have you installed newer libvirt/qemu by chance ?
>
> Very sorry about that. The server is newly installed and *FC28*.
> libvirt/qemu are the vanilla fc28 packages.
>
> (My everyday working-horse guest system running on the server is still fc27,
> must be the reason for my confusion.)

No worries, that's in fact good !

As Laszlo asks, could you check if you see anything in dmesg / systemd journal that is related to KVM, and/or any warnings in /var/log/libvirt/qemu/$GUEST.log when you get the crashed guest.

I've just built updates for Fedora 28 that provide the new virt-ssbd feature flag:

https://bodhi.fedoraproject.org/updates/qemu-2.11.1-3.fc28
https://bodhi.fedoraproject.org/updates/libvirt-4.1.0-3.fc28

If you install those and ensure you're running kernel >= 4.16.10-301 then you should be able to add the virt-ssbd feature flag to your guest XML CPU config.
It shouldn't require any microcode changes to use virt-ssbd.

Assuming our hypothesis is correct, virt-ssbd feature should fix the guest, but the ignore_msrs=1 suggestion may well also fix it. Would be good if you are able to confirm both.

1) Nothing special in the logs (just the usual startup messages). Actually, there is no "crash" of the guest. The guest enters a reboot loop with the BSOD, which it performs very reliably and without any messages showing up in the journal.

2) "kvm" module with "ignore_msrs=1" fixes things.

3) I've downloaded the new RPMs and updated all packages that had been installed:

    # rpm -qa | fgrep qemu-
    ipxe-roms-qemu-20170710-3.git0600d3ae.fc28.noarch
    qemu-block-nfs-2.11.1-3.fc28.x86_64
    qemu-system-x86-core-2.11.1-3.fc28.x86_64
    qemu-kvm-2.11.1-3.fc28.x86_64
    qemu-block-iscsi-2.11.1-3.fc28.x86_64
    qemu-block-rbd-2.11.1-3.fc28.x86_64
    qemu-img-2.11.1-3.fc28.x86_64
    qemu-common-2.11.1-3.fc28.x86_64
    qemu-block-curl-2.11.1-3.fc28.x86_64
    qemu-block-gluster-2.11.1-3.fc28.x86_64
    qemu-block-ssh-2.11.1-3.fc28.x86_64
    qemu-system-x86-2.11.1-3.fc28.x86_64
    libvirt-daemon-driver-qemu-4.1.0-3.fc28.x86_64
    qemu-block-dmg-2.11.1-3.fc28.x86_64
    # rpm -qa | fgrep libvirt-
    libvirt-daemon-4.1.0-3.fc28.x86_64
    libvirt-daemon-config-nwfilter-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-storage-mpath-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-libxl-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-nodedev-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-storage-logical-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-xen-4.1.0-3.fc28.x86_64
    libvirt-devel-4.1.0-3.fc28.x86_64
    libvirt-glib-1.0.0-5.fc28.x86_64
    libvirt-daemon-driver-network-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-secret-4.1.0-3.fc28.x86_64
    libvirt-daemon-config-network-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-storage-sheepdog-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-storage-rbd-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-vbox-4.1.0-3.fc28.x86_64
    libvirt-client-4.1.0-3.fc28.x86_64
    python2-libvirt-4.1.0-1.fc28.x86_64
    libvirt-libs-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-nwfilter-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-interface-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-storage-gluster-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-storage-disk-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-storage-iscsi-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-uml-4.1.0-3.fc28.x86_64
    libvirt-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-qemu-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-storage-scsi-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-storage-4.1.0-3.fc28.x86_64
    libvirt-daemon-kvm-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-storage-core-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-lxc-4.1.0-3.fc28.x86_64
    libvirt-daemon-driver-storage-zfs-4.1.0-3.fc28.x86_64
    libvirt-bash-completion-4.1.0-3.fc28.x86_64

I restarted the system to make sure that everything is "in place". Then I've changed the configuration to:

    <cpu mode='custom' match='exact' check='partial'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <topology sockets='1' cores='4' threads='1'/>
      <feature policy='require' name='virt-ssbd'/>
    </cpu>

This does NOT fix the problem.

Hello Michael,

your results are fully consistent with my own test results. The issue seems to be that the Windows guest attempts to read the MSR (Model Specific Register) C001_102C. From a KVM trace I captured:

    CPU-14681 [017] 2685.221364: kvm_msr: msr_read c001102c = 0x0 (#GP)

Note the "#GP". According to the latest publicly available AMD documentation [*], there is no such MSR. So this looks like a Windows bug to me. (Or, maybe a KVM bug that misleads Windows to read this MSR? I'm unsure.)

[*] See AMD publication "Preliminary Processor Programming Reference (PPR) for AMD Family 17h Models 00h-0Fh Processors" <https://developer.amd.com/resources/developer-guides-manuals/>, section "2.1.12.5 MSRs - MSRC001_1xxx".
The last MSR before C001_102C is C001_1027 (DR0_ADDR_MASK, [Address Mask For DR0 Breakpoints]), while the first MSR after C001_102C is C001_1030 (IBS_FETCH_CTL, [IBS Fetch Control]).

There's one match on Google for "MSRc001102c":

http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5-20120209-mtrr.txt

So it looks like this MSR exists and is readable on some hosts; we can ask AMD for help figuring out what it's supposed to contain.

Should we ping AMD about this?

(In reply to Jon Masters from comment #3)
> Can you confirm a baseline EPYC without the -IBPB variant is booting ok?

I can confirm it is not.

(In reply to Daniel Berrange from comment #7)
> (In reply to Michael Lipp from comment #4)
> > (In reply to Daniel Berrange from comment #2)
> > > Are you sure that is correct - neither libvirt 3.7.0, or QEMU in Fedora 27
> > > provide an EPYC CPU, nor a EPYC-IBPB CPU, so I'm not sure how you can be
> > > using them with standard Fedora packages.
> > >
> > > Have you installed newer libvirt/qemu by chance ?
> >
> > Very sorry about that. The server is newly installed and *FC28*.
> > libvirt/qemu are the vanilla fc28 packages.
> >
> > (My everyday working-horse guest system running on the server is still fc27,
> > must be the reason for my confusion.)
>
> No worries, that's in fact good !
>
> As Laszlo asks, could you check if you see anything in dmesg / systemd
> journal that is related to KVM, and/or any warnings in
> /var/log/libvirt/qemu/$GUEST.log when you get the crashed guest.
>
> I've just built updates for Fedora 28 that provide the new virt-ssbd feature
> flag
>
> https://bodhi.fedoraproject.org/updates/qemu-2.11.1-3.fc28
> https://bodhi.fedoraproject.org/updates/libvirt-4.1.0-3.fc28
>
> If you install those and ensure you're running kernel >= 4.16.10-301 then
> you should be able to add the virt-ssbd feature flag to your guest XML CPU
> config. It shouldn't require any microcode changes to use virt-ssbd.
>
> Assuming our hypothesis is correct, virt-ssbd feature should fix the guest,
> but the ignore_msrs=1 suggestion may well also fix it. Would be good if you
> are able to confirm both.

I can confirm that on a Threadripper 1920X the ignore_msrs workaround doesn't work. The system hangs, then BSODs with a watchdog timeout.

AMD has confirmed this MSR access is acceptable, albeit unexpected, so the KVM kernel module needs enhancing to handle this MSR (probably by ignoring writes to it).

*** Bug 1615160 has been marked as a duplicate of this bug. ***

Has there been any motion on this?

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.

Fedora 28 has now been rebased to 4.18.10-300.fc28. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 29, and are still experiencing this issue, please change the version to Fedora 29.

If you experience different issues, please open a new bug report for those.
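Comment #7's precondition (kernel >= 4.16.10-301) and the mass-update request to test 4.18.10-300.fc28 or newer are both version comparisons that plain string ordering gets wrong: as strings, "4.16.9" sorts after "4.16.10". A minimal Python sketch, assuming Fedora-style `uname -r` strings of the form `<upstream>-<build>.<dist>.<arch>`. `parse_release` and `at_least` are hypothetical helper names, and the comparison is deliberately simpler than rpm's own version comparison: it ignores the dist tag, the arch, and any rc/git suffix after the build number.

```python
import re

# Matches "<upstream>-<build>..." e.g. "4.16.10-301.fc28.x86_64":
# group 1 = "4.16.10", group 2 = "301". Everything after the build number
# (dist tag, arch, rc/git markers) is ignored by this simplified sketch.
_RELEASE_RE = re.compile(r"^(\d+(?:\.\d+)*)-(\d+)")


def parse_release(release):
    """Turn a Fedora kernel release string into a numerically comparable key."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # Compare version components as integers so that 9 < 10.
    upstream = tuple(int(part) for part in match.group(1).split("."))
    build = int(match.group(2))
    return upstream, build


def at_least(running, minimum):
    """Return True if `running` is the same as or newer than `minimum`."""
    return parse_release(running) >= parse_release(minimum)
```

On the host, the running release can be taken from `uname -r`; in Python, `platform.release()` returns the same string on Linux, so a check could look like `at_least(platform.release(), "4.16.10-301")`.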
Issue still reproduced on:

    $ uname -r
    4.19.0-0.rc6.git4.1.fc30.x86_64

There's a RHEL bug tracking this as well: https://bugzilla.redhat.com/show_bug.cgi?id=1593190

FYI this was fixed upstream in:

    commit 0e1b869fff60c81b510c2d00602d778f8f59dd9a
    Author: Eduardo Habkost <ehabkost>
    Date:   Mon Dec 17 22:34:18 2018 -0200

        kvm: x86: Add AMD's EX_CFG to the list of ignored MSRs

        Some guests OSes (including Windows 10) write to MSR 0xc001102c
        on some cases (possibly while trying to apply a CPU errata).
        Make KVM ignore reads and writes to that MSR, so the guest won't
        crash.

        The MSR is documented as "Execution Unit Configuration (EX_CFG)",
        at AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family
        15h Models 00h-0Fh Processors".

        Cc: stable.org
        Signed-off-by: Eduardo Habkost <ehabkost>
        Signed-off-by: Paolo Bonzini <pbonzini>

which is part of the Linux v4.20 release. This version is already shipped in Fedora updates, so I'm going to mark this as closed now.
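The commit describes the fix in terms of guest-visible behavior: accesses to MSR 0xc001102c are ignored instead of faulting. A toy Python model can make the three configurations discussed in this thread concrete: stock KVM, the `ignore_msrs=1` workaround, and the v4.20 fix. It assumes only the behavior described here (an unhandled MSR access injects #GP, as in the kvm_msr trace; `ignore_msrs=1` makes KVM ignore every unhandled MSR instead). `ToyMsrHandler`, `GeneralProtectionFault` and `faults_on_read` are hypothetical names, not kernel APIs; `MSR_AMD64_EX_CFG` is just a local constant here, and the real change is C code in the kernel's KVM x86 support.

```python
# Toy model of how a hypervisor might answer a guest RDMSR/WRMSR.
# Illustrative only: none of these names are kernel APIs.

MSR_AMD64_EX_CFG = 0xC001102C  # "Execution Unit Configuration (EX_CFG)"


class GeneralProtectionFault(Exception):
    """Stands in for the #GP exception injected into the guest."""


class ToyMsrHandler:
    def __init__(self, ignore_msrs=False, ignored=(), emulated=None):
        # ignore_msrs: models the kvm module parameter (a global switch)
        # ignored: MSRs silently ignored even when ignore_msrs is off
        # emulated: MSRs the toy actually backs with a stored value
        self.ignore_msrs = ignore_msrs
        self.ignored = frozenset(ignored)
        self.emulated = dict(emulated or {})

    def _is_ignored(self, index):
        return self.ignore_msrs or index in self.ignored

    def rdmsr(self, index):
        if index in self.emulated:
            return self.emulated[index]
        if self._is_ignored(index):
            return 0  # ignored MSRs read as zero in this model
        raise GeneralProtectionFault(f"rdmsr {index:#x}")

    def wrmsr(self, index, value):
        if index in self.emulated:
            self.emulated[index] = value
        elif not self._is_ignored(index):
            raise GeneralProtectionFault(f"wrmsr {index:#x}")
        # otherwise: ignored MSR, the written value is dropped


def faults_on_read(handler, index):
    """Return True if reading `index` would deliver #GP to the guest."""
    try:
        handler.rdmsr(index)
    except GeneralProtectionFault:
        return True
    return False
```

For example, `faults_on_read(ToyMsrHandler(), MSR_AMD64_EX_CFG)` is True, while the same call on `ToyMsrHandler(ignored={MSR_AMD64_EX_CFG})` is False. The split mirrors the thread: `ignore_msrs=1` is a global switch that hides every unhandled MSR (and did not help in the Threadripper 1920X report), whereas the upstream fix adds just this one register to the ignored list.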