Bug 1979276
| | |
|---|---|
| Summary: | SVM: non atomic memslot updates cause boot failure with seabios and cpu-pm=on |
| Product: | Red Hat Enterprise Linux 9 |
| Component: | qemu-kvm |
| qemu-kvm sub component: | General |
| Reporter: | Maxim Levitsky <mlevitsk> |
| Assignee: | Emanuele Giuseppe Esposito <eesposit> |
| QA Contact: | liunana <nanliu> |
| Status: | CLOSED ERRATA |
| Severity: | unspecified |
| Priority: | low |
| CC: | adam, chayang, eesposit, jinzhao, juzhang, mrezanin, nanliu, nilal, pbonzini, peterx, virt-maint, vkuznets, yuhuang |
| Version: | 9.0 |
| Keywords: | Reopened, Triaged |
| Target Milestone: | beta |
| Hardware: | x86_64 |
| OS: | Linux |
| Fixed In Version: | qemu-kvm-7.2.0-5.el9 |
| Doc Type: | If docs needed, set a value |
| Cloned as: | 2161188 (view as bug list) |
| Last Closed: | 2023-05-09 07:19:27 UTC |
| Type: | Bug |
| Bug Blocks: | 2161188 |
Description
Maxim Levitsky
2021-07-05 12:52:04 UTC
Note that I do consider commit e72436bc3a5206f95bb384e741154166ddb3202e to be correct in principle, but I wonder if we need to do the same thing on VMX as well. Also note that the same VM boots on VMX (although the same issue is likely present there), since VMX lacks the code that was added in commit e72436bc3a5206f95bb384e741154166ddb3202e.

Best regards,
Maxim Levitsky

Hi Maxim,

I can't reproduce the issue. The guest boots up normally with '-overcommit cpu-pm=on' and seabios. The test details are below; would you please help check if anything is missing, thanks!

    qemu-kvm-6.0.0-21.module+el8.5.0+11555+e0ab0d09
    kernel-4.18.0-318.el8.x86_64
    seabios-1.14.0-1.module+el8.5.0+11676+e3d1fd29.x86_64

QEMU cli:

    # /usr/libexec/qemu-kvm \
        -name 'avocado-vt-vm1' \
        -machine q35 \
        -nodefaults \
        -device VGA,bus=pcie.0,addr=0x1 \
        -device pvpanic,ioport=0x505,id=id5SK4co \
        -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
        -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0,addr=0x3 \
        -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel850-64-virtio-scsi.qcow2 \
        -device scsi-hd,id=image1,drive=drive_image1 \
        -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
        -device virtio-net-pci,mac=9a:39:3a:3b:3c:3d,id=idzyzw7g,vectors=4,netdev=idhia6GM,bus=pcie.0-root-port-4,addr=0x0 \
        -netdev tap,id=idhia6GM \
        -m 8192,slots=20,maxmem=80G \
        -smp 64 \
        -cpu 'Cascadelake-Server-noTSX',+kvm_pv_unhalt \
        -vnc :0 \
        -rtc base=utc,clock=host,driftfix=slew \
        -boot order=cdn,once=c,menu=off,strict=off \
        -enable-kvm \
        -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
        -monitor stdio \
        -serial tcp:0:1234,server,nowait \
        -monitor unix:/tmp/monitor3,server,nowait \
        -qmp tcp:0:3333,server,nowait \
        -overcommit cpu-pm=on

As I mentioned, this is only broken on SVM, because commit e72436bc3a5206f95bb384e741154166ddb3202e only added the code for SVM,
although similar code is probably needed for VMX as well. On VMX the same issue exists, but it does not cause the guest to fail to boot: while the memslot is disabled, the affected VCPUs sit in an endless loop of nested page faults, which ends when the memslot is re-installed.

Thanks Maxim. I was able to reproduce on an EPYC host.

    # /usr/libexec/qemu-kvm -overcommit cpu-pm=on -m 8G -smp 8 /home/kvm_autotest_root/images/rhel850-64-virtio-scsi.qcow2 -vnc :0
    KVM internal error. Suberror: 1
    emulation failure
    KVM internal error. Suberror: 1
    emulation failure
    KVM internal error. Suberror: 1
    emulation failure
    KVM internal error. Suberror: 1
    emulation failure
    KVM internal error. Suberror: 1
    emulation failure
    KVM internal error. Suberror: 1
    emulation failure
    KVM internal error. Suberror: 1
    emulation failure
    EAX=000f35fe EBX=00000000 ECX=000002ff EDX=00000006
    ESI=00000000 EDI=00000000 EBP=00000000 ESP=00006d94
    EIP=000fd08d EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ...

We have discussed adding atomic memslot updates with Peter several times. This seems to be a good reason to add it!

Hi Paolo!

I agree with that! However, I would like to hear your opinion on reverting the commit that I found until we have atomic memslot updates, on the grounds that the same thing isn't done on VMX anyway. If you agree to revert it, then I'll send a patch. If you think we should keep this commit, then to be consistent we should do the same on VMX (or move that check to common code), and then we need to fix the code in this commit to translate RIP to a GPA, since otherwise the check it does can be wrong. I can send patches for both things. What do you think?

Best regards,
Maxim Levitsky

Note: we already have https://bugzilla.redhat.com/show_bug.cgi?id=1855298 which seems to describe an issue with the same root cause. @Vitaly Kuznetsov.

Yep, exactly, and it is yet another reason to have atomic memslot updates.
Assigned to Rick for next-level triage per BZ process and age of bug created or assigned to virt-maint without triage.

Bulk update: Move RHEL8 bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Can reproduce this bug on both RHEL 8.6.0 and RHEL 9.0.0 on a Milan host.

Test env:

    qemu-kvm-6.2.0-1.module+el8.6.0+13725+61ae1949
    4.18.0-356.el8.x86_64
    qemu-kvm-6.2.0-1.el9.x86_64
    5.14.0-32.el9.x86_64

Hi all,

I assume we need the new ioctl() that Paolo mentioned, say KVM_SET_USER_MEMORY_REGIONS (one more "S"), and it should simply allow an array of kvm_userspace_memory_region structs? (I used to think that allowing a memslot to contain holes would work, but it seems not if we simply mark memory regions RW/RO and switch between them; that idea would probably add even more complexity without major gain.)

The major (but IMHO not-so-big) challenge, afaict, is that we should allow __kvm_set_memory_region() to apply changes to a temporary kvm_memslots; then the new ioctl() would not call install_new_memslots() on the VM until all the memslot changes have been applied to the temporary copy, so we do the RCU assignment only once, replacing the working set with the temporary one. QEMU needs to teach the memory core to batch updates in memory_region_transaction_commit() when the new kernel capability is detected. But I haven't looked into any real details of it, so I can't tell whether there are other hidden challenges.

Maxim/Vitaly, have you started looking into the details of it yet? Let me know if I can help in any form; I do see that we're piling up use cases for it, so we'll probably need to tackle it ultimately, in any form.

Thanks,
Peter

(In reply to Peter Xu from comment #19)
> Maxim/Vitaly, have you started looking into the details of it yet? Let me
> know if I can help in any form; as I do see that we're piling up use cases
> for it so we'll probably need to tackle it ultimately, in any form..
I didn't do any real work yet. I have a BZ, https://bugzilla.redhat.com/show_bug.cgi?id=1855298, which also needs this. Feel free to pick this up if you have some spare cycles!

@Peter Xu, I agree 100% with everything you said; this is pretty much exactly what I know about this issue. I am not yet aware of any hidden challenges, but as usual there might be some.

Best regards,
Maxim Levitsky

Thanks, Vitaly, Maxim. After trying to remember more: the initial issue I encountered was syncing the dirty bitmap during memslot removal, which is racy if we cannot do it atomically (sync+remove), because the guest can still dirty data after the sync but before the removal.

What needs to be investigated is whether the sync+remove race can cause real guest data loss during migration, i.e. whether there is a real use case of the guest writing valid data during memory layout changes. Since I don't know all the reasons for memory layout changes, I cannot say. I think we need to decide whether we need to fix that as well, which the comment 19 proposal may not be able to solve, afaiu, because sync is not set-user-mem. Any further thoughts on the above from anyone would be welcome. In the meantime, I hope I didn't miss any other relevant problems.

*** Bug 2041007 has been marked as a duplicate of this bug. ***

Still can reproduce this issue on 9.1.

Test env:

    5.14.0-127.el9.x86_64
    qemu-kvm-7.0.0-8.el9.x86_64
    libvirt-client-8.5.0-1.el9.x86_64
    dell-per6515-04.lab.eng.pek2.redhat.com
    Model name: AMD EPYC 7302P 16-Core Processor

@Emanuele: Since you currently work on this bug, I assigned it to you.

I also did some investigation into why we were not able to reproduce it with seabios and cpu-pm=on. So:

- First of all, as I said in comment 2 (and we didn't notice this, I forgot about it), the KVM internal error happens in this case only on AMD, because the patch that started it was AMD-only.
- Second, indeed the patch was reverted, and the revert was somehow accepted without generating too much noise:

      commit 31c25585695abdf03d6160aa6d829e855b256329
      Author: Sean Christopherson <seanjc>
      Date:   Thu Jan 20 01:07:12 2022 +0000

          Revert "KVM: SVM: avoid infinite loop on NPF from bad address"

          Revert a completely broken check on an "invalid" RIP in SVM's workaround
          for the DecodeAssists SMAP errata. kvm_vcpu_gfn_to_memslot() obviously
          expects a gfn, i.e. operates in the guest physical address space, whereas
          RIP is a virtual (not even linear) address. The "fix" worked for the
          problematic KVM selftest because the test identity mapped RIP.

          Fully revert the hack instead of trying to translate RIP to a GPA, as the
          non-SEV case is now handled earlier, and KVM cannot access guest page
          tables to translate RIP.

          This reverts commit e72436bc3a5206f95bb384e741154166ddb3202e.

          Fixes: e72436bc3a52 ("KVM: SVM: avoid infinite loop on NPF from bad address")
          Reported-by: Liam Merwick <liam.merwick>
          Cc: stable.org
          Signed-off-by: Sean Christopherson <seanjc>
          Reviewed-by: Liam Merwick <liam.merwick>
          Message-Id: <20220120010719.711476-3-seanjc>
          Signed-off-by: Paolo Bonzini <pbonzini>

The underlying issue, of course, as we both know, is still there. You might have luck reproducing it with bug https://bugzilla.redhat.com/show_bug.cgi?id=1855298, but for me it looks like it is 'working' as well, so you might have to write a unit test to trigger the issue.

The fix was done in userspace, rather than in KVM.

Upstream patches: https://patchew.org/QEMU/20221111154758.1372674-1-eesposit@redhat.com/

It will probably be merged after the feature freeze, in qemu 8.0. While we wait, @liunana or anyone else, if you want you can change the component/subcomponent and maybe the BZ title (I don't know what the correct category is, so I'll leave it to you).

Thank you,
Emanuele

(In reply to Emanuele Giuseppe Esposito from comment #28)
> The fix was done in userspace, rather than in KVM.
>
> Upstream patches:
> https://patchew.org/QEMU/20221111154758.1372674-1-eesposit@redhat.com/
>
> It will probably be merged after the feature freeze, in qemu 8.0.
>
> While we wait, @liunana or anyone else, if you want you can change the
> component/subcomponent and maybe the BZ title (I don't know what the correct
> category is, so I'll leave it to you).
>
> Thank you,
> Emanuele

Thanks Emanuele. I can reproduce this issue with upstream qemu on an EPYC Milan host:

    # ./qemu-system-x86_64 -overcommit cpu-pm=on -m 8G -smp 8 -monitor stdio -cpu host -machine accel=kvm
    QEMU 7.1.93 monitor - type 'help' for more information
    (qemu) VNC server running on ::1:5900
    KVM internal error. Suberror: 1
    KVM internal error. Suberror: 1
    extra data[0]: 0x0000000000000000
    extra data[1]: 0x0000000000000400
    extra data[2]: 0x0000000100000014
    ......

And I can't reproduce this issue with upstream qemu with your patches on an EPYC Milan host:

    # ./qemu-system-x86_64 -overcommit cpu-pm=on -m 8G -smp 8 -monitor stdio -cpu host -machine accel=kvm
    QEMU 7.1.93 monitor - type 'help' for more information
    (qemu) VNC server running on ::1:5900
    (qemu)

I will change the component of this bug to qemu now. Feel free to change it back if you have different advice, thanks.

Best regards,
Nana Liu

Still waiting for the patches to land upstream.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

QE bot (pre-verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Can reproduce this issue with qemu-kvm-7.2.0-4.el9.x86_64.

Can't reproduce this issue with qemu-kvm-7.2.0-5.el9.x86_64:

    # /usr/libexec/qemu-kvm -overcommit cpu-pm=on -m 8G -smp 8 -monitor stdio -cpu host -machine accel=kvm -M q35
    QEMU 7.2.0 monitor - type 'help' for more information
    (qemu) VNC server running on ::1:5900

Move this bug to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162