Bug 1932279
Summary: | [aarch64] qemu core dumped when using smmuv3 and iommu_platform enabling at virtio-gpu-pci | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Yihuang Yu <yihyu> |
Component: | qemu-kvm | Assignee: | Eric Auger <eric.auger> |
qemu-kvm sub component: | Devices | QA Contact: | Yihuang Yu <yihyu> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | ||
Priority: | low | CC: | drjones, eric.auger, jinzhao, juzhang, lcapitulino, qzhang, virt-maint, zhenyzha |
Version: | 8.4 | Keywords: | Reopened, Triaged |
Target Milestone: | rc | ||
Target Release: | 8.5 | ||
Hardware: | aarch64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-6.0.0-22.module+el8.5.0+11695+95588379 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-11-16 07:51:47 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1885765 |
Description
Yihuang Yu
2021-02-24 12:39:51 UTC
Hi Yihuang,

On my side, just adding "-device virtio-gpu-pci,bus=pcie.1,addr=0x0,iommu_platform=on" to my usual command line, I get the following qemu warnings, which show that something is going wrong:

smmuv3-iommu-memory-region-0-0 translation failed for iova=0xfab10000(no recorded event)
smmuv3-iommu-memory-region-0-0 translation failed for iova=0xfab10000(no recorded event)
smmuv3-iommu-memory-region-0-0 translation failed for iova=0xfab11000(no recorded event)
qemu-system-aarch64: virtio: bogus descriptor or out of resources

This is with an upstream qemu. After applying the in-flight series "[PATCH v2 0/7] Some vIOMMU fixes" I don't get those traces anymore. Hope this will fix the issue. Do you have means to test with an upstream qemu including that series?
https://www.mail-archive.com/qemu-devel@nongnu.org/msg785650.html

Otherwise we will need to wait for 8.5

Thanks

Eric

Hum, it seems that they are different problems. Eric, I cloned your qemu repository and rebased it onto the latest upstream code, but qemu still crashes.
Steps:
git clone https://github.com/eauger/qemu.git
cd qemu/
git remote add upstream https://git.qemu.org/git/qemu.git
git checkout viommu_fixes_for_6-v2
git pull upstream master --rebase
./configure --target-list=aarch64-softmmu
make -j 128

MALLOC_PERTURB_=1 /home/qemu/build/qemu-system-aarch64 \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -blockdev node-name=file_aavmf_code,driver=file,filename=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_code,driver=raw,read-only=on,file=file_aavmf_code \
    -blockdev node-name=file_aavmf_vars,driver=file,filename=/var/lib/avocado/data/avocado-vt/images/rhel840-aarch64-virtio-scsi.qcow2_VARS.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_vars,driver=raw,read-only=off,file=file_aavmf_vars \
    -machine virt,gic-version=host,iommu=smmuv3,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device virtio-gpu-pci,bus=pcie-root-port-1,addr=0x0,iommu_platform=on \
    -m 8192 \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
    -cpu 'host' \
    -serial unix:'/var/tmp/serial-serial0',server,nowait \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel840-aarch64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -enable-kvm \

qemu-system-aarch64: ../util/iov.c:59: iov_to_buf_full: Assertion `offset == 0' failed.
Aborted (core dumped)

Hi Yihuang,

I have identified another issue related to SMMU range invalidation emulation using new avocado-qemu tests. I sent a fix yesterday:
[PATCH] hw/arm/smmuv3: Another range invalidation fix.
Would you have means/time to check if this patch fixes the observed issue? This might be related.

Thank you in advance

Best Regards

Eric

(In reply to Eric Auger from comment #3)
> Hi Yihuang,
>
> I have identified another issue related to SMMU range invalidation emulation
> using new avocado-qemu tests. I sent a fix yesterday:
> [PATCH] hw/arm/smmuv3: Another range invalidation fix.
> Would you have means/time to check if this patch fixes the observed issue?
> This might be related.
>
> Thank you in advance
>
> Best Regards
>
> Eric

Thank you, Eric
I will try to build an upstream qemu with your patch and test again. Will update the test result later.
Yihuang

Update test result with the patch "hw/arm/smmuv3: Another range invalidation fix"

Steps:
git clone https://gitlab.com/qemu-project/qemu.git -b v6.0.0-rc5
cd qemu/
wget https://patchwork.ozlabs.org/project/qemu-devel/patch/20210421172910.11832-1-eric.auger@redhat.com/raw/ -O hw-arm-smmuv3-Another-range-invalidation-fix.diff
git apply hw-arm-smmuv3-Another-range-invalidation-fix.diff
./configure --target-list=aarch64-softmmu
make -j 64

MALLOC_PERTURB_=1 /home/qemu/build/qemu-system-aarch64 \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -blockdev node-name=file_aavmf_code,driver=file,filename=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_code,driver=raw,read-only=on,file=file_aavmf_code \
    -blockdev node-name=file_aavmf_vars,driver=file,filename=/home/kvm_autotest_root/images/avocado-vt-vm1_rhel840-aarch64-virtio-scsi.qcow2_VARS.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_vars,driver=raw,read-only=off,file=file_aavmf_vars \
    -machine virt,gic-version=host,iommu=smmuv3,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device virtio-gpu-pci,bus=pcie-root-port-1,addr=0x0,iommu_platform=on \
    -m 8192 \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
    -cpu 'host' \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0,iommu_platform=on \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel840-aarch64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -enable-kvm \
    -serial stdio
......
......
[ 4.260930] [drm] pci: virtio-gpu-pci detected at 0000:03:00.0
[ 4.262634] [drm] features: -virgl +edid
[ 4.264716] [drm] number of scanouts: 1
[ 4.265828] [drm] number of cap sets: 0
[ 4.267576] [drm] Initialized virtio_gpu 0.1.0 0 for virtio0 on minor 0
[ 4.270192] random: fast init done
qemu-system-aarch64: ../util/iov.c:59: iov_to_buf_full: Assertion `offset == 0' failed.
Aborted (core dumped)

Hum OK. I will try to reproduce again. Thanks!

Using my qemu command line I cannot reproduce at the moment. Can you share the avocado test you are running? I will try to test at libvirt level instead.

I can reproduce with libvirt by just keeping the SMMU on the virtio-gpu-pci and removing it on virtio-scsi-pci and virtio-net-pci.
qemu-system-aarch64: ../util/iov.c:59: iov_to_buf_full: Assertion `offset == 0' failed.
2021-04-27 09:27:39.330+0000: shutting down, reason=crashed

So it looks like the problem comes from a QEMU virtio-gpu-pci device bug. The latter does not work with virtio-iommu either. Since virtio-iommu is more debug friendly (as all maps are easily seen), here is what is observed with virtio-iommu with the current code:

A QEMU failure happens in virtio_gpu_create_mapping_iov at

    (*iov)[i].iov_base = dma_memory_map(VIRTIO_DEVICE(g)->dma_as,
                                        a, &len, DMA_DIRECTION_TO_DEVICE);
    if (!(*iov)[i].iov_base || len != l) {
        qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to map MMIO memory for"

len != l is the reason.
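The `len != l` condition can be illustrated with a toy model. This is a minimal sketch in Python, not QEMU code: it assumes a mapping call that stops at the first physically discontiguous page and reports how many bytes it actually covered. The page table contents, the addresses, and the helpers `map_contiguous`, `map_all_or_nothing`, and `map_in_chunks` are all hypothetical names introduced for illustration only.

```python
PAGE = 0x1000  # toy page size; all addresses below are page aligned

# Hypothetical IOVA -> PA translations: three consecutive IOVA pages,
# where the third one is backed by a physically discontiguous page.
page_table = {
    0x10000: 0x40000,
    0x11000: 0x41000,
    0x12000: 0x80000,
}


def map_contiguous(table, iova, length):
    """Map as much of [iova, iova + length) as is physically contiguous.

    Returns (pa, mapped_len). mapped_len may be shorter than length,
    which is the situation the `len != l` check detects.
    """
    base = table.get(iova)
    if base is None:
        return None, 0
    done = PAGE
    while done < length and table.get(iova + done) == base + done:
        done += PAGE
    return base, min(done, length)


def map_all_or_nothing(table, iova, length):
    """Treat any partial mapping as a failure."""
    pa, mapped = map_contiguous(table, iova, length)
    if pa is None or mapped != length:
        return None
    return [(pa, mapped)]


def map_in_chunks(table, iova, length):
    """Keep mapping the remainder until the whole range is covered."""
    chunks = []
    while length > 0:
        pa, mapped = map_contiguous(table, iova, length)
        if mapped == 0:
            raise ValueError("unmapped IOVA 0x%x" % iova)
        chunks.append((pa, mapped))
        iova += mapped
        length -= mapped
    return chunks


# One request covering all three IOVA pages.
partial = map_contiguous(page_table, 0x10000, 0x3000)
strict = map_all_or_nothing(page_table, 0x10000, 0x3000)
pieces = map_in_chunks(page_table, 0x10000, 0x3000)
```

In this model `partial` covers only the first two pages, so the all-or-nothing variant rejects the request, while the chunked variant returns one entry per physically contiguous piece. The chunked loop is only one possible way to cope with such a range and is not claimed to match the actual QEMU change.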
What is observed on the guest, in drm/virtio/virtgpu_object.c:

virtio_gpu_object_create
virtio_gpu_object_shmem_init / use_dma_api
dma_map_sgtable() succeeds

dma_map_sgtable
iommu_dma_map_sg (drivers/iommu/dma-iommu.c)
iommu_dma_alloc_iova
iommu_map_sg_atomic
__iommu_map_sg
__finalise_sg

All the vIOMMU mappings are OK. __finalise_sg reshuffles the SGs and possibly concatenates them even if they point to discontiguous GPAs. When virtio_gpu_object_attach gets called, it uses this concatenated sglist. On the QEMU side, the observed IOVAs and lengths are correct.

The problem shows up in dma_memory_map, which takes the IOVA and the concatenated length.

dma_memory_map
address_space_map (softmmu/physmem.c)
flatview_translate : OK
flatview_extend_translation : NOK

All the vIOMMU translations seem OK. flatview_extend_translation fails because of the guest SG concatenation: at some point in the loop the condition (xlat != base + done) becomes true. This is due to the fact that

SG #0: IOVA0, PA0, L0
SG #1: IOVA0 + LENGTH, PA1 (with PA1 != PA0 + L0), L1

were concatenated into a single

SG #0: IOVA0, L0 + L1

spanning a non-contiguous range of PA.

So the fix consists in handling this around the dma_memory_map() call in virtio_gpu_create_mapping_iov(), i.e. deconcatenating the SGs.

Hi Yihuang,

Gerd just sent "[PATCH] virtio-gpu: handle partial maps properly" on the qemu upstream ML. In my case it fixed my issues, both with smmuv3 and virtio-iommu (which also failed). Could you please check whether it fixes the error in your setup? Please apply this fix on top of
[PATCH] hw/arm/smmuv3: Another range invalidation fix

Thank you in advance

Best Regards

Eric

Hi Yihuang,

Hum, maybe not the right time yet to test on your side. I was able to reproduce with fed33. There is still something wrong besides those 2 fixes. I still get
qemu-system-aarch64: ../util/iov.c:59: iov_to_buf_full: Assertion `offset == 0' failed.
when the virtio-gpu-pci is instantiated.
We progress but slowly :-(

Also I added the virtio-gpu-pci to the new smmuv3 avocado-qemu acceptance tests and they don't pass either...

Thanks

Eric

Eric, thank you for adding it to avocado-qemu acceptance. I think 'virtio-gpu-pci' + 'smmuv3' is not a common user scenario, so it’s acceptable that the progress is not so fast.

Sorry for taking so long to reply. I am still working on it with a view to fixing it in 8.5, but given the slow progress let's remove the ITR.

Does not seem to be a bug at qemu level. The issue rather seems to be at guest kernel level in the virtio-gpu driver. I opened a separate BZ on the kernel: BZ 1971821. Closing this bug on RHEL AV qemu-kvm

*** This bug has been marked as a duplicate of bug 1971821 ***

Let's reopen that bug as we need to backport
9049f8bc44 virtio-gpu: handle partial maps properly (6 weeks ago) <Gerd Hoffmann>
to fix those errors
[ 67.330862] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 67.540790] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 70.690482] virtio_gpu_dequeue_ctrl_func: 14 callbacks suppressed

*** Bug 1974807 has been marked as a duplicate of this bug. ***

Host kernel: 4.18.0-320.el8.aarch64
Guest kernel: 4.18.0-322.el8.aarch64
qemu version: qemu-kvm-6.0.0-23.module+el8.5.0+11740+35571f13.aarch64

Steps:
1. Launch a guest:
MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -blockdev node-name=file_aavmf_code,driver=file,filename=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_code,driver=raw,read-only=on,file=file_aavmf_code \
    -blockdev node-name=file_aavmf_vars,driver=file,filename=/home/kvm_autotest_root/images/avocado-vt-vm1_rhel850-aarch64-virtio-scsi.qcow2_VARS.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_vars,driver=raw,read-only=off,file=file_aavmf_vars \
    -machine virt,gic-version=host,iommu=smmuv3,memory-backend=mem-machine_mem,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device virtio-gpu-pci,bus=pcie-root-port-1,addr=0x0,iommu_platform=on \
    -m 41984 \
    -object memory-backend-ram,size=41984M,id=mem-machine_mem \
    -smp 22,maxcpus=22,cores=11,threads=1,sockets=2 \
    -cpu 'host' \
    -chardev socket,server=on,id=qmp_id_qmpmonitor1,wait=off,path=/tmp/avocado__286yk02/monitor-qmpmonitor1-20210709-043744-r6oGdJ9k \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,server=on,id=qmp_id_catch_monitor,wait=off,path=/tmp/avocado__286yk02/monitor-catch_monitor-20210709-043744-r6oGdJ9k \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -serial unix:'/tmp/avocado__286yk02/serial-serial0-20210709-043744-r6oGdJ9k',server=on,wait=off \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel850-aarch64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
    -device virtio-net-pci,mac=9a:9b:68:07:8b:37,rombar=0,id=idwSx0yR,netdev=idtH6vDY,bus=pcie-root-port-4,addr=0x0 \
    -netdev tap,id=idtH6vDY,vhost=on,vhostfd=18,fd=14 \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -enable-kvm \

2. Check the guest's dmesg.
2021-07-09 04:38:12: [ 3.624114] [drm] pci: virtio-gpu-pci detected at 0000:03:00.0
2021-07-09 04:38:12: [ 3.626023] [drm] features: -virgl +edid
2021-07-09 04:38:12: [ 3.628489] [drm] number of scanouts: 1
2021-07-09 04:38:12: [ 3.629739] [drm] number of cap sets: 0
2021-07-09 04:38:12: [ 3.631716] [drm] Initialized virtio_gpu 0.1.0 0 for virtio0 on minor 0
2021-07-09 04:38:13: [ 3.635046] Console: switching to colour frame buffer device 128x48
2021-07-09 04:38:13: [ 3.642515] virtio_gpu virtio0: [drm] fb0: virtio_gpudrmfb frame buffer device

So move the bug status to 'VERIFIED'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:4684