Bug 1972795

Summary: [kernel] Support virtio-iommu
Product: Red Hat Enterprise Linux 9 Reporter: Eric Auger <eric.auger>
Component: kernel Assignee: Eric Auger <eric.auger>
kernel sub component: Virtualization QA Contact: Yihuang Yu <yihyu>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: medium    
Priority: medium CC: drjones, gshan, hkrzesin, jinzhao, jsnitsel, juzhang, lcapitulino, mst, qzhang, yihyu, zhenyzha
Version: 9.0 Keywords: Triaged
Target Milestone: beta   
Target Release: 9.0 Beta   
Hardware: aarch64   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-5.14.0-0.rc6.46.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-12-03 21:15:19 UTC Type: Feature Request
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1477099    

Description Eric Auger 2021-06-16 16:06:11 UTC
This BZ tracks the enablement of the virtio-iommu driver in RHEL 9 aarch64 guests and the backport of the ACPI VIOT integration for both aarch64 and x86-64. The primary goal is to support it on aarch64, because it is currently the only full vIOMMU solution on ARM that supports vhost and VFIO integration (although its performance may not be good enough for dynamic mappings).

Enablement on x86 needs to be discussed.

Comment 1 Eric Auger 2021-06-28 08:00:07 UTC
[PATCH v5 0/5] Add support for ACPI VIOT 
Applied for v5.14 by Joerg

Comment 5 Yihuang Yu 2021-07-30 08:24:07 UTC
Full testing also requires the QEMU part (bug 1477099), which isn't ready yet, so I compiled an upstream QEMU for the test.

# /home/qemu/build/qemu-system-aarch64 -version
QEMU emulator version 6.0.91 (v6.1.0-rc1-26-g768832575d)

With iommu=none in the machine type and a virtio-iommu-pci device added, the guest starts without any errors:

/home/qemu/build/qemu-system-aarch64 \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -blockdev node-name=file_aavmf_code,driver=file,filename=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_code,driver=raw,read-only=on,file=file_aavmf_code \
    -blockdev node-name=file_aavmf_vars,driver=file,filename=/home/kvm_autotest_root/images/avocado-vt-vm1_rhel900-aarch64-virtio-scsi.qcow2_VARS.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_vars,driver=raw,read-only=off,file=file_aavmf_vars \
    -machine virt,gic-version=host,kernel-irqchip=on,iommu=none,memory-backend=mem-machine_mem,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars,accel=kvm,acpi=off \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device virtio-iommu-pci,bus=pcie-root-port-1,addr=0x0 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-gpu-pci,bus=pcie-root-port-2,addr=0x0,iommu_platform=on \
    -m 14336 \
    -object memory-backend-ram,size=14336M,id=mem-machine_mem  \
    -smp 112,maxcpus=112,cores=56,threads=1,sockets=2  \
    -cpu 'host' \
    -serial unix:'/tmp/serial-serial0',server=on,wait=off \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-3,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-4,addr=0x0,iommu_platform=on \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-aarch64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=6 \
    -device virtio-net-pci,mac=9a:d2:d4:21:90:5f,rombar=0,id=iddMtzGu,netdev=idsFl6t5,bus=pcie-root-port-5,addr=0x0,iommu_platform=on  \
    -netdev tap,id=idsFl6t5,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew \
    -enable-kvm \
    -monitor stdio

# dmesg | grep iommu
[   14.412693] iommu: Default domain type: Translated 
[   28.093230] virtio_iommu virtio0: input address: 64 bits
[   28.095519] virtio_iommu virtio0: page mask: 0xfffffffffffff000
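
A check like the one above can be scripted. The following is a minimal sketch, not part of the original test run: the function name `virtio_iommu_probed` is hypothetical, and it assumes that the "input address" line printed by the driver in the log above is a reliable sign that the device was probed.

```shell
# Succeed if a kernel log read from stdin shows the virtio_iommu driver
# binding to a virtio device (the "input address" line in the log above).
# Hypothetical helper; the matched text is taken from the dmesg output above.
virtio_iommu_probed() {
    grep -Eq 'virtio_iommu virtio[0-9]+: input address'
}
```

In a guest it could be used as `dmesg | virtio_iommu_probed && echo "virtio-iommu probed"`.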


But if the guest is launched with iommu=smmuv3, the guest console stays stuck printing the following log:
[   27.480871] arm_smmu_evtq_thread: 25 callbacks suppressed
[   27.480878] arm-smmu-v3 9050000.smmuv3: event 0x02 received:
[   27.483617] arm-smmu-v3 9050000.smmuv3:  0x0000050000000002
[   27.484984] arm-smmu-v3 9050000.smmuv3:  0x0000000000000000
[   27.486350] arm-smmu-v3 9050000.smmuv3:  0x0000000000000000
[   27.487719] arm-smmu-v3 9050000.smmuv3:  0x0000000000000000
[   27.489094] arm-smmu-v3 9050000.smmuv3: event 0x02 received:
[   27.490540] arm-smmu-v3 9050000.smmuv3:  0x0000050000000002
[   27.492085] arm-smmu-v3 9050000.smmuv3:  0x0000000000000000
[   27.494638] arm-smmu-v3 9050000.smmuv3:  0x0000000000000000
[   27.496927] arm-smmu-v3 9050000.smmuv3:  0x0000000000000000
[   27.498309] arm-smmu-v3 9050000.smmuv3: event 0x02 received:
[   27.499710] arm-smmu-v3 9050000.smmuv3:  0x0000050000000002
[   27.501146] arm-smmu-v3 9050000.smmuv3:  0x0000000000000000
[   27.502531] arm-smmu-v3 9050000.smmuv3:  0x0000000000000000
[   27.503899] arm-smmu-v3 9050000.smmuv3:  0x0000000000000000
[   27.505276] arm-smmu-v3 9050000.smmuv3: event 0x02 received:
[   27.506668] arm-smmu-v3 9050000.smmuv3:  0x0000050000000002
[   27.508038] arm-smmu-v3 9050000.smmuv3:  0x0000000000000000
[   27.509425] arm-smmu-v3 9050000.smmuv3:  0x0000000000000000
[   27.510841] arm-smmu-v3 9050000.smmuv3:  0x0000000000000000

and the following hung-task warning:

[  259.243252] INFO: task systemd-udevd:1178 blocked for more than 122 seconds.
[  259.245728]       Tainted: G               X --------- ---  5.14.0-0.rc3.29.el9.aarch64 #1
[  259.248488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  259.251179] task:systemd-udevd   state:D stack:    0 pid: 1178 ppid:  1122 flags:0x00000a09
[  259.253980] Call trace:
[  259.255335]  __switch_to+0xd8/0x114
[  259.256926]  __schedule+0x1f0/0x50c
[  259.258499]  schedule+0x4c/0xd0
[  259.259988]  async_synchronize_cookie_domain+0xe0/0x14c
[  259.262062]  async_synchronize_full+0x24/0x30
[  259.263860]  do_init_module+0x210/0x280
[  259.265535]  load_module+0x9b0/0xb40
[  259.267126]  __do_sys_init_module+0xe8/0x160
[  259.268881]  __arm64_sys_init_module+0x28/0x34
[  259.270702]  invoke_syscall.constprop.0+0x58/0xf0
[  259.272560]  el0_svc_common.constprop.0+0x160/0x164
[  259.274455]  do_el0_svc+0x34/0xcc
[  259.275974]  el0_svc+0x2c/0x90
[  259.277405]  el0t_64_sync_handler+0xa4/0x130
[  259.279130]  el0t_64_sync+0x198/0x19c
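
The repeating `event 0x02` records in the smmuv3 log above can be decoded by hand, which may help narrow down which device triggers them. The sketch below is an illustration under stated assumptions, not something from the original report: it assumes the first 64-bit word of an SMMUv3 event record carries the event ID in bits [7:0] and the StreamID in bits [63:32], and that on this machine the StreamID equals the PCI requester ID (bus:device.function). The function name `decode_smmu_event` is hypothetical.

```shell
# Decode the first 64-bit word of an SMMUv3 event record.
# Assumed layout: bits [7:0] = event ID, bits [63:32] = StreamID.
decode_smmu_event() {
    event_id=$(( $1 & 0xff ))
    stream_id=$(( ($1 >> 32) & 0xffffffff ))
    # Assumption: the StreamID equals the PCI requester ID (bus:dev.fn).
    bus=$(( (stream_id >> 8) & 0xff ))
    dev=$(( (stream_id >> 3) & 0x1f ))
    fn=$(( stream_id & 0x7 ))
    printf 'event=0x%02x sid=0x%x bdf=%02x:%02x.%x\n' \
        "$event_id" "$stream_id" "$bus" "$dev" "$fn"
}
```

Calling `decode_smmu_event 0x0000050000000002` prints `event=0x02 sid=0x500 bdf=05:00.0`. As far as I understand the SMMUv3 architecture, event 0x02 is C_BAD_STREAMID, meaning the SMMU saw a transaction from a StreamID it has no valid configuration for. Which device that maps to in this guest would need to be confirmed, for example with `lspci`.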

Comment 6 Yihuang Yu 2021-07-30 08:29:29 UTC
Hi Eric,

Can you help review comment 5? I am not sure whether I missed something or whether upstream QEMU is simply not ready yet.
Are basic iommu tests sufficient to verify this kernel bug?

Thank you in advance
Yihuang Yu

Comment 9 Eric Auger 2021-08-02 16:21:23 UTC
Reverting the state to ASSIGNED, as the CONFIG should be set to "y" instead of "m".
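
One way to check how the option ends up in a given kernel build is to read its config file. This is a hedged sketch: the comment above only says "the CONFIG", so `CONFIG_VIRTIO_IOMMU` is an assumed symbol name, and the helper `kconfig_state` is hypothetical.

```shell
# Report how a Kconfig symbol is set in a kernel config read from stdin.
# CONFIG_VIRTIO_IOMMU is an assumed default; the comment above does not
# name the symbol explicitly.
kconfig_state() {
    symbol=${1:-CONFIG_VIRTIO_IOMMU}
    # Keep only the value of a "SYMBOL=y" or "SYMBOL=m" line, if any.
    state=$(sed -n "s/^${symbol}=\([ym]\)\$/\1/p" | head -n 1)
    case $state in
        y) echo built-in ;;
        m) echo module ;;
        *) echo unset ;;
    esac
}
```

On an installed kernel this could be run as `kconfig_state < "/boot/config-$(uname -r)"`, assuming the distribution ships the config file at that path.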

Comment 16 Yihuang Yu 2021-08-27 01:42:58 UTC
Based on comment 14, I will move the bug to VERIFIED. More test cases will be run once QEMU support is available.

Hello Eric,
Please help to test your scenario as well, thanks.