Virtual IOMMU support for the ARM virt machine is looming. On ARM we emulate the SMMUv3 (more background can be found at http://events17.linuxfoundation.org/sites/events/files/slides/viommu_arm.pdf). The SMMUv3 gets instantiated using a QEMU machine option:

-machine virt,iommu=smmuv3

The Intel-style "-device" solution was rejected by Peter. We would need libvirt to support its instantiation.

As opposed to Intel's IOMMU, the SMMUv3 will not support integration with VFIO devices (map notifiers are not supported). This means that any attempt to make VFIO devices and the SMMUv3 coexist should be rejected if possible (i.e. the VFIO device will not function).

Integration with vhost-net should be supported. There is one patch submitted upstream to reach that goal, but it is not yet merged at the time of this writing.
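For reference, the machine option above could be exercised directly with a QEMU command line along these lines. This is only a sketch: the disk image name, memory size and device layout are illustrative assumptions, not taken from this bug.

```shell
# Illustrative invocation only; paths and device choices are assumptions.
qemu-system-aarch64 \
    -machine virt,accel=kvm,gic-version=3,iommu=smmuv3 \
    -cpu host -m 2048 \
    -device virtio-net-pci,netdev=net0,iommu_platform=on \
    -netdev user,id=net0 \
    -drive file=guest.img,if=virtio
```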
Patches posted upstream. https://www.redhat.com/archives/libvir-list/2019-May/msg00813.html
Basic support for SMMUv3 pushed upstream.

commit 1462881f4ec80c27e4803098d0a8b3f8a98b0ff0
Author: Andrea Bolognani <abologna>
Date:   Tue May 28 14:18:15 2019 +0200

    qemu: Format SMMUv3 IOMMU

    https://bugzilla.redhat.com/show_bug.cgi?id=1575526

    Signed-off-by: Andrea Bolognani <abologna>
    Reviewed-by: Ján Tomko <jtomko>

v5.4.0-27-g1462881f4e

Right now our validation will only catch very obvious issues, such as attempting to use SMMUv3 on a non-aarch64, non-virt guest or with a QEMU binary that doesn't support the feature. Eric, can you think of additional checks we might want to perform? One thing that came to my mind while working on this, for example, is whether SMMUv3 requires GICv3 or can work with GICv2 guests as well.

Note that I'm mostly interested in SMMUv3-specific checks at the moment: as we've already discussed a few months back, there is a lot of room for improvement when it comes to validating IOMMU setups in libvirt that also applies to Intel IOMMU, but I'd like to address that as part of a separate effort.
SMMUv3 can be instantiated in machvirt in both KVM-accelerated and TCG mode. It is compatible with GICv2 and GICv3. In KVM-accelerated mode, I usually test it with virtio-net (with and without vhost) and virtio-block PCI devices, making sure 1) the guest boots, 2) the IOMMU groups are created on the guest side and 3) the NIC works fine without producing translation errors while doing some real network traffic. It would make sense to test with different guest flavors, varying the page size used for DMA mapping. Then I would consider these test variants: hot-plug/hot-unplug of the virtio device, system reboot, save/restore.
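The guest-side checks described above could be scripted roughly as follows. These are generic inspection commands run inside the guest; the gateway address is an illustrative assumption.

```shell
# 1) Confirm the guest kernel probed the (emulated) SMMUv3
dmesg | grep -i smmu

# 2) Confirm IOMMU groups were created for the virtio devices
ls /sys/kernel/iommu_groups/

# 3) Exercise the NIC, then check for translation faults
#    (192.168.122.1 is the usual libvirt default-network gateway; adjust as needed)
ping -c 4 192.168.122.1
dmesg | grep -i -E 'smmu.*(fault|error)'
```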
(In reply to Eric Auger from comment #21)
> SMMUv3 can be instantiated in machvirt in both kvm accelerated and TCG mode.
> It is compatible with GICv2 and GICv3. In kvm accelerated mode, I use to
> test it with virtio-net (w and w/o vhost) and virtio-block pci devices
> making sure 1) it boots, 2) the groups are created on guest side and 3) the
> NIC works fine without producing translation errors and doing some real
> network traffic. It would make sense to test with different guest flavors,
> varying the page size used for dma mapping. Then I would consider those test
> variants: hot-plug/hot-unplug of the virtio-device, system reboot,
> save/restore.

Cool, it sounds like we don't need to introduce additional SMMUv3-specific validation then. Do you want me to prepare a scratch build of libvirt that includes SMMUv3 support so you can test the scenarios mentioned above?
Yes please. I will test it.
To test the feature, simply add <iommu model='smmuv3'/> to your guest configuration.
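A minimal domain XML fragment showing where the elements go might look like this. The interface definition is an illustrative example; bus/slot addresses are assigned by libvirt automatically.

```xml
<!-- Illustrative fragment: the <iommu> element goes in <devices>,
     and each device that should sit behind the IOMMU needs
     <driver iommu='on'/> -->
<devices>
  <iommu model='smmuv3'/>
  <interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
    <driver iommu='on'/>
  </interface>
</devices>
```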
Scratch build tested along with the 8.1 machine type on QC Amberwing with a RHEL 8 guest. After adding the XML elements <iommu model='smmuv3'/> in the devices section, plus <driver iommu='on'/> for the NIC, SCSI controller and virtio-serial, I can see in the QEMU command line:

-machine virt-rhel8.1.0,accel=kvm,usb=off,dump-guest-core=off,gic-version=3,iommu=smmuv3
-device virtio-scsi-pci,iommu_platform=on,id=scsi0,bus=pci.3,addr=0x0
-device virtio-serial-pci,id=virtio-serial0,iommu_platform=on,bus=pci.4,addr=0x0
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:62:88,bus=pci.1,addr=0x0,iommu_platform=on

The guest boots properly and the NIC works fine.

Just a question: with standalone upstream QEMU I usually use the following add-on options for my virtio-pci devices: ",iommu_platform,disable-modern=off,disable-legacy=on". I guess on downstream "disable-modern=off,disable-legacy=on" are not required as defaults?

Played with save/restore, reboot, ... Did not notice any issue.

Thanks

Eric
(In reply to Eric Auger from comment #26)
> scratch build tested along with 8.1 machine type on QC Amberwing with a
> Rhel8 guest. After adding the xml elements:
>
> <iommu model='smmuv3'/> in the devices section + <driver iommu='on'/> for
> the NIC, SCSI controller, virtio-serial I can see in the QEMU cmd line:
>
> -machine
> virt-rhel8.1.0,accel=kvm,usb=off,dump-guest-core=off,gic-version=3,
> iommu=smmuv3
> -device virtio-scsi-pci,iommu_platform=on,id=scsi0,bus=pci.3,addr=0x0
> -device
> virtio-serial-pci,id=virtio-serial0,iommu_platform=on,bus=pci.4,addr=0x0
> -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:62:88,bus=pci.1,
> addr=0x0,iommu_platform=on
>
> Guest boots properly and the NIC works fine.
>
> Played with save/restore, reboot, ... Did not notice any issue.

Neat!

> Just a question, with stdalone upstream qemu I use to utilize the following
> add-on options for my virtio-pci devices.
>
> ",iommu_platform,disable-modern=off,disable-legacy=on". I guess on
> downstream "disable-modern=off,disable-legacy=on" are not requested as
> defaults?

That's not an upstream vs downstream difference: it's just that libvirt will (unless instructed otherwise) always put VirtIO devices in PCI Express slots, which in turn results in them showing up as non-transitional.

If you have a very recent libvirt (upstream 5.2.0, or one where the feature has been backported as part of Bug 1614127) you can even tell it explicitly that you want VirtIO devices to be non-transitional by using model='virtio-non-transitional' instead of model='virtio' (see https://libvirt.org/formatdomain.html#elementsVirtioTransitional for more documentation), which will result in the very same options as on your command line being used. But again, you shouldn't even need to do that :)
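For completeness, the explicit non-transitional form mentioned above would look something like this in the domain XML. The interface definition is illustrative; only the model attribute is the point here.

```xml
<!-- Requires libvirt >= 5.2.0, or a build with the Bug 1614127 backport -->
<interface type='network'>
  <source network='default'/>
  <model type='virtio-non-transitional'/>
  <driver iommu='on'/>
</interface>
```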
I just tested with a host PCI device assigned to the guest. This is not supposed to work, as the SMMUv3 is not currently integrated with VFIO. As expected, I can find in the QEMU log:

-device vfio-pci,host=0004:01:00.0,id=hostdev0,bus=pci.1,addr=0x0: warning: SMMUv3 does not support notification on MAP: device vfio-pci will not function properly

However, the guest does not boot properly: virsh presents the VM as "paused" whereas the guest is expected to boot (with the VFIO device simply non-functional). I checked at the QEMU level and the same warning is emitted there; the guest also appears to be stalled.

I identified the issue: memory_region_iommu_replay() gets called by VFIO and the default implementation is used. This loops over the whole guest address range and calls translate() on each page, and as a result the guest freezes. I will attempt to propose a dummy custom implementation for the associated IOMMU replay callback, which should fix the issue.
Regarding the issue reported in comment 28, I sent "[PATCH 0/2] ARM SMMUv3: Fix spurious notification errors and stall with vfio-pci" upstream.
(In reply to Eric Auger from comment #29)
> wrt issue reported in Comment 28, Sent "[PATCH 0/2] ARM SMMUv3: Fix spurious
> notification errors and stall with vfio-pci" upstream.

Eric, what's the status of that series? Do you have a separate bug tracking it? Do you think that, based on your previous testing, we can mark this bug as MODIFIED, since the libvirt part was AFAICT working as expected and the only issue you encountered was in QEMU?
About the status of the series, I answered separately in the associated BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1693193#c3

Yes, I think we can change the status to POST, as per the discussion in https://bugzilla.redhat.com/show_bug.cgi?id=1713735#c11. The stall with VFIO was the only issue I discovered during the testing, and we can simply document that this is not supported at the moment. The spurious notifications were discovered in another development context.
Move to Verified based on Eric's test and comments.