Bug 1575526

Summary: vSMMUv3 Support
Product: Red Hat Enterprise Linux Advanced Virtualization
Reporter: Eric Auger <eric.auger>
Component: libvirt
Assignee: Andrea Bolognani <abologna>
Status: CLOSED CURRENTRELEASE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: medium
Version: 8.0
CC: abologna, drjones, dyuan, hpopal, jdenemar, jsuchane, knoel, lcapitulino, mtessun, mzhan, ptoscano, wei, xuzhang
Target Milestone: rc
Keywords: FutureFeature, OtherQA
Target Release: 8.0
Hardware: aarch64
OS: Linux
Fixed In Version: libvirt-5.5.0-1.el8
Type: Feature Request
Last Closed: 2020-02-11 11:01:01 UTC
Bug Depends On: 1430408, 1713735    
Bug Blocks: 1543699, 1677408    

Description Eric Auger 2018-05-07 08:09:24 UTC
Virtual IOMMU support for the ARM virt machine type is on the horizon. On ARM the emulated IOMMU is the SMMUv3 (more background can be found at http://events17.linuxfoundation.org/sites/events/files/slides/viommu_arm.pdf).

The SMMUv3 gets instantiated using a QEMU machine option:
-machine virt,iommu=smmuv3
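
For illustration, a minimal invocation could look roughly like this (a sketch only; the disk image, memory size and GIC version are placeholders, the relevant part is the iommu=smmuv3 machine option):

  qemu-system-aarch64 \
      -machine virt,accel=kvm,gic-version=3,iommu=smmuv3 \
      -cpu host -m 2048 \
      -drive file=guest.img,if=virtio \
      -nographic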

The "-device"-based approach used for the Intel IOMMU was rejected by Peter for the SMMUv3.

We would need libvirt to support its instantiation.

Unlike the Intel IOMMU, the SMMUv3 does not support integration with VFIO devices (MAP notifiers are not supported). This means that any attempt to make VFIO devices and the SMMUv3 co-exist should be rejected if possible (i.e. the VFIO device will not function).

Integration with vhost-net should be supported. There is one patch submitted upstream to reach that goal, but it has not been merged yet at the time of this writing.

Comment 17 Andrea Bolognani 2019-05-28 15:33:17 UTC
Patches posted upstream.

  https://www.redhat.com/archives/libvir-list/2019-May/msg00813.html

Comment 20 Andrea Bolognani 2019-06-03 16:54:51 UTC
Basic support for SMMUv3 pushed upstream.

  commit 1462881f4ec80c27e4803098d0a8b3f8a98b0ff0
  Author: Andrea Bolognani <abologna>
  Date:   Tue May 28 14:18:15 2019 +0200

    qemu: Format SMMUv3 IOMMU

    https://bugzilla.redhat.com/show_bug.cgi?id=1575526

    Signed-off-by: Andrea Bolognani <abologna>
    Reviewed-by: Ján Tomko <jtomko>

  v5.4.0-27-g1462881f4e

Right now our validation will only catch very obvious issues, such
as attempting to use SMMUv3 on a non-aarch64, non-virt guest or with
a QEMU binary that doesn't support the feature.

Eric, can you think of additional checks we might want to perform?

One thing that came to my mind while working on this, for example, is
whether SMMUv3 requires GICv3 or can work with GICv2 guests as well.

Note that I'm mostly interested in SMMUv3-specific checks at the
moment: as we've already discussed a few months back, there is a lot
of room for improvement when it comes to validating IOMMU setups in
libvirt that also applies to Intel IOMMU, but I'd like to address
that as part of a separate effort.

Comment 21 Eric Auger 2019-06-03 17:31:08 UTC
SMMUv3 can be instantiated in machvirt in both KVM-accelerated and TCG mode. It is compatible with both GICv2 and GICv3. In KVM-accelerated mode, I usually test it with virtio-net (with and without vhost) and virtio-block PCI devices, making sure that 1) the guest boots, 2) the IOMMU groups are created on the guest side, and 3) the NIC works fine, without producing translation errors, while carrying real network traffic. It would make sense to test with different guest flavors, varying the page size used for DMA mapping. Then I would consider these test variants: hot-plug/hot-unplug of the virtio device, system reboot, save/restore.
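
For reference, those variants map onto standard virsh operations roughly as follows (a sketch; the domain name "guest" and the device XML file are placeholders):

  # hot-plug / hot-unplug of a virtio device
  virsh attach-device guest virtio-net.xml --live
  virsh detach-device guest virtio-net.xml --live

  # system reboot
  virsh reboot guest

  # save / restore
  virsh save guest guest.sav
  virsh restore guest.sav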

Comment 22 Andrea Bolognani 2019-06-05 12:26:05 UTC
(In reply to Eric Auger from comment #21)
> SMMUv3 can be instantiated in machvirt in both kvm accelerated and TCG mode.
> It is compatible with GICv2 and GICv3. In kvm accelerated mode, I use to
> test it with virtio-net (w and w/o vhost) and virtio-block pci devices
> making sure 1) it boots, 2) the groups are created on guest side and 3) the
> NIC works fine without producing translation errors and doing some real
> network traffic. It would make sense to test with different guest flavors,
> varying the page size used for dma mapping. Then I would consider those test
> variants: hot-plug/hot-unplug of the virtio-device, system reboot,
> save/restore.

Cool, it sounds like we don't need to introduce additional
SMMUv3-specific validation then.

Do you want me to prepare a scratch build of libvirt that includes
SMMUv3 support so you can test the scenarios mentioned above?

Comment 23 Eric Auger 2019-06-05 12:38:16 UTC
Yes please. I will test it.

Comment 25 Andrea Bolognani 2019-06-06 10:10:37 UTC
To test the feature, simply add

  <iommu model='smmuv3'/>

to your guest configuration.
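
For example, a minimal fragment could look like this (a sketch; the network interface and its source network are placeholders, and the per-device <driver iommu='on'/> matches the setup described in comment 26 below):

  <devices>
    <iommu model='smmuv3'/>
    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <driver iommu='on'/>
    </interface>
  </devices>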

Comment 26 Eric Auger 2019-06-09 16:58:05 UTC
Scratch build tested along with the 8.1 machine type on a QC Amberwing host with a RHEL 8 guest. After adding the XML element <iommu model='smmuv3'/> to the <devices> section, plus <driver iommu='on'/> for the NIC, the SCSI controller and the virtio-serial controller, I can see the following on the QEMU command line:

-machine virt-rhel8.1.0,accel=kvm,usb=off,dump-guest-core=off,gic-version=3,iommu=smmuv3
-device virtio-scsi-pci,iommu_platform=on,id=scsi0,bus=pci.3,addr=0x0
-device virtio-serial-pci,id=virtio-serial0,iommu_platform=on,bus=pci.4,addr=0x0
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:62:88,bus=pci.1,addr=0x0,iommu_platform=on

Guest boots properly and the NIC works fine.

Just a question: with standalone upstream QEMU I usually add the following extra options to my virtio-pci devices: ",iommu_platform,disable-modern=off,disable-legacy=on". I guess that downstream "disable-modern=off,disable-legacy=on" is not needed because it is the default?

Played with save/restore, reboot, ... Did not notice any issue.

Thanks

Eric

Comment 27 Andrea Bolognani 2019-06-10 07:40:02 UTC
(In reply to Eric Auger from comment #26)
> scratch build tested along with 8.1 machine type on QC Amberwing with a
> Rhel8 guest. After adding the xml elements:
> 
> <iommu model='smmuv3'/> in the devices section + <driver iommu='on'/> for
> the NIC, SCSI controller, virtio-serial I can see in the QEMU cmd line:
> 
> -machine
> virt-rhel8.1.0,accel=kvm,usb=off,dump-guest-core=off,gic-version=3,
> iommu=smmuv3
> -device virtio-scsi-pci,iommu_platform=on,id=scsi0,bus=pci.3,addr=0x0
> -device
> virtio-serial-pci,id=virtio-serial0,iommu_platform=on,bus=pci.4,addr=0x0
> -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:62:88,bus=pci.1,
> addr=0x0,iommu_platform=on
> 
> Guest boots properly and the NIC works fine.
> 
> Played with save/restore, reboot, ... Did not notice any issue.

Neat!

> Just a question, with stdalone upstream qemu I use to utilize the following
> add-on options for my virtio-pci devices.
> 
> ",iommu_platform,disable-modern=off,disable-legacy=on". I guess on
> downstream "disable-modern=off,disable-legacy=on" are not requested as
> defaults?

That's not an upstream vs downstream difference, it's just that
libvirt will (unless instructed otherwise) always put VirtIO devices
in PCI Express slots, which in turn results in them showing up as
non-transitional.

If you have a very recent libvirt (upstream 5.2.0 or one where the
feature has been backported as part of Bug 1614127) you can even tell
it explicitly that you want VirtIO devices to be non-transitional by
using model='virtio-non-transitional' instead of model='virtio' (see
https://libvirt.org/formatdomain.html#elementsVirtioTransitional for
more documentation), which will result in the very same options as
you have on your command line being used. But again, you shouldn't
even need to do that :)
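
For a network interface that would look something like this (a sketch; the source network is a placeholder, and the <driver iommu='on'/> matches the setup from comment 26):

  <interface type='network'>
    <source network='default'/>
    <model type='virtio-non-transitional'/>
    <driver iommu='on'/>
  </interface>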

Comment 28 Eric Auger 2019-06-10 16:09:42 UTC
I just tested with a host PCI device assigned to the guest. This is not supposed to work, since the SMMUv3 is not currently integrated with VFIO.
As expected, I can find the following in the QEMU log:

-device vfio-pci,host=0004:01:00.0,id=hostdev0,bus=pci.1,addr=0x0:
warning: SMMUv3 does not support notification on MAP: device vfio-pci
will not function properly

However, the guest does not boot: virsh presents the VM as "paused". I checked at the QEMU level and the same warning is emitted there, with the guest similarly stalled. I identified the issue: memory_region_iommu_replay() gets called by VFIO and the default implementation is used, which loops over the whole guest address range and calls translate() on each page, freezing the guest. I will propose a dummy custom implementation of the replay callback for the SMMUv3 IOMMU memory region, which should fix the issue.
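
For reference, the assigned device corresponds to a <hostdev> fragment along these lines (a sketch reconstructed from the host PCI address in the log above):

  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0004' bus='0x01' slot='0x00' function='0x0'/>
    </source>
  </hostdev>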

Comment 29 Eric Auger 2019-06-11 14:32:38 UTC
Regarding the issue reported in Comment 28, I sent "[PATCH 0/2] ARM SMMUv3: Fix spurious notification errors and stall with vfio-pci" upstream.

Comment 30 Andrea Bolognani 2019-07-23 07:28:41 UTC
(In reply to Eric Auger from comment #29)
> wrt issue reported in Comment 28, Sent "[PATCH 0/2] ARM SMMUv3: Fix spurious
> notification errors and stall with vfio-pci" upstream.

Eric,

what's the status of that series? Do you have a separate bug tracking
it?

Do you think that, based on your previous testing, we can mark this
bug as MODIFIED since the libvirt part was AFAICT working as expected
and the only issue you encountered was in QEMU?

Comment 31 Eric Auger 2019-07-23 07:50:55 UTC
About the status of the series, I answered separately in the associated BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1693193#c3

Yes, I think we can change the status to POST, as per the discussion in https://bugzilla.redhat.com/show_bug.cgi?id=1713735#c11. The stall with VFIO was the only issue I discovered during testing, and we can simply document that it is not supported at the moment. The spurious notifications were discovered in another development context.

Comment 35 Min Zhan 2019-12-10 02:43:53 UTC
Moving to Verified based on Eric's testing and comments.