Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.

Bug 2151555

Summary: [virtio-iommu] "VFIO_MAP_DMA failed: File exists" errors when VFIO devices are in the same guest iommu group
Product: Red Hat Enterprise Linux 9
Reporter: Eric Auger <eric.auger>
Component: qemu-kvm
qemu-kvm sub component: Devices
Assignee: Virtualization Maintenance <virt-maint>
QA Contact: jinl
Status: CLOSED MIGRATED
Docs Contact:
Severity: medium
Priority: medium
CC: alex.williamson, coli, jinl, jinzhao, juzhang, mst, peterx, virt-maint, yama, yanghliu, zhguo
Version: 9.2
Keywords: MigratedToJIRA, Triaged
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-08-07 14:29:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Eric Auger 2022-12-07 13:53:30 UTC
With the setup described in BZ https://bugzilla.redhat.com/show_bug.cgi?id=2103649

where the 2 assigned PFs end up in the same iommu group in the guest (by mistake in that case due to the bug, but this can legitimately happen if the 2 assigned devices are plugged downstream of a PCIe-to-PCI bridge), we get spurious QEMU errors:

2022-12-07T13:47:03.661326Z qemu-system-x86_64: VFIO_MAP_DMA failed: File exists
2022-12-07T13:47:03.661333Z qemu-system-x86_64: vfio_dma_map(0x556cc5e0acf0, 0xbfbd1000, 0x1000, 0x7f1692853000) = -17 (File exists)

This is due to a bug in the QEMU virtio-iommu device: the replay and remap functions only issue MAP without a prior UNMAP, causing an attempt to DMA-map onto an existing mapping.
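
A quick way to double-check in the guest that the two assigned devices really share an iommu group is to read their iommu_group links in sysfs (a minimal sketch; the PCI addresses and the example output are only placeholders, use the addresses reported by lspci in the guest):

# basename "$(readlink -f /sys/bus/pci/devices/0000:02:01.0/iommu_group)"
25
# basename "$(readlink -f /sys/bus/pci/devices/0000:02:02.0/iommu_group)"
25

If both commands print the same group number, a replay on the VFIO container backing that group can hit the duplicate-mapping path described above.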

Comment 1 Eric Auger 2022-12-07 13:58:23 UTC
Sent "[PATCH for 8.0 0/2] virtio-iommu: Fix Replay" upstream

Comment 2 jinl 2022-12-13 03:52:35 UTC
Tried to reproduce this BZ by assigning 2 devices behind a PCIe-to-PCI bridge.

qemu command
-device '{"driver":"virtio-iommu","id":"iommu0","bus":"pcie.0","addr":"0x4"}' \
-device '{"driver":"pcie-pci-bridge","id":"pci.16","bus":"pci.1","addr":"0x0"}' \
-device '{"driver":"vfio-pci","host":"0000:02:00.0","id":"hostdev0","bus":"pci.16","addr":"0x2"}' \
-device '{"driver":"vfio-pci","host":"0000:00:1a.0","id":"hostdev1","bus":"pci.16","addr":"0x1"}' \
...

check iommu group in vm
#lspci
01:00.0 PCI bridge: Red Hat, Inc. Device 000e
02:01.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 (rev 05)
02:02.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe

#dmesg | grep iommu
[    2.080581] xhci_hcd 0000:04:00.0: Adding to iommu group 0
[    2.917698] pcieport 0000:00:02.0: Adding to iommu group 1
[    2.927124] pcieport 0000:00:02.1: Adding to iommu group 2
[    2.936803] pcieport 0000:00:02.2: Adding to iommu group 3
[    2.945757] pcieport 0000:00:02.3: Adding to iommu group 4
[    2.954896] pcieport 0000:00:02.4: Adding to iommu group 5
[    2.963906] pcieport 0000:00:02.5: Adding to iommu group 6
[    2.972958] pcieport 0000:00:02.6: Adding to iommu group 7
[    2.981013] pcieport 0000:00:02.7: Adding to iommu group 8
[    2.989058] pcieport 0000:00:03.0: Adding to iommu group 9
[    2.998632] pcieport 0000:00:03.1: Adding to iommu group 10
[    3.006949] pcieport 0000:00:03.2: Adding to iommu group 11
[    3.015260] pcieport 0000:00:03.3: Adding to iommu group 12
[    3.024007] pcieport 0000:00:03.4: Adding to iommu group 13
[    3.031627] pcieport 0000:00:03.5: Adding to iommu group 14
[    3.039167] pcieport 0000:00:03.6: Adding to iommu group 15
[    3.057450] virtio-pci 0000:00:01.0: Adding to iommu group 16
[    3.061114] virtio-pci 0000:03:00.0: Adding to iommu group 17
[    3.064687] virtio-pci 0000:05:00.0: Adding to iommu group 18
[    3.068482] virtio-pci 0000:06:00.0: Adding to iommu group 19
[    3.072032] virtio-pci 0000:08:00.0: Adding to iommu group 20
[    3.076255] virtio-pci 0000:09:00.0: Adding to iommu group 21
[    4.110884] ahci 0000:00:1f.2: Adding to iommu group 22
[    6.740302] lpc_ich 0000:00:1f.0: Adding to iommu group 22
[    6.794388] i801_smbus 0000:00:1f.3: Adding to iommu group 22

The assigned devices and the pcie-pci-bridge were not added to any iommu group. But with intel_iommu, they are added to the same group.
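
For completeness, per-group membership can also be listed directly from sysfs in the guest; a minimal sketch, assuming the standard /sys/kernel/iommu_groups layout:

for g in /sys/kernel/iommu_groups/*; do
    echo "group ${g##*/}: $(ls "$g"/devices)"
done

Under virtio-iommu this makes it easy to confirm that 02:01.0, 02:02.0 and the bridge do not show up in any group, whereas the intel_iommu run below places all three in group 25.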

with intel_iommu

-device '{"driver":"intel-iommu","id":"iommu0","intremap":"on","caching-mode":true,"device-iotlb":true}' \
-device '{"driver":"pcie-pci-bridge","id":"pci.16","bus":"pci.1","addr":"0x0"}' \
-device '{"driver":"vfio-pci","host":"0000:02:00.0","id":"hostdev0","bus":"pci.16","addr":"0x1"}' \
-device '{"driver":"vfio-pci","host":"0000:00:1a.0","id":"hostdev1","bus":"pci.16","addr":"0x2"}' \
...

#lspci
01:00.0 PCI bridge: Red Hat, Inc. Device 000e
02:01.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
02:02.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 (rev 05)

[    2.428317] pci 0000:01:00.0: Adding to iommu group 25
[    2.429055] pci 0000:02:01.0: Adding to iommu group 25
[    2.429800] pci 0000:02:02.0: Adding to iommu group 25

Eric, can you help to check the behavior? Thanks.

Comment 3 Eric Auger 2022-12-21 16:29:03 UTC
(In reply to jinl from comment #2)
> try to reproduce this bz with assigning 2 devices to PCIe to PCI bridge
> 
> qemu command
> -device
> '{"driver":"virtio-iommu","id":"iommu0","bus":"pcie.0","addr":"0x4"}' \
> -device
> '{"driver":"pcie-pci-bridge","id":"pci.16","bus":"pci.1","addr":"0x0"}' \
> -device
> '{"driver":"vfio-pci","host":"0000:02:00.0","id":"hostdev0","bus":"pci.16",
> "addr":"0x2"}' \
> -device
> '{"driver":"vfio-pci","host":"0000:00:1a.0","id":"hostdev1","bus":"pci.16",
> "addr":"0x1"}' \
> ...
> 
> check iommu group in vm
> #lspci
> 01:00.0 PCI bridge: Red Hat, Inc. Device 000e
> 02:01.0 USB controller: Intel Corporation C610/X99 series chipset USB
> Enhanced Host Controller #2 (rev 05)
> 02:02.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme
> BCM5720 Gigabit Ethernet PCIe
> 
> #dmesg | grep iommu
> [    2.080581] xhci_hcd 0000:04:00.0: Adding to iommu group 0
> [    2.917698] pcieport 0000:00:02.0: Adding to iommu group 1
> [    2.927124] pcieport 0000:00:02.1: Adding to iommu group 2
> [    2.936803] pcieport 0000:00:02.2: Adding to iommu group 3
> [    2.945757] pcieport 0000:00:02.3: Adding to iommu group 4
> [    2.954896] pcieport 0000:00:02.4: Adding to iommu group 5
> [    2.963906] pcieport 0000:00:02.5: Adding to iommu group 6
> [    2.972958] pcieport 0000:00:02.6: Adding to iommu group 7
> [    2.981013] pcieport 0000:00:02.7: Adding to iommu group 8
> [    2.989058] pcieport 0000:00:03.0: Adding to iommu group 9
> [    2.998632] pcieport 0000:00:03.1: Adding to iommu group 10
> [    3.006949] pcieport 0000:00:03.2: Adding to iommu group 11
> [    3.015260] pcieport 0000:00:03.3: Adding to iommu group 12
> [    3.024007] pcieport 0000:00:03.4: Adding to iommu group 13
> [    3.031627] pcieport 0000:00:03.5: Adding to iommu group 14
> [    3.039167] pcieport 0000:00:03.6: Adding to iommu group 15
> [    3.057450] virtio-pci 0000:00:01.0: Adding to iommu group 16
> [    3.061114] virtio-pci 0000:03:00.0: Adding to iommu group 17
> [    3.064687] virtio-pci 0000:05:00.0: Adding to iommu group 18
> [    3.068482] virtio-pci 0000:06:00.0: Adding to iommu group 19
> [    3.072032] virtio-pci 0000:08:00.0: Adding to iommu group 20
> [    3.076255] virtio-pci 0000:09:00.0: Adding to iommu group 21
> [    4.110884] ahci 0000:00:1f.2: Adding to iommu group 22
> [    6.740302] lpc_ich 0000:00:1f.0: Adding to iommu group 22
> [    6.794388] i801_smbus 0000:00:1f.3: Adding to iommu group 22
> 
> The assigned devices and pcie-pci-bridge didn't in iommu group.But with
> intel_iommu, it can be added to the same group.
This does not look normal. Would it be possible to get access to the machine?

Thanks

Eric

Comment 4 jinl 2022-12-22 01:46:05 UTC
(In reply to Eric Auger from comment #3)
> This does not look normal. Would it be possible to get access to the machine?
> 
> Thanks
> 
> Eric

Hi Eric,

Host IP: 10.73.224.223
Host passwd: kvmautotest
Guest name: rhel
Guest passwd: redhat

I also installed a VM with intel_iommu named "intel", so you can compare the behavior; the passwd is the same.

Thanks.

Comment 5 Eric Auger 2022-12-22 16:59:41 UTC
Thank you for the machine coordinates. Looking at it ...

Comment 6 Eric Auger 2022-12-22 18:15:06 UTC
Hello,

I can reproduce on my ARM machine. I don't want to prevent you from using your machine, so I will try to fix it on mine. Nevertheless, please keep your setup in mind as I may need it in the short term to handle the primary BZ topic (the -EEXIST replay issue). I will create a separate BZ for the group issue, against qemu.

Thanks

Eric

Comment 7 jinl 2022-12-23 02:18:58 UTC
(In reply to Eric Auger from comment #6)
> Hello,
> 
> I can reproduce on my ARM machine. I don't want to prevent you from using
> your machine. I will try to fix on mine. Nevertheless please keep your setup
> in mind as I may need it in short term to handle the primary BZ topic
> (-EEXIST replay issue). I will create a separate BZ for the group issue,
> against qemu.
> 
> Thanks
> 
> Eric

Filed Bug 2155954 to track the iommu group issue. Thanks.

Comment 8 Eric Auger 2023-02-02 17:17:54 UTC
Moving the ITR to 9.3 since the fix is not upstream yet and I am having difficulties setting up a reproducer (the PCIe-to-PCI bridge is not working with virtio-iommu).

Comment 9 John Ferlan 2023-07-18 12:23:21 UTC
Eric - Any update on this? Is there an upstream fix, or should/can we just move this to the backlog until there is one?

Comment 10 Eric Auger 2023-07-18 18:26:37 UTC
(In reply to John Ferlan from comment #9)
> Eric - Any update on this - is there an upstream fix or should/can we just
> move this to the backlog until there is
Hi John, yes this is rather low priority at the moment. I reset the ITR and the assignee. Hope this is sufficient.