Bug 2050175
Field | Value
---|---
Summary | VM with q35, maxcpus=256 and two host devices from the same IOMMU group cannot be started
Product | Red Hat Enterprise Linux 8
Component | qemu-kvm
qemu-kvm sub component | Devices
Version | 8.6
Status | CLOSED NOTABUG
Reporter | Milan Zamazal <mzamazal>
Assignee | Amnon Ilan <ailan>
QA Contact | Yanghang Liu <yanghliu>
CC | ahadas, alex.williamson, chayang, coli, gilboad, imammedo, jinzhao, juzhang, mst, virt-maint, yanghliu, ymankad
Severity | unspecified
Priority | unspecified
Target Milestone | rc
Hardware | Unspecified
OS | Unspecified
Doc Type | If docs needed, set a value
Type | Bug
Last Closed | 2022-05-12 17:24:44 UTC
Bug Blocks | 2048429, 2081241
Description
Milan Zamazal
2022-02-03 12:15:31 UTC
Hi Milan,
It seems to me that this should be treated as an invalid bug.
The intel-iommu device and the two PFs (which are in the same IOMMU group) conflict with each other over device address spaces, which prevents the VM from starting:
> -machine pc-q35-rhel8.4.0,usb=off,dump-guest-core=off,kernel_irqchip=split,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format,graphics=off \
> -device intel-iommu,intremap=on,caching-mode=on,eim=on \
> -device vfio-pci,host=0000:09:00.0,id=ua-21dbb711-7f4f-4958-b6da-f9f052587e6d,bus=pci.4,addr=0x0 \
> -device vfio-pci,host=0000:09:00.1,id=ua-b0231a3f-07ad-442c-9308-62a0fefc9b52,bus=pci.5,addr=0x0 \
The VM is likely to start successfully once you remove either the intel-iommu device or the two PFs (which are in the same IOMMU group) from the VM configuration.
Can you help retest and confirm this in your test environment?
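For reference, the grouping can be confirmed on the host with standard sysfs queries. A minimal sketch (the PCI addresses are taken from the command line above; the commands are not from the original report):

```
# Confirm that both PFs really belong to the same IOMMU group.
readlink /sys/bus/pci/devices/0000:09:00.0/iommu_group
readlink /sys/bus/pci/devices/0000:09:00.1/iommu_group
# Both links should point at the same /sys/kernel/iommu_groups/<N>.
# Listing that group shows every device that must be assigned together:
group=$(basename "$(readlink /sys/bus/pci/devices/0000:09:00.0/iommu_group)")
ls "/sys/kernel/iommu_groups/$group/devices/"
```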
The following bugs are likely to have the same root cause as this one:

Bug 1619739 - [RFE] vfio non-singleton group + viommu support
Bug 1627499 - [RFE] Account for AddressSpace aliases due to conventional PCI buses
Bug 1715724 - The same iommu_group NICs can not be assigned to a Win2019 guest at the same time

Hi Yanghang, thank you for the explanation. It indeed looks like the same cause: we add an IOMMU when the maximum number of vCPUs is >= 256, which is when the problem occurs if there are additionally two devices in the same IOMMU group.

An IOMMU is required for max vCPUs >= 256. That means the number of vCPUs is limited when there are multiple devices in the same IOMMU group.

I don't see an obvious connection with maxcpus. Perhaps the best person to look into it is someone who knows more about vfio & co. CCing Alex.

As YangHang correctly identifies, the addition of the vIOMMU at >= 256 vCPUs introduces multiple address spaces for devices, making this configuration invalid. If we were only to assign one device from the IOMMU group, this would be a valid configuration; for instance, the GPU could be installed without the audio function. Alternatively, we would need to move to a guest PCI topology that doesn't require multiple device address spaces. This can be accomplished with a pcie-to-pci bridge device. All devices on the conventional PCI side of the bridge share an address space, therefore the restriction of a single address space within an IOMMU group is satisfied.
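For illustration only, a minimal QEMU command-line sketch of the topology described above; the bridge id and slot addresses are made up, and pci.4 is assumed to be an existing pcie-root-port as in the original command line:

```
# Sketch: put both functions behind one conventional PCI bridge so they
# share a single DMA address space under the vIOMMU.
-device pcie-pci-bridge,id=conv-bridge0,bus=pci.4,addr=0x0 \
-device vfio-pci,host=0000:09:00.0,bus=conv-bridge0,addr=0x1 \
-device vfio-pci,host=0000:09:00.1,bus=conv-bridge0,addr=0x2 \
```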
(In reply to Amnon Ilan from comment #10)
> Alex, Is it related to bug#1619734?

No, this is an isolation and address space issue, not an accounting issue. I wish it were an accounting issue; upstream work with iommufd should eventually fix that, though the timeline is not insignificant.

> How do you see the next steps here?

IOMMU groups represent the smallest unit of isolation for device assignment. In some cases devices are grouped together because the IOMMU cannot distinguish separate devices; in other cases it's because we cannot conclusively determine that untranslated DMA between the devices is prevented. The former is often a topology issue on the host, for instance host devices on a conventional PCI bus all use the same requester ID. This case cannot be solved; the devices necessarily share an I/O virtual address space. The latter case is more common for multi-function devices, or is the result of host interconnect devices which might allow redirection, i.e. root ports and switches. For these cases we recommend that system, interconnect, and device vendors support PCIe Access Control Services (ACS), which allows the OS to identify, and in some cases control, the isolation. For existing hardware, our only option is to consult with the hardware vendor to determine whether equivalent isolation exists in the routing or between functions, and to add software quirks to the kernel to expose the inherent isolation via smaller groups.

Upstream work largely focuses on singleton groups, which is where we expect hardware designed for these sorts of use cases to converge. So while there might be some opportunity to create separate address spaces within a group using the developments in iommufd, I don't necessarily expect that to be a focus.

The path forward here is to handle multi-device groups on a case-by-case basis, identify whether the grouping is the result of the system, the interconnects, or the device itself, and work with the appropriate partner to determine whether an isolation quirk is appropriate. Meanwhile, recommend devices and systems that don't have such issues to customers. Prior to this requirement to support a vIOMMU for large numbers of vCPUs, I think the worlds of VMs with both assigned GPUs and a vIOMMU didn't often cross paths. We've already worked actively to provide quirks for many NIC devices.

Hi Milan,

May I ask if you have any other concerns about this bug? Is it ok for you that we close this bug?

(In reply to Yanghang Liu from comment #12)
> Is it ok for you that we close this bug?

Hi Yanghang, as I understand the explanations above, there is no single solution and such problems must be handled case by case. In that case, let's close the bug. If there is a specific case we need to handle in the future, we'll open a separate bug.

Closing this bug as NOTABUG based on Milan's comment #c13
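As a side note to the ACS discussion above: whether a given host device advertises the ACS capability can be checked with lspci. A sketch (the PCI address is the one from this bug; output details vary by device):

```
# Look for a PCIe Access Control Services capability on the host device.
sudo lspci -s 0000:09:00.0 -vvv | grep -i -A2 "Access Control Services"
```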