After upgrading to CentOS 7.2 (from a 7.1 installation), starting VMs with attached DVB PCI devices fails with:

VM vdr is down with error. Exit message: Verbinden von PCI-Gerät '0000:00:1c.0' mit vfio-pci fehlgeschlagen: Kein passendes Gerät gefunden.

The exit message is German and translates to: Connecting PCI device '0000:00:1c.0' to vfio-pci failed: no matching device found.

Booting kernel 3.10.0-229.20.1.el7.x86_64 without any other changes works. Kernels tested that trigger this bug:
3.10.0-327.3.1.el7.x86_64
3.10.0-327.4.4.el7.x86_64
3.10.0-327.4.5.el7.x86_64

The bug shows on two different machines: the first with a desktop Z170 chipset, the other a Xeon with a C226 chipset, each with a different DVB card. The host devices, IOMMU group numbers etc. are listed correctly; only booting the VM fails. I don't know which component is responsible for the device not being found.
Please supply the outputs of the following commands on both kernels:
- lspci
- virsh -r nodedev-list
- vdsClient -s 0 hostdevListByCaps
and /etc/vdsm/vdsm.log containing the information about the VM start (the XML and the libvirt error) from the new kernel.
Created attachment 1121218 [details] lspci 229.20 kernel Z170
Created attachment 1121219 [details] lspci 327.4.5 kernel Z170
Created attachment 1121221 [details] virsh 229.20 kernel Z170
Created attachment 1121223 [details] virsh 327.4.5 kernel Z170
Created attachment 1121224 [details] vds 229.20 kernel Z170
Created attachment 1121225 [details] vds 327.4.5 kernel Z170
Created attachment 1121226 [details] vdsm.log startup VM with dvb PCI Card on Z170 kernel 327.4.5
I tested some other devices: an AMD graphics card works with the 229 kernel and does not with 327. USB devices work on both kernels. Maybe this helps.
Part of the issue is the fact that we assign the whole IOMMU group to a VM, including root ports and PCI/PCIe bridges. I'm not sure how that worked in the first scenario, but there could have been updates to Z170 isolation quirks.

Our scenario will currently look like this:
02:00.0 Multimedia controller: Digital Devices GmbH Cine V7 (iommu group 6)
-> (assuming it's the device) shares a group with
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #8 (rev f1) (iommu group 6)
-> both will be detached and assigned to the VM

Alex, do you know of some notable change between those kernels? Why did the assignment with the root port included work with the older kernel?
(In reply to Martin Polednik from comment #10)
> Part of the issue is the fact that we assign the whole IOMMU group to a VM
> incl. root port/pci/pcie bridges. I'm not sure how that worked in the first
> scenario, but there could have been updates to Z170 isolation quirks.
>
> Our scenario will currently look like this:
> 02:00.0 Multimedia controller: Digital Devices GmbH Cine V7 (iommu group 6)
> -> (assuming it's the device) shares group with
> 00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port
> #8 (rev f1) (iommu group 6)
> -> both will be detached and assigned to a VM
>
> Alex, do you know of some notable change between those kernels? Why did the
> assignment with root port included work with older kernel?

It was a bug in older kernels that has since been fixed. See:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7c2e211f3c95b91912a92a8c6736343690042e2e

The vfio-pci driver has never supported anything other than type 0 header devices (i.e. endpoints). Binding bridges, including root ports, to vfio-pci can disconnect management drivers and hotplug drivers, and sometimes leaves the bridge in a disabled state where devices behind the bridge don't work. As the code shows, the intention was always to prevent binding these devices, but a bug allowed it anyway.
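For reference, the header type that vfio-pci checks lives at offset 0x0e of PCI configuration space: bit 7 is the multi-function flag and the low 7 bits give the header layout (0 = type 0 endpoint, 1 = PCI-to-PCI bridge, 2 = CardBus bridge). A minimal Python sketch of that check (the offset and bit layout come from the PCI spec; the helper names are illustrative, not part of vdsm or the kernel):

```python
HEADER_TYPE_OFFSET = 0x0e  # PCI config space offset of the Header Type register


def header_type(config_bytes):
    """Return the PCI header type: the low 7 bits of the byte at 0x0e.

    0 = type 0 (endpoint), 1 = PCI-to-PCI bridge, 2 = CardBus bridge.
    Bit 7 (masked off here) only indicates a multi-function device.
    """
    return config_bytes[HEADER_TYPE_OFFSET] & 0x7f


def is_assignable(config_bytes):
    # vfio-pci only supports type 0 (endpoint) headers
    return header_type(config_bytes) == 0


# Example: a synthetic config space for a multi-function bridge (0x81):
cfg = bytearray(64)
cfg[HEADER_TYPE_OFFSET] = 0x81
print(header_type(cfg), is_assignable(cfg))  # prints: 1 False
```

On a real host the same bytes can be read from /sys/bus/pci/devices/&lt;address&gt;/config, which is exactly what the vdsm workaround later in this thread does.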
Current workaround: use the older kernel version for the time being, OR try to use a group without a bridge/root port in it.
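To check in advance whether a device's IOMMU group contains a bridge or root port (and would therefore hit this problem), the group members and their header types can be read from sysfs. A rough sketch, assuming the standard kernel sysfs layout; the sysfs root is a parameter only so the functions can be exercised outside a real host:

```python
import os


def group_members(device, sysfs='/sys/bus/pci/devices'):
    """List all PCI addresses sharing an IOMMU group with `device`.

    `device` is a PCI address like '0000:02:00.0'. The kernel exposes the
    group members as entries under <device>/iommu_group/devices/.
    """
    group_dir = os.path.join(sysfs, device, 'iommu_group', 'devices')
    return sorted(os.listdir(group_dir))


def group_has_bridge(device, sysfs='/sys/bus/pci/devices'):
    """True if any member of the device's IOMMU group is not an endpoint.

    Reads the Header Type byte (offset 0x0e) of each member's config space;
    a non-zero low 7 bits means a bridge or CardBus device.
    """
    for member in group_members(device, sysfs):
        with open(os.path.join(sysfs, member, 'config'), 'rb') as f:
            f.seek(0x0e)
            if bytearray(f.read(1))[0] & 0x7f != 0:
                return True
    return False
```

With the Cine V7 example from comment #10, group_has_bridge('0000:02:00.0') would report True, because the group also contains the 00:1c.0 root port.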
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.
In my case I cannot avoid the bridge/root port, which means staying on the RHEL 7.1 kernel. I suggest removing RHEL 7.2 from the supported list for oVirt 3.6, or at least adding a severe warning for vfio-pci cases.
Possible workaround, must be applied on each hypervisor targeted for VFIO assignment where the bug occurs:

# script start
cat << EOF > workaround.patch
--- hostdev.py	2016-03-16 11:18:49.928014050 +0100
+++ /usr/share/vdsm/hostdev.py	2016-03-16 11:17:34.000000000 +0100
@@ -33,6 +33,15 @@
     pass
 
 
+def _pci_header_type(device_name):
+    with open('/sys/bus/pci/devices/{0}/config'.format(
+            name_to_pci_path(device_name)), 'rb') as f:
+        f.seek(0x0e)
+        header_type = ord(f.read(1)) & 0x7f
+
+    return header_type
+
+
 def name_to_pci_path(device_name):
     return device_name[4:].replace('_', '.').replace('.', ':', 2)
 
@@ -193,7 +202,7 @@
     libvirt_device, device_params = _get_device_ref_and_params(device_name)
     capability = CAPABILITY_TO_XML_ATTR[device_params['capability']]
 
-    if capability == 'pci':
+    if capability == 'pci' and not _pci_header_type(device_name):
         try:
             iommu_group = device_params['iommu_group']
         except KeyError:
@@ -212,7 +221,7 @@
     libvirt_device, device_params = _get_device_ref_and_params(device_name)
     capability = CAPABILITY_TO_XML_ATTR[device_params['capability']]
 
-    if capability == 'pci':
+    if capability == 'pci' and not _pci_header_type(device_name):
         try:
             iommu_group = device_params['iommu_group']
         except KeyError:
EOF
patch /usr/share/vdsm/hostdev.py workaround.patch
service vdsmd restart
# script end
Created attachment 1141755 [details] workaround patch

Also a newer version of the patch that should work for 3.6.0 through 3.6.3. Submitted as an attachment due to possible malformation caused by copy-pasting the patch around.
I tested the patch from comment #18.

My platform: oVirt 3.6.4, libvirt-1.2.17-13.el7_2.4, vdsm-4.17.23.2-1.el7

Without patch:
3.10.0-229.20.1.el7.x86_64 -> works
3.10.0-327.13.1.el7.x86_64 -> fails

With patch:
3.10.0-229.20.1.el7.x86_64 -> works
3.10.0-327.13.1.el7.x86_64 -> works

Thanks
Reassigned, the workaround is missing.

rhevm-3.6.6.2-0.1.el6
qemu-kvm-rhev-2.3.0-31.el7_2.11.x86_64
vdsm-4.17.28-0.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.4.x86_64
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
Would you provide more details? You shouldn't be able to assign a device which is not an end node anymore. Are you saying you still can?
When I add the DVB device, it automatically adds the PCIe root port (same IOMMU group, grey-listed). This works as before and is expected. This bug describes the problem that PCI devices could not be found during VM startup (after the kernel update). This happens with a VGA adapter, NIC, DVB device ... I think the problems mentioned in comment #10 and comment #11 are not related to this bug. The root ports are added through the web page, and yes, it still works.
The oVirt 3.6.6 release notes list this bug as fixed. This is NOT true, but the patch from comment #18 still fixes the problem.
(In reply to MNontschev from comment #27)
> The oVirt 3.6.6 release notes lists this Bug as fixed. This is NOT true, but
> the patch from comment #18 still fixes the problem.

Yes, it was later found out that one patch is not really in the build, and the bug was reopened and retargeted to 3.6.7 on 2016-05-16 13:00:01 CEST. Please try in 3.6.7.
Verified, the patch was merged and it's possible to attach host devices to a VM (verified with GPU, PCI and USB devices).

Verification build:
rhevm-3.6.7.1-0.1.el6.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.12.x86_64
sanlock-3.2.4-2.el7_2.x86_64
vdsm-4.17.28-0.el7ev.noarch -> upgraded to vdsm-4.17.29-0.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.4.x86_64

Also, an upgrade of the VDSM build from vdsm-4.17.28-0.el7ev.noarch to vdsm-4.17.29-0.el7ev.noarch was used in order to see if bug 1341299 affects this bug.

Verification scenario:
1. Use a host with vdsm-4.17.28-0.el7ev: browse webadmin -> virtual machines tab -> select VM -> host devices tab -> add device.
2. The list is empty (bug 1341299).
3. Upgrade VDSM to vdsm-4.17.29-0.el7ev.
4. Navigate to the hosts tab and refresh host capabilities.
5. Navigate to the virtual machines tab -> select VM -> host devices tab -> add device.
6. Verify devices are now listed.
7. Attach GPU, PCI and USB devices to the VM.
8. Run the VM and verify the devices are attached properly.
Also verified that host devices report is_assignable == True in the output of:
vdsClient -s 0 hostdevListByCaps