Bug 1256486
Summary: | NetXtreme II BCM5709 - device is behind a switch lacking ACS | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | lejeczek <peljasz> |
Component: | libvirt | Assignee: | Laine Stump <laine> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.1 | CC: | dyuan, honzhang, laine, mzhan, peljasz, rbalakri |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Fixed In Version: | libvirt-1.2.17-7.el7 | Doc Type: | Bug Fix |
Last Closed: | 2015-11-19 06:52:41 UTC | Type: | Bug |
Description
lejeczek
2015-08-24 17:34:18 UTC
Yeah, the ACS check is a relic of pre-vfio device assignment that wasn't noticed when adding support for vfio. It needs to be bypassed when vfio is used for device assignment. I'll work up a patch for it.

Posted this patch upstream for review. *very* simple, which has me wondering what I did wrong :-)

https://www.redhat.com/archives/libvir-list/2015-August/msg00890.html

ok, many thanks. Trying with relaxed_acs_check = 1 also fails, throwing different errors, or rather almost no error:

internal error: unable to execute QEMU command 'device_add': Device initialization failed.

I'm trying simply to attach this:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0' bus='0x24' slot='0x0' function='0x1'/>
  </source>
</hostdev>

virsh # nodedev-dumpxml pci_0000_24_00_1
<device>
  <name>pci_0000_24_00_1</name>
  <path>/sys/devices/pci0000:20/0000:20:02.0/0000:21:00.0/0000:22:04.0/0000:24:00.1</path>
  <parent>pci_0000_22_04_0</parent>
  <driver>
    <name>vfio-pci</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>36</bus>
    <slot>0</slot>
    <function>1</function>
    <product id='0x1639'>NetXtreme II BCM5709 Gigabit Ethernet</product>
    <vendor id='0x14e4'>Broadcom Corporation</vendor>
    <iommuGroup number='27'>
      <address domain='0x0000' bus='0x22' slot='0x04' function='0x0'/>
      <address domain='0x0000' bus='0x24' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x24' slot='0x00' function='0x1'/>
    </iommuGroup>
    <pci-express>
      <link validity='cap' port='0' speed='5' width='4'/>
      <link validity='sta' speed='5' width='4'/>
    </pci-express>
  </capability>
</device>

Pushed this upstream:

commit 108d591b1144bc6cb5d1199f6fc23ee972b76e86
Author: Laine Stump <laine>
Date:   Wed Aug 26 02:04:23 2015 -0400

    hostdev: skip ACS check when using VFIO for device assignment

    The ACS checks are meaningless when using the more modern VFIO driver
    for device assignment since VFIO has its own more complete and exact
    checks, but I didn't realize that when I added support for VFIO.

    This patch eliminates the ACS check when preparing PCI devices for
    assignment if VFIO is being used.

Verify it as follows.

# rpm -q libvirt
libvirt-1.2.17-8.el7.x86_64

# cat /etc/libvirt/qemu.conf |grep relax
# to guests. By setting relaxed_acs_check to 1 such devices will be allowed to
#relaxed_acs_check = 1

# virsh nodedev-dumpxml pci_0000_03_00_0
<device>
  <name>pci_0000_03_00_0</name>
  <path>/sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0</path>
  <parent>pci_0000_00_01_0</parent>
  <driver>
    <name>igb</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>3</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x10c9'>82576 Gigabit Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='virt_functions'>
      <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
      <address domain='0x0000' bus='0x03' slot='0x10' function='0x2'/>
      <address domain='0x0000' bus='0x03' slot='0x10' function='0x4'/>
      <address domain='0x0000' bus='0x03' slot='0x10' function='0x6'/>
      <address domain='0x0000' bus='0x03' slot='0x11' function='0x0'/>
      <address domain='0x0000' bus='0x03' slot='0x11' function='0x2'/>
      <address domain='0x0000' bus='0x03' slot='0x11' function='0x4'/>
    </capability>
    <iommuGroup number='14'>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </iommuGroup>
    <pci-express>
      <link validity='cap' port='247' speed='2.5' width='4'/>
      <link validity='sta' speed='2.5' width='1'/>
    </pci-express>
  </capability>
</device>

# virsh edit r7
(Add the following VF xml to guest)
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
  </source>
</hostdev>
Domain r7 XML configuration edited.

# virsh start r7
Domain r7 started

# virsh dumpxml r7|grep /hostdev -B7
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</hostdev>

Test the hotplug scenario. The result is as expected.

[root@sriov1 /]# cat /etc/libvirt/qemu.conf |grep relax
# to guests. By setting relaxed_acs_check to 1 such devices will be allowed to
#relaxed_acs_check = 1

[root@sriov1 /]# lspci|grep 82576
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

[root@sriov1 /]# virsh start r7
Domain r7 started

[root@sriov1 ~]# cat hostdev.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
  </source>
</hostdev>

[root@sriov1 ~]# cat hostdev1.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x10' function='0x1'/>
  </source>
</hostdev>

[root@sriov1 ~]# cat hostdev2.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x10' function='0x2'/>
  </source>
</hostdev>

[root@sriov1 ~]# virsh attach-device r7 hostdev.xml
Device attached successfully

[root@sriov1 ~]# virsh attach-device r7 hostdev1.xml
Device attached successfully

[root@sriov1 ~]# virsh attach-device r7 hostdev2.xml
Device attached successfully

[root@sriov1 ~]# virsh dumpxml r7|grep /hostdev -B7
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x10' function='0x1'/>
  </source>
  <alias name='hostdev1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x10' function='0x2'/>
  </source>
  <alias name='hostdev2'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
</hostdev>

Hi laine,

Do we need to verify the bug using a NetXtreme II BCM5709?

Thanks,
hongming

The network device used for the verification in Comment 8 and Comment 9 is an Intel 82576 with ACS capabilities. Verify it using an Intel 82576 without ACS as follows.

1) Reproduce the bug using libvirt-1.2.17-6.el7.x86_64

# lspci|grep 82576
0e:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0e:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0f:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0f:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0f:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0f:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
10:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
10:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
11:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
11:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
11:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
11:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

# rpm -q libvirt
libvirt-1.2.17-6.el7.x86_64

# cat /etc/libvirt/qemu.conf |grep relax
# to guests. By setting relaxed_acs_check to 1 such devices will be allowed to
#relaxed_acs_check = 1

# virsh dumpxml r7|grep /hostdev -B5
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0f' slot='0x10' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</hostdev>

# virsh start r7
error: Failed to start domain r7
error: Requested operation is not valid: PCI device 0000:0f:10.0 is not assignable

2) Verify it using libvirt-1.2.17-8.el7.x86_64

# cat /etc/libvirt/qemu.conf |grep relax
# to guests. By setting relaxed_acs_check to 1 such devices will be allowed to
#relaxed_acs_check = 1

# virsh dumpxml r7|grep /hostdev -B5
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0f' slot='0x10' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</hostdev>

# virsh start r7
error: Failed to start domain r7
error: internal error: process exited while connecting to monitor:
2015-09-10T05:13:47.479416Z qemu-kvm: -device vfio-pci,host=0f:10.0,id=hostdev0,bus=pci.0,addr=0x8: vfio: error, group 24 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
2015-09-10T05:13:47.479469Z qemu-kvm: -device vfio-pci,host=0f:10.0,id=hostdev0,bus=pci.0,addr=0x8: vfio: failed to get group 24
2015-09-10T05:13:47.479504Z qemu-kvm: -device vfio-pci,host=0f:10.0,id=hostdev0,bus=pci.0,addr=0x8: Device initialization failed
2015-09-10T05:13:47.479518Z qemu-kvm: -device vfio-pci,host=0f:10.0,id=hostdev0,bus=pci.0,addr=0x8: Device 'vfio-pci' could not be initialized

The VF and its PF are in the same IOMMU group 24, as shown below. This is the existing Bug 1046838 - [Intel 7.1 Bug] PF and VF are in the same iommu_group. Because of it, PCI passthrough of this VF now fails in QEMU/VFIO rather than in libvirt's ACS check. The device in Comment 8 and Comment 9 is an 82576 with ACS, so its VFs are each alone in their own group.

# virsh nodedev-dumpxml pci_0000_0f_10_0
<device>
  <name>pci_0000_0f_10_0</name>
  <path>/sys/devices/pci0000:00/0000:00:1c.6/0000:0c:00.0/0000:0d:02.0/0000:0f:10.0</path>
  <parent>pci_0000_0d_02_0</parent>
  <driver>
    <name>igbvf</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>15</bus>
    <slot>16</slot>
    <function>0</function>
    <product id='0x10ca'>82576 Virtual Function</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='phys_function'>
      <address domain='0x0000' bus='0x0e' slot='0x00' function='0x0'/>
    </capability>
    <iommuGroup number='24'>
      <address domain='0x0000' bus='0x0d' slot='0x02' function='0x0'/>
      <address domain='0x0000' bus='0x0e' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x0e' slot='0x00' function='0x1'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x0'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x1'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x2'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x3'/>
    </iommuGroup>
    <pci-express>
      <link validity='cap' port='2' speed='2.5' width='4'/>
      <link validity='sta' width='0'/>
    </pci-express>
  </capability>
</device>

# virsh nodedev-dumpxml pci_0000_0e_00_0
<device>
  <name>pci_0000_0e_00_0</name>
  <path>/sys/devices/pci0000:00/0000:00:1c.6/0000:0c:00.0/0000:0d:02.0/0000:0e:00.0</path>
  <parent>pci_0000_0d_02_0</parent>
  <driver>
    <name>igb</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>14</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x10e8'>82576 Gigabit Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='virt_functions'>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x0'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x2'/>
    </capability>
    <iommuGroup number='24'>
      <address domain='0x0000' bus='0x0d' slot='0x02' function='0x0'/>
      <address domain='0x0000' bus='0x0e' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x0e' slot='0x00' function='0x1'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x0'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x1'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x2'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x3'/>
    </iommuGroup>
    <pci-express>
      <link validity='cap' port='2' speed='2.5' width='4'/>
      <link validity='sta' speed='2.5' width='4'/>
    </pci-express>
  </capability>
</device>

Also refer to https://bugzilla.redhat.com/show_bug.cgi?id=1046838#c18

Intel has published an update to the E3-1200 spec that calls out the lack of ACS support and recommends that direct device assignment should be avoided on these machines. The spec update is here:

https://www-ssl.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e3-1200v3-spec-update.pdf

I don't understand all the details, but as far as I do understand it, ACS checks were inadequate in some cases, and too pedantic in others, leading to inability to assign some devices that should have been assignable as well as vice versa.

If you just want to test that the ACS check has been removed from the code, you can try setting up vfio device assignment in an unprivileged libvirt domain (see http://vfio.blogspot.com/2015/09/libvirt-1219-session-mode-device.html ). This definitely will not work with the ACS check present, but will work if it's been removed (and you get all the other details correct, e.g. managed='no', setting ulimit -l etc).

If you want to check that a device/bus combination which would have failed the ACS check but can actually be safely assigned is now assignable, you'll need to get some advice on a particular piece of hardware that fits the description.
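
The iommuGroup element in the nodedev-dumpxml output above is what tells you which devices VFIO treats as one unit. As a rough sketch only (this is not part of libvirt; the function name iommu_group_members and the trimmed sample XML are illustrative assumptions), the group members can be pulled out of such a dump with Python's standard xml.etree.ElementTree and mapped to the pci_DDDD_BB_SS_F node device names seen in this report:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the pci_0000_0f_10_0 dump from this report (assumption:
# only the elements needed for the example are kept).
SAMPLE_NODEDEV_XML = """
<device>
  <name>pci_0000_0f_10_0</name>
  <capability type='pci'>
    <iommuGroup number='24'>
      <address domain='0x0000' bus='0x0d' slot='0x02' function='0x0'/>
      <address domain='0x0000' bus='0x0e' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x0e' slot='0x00' function='0x1'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x0'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x1'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x2'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x3'/>
    </iommuGroup>
  </capability>
</device>
"""


def iommu_group_members(nodedev_xml):
    """Hypothetical helper: return (group_number, [nodedev names]).

    Takes the text printed by `virsh nodedev-dumpxml` and returns
    (None, []) when the device reports no iommuGroup element.
    """
    root = ET.fromstring(nodedev_xml)
    group = root.find("./capability[@type='pci']/iommuGroup")
    if group is None:
        return None, []
    names = []
    for addr in group.findall("address"):
        # Attribute values are hex strings such as '0x0f'.
        domain = int(addr.get("domain"), 16)
        bus = int(addr.get("bus"), 16)
        slot = int(addr.get("slot"), 16)
        function = int(addr.get("function"), 16)
        names.append("pci_%04x_%02x_%02x_%x" % (domain, bus, slot, function))
    return int(group.get("number")), names


if __name__ == "__main__":
    number, members = iommu_group_members(SAMPLE_NODEDEV_XML)
    print("IOMMU group %s:" % number)
    for name in members:
        print("  " + name)
```

A group with more than one member, as here, means the device cannot be handed to a guest in isolation. Note that the sketch only lists the members; it does not check which driver each one is currently bound to.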

I had thought that *someone* somewhere responded to this patch saying that it solved their problem, but can't find it now (it's possible I imagined it).

The person who filed this bug will not get relief from this patch by itself, because they actually have multiple devices in the same IOMMU group, so simply eliminating the ACS check isn't going to be enough - they'll need to switch to "managed='no'", and manually unbind all three of the devices in the IOMMU group from their host drivers, i.e. the XML should be changed to this:

<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x0' bus='0x24' slot='0x0' function='0x1'/>
  </source>
</hostdev>

and prior to attempting the device assignment, they need to run this:

virsh nodedev-detach pci_0000_22_04_0
virsh nodedev-detach pci_0000_24_00_0
virsh nodedev-detach pci_0000_24_00_1

Of course this would mean that the devices at 22:04.0 and 24:00.0 wouldn't be usable by either the host or any other guests, which probably isn't what they want, but VFIO has apparently determined that sharing them is unsafe (due to the possibility of the host/guest that controls one of the devices using it to examine/alter the DMA space of another device that is being used by a different guest or the host).

Verify it as follows. The result is as expected. Moving the status to VERIFIED.

[root@sriov1 ~]# rpm -q libvirt
libvirt-1.2.17-9.el7.x86_64

[root@sriov1 ~]# virsh nodedev-detach pci_0000_03_10_0 --driver vfio
Device pci_0000_03_10_0 detached

[root@sriov1 ~]# virsh nodedev-dumpxml pci_0000_03_10_0
<device>
  <name>pci_0000_03_10_0</name>
  <path>/sys/devices/pci0000:00/0000:00:01.0/0000:03:10.0</path>
  <parent>pci_0000_00_01_0</parent>
  <driver>
    <name>vfio-pci</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>3</bus>
    <slot>16</slot>
    <function>0</function>
    <product id='0x10ca'>82576 Virtual Function</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='phys_function'>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </capability>
    <iommuGroup number='25'>
      <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
    </iommuGroup>
    <pci-express>
      <link validity='cap' port='0' speed='2.5' width='4'/>
      <link validity='sta' width='0'/>
    </pci-express>
  </capability>
</device>

[root@sriov1 ~]# grep test /etc/security/limits.conf
test hard memlock unlimited
test soft memlock unlimited

[root@sriov1 ~]# ll /dev/vfio/25
crw-------. 1 root root 245, 0 Sep 23 16:45 /dev/vfio/25

[root@sriov1 ~]# chown test:test /dev/vfio/25

[root@sriov1 ~]# ll /dev/vfio/25
crw-------. 1 test test 245, 0 Sep 23 16:45 /dev/vfio/25

[root@sriov1 ~]# su - test
Last login: Wed Sep 23 16:53:32 CST 2015 on pts/0

[test@sriov1 root]$ ulimit -l
unlimited

[test@sriov1 ~]$ virsh start r7.1
Domain r7.1 started

[test@sriov1 ~]$ cat hostdev.xml
<hostdev mode='subsystem' type='pci' managed='no'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
  </source>
</hostdev>

[test@sriov1 ~]$ virsh attach-device r7.1 hostdev.xml
Device attached successfully

[test@sriov1 ~]$ virsh dumpxml r7.1|grep /hostdev -B7
<hostdev mode='subsystem' type='pci' managed='no'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</hostdev>

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html