Description of problem:
Assign 14 VFs to a guest, but only 2 VFs can be seen in the RHEL5.9 PV guest.

It's not a dup of Bug 835768 - SR-IOV: Given 14 VFs to RHEL7 guest but only 2 enabled. That bug is for a RHEL7 HVM guest and is related to emul_xen_unplug, while this one is for a RHEL5 PV guest, which does not have that option.

Version-Release number of selected component (if applicable):
Host: RHEL5.9 2.6.18-343.el5xen, xen-3.0.3-142.el5
Guest: RHEL5.9 2.6.18-343.el5xen

How reproducible:
100%

Steps to Reproduce:
[in host]:
1. enable VFs in Domain0 with max_vfs=7
2. bind the PCI device to the pciback driver
# echo ${VF_device_ID} > /sys/bus/pci/drivers/igbvf/unbind
# echo ${VF_device_ID} > /sys/bus/pci/drivers/pciback/new_slot
# echo ${VF_device_ID} > /sys/bus/pci/drivers/pciback/bind
3. create a guest with the VFs
# xm cr ${file.cfg} pci=${VF_device_ID} pci=${VF_device_ID} ... pci=${VF_device_ID}
4. check the VFs with "xm pci-list"
# xm pci-list ${domainID}
[in guest]:
1. check the VFs with "lspci"
# lspci
2. check the VFs with "ifconfig"
# ifconfig

Actual results:
Only 2 VFs can be seen in the guest:
# lspci | grep 82576
00:00.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:01.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

Expected results:
All VFs assigned to the guest can be seen.

Additional info:
Same issue with RHEL5.8, so it is not a regression.
From the guest dmesg:

pcifront pci-0: Installing PCI frontend
pcifront pci-0: Creating PCI Frontend Bus 0000:00
ACPI Error (tbxfroot-0512): Could not map memory at 0000040E for length 2 [20060707]
ACPI Exception (tbxfroot-0400): AE_NO_MEMORY, RSDP structure not found - Flags=8 [20060707]
ACPI: System description tables not found
pci 0000:00:00.0: reg 10: [mem 0xf4248000-0xf424bfff 64bit]
pci 0000:00:00.0: reg 1c: [mem 0xf4268000-0xf426bfff 64bit]

pcifront_scan_root() [drivers/xen/pcifront/pci_op.c] prints "Creating PCI Frontend Bus". I'll have to dig into it.

pcifront_scan_root
  pci_scan_bus_parented
    pci_create_bus
    pci_scan_child_bus
      pci_scan_slot                          /* for each slot */
        pci_scan_single_device               /* for each function */
          pci_scan_device
            pci_setup_device
              pci_read_bases
                __pci_read_base              prints "pci 0000:00:00.0: reg 10: [mem..." etc
          pci_device_add
          pci_scan_msi_device
  pci_walk_bus
    pcifront_claim_resource
      pci_claim_resource
        insert_resource
          __request_resource
  pci_bus_add_devices
    pci_bus_add_device

When scanning the pcifront PCI bus, only one passthru VF is found.

Additionally, at some point during the scan, an attempt is made to find the ACPI Root System Description Pointer (RSDP). The attempt fails, which is why the tbxfroot-0512 / tbxfroot-0400 errors are printed by acpi_tb_find_rsdp() and acpi_find_root_pointer(). Address 0000040E is ACPI_EBDA_PTR_LOCATION. The mapping attempt for that address is made with acpi_os_map_memory(), which ultimately ends up in __ioremap() [arch/i386/mm/ioremap-xen.c]. I notice that "xm dmesg" contains

(XEN) mm.c:630:d2 Non-privileged (2) attempt to map I/O space 00000000

I'll add a WARN() to acpi_tb_find_rsdp() to get a stack trace and see where exactly it is called during the bus scan. I have a fleeting suspicion that this ACPI error gets in the way of enumerating the rest of the devices.
Two side points:
- ACPI is disabled in PV domUs (domU dmesg: "ACPI: Interpreter disabled."; see "acpi_disabled" in "arch/x86_64/kernel/setup-xen.c");
- a web search for the tbxfroot errors at the top turns up a few hits, even Xen-related ones, but nothing usable.
Created attachment 628089
warn if acpi_os_map_memory() fails in acpi_tb_find_rsdp()

pcifront pci-0: Installing PCI frontend
pcifront pci-0: Creating PCI Frontend Bus 0000:00
WARNING: at drivers/acpi/tables/tbxfroot.c:508 acpi_tb_find_rsdp()

Call Trace:
 [<ffffffff803870fb>] acpi_find_root_pointer+0x63/0x20f
 [<ffffffff803729a0>] acpi_os_get_root_pointer+0x9/0x26
 [<ffffffff803872fa>] acpi_get_firmware_table+0x53/0x278
 [<ffffffff8039607f>] acpi_hest_firmware_first_pci+0x42/0x1e4
 [<ffffffff80353e59>] __pci_bus_find_cap+0x48/0x57
 [<ffffffff80352b04>] pci_setup_device+0xd5/0x2d2
 [<ffffffff80352dff>] pci_scan_single_device+0xfe/0x12f
 [<ffffffff80352e4e>] pci_scan_slot+0x1e/0x51
 [<ffffffff8035332b>] pci_scan_child_bus+0x23/0x99
 [<ffffffff80353437>] pci_scan_bus_parented+0x16/0x21
 [<ffffffff803c5a22>] pcifront_scan_root+0x98/0x117
 ...

ACPI Error (tbxfroot-0513): Could not map memory at 0000040E for length 2 [20060707]
ACPI Exception (tbxfroot-0400): AE_NO_MEMORY, RSDP structure not found - Flags=8 [20060707]
ACPI: System description tables not found
pci 0000:00:00.0: reg 10: [mem 0xf4248000-0xf424bfff 64bit]
pci 0000:00:00.0: reg 1c: [mem 0xf4268000-0xf426bfff 64bit]
pcifront_scan_root
  pci_scan_bus_parented
    pci_create_bus
    pci_scan_child_bus
      pci_scan_slot                              /* for each slot */
        pci_scan_single_device                   /* for each function */
          pci_scan_device
            pci_setup_device
              set_pci_aer_firmware_first         <---- traversed now
                acpi_hest_firmware_first_pci
                  acpi_get_firmware_table("HEST")
                    acpi_os_get_root_pointer
                      acpi_find_root_pointer
                        acpi_tb_find_rsdp
              pci_read_bases
                __pci_read_base                  prints "pci 0000:00:00.0: reg 10: [mem..." etc
          pci_device_add
          pci_scan_msi_device
  pci_walk_bus
    pcifront_claim_resource
      pci_claim_resource
        insert_resource
          __request_resource
  pci_bus_add_devices
    pci_bus_add_device

set_pci_aer_firmware_first() has return type "void", and it only decides the value of "pdev->aer_firmware_first". So its failure seems to be unrelated to the premature end of the bus scan.
Created attachment 628112
debug messages in pci_scan_slot()

The scan terminates because the device reports itself as non-multi-function and the first found function is func 0.

pcifront pci-0: Installing PCI frontend
pcifront pci-0: Creating PCI Frontend Bus 0000:00
pci_scan_slot: scan_all_fns=1
pci_scan_slot: func=0 devfn=0
pci 0000:00:00.0: reg 10: [mem 0xf4248000-0xf424bfff 64bit]
pci 0000:00:00.0: reg 1c: [mem 0xf4268000-0xf426bfff 64bit]
pci_scan_slot: dev=ffff88018058b000
pci 0000:00:00.0: pci_scan_slot: nr=0 multifunction=0

        /*
         * If this is a single function device,
         * don't scan past the first function.
         */
        if (!dev->multifunction) {
            if (func > 0) {
                dev->multifunction = 1;
            } else {
                break;
            }
        }

dev->multifunction comes from pci_setup_device() (which is called, through several layers, inside the above loop); paraphrasing:

    u8 hdr_type;
    pci_read_config_byte(dev, PCI_HEADER_TYPE, &hdr_type);
    dev->multifunction = !!(hdr_type & 0x80);

I'll have to see why the device is reported as single-function when the host clearly constructs it as multi-function. From the dom0 dmesg:

pciback: vpci: 0000:03:10.0: assign to virtual slot 0
pciback: vpci: 0000:03:10.1: assign to virtual slot 0 func 1
pciback: vpci: 0000:03:10.2: assign to virtual slot 0 func 2
pciback: vpci: 0000:03:10.3: assign to virtual slot 0 func 3
pciback: vpci: 0000:03:10.4: assign to virtual slot 0 func 4
pciback: vpci: 0000:03:10.5: assign to virtual slot 0 func 5
pciback: vpci: 0000:03:10.6: assign to virtual slot 0 func 6

(Printed by pciback_add_pci_dev(), file "drivers/xen/pciback/vpci.c".)
The non-multifunction setting seems to come straight from the igbvf device. When the PV domU goes through

pci_scan_slot
  pci_scan_single_device
    pci_scan_device
      pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, &l)
      pci_setup_device
        pci_read_config_byte(dev, PCI_HEADER_TYPE, &hdr_type)
        dev->multifunction = !!(hdr_type & 0x80);

dom0 reports (with pciback.verbose_request=1)

pciback: 0000:03:10.0: read 4 bytes at 0x0
pciback: 0000:03:10.0: read 4 bytes at 0x0 = 10ca8086
pciback: 0000:03:10.0: read 1 bytes at 0xe
pciback: 0000:03:10.0: read 1 bytes at 0xe = 0

(PCI_VENDOR_ID is offset 0x00 and PCI_HEADER_TYPE is offset 0x0e in config space.)

The pciback messages are printed by pciback_config_read() [drivers/xen/pciback/conf_space.c], which first retrieves the value from the real device, then modifies it as appropriate, based on the quirks/overlays installed for the given config space offset.

The list of config space header overlays can be found in "drivers/xen/pciback/conf_space_header.c", array "header_common". Offset PCI_HEADER_TYPE is not overlaid.

Checking in dom0:

# lspci -s 03:10.0 -v -x -nn
03:10.0 Ethernet controller [0200]: Intel Corporation 82576 Virtual Function [8086:10ca] (rev 01)
	Subsystem: Intel Corporation Device [8086:a04c]
	Flags: bus master, fast devsel, latency 0
	[virtual] Memory at f4248000 (64-bit, non-prefetchable) [size=16K]
	[virtual] Memory at f4268000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [70] MSI-X: Enable+ Count=3 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Kernel driver in use: pciback
	Kernel modules: igbvf
00: ff ff ff ff 04 00 10 00 01 00 00 02 00 00 00 00
                                              ^^
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 4c a0
30: 00 00 00 00 70 00 00 00 00 00 00 00 00 00 00 00

Offset 0x0e has value 0x00, which decodes as non-multi-function (MSB is clear), PCI_HEADER_TYPE_NORMAL.
(It is interesting that vendor & device are both reported as 0xFFFF, but "header_common" does have overlays for those.)

Compare the physical function:

# lspci -s 03:00.0 -v -x -nn
03:00.0 Ethernet controller [0200]: Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01)
	Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter [8086:a04c]
	Flags: bus master, fast devsel, latency 0, IRQ 21
	Memory at f4200000 (32-bit, non-prefetchable) [size=128K]
	Memory at f4400000 (32-bit, non-prefetchable) [size=4M]
	I/O ports at d000 [size=32]
	Memory at f4240000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 00-1b-21-ff-ff-6c-22-d0
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
	Kernel driver in use: igb
	Kernel modules: igb
00: 86 80 c9 10 07 05 10 00 01 00 00 02 10 00 80 00
                                              ^^
10: 00 00 20 f4 00 00 40 f4 01 d0 00 00 00 00 24 f4
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 4c a0
30: 00 00 00 00 40 00 00 00 00 00 00 00 05 01 00 00

Here the MSB is set (multi-function device).
(In reply to comment #7)
> (It is interesting that vendor & device are both reported as 0xFFFF, but
> "header_common" does have overlays for those.)

The answer to that is in upstream Linux commit fd5b221b.
Asked for guidance on xen-devel: http://lists.xen.org/archives/html/xen-devel/2012-10/msg01217.html
Created attachment 628292
(proposed dom0 patch) xen PV passthru: assign SR-IOV virtual functions to separate virtual slots

This patch solves the problem for me. Now dom0 prints:

pciback: vpci: 0000:03:10.0: assign to virtual slot 0
pciback: vpci: 0000:03:10.1: assign to virtual slot 1
pciback: vpci: 0000:03:10.2: assign to virtual slot 2
pciback: vpci: 0000:03:10.3: assign to virtual slot 3
pciback: vpci: 0000:03:10.4: assign to virtual slot 4
pciback: vpci: 0000:03:10.5: assign to virtual slot 5
pciback: vpci: 0000:03:10.6: assign to virtual slot 6

In the guest:

[root@pv-guest ~]# lspci
00:00.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:01.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:02.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:03.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:04.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:05.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:06.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

(slot == func is an artifact and neither required nor relevant)

They all get IP addresses over DHCP and are externally pingable.
Posted upstream patch: http://lists.xen.org/archives/html/xen-devel/2012-10/msg01239.html
(In reply to comment #11)
> Posted upstream patch:
> http://lists.xen.org/archives/html/xen-devel/2012-10/msg01239.html

Ack to the proposed patch.
Posted upstream v2 patch: http://lists.xen.org/archives/html/xen-devel/2012-10/msg01291.html
Created attachment 628739
(proposed dom0 patch, v2) xen PV passthru: assign SR-IOV virtual functions to separate virtual slots

Simplified patch as suggested on xen-devel for v1. The second hunk from upstream v2 is not backported because we don't have <http://xenbits.xensource.com/hg/linux-2.6.18-xen.hg/rev/4b9f2293d750>.

Tested the v2 patch too (locally built dom0, same RHEL-5 domU as before), with results visible in comment 10.

The RHEL-6 PV guest doesn't support Xen pcifront. (Upstream Linux gained it with upstream commit 956a9202, which targeted 2.6.34.)
(In reply to comment #10)
> (slot == func is an artifact and neither required nor relevant)

Note that this works *not* by blind luck: "scan_all_fns", used by pci_scan_slot(), is invariably 1 in the Xen kernel, so an empty func 0 in a virtual slot does not stop the scan, and a VF sitting at func N > 0 (e.g. 00:01.1) is still found. See

include/asm-x86_64/mach-xen/asm/pci.h
include/asm-i386/mach-xen/asm/pci.h

/* On Xen we have to scan all functions since Xen hides bridges from
 * us.  If a bridge is at fn=0 and that slot has a multifunction
 * device, we won't find the additional devices without scanning all
 * functions. */
#undef pcibios_scan_all_fns
#define pcibios_scan_all_fns(a, b)	1
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate, in the next release of Red Hat Enterprise Linux.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Patch(es) available in kernel-2.6.18-360.el5 You can download this test kernel (or newer) from http://people.redhat.com/plougher/el5/ Detailed testing feedback is always welcomed. If you require guidance regarding testing, please ask the bug assignee.
Verified with:
host: 2.6.18-365.el5xen xen-3.0.3-144.el5
guest: 2.6.18-348.el5xen

[in host]
[root@dhcp-9-22 home]# xm cr pv.cfg pci=0000:03:10.0 pci=0000:03:10.1 pci=0000:03:10.2 pci=0000:03:10.3 pci=0000:03:10.4 pci=0000:03:10.5 pci=0000:03:10.6 pci=0000:03:10.7 pci=0000:03:11.0 pci=0000:03:11.1 pci=0000:03:11.2 pci=0000:03:11.3 pci=0000:03:11.4 pci=0000:03:11.5
Using config file "./pv.cfg".
Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst
Started domain xen-pv-64
[root@dhcp-9-22 home]# xm li
Name          ID Mem(MiB) VCPUs State   Time(s)
Domain-0       0     4947     8 r-----    201.2
xen-pv-64      3     1024     2 -b----      0.3
[root@dhcp-9-22 home]# xm pci-list 3
domain bus slot func
0      3   10   0
0      3   10   1
0      3   10   2
0      3   10   3
0      3   10   4
0      3   10   5
0      3   10   6
0      3   10   7
0      3   11   0
0      3   11   1
0      3   11   2
0      3   11   3
0      3   11   4
0      3   11   5

[in guest]
[root@dhcp-8-166 ~]# lspci
00:00.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:01.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:02.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:03.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:04.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:05.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:06.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:07.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:08.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:09.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:0a.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:0b.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:0c.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
00:0d.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)

All 14 VFs are visible in the guest, so changing the status to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-1348.html