Description of problem:
After kernel-xen boots with the IOMMU enabled, VFs cannot be enabled in Dom0 by loading the igb module with max_vfs=n (n=1...7). The device is an Intel 82576 Gigabit network card, which supports SR-IOV.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-164.el5 and kernel-xen-2.6.18-164.9.1.el5
xen-3.0.3-94.el5

How reproducible:
100%

Steps to Reproduce:
1. Boot up the xen host with iommu enabled
2. $ lspci
   ...
   03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
   03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
   ...
3. $ modprobe -r igb
4. $ modprobe igb max_vfs=2
5. $ lspci   # same result as in step 2
   ...
   03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
   03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
   ...
6. $ cat /var/log/message   # after removing igb (step 3) and reloading it with max_vfs=2 (step 4)
...
Dec 16 02:44:59 intel-x5550-12-1 kernel: ACPI: PCI interrupt for device 0000:03:00.1 disabled
Dec 16 02:44:59 intel-x5550-12-1 kernel: ACPI: PCI interrupt for device 0000:03:00.0 disabled
Dec 16 02:45:06 intel-x5550-12-1 kernel: Intel(R) Gigabit Ethernet Network Driver - version 1.3.16-k2
Dec 16 02:45:06 intel-x5550-12-1 kernel: Copyright (c) 2007-2009 Intel Corporation.
Dec 16 02:45:06 intel-x5550-12-1 kernel: PCI: Enabling device 0000:03:00.0 (0100 -> 0102)
Dec 16 02:45:06 intel-x5550-12-1 kernel: ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 28 (level, low) -> IRQ 22
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.0: Intel(R) Gigabit Ethernet Network Connection
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.0: eth3: (PCIe:2.5Gb/s:Width x4) 00:1b:21:39:8b:18
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.0: eth3: PBA No: e43709-003
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.0: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
Dec 16 02:45:06 intel-x5550-12-1 kernel: PCI: Enabling device 0000:03:00.1 (0100 -> 0102)
Dec 16 02:45:06 intel-x5550-12-1 kernel: ACPI: PCI Interrupt 0000:03:00.1[B] -> GSI 40 (level, low) -> IRQ 23
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.1: Intel(R) Gigabit Ethernet Network Connection
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.1: eth4: (PCIe:2.5Gb/s:Width x4) 00:1b:21:39:8b:19
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.1: eth4: PBA No: e43709-003
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.1: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
Dec 16 02:45:07 intel-x5550-12-1 kernel: ADDRCONF(NETDEV_UP): eth3: link is not ready
Dec 16 02:45:07 intel-x5550-12-1 kernel: ADDRCONF(NETDEV_UP): eth4: link is not ready
Dec 16 02:45:09 intel-x5550-12-1 kernel: igb: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Dec 16 02:45:09 intel-x5550-12-1 kernel: ADDRCONF(NETDEV_CHANGE): eth3: link becomes ready
Dec 16 02:45:09 intel-x5550-12-1 kernel: igb: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Dec 16 02:45:09 intel-x5550-12-1 kernel: ADDRCONF(NETDEV_CHANGE): eth4: link becomes ready
...

Actual results:
VFs cannot be enabled.

Expected results:
VFs should be available after reloading the igb module with the max_vfs parameter.

Additional info:
$ cat /boot/grub/grub.conf
...
title Red Hat Enterprise Linux Server (2.6.18-164.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-164.el5 iommu=1
        module /vmlinuz-2.6.18-164.el5xen ro root=/dev/VolGroup01/LogVol00
        module /initrd-2.6.18-164.el5xen.img
...
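Whether the VF enable worked can be checked mechanically from the lspci output. A minimal shell sketch (the sample variable below is copied from step 5 above, where no VFs appeared; on a live host you would capture `lspci` itself, as noted in the comment):

```shell
# Sample lspci output mirroring step 5 above (no VFs created);
# on a live host, replace this with: lspci_out=$(lspci)
lspci_out='03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)'

# Each successfully created VF shows up as an extra
# "82576 Virtual Function" PCI function (e.g. 03:10.0, 03:10.1, ...).
vfs=$(printf '%s\n' "$lspci_out" | grep -c 'Virtual Function' || true)
echo "VFs visible: $vfs"
```

With max_vfs=2 on both ports, a working setup would report 4 here; the failing host reports 0.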
$ uname -a
Linux intel-x5550-12-1 2.6.18-164.el5xen #1 SMP Tue Aug 18 15:59:52 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

$ xm dmesg | grep VT-d
(XEN) Intel VT-d has been enabled
(XEN) Intel VT-d snoop control disabled

$ modinfo igb
filename:       /lib/modules/2.6.18-164.el5xen/kernel/drivers/net/igb/igb.ko
version:        1.3.16-k2
license:        GPL
description:    Intel(R) Gigabit Ethernet Network Driver
author:         Intel Corporation, <e1000-devel.net>
srcversion:     78555F0A019E05BADBD95AA
alias:          pci:v00008086d000010D6sv*sd*bc*sc*i*
alias:          pci:v00008086d000010A9sv*sd*bc*sc*i*
alias:          pci:v00008086d000010A7sv*sd*bc*sc*i*
alias:          pci:v00008086d000010E8sv*sd*bc*sc*i*
alias:          pci:v00008086d000010E7sv*sd*bc*sc*i*
alias:          pci:v00008086d000010E6sv*sd*bc*sc*i*
alias:          pci:v00008086d0000150Asv*sd*bc*sc*i*
alias:          pci:v00008086d000010C9sv*sd*bc*sc*i*
depends:        8021q
vermagic:       2.6.18-164.el5xen SMP mod_unload gcc-4.1
parm:           max_vfs:Maximum number of virtual functions to allocate per physical function (uint)
module_sig:     883f3504a8b9b84bd273d74512bb1128dcc09f6de0e11f4701e731966ec2b9e259d8952d91f9009d1750ebf3a120d977468bea8b5bec2118a692e7b

$ xm dmesg   # refer to the attachment
$ dmesg   # refer to the attachment
$ cat /var/log/message   # refer to the attachment
Created attachment 378701 [details] xm dmesg log
Created attachment 378702 [details] /var/log/message
Created attachment 378703 [details] dmesg
You need to set the following on the *kernel* command line:

pci_pt_e820_access=on

Then the PCI_MMCONF space will be available, which is what is needed to enable the VFs.

If the above works for you, please acknowledge & close the BZ.
(In reply to comment #4)
> You need to set the following on the *kernel* command line:
>
> pci_pt_e820_access=on
>
> then the PCI_MMCONF space will be available, which is what is needed
> to enable the VF's.
>
> If the above works for you, please acknowledge & close the BZ.

I have tried the parameter; it still does not work.

title Red Hat Enterprise Linux Server (2.6.18-164.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-164.el5 iommu=1
        module /vmlinuz-2.6.18-164.el5xen ro root=/dev/VolGroup01/LogVol00 pci_pt_e820_access=on
        module /initrd-2.6.18-164.el5xen.img
...
Are you 100% sure VT-d is enabled in the BIOS?

Try iommu=force; I believe that will moan if the BIOS setting isn't on. (It will also enable the IOMMU just like iommu=1; the code essentially enables it if you add iommu=foobar for any foobar != disable, off, no, false, 0.)

Can you also boot the non-xen kernel with intel_iommu=on and add the dmesg output here as well?
(In reply to comment #6)
> Are you 100% sure VTd is enabled in the BIOS?

Sure; I can see the following messages:

$ xm dmesg | grep VT-d
(XEN) Intel VT-d has been enabled
(XEN) Intel VT-d snoop control disabled

> try iommu=force ; I believe that will moan if the BIOS setting isn't on.

Tried iommu=force; same result as with iommu=1.

> Can you also boot the non-xen kernel with intel_iommu=on and add the
> dmesg output here as well?

It works well on the non-xen kernel: booting with intel_iommu=on and reloading igb with max_vfs=7 makes the VFs available. The dmesg log is attached.
__________________________________________________________________________
title Red Hat Enterprise Linux Server-base (2.6.18-164.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.el5 ro root=/dev/VolGroup01/LogVol00 intel_iommu=on
        initrd /initrd-2.6.18-164.el5.img
__________________________________________________________________________
Created attachment 379644 [details] regular kernel, enable iommu, reload igb module with max_vfs=7
Comment on attachment 379644 [details]
regular kernel, enable iommu, reload igb module with max_vfs=7

$ lspci | grep 82576
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
Hmm.... in /etc/xen/xend-config.sxp, add the following at the bottom of the file:

pci-dev-assign-strict-check no

I didn't think this affected dom0, but maybe it does.
(In reply to comment #10)
> hmm.... in the /etc/xen/xend-config.sxp add the following at the bottom:
>
> pci-dev-assign-strict-check no
>
> I didn't think this affected dom0, but maybe it does.

Tried it; no effect.
(In reply to comment #5)
> (In reply to comment #4)
> > You need to set the following on the *kernel* command line:
> >
> > pci_pt_e820_access=on
> >
> > then the PCI_MMCONF space will be available, which is what is needed
> > to enable the VF's.
> >
> > If the above works for you, please acknowledge & close the BZ.
>
> I have tried the param, still not work.
>
> title Red Hat Enterprise Linux Server (2.6.18-164.el5xen)
>         root (hd0,0)
>         kernel /xen.gz-2.6.18-164.el5 iommu=1
>         module /vmlinuz-2.6.18-164.el5xen ro root=/dev/VolGroup01/LogVol00 pci_pt_e820_access=on
>         module /initrd-2.6.18-164.el5xen.img
> ...

You definitely need to have pci_pt_e820_access=on, and in the dom0 dmesg attachment from Comment #3 it is not set. Can you verify that you see this in the dom0 dmesg when it is set:

PCI: Using MMCONFIG at f0000000

instead of this:

PCI: Not using MMCONFIG.
PCI: Using configuration type 1

Also, can you give us the lspci -vvv -xxxx output of one of the igb PFs? Something like:

# lspci -vvv -xxxx -s 03:00.0
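A small sketch of that check, assuming the match strings are exactly the RHEL5 kernel messages quoted above (the sample log below is the failing case; on the real host you would pipe `dmesg` in directly):

```shell
# Stand-in for the dom0 boot log; on the real host use `dmesg` instead.
dmesg_log='PCI: Not using MMCONFIG.
PCI: Using configuration type 1'

# MMCONFIG access is required to reach the extended config space
# where the SR-IOV capability lives; type 1 access cannot see it.
if printf '%s\n' "$dmesg_log" | grep -q 'PCI: Using MMCONFIG'; then
    status="mmconfig"
else
    status="type1"   # extended config space inaccessible; VF enable fails
fi
echo "PCI config access: $status"
```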
When setting pci_pt_e820_access=on, we see:

"PCI: Cannot map mmconfig aperture for segment 0"

which comes from arch/x86_64/pci/mmconfig.c and means that ioremap is failing. We don't yet know why ioremap fails (it fails on the -xen kernel but works on bare metal).

Note, though, that the last device-assignment & SR-IOV test I did on an HP z800 (which is what the console says this test machine is) didn't properly support SR-IOV in the BIOS (on the HP z800 in RHTS in the Westford lab). But the BIOS must support it now if bare-metal -164 works.

So, if you can load the -debug kernels on that system for further testing, that would help. I'll try to make some custom kernels to trace through this code as well, to see if the problem can be narrowed down.
The machine e820 map (Xen, xm dmesg) is:

(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 0000000000095800 (usable)
(XEN)  0000000000095800 - 00000000000a0000 (reserved)
(XEN)  00000000000e8000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000cefa5800 (usable)
(XEN)  00000000cefa5800 - 00000000d0000000 (reserved)
(XEN)  00000000f0000000 - 00000000f8000000 (reserved)
(XEN)  00000000fec00000 - 00000000fed40000 (reserved)
(XEN)  00000000fed45000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000330000000 (usable)

The MCFG table shows:

[02Ch 044 8]             Base Address : 00000000F0000000
[034h 052 2]     Segment Group Number : 0000
[036h 054 1]         Start Bus Number : 00
[037h 055 1]           End Bus Number : 7F
[038h 056 4]                 Reserved : 00000000

Note the End Bus Number. It means the mmconfig region is 0xf0000000-0xf8000000, which is marked reserved in e820. However, the kernel ioremaps a hardcoded region sized for an End Bus Number of FF (IOW, all 256 possible PCI buses), which makes a region of 0xf0000000-0x100000000 spanning 3 e820 sections. I suspect this would all work fine if we simply ioremap'd exactly the space the BIOS requested (or tested on an i386 install, which doesn't ioremap). Something like this:

--- a/arch/x86_64/pci/mmconfig.c
+++ b/arch/x86_64/pci/mmconfig.c
@@ -165,9 +165,11 @@ void __init pci_mmcfg_init(void)
 		return;
 	}
 	for (i = 0; i < pci_mmcfg_config_num; ++i) {
+		unsigned long mmcfg_aper = pci_mmcfg_config[i].end_bus_number - pci_mmcfg_config[i].start_bus_number + 1;
+		mmcfg_aper *= 32 * 8 * 4096;	/* 1 MB of config space per bus */
 		pci_mmcfg_virt[i].cfg = &pci_mmcfg_config[i];
 		pci_mmcfg_virt[i].virt = ioremap_nocache(pci_mmcfg_config[i].base_address,
-							 MMCONFIG_APER_MAX);
+							 mmcfg_aper);
 		if (!pci_mmcfg_virt[i].virt) {
 			printk("PCI: Cannot map mmconfig aperture for segment %d\n",
 			       pci_mmcfg_config[i].pci_segment_group_number);
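The region sizes in the analysis above can be double-checked with shell arithmetic: MMCONFIG needs 4096 bytes of config space for each of 32 devices x 8 functions per bus (1 MB per bus), and the 0xf0000000 base and 0x00-0x7F bus range come from the MCFG dump:

```shell
base=$((0xf0000000))
per_bus=$((32 * 8 * 4096))          # 1 MB of config space per bus

# Aperture the BIOS actually reserved: buses 0x00..0x7F inclusive
bios_aper=$(( (0x7F - 0x00 + 1) * per_bus ))
# Aperture the kernel hardcodes (MMCONFIG_APER_MAX): all 256 buses
max_aper=$(( 256 * per_bus ))

printf 'BIOS-reserved region: 0x%x-0x%x\n' "$base" "$((base + bios_aper))"
printf 'ioremap attempts:     0x%x-0x%x\n' "$base" "$((base + max_aper))"
```

This prints 0xf0000000-0xf8000000 for the BIOS-reserved region, matching the reserved e820 entry, while the hardcoded mapping runs to 0x100000000 and overlaps the fec00000/fed45000 reserved sections, which is why the ioremap fails.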
Created RPMs with a patch similar to comment #14. Sorry for the delay; I did a brew build before the company holiday, but when I came back, brew had dumped the (scratch) build, so I had to re-run it.

See the following location for the RPMs:

http://people.redhat.com/~ddutile/rhel5/bz547980/

Please let me know whether the VFs are visible with the dom0 kernel-xen RPM.
After installing the kernel-xen RPM and booting the system, the VFs are still not visible. The error 'PCI: Cannot map mmconfig aperture for segment 0' still exists in the Dom0 dmesg.

$ uname -a
Linux intel-x5550-12-1 2.6.18-164.el5bz547980v1xen #1 SMP Mon Jan 4 11:38:11 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

$ cat /boot/grub/grub.conf
...
title Red Hat Enterprise Linux Server (2.6.18-164.el5bz547980v1xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-164.el5bz547980v1 iommu=force
        module /vmlinuz-2.6.18-164.el5bz547980v1xen ro root=/dev/VolGroup01/LogVol00 pci_pt_e820_access=on
        module /initrd-2.6.18-164.el5bz547980v1xen.img
...

$ xm dmesg
...
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 0000000000095800 (usable)
(XEN)  0000000000095800 - 00000000000a0000 (reserved)
(XEN)  00000000000e8000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000cefa5800 (usable)
(XEN)  00000000cefa5800 - 00000000d0000000 (reserved)
(XEN)  00000000f0000000 - 00000000f8000000 (reserved)
(XEN)  00000000fec00000 - 00000000fed40000 (reserved)
(XEN)  00000000fed45000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000330000000 (usable)
...
$ lspci -vvv -xxxx -s 03:00.0
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
        Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 23
        Region 0: Memory at e3200000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at e3400000 (32-bit, non-prefetchable) [size=4M]
        Region 2: I/O ports at b000 [disabled] [size=32]
        Region 3: Memory at e3240000 (32-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at e4400000 [disabled] [size=4M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] MSI-X: Enable+ Mask- TabSize=10
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [a0] Express Endpoint IRQ 0
                Device: Supported: MaxPayload 512 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s <512ns, L1 <64us
                Device: AtnBtn- AtnInd- PwrInd-
                Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                Device: MaxPayload 256 bytes, MaxReadReq 512 bytes
                Link: Supported Speed 2.5Gb/s, Width x4, ASPM L0s L1, Port 247
                Link: Latency L0s <4us, L1 <64us
                Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
                Link: Speed 2.5Gb/s, Width x1
00: 86 80 c9 10 06 05 18 00 01 00 00 02 10 00 80 00
10: 00 00 20 e3 00 00 40 e3 01 b0 00 00 00 00 24 e3
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 3c a0
30: 00 00 00 00 40 00 00 00 00 00 00 00 03 01 00 00
40: 01 50 23 c8 00 20 00 1a 00 00 00 00 00 00 00 00
50: 05 70 80 01 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 11 a0 09 80 03 00 00 00 03 20 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 10 00 02 00 c2 8c 00 10 30 28 19 00 41 6c 03 f7
b0: 00 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 1f 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Created attachment 381706 [details] Dom0 dmesg kernel-xen-2.6.18-164.el5bz547980v1.x86_64
Created attachment 381707 [details] xm dmesg kernel-xen-2.6.18-164.el5bz547980v1.x86_64
My bad! I edited a kernel git tree & forgot to commit before doing the make srpm & brew build. Please try the following:

http://people.redhat.com/~ddutile/rhel5/bz547980/
kernel-xen-2.6.18-164.el5bz547980v2.x86_64.rpm

(Note: this is the v2 version of the patch.... now you know (one) reason why I tag my builds with a version number.... ;-) )
(In reply to comment #20)
> please try the following:
>
> http://people.redhat.com/~ddutile/rhel5/bz547980/
> kernel-xen-2.6.18-164.el5bz547980v2.x86_64.rpm

This package works; the VFs are visible now. Please refer to the attached host dmesg and xm dmesg logs.
Created attachment 381905 [details] Dom0 dmesg 2.6.18-164.el5bz547980v2xen
Created attachment 381906 [details] xm dmesg 2.6.18-164.el5bz547980v2xen
So, for regression-management reasons, the general consensus is to implement a platform-level/enabled quirk for this case.

Can the reporter please attach the dmidecode output for this platform to this BZ? Thanks.
Created attachment 382137 [details] output of dmidecode
A -185 kernel-xen with the POSTED patch can be pulled from this location: http://people.redhat.com/ddutile/rhel5/bz547980/kernel-xen-2.6.18-185.el5bz547980v4.x86_64.rpm
This patch is recommended for backport to the 5.4-z stream.

The problem is becoming common on platforms that support Intel virtualization and SR-IOV. Few platforms (BIOSes) used to support SR-IOV, so it wasn't a visible problem; more and more BIOS updates now include SR-IOV support (scanning a PCI device's extended config space & providing mapping space for VFs on physical PCI devices). For example, when I first tested an HP z800, it did not have BIOS VF support, so I could not do VF device-assignment (aka pass-through) testing on it. This bug was found on an HP z800, and the one on which the patch was confirmed had obviously been updated, since it exhibited this bug; it is the same (RHTS/Beaker) system I couldn't test on 3 months back.

I've been pinged by 3 other BZs asking for this patch to work around the issue, to enable them to resolve other virt/VF problems; this one stops them from the get-go, before they can debug the other virt bugs. So, to avoid a (relatively small) wave of BZs with this problem, it's prudent to backport to 5.4's z-stream to limit customer issues.

Note: the (final) patch was developed to reduce regression exposure on RHEL5 virt systems -- it takes effect only on xen kernels with pci_pt_e820_access=on set, which avoids changing RHEL5 behavior on (a) bare-metal kernels and (b) xen kernels that are not doing PCI VF device assignment. Additionally, the patch adds a kernel parameter to defeat the fix, for the case where the BIOS has the VF mapping correct but the ACPI-reported PCI max bus number is busted, so there is an in-field option to disable this workaround if some perverse condition occurs that wasn't anticipated in this patch.
*** Bug 563539 has been marked as a duplicate of this bug. ***
I tested on Westmere-HEDT with RHEL 5.5 GA Snapshot 2; the issue is fixed on this platform.

Xen version: xen-3.0.3-105.el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html