Bug 547980 - [SR-IOV] VF can not be enabled in Dom0
Summary: [SR-IOV] VF can not be enabled in Dom0
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.4
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Don Dutile (Red Hat)
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Duplicates: 563539
Depends On:
Blocks: 560665
Reported: 2009-12-16 07:49 UTC by Qixiang Wan
Modified: 2013-01-11 02:39 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 06:51:22 UTC
Target Upstream Version:
Embargoed:


Attachments
xm dmesg log (5.67 KB, text/plain)
2009-12-16 07:50 UTC, Qixiang Wan
/var/log/message (56.34 KB, text/plain)
2009-12-16 07:51 UTC, Qixiang Wan
dmesg (28.64 KB, text/plain)
2009-12-16 07:51 UTC, Qixiang Wan
regular kernel, enable iommu, reload igb module with max_vfs=7 (37.85 KB, text/plain)
2009-12-21 16:18 UTC, Qixiang Wan
Dom0 dmesg kernel-xen-2.6.18-164.el5bz547980v1.x86_64 (29.02 KB, text/plain)
2010-01-05 06:14 UTC, Qixiang Wan
xm dmesg kernel-xen-2.6.18-164.el5bz547980v1.x86_64 (6.08 KB, text/x-log)
2010-01-05 06:14 UTC, Qixiang Wan
Dom0 dmesg 2.6.18-164.el5bz547980v2xen (33.84 KB, text/plain)
2010-01-06 03:44 UTC, Qixiang Wan
xm dmesg 2.6.18-164.el5bz547980v2xen (6.03 KB, text/plain)
2010-01-06 03:45 UTC, Qixiang Wan
output of dmidecode (27.28 KB, text/plain)
2010-01-07 02:48 UTC, Qixiang Wan


Links
System: Red Hat Product Errata
ID: RHSA-2010:0178
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update
Last Updated: 2010-03-29 12:18:21 UTC

Description Qixiang Wan 2009-12-16 07:49:48 UTC
Description of problem:
After kernel-xen boots with the IOMMU enabled, VFs cannot be enabled in Dom0 by loading the igb module with max_vfs=n (n=1...7). The device is an Intel 82576 Gigabit network card, which supports SR-IOV.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-164.el5 and kernel-xen-2.6.18-164.9.1.el5
xen-3.0.3-94.el5

How reproducible:
100%

Steps to Reproduce:
1. boot up xen host with iommu enabled
2. $ lspci
...
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
...
3. $ modprobe -r igb
4. $ modprobe igb max_vfs=2
5. $ lspci      # get same result as in step 2
...
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
...
6. $ cat /var/log/messages  # while removing igb (step 3) and reloading igb with max_vfs=2 (step 4)
...
Dec 16 02:44:59 intel-x5550-12-1 kernel: ACPI: PCI interrupt for device 0000:03:00.1 disabled
Dec 16 02:44:59 intel-x5550-12-1 kernel: ACPI: PCI interrupt for device 0000:03:00.0 disabled
Dec 16 02:45:06 intel-x5550-12-1 kernel: Intel(R) Gigabit Ethernet Network Driver - version 1.3.16-k2
Dec 16 02:45:06 intel-x5550-12-1 kernel: Copyright (c) 2007-2009 Intel Corporation.
Dec 16 02:45:06 intel-x5550-12-1 kernel: PCI: Enabling device 0000:03:00.0 (0100 -> 0102)
Dec 16 02:45:06 intel-x5550-12-1 kernel: ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 28 (level, low) -> IRQ 22
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.0: Intel(R) Gigabit Ethernet Network Connection
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.0: eth3: (PCIe:2.5Gb/s:Width x4) 00:1b:21:39:8b:18
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.0: eth3: PBA No: e43709-003
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.0: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
Dec 16 02:45:06 intel-x5550-12-1 kernel: PCI: Enabling device 0000:03:00.1 (0100 -> 0102)
Dec 16 02:45:06 intel-x5550-12-1 kernel: ACPI: PCI Interrupt 0000:03:00.1[B] -> GSI 40 (level, low) -> IRQ 23
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.1: Intel(R) Gigabit Ethernet Network Connection
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.1: eth4: (PCIe:2.5Gb/s:Width x4) 00:1b:21:39:8b:19
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.1: eth4: PBA No: e43709-003
Dec 16 02:45:06 intel-x5550-12-1 kernel: igb 0000:03:00.1: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
Dec 16 02:45:07 intel-x5550-12-1 kernel: ADDRCONF(NETDEV_UP): eth3: link is not ready
Dec 16 02:45:07 intel-x5550-12-1 kernel: ADDRCONF(NETDEV_UP): eth4: link is not ready
Dec 16 02:45:09 intel-x5550-12-1 kernel: igb: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Dec 16 02:45:09 intel-x5550-12-1 kernel: ADDRCONF(NETDEV_CHANGE): eth3: link becomes ready
Dec 16 02:45:09 intel-x5550-12-1 kernel: igb: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Dec 16 02:45:09 intel-x5550-12-1 kernel: ADDRCONF(NETDEV_CHANGE): eth4: link becomes ready
...

 
Actual results:
VFs cannot be enabled.

Expected results:
VFs should be available after reloading the igb module with the max_vfs parameter.

Additional info:

$ cat /boot/grub/grub.conf
...
title Red Hat Enterprise Linux Server (2.6.18-164.el5xen)
	root (hd0,0)
	kernel /xen.gz-2.6.18-164.el5 iommu=1
	module /vmlinuz-2.6.18-164.el5xen ro root=/dev/VolGroup01/LogVol00
	module /initrd-2.6.18-164.el5xen.img
...

$ uname -a 
Linux intel-x5550-12-1 2.6.18-164.el5xen #1 SMP Tue Aug 18 15:59:52 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

$ xm dmesg | grep VT-d
(XEN) Intel VT-d has been enabled
(XEN) Intel VT-d snoop control disabled

$ modinfo igb
filename:       /lib/modules/2.6.18-164.el5xen/kernel/drivers/net/igb/igb.ko
version:        1.3.16-k2
license:        GPL
description:    Intel(R) Gigabit Ethernet Network Driver
author:         Intel Corporation, <e1000-devel.net>
srcversion:     78555F0A019E05BADBD95AA
alias:          pci:v00008086d000010D6sv*sd*bc*sc*i*
alias:          pci:v00008086d000010A9sv*sd*bc*sc*i*
alias:          pci:v00008086d000010A7sv*sd*bc*sc*i*
alias:          pci:v00008086d000010E8sv*sd*bc*sc*i*
alias:          pci:v00008086d000010E7sv*sd*bc*sc*i*
alias:          pci:v00008086d000010E6sv*sd*bc*sc*i*
alias:          pci:v00008086d0000150Asv*sd*bc*sc*i*
alias:          pci:v00008086d000010C9sv*sd*bc*sc*i*
depends:        8021q
vermagic:       2.6.18-164.el5xen SMP mod_unload gcc-4.1
parm:           max_vfs:Maximum number of virtual functions to allocate per physical function (uint)
module_sig:	883f3504a8b9b84bd273d74512bb1128dcc09f6de0e11f4701e731966ec2b9e259d8952d91f9009d1750ebf3a120d977468bea8b5bec2118a692e7b

$ xm dmesg
#refer to the attachment

$ dmesg
#refer to the attachment

$ cat /var/log/messages
#refer to the attachment

Comment 1 Qixiang Wan 2009-12-16 07:50:44 UTC
Created attachment 378701 [details]
xm dmesg log

Comment 2 Qixiang Wan 2009-12-16 07:51:17 UTC
Created attachment 378702 [details]
/var/log/message

Comment 3 Qixiang Wan 2009-12-16 07:51:58 UTC
Created attachment 378703 [details]
dmesg

Comment 4 Don Dutile (Red Hat) 2009-12-18 21:18:02 UTC
You need to set the following on the *kernel* command line:

pci_pt_e820_access=on

then the PCI_MMCONF space will be available, which is what is needed
to enable the VF's.

If the above works for you, please acknowledge & close the BZ.
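
Note on why MMCONFIG matters here: the SR-IOV capability is a PCIe extended capability, so it sits at config offset 0x100 or above, which legacy type 1 access (256 bytes per function) cannot reach. Below is a minimal sketch of the standard MMCONFIG (ECAM) address calculation. The base address 0xF0000000 is taken from the MCFG table quoted later in this bug; the function names are illustrative only and are not kernel code.

```python
def ecam_address(base, bus, device, function, offset):
    """Physical address of a config register under MMCONFIG (ECAM).

    Each function gets 4 KiB, each device 8 functions, each bus 32 devices,
    so one bus occupies 1 MiB of the aperture.
    """
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    if not 0 <= offset < 4096:
        raise ValueError("offset must lie within the 4 KiB config space")
    return base + ((bus << 20) | (device << 15) | (function << 12) | offset)


def reachable_via_type1(offset):
    """Legacy type 1 config access only covers the first 256 bytes."""
    return offset < 0x100


MMCFG_BASE = 0xF0000000  # assumption: base from the MCFG table in this bug

# First extended-capability header of the igb PF at 03:00.0.
addr = ecam_address(MMCFG_BASE, bus=0x03, device=0x00, function=0, offset=0x100)
print(hex(addr), reachable_via_type1(0x100))
```

With type 1 access only, the driver cannot see offset 0x100 at all, which is consistent with max_vfs silently having no effect.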

Comment 5 Qixiang Wan 2009-12-20 08:37:30 UTC
(In reply to comment #4)
> You need to set the following on the *kernel* command line:
> 
> pci_pt_e820_access=on
> 
> then the PCI_MMCONF space will be available, which is what is needed
> to enable the VF's.
> 
> If the above works for you, please acknowledge & close the BZ.  

I have tried the parameter; it still does not work.

title Red Hat Enterprise Linux Server (2.6.18-164.el5xen)
 root (hd0,0)
 kernel /xen.gz-2.6.18-164.el5 iommu=1
 module /vmlinuz-2.6.18-164.el5xen ro root=/dev/VolGroup01/LogVol00 pci_pt_e820_access=on
 module /initrd-2.6.18-164.el5xen.img
...

Comment 6 Don Dutile (Red Hat) 2009-12-21 15:38:39 UTC
Are you 100% sure VTd is enabled in the BIOS?

Try iommu=force; I believe that will moan if the BIOS setting isn't on.
(It will also enable the IOMMU like iommu=1; the code essentially enables it
if you just add iommu=foobar, where foobar is not one of disable, off, no, false, 0.)

Can you also boot the non-xen kernel with intel_iommu=on and add the
dmesg output here as well?
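
The iommu= rule described above can be sketched as follows. This only mirrors the wording of this comment and is not the hypervisor's actual parser; the function name is hypothetical, and the default of "off when no iommu= option is given" is an assumption.

```python
# Values that, per the comment above, turn the IOMMU off.
DISABLING_VALUES = {"disable", "off", "no", "false", "0"}


def iommu_enabled(xen_cmdline):
    """Return True if the Xen command line would enable the IOMMU.

    Any iommu=<value> enables it unless <value> is one of the
    disabling values; the last iommu= option on the line wins.
    """
    enabled = False  # assumption: disabled unless requested
    for token in xen_cmdline.split():
        if token.startswith("iommu="):
            value = token.split("=", 1)[1]
            enabled = value not in DISABLING_VALUES
    return enabled


for cmdline in ("iommu=1", "iommu=force", "iommu=foobar", "iommu=off"):
    print(cmdline, iommu_enabled(cmdline))
```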

Comment 7 Qixiang Wan 2009-12-21 16:14:33 UTC
(In reply to comment #6)
> Are you 100% sure VTd is enabled in the BIOS?
> 

sure, and can see the following messages:

$ xm dmesg | grep VT-d
(XEN) Intel VT-d has been enabled
(XEN) Intel VT-d snoop control disabled


> try iommu=force ;  I believe that will moan if the BIOS setting isn't on.
> (it will also enable like iommu=1;  the code essentially enables it to
> run if you just add iommu=foobar, and foobar != disable, off, no, false, 0.
> 

Tried iommu=force; got the same result as with 'iommu=1'.

> Can you also boot the non-xen kernel with intel_iommu=on and add the
> dmesg output here as well?  

It works well with the non-xen kernel: after booting with intel_iommu=on and reloading igb with max_vfs=7, the VFs are available. The dmesg log is attached.
__________________________________________________________________________
title Red Hat Enterprise Linux Server-base (2.6.18-164.el5)
	root (hd0,0)
	kernel /vmlinuz-2.6.18-164.el5 ro root=/dev/VolGroup01/LogVol00 intel_iommu=on
	initrd /initrd-2.6.18-164.el5.img
__________________________________________________________________________

Comment 8 Qixiang Wan 2009-12-21 16:18:48 UTC
Created attachment 379644 [details]
regular kernel, enable iommu, reload igb module with max_vfs=7

Comment 9 Qixiang Wan 2009-12-21 16:20:27 UTC
Comment on attachment 379644 [details]
regular kernel, enable iommu, reload igb module with max_vfs=7

$  lspci | grep 82576
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
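
For anyone scripting this check, here is a small sketch that counts PFs and VFs in the listing above and compares the result with max_vfs. Since max_vfs applies per physical function (see the modinfo parm text), 2 PFs with max_vfs=7 should give 14 VFs. The function name is illustrative, and the matching relies only on the strings shown in this lspci output.

```python
LSPCI_OUTPUT = """\
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
03:11.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
"""


def count_functions(lspci_text):
    """Return (physical_functions, virtual_functions) for 82576 devices."""
    pfs = vfs = 0
    for line in lspci_text.splitlines():
        if "82576" not in line:
            continue
        if "Virtual Function" in line:
            vfs += 1
        else:
            pfs += 1
    return pfs, vfs


MAX_VFS = 7  # value passed to "modprobe igb max_vfs=7" in this test
pfs, vfs = count_functions(LSPCI_OUTPUT)
print(pfs, vfs, vfs == pfs * MAX_VFS)
```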

Comment 10 Don Dutile (Red Hat) 2009-12-21 16:50:05 UTC
hmm.... in /etc/xen/xend-config.sxp, add the following at the bottom of the file:

pci-dev-assign-strict-check no

I didn't think this affected dom0, but maybe it does.

Comment 11 Qixiang Wan 2009-12-21 17:09:36 UTC
(In reply to comment #10)
> hmm.... in the /etc/xen/xend-config.sxp  add the following at the bottom:
> 
> pci-dev-assign-strick-check no
> 
> at the bottom of the file.
> 
> I didn't think this affected dom0, but maybe it does.  

tried, no effect.

Comment 12 Chris Wright 2009-12-21 18:00:36 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > You need to set the following on the *kernel* command line:
> > 
> > pci_pt_e820_access=on
> > 
> > then the PCI_MMCONF space will be available, which is what is needed
> > to enable the VF's.
> > 
> > If the above works for you, please acknowledge & close the BZ.  
> 
> I have tried the param, still not work.
> 
> title Red Hat Enterprise Linux Server (2.6.18-164.el5xen)
>  root (hd0,0)
>  kernel /xen.gz-2.6.18-164.el5 iommu=1
>  module /vmlinuz-2.6.18-164.el5xen ro root=/dev/VolGroup01/LogVol00
> pci_pt_e820_access=on
>  module /initrd-2.6.18-164.el5xen.img
> ...  

You definitely need to have pci_pt_e820_access=on, and in the dom0 dmesg attachment from Comment #3 it is not set.  Can you verify that you see this in the dom0 dmesg when it is set:

PCI: Using MMCONFIG at f0000000

Instead of this:
PCI: Not using MMCONFIG.
PCI: Using configuration type 1

Also, can you give us the lspci -vvv -xxxx output of one of the igb PF's.  Something like:

# lspci -vvv -xxxx -s 03:00.0
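
A rough sketch for classifying a dom0 dmesg by the messages quoted in this bug. The third string, "PCI: Cannot map mmconfig aperture", is the one reported later in the thread. The function name and the returned labels are made up for illustration.

```python
def mmconfig_status(dmesg_text):
    """Classify PCI config access from dom0 dmesg text.

    Returns one of: "map-failed", "mmconfig", "type1", "unknown".
    The failure message is checked first so that it is never masked.
    """
    if "PCI: Cannot map mmconfig aperture" in dmesg_text:
        return "map-failed"
    if "PCI: Using MMCONFIG at" in dmesg_text:
        return "mmconfig"
    if "PCI: Not using MMCONFIG" in dmesg_text:
        return "type1"
    return "unknown"


good = "PCI: Using MMCONFIG at f0000000\n"
bad = "PCI: Not using MMCONFIG.\nPCI: Using configuration type 1\n"
print(mmconfig_status(good), mmconfig_status(bad))
```

Only the "mmconfig" result means extended config space, and therefore the SR-IOV capability, is reachable from dom0.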

Comment 13 Don Dutile (Red Hat) 2009-12-22 00:01:50 UTC
When setting pci_pt_e820_access=on,
we see:
 "PCI: Cannot map mmconfig aperture for segment 0"

which is in arch/x86_64/pci/mmconfig.c

and it means that ioremap is failing.

Don't know why ioremap is failing (on the -xen kernel; it works on bare metal).

Note, though, that the last device-assignment & sr-iov test I did on an hp-z800 (which is what the console said this test machine is) didn't properly do sr-iov in the bios (on the hp-z800 in rhts in Westford lab).

but the bios would have to if bare-metal -164 works.


So, if you can load the -debug kernels on that system for further testing,
that would help.
I'll try to make some custom kernels to trace through this code as well,
to see if the problem can be narrowed down.

Comment 14 Chris Wright 2009-12-22 01:04:47 UTC
The machine e820 map (Xen, xm dmesg) is:
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 0000000000095800 (usable)
(XEN)  0000000000095800 - 00000000000a0000 (reserved)
(XEN)  00000000000e8000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000cefa5800 (usable)
(XEN)  00000000cefa5800 - 00000000d0000000 (reserved)
(XEN)  00000000f0000000 - 00000000f8000000 (reserved)
(XEN)  00000000fec00000 - 00000000fed40000 (reserved)
(XEN)  00000000fed45000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000330000000 (usable)

The MCFG table shows:
[02Ch 044  8]                 Base Address : 00000000F0000000
[034h 052  2]         Segment Group Number : 0000
[036h 054  1]             Start Bus Number : 00
[037h 055  1]               End Bus Number : 7F
[038h 056  4]                     Reserved : 00000000

Note the End Bus Number.  This means the mmconfig region is 0xf0000000-0xf8000000, which is marked reserved in e820.  However, the kernel will ioremap a hardcoded region that extends up to End Bus Number FF (IOW, all 256 possible PCI buses), which makes a region of 0xf0000000-0x100000000 spanning 3 e820 sections.

I suspect this would all work fine if we simply ioremap'd exactly the space the BIOS requested (or test on an i386 install which doesn't ioremap).  Something like this:

--- a/arch/x86_64/pci/mmconfig.c
+++ b/arch/x86_64/pci/mmconfig.c
@@ -165,9 +165,11 @@ void __init pci_mmcfg_init(void)
                return;
        }
        for (i = 0; i < pci_mmcfg_config_num; ++i) {
+               unsigned long mmcfg_aper = pci_mmcfg_config[i].end_bus_number - 
+               mmcfg_aper *= 32 * 8 * 4096;
                pci_mmcfg_virt[i].cfg = &pci_mmcfg_config[i];
                pci_mmcfg_virt[i].virt = ioremap_nocache(pci_mmcfg_config[i].bas
-                                                        MMCONFIG_APER_MAX);
+                                                        mmcfg_aper);
                if (!pci_mmcfg_virt[i].virt) {
                        printk("PCI: Cannot map mmconfig aperture for segment %d
                               pci_mmcfg_config[i].pci_segment_group_number);
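
The arithmetic behind this analysis can be checked with a short sketch. It uses the e820 map and MCFG values quoted above; the size formula (end bus - start bus + 1) * 32 * 8 * 4096 is my reading of the truncated diff and is an assumption, as are the helper names. Whether the gaps are what actually makes ioremap fail in dom0 is the suspicion stated in this comment, not something the sketch proves.

```python
BYTES_PER_BUS = 32 * 8 * 4096  # 32 devices x 8 functions x 4 KiB = 1 MiB

# (start, end, type) entries from the Xen-e820 map above; end is exclusive.
E820 = [
    (0x0000000000000000, 0x0000000000095800, "usable"),
    (0x0000000000095800, 0x00000000000A0000, "reserved"),
    (0x00000000000E8000, 0x0000000000100000, "reserved"),
    (0x0000000000100000, 0x00000000CEFA5800, "usable"),
    (0x00000000CEFA5800, 0x00000000D0000000, "reserved"),
    (0x00000000F0000000, 0x00000000F8000000, "reserved"),
    (0x00000000FEC00000, 0x00000000FED40000, "reserved"),
    (0x00000000FED45000, 0x0000000100000000, "reserved"),
    (0x0000000100000000, 0x0000000330000000, "usable"),
]


def mmcfg_aperture(base, start_bus, end_bus):
    """Return the [start, end) range needed for the given bus range."""
    return base, base + (end_bus - start_bus + 1) * BYTES_PER_BUS


def overlapping(e820, start, end):
    """e820 entries that intersect [start, end)."""
    return [(s, e, t) for s, e, t in e820 if s < end and start < e]


def fully_reserved(e820, start, end):
    """True if [start, end) is covered by reserved entries without gaps."""
    pos = start
    for s, e, t in sorted(overlapping(e820, start, end)):
        if s > pos or t != "reserved":
            return False
        pos = e
    return pos >= end


bios = mmcfg_aperture(0xF0000000, 0x00, 0x7F)       # what the MCFG table asks for
hardcoded = mmcfg_aperture(0xF0000000, 0x00, 0xFF)  # what the kernel maps today

for name, (start, end) in (("bios", bios), ("hardcoded", hardcoded)):
    print(name, hex(start), hex(end),
          len(overlapping(E820, start, end)), fully_reserved(E820, start, end))
```

The BIOS-sized range lands exactly on the single reserved entry, while the hardcoded range touches three entries with unlisted holes between them.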

Comment 15 Don Dutile (Red Hat) 2010-01-04 20:04:19 UTC
Created rpms with a patch similar to the one in comment #14.
Sorry for the delay; I did a brew build before the company holiday,
but when I came back, brew had dumped the (scratch) build, so
I had to re-run the build.

see the following location for rpm's:
http://people.redhat.com/~ddutile/rhel5/bz547980/

Please let me know whether VF's are visible with 
the dom0 kernel-xen rpm.

Comment 16 Qixiang Wan 2010-01-05 06:07:52 UTC
After installing the kernel-xen rpm and booting the system, the VFs are still not visible.

The error 'PCI: Cannot map mmconfig aperture for segment 0' still appears in the Dom0 dmesg.

$ uname -a 
Linux intel-x5550-12-1 2.6.18-164.el5bz547980v1xen #1 SMP Mon Jan 4 11:38:11 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

$ cat /boot/grub/grub.conf
...
title Red Hat Enterprise Linux Server (2.6.18-164.el5bz547980v1xen)
	root (hd0,0)
	kernel /xen.gz-2.6.18-164.el5bz547980v1 iommu=force
	module /vmlinuz-2.6.18-164.el5bz547980v1xen ro root=/dev/VolGroup01/LogVol00 pci_pt_e820_access=on
	module /initrd-2.6.18-164.el5bz547980v1xen.img
...

$ xm dmesg
...
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 0000000000095800 (usable)
(XEN)  0000000000095800 - 00000000000a0000 (reserved)
(XEN)  00000000000e8000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000cefa5800 (usable)
(XEN)  00000000cefa5800 - 00000000d0000000 (reserved)
(XEN)  00000000f0000000 - 00000000f8000000 (reserved)
(XEN)  00000000fec00000 - 00000000fed40000 (reserved)
(XEN)  00000000fed45000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000330000000 (usable)
...

$ lspci -vvv -xxxx -s 03:00.0
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
	Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 23
	Region 0: Memory at e3200000 (32-bit, non-prefetchable) [size=128K]
	Region 1: Memory at e3400000 (32-bit, non-prefetchable) [size=4M]
	Region 2: I/O ports at b000 [disabled] [size=32]
	Region 3: Memory at e3240000 (32-bit, non-prefetchable) [size=16K]
	[virtual] Expansion ROM at e4400000 [disabled] [size=4M]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] MSI-X: Enable+ Mask- TabSize=10
		Vector table: BAR=3 offset=00000000
		PBA: BAR=3 offset=00002000
	Capabilities: [a0] Express Endpoint IRQ 0
		Device: Supported: MaxPayload 512 bytes, PhantFunc 0, ExtTag-
		Device: Latency L0s <512ns, L1 <64us
		Device: AtnBtn- AtnInd- PwrInd-
		Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
		Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
		Device: MaxPayload 256 bytes, MaxReadReq 512 bytes
		Link: Supported Speed 2.5Gb/s, Width x4, ASPM L0s L1, Port 247
		Link: Latency L0s <4us, L1 <64us
		Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
		Link: Speed 2.5Gb/s, Width x1
00: 86 80 c9 10 06 05 18 00 01 00 00 02 10 00 80 00
10: 00 00 20 e3 00 00 40 e3 01 b0 00 00 00 00 24 e3
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 3c a0
30: 00 00 00 00 40 00 00 00 00 00 00 00 03 01 00 00
40: 01 50 23 c8 00 20 00 1a 00 00 00 00 00 00 00 00
50: 05 70 80 01 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 11 a0 09 80 03 00 00 00 03 20 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 10 00 02 00 c2 8c 00 10 30 28 19 00 41 6c 03 f7
b0: 00 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 1f 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
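
Reading the dump above: it stops at offset 0xff, so only the legacy 256 bytes are visible and no extended capability (SR-IOV has extended capability ID 0x0010, located at 0x100 or above) can show up. The sketch below decodes the vendor/device ID and walks the standard capability list from a few rows of the dump. The parser and the function names are illustrative; only the rows the walk touches are copied.

```python
# Rows copied from the lspci -xxxx dump above (only those the walk needs).
DUMP = """\
00: 86 80 c9 10 06 05 18 00 01 00 00 02 10 00 80 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 03 01 00 00
40: 01 50 23 c8 00 20 00 1a 00 00 00 00 00 00 00 00
50: 05 70 80 01 00 00 00 00 00 00 00 00 00 00 00 00
70: 11 a0 09 80 03 00 00 00 03 20 00 00 00 00 00 00
a0: 10 00 02 00 c2 8c 00 10 30 28 19 00 41 6c 03 f7
"""

CAP_NAMES = {0x01: "Power Management", 0x05: "MSI",
             0x10: "PCI Express", 0x11: "MSI-X"}


def parse_dump(text):
    """Build a 256-byte config space image; rows not listed stay zero."""
    space = bytearray(256)
    for line in text.strip().splitlines():
        offset_text, data = line.split(":", 1)
        row = bytes.fromhex("".join(data.split()))
        offset = int(offset_text, 16)
        space[offset:offset + len(row)] = row
    return bytes(space)


def walk_capabilities(cfg):
    """Follow the capability pointer at 0x34; return [(offset, cap_id)]."""
    caps, seen = [], set()
    ptr = cfg[0x34] & 0xFC
    while ptr and ptr not in seen:  # 'seen' guards against a looping list
        seen.add(ptr)
        caps.append((ptr, cfg[ptr]))
        ptr = cfg[ptr + 1] & 0xFC
    return caps


cfg = parse_dump(DUMP)
vendor = int.from_bytes(cfg[0:2], "little")
device = int.from_bytes(cfg[2:4], "little")
print(f"{vendor:04x}:{device:04x}")
for offset, cap_id in walk_capabilities(cfg):
    print(f"[{offset:02x}] {CAP_NAMES.get(cap_id, hex(cap_id))}")
```

The decoded ID matches the pci:v00008086d000010C9 alias in the modinfo output, and the walk reproduces the four capabilities lspci lists at [40], [50], [70] and [a0].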

Comment 17 Qixiang Wan 2010-01-05 06:14:13 UTC
Created attachment 381706 [details]
Dom0 dmesg kernel-xen-2.6.18-164.el5bz547980v1.x86_64

Comment 18 Qixiang Wan 2010-01-05 06:14:58 UTC
Created attachment 381707 [details]
xm dmesg kernel-xen-2.6.18-164.el5bz547980v1.x86_64

Comment 20 Don Dutile (Red Hat) 2010-01-05 22:09:42 UTC
/my bad!
I edited a kernel git tree & forgot to commit before doing make srpm & brew build.
please try the following:

http://people.redhat.com/~ddutile/rhel5/bz547980/
     kernel-xen-2.6.18-164.el5bz547980v2.x86_64.rpm

(note: v2 version of patch.... now you know (one) reason why I tag my
       builds with a version number.... ;-) ).

Comment 22 Qixiang Wan 2010-01-06 03:41:07 UTC
(In reply to comment #20)
> /my bad!
> I edited a kernel git tree & forgot to commit before doing make srpm & brew
> build.
> please try the following:
> 
> http://people.redhat.com/~ddutile/rhel5/bz547980/
>      kernel-xen-2.6.18-164.el5bz547980v2.x86_64.rpm
> 
> (note: v2 version of patch.... now you know (one) reason why I tag my
>        builds with a version number.... ;-) ).  

This package works. The VFs are visible now.

please refer to the host dmesg and xm dmesg logs.

Comment 23 Qixiang Wan 2010-01-06 03:44:03 UTC
Created attachment 381905 [details]
Dom0 dmesg 2.6.18-164.el5bz547980v2xen

Comment 24 Qixiang Wan 2010-01-06 03:45:15 UTC
Created attachment 381906 [details]
xm dmesg 2.6.18-164.el5bz547980v2xen

Comment 25 Don Dutile (Red Hat) 2010-01-06 22:38:36 UTC
So, for regression-management reasons, the general consensus is to
implement a platform-level/enabled quirk for this case.

Can the reporter please attach the dmidecode output for this platform
to this bz?

thanks..

Comment 26 Qixiang Wan 2010-01-07 02:48:49 UTC
Created attachment 382137 [details]
output of dmidecode

Comment 27 Don Dutile (Red Hat) 2010-01-28 16:04:30 UTC
A -185 kernel-xen with the POSTED patch can be pulled from this location:

http://people.redhat.com/ddutile/rhel5/bz547980/kernel-xen-2.6.18-185.el5bz547980v4.x86_64.rpm

Comment 28 Don Dutile (Red Hat) 2010-01-29 16:15:23 UTC
This patch is recommended to be backported to the 5.4-z stream.

It appears that this problem is becoming common on platforms
that support Intel virtualization and SR-IOV.  Few platforms (BIOSes)
used to support SR-IOV, so the problem wasn't visible; more and more
BIOS updates now include SR-IOV support (scanning a PCI device's
extended PCI config space and providing mapping space for VFs on
physical PCI devices).  For example, when I first tested an HP z800,
it did not have BIOS VF support, so I could not do VF device
assignment (aka pass-through) testing on it.  This bug was found
on an HP z800, and the one I tested the patch on had evidently been updated,
since it showed this bug and the patch was confirmed on it; it is the
same (rhts/beaker) system I couldn't test on 3 months back.

I've been pinged by 3 other bz's for this patch to work around
this issue, to enable them to resolve other virt/VF issues,
but this one stops them from the get-go, before they can debug
the other virt bugs.

So, to avoid a (relatively small) wave/storm of bz's with this problem,
it's prudent to backport to 5.4's zstream to limit customer issues.

Note: The (final) patch was developed to reduce exposure to regression
      on rhel5 virt systems -- only on xen kernels w/pci_pt_e820_access=on
      set, which avoids changing rhel5 behavior on
      (a) bare metal kernel
      (b) xen kernels that are not doing PCI VF device assignment.
    Additionally, the patch adds a kernel param to defeat the bug fix
    if BIOS has VF mapping correct, but ACPI spec of PCI's max bus number
    busted, so there is in-field option to defeat this workaround if 
    some perverse condition occurs that wasn't thought of in this patch.
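
The gating described in this note can be sketched roughly as below. It assumes the final patch keeps the comment #14 approach of sizing the mapping from the BIOS-reported End Bus Number, which this thread does not confirm. The function and argument names are hypothetical, and the override kernel parameter is modelled as a plain boolean because its name is not given here.

```python
FULL_END_BUS = 0xFF  # existing behaviour: map config space for all 256 buses


def mapped_end_bus(bios_end_bus, is_xen_kernel, pci_pt_e820_access,
                   workaround_disabled=False):
    """Pick the last bus number covered by the MMCONFIG mapping.

    The BIOS-reported value is used only on a xen kernel booted with
    pci_pt_e820_access=on, and only if the in-field override has not
    switched the workaround off. Everything else keeps the old size.
    """
    if is_xen_kernel and pci_pt_e820_access and not workaround_disabled:
        return bios_end_bus
    return FULL_END_BUS


print(hex(mapped_end_bus(0x7F, True, True)))         # xen + pci_pt_e820_access
print(hex(mapped_end_bus(0x7F, False, False)))       # bare metal unchanged
print(hex(mapped_end_bus(0x7F, True, True, True)))   # workaround defeated
```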

Comment 34 Don Dutile (Red Hat) 2010-02-11 15:54:52 UTC
*** Bug 563539 has been marked as a duplicate of this bug. ***

Comment 36 Xu Jiajun 2010-03-02 06:47:11 UTC
I tested on Westmere-HEDT with RHEL5.5 GA Snapshot 2; the issue is fixed on this platform.

Xen Version:
xen-3.0.3-105.el5

Comment 39 errata-xmlrpc 2010-03-30 06:51:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

