Bug 679642 - x3100 can't generate vfs on AMD host
Summary: x3100 can't generate vfs on AMD host
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Alex Williamson
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: Rhel6KvmTier1
Reported: 2011-02-23 04:10 UTC by Chao Yang
Modified: 2011-03-01 15:54 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-03-01 03:49:51 UTC
Target Upstream Version:
Embargoed:


Attachments:
device message (1.77 KB, text/plain), 2011-02-23 04:10 UTC, Chao Yang
device message after passing func_mode=7 to vxge (2.49 KB, text/plain), 2011-03-01 02:43 UTC, Chao Yang
lspci -vvv info (87.22 KB, text/plain), 2011-03-01 02:44 UTC, Chao Yang
device message after reboot (72.77 KB, text/plain), 2011-03-01 02:45 UTC, Chao Yang
pci tree (5.58 KB, text/plain), 2011-03-01 05:29 UTC, Chao Yang
pci info (78.01 KB, text/plain), 2011-03-01 05:37 UTC, Chao Yang

Description Chao Yang 2011-02-23 04:10:39 UTC
Created attachment 480326 [details]
device message

Description of problem:


Version-Release number of selected component (if applicable):
AMD host & Magny-cours & X3100

# uname -r
2.6.32-118.el6.x86_64
# lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    1
Core(s) per socket:    12
CPU socket(s):         2
NUMA node(s):          4
Vendor ID:             AuthenticAMD
CPU family:            16
Model:                 9
Stepping:              1
CPU MHz:               1900.321
BogoMIPS:              3800.37
Virtualization:        AMD-V
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              5118K
NUMA node0 CPU(s):     0,2,4,6,8,10
NUMA node1 CPU(s):     12,14,16,18,20,22
NUMA node2 CPU(s):     13,15,17,19,21,23
NUMA node3 CPU(s):     1,3,5,7,9,11

How reproducible:
100%

Steps to Reproduce:
1. Compile REL_2.0.28.21260_LX_SRC-vxge and install vxge.ko:
# modinfo vxge
filename:       /lib/modules/2.6.32-118.el6.x86_64/updates/drivers/net/vxge/vxge.ko
description:    Neterion's X3100 Series 10GbE PCIe I/OVirtualized Server Adapter
version:        2.0.28.21260-p3.0.1.2
license:        Dual BSD/GPL
srcversion:     F0FEAD03AAAD881453187F7
alias:          pci:v000017D5d00005833sv*sd*bc*sc*i*
alias:          pci:v000017D5d00005733sv*sd*bc*sc*i*
depends:        
vermagic:       2.6.32-118.el6.x86_64 SMP mod_unload modversions 
parm:           intr_type:int
parm:           vlan_tag_strip:int
parm:           promisc_en:int
parm:           promisc_all_en:int
parm:           rec_all_vid:int
parm:           max_config_vpath:int
parm:           max_mac_vpath:int
parm:           max_config_dev:int
parm:           func_mode:int
parm:           fw_upgrade:int
parm:           factory_default:int
parm:           port_mode:int
parm:           port_behavior:int
parm:           l2_switch:int
parm:           catch_basin_mode:int
parm:           port_failure:int
parm:           bw:array of int
parm:           tx_bw:array of int
parm:           rx_bw:array of int
parm:           priority:array of int
parm:           napi:int
parm:           lro:int
parm:           rx_steering_type:int
parm:           tx_steering_type:int
parm:           tx_pause_enable:int
parm:           rx_pause_enable:int
parm:           exec_mode:int
parm:           intr_adapt:int
parm:           udp_stream:int

2. Generate VFs by:
# modprobe -r vxge;modprobe vxge func_mode=2
func_mode:
         Changes the PCI function mode.
         0  - SF1_VP17 (1 function, 17 Vpaths)
         1  - MF8_VP2  (8 functions, 2 Vpaths each)
         2  - SR17_VP1 (17 VFs with 1 Vpath each)
         3  - MR17_VP1 (17 Virtual Hierarchies, 1 Vpath/Function/Hierarchy)
         4  - MR8_VP2  (8 Virtual Hierarchies, 2 Vpath/Function/Hierarchy)
         5  - MF17_VP1 (17 functions, 1 vpath each (PCIe ARI))
         6  - SR8_VP2  (1PF, 7VF, 2 Vpaths each)
         7  - SR4_VP4  (1PF, 3VF, 4 Vpaths each)
         8  - MF2_VP8  (2 functions, 8 Vpaths each)
         9  - MF4_VP4  (4 Functions, 4 Vpaths each)
         10 - MR4_VP4  (4 Virtual Hierarchies, 4 Vpaths/Function/Hierarchy)

  
Actual results:
No VFs are generated.

Expected results:
The X3100 works in the selected mode.

Additional info:
# lspci -vvv -t
-+-[0000:20]-+-00.0  ATI Technologies Inc RD890 Northbridge only dual slot (2x8) PCI-e GFX Hydra part
 |           +-00.2  ATI Technologies Inc Device 5a23
 |           +-02.0-[21]--
 |           +-03.0-[22]--
 |           \-0b.0-[23]--
 \-[0000:00]-+-00.0  ATI Technologies Inc RD890 PCI to PCI bridge (external gfx0 port A)
             +-00.2  ATI Technologies Inc Device 5a23
             +-02.0-[01]--+-00.0  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
             |            \-00.1  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
             +-03.0-[02]--+-00.0  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
             |            \-00.1  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
             +-04.0-[03-08]----00.0-[04-08]--+-00.0-[05]----00.0  LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
             |                               +-01.0-[06]--
             |                               +-04.0-[07]--
             |                               \-05.0-[08]--
             +-09.0-[09]----00.0  Exar Corp. X3100 Series 10 Gigabit Ethernet PCIe    (X3100 is here) <<<<<<<-------------------------------------------------



# lspci -vvv -s 00:09.0|grep -i ari
        BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ ARIFwd+
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- ARIFwd+

# dmesg |grep -i sriov
eth4: SRIOV 17 - 17 VF, 1 vpath per VF Enabled
# dmesg |grep -i vxge
vxge 0000:09:00.0: eth50: Link Down
vxge 0000:09:00.0: PCI INT A disabled
vxge: Unknown parameter `func_mode'
vxge: Copyright(c) 2002-2010 Exar Inc.
vxge: Driver version: 2.0.28.21260-p3.0.1.2
vxge 0000:09:00.0: PCI INT A -> Link[LN48] -> GSI 48 (level, high) -> IRQ 48
vxge 0000:09:00.0: setting latency timer to 64
vxge 0000:09:00.0: not enough MMIO resources for SR-IOV

Comment 3 Chao Yang 2011-02-23 08:13:08 UTC
Additional info:
This x3100 card can generate VFs on a Westmere platform, and an 82576 card works fine in this AMD host.

Comment 5 Alex Williamson 2011-02-28 14:00:26 UTC
(In reply to comment #0)
> # modprobe -r vxge;modprobe vxge func_mode=2
> func_mode:
>          Changes the PCI function mode.
>          0  - SF1_VP17 (1 function, 17 Vpaths)
>          1  - MF8_VP2  (8 functions, 2 Vpaths each)
>          2  - SR17_VP1 (17 VFs with 1 Vpath each)
>          3  - MR17_VP1 (17 Virtual Hierarchies, 1 Vpath/Function/Hierarchy)
>          4  - MR8_VP2  (8 Virtual Hierarchies, 2 Vpath/Function/Hierarchy)
>          5  - MF17_VP1 (17 functions, 1 vpath each (PCIe ARI))
>          6  - SR8_VP2  (1PF, 7VF, 2 Vpaths each)
>          7  - SR4_VP4  (1PF, 3VF, 4 Vpaths each)
>          8  - MF2_VP8  (2 functions, 8 Vpaths each)
>          9  - MF4_VP4  (4 Functions, 4 Vpaths each)
>          10 - MR4_VP4  (4 Virtual Hierarchies, 4 Vpaths/Function/Hierarchy)
> 
> 
> Actual results:
> fail to generate any vf.
> 
> Expected results:
> x3100 can work with the mode I give.

In reality, I find that the "bigger" modes this card supports only work on very few systems because each VF requires non-trivial MMIO, and the BIOS typically does not open the bridge aperture wide enough.

> # dmesg |grep -i sriov
> eth4: SRIOV 17 - 17 VF, 1 vpath per VF Enabled
> # dmesg |grep -i vxge
> vxge 0000:09:00.0: eth50: Link Down
> vxge 0000:09:00.0: PCI INT A disabled
> vxge: Unknown parameter `func_mode'
> vxge: Copyright(c) 2002-2010 Exar Inc.
> vxge: Driver version: 2.0.28.21260-p3.0.1.2
> vxge 0000:09:00.0: PCI INT A -> Link[LN48] -> GSI 48 (level, high) -> IRQ 48
> vxge 0000:09:00.0: setting latency timer to 64
> vxge 0000:09:00.0: not enough MMIO resources for SR-IOV

This is the indicator for that occurring.  This may work in other systems that have better sriov support in the bios and it may work in this system by configuring a mode with fewer VFs.  Please provide full lspci -vvv for the system so we can see the bridge apertures.  Please also test with func_mode=7 to reduce the number of VFs generated.  This appears to be a platform BIOS issue.
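
As a rough illustration of checking for that indicator, the sketch below scans saved dmesg text for the message and reports which PCI addresses logged it. The function name and the sample text are illustrative assumptions; only the message string itself is taken from the log above.

```python
import re

# Message text copied from the dmesg output quoted above.
SRIOV_MMIO_ERROR = "not enough MMIO resources for SR-IOV"

# Matches a PCI address of the form domain:bus:device.function.
PCI_ADDRESS = re.compile(r"\b([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\b")

def devices_short_of_mmio(dmesg_text):
    """Return the PCI addresses whose log lines carry the SR-IOV MMIO error."""
    hits = []
    for line in dmesg_text.splitlines():
        if SRIOV_MMIO_ERROR in line:
            match = PCI_ADDRESS.search(line)
            if match and match.group(1) not in hits:
                hits.append(match.group(1))
    return hits

# Sample lines taken from the report; a real check would read saved dmesg output.
sample = """\
vxge: Driver version: 2.0.28.21260-p3.0.1.2
vxge 0000:09:00.0: setting latency timer to 64
vxge 0000:09:00.0: not enough MMIO resources for SR-IOV
"""
print(devices_short_of_mmio(sample))
```

An empty list only means the message was not logged; it does not prove that the VFs were created.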

Comment 6 Don Dutile (Red Hat) 2011-02-28 14:51:40 UTC
(In reply to comment #3)
> Additional info:
> this x3100 card can generate vf in west-mere platform and 82576 card works fine
> in this AMD host.

The Westmere platform has a "strong" BIOS for SR-IOV devices, i.e., it can make PCI bridge windows large enough for big-mem SR-IOV VF devices.  82576 VFs use a small amount of memory-mapped space, and often 'squeeze' into the leftover space in a PCI bridge window (which is required to map on multiples of 1MB).

On the other hand, we've had numerous AMD boxes that don't support SRIOV well at all, so as Alex stated in previous comment, this looks like a BIOS issue with your AMD box.

Comment 7 Chao Yang 2011-03-01 02:42:26 UTC
(In reply to comment #5)
>  This may work in other systems that
> have better sriov support in the bios and it may work in this system by
> configuring a mode with fewer VFs.  Please provide full lspci -vvv for the
> system so we can see the bridge apertures.  Please also test with func_mode=7
> to reduce the number of VFs generated.  This appears to be a platform BIOS
> issue.
I think it also failed with func_mode=7: after rebooting, the host still cannot see any VFs. I will attach dmesg and lspci -vvv output.

# modprobe -r vxge;modprobe vxge func_mode=7
# lspci|grep Eth
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
09:00.0 Ethernet controller: Exar Corp. X3100 Series 10 Gigabit Ethernet PCIe (rev 02)

Comment 8 Chao Yang 2011-03-01 02:43:54 UTC
Created attachment 481484 [details]
device message after passing func_mode=7 to vxge

Comment 9 Chao Yang 2011-03-01 02:44:32 UTC
Created attachment 481485 [details]
lspci -vvv info

Comment 10 Chao Yang 2011-03-01 02:45:24 UTC
Created attachment 481486 [details]
device message after reboot

Comment 11 Alex Williamson 2011-03-01 03:19:12 UTC
(In reply to comment #8)
> Created attachment 481484 [details]
> device message after passing func_mode=7 to vxge

Most of this log is for mode 2; it doesn't seem to include anything after reboot for mode 7.

Comment 12 Alex Williamson 2011-03-01 03:20:39 UTC
(In reply to comment #11)
> (In reply to comment #8)
> > Created attachment 481484 [details]
> > device message after passing func_mode=7 to vxge
> 
> Most of this log is for mode 2, it doesn't seem to include anything after
> reboot for mode 7.

Apologies, I see the new dmesg in a later attachment.

Comment 13 Alex Williamson 2011-03-01 03:49:51 UTC
It seems pretty clear that this BIOS isn't even attempting to open the bridge apertures for sr-iov devices.  The parent bridge is this:

00:09.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port H) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=09, subordinate=09, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: ef300000-ef3fffff
	Prefetchable memory behind bridge: 00000000e4800000-00000000e57fffff

So we have 1MB of MMIO and 16MB of prefetchable MMIO.  The PF uses:

09:00.0 Ethernet controller: Exar Corp. X3100 Series 10 Gigabit Ethernet PCIe (rev 02)
	Subsystem: Exar Corp. X3120 Dual Port 10GBase-CR
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 48
	Region 0: Memory at e4800000 (64-bit, prefetchable) [size=8M]
	Region 2: Memory at e57fc000 (64-bit, prefetchable) [size=8K]
	Region 4: Memory at e57fe000 (64-bit, prefetchable) [size=8K]
	Expansion ROM at ef380000 [disabled] [size=512K]

8M + 16k of prefetchable MMIO, so the bridge has the minimum aperture for just the PF, and it would be pure luck if the VFs had room here.
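
The sizes quoted here can be re-derived from the lspci ranges. The following is a minimal sketch: the hex ranges and BAR sizes are copied from the output above, while the helper name `window_size` is an illustrative assumption.

```python
KIB = 1024
MIB = 1024 * KIB

def window_size(range_text):
    """Size in bytes of an inclusive 'start-end' hex range as printed by lspci."""
    start, end = (int(part, 16) for part in range_text.split("-"))
    return end - start + 1

# Bridge 00:09.0 windows, copied from the lspci output above.
non_prefetchable = window_size("ef300000-ef3fffff")
prefetchable = window_size("00000000e4800000-00000000e57fffff")

# PF BARs on 09:00.0: regions 0, 2 and 4.
pf_bars = 8 * MIB + 8 * KIB + 8 * KIB

# Prefetchable space left once the PF is placed.
left_over = prefetchable - pf_bars

print(non_prefetchable // MIB, "MiB non-prefetchable")
print(prefetchable // MIB, "MiB prefetchable")
print(left_over // KIB, "KiB left after the PF BARs")
```

The leftover is just under 8MB, so not even a single 8M VF BAR0 fits in the prefetchable window next to the PF.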

Each of the VFs requires 3 prefetchable ranges:

	Capabilities: [170] Single Root I/O Virtualization (SR-IOV)
		IOVCap:	Migration-, Interrupt Message Number: 000
		IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy+
		IOVSta:	Migration-
		Initial VFs: 16, Total VFs: 16, Number of VFs: 16, Function Dependency Link: 00
		VF offset: 1, stride: 1, Device ID: 5833
		Supported Page Size: 000007ff, System Page Size: 00000001
		Region 0: Memory at 0000000000000000 (64-bit, prefetchable)
		Region 2: Memory at 00000000e5000000 (64-bit, prefetchable)
		Region 4: Memory at 00000000e5020000 (64-bit, prefetchable)
		VF Migration: offset: 00000000, BIR: 0

We can see that BAR0 didn't get mapped, and BARs 2 & 4 aren't even within the bridge aperture.  We could use setpci to figure out the size of these:

setpci -s 09:00.0 194.l
<report address, as above>
setpci -s 09:00.0 194.l=ffffffff
setpci -s 09:00.0 194.l
<report size mask>
setpci -s 09:00.0 194.l=<restore value reported above>

Repeat for offsets 19c and 1a4.  I don't think this is really necessary though since it's pretty obvious that this BIOS isn't leaving room for the VFs.  Testing needs to be done on an AMD based system with sufficient SR-IOV support in the BIOS.
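
For reference, the size mask reported by the second read can be decoded as sketched below, assuming the standard PCI memory BAR sizing convention: the low four bits are type flags, and the size is the two's complement of the remaining mask. The readback values in the example are illustrative, not measured on this system, and `bar_size` is a hypothetical helper name.

```python
def bar_size(mask_low, mask_high=None):
    """Decode a memory BAR size from the value read back after writing all ones.

    mask_low  -- low dword read back (flag bits still present)
    mask_high -- upper dword of a 64-bit BAR, or None if only the low
                 dword was probed (as in the setpci commands above)
    """
    low = mask_low & 0xFFFFFFF0          # bits 3:0 are type/prefetch flags
    if mask_high is None:
        mask, width = low, 32
    else:
        mask, width = (mask_high << 32) | low, 64
    if mask == 0:
        return 0                         # BAR not implemented
    return (~mask & ((1 << width) - 1)) + 1

# Illustrative readbacks for 64-bit prefetchable BARs (low nibble 0xc).
print(bar_size(0xFFFFE00C))              # an 8k BAR
print(bar_size(0xFF80000C, 0xFFFFFFFF))  # an 8M BAR
```

Probing only the low dword is enough for BARs smaller than 4GB, which is why the procedure above does not touch the upper halves.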

Comment 14 Alex Williamson 2011-03-01 04:03:39 UTC
(In reply to comment #13)
> We can see that BAR0 didn't get mapped, and BARs 2 & 4 aren't even within the
> bridge aperture.

Correction, BARs 2 & 4 do fit in the bridge aperture.  IIRC, each VF has the same resource requirements as the PF, 8MB + 8k + 8k.  The smaller BARs fit; 16 8k BARs in the 128k from e5000000 - e501ffff and 16 8k BARs from e5020000 - e503ffff.  The 16 8M BAR0s would require 128MB on their own.  With the smaller VF BARs and the PF BARs and PCI alignment, the BIOS would need to open the prefetchable aperture on 00:09.0 to 256MB.
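
The arithmetic in this comment can be checked with a short sketch. Rounding the total up to the next power of two is an assumption about what "PCI alignment" means here; the comment itself does not spell out how 256MB was derived.

```python
KIB = 1024
MIB = 1024 * KIB

def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

num_vfs = 16

# Two sets of 16 small VF BARs, 8k each.
small_bars = num_vfs * 8 * KIB
# The 128k blocks named in the comment end exactly where stated.
assert 0xE5000000 + small_bars - 1 == 0xE501FFFF
assert 0xE5020000 + small_bars - 1 == 0xE503FFFF

# 16 VF BAR0s at 8M each, plus the PF's own BARs.
big_bars = num_vfs * 8 * MIB
pf_bars = 8 * MIB + 8 * KIB + 8 * KIB

total = big_bars + 2 * small_bars + pf_bars
aperture = next_power_of_two(total)

print(big_bars // MIB, "MiB for the VF BAR0s")
print(aperture // MIB, "MiB aperture after rounding up")
```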

Comment 15 Chao Yang 2011-03-01 04:58:43 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > We can see that BAR0 didn't get mapped, and BARs 2 & 4 aren't even within the
> > bridge aperture.
> 
> Correction, BARs 2 & 4 do fit in the bridge aperture.  IIRC, each VF has the
> same resource requirements as the PF, 8MB + 8k + 8k.  The smaller BARs fit; 16
> 8k BARs in the 128k from e5000000 - e501ffff and 16 8k BARs from e5020000 -
> e503ffff.  The 16 8M BAR0s would require 128MB on their own.  With the smaller
> VF BARs and the PF BARs and PCI alignment, the BIOS would need to open the
> prefetchable aperture on 00:09.0 to 256MB.

Alex,
 We already know the 82576 works fine on this AMD host, so my question is: what is the difference between the 82576 and the x3100? That is, how can we determine whether a BIOS gives sufficient support to an SR-IOV-capable NIC? Your answer will really help us a lot. Thank you in advance!

Comment 16 Chao Yang 2011-03-01 05:27:32 UTC
(In reply to comment #15)
Alex,
 As your comment above says, the BIOS would need to open the prefetchable aperture on 00:09.0 to 256MB. But on another machine, where the x3100 generates VFs successfully, the prefetchable memory range is only 00000000e0000000-00000000e08fffff, so I am confused. Could you please explain? I will attach lspci -vvv info.

Comment 17 Chao Yang 2011-03-01 05:29:57 UTC
Created attachment 481515 [details]
pci tree

# lspci -vvv -s 00:01.0
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: e4400000-ec7fffff
	Prefetchable memory behind bridge: 00000000e0000000-00000000e08fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Subsystem: Hewlett-Packard Company Device 130a
	Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit-
		Address: fee00000  Data: 4061
		Masking: 00000002  Pending: 00000000
	Capabilities: [90] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ RBE+ FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 256 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Latency L0 <512ns, L1 <64us
			ClockPM- Surprise+ LLActRep+ BwNot+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise-
			Slot #  0, PowerLimit 0.000000; Interlock- NoCompl-
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd Off, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range BCD, TimeoutDis+ ARIFwd+
		DevCtl2: Completion Timeout: 260ms to 900ms, TimeoutDis- ARIFwd+
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB
	Capabilities: [e0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [150] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
	Capabilities: [160] Vendor Specific Information <?>
	Kernel driver in use: pcieport
	Kernel modules: shpchp

Comment 18 Chao Yang 2011-03-01 05:37:09 UTC
Created attachment 481516 [details]
pci info

Comment 19 Alex Williamson 2011-03-01 05:54:50 UTC
(In reply to comment #15)
> 
> Alex,
>  We already know 82576 works fine on this AMD host, so my question is what's
> the difference between 82576 and x3100? I mean how to determine whether a BIOS
> will give a sufficient support to SRIOV capability nic card? Your answer will
> really help us a lot, thank you in advance!

It's simply a matter of the resource requirements.  An 82576 PF has 2 non-prefetchable BARs, 128k & 16k.  For a typical dual-port 82576, factoring PCI alignment, the BIOS needs to program the bridge aperture to at least 512k.  Each VF for the 82576 requires 2 16k BARs.  For a dual port, 82576 w/ 7 VFs per PF, that's 128k * 2 + 16k * 2 + 16k * 7 + 16k * 7 = 512k.

So, all the VFs for a dual port card will fit into the extra space left over by the alignment requirements for the PF, and it should work even if the BIOS has no SR-IOV support.  Note that the PCI spec actually requires a minimum granularity of 1M for prefetchable and non-prefetchable apertures, so there's actually more than enough space.
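
The sum above can be reproduced directly. The sketch below copies the terms exactly as written in the comment and makes no independent claim about the 82576 BAR layout.

```python
KIB = 1024
MIB = 1024 * KIB

# Terms copied as written: 128k * 2 + 16k * 2 + 16k * 7 + 16k * 7
pf_bars = 128 * KIB * 2 + 16 * KIB * 2
vf_bars = 16 * KIB * 7 + 16 * KIB * 7
total = pf_bars + vf_bars

print(total // KIB, "KiB in total")
print("fits in a 1M bridge window:", total <= 1 * MIB)
```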

The only resource contention I can imagine in setting up the VFs for an 82576 would be if the device shares a bus with other devices, which might infringe on the extra space.  Perhaps you could see this on a system where the 82576 is an integrated device.

The x3100, on the other hand, needs 16x the minimum alignment of the PF to support all of the VFs.  The BIOS must support SR-IOV to enable this device.  I suspect the massive resource requirements play a part in why Exar chose to support MF modes, which are supported by non-SR-IOV aware BIOSes.

Comment 20 Alex Williamson 2011-03-01 06:24:34 UTC
(In reply to comment #17)
> 00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root
> Port 1 (rev 13) (prog-if 00 [Normal decode])
>  Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
>  I/O behind bridge: 0000f000-00000fff
>  Memory behind bridge: e4400000-ec7fffff
>  Prefetchable memory behind bridge: 00000000e0000000-00000000e08fffff

03:00.0 Ethernet controller: Exar Corp. X3100 Series 10 Gigabit Ethernet PCIe (rev 02)
	Subsystem: Exar Corp. X3120 Dual Port 10GBase-CR
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 28
	Region 0: Memory at e0000000 (64-bit, prefetchable) [size=8M]
	Region 2: Memory at e0800000 (64-bit, prefetchable) [size=8K]
	Region 4: Memory at e0802000 (64-bit, prefetchable) [size=8K]
	Capabilities: [170] Single Root I/O Virtualization (SR-IOV)
		IOVCap:	Migration-, Interrupt Message Number: 000
		IOVCtl:	Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
		IOVSta:	Migration-
		Initial VFs: 16, Total VFs: 16, Number of VFs: 16, Function Dependency Link: 00
		VF offset: 1, stride: 1, Device ID: 5833
		Supported Page Size: 000007ff, System Page Size: 00000001
		Region 0: Memory at 00000000e4800000 (64-bit, prefetchable)
                                            ^^^^^^^^
		Region 2: Memory at 00000000e0804000 (64-bit, prefetchable)
		Region 4: Memory at 00000000e0824000 (64-bit, prefetchable)
		VF Migration: offset: 00000000, BIR: 0

In this case, note where the VF Region 0 memory is allocated.  This comes from the non-prefetchable memory range of the bridge (note it's valid to use non-prefetchable bridge ranges for prefetchable device ranges, but not the reverse).  So in this configuration, the 8MB VF BARs come out of the range e4800000 - ec800000 and the 8k BARs are all allocated out of the prefetchable range of the bridge.  I'm not sure why the BIOS opened an extra 4MB of non-prefetchable aperture.

Also, in coming up with 256MB, I was assuming normal PCI natural alignment for resources.  Bridges actually have 1MB granularity, which doesn't need to be naturally aligned (as highlighted by this 132MB range above).  So for VFs, we actually need 16 * 8k + 16 * 8k + 16 * 8M and the PF needs 8k + 8k + 8M, which can all fit in 137MB.  The above does it as 9MB of prefetchable + 128MB of non-prefetchable (+ 4MB unallocated under the bridge).
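
A minimal sketch of this accounting follows. The BAR sizes and bridge ranges are taken from the comment and the quoted lspci output; the helper names `round_up` and `window_size` are illustrative assumptions.

```python
KIB = 1024
MIB = 1024 * KIB

def round_up(value, granule):
    """Round value up to the next multiple of granule."""
    return -(-value // granule) * granule

def window_size(range_text):
    """Size in bytes of an inclusive 'start-end' hex range as printed by lspci."""
    start, end = (int(part, 16) for part in range_text.split("-"))
    return end - start + 1

# Requirement at 1MB bridge granularity, without natural alignment.
vf_total = 16 * 8 * KIB + 16 * 8 * KIB + 16 * 8 * MIB
pf_total = 8 * KIB + 8 * KIB + 8 * MIB
needed = round_up(vf_total + pf_total, MIB)

# Windows the BIOS opened on bridge 00:01.0 of the working system.
prefetchable = window_size("00000000e0000000-00000000e08fffff")
non_prefetchable = window_size("e4400000-ec7fffff")
unallocated = non_prefetchable - 16 * 8 * MIB

print(needed // MIB, "MiB needed")
print(prefetchable // MIB, "MiB prefetchable window")
print(non_prefetchable // MIB, "MiB non-prefetchable window")
print(unallocated // MIB, "MiB left unallocated")
```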

Comment 21 Chao Yang 2011-03-01 06:47:35 UTC
(In reply to comment #20)

Alex

Thanks a lot, very useful.
BTW, based on your comments, we still have a problem: we cannot tell whether a host supports SR-IOV before we actually use it, and we may even need to calculate the prefetchable memory ourselves. Could you please give a more specific suggestion about SR-IOV hardware prerequisites? Thanks again.

Best Regards,
Chayang

Comment 22 Alex Williamson 2011-03-01 15:54:43 UTC
(In reply to comment #21)
> BTW,based on your comments,we still have a problem,we can not make sure host
> whether or not support sr-iov before we real use it,even more,maybe we need to
> calculate prefetchable momory.would you please give me a more specific
> suggestion about sr-iov HW prerequisites?thanks again.

Don is probably better able to comment on the specific hardware requirements as far as things like ARI/ACS in the chipset.  Unfortunately on the BIOS side, there seems to be little we can do other than try it and use analysis like above to verify that if it doesn't work, it's because the BIOS isn't mapping sufficient resources.  Ideally we can also let the hardware vendors know about these problems.  It might be a good goal to make something like biosbits.org specifically test for these kinds of issues.
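
As a rough way to apply this kind of analysis before testing, the sketch below estimates the aperture each SR-IOV func_mode would need and compares it with what the AMD host's bridge 00:09.0 offers. It rests on several assumptions: every function needs 8M + 8k + 8k (the IIRC figure from comment 14), the bridge uses 1MB granularity, the VF counts for modes 6 and 7 come from the driver's help text, and 16 VFs for mode 2 is the "Total VFs" value lspci reported. It ignores the expansion ROM and any other devices behind the bridge.

```python
KIB = 1024
MIB = 1024 * KIB

# Per-function BAR requirement (assumed equal for the PF and each VF).
PER_FUNCTION = 8 * MIB + 8 * KIB + 8 * KIB

# VF counts per SR-IOV func_mode (see the assumptions in the text above).
VFS_PER_MODE = {2: 16, 6: 7, 7: 3}

def required_mib(num_vfs):
    """MiB needed for one PF plus num_vfs VFs at 1 MiB bridge granularity."""
    total = (1 + num_vfs) * PER_FUNCTION
    return -(-total // MIB)  # ceiling division

# Bridge 00:09.0 on the AMD host: 1 MiB non-prefetchable + 16 MiB prefetchable.
available_mib = 1 + 16

for mode, vfs in sorted(VFS_PER_MODE.items()):
    need = required_mib(vfs)
    verdict = "fits" if need <= available_mib else "does not fit"
    print(f"func_mode={mode}: needs about {need} MiB, {verdict} in {available_mib} MiB")
```

Under these assumptions even func_mode=7 needs more than the bridge provides, which is consistent with the failure reported in comment 7.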

