Bug 544366 - KVM PCI device assignment has broken PCI config space emulation
Summary: KVM PCI device assignment has broken PCI config space emulation
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.5
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 5.7
Assignee: Don Dutile (Red Hat)
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 507711 Rhel5KvmTier2 618260 624790 635660 643901
Reported: 2009-12-04 18:09 UTC by Chris Wright
Modified: 2013-01-09 22:05 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 643901
Environment:
Last Closed: 2011-07-28 21:25:10 UTC
Target Upstream Version:
Embargoed:


Description Chris Wright 2009-12-04 18:09:16 UTC
Description of problem:

When a PCI device is assigned to a guest, the device-dependent PCI config space emulation is broken.  KVM hardcodes the offsets of the PCI capabilities for MSI and MSI-X.  These offsets may conflict with existing capabilities or device registers, causing the guest OS device driver to fail.
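
For reference, a minimal Python sketch of how a device's own capability list can be read out of a 256-byte config space image, so that its layout can be compared with any fixed offsets. The register offsets (status at 0x06, capabilities pointer at 0x34) and the capability IDs come from the PCI specification. The helper names and the synthetic config space are assumptions made for illustration; the chain order only mirrors the lspci output in Comment 2 and is not a dump of a real device.

```python
PCI_STATUS = 0x06            # 16-bit status register
PCI_STATUS_CAP_LIST = 0x10   # bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34   # pointer to the first capability


def walk_capabilities(cfg):
    """Return (offset, capability_id) pairs in linked-list order."""
    if len(cfg) < 256:
        raise ValueError("expected at least 256 bytes of config space")
    status = cfg[PCI_STATUS] | (cfg[PCI_STATUS + 1] << 8)
    if not status & PCI_STATUS_CAP_LIST:
        return []
    caps, seen = [], set()
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC   # low two bits are reserved
    while pos >= 0x40 and pos not in seen:  # stop at 0 or on a loop
        seen.add(pos)
        caps.append((pos, cfg[pos]))        # byte 0: capability ID
        pos = cfg[pos + 1] & 0xFC           # byte 1: next pointer
    return caps


def make_example_config():
    """Synthetic (hypothetical) config space: PM, PCIe, MSI-X, VPD, MSI."""
    cfg = bytearray(256)
    cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST
    cfg[PCI_CAPABILITY_LIST] = 0x40
    chain = [(0x40, 0x01), (0x70, 0x10), (0xA0, 0x11), (0xC0, 0x03), (0x50, 0x05)]
    for i, (off, cap_id) in enumerate(chain):
        cfg[off] = cap_id
        cfg[off + 1] = chain[i + 1][0] if i + 1 < len(chain) else 0
    return bytes(cfg)


for off, cap_id in walk_capabilities(make_example_config()):
    print(f"[{off:02x}] id=0x{cap_id:02x}")
```

On a real host the 256 bytes would typically come from the device's sysfs `config` file rather than from a synthetic image.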

Version-Release number of selected component (if applicable):

>= kvm-83-105.el5

How reproducible:

This is 100% reproducible for devices that place capabilities or registers in the area where KVM puts the MSI and MSI-X capabilities.  The failure is therefore reliably reproducible, but device dependent.
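
Whether a given device is affected comes down to an interval-overlap check between the device's own config space regions and the ranges where the emulated capabilities are placed. The sketch below illustrates that check only. The device offsets are taken from the lspci output in Comment 2; the region lengths and, in particular, the emulated placements (0x40 and 0x58) are assumed placeholder values, not values taken from the KVM source.

```python
def find_conflicts(device_regions, emulated):
    """Return (emulated_name, device_name) pairs whose byte ranges overlap.

    Both arguments are iterables of (name, offset, length).
    """
    conflicts = []
    for e_name, e_off, e_len in emulated:
        for d_name, d_off, d_len in device_regions:
            # Half-open ranges [off, off + len) overlap if each starts
            # before the other ends.
            if e_off < d_off + d_len and d_off < e_off + e_len:
                conflicts.append((e_name, d_name))
    return conflicts


# Offsets from lspci in Comment 2; lengths are illustrative assumptions.
DEVICE_REGIONS = [
    ("Power Management", 0x40, 8),
    ("MSI", 0x50, 24),
    ("PCI Express", 0x70, 20),
    ("MSI-X", 0xA0, 12),
    ("Vital Product Data", 0xC0, 8),
]

# Hypothetical fixed placements, for illustration only.
EMULATED = [
    ("emulated MSI", 0x40, 24),
    ("emulated MSI-X", 0x58, 12),
]

for emu, dev in find_conflicts(DEVICE_REGIONS, EMULATED):
    print(f"{emu} overlaps device {dev}")
```

A device whose regions all fall outside the emulated ranges would produce an empty list, which matches the "device dependent" behavior described above.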

Steps to Reproduce:
1. launch guest with assigned PCI device
2. start device driver in guest
3. driver fails to initialize

Comment 2 Bob Sibley 2010-03-16 14:06:28 UTC
I have upgraded to the BIOS that AMD says supports IOMMU.

There is still an issue with 10G Neterion Inc. X3100 Series cards, with the following errors:
Mar 16 09:47:42 perf31 kernel: PCI: Failed to allocate mem resource #12:8000000@d0000000 for 0000:03:00.0
Mar 16 09:47:42 perf31 kernel: vxge 0000:03:00.0: not enough MMIO resources for SR-IOV

This is a known-good module: on an Intel system, SR-IOV works and the card is able to create VFs.

The 1G KUWALA cards are still functional and able to create VFs.

/var/log/messages:

Mar 16 09:47:12 perf31 kernel: AMD IOMMU: Using protection domain 24 for device 03:00.0
Mar 16 09:47:12 perf31 kernel: ACPI: PCI interrupt for device 0000:03:00.0 disabled
Mar 16 09:47:42 perf31 kernel: vxge: Copyright(c) 2002-2009 Neterion Inc
Mar 16 09:47:42 perf31 kernel: vxge: Driver version: 2.0.6.18937-k
Mar 16 09:47:42 perf31 kernel: PCI: Enabling device 0000:03:00.0 (0140 -> 0142)
Mar 16 09:47:42 perf31 kernel: ACPI: PCI Interrupt 0000:03:00.0[A] -> Link [LN32] -> GSI 32 (level, high) -> IRQ 138
Mar 16 09:47:42 perf31 kernel: PCI: Failed to allocate mem resource #12:8000000@d0000000 for 0000:03:00.0
Mar 16 09:47:42 perf31 kernel: vxge 0000:03:00.0: not enough MMIO resources for SR-IOV
Mar 16 09:47:42 perf31 kernel: eth0: SERIAL NUMBER: SXC0949004
Mar 16 09:47:42 perf31 kernel: eth0: PART NUMBER: X3110SR0001
Mar 16 09:47:42 perf31 kernel: eth0: Neterion X3110 Single-Port SR 10GbE Server Adapter
Mar 16 09:47:42 perf31 kernel: eth0: MAC ADDR: 00:0C:FC:01:00:C7
Mar 16 09:47:42 perf31 kernel: eth0: Link Width x8
Mar 16 09:47:42 perf31 kernel: eth0: Firmware version : 1.4.4 Date : 09/17/2009
Mar 16 09:47:42 perf31 kernel: eth0: Single Root IOV Mode Enabled
Mar 16 09:47:42 perf31 kernel: eth0: 1 Vpath(s) opened
Mar 16 09:47:42 perf31 kernel: eth0: Interrupt type MSI-X
Mar 16 09:47:42 perf31 kernel: eth0: RTH steering enabled for TCP_IPV4
Mar 16 09:47:42 perf31 kernel: eth0: Tx port steering enabled
Mar 16 09:47:42 perf31 kernel: eth0: Generic receive offload enabled
Mar 16 09:47:42 perf31 kernel: eth0: Rx doorbell mode enabled
Mar 16 09:47:42 perf31 kernel: eth0: VLAN tag stripping Enabled
Mar 16 09:47:42 perf31 kernel: eth0: Ring blocks : 2
Mar 16 09:47:42 perf31 kernel: eth0: Fifo blocks : 14
Mar 16 09:47:42 perf31 kernel: eth0: MTU is 1500


./lspci
03:00.0 Class 0200: Device 17d5:5833 (rev 02)
	Subsystem: Device 17d5:6030
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 138
	Region 0: Memory at ce800000 (64-bit, prefetchable) [size=8M]
	Region 2: Memory at ce201000 (64-bit, prefetchable) [size=4K]
	Region 4: Memory at ce200000 (64-bit, prefetchable) [size=256]
	[virtual] Expansion ROM at ce280000 [disabled] [size=512K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [70] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <2us, L1 <2us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x8, ASPM L0s L1, Latency L0 <256ns, L1 <4us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [a0] MSI-X: Enable+ Count=4 Masked-
		Vector table: BAR=2 offset=00000000
		PBA: BAR=2 offset=00000800
	Capabilities: [c0] Vital Product Data
		Not readable
	Capabilities: [50] MSI: Enable- Count=1/4 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 0
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [110] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [150] Power Budgeting <?>
	Capabilities: [160] Device Serial Number 00-0c-fc-00-00-01-00-c7
	Capabilities: [170] Single Root I/O Virtualization (SR-IOV)
		IOVCap:	Migration-, Interrupt Message Number: 000
		IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy+
		IOVSta:	Migration-
		Initial VFs: 16, Total VFs: 16, Number of VFs: 1, Function Dependency Link: 00
		VF offset: 1, stride: 1, Device ID: 5833
		Supported Page Size: 000007ff, System Page Size: 00000001
		Region 0: Memory at 0000000000000000 (64-bit, prefetchable)
		Region 2: Memory at 0000000000000000 (64-bit, prefetchable)
		Region 4: Memory at 0000000000000000 (64-bit, prefetchable)
		VF Migration: offset: 00000000, BIR: 0
	Capabilities: [1b0] Vendor Specific Information <?>
	Kernel driver in use: vxge
	Kernel modules: vxge
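
A quick arithmetic cross-check of the failed request against the lspci data, as a small Python sketch. It assumes (the log does not confirm either point) that resource #12 is the SR-IOV VF BAR aperture and that each VF's BAR 0 is the same size as the PF's 8M Region 0. Under those assumptions the aperture is the per-VF size multiplied by Total VFs, and the result can be compared with the 8000000 in the "Failed to allocate mem resource" line. The function name is hypothetical.

```python
MIB = 1 << 20


def vf_aperture_size(per_vf_bar_size, total_vfs):
    """Size of the contiguous MMIO window needed for one VF BAR across all VFs."""
    return per_vf_bar_size * total_vfs


# Assumed inputs: 8M per VF (PF Region 0 size), Total VFs: 16 from lspci.
size = vf_aperture_size(8 * MIB, 16)
print(hex(size))

# Memory BARs are naturally aligned, so the requested base must be a
# multiple of the size; 0xd0000000 from the log satisfies that.
print(0xD0000000 % size == 0)
```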



cat /proc/iomem
00010000-0009d7ff : System RAM
0009d800-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000cafff : Video ROM
000cb000-000cbfff : Adapter ROM
000d1800-000d19ff : Adapter ROM
000f0000-000fffff : System ROM
00100000-c7e7ffff : System RAM
  00200000-00484429 : Kernel code
  0048442a-005c9b8f : Kernel data
  08000000-0bffffff : GART
c7e80000-c7e8afff : ACPI Tables
c7e8b000-c7e8cfff : ACPI Non-volatile Storage
c7e8d000-c7ffffff : reserved
c8004000-c8007fff : amd_iommu
c8024000-c80243ff : 0000:00:11.0
  c8024000-c80243ff : ahci
c8024400-c80244ff : 0000:00:12.2
  c8024400-c80244ff : ehci_hcd
c8024800-c80248ff : 0000:00:13.2
  c8024800-c80248ff : ehci_hcd
c8025000-c8025fff : 0000:00:12.0
  c8025000-c8025fff : ohci_hcd
c8026000-c8026fff : 0000:00:12.1
  c8026000-c8026fff : ohci_hcd
c8027000-c8027fff : 0000:00:13.0
  c8027000-c8027fff : ohci_hcd
c8028000-c8028fff : 0000:00:13.1
  c8028000-c8028fff : ohci_hcd
c8029000-c8029fff : 0000:00:14.5
  c8029000-c8029fff : ohci_hcd
c8100000-c8bfffff : PCI Bus #01
  c8100000-c8103fff : 0000:01:00.0
    c8100000-c8103fff : igb
  c8104000-c8107fff : 0000:01:00.1
    c8104000-c8107fff : igb
  c8120000-c813ffff : 0000:01:00.0
    c8120000-c813ffff : igb
  c8140000-c815ffff : 0000:01:00.1
    c8140000-c815ffff : igb
  c8160000-c817ffff : 0000:01:00.0
    c8160000-c8163fff : 0000:01:10.0
      c8160000-c8163fff : kvm_assigned_device
    c8164000-c8167fff : 0000:01:10.2
      c8164000-c8167fff : igbvf
  c8180000-c819ffff : 0000:01:00.0
    c8180000-c8183fff : 0000:01:10.0
      c8180000-c8183fff : kvm_assigned_device
    c8184000-c8187fff : 0000:01:10.2
      c8184000-c8187fff : igbvf
  c81a0000-c81bffff : 0000:01:00.1
    c81a0000-c81a3fff : 0000:01:10.1
      c81a0000-c81a3fff : igbvf
    c81a4000-c81a7fff : 0000:01:10.3
      c81a4000-c81a7fff : igbvf
  c81c0000-c81dffff : 0000:01:00.1
    c81c0000-c81c3fff : 0000:01:10.1
      c81c0000-c81c3fff : igbvf
    c81c4000-c81c7fff : 0000:01:10.3
      c81c4000-c81c7fff : igbvf
  c8400000-c87fffff : 0000:01:00.0
    c8400000-c87fffff : igb
  c8800000-c8bfffff : 0000:01:00.1
    c8800000-c8bfffff : igb
ca000000-cdffffff : PCI Bus #02
  ca000000-cbffffff : 0000:02:00.0
    ca000000-cbffffff : bnx2
  cc000000-cdffffff : 0000:02:00.1
    cc000000-cdffffff : bnx2
ce000000-ce0fffff : PCI Bus #04
  ce000000-ce003fff : 0000:04:00.0
    ce000000-ce003fff : qla2xxx
  ce004000-ce007fff : 0000:04:00.1
    ce004000-ce007fff : qla2xxx
ce100000-ce1fffff : PCI Bus #05
  ce100000-ce10ffff : 0000:05:06.0
  ce120000-ce13ffff : 0000:05:06.0
ce200000-ceffffff : PCI Bus #03
  ce200000-ce2000ff : 0000:03:00.0
    ce200000-ce2000ff : vxge
  ce201000-ce201fff : 0000:03:00.0
    ce201000-ce201fff : vxge
  ce280000-ce2fffff : 0000:03:00.0
  ce800000-ceffffff : 0000:03:00.0
    ce800000-ceffffff : vxge
cf000000-cf7fffff : PCI Bus #01
  cf000000-cf3fffff : 0000:01:00.0
  cf400000-cf7fffff : 0000:01:00.1
cf800000-cf8fffff : PCI Bus #02
  cf800000-cf81ffff : 0000:02:00.0
  cf820000-cf83ffff : 0000:02:00.1
cf900000-cf9fffff : PCI Bus #03
cfa00000-cfafffff : PCI Bus #04
  cfa00000-cfa3ffff : 0000:04:00.0
  cfa40000-cfa7ffff : 0000:04:00.1
d0000000-d7ffffff : PCI Bus #05
  d0000000-d7ffffff : 0000:05:06.0
e0000000-efffffff : reserved
fec00000-fec0ffff : reserved
fee00000-fee00fff : reserved
fff00000-ffffffff : reserved
100000000-837ffffff : System RAM
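
The listing above can be checked mechanically against the range named in the failed allocation (8000000@d0000000). The following Python sketch parses /proc/iomem-style text and reports entries that overlap a requested window. The function names are hypothetical, and the sample is a short excerpt of the listing above. It only shows which entries already cover the window; it does not by itself establish why the kernel's allocation failed.

```python
import re

# "<indent><start>-<end> : <name>", two spaces of indent per nesting level
LINE_RE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.*)$")


def parse_iomem(text):
    """Return a list of (depth, start, end, name) tuples."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            indent, start, end, name = m.groups()
            entries.append((len(indent) // 2, int(start, 16), int(end, 16), name.strip()))
    return entries


def overlapping(entries, start, size):
    """Entries whose inclusive [start, end] range intersects the request."""
    end = start + size - 1
    return [e for e in entries if e[1] <= end and start <= e[2]]


SAMPLE = """\
cfa00000-cfafffff : PCI Bus #04
  cfa00000-cfa3ffff : 0000:04:00.0
d0000000-d7ffffff : PCI Bus #05
  d0000000-d7ffffff : 0000:05:06.0
e0000000-efffffff : reserved
"""

for depth, start, end, name in overlapping(parse_iomem(SAMPLE), 0xD0000000, 0x8000000):
    print(f"{'  ' * depth}{start:08x}-{end:08x} : {name}")
```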


BIOS version:

# dmidecode 2.10
SMBIOS 2.5 present.
56 structures occupying 1852 bytes.
Table at 0xC7EDA000.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
	Vendor: Phoenix Technologies Ltd.
	Version: PDNAX1-A
	Release Date: 01/26/2010
	Address: 0xE3230
	Runtime Size: 118224 bytes


2.6.18-191.el5 #1 SMP Mon Mar 1 15:59:02 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

kmod-kvm-83-160.el5
kvm-83-160.el5
etherboot-zroms-kvm-5.4.4-13.el5
kvm-qemu-img-83-160.el5

bob

Comment 4 Larry Troan 2010-10-18 13:12:43 UTC
Will clone this bug for RHEL6. No RHEL6 bug is open at this time, but Exar (formerly Neterion) is indicating a failure on the latest RHEL6 build.

Comment 7 Larry Troan 2010-11-16 13:29:44 UTC
Don, is there an update on this for 5.6 based on your discussion with Alex? 
If not, we need to push this to 5.7.

Comment 8 Don Dutile (Red Hat) 2010-11-16 14:39:43 UTC
(In reply to comment #7)
> Don, is there an update on this for 5.6 based on your discussion with Alex? 
> If not, we need to push this to 5.7.

This will need to be pushed to 5.7.

Alex is working on a solution for upstream.  Once that's completed and accepted upstream, we can work on backporting that support to 5.7 (& rhel6.x).
It's fairly invasive (it requires adding vfio and qemu interfaces to it), so it's going to be a while before the solution settles upstream and has had enough testing to be ready for enterprise use.

I added Alex to the cc: list so he can directly comment if he wants to add more.

Comment 9 Larry Troan 2010-12-06 17:31:50 UTC
Any updated status on this bug? Will it make 5.6?

Comment 13 Larry Troan 2011-01-11 16:18:46 UTC
Pushed to 5.7

