Bug 514458 - [RHEL5.4 Xen]: xm pci-list-assignable-devices doesn't show all available devices
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.4
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Dexuan Cui
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-07-29 07:59 UTC by Chris Lalancette
Modified: 2010-07-19 13:43 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-15 06:47:46 UTC
Target Upstream Version:
Embargoed:


Attachments
Output of lspci -tv (2.38 KB, text/plain)
2009-09-14 07:07 UTC, Chris Lalancette
Output of lspci -vvvv -xxxx (67.45 KB, text/plain)
2009-09-14 07:07 UTC, Chris Lalancette

Description Chris Lalancette 2009-07-29 07:59:06 UTC
Description of problem:
My machine is an AMD Barcelona machine, with the following output from lspci:

00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:06.1 Audio device: nVidia Corporation MCP55 High Definition Audio (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
01:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
03:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
03:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
06:00.0 VGA compatible controller: ATI Technologies Inc RV610 video device [Radeon HD 2400 PRO]
06:00.1 Audio device: ATI Technologies Inc RV610 audio device [Radeon HD 2400 PRO]
80:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
80:01.0 RAM memory: nVidia Corporation MCP55 LPC Bridge (rev a3)
80:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
80:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
80:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
80:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)

While testing the PCI passthrough capabilities of Xen, I decided to pass two of these devices through to my paravirtualized RHEL-5 guest.  To that end, I did:

# modprobe pciback
# echo -n 0000:00:09.0 > /sys/bus/pci/drivers/forcedeth/unbind
# echo -n 0000:00:09.0 > /sys/bus/pci/drivers/pciback/new_slot
# echo -n 0000:00:09.0 > /sys/bus/pci/drivers/pciback/bind
# echo -n 0000:01:05.0 > /sys/bus/pci/drivers/pciback/new_slot
# echo -n 0000:01:05.0 > /sys/bus/pci/drivers/pciback/bind

That is, I wanted to pass through the second ethernet device (0000:00:09.0) and the firewire device (0000:01:05.0).  Note that I didn't have to unbind the firewire card, since I never loaded the driver for it.
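
One way to confirm that both devices ended up bound to pciback is to read the driver symlink that sysfs exposes for each PCI device. The following is only a sketch: bound_driver and the SYSFS_ROOT override are hypothetical names introduced here for illustration, not part of any Xen tool.

```shell
# Hypothetical helper: report which driver (if any) a PCI device is bound to.
# SYSFS_ROOT is an assumption added so the helper can be pointed at a test tree.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

bound_driver() {
    # $1 is a full PCI address such as 0000:00:09.0
    link="$SYSFS_ROOT/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/pciback
        basename "$(readlink "$link")"
    else
        echo "(none)"
    fi
}

bound_driver 0000:00:09.0
bound_driver 0000:01:05.0
```

If the steps above succeeded, both calls would be expected to print pciback on the host.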

However, after doing this, I see:

[root@amd1 drivers]# xm pci-list-assignable-devices
0000:01:05.0

That is, only the firewire card showed up in the list of assignable devices.  I know they were both successfully hidden, because I am able to start a guest with these devices passed through, and the guest can see both pieces of hardware.  This seems like another bug in xm pci-list-assignable-devices.

Jirka mentioned that xm pci-list-assignable-devices may only show devices that have page-aligned MMIO bars, because those are the only ones you can passthrough to an HVM guest.  However, if this is supposed to be common functionality between PV and HVM guests, we need to show these devices as assignable for PV guests.  I would suggest that we show all of the devices, but mark the ones that we can't passthrough to an HVM guest with a short message describing exactly that.

Comment 1 Dexuan Cui 2009-09-11 03:59:34 UTC
"00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)" -- 00:09.0 has a "Bridge:" in the name -- seems unusual?

Hi Chris Lalancette,
Please attach the output of "lspci -tv" and "lspci -vvvv -xxxx" so that I have enough information to see what's wrong here.

Comment 2 Chris Lalancette 2009-09-14 07:07:21 UTC
Created attachment 360884
Output of lspci -tv

Comment 3 Chris Lalancette 2009-09-14 07:07:51 UTC
Created attachment 360885
Output of lspci -vvvv -xxxx

Comment 4 Dexuan Cui 2009-09-15 06:23:50 UTC
(In reply to comment #0)
> 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
From the "lspci -vvvv -xxxx" output in Comment #3, we can see that 00:09.0 has a non-page-aligned MMIO BAR:
	Region 3: Memory at b0005c00 (32-bit, non-prefetchable) [size=16].

In fact, on the host:
00:02.0 has an MMIO BAR with base b0005000;
00:08.0 has MMIO BARs with bases b0005800 and b0005400;
00:09.0 has an MMIO BAR with base b0005c00.
These three devices/functions have BARs located in the same page.
That makes it unsafe to assign any of them to a guest, whether PV or HVM, because the hypervisor can only enforce MMIO permissions at page granularity. For example, if we allowed 00:09.0 to be assigned to a PV or HVM guest, the guest could also access the MMIO BARs of 00:02.0 and 00:08.0, which are not assigned to it at all!
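
As a rough illustration of the arithmetic, the sketch below maps each BAR base listed above to its page and checks its alignment. It assumes 4 KiB pages; page_of and is_page_aligned are hypothetical helper names, not xend functions.

```shell
# Sketch only: BAR bases are the ones listed in this comment; 4 KiB pages assumed.
PAGE_SIZE=4096

# Print the page frame number (in hex) that contains the given address.
page_of() {
    printf '0x%x\n' $(( $1 / PAGE_SIZE ))
}

# Succeed (exit status 0) only if the address sits on a page boundary.
is_page_aligned() {
    [ $(( $1 % PAGE_SIZE )) -eq 0 ]
}

for bar in 0xb0005000 0xb0005800 0xb0005400 0xb0005c00; do
    if is_page_aligned "$bar"; then aligned=yes; else aligned=no; fi
    echo "$bar page=$(page_of "$bar") aligned=$aligned"
done
```

All four bases land in page 0xb0005, so a guest that is granted that page for one device can reach the registers of the others.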

So I think this kind of device is not assignable: if we try to assign it, guest construction fails and xend complains "pci: 0000:00:09.0: non-page-aligned MMIO BAR found."
That's why neither RHEL5.4 Xen nor upstream Xen shows the device in the output of "xm pci-list-assignable-devices".

> Jirka mentioned that xm pci-list-assignable-devices may only show devices that
> have page-aligned MMIO bars, because those are the only ones you can
> passthrough to an HVM guest.  However, if this is supposed to be common
> functionality between PV and HVM guests, we need to show these devices as
> assignable for PV guests.  I would suggest that we show all of the devices, but
> mark the ones that we can't passthrough to an HVM guest with a short message
> describing exactly that.  
As I stated above, this applies to both PV and HVM guests, so I don't think we should show devices with non-page-aligned MMIO BARs.

Comment 5 Chris Lalancette 2009-09-15 06:47:46 UTC
(In reply to comment #4)
> (In reply to comment #0)
> > 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
> From the "lspci -vvvv -xxxx" info in Comment #3, we can see 00:09.0 has an
> non-page-aligned MMIO BAR:
>  Region 3: Memory at b0005c00 (32-bit, non-prefetchable) [size=16].
> 
> Actually in the host:
> 00:02.0 has an MMIO BAR with base b0005000;
> 00:08.0 has an MMIO BAR with base b0005800 and b0005400;
> 00:09.0 has an MMIO BAR with base b0005c00.
> We can see such 3 devices/functions have BARs included in the same page.
> This means it's unsafe to assign them to guests, including both pv guest and
> hvm guest, because hypervisor can only perform the mmio permission control at a
> granularity of page, e.g., if we allow 00:09.0 to be assigned to a pv or an hvm
> guest, the guest can access the MMIO bars of 00:02.0 and 00:08.0 that are not
> assigned to it at all! 

Hm, OK, that is interesting.

> 
> So I think such kind of device is not assignable: when we try to do that, the
> guest construction would fail and xend would complain "pci: 0000:00:09.0:
> non-page-aligned MMIO BAR found."

But no, I definitely was able to pass it through to the PV guest (although I might have disabled strict checking at that point; I can't remember).

Given all of that, I agree, we shouldn't necessarily fix this one for PV guests.  I think we have a workaround for people who used to do this (by disabling strict checking), and since it's unsafe, we don't need to list it in pci-list-assignable.  I'll close as WONTFIX.


Chris Lalancette

Comment 6 Dexuan Cui 2009-09-15 07:07:51 UTC
(In reply to comment #5)
> > So I think such kind of device is not assignable: when we try to do that, the
> > guest construction would fail and xend would complain "pci: 0000:00:09.0:
> > non-page-aligned MMIO BAR found."
> But no, I definitely was able to pass it through to the PV guest (although I
> might have disabled strict checking at that point; I can't remember).
I think that even with pci-dev-assign-strict-check=no we still can't assign the device to a PV guest, because xend would stop us with "pci: 0000:00:09.0: non-page-aligned MMIO BAR found". I think pci-dev-assign-strict-check=no only works around the co-assignment issue for PV guests, so you may want to double-check. :-)
(I don't have such an 'IOV-unfriendly' BIOS, so I can't try it myself.)

> Given all of that, I agree, we shouldn't necessarily fix this one for PV
> guests.  I think we have a workaround for people who used to do this (by
> disabling strict checking), and since it's unsafe, we don't need to list it in
> pci-list-assignable.  I'll close as WONTFIX.
> Chris Lalancette  
Yes, in older RHEL Xen (e.g., RHEL 5.2 Xen or 5.3 Xen) we could assign this kind of device to a PV guest, but we didn't realize that doing so exposed us to attacks from the guest.


I think the best solution is for BIOS vendors not to assign non-page-aligned MMIO BARs to devices when VT-d is enabled. I'd guess that most BIOS developer manuals/specifications already contain such a recommendation. BIOS vendors should also supply BIOS updates to fix existing BIOSes.

To cope with existing bad BIOSes, we can let dom0 fix up the non-page-aligned MMIO BARs when it boots. Upstream linux-2.6.18-xen.hg has the code and kernel parameters for this; it should be http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev=reassign_resources. Maybe they could be backported to the RHEL Xen dom0 kernel. By my rough estimate, it is about 1k~1.5k lines of code. Maybe you would be interested in this. :-)

Comment 7 Paolo Bonzini 2010-04-08 15:48:38 UTC
This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).

Comment 8 Chris Lalancette 2010-07-19 13:43:42 UTC
Clearing out old flags for reporting purposes.

Chris Lalancette

