Description of problem:

My machine is an AMD Barcelona machine, with the following output from lspci:

00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:06.1 Audio device: nVidia Corporation MCP55 High Definition Audio (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
01:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
03:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
03:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
06:00.0 VGA compatible controller: ATI Technologies Inc RV610 video device [Radeon HD 2400 PRO]
06:00.1 Audio device: ATI Technologies Inc RV610 audio device [Radeon HD 2400 PRO]
80:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
80:01.0 RAM memory: nVidia Corporation MCP55 LPC Bridge (rev a3)
80:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
80:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
80:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
80:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)

While testing the PCI passthrough capabilities of Xen, I decided to passthrough 2 of these devices to my paravirtualized RHEL-5 guest. Towards that end, I did:

# modprobe pciback
# echo -n 0000:00:09.0 > /sys/bus/pci/drivers/forcedeth/unbind
# echo -n 0000:00:09.0 > /sys/bus/pci/drivers/pciback/new_slot
# echo -n 0000:00:09.0 > /sys/bus/pci/drivers/pciback/bind
# echo -n 0000:01:05.0 > /sys/bus/pci/drivers/pciback/new_slot
# echo -n 0000:01:05.0 > /sys/bus/pci/drivers/pciback/bind

That is, I wanted to passthrough the second ethernet device (0000:00:09.0) and the firewire device (0000:01:05.0). Note that I didn't have to unbind the firewire card, since I never loaded the driver for it. However, after doing this, I see:

[root@amd1 drivers]# xm pci-list-assignable-devices
0000:01:05.0

That is, only the firewire card showed up in the list of assignable devices.
I know they were both successfully hidden, because I am able to start a guest with these devices passed through, and the guest can see both pieces of hardware. This seems like another bug in xm pci-list-assignable-devices.

Jirka mentioned that xm pci-list-assignable-devices may only show devices that have page-aligned MMIO BARs, because those are the only ones you can pass through to an HVM guest. However, if this is supposed to be common functionality between PV and HVM guests, we need to show these devices as assignable for PV guests. I would suggest that we show all of the devices, but mark the ones that we can't pass through to an HVM guest with a short message describing exactly that.
"00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)" -- 00:09.0 has a "Bridge:" in the name -- seems unusual?

Hi Chris Lalancette,
Please supply the output of "lspci -tv" and "lspci -vvvv -xxxx" so I have enough information to know what's wrong here.
Created attachment 360884 [details] Output of lspci -tv
Created attachment 360885 [details] Output of lspci -vvvv -xxxx
(In reply to comment #0)
> 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)

From the "lspci -vvvv -xxxx" info in Comment #3, we can see 00:09.0 has a non-page-aligned MMIO BAR:
Region 3: Memory at b0005c00 (32-bit, non-prefetchable) [size=16].

Actually in the host:
00:02.0 has an MMIO BAR with base b0005000;
00:08.0 has MMIO BARs with bases b0005800 and b0005400;
00:09.0 has an MMIO BAR with base b0005c00.
We can see these 3 devices/functions have BARs included in the same page.
This means it's unsafe to assign them to guests, both PV and HVM, because the hypervisor can only perform MMIO permission control at page granularity. For example, if we allow 00:09.0 to be assigned to a PV or an HVM guest, the guest can access the MMIO BARs of 00:02.0 and 00:08.0, which are not assigned to it at all!

So I think this kind of device is not assignable: when we try to do that, the guest construction would fail and xend would complain "pci: 0000:00:09.0: non-page-aligned MMIO BAR found." That's why RHEL5.4 Xen and upstream Xen don't show the device in the output of "xm pci-list-assignable-devices".

> Jirka mentioned that xm pci-list-assignable-devices may only show devices that
> have page-aligned MMIO bars, because those are the only ones you can
> passthrough to an HVM guest. However, if this is supposed to be common
> functionality between PV and HVM guests, we need to show these devices as
> assignable for PV guests. I would suggest that we show all of the devices, but
> mark the ones that we can't passthrough to an HVM guest with a short message
> describing exactly that.

As I stated above, this applies to both PV and HVM guests, so I don't think we should show devices with a non-page-aligned MMIO BAR.
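The page-sharing argument above can be checked with a little arithmetic. The following is only a sketch, not xend's actual check: it assumes 4 KiB pages, and the function names are made up for illustration. The BAR bases are the ones quoted in this comment; only the size of the 00:09.0 BAR is known from the thread.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


def page_frame(addr):
    """Return the number of the page that contains a physical address."""
    return addr // PAGE_SIZE


def is_page_aligned(base, size):
    """A BAR has pages to itself only if it starts on a page boundary
    and covers a whole number of pages."""
    return base % PAGE_SIZE == 0 and size % PAGE_SIZE == 0


def shared_pages(bars):
    """Group devices by the page holding each BAR base and return the
    pages that are used by more than one device."""
    by_page = {}
    for dev, base in bars:
        by_page.setdefault(page_frame(base), set()).add(dev)
    return {pfn: sorted(devs) for pfn, devs in by_page.items() if len(devs) > 1}


# BAR bases quoted in this comment.
BARS = [
    ("00:02.0", 0xB0005000),
    ("00:08.0", 0xB0005800),
    ("00:08.0", 0xB0005400),
    ("00:09.0", 0xB0005C00),
]

if __name__ == "__main__":
    for pfn, devs in shared_pages(BARS).items():
        print("page frame %#x is shared by %s" % (pfn, ", ".join(devs)))
    print("00:09.0 Region 3 page-aligned:", is_page_aligned(0xB0005C00, 16))
```

All four bases fall into page frame 0xb0005, so granting a guest access to that page for 00:09.0 would also expose the BARs of 00:02.0 and 00:08.0.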
(In reply to comment #4)
> (In reply to comment #0)
> > 00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
> From the "lspci -vvvv -xxxx" info in Comment #3, we can see 00:09.0 has an
> non-page-aligned MMIO BAR:
> Region 3: Memory at b0005c00 (32-bit, non-prefetchable) [size=16].
>
> Actually in the host:
> 00:02.0 has an MMIO BAR with base b0005000;
> 00:08.0 has an MMIO BAR with base b0005800 and b0005400;
> 00:09.0 has an MMIO BAR with base b0005c00.
> We can see such 3 devices/functions have BARs included in the same page.
> This means it's unsafe to assign them to guests, including both pv guest and
> hvm guest, because hypervisor can only perform the mmio permission control at a
> granularity of page, e.g., if we allow 00:09.0 to be assigned to a pv or an hvm
> guest, the guest can access the MMIO bars of 00:02.0 and 00:08.0 that are not
> assigned to it at all!

Hm, OK, that is interesting.

> So I think such kind of device is not assignable: when we try to do that, the
> guest construction would fail and xend would complain "pci: 0000:00:09.0:
> non-page-aligned MMIO BAR found."

But no, I definitely was able to pass it through to the PV guest (although I might have disabled strict checking at that point; I can't remember).

Given all of that, I agree, we shouldn't necessarily fix this one for PV guests. I think we have a workaround for people who used to do this (by disabling strict checking), and since it's unsafe, we don't need to list it in pci-list-assignable. I'll close as WONTFIX.

Chris Lalancette
(In reply to comment #5)
> > So I think such kind of device is not assignable: when we try to do that, the
> > guest construction would fail and xend would complain "pci: 0000:00:09.0:
> > non-page-aligned MMIO BAR found."
> But no, I definitely was able to pass it through to the PV guest (although I
> might have disabled strict checking at that point; I can't remember).

I think even if we set pci-dev-assign-strict-check=no, we're still unable to assign the device to a PV guest, as xend would stop us with "pci: 0000:00:09.0: non-page-aligned MMIO BAR found". I think pci-dev-assign-strict-check=no can only work around the co-assignment issue for PV guests. So you may want to double-check it. :-)
(I don't have such an 'IOV-unfriendly' BIOS so I can't try it myself.)

> Given all of that, I agree, we shouldn't necessarily fix this one for PV
> guests. I think we have a workaround for people who used to do this (by
> disabling strict checking), and since it's unsafe, we don't need to list it in
> pci-list-assignable. I'll close as WONTFIX.
> Chris Lalancette

Yes, in old RHEL Xen (e.g., RHEL 5.3 Xen or 5.2 Xen), we could assign this kind of device to a PV guest, but we didn't realize we were at risk of being attacked from the guest.

I think the best solution is for BIOS vendors not to assign non-page-aligned MMIO BARs to devices if VT-d is enabled. Actually I guess most BIOS developer's manuals/specifications already contain such a suggestion. And BIOS vendors should supply BIOS updates to fix the existing BIOSes.

To cope with the existing bad BIOSes, we can let dom0 fix the non-page-aligned MMIO BARs when it boots up. In upstream linux-2.6.18-xen.hg, there are the code and kernel parameters for this: it should be http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev=reassign_resources. Maybe they could be backported into the RHEL Xen Dom0. The change might be 1k~1.5k lines of code by my rough estimate. Maybe you would be interested in this. :-)
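To see which BARs a dom0-side fix would have to move on a given host, the "Region N: Memory at ..." lines from "lspci -vv" can be scanned for bases or sizes that are not page multiples. This is a rough sketch with hypothetical helper names, assuming 4 KiB pages and the Region line format quoted earlier in this bug; it is not taken from the upstream reassign_resources code.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Matches lines such as:
#   Region 3: Memory at b0005c00 (32-bit, non-prefetchable) [size=16]
_REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-f]+) \(.*?\) \[size=(\d+)([KMG]?)\]"
)
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_memory_regions(text):
    """Return (region index, base, size in bytes) for every memory BAR
    found in a chunk of lspci -vv output. I/O port regions are skipped."""
    regions = []
    for m in _REGION_RE.finditer(text):
        index = int(m.group(1))
        base = int(m.group(2), 16)
        size = int(m.group(3)) * _UNITS[m.group(4)]
        regions.append((index, base, size))
    return regions


def unaligned_regions(text):
    """Return the memory BARs whose base or size is not a page multiple."""
    return [
        (index, base, size)
        for index, base, size in parse_memory_regions(text)
        if base % PAGE_SIZE or size % PAGE_SIZE
    ]


if __name__ == "__main__":
    sample = "Region 3: Memory at b0005c00 (32-bit, non-prefetchable) [size=16]"
    for index, base, size in unaligned_regions(sample):
        print("Region %d at %#x (size %d) is not page-aligned" % (index, base, size))
```

Feeding it the per-device section of the lspci output would list the regions that need relocating; which devices own them still has to be read from the surrounding output.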
This bug was closed during 5.5 development and it's being removed from the internal tracking bugs (which are now for 5.6).
Clearing out old flags for reporting purposes. Chris Lalancette