Bug 796677 - [XEN] Xen PCI-passthrough does not work with an Emulex Saturn-X LightPulse
Summary: [XEN] Xen PCI-passthrough does not work with an Emulex Saturn-X LightPulse
Keywords:
Status: CLOSED DUPLICATE of bug 735890
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.7
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Xen Maintainance List
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-02-23 13:01 UTC by asilva
Modified: 2018-11-26 19:16 UTC
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-27 18:58:49 UTC
Target Upstream Version:
Embargoed:


Attachments
Xen __ioremap() debugging (6.21 KB, patch)
2012-02-24 15:42 UTC, Laszlo Ersek

Description asilva 2012-02-23 13:01:48 UTC
> Description of problem:
Xen PCI-passthrough does not work with an Emulex Saturn-X LightPulse Fibre Channel Host Adapter.

Testing with the various reproducers has shown that the issue does not occur for PCI pass-through of non-Emulex (non-lpfc) cards, and likewise does not occur with lpfc cards that are not Saturn-X based.

> Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux 5
Xen Virtualization
Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (Emulex: LPe12002, HP: AJ763A/82E)

> How reproducible:
Always

> Steps to Reproduce:
modprobe.conf entry:

---
options pciback hide="(0000:1a:00.0)(0000:1a:00.1)"
install lpfc /sbin/modprobe pciback ; /sbin/modprobe --first-time --ignore-install lpfc
---

After boot, lpfc was loaded:

  modprobe lpfc

and the PCI devices remained successfully bound to the pciback driver, not to lpfc:

---
[root@ibm-x3550m3-01 ~]#  ll /sys/bus/pci/drivers/lpfc/
total 0
--w------- 1 root root 4096 Feb  7 10:59 bind
lrwxrwxrwx 1 root root    0 Feb  7 10:59 module -> ../../../../module/lpfc
--w------- 1 root root 4096 Feb  7 10:59 new_id
--w------- 1 root root 4096 Feb  7 10:59 remove_id
--w------- 1 root root 4096 Feb  7 10:59 unbind

[root@ibm-x3550m3-01 ~]#  ll /sys/bus/pci/drivers/pciback
total 0
lrwxrwxrwx 1 root root    0 Feb  7 11:00 0000:1a:00.0 -> ../../../../devices/pci0000:00/0000:00:07.0/0000:1a:00.0
lrwxrwxrwx 1 root root    0 Feb  7 11:00 0000:1a:00.1 -> ../../../../devices/pci0000:00/0000:00:07.0/0000:1a:00.1
--w------- 1 root root 4096 Feb  7 11:00 bind
lrwxrwxrwx 1 root root    0 Feb  7 11:00 module -> ../../../../module/pciback
--w------- 1 root root 4096 Feb  7 11:00 new_id
--w------- 1 root root 4096 Feb  7 11:00 new_slot
-rw------- 1 root root 4096 Feb  7 11:00 permissive
-rw------- 1 root root 4096 Feb  7 11:00 quirks
--w------- 1 root root 4096 Feb  7 11:00 remove_id
--w------- 1 root root 4096 Feb  7 11:00 remove_slot
-r-------- 1 root root 4096 Feb  7 11:00 slots
--w------- 1 root root 4096 Feb  7 11:00 unbind
---

The xend service was then restarted:

---
[root@ibm-x3550m3-01 ~]# service xend restart
restart xend:                                              [  OK  ]
---

Assignable devices are shown as follows before vm1 is started:

---
[root@ibm-x3550m3-01 ~]# xm pci-list-assignable-devices
0000:1a:00.1 0000:1a:00.0
---

Then started the RHEL 5 virtual machine:

---
[root@ibm-x3550m3-01 ~]# virsh start vm1
Domain vm1 started
---

On the dom0, we see the following devices assigned to the domU vm1:

---
[root@ibm-x3550m3-01 ~]# xm pci-list vm1
domain   bus   slot   func
0    1a    0      0      
0    1a    0      1    
---

...and they have now been hidden from the list of assignable devices:

---
[root@ibm-x3550m3-01 ~]# xm pci-list-assignable-devices
[root@ibm-x3550m3-01 ~]# 
---

Now on the console for vm1, lspci shows the following devices assigned:

---
[root@localhost ~]# lspci | grep Saturn
00:00.0 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
00:00.1 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
---

However, /var/log/messages in the guest shows the following lpfc errors:

---
Feb  7 12:16:23 localhost kernel: Emulex LightPulse Fibre Channel SCSI driver 8.2.0.96.2p
Feb  7 12:16:23 localhost kernel: Copyright(c) 2004-2011 Emulex.  All rights reserved.
Feb  7 12:16:23 localhost kernel: PCI: Enabling device 0000:00:00.0 (0000 -> 0002)
Feb  7 12:16:23 localhost kernel: lpfc 0000:00:00.0: ioremap failed for SLIM memory.
Feb  7 12:16:23 localhost kernel: lpfc 0000:00:00.0: 0:1402 Failed to set up pci memory space.
Feb  7 12:16:23 localhost kernel: PCI: Enabling device 0000:00:00.1 (0000 -> 0002)
Feb  7 12:16:23 localhost kernel: lpfc 0000:00:00.1: ioremap failed for SLIM memory.
Feb  7 12:16:23 localhost kernel: lpfc 0000:00:00.1: 0:1402 Failed to set up pci memory space.
---

  
> Actual results:
The domU cannot access the device:
# dmesg | grep lpfc
lpfc 0000:00:00.0: ioremap failed for SLIM memory.
lpfc 0000:00:00.0: 0:1402 Failed to set up pci memory space.
lpfc 0000:00:00.1: ioremap failed for SLIM memory.
lpfc 0000:00:00.1: 0:1402 Failed to set up pci memory space.


> Expected results:
The domU can access the device.

Comment 3 Laszlo Ersek 2012-02-23 15:11:38 UTC
The error message "ioremap failed for SLIM memory" is printed by
lpfc_sli_pci_mem_setup() [drivers/scsi/lpfc/lpfc_init.c,
2.6.18-274.17.1.el5]:

  5293  /* Get the bus address of Bar0 and Bar2 and the number of bytes
  5294   * required by each mapping.
  5295   */
  5296  phba->pci_bar0_map = pci_resource_start(pdev, 0);
  5297  bar0map_len = pci_resource_len(pdev, 0);
  5298  
  5299  phba->pci_bar2_map = pci_resource_start(pdev, 2);
  5300  bar2map_len = pci_resource_len(pdev, 2);
  5301  
  5302  /* Map HBA SLIM to a kernel virtual address. */
  5303  phba->slim_memmap_p = ioremap(phba->pci_bar0_map, bar0map_len);
  5304  if (!phba->slim_memmap_p) {
  5305          dev_printk(KERN_ERR, &pdev->dev,
  5306                     "ioremap failed for SLIM memory.\n");
  5307          goto out;
  5308  }

According to the Sep 23, 2011 domU config file attached to the CP case, the
guest is PV. (This seems consistent with the tendency that HVM guests get
passed-through devices as 06:00.0 and 07:00.0, IIRC, but the BDFs reported
in comment 0 are different.)

First, PCI passthrough to a PV guest is insecure (the guest can set up DMA
to wherever it wants). Second, the guest kernel is thus a xenified kernel,
which diverts ioremap() as follows:

ioremap() [include/asm-x86_64/mach-xen/asm/io.h]
-> __ioremap() [arch/i386/mm/ioremap-xen.c]
  -> get_vm_area() [mm/vmalloc.c]
  -> __direct_remap_pfn_range() [arch/i386/mm/ioremap-xen.c]
    -> HYPERVISOR_mmu_update()

After some checks, __ioremap() grabs a virtual address range, then
__direct_remap_pfn_range() kicks the hypervisor in a batched loop to point
the "init_mm" PTEs, covering the vaddr range, to the requested machine
frames. Either the early checks fire, or the hypervisor refuses one of the
PTE updates.
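
To make the failure mode concrete, here is a small userspace model of that
batched PTE-update handshake. This is an illustration only, not the RHEL-5
kernel or hypervisor code: the struct mirrors mmu_update_t from the Xen
public headers, fake_mmu_update() stands in for the hypervisor's per-update
iomem permission check, and the permitted window and PTE address are
made-up values.

  #include <stdio.h>
  #include <stdint.h>

  struct mmu_update {
      uint64_t ptr;   /* machine address of the PTE to rewrite */
      uint64_t val;   /* new PTE contents (machine frame << 12 | flags) */
  };

  /* Stand-in for the hypervisor: refuse any machine frame outside one
   * permitted iomem window (illustrative values). */
  static int fake_mmu_update(const struct mmu_update *req, unsigned int count)
  {
      const uint64_t ok_lo = 0x98200, ok_hi = 0x98240;  /* permitted mfns */

      for (unsigned int i = 0; i < count; i++) {
          uint64_t mfn = req[i].val >> 12;
          if (mfn < ok_lo || mfn >= ok_hi) {
              printf("refusing attempt to map I/O space %08llx\n",
                     (unsigned long long)mfn);
              return -1;
          }
      }
      return 0;  /* every queued PTE update accepted */
  }

  int main(void)
  {
      /* One queued update pointing a PTE at frame 0x97a08 (the SLIM BAR
       * from this report); the PTE machine address is a placeholder. */
      struct mmu_update batch[] = {
          { .ptr = 0x12345000, .val = 0x97a08ULL << 12 },
      };

      if (fake_mmu_update(batch, 1) < 0)
          printf("this is where the guest's ioremap() would fail\n");
      return 0;
  }

The real path batches many updates per hypercall (HYPERVISOR_mmu_update()
rewriting init_mm PTEs), but the per-frame accept/refuse decision modeled
here is the part that matters for this bug.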

"lspci -v -v -v" on the reproducer machine mentioned in comment 2 reports
the following regions:

  1a:00.0 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre
    Channel Host Adapter (rev 03)

    Region 0: Memory at 97a08000 (64-bit, non-prefetchable) [size=4K]
    Region 2: Memory at 97a00000 (64-bit, non-prefetchable) [size=16K]
    Region 4: I/O ports at 2100 [size=256]

  1a:00.1 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre
    Channel Host Adapter (rev 03)

    Region 0: Memory at 97a09000 (64-bit, non-prefetchable) [size=4K]
    Region 2: Memory at 97a04000 (64-bit, non-prefetchable) [size=16K]
    Region 4: I/O ports at 2000 [size=256]

According to the error message and the lpfc_sli_pci_mem_setup() source,
region 0 is "SLIM memory", while region 2 is "HBA control registers". So the
xenified ioremap() fails to map 97a08000..+4K and 97a09000..+4K.

I would recommend:

- checking 97a08000 against the Xen E820 map (xm dmesg) -- does the card
  specify a region that Xen considers reserved RAM (or RAM at all)?

- testing the PV guest with RHEL-5.8 GA in both host and guest

- passthrough to an HVM guest (with iommu enabled) instead of the PV guest

- if none of those work, I'll add debug logging to __ioremap() (see above)
  and figure out where exactly it fails.

Comment 4 Laszlo Ersek 2012-02-24 15:42:05 UTC
Created attachment 565629 [details]
Xen __ioremap() debugging

(In reply to comment #3)

> I would recommend:
> 
> - checking 97a08000 against the Xen E820 map (xm dmesg) -- does the card
>   specify a region that Xen considers reserved RAM (or RAM at all)?
> 
> - testing the PV guest with RHEL-5.8 GA in both host and guest
> 
> - passthrough to an HVM guest (with iommu enabled) instead of the PV guest
> 
> - if none of those work, I'll add debug logging to __ioremap() (see above)
>   and figure out where exactly it fails.

The attached patch is for the fourth option.

Comment 7 Laszlo Ersek 2012-02-27 18:41:24 UTC
Reproduced the bug in-house with the x86_64 -308 xen kernel (both domU and
dom0).

When the guest (domid == 1) logs the following during boot:

PCI: Enabling device 0000:00:00.0 (0000 -> 0002)
PCI: Setting latency timer of device 0000:00:00.0 to 64
lpfc 0000:00:00.0: ioremap failed for SLIM memory.
lpfc 0000:00:00.0: 0:1402 Failed to set up pci memory space.

The hypervisor prints:

(XEN) mm.c:630:d1 Non-privileged (1) attempt to map I/O space 00097a08

Referring back to comment 3:

>     Region 0: Memory at 97a08000 (64-bit, non-prefetchable) [size=4K]
> [...]
> According to the error message and the lpfc_sli_pci_mem_setup() source,
> region 0 is "SLIM memory",

Machine frame 97a08 corresponds to that machine address.

Adding the vendor-id:device-id pair ('10df:f100') to /etc/xen/xend-pci-permissive.sxp
only takes care of PCI config space accesses:

pciback 0000:1a:00.0: enabling permissive mode configuration space accesses!
pciback 0000:1a:00.0: permissive mode is potentially unsafe!
pciback 0000:1a:00.1: enabling permissive mode configuration space accesses!
pciback 0000:1a:00.1: permissive mode is potentially unsafe!

but the hypervisor keeps complaining about the non-privileged mapping
attempt.

Comment 9 Laszlo Ersek 2012-02-27 18:58:49 UTC
This bug is a duplicate of bug 735890.

See in particular:
- bug 735890 comment 15,
- bug 735890 comment 16,
- bug 735890 comment 17.

Applying that analysis to the current values:

[root@in-house-reproducer ~]# cat -n \
                              /sys/bus/pci/devices/0000:1a:00.0/resource
     1  0x0000000097a08000 0x0000000097a08fff 0x0000000000020204
     2  0x0000000000000000 0x0000000000000000 0x0000000000000000
     3  0x0000000097a00000 0x0000000097a03fff 0x0000000000020204
     4  0x0000000000000000 0x0000000000000000 0x0000000000000000
     5  0x0000000000002100 0x00000000000021ff 0x0000000000020101
     6  0x0000000000000000 0x0000000000000000 0x0000000000000000
     7  0x0000000098200000 0x000000009823ffff 0x0000000000027200

From those, lines 1, 3 and 7 are iomem ranges, while line 5 describes an
ioport range; the other lines are ignored. The following is a xend log
excerpt, written at domU startup:

    Unconstrained device: 0000:1a:00.0
    pci: enabling ioport 0x2100/0x100
    pci: enabling iomem 0x97a08000/0x1000 pfn 0x97a08/0x1
    pci: enabling iomem 0x97a00000/0x4000 pfn 0x97a00/0x4
    pci: enabling iomem 0x98200000/0x40000 pfn 0x98200/0x40
    pci-msix: remove permission for 0x97a02000/0x20000 0x97a02/0x20
    pci-msix: remove permission for 0x97a03000/0x1000 0x97a03/0x1

Note that "iomem 0x97a08000/0x1000" from the above corresponds to

    Region 0: Memory at 97a08000 (64-bit, non-prefetchable) [size=4K]

from comment 3, and to

    (XEN) mm.c:630:d1 Non-privileged (1) attempt to map I/O space 00097a08

from comment 7. When xend removes the permission for the MSI-X range
0x97a02000/0x20000 (see duplicate bug 735890), it kills any previously
granted permission for included/overlapping iomem regions:

0x97a02000 -- start of MSI-X iomem (inclusive)
0x97a08000 -- start of lpfc SLIM memory (inclusive)
0x97a09000 -- end of lpfc SLIM memory (exclusive), size 0x1000
0x97a22000 -- end of MSI-X iomem (exclusive), size 0x20000

which causes lpfc_sli_pci_mem_setup() to fail in the guest.
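
A trivial program to double-check that containment (values copied from this
comment; purely illustrative):

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      /* MSI-X range whose permission xend revokes, and the SLIM BAR. */
      uint64_t msix_lo = 0x97a02000, msix_hi = msix_lo + 0x20000;
      uint64_t slim_lo = 0x97a08000, slim_hi = slim_lo + 0x1000;

      printf("MSI-X: [%#llx, %#llx)  SLIM: [%#llx, %#llx)\n",
             (unsigned long long)msix_lo, (unsigned long long)msix_hi,
             (unsigned long long)slim_lo, (unsigned long long)slim_hi);
      printf("SLIM contained in revoked range: %s\n",
             slim_lo >= msix_lo && slim_hi <= msix_hi ? "yes" : "no");
      return 0;
  }

It prints "yes": revoking the MSI-X window takes the SLIM BAR's mapping
permission with it.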

*** This bug has been marked as a duplicate of bug 735890 ***

