Bug 490153 - PCI passthrough of Symbios SCSI device doesn't work
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.2
Hardware: All    OS: Linux
Priority: low    Severity: medium
: rc
: ---
Assigned To: Laszlo Ersek
Red Hat Kernel QE team
: Reopened
Depends On:
Blocks: 514490
 
Reported: 2009-03-13 11:34 EDT by Orion Poplawski
Modified: 2012-08-15 08:49 EDT
8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-08-15 08:49:38 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
SOS Report for the hypervisor. (2.04 MB, application/x-bzip2)
2012-03-08 08:59 EST, Andreas Thienemann
Description Orion Poplawski 2009-03-13 11:34:12 EDT
Description of problem:

Trying to get PCI passthrough working for a Symbios Logic 53c875 SCSI
controller.  Running EL 5.2 with xen 3.0.3-73 but with test kernel
2.6.18-132.el5virttest10xen on the Dom0.

First I needed to add 1000:000f to xend-pci-permissive.sxp because of:

pciback 0000:04:09.0: Driver tried to write to a read-only configuration space
field at offset 0x44, size 2.

The controller fails on the 5.2 guest with:

SCSI subsystem initialized
PCI: Enabling device 0000:00:00.0 (0000 -> 0003)
sym0: <875> rev 0x26 at pci 0000:00:00.0 irq 20
sym0: Tekram NVRAM, ID 7, Fast-20, SE, parity checking 
Failed to obtain physical IRQ 20
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.3 
scsi 0:0:0:0: ABORT operation started. 
scsi 0:0:0:0: ABORT operation timed-out. 
scsi 0:0:0:0: DEVICE RESET operation started.
scsi 0:0:0:0: DEVICE RESET operation timed-out.
scsi 0:0:0:0: BUS RESET operation started.
scsi 0:0:0:0: BUS RESET operation timed-out.
scsi 0:0:0:0: HOST RESET operation started.
sym0: SCSI BUS has been reset.           
scsi 0:0:0:0: HOST RESET operation timed-out.
scsi 0:0:0:0: scsi: Device offlined - not ready after error recovery  
....

This may be fixed in c/s 791 in the linux-2.6.18-xen.hg tree.

Version-Release number of selected component (if applicable):
2.6.18-132.el5virttest10xen

How reproducible:
Every time
Comment 2 Andrew Jones 2010-06-23 11:16:16 EDT
Orion,

would you mind seeing if this still reproduces with RHEL 5.5? We didn't pick up the patch you pointed at, but before backporting it I'd like to see if this problem still exists.

Thanks,
Andrew
Comment 3 Orion Poplawski 2010-06-24 18:06:31 EDT
I'm currently running the 5.4 kernel (2.6.18-164.15.1.el5xen) on my 5.5 dom0 machine due to bug 607806. PCI passthrough seems to be better, though I cannot read tapes. I had to add 04:05.0 to my pciback hide options due to:

Error: pci: improper device assignment specified: pci: 0000:04:05.0 must be co-assigned to the same guest with 0000:04:09.0, but it is not owned by pciback.

SCSI module loads in the 5.5 domU:

PCI: Enabling device 0000:00:00.0 (0000 -> 0003)
sym0: <875> rev 0x26 at pci 0000:00:00.0 irq 19
sym0: Tekram NVRAM, ID 7, Fast-20, SE, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.3
  Vendor: EXABYTE   Model: EXB-89008E030203  Rev: V37f
  Type:   Sequential-Access                  ANSI SCSI revision: 02
 target0:0:6: Beginning Domain Validation
 target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:6: Domain Validation skipping write tests
 target0:0:6: Ending Domain Validation
 target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
  Vendor: EXABYTE   Model: EXB-89008E030203  Rev: V37f
  Type:   Sequential-Access                  ANSI SCSI revision: 02
 target0:0:8: Beginning Domain Validation
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: Domain Validation skipping write tests
 target0:0:8: Ending Domain Validation
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
scsi 0:0:6:0: Attached scsi generic sg0 type 1
scsi 0:0:8:0: Attached scsi generic sg1 type 1
st: Version 20070203, fixed bufsize 32768, s/g segs 256
st 0:0:6:0: Attached scsi tape st0
st0: try direct i/o: yes (alignment 512 B)
st 0:0:8:0: Attached scsi tape st1
st1: try direct i/o: yes (alignment 512 B)
 target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
st0: Block limits 1 - 245760 bytes.
 target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
 target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)

Doing a 'tar tvf /dev/st0' triggered some kind of network hiccup. Running it again produced the following on the domU console:

Fatal DMA error! Please use 'swiotlb=force'
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at arch/x86_64/kernel/../../i386/kernel/pci-dma-xen.c:159
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/xen/pci-0/pci0000:00/0000:00:00.0/class
CPU 0 
Modules linked in: nfs fscache nfs_acl autofs4 i2c_dev i2c_core lockd sunrpc ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath scsi_dh parport_pc lp parport st sg sym53c8xx scsi_transport_spi scsi_mod pcspkr xennet dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache xenblk ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2693, comm: tar Not tainted 2.6.18-194.3.1.el5xen #1
RIP: e030:[<ffffffff802726f1>]  [<ffffffff802726f1>] dma_map_sg+0x143/0x1ae
RSP: e02b:ffff880003de1b58  EFLAGS: 00010082
RAX: 000000000000002f RBX: ffff88000d03d080 RCX: ffffffff804f9c28
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
RBP: 0000000000000001 R08: ffffffff804f9c28 R09: 0000000000000000
R10: 000000000000002d R11: 0000000000000000 R12: ffff88000d03d080
R13: ffff88000f40d870 R14: 0000000000000000 R15: ffff88000418c748
FS:  00002b1e1fcedc30(0000) GS:ffffffff805d2000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process tar (pid: 2693, threadinfo ffff880003de0000, task ffff88000409f860)
Stack:  ffff88000c709c00  ffff88000c7d4080  ffff88000c7d4080  ffff88000c709000 
 0000000000000002  ffffffff88144882  ffff88000c709000  ffff88000c7d4080 
 ffff88000f9e2800  ffff88000f1ee048 
Call Trace:
 [<ffffffff88144882>] :sym53c8xx:sym_setup_data_and_start+0x13b/0x2c3
 [<ffffffff88143bb5>] :sym53c8xx:sym53c8xx_queue_command+0xf8/0x106
 [<ffffffff88100c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322
 [<ffffffff88106103>] :scsi_mod:scsi_request_fn+0x2c5/0x39c
 [<ffffffff80336156>] blk_execute_rq_nowait+0x89/0xa0
 [<ffffffff88105b83>] :scsi_mod:scsi_execute_async+0x356/0x399
 [<ffffffff88171df5>] :st:st_do_scsi+0x1c6/0x221
 [<ffffffff88171656>] :st:st_sleep_done+0x0/0x60
 [<ffffffff88173181>] :st:check_tape+0x2bd/0x564
 [<ffffffff88173d1d>] :st:st_open+0x1ce/0x20c
 [<ffffffff8024b2fe>] chrdev_open+0x14d/0x183
 [<ffffffff8024b1b1>] chrdev_open+0x0/0x183
 [<ffffffff8021edc8>] __dentry_open+0xd9/0x1dc
 [<ffffffff80227bca>] do_filp_open+0x2a/0x38
 [<ffffffff8029595f>] recalc_sigpending_and_wake+0x9/0x1a
 [<ffffffff8021a270>] do_sys_open+0x44/0xbe
 [<ffffffff802602f9>] tracesys+0xab/0xb6


Code: 0f 0b 68 f6 7e 49 80 c2 9f 00 48 8b 3b 48 2b 3d eb c1 46 00 
RIP  [<ffffffff802726f1>] dma_map_sg+0x143/0x1ae
 RSP <ffff880003de1b58>
 <0>Kernel panic - not syncing: Fatal exception
Comment 4 Orion Poplawski 2010-06-24 18:12:39 EDT
A second try was more successful.  So perhaps just flaky?
Comment 5 Andrew Jones 2010-06-25 05:57:30 EDT
Hmm, if it works sometimes but not other times, then maybe the swiotlb is too small? Its size can be increased from the default (64M) with swiotlb=<size>. Do you get any other logs in the guest's dmesg, the host's dmesg, or 'xm dmesg' when attempting to use it?
Comment 6 Orion Poplawski 2010-06-25 13:25:28 EDT
As a sanity check, can you see if I'm using this correctly?

I have /etc/modprobe.d/pciback with:

options pciback hide=(0000:04:05.0)(0000:04:09.0)

And in my /etc/xen/domain file:

pci = [ '04:09.0' ]

However, when I boot my dom0, pciback isn't loaded, and my dom0 has loaded the sym53c8xx driver.  Presumably I need to force the loading of pciback early and perhaps disable loading sym53c8xx?
Comment 7 Orion Poplawski 2010-06-25 14:00:21 EDT
Okay, figured out how to preload the pciback module in the initrd, so up and running there.  Will do some more testing.
Comment 8 Orion Poplawski 2010-06-30 12:22:57 EDT
Getting lots of this in xm dmesg:

(XEN) mm.c:630:d6 Non-privileged (6) attempt to map I/O space 000000f0
(XEN) mm.c:630:d3 Non-privileged (3) attempt to map I/O space 000000f0
Comment 9 Orion Poplawski 2010-06-30 12:32:00 EDT
Booted domU with swiotlb=force, so far so good.
Comment 10 Paolo Bonzini 2011-04-01 09:56:37 EDT
Thanks for the information!  Please reopen if there are problems.
Comment 11 Andreas Thienemann 2012-03-06 08:01:52 EST
Same situation here: a 53c875 controller being passed through to a PV VM.

xen-3.0.3-132.el5_7.2
kernel-xen-2.6.18-274.18.1.el5

pci devices:
02:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
02:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)

modprobe.conf:
options pciback hide=(0000:00:05.0)(0000:02:05.0)

Booting the virtual machine without swiotlb=force results in the following trace:

Starting udev: Fatal DMA error! Please use 'swiotlb=force'
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at arch/x86_64/kernel/../../i386/kernel/pci-dma-xen.c:161
invalid opcode: 0000 [1] SMP 
last sysfs file: /block/xvda/xvda1/dev
CPU 0 
Modules linked in: sym53c8xx scsi_transport_spi scsi_mod pcspkr xennet dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod xenblk ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 18, comm: kblockd/0 Not tainted 2.6.18-274.18.1.el5xen #1
RIP: e030:[<ffffffff802724f4>]  [<ffffffff802724f4>] dma_map_sg+0x140/0x1ad
RSP: e02b:ffff88003fa49d50  EFLAGS: 00010086
RAX: 000000000000002f RBX: 0000000000000002 RCX: ffff88003fa5c070
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
RBP: 0000000000000001 R08: 0000000000000010 R09: 000000003f6f2200
R10: 0000000000000001 R11: 00000000ffffffff R12: ffff88003da96c80
R13: ffff88003fa5c070 R14: 0000000000000000 R15: ffffffff80338a9c
FS:  00002b7052c22710(0000) GS:ffffffff80631000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process kblockd/0 (pid: 18, threadinfo ffff88003fa48000, task ffff88003f9eb040)
Stack:  ffff88003d4f1c00  ffff88003deabc80  ffff88003deabc80  ffff88003d4f1000 
 0000000000000002  ffffffff8816c888  ffff88003d4f1000  ffff88003deabc80 
 ffff88003fed8000  0000000000000000 
Call Trace:
 [<ffffffff8816c888>] :sym53c8xx:sym_setup_data_and_start+0x13b/0x2c3
 [<ffffffff8816bbbb>] :sym53c8xx:sym53c8xx_queue_command+0xf8/0x106
 [<ffffffff88128db2>] :scsi_mod:scsi_dispatch_cmd+0x2ac/0x366
 [<ffffffff8812e421>] :scsi_mod:scsi_request_fn+0x2c7/0x39e
 [<ffffffff8025c91d>] generic_unplug_device+0x22/0x37
 [<ffffffff8024f1b0>] run_workqueue+0x9e/0xfb
 [<ffffffff8024ba3d>] worker_thread+0x0/0x122
 [<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8024bb2d>] worker_thread+0xf0/0x122
 [<ffffffff80289619>] default_wake_function+0x0/0xe
 [<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff802339b7>] kthread+0xfe/0x132
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff802338b9>] kthread+0x0/0x132
 [<ffffffff8025fb22>] child_rip+0x0/0x12


Code: 0f 0b 68 65 10 4a 80 c2 a1 00 eb fe 48 2b 15 a9 a4 5e 00 48 
RIP  [<ffffffff802724f4>] dma_map_sg+0x140/0x1ad
 RSP <ffff88003fa49d50>
 <0>Kernel panic - not syncing: Fatal exception

What is interesting, however, is that the hypervisor freezes at this point and no longer even replies to ICMP echo requests.

Enabling swiotlb=force works: the virtual machine boots and everything is fine.
When the virtual machine is shut down, however, the hypervisor locks up the same way as above, right after the VM prints the "System halted." message. No ICMP responses, nothing at all from the hypervisor.

This was working fine with older RHEL versions but no longer works with RHEL 5.7.

Is there any good advice on getting debugging data for this from the hypervisor?
Comment 12 Paolo Bonzini 2012-03-06 09:03:16 EST
Can you please:

1) attach a sosreport of the host with no running guest;

2) attach a serial console to the machine and gather output from the hypervisor just before it freezes.  Also, when it freezes, type ^A^A^A (Ctrl-A three times) followed by "q", "z" and "i", and gather output from those as well.  Thanks!
Comment 13 Laszlo Ersek 2012-03-06 09:22:12 EST
(In reply to comment #11)

> Enabling swiotlb=force works, the virtual machine boots and everything is
> fine. When the virtual machine is shutdown the hypervisor will lock up the
> same way as above right after the "System halted." message is printed by
> the vm. No icmp responses, no nothing from the hypervisor.

Can you please check that on the Xen serial console? In the grub.conf entry,
add

    com1=115200,8n1 loglvl=all guest_loglvl=all sync_console

to the xen.gz command line, and

    console=ttyS0,115200 ignore_loglevel
    
to the vmlinuz command line. After reboot, if you press ^A six times in
minicom (= 3x ^A for the hypervisor), it will switch to debug-keys mode. You
should see a short message informing about that:

(XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to
                               DOM0).

This would confirm the hypervisor is responsive.

You can also find in KnowledgeBase how to set up kdump in dom0. Then (after
switching the serial console to Xen debug-keys like above) pressing 'C' in
minicom during the hang would dump a Xen+dom0 vmcore. Opening a case with
Global Support Services could give the vmcore a first round analysis.


> This was working fine with older RHEL versions but is not working with
> RHEL5.7 anymore.

Since we don't seem to have such a controller handy, can you please test it
with some earlier 5.7.z kernel versions? (The stackdump in your comment is
from 2.6.18-274.18.1.el5xen.) Or did you mean 5.6 as last working version?

Thank you.
Comment 14 Laszlo Ersek 2012-03-07 07:32:57 EST
Setting needinfo for comment 12 & 13.
Comment 15 Andreas Thienemann 2012-03-07 07:50:26 EST
Thanks for the comments. I'm in the process of having the test machine wired up to a Cyclades.

ETA currently is Thursday.
Comment 16 Andreas Thienemann 2012-03-08 08:59:31 EST
Created attachment 568641 [details]
SOS Report for the hypervisor.

I used the opportunity to reinstall the machine with RHEL 5.8 + latest upgrades from RHN. This is basically a fresh minimum installation on the box to rule out any side effects...

[root@sysiphus2 ~]# sosreport 

sosreport (version 1.7)

This utility will collect some detailed  information about the
hardware and  setup of your  Red Hat Enterprise Linux  system.
The information is collected and an archive is  packaged under
/tmp, which you can send to a support representative.
Red Hat will use this information for diagnostic purposes ONLY
and it will be considered confidential information.

This process may take a while to complete.
No changes will be made to your system.

Press ENTER to continue, or CTRL-C to quit.

Please enter your first initial and last name [sysiphus2]: 
Please enter the case number that you are generating this report for: 

could not setup plugin auditd
<type 'instance'>: ("local variable 'flog' referenced before assignment",)
could not setup plugin hardware
<type 'instance'>: ('list index out of range',)
 plugin yum finished ...                            
 Completed.

Creating compressed archive...
  
Your sosreport has been generated and saved in:
  /tmp/sosreport-sysiphus2-550849-1c0d9a.tar.bz2

The md5sum is: 45002e2abbcd92ff4aa0c630c91c0d9a

Please send this file to your support representative.

[root@sysiphus2 ~]#
Comment 17 Laszlo Ersek 2012-03-08 09:24:14 EST
Thank you for the sosreport. (I assume the problem persists after the upgrade.)

Please provide the serial console output and the vmcore as well if possible. Articles that could prove useful:
- https://access.redhat.com/knowledge/solutions/6038
- https://access.redhat.com/knowledge/solutions/2112
- https://access.redhat.com/knowledge/articles/69964

Thank you.
Comment 18 Andreas Thienemann 2012-03-08 16:12:36 EST
I now have the serial console connected and configured according to your instructions.

When booting without swiotlb=force, the machine still locks up with the same traceback printed.

All I saw on the console at this point was the following:

pciback: vpci: 0000:02:05.0: assign to virtual slot 0
pciback: vpci: 0000:02:05.1: assign to virtual slot 0 func 1
device vif1.0 entered promiscuous mode
type=1700 audit(1331240822.377:13): dev=vif1.0 prom=256 old_prom=0 auid=4294967295 ses=4294967295
kernel direct mapping tables up to 100800000 @ 12b4000-22c8000
(XEN) mm.c:630:d1 Non-privileged (1) attempt to map I/O space 00000000
(XEN) mm.c:630:d1 Non-privileged (1) attempt to map I/O space 00000000
blkback: ring-ref 9, event-channel 9, protocol 1 (x86_64-abi)
PCI: Enabling device 0000:02:05.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:02:05.0[A] -> GSI 20 (level, low) -> IRQ 21
xenbr0: port 3(vif1.0) entering forwarding state
PCI: Enabling device 0000:02:05.1 (0000 -> 0003)
ACPI: PCI Interrupt 0000:02:05.1[B] -> GSI 21 (level, low) -> IRQ 20


Interestingly, CTRL-A A A doesn't do anything at this point anymore.
During normal operation however, typing that key combination gets me the expected (XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0). message.


Running with swiotlb=force makes the boot succeed, but powering down the virtual machine leads to a lockup of the hypervisor.

xenbr0: port 3(vif1.0) entering disabled state
device vif1.0 left promiscuous mode
type=1700 audit(1331241023.994:14): dev=vif1.0 prom=0 old_prom=256 auid=4294967295 ses=4294967295
xenbr0: port 3(vif1.0) entering disabled state

That is all I got before the whole machine (hypervisor included) locked up.

Last messages on the virtual machine are the following:

Turning off swap:  
Unmounting file systems:  
Halting system...
md: stopping all md devices.
System halted.
Comment 19 Laszlo Ersek 2012-03-09 03:45:06 EST
We should try rmmod-ing the module in the domU before shutting down the domU.
Comment 20 Laszlo Ersek 2012-03-09 04:11:15 EST
- Interesting module parameters:

parm: verb:0 for minimal verbosity, 1 for normal, 2 for excessive (byte)
parm: debug:Set bits to enable debugging (uint)
parm: safe:Set other settings to a "safe mode" (charp)

"safe=y" implies "verb=2".

- Debug bits:

#define DEBUG_ALLOC     (0x0001)
#define DEBUG_PHASE     (0x0002)
#define DEBUG_POLL      (0x0004)
#define DEBUG_QUEUE     (0x0008)
#define DEBUG_RESULT    (0x0010)
#define DEBUG_SCATTER   (0x0020)
#define DEBUG_SCRIPT    (0x0040)
#define DEBUG_TINY      (0x0080)
#define DEBUG_TIMING    (0x0100)
#define DEBUG_NEGO      (0x0200)
#define DEBUG_TAGS      (0x0400)
#define DEBUG_POINTER   (0x0800)

- What if the driver doesn't unmap all bounce buffers?

- We should try passthrough to a fullvirt guest, on an IOMMU-enabled machine.
Comment 22 Andreas Thienemann 2012-03-09 05:22:54 EST
(In reply to comment #19)
> We should try rmmod-ing the module in the domU before shutting down the domU.

Tried it:

[root@amanda ~]# rmmod sym53c8xx
sym1: detaching ...
sym1: resetting chip
sym0: detaching ...
sym0: resetting chip
[root@amanda ~]# 

But no change so far: the hypervisor still dies without printing any messages.

I'll try some of the module parameters.

I am not sure I can actually find an IOMMU-enabled machine that will be able to house the card. It's the only Symbios controller card I have around, and it's PCI.
All my IOMMU machines are PCIe, though. I will double-check, however.
Comment 23 Laszlo Ersek 2012-03-09 05:42:41 EST
(In reply to comment #22)

> [root@amanda ~]# rmmod sym53c8xx
> sym1: detaching ...
> sym1: resetting chip
> sym0: detaching ...
> sym0: resetting chip
> [root@amanda ~]# 
> 
> But no change so far. Hypervisor still dies without any messages printed.

When does it freeze? A little while after you remove the module in the domU, or when you shut down the domU (after the rmmod)?

[root@amanda ~]# rmmod sym53c8xx
/* ... */
                                    <---- does it lock up here?
[root@amanda ~]# shutdown -h now
/* ... */
                                    <---- or here?

> I'll try some of the module parameters.

Thanks!

> I am not sure I can actually find an IOMMU enabled machine which will be able
> to house the card.

Okay, let's postpone that for now; we have other ideas to try.

Here's a further option: if the box survives the rmmod (indefinitely, not just for a few seconds), we could grab a dom0/hv vmcore and check the swiotlb (bounce buffer) status. It should say "everything free", since the module is absent.
Comment 24 Andreas Thienemann 2012-03-09 07:30:05 EST
The box freezes after the shutdown when it actually says "System halted.".

The rmmod succeeds and the box keeps working fine, so we can grab the vmcore and see what we find.
Comment 26 Laszlo Ersek 2012-03-09 09:52:51 EST
The host is RHEL-5.8.

The guest was originally RHEL-5.7.z (see comment 11).

Now I could only test with a CentOS (...) -308 (and -308.1.1) build. Those didn't even boot (with swiotlb=force). They bug out in:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lib/../arch/i386/kernel/swiotlb.c:160
invalid opcode: 0000 [1] SMP 
last sysfs file: 
CPU 0 
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.18-308.1.1.el5xen #1
RIP: e030:[<ffffffff8034e39e>]  [<ffffffff8034e39e>] swiotlb_init_with_default_size+0xa0/0x1a0
RSP: e02b:ffffffff80755f40  EFLAGS: 00010282
RAX: 00000000fffffff4 RBX: 0000000000000100 RCX: 00000000004ef406
RDX: ffffffffff578000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000004000000 R08: 0000000000001000 R09: ffffffff807c9ac0
R10: 0000000000000000 R11: 0000000000000048 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff80633000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff80754000, task ffffffff80501b80)
Stack:  0000000000000314  0000000002020800  0000000000000000  ffffffff80276f37 
 0000000000000000  ffffffff80769b2b  0000000000000000  0000000002020800 
 0000000000000000  0000000000000000 
Call Trace:
 [<ffffffff80276f37>] pci_swiotlb_init+0x9/0x2d
 [<ffffffff80769b2b>] mem_init+0x13/0x1e8
 [<ffffffff8075ea7b>] start_kernel+0x189/0x224
 [<ffffffff8075e1e5>] _sinittext+0x1e5/0x1eb


Code: 0f 0b 68 a1 43 4b 80 c2 a0 00 eb fe 48 83 eb 80 48 8b 05 33 
RIP  [<ffffffff8034e39e>] swiotlb_init_with_default_size+0xa0/0x1a0
 RSP <ffffffff80755f40>
 <0>Kernel panic - not syncing: Fatal exception
----------- [cut here ] --------- [please bite here ] ---------


independently from whether or not I added dom0_mem=-128M to the Xen command line (see below why I tried that).

The bug hits here (... at least looking at the RHEL-5.8.z source -- we'll need a RHEL guest here, not a CentOS one):

swiotlb_init_with_default_size() [arch/i386/kernel/swiotlb.c] 

        for (i = 0; i < iotlb_nslabs; i += IO_TLB_SEGSIZE) {
                int rc = xen_create_contiguous_region(
                        (unsigned long)iotlb_virt_start + (i << IO_TLB_SHIFT),
                        get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT),
                        IO_TLB_DMA_BITS);
                BUG_ON(rc);
        }

The default size is 64 MB [lib/swiotlb.c], IO_TLB_SHIFT is 11. So (because we didn't pass "swiotlb=NNNN,force")

iotlb_nslabs = (64 * (1 << 20) >> 11) = 64 * 512 = 32768

It is then rounded up to be a multiple of IO_TLB_SEGSIZE (128) -- no change.

We try to allocate 32768 / 128 = 256 contiguous regions, each 256 KB in size. One of those xen_create_contiguous_region() calls fails in the loop, the guest kernel panics, and the hypervisor freezes (^A^A^A over serial ceases working).

If I try hard, I can squeeze these symptoms into the original problem -- or my idea thereof:

(a) The hypervisor freezes when a PV domain with *unreleased* contiguous regions (bounce buffers) is destroyed.

(b) Originally (= 5.7.z guest), this situation was produced by the sym53c8xx driver: when it was removed, some bounce buffers (sg scatter-gather areas) remained unreleased. We didn't see any problems until the domain was shut down. Then the hypervisor froze.

(c) With the 5.8 guest, we trigger the problem during boot: we allocate *some* bounce buffers, then kill the domain (for whatever reason) while still owning those bounce buffers.
Comment 27 Laszlo Ersek 2012-03-09 12:17:10 EST
I'm wrong about the bounce buffer allocation/leak causing the freeze. I specified "swiotlb=8192,force" on the domU command line, which caused the domU to panic in alloc_bootmem_low_pages(), right before the loop quoted above -- before the first call to xen_create_contiguous_region(). The hypervisor still froze.
Comment 28 Laszlo Ersek 2012-03-09 13:14:46 EST
"xm pci-list-assignable-devices" doesn't list either of the devices (0000:02:05.0 0000:02:05.1). I think this is due to the fact that both devices / functions have non-page-aligned BARs -- and the pci-list-assignable-devices command seems to exclude those. Search "tools/python/xen/util/pci.py" for "has_non_page_aligned_bar" -- methods check_mmio_bar and PciDevice::__init__.

xm_pci_list_assignable_devices lists only what passes through check_mmio_bar [tools/python/xen/xm/main.py].

02:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)

        Region 0: I/O ports at 7000 [size=256]
        Region 1: Memory at ff0ff800 (32-bit, non-prefetchable) [size=256]
        Region 2: Memory at ff0fd000 (32-bit, non-prefetchable) [size=4K]

        Kernel modules: sym53c8xx

02:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)

        Region 0: I/O ports at 8000 [size=256]
        Region 1: Memory at ff0ffc00 (32-bit, non-prefetchable) [size=256]
        Region 2: Memory at ff0fe000 (32-bit, non-prefetchable) [size=4K]

        Kernel modules: sym53c8xx

Each iomem region's start address should be a multiple of 0x1000 (in order to qualify as page-aligned). Region 1 of the first function has remainder 0x800, while Region 1 of the second function has remainder 0xc00.

Note that upstream xend plainly refuses to pass through such devices with "non-page-aligned MMIO BAR found". The linux-2.6.18-xen kernel was extended to fix up the alignment:

http://wiki.xen.org/wiki/Xen_PCI_Passthrough
--> drivers/pci/guestdev.c (we don't have it)

It looks like a WONTFIX to me. I don't know how it could have worked earlier, but "xm pci-list-assignable-devices" not listing it is certainly disturbing. Referring back to the wiki link above, it says:

4. Run "xm pci-list-assignable-devices" and verify the PCI device is available for passthru

5. *Note important: Make sure the device shows up in the "xm pci-list-assignable-devices" list! Don't continue before you've gotten that properly working.*

Backporting "drivers/pci/guestdev.c" seems risky; it is not isolated (search the upstream hg repo for "pci_is_reassigndev").

Paolo, what do you think?
Comment 29 Laszlo Ersek 2012-03-09 13:20:31 EST
(In reply to comment #28)

> Note that upstream xend plainly refuses to pass through such devices with
> "non-page-aligned MMIO BAR found". The linux-2.6.18-xen kernel was extended to
> fix up the alignment:
> 
> http://wiki.xen.org/wiki/Xen_PCI_Passthrough
> --> drivers/pci/guestdev.c (we don't have it)

(Search the wiki page for "reassign_resources".)
Comment 30 Andreas Thienemann 2012-03-09 13:27:19 EST
I'm replacing the guest with a RHEL guest and will try older kernels to see when it worked. I do remember 5.4ish being okay.
Comment 32 Andreas Thienemann 2012-03-09 15:01:58 EST
I reinstalled the guest as RHEL 5.8. With kernel 2.6.18-308.el5xen the machine boots and lspci sees the passed-through cards, but the hypervisor crashes at poweroff. The same happens if the virtual machine is killed with xm destroy.

With 2.6.18-308.1.1.el5xen running on the guest the machine crashes right at poweron with the message from comment #28.

Starting 2.6.18-308.1.1.el5xen _without_ the swiotlb=force command line makes the guest correctly boot but exhibits the previous behaviour where the hypervisor would freeze at shutdown of the guest.
Comment 33 Paolo Bonzini 2012-03-09 15:29:54 EST
Need to look more closely at the upstream patches, but freezing the HV seems like a no-no to me.  *Something* needs to be done, even if the fix is just to forbid passthrough at Xend time.
Comment 34 Laszlo Ersek 2012-03-09 18:42:29 EST
$ git log --oneline --reverse 2.6.18-308.el5..2.6.18-308.1.1.el5

0b4d341 [fs] prevent lock contention in shrink_dcache_sb via private list
2b468ab [net] igb: reset PHY after recovering from PHY power down
c606d4b [kernel] sysctl: restrict write access to dmesg_restrict
266669d [usb] cdc-acm: make lock use interrupt safe
dc5c7a3 [net] bnx2x: add fan failure event handling
06422a2 [net] bnx2x: make bnx2x_close static again
e7c3ca9 [net] tg3: Fix 4k tx and recovery code
2492abd Revert: [scsi] qla2xxx: fix IO failure during chip reset
02db1d8 Revert: [scsi] qla2xxx: avoid SCSI host_lock dep in queuecommand
bc84712 tag: kernel-2.6.18-308.1.1.el5

Nothing seems to be related.

Andreas, if you remove the pci = [ ... ] stanza from the domU config, does that "fix" the domU? (Just a sanity check.) I'd try it myself but you seem to be logged in.

....

Regarding catching it in Xend:

http://xenbits.xensource.com/hg/xen-unstable.hg/rev/18046
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/18414
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/18965
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/20324
Comment 35 Laszlo Ersek 2012-03-09 19:10:14 EST
The two functions share a single page:

        Region 1: Memory at ff0ff800 (32-bit, non-prefetchable) [size=256]
        Region 1: Memory at ff0ffc00 (32-bit, non-prefetchable) [size=256]
                            ^^^^^
page(frame) 0xff0ff doesn't seem to be shared with anything else. I wonder if this is a "double free" ("double revoke") kind of problem... XEN_DOMCTL_iomem_permission
Comment 36 Andreas Thienemann 2012-03-10 06:33:26 EST
(In reply to comment #34)

> Andreas, if you remove the pci = [ ... ] stanza from the domU config, does that
> "fix" the domU? (Just a sanity check.) I'd try it myself but you seem to be
> logged in.

Ahh, right. I was trying different kernels and left the connection open.

As for the different kernels, here's a little table:

Guest kernel            pci   swiotlb   boots   freezes   comment
2.6.18-308.1.1.el5xen   no    no        yes     no        looks fine
2.6.18-308.1.1.el5xen   no    force     no      no        crash at boot, hv ok
2.6.18-308.1.1.el5xen   yes   no        yes     yes       hv freezes at shutdown
2.6.18-308.1.1.el5xen   yes   force     no      yes       hv freezes at crash
2.6.18-308.el5xen       no    no        yes     no        looks fine
2.6.18-308.el5xen       no    force     no      no        crash at boot, hv ok
2.6.18-308.el5xen       yes   no        yes     yes       hv freezes at shutdown
2.6.18-308.el5xen       yes   force     no      yes       hv freezes at crash

I tried a few different dom0 kernels, but changing them seems to have no effect.
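The table reduces to two rules. First, the hypervisor freezes exactly when the card is passed through. Second, the guest boots exactly when swiotlb is not forced. The guest kernel version makes no difference. The Python sketch below encodes the eight rows and checks both rules. The tuple layout is only an illustrative choice.

```python
# (guest kernel, pci passthrough, swiotlb, boots, freezes), copied from the table above
ROWS = [
    ("2.6.18-308.1.1.el5xen", "no",  "no",    "yes", "no"),
    ("2.6.18-308.1.1.el5xen", "no",  "force", "no",  "no"),
    ("2.6.18-308.1.1.el5xen", "yes", "no",    "yes", "yes"),
    ("2.6.18-308.1.1.el5xen", "yes", "force", "no",  "yes"),
    ("2.6.18-308.el5xen",     "no",  "no",    "yes", "no"),
    ("2.6.18-308.el5xen",     "no",  "force", "no",  "no"),
    ("2.6.18-308.el5xen",     "yes", "no",    "yes", "yes"),
    ("2.6.18-308.el5xen",     "yes", "force", "no",  "yes"),
]

def rule_holds(rows, predicate):
    """True if predicate(row) is true for every row."""
    return all(predicate(row) for row in rows)

# Rule 1: the hypervisor freezes if and only if the card is passed through.
freeze_iff_pci = rule_holds(ROWS, lambda r: (r[1] == "yes") == (r[4] == "yes"))
# Rule 2: the guest boots if and only if swiotlb is not forced.
boot_iff_no_swiotlb = rule_holds(ROWS, lambda r: (r[2] == "no") == (r[3] == "yes"))

print(freeze_iff_pci, boot_iff_no_swiotlb)  # True True
```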
Comment 37 Laszlo Ersek 2012-03-10 09:43:56 EST
I did some additional testing. Passthrough always causes the hv to freeze (either at guest crash or guest shutdown time). swiotlb=force always causes the guest to crash.

Here's the same bug for Debian:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=446148

Something else I found:

http://old-list-archives.xen.org/archives/html/xen-users/2010-03/msg00213.html

but the devices in question don't seem to share interrupts.
Comment 38 Laszlo Ersek 2012-03-10 09:55:21 EST
(In reply to comment #35)
> The two functions share a single page:
> 
>         Region 1: Memory at ff0ff800 (32-bit, non-prefetchable) [size=256]
>         Region 1: Memory at ff0ffc00 (32-bit, non-prefetchable) [size=256]
>                             ^^^^^
> page(frame) 0xff0ff doesn't seem to be shared with anything else. I wonder if
> this is a "double free" ("double revoke") kind of problem...
> XEN_DOMCTL_iomem_permission

I wrote a trivial hv patch (for -308.1.1) to log this domctl; this was the output:

(XEN) iomem_permit_access domid=1 mfn=[0xff0ff, 0xff0ff] -> fn0 region1
(XEN) iomem_permit_access domid=1 mfn=[0xff0fd, 0xff0fd] -> fn0 region2
(XEN) iomem_permit_access domid=1 mfn=[0xff0d0, 0xff0df] -> fn0 expansion ROM
(XEN) iomem_permit_access domid=1 mfn=[0xff0ff, 0xff0ff] -> fn1 region1
(XEN) iomem_permit_access domid=1 mfn=[0xff0fe, 0xff0fe] -> fn1 region2
(XEN) iomem_permit_access domid=1 mfn=[0xff0e0, 0xff0ef] -> fn1 expansion ROM

mfn=[0xff0ff, 0xff0ff] is logged twice.

I saw no "iomem_deny_access" (i.e. revoke) messages before the freeze (... although the domctl may not be exercised at domain destroy time at all).
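A longer run of this debug output would be hard to scan for duplicate grants by eye. The Python sketch below parses lines in the format printed by the ad-hoc patch above and reports every MFN granted to a domain more than once. It treats the bracketed ranges as inclusive and ignores the `->` annotations. The function name is hypothetical, and the regex only fits this patch's output, not any standard Xen log format.

```python
import re
from collections import Counter

# Output of the ad-hoc hv logging patch, copied from above
LOG = """\
(XEN) iomem_permit_access domid=1 mfn=[0xff0ff, 0xff0ff] -> fn0 region1
(XEN) iomem_permit_access domid=1 mfn=[0xff0fd, 0xff0fd] -> fn0 region2
(XEN) iomem_permit_access domid=1 mfn=[0xff0d0, 0xff0df] -> fn0 expansion ROM
(XEN) iomem_permit_access domid=1 mfn=[0xff0ff, 0xff0ff] -> fn1 region1
(XEN) iomem_permit_access domid=1 mfn=[0xff0fe, 0xff0fe] -> fn1 region2
(XEN) iomem_permit_access domid=1 mfn=[0xff0e0, 0xff0ef] -> fn1 expansion ROM
"""

PATTERN = re.compile(
    r"iomem_permit_access domid=(\d+) mfn=\[(0x[0-9a-f]+), (0x[0-9a-f]+)\]")

def granted_more_than_once(log):
    """Return {(domid, mfn): count} for every MFN granted to a domain more than once."""
    counts = Counter()
    for match in PATTERN.finditer(log):
        domid = int(match.group(1))
        first, last = int(match.group(2), 16), int(match.group(3), 16)
        for mfn in range(first, last + 1):  # the logged ranges are inclusive
            counts[(domid, mfn)] += 1
    return {key: n for key, n in counts.items() if n > 1}

for (domid, mfn), n in granted_more_than_once(LOG).items():
    print(f"domid={domid} mfn={mfn:#x} granted {n} times")  # domid=1 mfn=0xff0ff granted 2 times
```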
Comment 39 Andreas Thienemann 2012-03-10 10:07:29 EST
(In reply to comment #37)

> I did some additional testing. Passthrough always causes the hv to freeze
> (either at guest crash or guest shutdown time). swiotlb=force always causes the
> guest to crash.

A small addition here. Passthrough always causes the hv to freeze _when_ using the Symbios SCSI card.

When using another card in the same hardware, everything works fine:

[root@guest ~]# cat /proc/cmdline 
 ro root=/dev/VolGroup00/LogVol00 console=tty0 console=xvc0
[root@guest ~]# lspci
00:00.0 Communication controller: Cyclades Corporation Cyclades-Z above first megabyte (rev 01)
[root@guest ~]# modprobe cyclades
[root@guest ~]# dmesg | tail -n 4
Cyclades driver 2.3.2.20 2004/02/25 18:14:16
        built Jan  3 2012 16:52:16
PCI: Enabling device 0000:00:00.0 (0000 -> 0003)
Cyclades-8Zo/PCI #1: 0xff500000-0xff57ffff, 8 channels starting from port 0.
[root@guest ~]# 

When shutting down this machine, everything keeps working and the hypervisor does not freeze.

Notes to reproduce on the test machine:
# modprobe pciback
# echo 0000:00:05.0 > /sys/bus/pci/drivers/pciback/new_slot
# xm create -c test-cyclades
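xm domain configuration files are evaluated as Python, so the `test-cyclades` file is likely similar to the fragment below. The actual file was not posted. Only the `pci` entry is taken from this bug: it is the slot bound to pciback in the steps above. Every other value is a hypothetical placeholder.

```python
# test-cyclades: hypothetical xm domain configuration (xm config files use Python syntax).
# Only the pci entry comes from this bug; every other value is a placeholder.
name = "test-cyclades"
memory = 512                                          # MiB, placeholder
vcpus = 1
bootloader = "/usr/bin/pygrub"                        # PV guest booted via pygrub, placeholder
disk = ["phy:/dev/VolGroup00/test-cyclades,xvda,w"]   # placeholder backing device
vif = ["bridge=xenbr0"]                               # placeholder bridge
# The PCI stanza: pass the pciback-bound slot through to the guest.
pci = ["0000:00:05.0"]
```

Dropping the `pci` line gives the no-passthrough sanity check asked about in comment 34.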
Comment 40 Laszlo Ersek 2012-03-10 19:34:07 EST
... which is exactly why finding a solution is not easy. As you say, this card works OK. However, it's also not listed by "xm pci-list-assignable-devices", for the same reason:

00:05.0 Communication controller: Cyclades Corporation Cyclades-Z 
        above first megabyte (rev 01)
        Region 0: Memory at ff6eac00 (32-bit, non-prefetchable) [disabled]
                                                                [size=128]
        Region 1: I/O ports at ec00 [disabled] [size=128]
        Region 2: Memory at ff500000 (32-bit, non-prefetchable) [disabled]
                                                                [size=1M]
        Expansion ROM at ff690000 [disabled] [size=4K]

Region 0 is not page aligned. So now we have three functions (across two cards) with one not-page-aligned region each. One function works, two others can hang the hypervisor. If we default-disable passthrough for these, we break the working case. If we additionally introduce an override (to keep the working case working, against the default disable), it has to be card specific.
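The alignment argument can be applied to all regions quoted in this bug. The Python sketch below assumes 4 KiB pages. It treats a region as page aligned only if it starts on a page boundary and spans whole pages. That definition follows the reasoning above and is not necessarily the exact test xend applies. The addresses and sizes are copied from the lspci output in this bug.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# (device/function, region, base address, size in bytes), from the lspci output in this bug
REGIONS = [
    ("Symbios fn0", "region1", 0xFF0FF800, 256),
    ("Symbios fn1", "region1", 0xFF0FFC00, 256),
    ("Cyclades-Z",  "region0", 0xFF6EAC00, 128),
    ("Cyclades-Z",  "region2", 0xFF500000, 1 << 20),
]

def is_page_aligned(base, size, page=PAGE_SIZE):
    """True if the region starts on a page boundary and covers whole pages."""
    return base % page == 0 and size % page == 0

for name, region, base, size in REGIONS:
    status = "ok" if is_page_aligned(base, size) else "NOT page aligned"
    print(f"{name} {region} at {base:#x} [size={size}]: {status}")
```

Three regions (one per function) fail the check. The check cannot tell apart the Cyclades function, which works, from the two Symbios functions, which hang the hypervisor.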

OTOH I do notice Region 0 is disabled for the Cyclades card...
Comment 41 Laszlo Ersek 2012-03-13 05:56:45 EDT
Andreas, can you please check
- if there's a newer BIOS for the machine,
- if there's an option in the BIOS to set memory ranges of PCI devices?

Thanks,
Laszlo
Comment 43 Andreas Thienemann 2012-03-13 06:22:22 EDT
I believe the BIOS is current. I'll verify though. Same for the memory range.
Comment 44 Laszlo Ersek 2012-03-13 06:36:28 EDT
The hypervisor log contains messages like:

(XEN) PCI add device 00:18.2
(XEN) traps.c:1910:d0 Domain attempted WRMSR 0000000000000410 from
      00000000:00000000 to 00000000:00003bff.
(XEN) traps.c:1910:d0 Domain attempted WRMSR 0000000000000410 from
      00000000:00000000 to 00000000:00003bff.
(XEN) PCI add device 00:19.2
(XEN) traps.c:1910:d0 Domain attempted WRMSR 0000000000000410 from
      00000000:00000000 to 00000000:00003bff.
(XEN) traps.c:1910:d0 Domain attempted WRMSR 0000000000000410 from
      00000000:00000000 to 00000000:00003bff.

#define MSR_K8_MC4_CTL                  0x410

(The box has two dual-core Opteron 275 CPUs (Italy), so "K8" makes sense.)
The kernel tries to write to this MSR in "drivers/edac/k8_edac.c" -- "MC
support for AMD K8 memory controllers".

00:18.2 Host bridge: Advanced Micro Devices [AMD]
        K8 [Athlon64/Opteron] DRAM Controller
00:19.2 Host bridge: Advanced Micro Devices [AMD]
        K8 [Athlon64/Opteron] DRAM Controller

#define K8_MSR_MC4CTL   0x0410  /* North Bridge Check report ctl (64b) */
#define K8_MSR_MC4STAT  0x0411  /* North Bridge status (64b) */
#define K8_MSR_MC4ADDR  0x0412  /* North Bridge Address (64b) */


k8_init_one == k8_driver.k8_init_one
  k8_probe1
    k8_enable_error_reporting
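To see what the guest tried to write, the WRMSR complaint can be decoded. The Python sketch below assumes the hypervisor prints each 64-bit value as high:low 32-bit halves (edx:eax). Both high halves are zero here, so that assumption does not change the result. It lists only the bit positions set in the new value. The meaning of each MC4_CTL bit is defined by AMD's documentation and is not decoded here. The function names are hypothetical.

```python
import re

# One WRMSR complaint from the hypervisor log above, joined onto a single line
LINE = ("(XEN) traps.c:1910:d0 Domain attempted WRMSR 0000000000000410 from "
        "00000000:00000000 to 00000000:00003bff.")

WRMSR = re.compile(
    r"WRMSR ([0-9a-f]+) from ([0-9a-f]{8}):([0-9a-f]{8}) to ([0-9a-f]{8}):([0-9a-f]{8})")

def decode(line):
    """Return (msr, old_value, new_value), assuming values are printed as hi:lo."""
    m = WRMSR.search(line)
    if m is None:
        raise ValueError("not a WRMSR complaint: " + line)
    msr = int(m.group(1), 16)
    old = (int(m.group(2), 16) << 32) | int(m.group(3), 16)
    new = (int(m.group(4), 16) << 32) | int(m.group(5), 16)
    return msr, old, new

def set_bits(value):
    """List the bit positions that are set in a 64-bit value, lowest first."""
    return [bit for bit in range(64) if (value >> bit) & 1]

msr, old, new = decode(LINE)
print(f"MSR {msr:#x}: {old:#x} -> {new:#x}, bits set: {set_bits(new)}")
# MSR 0x410: 0x0 -> 0x3bff, bits set: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13]
```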
Comment 45 Laszlo Ersek 2012-03-13 06:45:19 EDT
(In reply to comment #43)
> I believe the BIOS is current. I'll verify though. Same for the memory range.

Thank you!

(In reply to comment #30)
> I'm replacing the guest with a RHEL guest and will try older kernels to see
> when it worked. I do remember 5.4ish being okay.

Do you mean a RHEL-5.4 guest worked with a brand new (RHEL-5.8) dom0?
Comment 46 Andreas Thienemann 2012-03-13 07:04:32 EDT
(In reply to comment #41)
> Andreas, can you please check
> - if there's a newer BIOS for the machine,

Nope. 1.06 is the current one.

> - if there's an option in the BIOS to set memory ranges of PCI devices?

Didn't see anything. But feel free to look around. The BIOS setup is available via the serial console by pressing F4 during POST.
Comment 47 Laszlo Ersek 2012-03-13 08:24:55 EDT
(In reply to comment #22)

> I am not sure I can actually find an IOMMU enabled machine which will be
> able to house the card. It's the only symbios controller card I have
> around and it's PCI. All my IOMMU machines are PCIe though. Will make
> certain however.

(
I found such an option:
- Chipset | NorthBridge Configuration | IOMMU Option Menu | IOMMU Mode

But it seems to be GART related, not AMD-Vi.
)
Comment 49 Laszlo Ersek 2012-04-03 10:08:00 EDT
Hello Andreas,

can you please confirm this is a regression? I.e., can you pinpoint a hv/dom0/domU combination where the card worked? I've done some rough bisection back to RHEL-5.0 / RHEL-5.1 on the host side (comment 37), and no combination I tried worked.

If we can prove this is a regression, we have a chance of bisecting it (which I tried and failed at). If not, then we may have to close the BZ as INSU.
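Bisection needs a known-good endpoint, and a confirmed working combination would provide it. Without one, the search has nowhere to start. The Python sketch below shows the search itself. `is_bad` stands in for the manual step: boot a given hv/dom0/domU build with the card passed through and check whether the hypervisor freezes. The build labels and the predicate in the usage line are made up.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def first_bad(builds: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Binary-search an ordered list of builds for the first bad one.

    Assumes builds[0] is known good, builds[-1] is known bad, and that every
    build after the first bad one is also bad (a single regression point).
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] is good, builds[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]

# Hypothetical usage: eight placeholder builds, "bad" from build-5 onward.
builds = [f"build-{i}" for i in range(8)]
print(first_bad(builds, lambda b: int(b.split("-")[1]) >= 5))  # build-5
```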

I agree with Paolo's comment 33, but at this point we don't even seem to have enough information to exclude the device from passthrough. We could check the PCI vendor and the device ID, but if we change the default for it, we'll break passthrough for people who use this card without problems -- they'll have to override manually.
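The trade-off can be modeled as a per-ID policy. A default denylist is keyed on (vendor, device), and users whose cards work would need a manual override. The Python sketch below is purely illustrative. Neither `DEFAULT_DENY` nor `passthrough_allowed` exists in xend. 1000:000f (the Symbios 53c875 ID from this bug) is the only real ID used, and 1234:5678 is a made-up stand-in for any other device.

```python
from typing import Optional, Set, Tuple

PciId = Tuple[int, int]  # (vendor ID, device ID)

# Hypothetical default denylist; 1000:000f is the Symbios 53c875 from this bug.
DEFAULT_DENY: Set[PciId] = {(0x1000, 0x000F)}

def passthrough_allowed(pci_id: PciId, user_override: Optional[Set[PciId]] = None) -> bool:
    """Allow passthrough unless the ID is denied by default and not overridden."""
    override = user_override or set()
    return pci_id not in DEFAULT_DENY or pci_id in override

symbios = (0x1000, 0x000F)
print(passthrough_allowed(symbios))             # False: denied by default
print(passthrough_allowed(symbios, {symbios}))  # True: the user opted back in
print(passthrough_allowed((0x1234, 0x5678)))    # True: made-up ID, not on the list
```

With this policy, any 53c875 that passes through cleanly today would stop working until its owner sets the override. That is the breakage described above.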
Comment 50 Andrew Jones 2012-06-20 09:04:44 EDT
This bug appears to have gone cold. Comment 49 asked for information from the reporter almost three months ago. We can leave it open another couple weeks in case this comment wakes it back up, otherwise it should be closed as INSU.
Comment 51 Orion Poplawski 2012-06-20 09:47:08 EDT
I'm not using this configuration anymore, so I can't help with this now.
Comment 52 Laszlo Ersek 2012-06-22 15:02:03 EDT
Closing as INSU. Thank you for your cooperation!
Comment 53 Andreas Thienemann 2012-06-22 16:49:43 EDT
Reopening. I still have the hardware and am now again in a position to test things.
Comment 54 Laszlo Ersek 2012-07-02 17:37:20 EDT
Hi Andreas,

could you please revisit comment 49?

Thank you,
Laszlo
Comment 56 Laszlo Ersek 2012-08-15 08:49:38 EDT
Please reopen only with a linked Customer Portal case. Thank you.
