Description of problem:

Trying to get PCI passthrough working for a Symbios Logic 53c875 SCSI controller. Running EL 5.2 with xen 3.0.3-73, but with the test kernel 2.6.18-132.el5virttest10xen on the Dom0.

First I needed to add 1000:000f to xend-pci-permissive.sxp because of:

pciback 0000:04:09.0: Driver tried to write to a read-only configuration space field at offset 0x44, size 2.

The controller fails on the 5.2 guest with:

SCSI subsystem initialized
PCI: Enabling device 0000:00:00.0 (0000 -> 0003)
sym0: <875> rev 0x26 at pci 0000:00:00.0 irq 20
sym0: Tekram NVRAM, ID 7, Fast-20, SE, parity checking
Failed to obtain physical IRQ 20
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.3
scsi 0:0:0:0: ABORT operation started.
scsi 0:0:0:0: ABORT operation timed-out.
scsi 0:0:0:0: DEVICE RESET operation started.
scsi 0:0:0:0: DEVICE RESET operation timed-out.
scsi 0:0:0:0: BUS RESET operation started.
scsi 0:0:0:0: BUS RESET operation timed-out.
scsi 0:0:0:0: HOST RESET operation started.
sym0: SCSI BUS has been reset.
scsi 0:0:0:0: HOST RESET operation timed-out.
scsi 0:0:0:0: scsi: Device offlined - not ready after error recovery
....

This may be fixed in c/s 791 in the linux-2.6.18-xen.hg tree.

Version-Release number of selected component (if applicable):
2.6.18-132.el5virttest10xen

How reproducible:
Every time
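(For reference, a sketch of what that xend-pci-permissive.sxp entry can look like -- the sxp layout follows the stock file's commented example and is an assumption; only the 1000:000f vendor:device pair comes from the report above:)

# /etc/xen/xend-pci-permissive.sxp (sketch)
(unconstrained_dev_ids
    ('1000:000f')
)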
Orion, would you mind seeing if this still reproduces with RHEL 5.5? We didn't pick up the patch you pointed at, but before backporting it I'd like to see if this problem still exists. Thanks, Andrew
I'm currently running the 5.4 kernel (2.6.18-164.15.1.el5xen) on my 5.5 dom0 machine due to bug 607806, but the PCI passthrough seems to be better, though I cannot read tapes.

I had to add 04:05.0 to my pciback hide options due to:

Error: pci: improper device assignment specified: pci: 0000:04:05.0 must be co-assigned to the same guest with 0000:04:09.0, but it is not owned by pciback.

The SCSI module loads in the 5.5 domU:

PCI: Enabling device 0000:00:00.0 (0000 -> 0003)
sym0: <875> rev 0x26 at pci 0000:00:00.0 irq 19
sym0: Tekram NVRAM, ID 7, Fast-20, SE, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.3
  Vendor: EXABYTE  Model: EXB-89008E030203  Rev: V37f
  Type:   Sequential-Access  ANSI SCSI revision: 02
target0:0:6: Beginning Domain Validation
target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:6: Domain Validation skipping write tests
target0:0:6: Ending Domain Validation
target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
  Vendor: EXABYTE  Model: EXB-89008E030203  Rev: V37f
  Type:   Sequential-Access  ANSI SCSI revision: 02
target0:0:8: Beginning Domain Validation
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: Domain Validation skipping write tests
target0:0:8: Ending Domain Validation
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
scsi 0:0:6:0: Attached scsi generic sg0 type 1
scsi 0:0:8:0: Attached scsi generic sg1 type 1
st: Version 20070203, fixed bufsize 32768, s/g segs 256
st 0:0:6:0: Attached scsi tape st0
st0: try direct i/o: yes (alignment 512 B)
st 0:0:8:0: Attached scsi tape st1
st1: try direct i/o: yes (alignment 512 B)
target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:8: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
st0: Block limits 1 - 245760 bytes.
target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)
target0:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15)

Doing a "tar tvf /dev/st0" triggered some kind of network hiccup. Running it again produced the following on the domU console:

Fatal DMA error!
Please use 'swiotlb=force'
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at arch/x86_64/kernel/../../i386/kernel/pci-dma-xen.c:159
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/xen/pci-0/pci0000:00/0000:00:00.0/class
CPU 0
Modules linked in: nfs fscache nfs_acl autofs4 i2c_dev i2c_core lockd sunrpc ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath scsi_dh parport_pc lp parport st sg sym53c8xx scsi_transport_spi scsi_mod pcspkr xennet dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache xenblk ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2693, comm: tar Not tainted 2.6.18-194.3.1.el5xen #1
RIP: e030:[<ffffffff802726f1>]  [<ffffffff802726f1>] dma_map_sg+0x143/0x1ae
RSP: e02b:ffff880003de1b58  EFLAGS: 00010082
RAX: 000000000000002f RBX: ffff88000d03d080 RCX: ffffffff804f9c28
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
RBP: 0000000000000001 R08: ffffffff804f9c28 R09: 0000000000000000
R10: 000000000000002d R11: 0000000000000000 R12: ffff88000d03d080
R13: ffff88000f40d870 R14: 0000000000000000 R15: ffff88000418c748
FS:  00002b1e1fcedc30(0000) GS:ffffffff805d2000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process tar (pid: 2693, threadinfo ffff880003de0000, task ffff88000409f860)
Stack:  ffff88000c709c00 ffff88000c7d4080 ffff88000c7d4080 ffff88000c709000
 0000000000000002 ffffffff88144882 ffff88000c709000 ffff88000c7d4080
 ffff88000f9e2800 ffff88000f1ee048
Call Trace:
 [<ffffffff88144882>] :sym53c8xx:sym_setup_data_and_start+0x13b/0x2c3
 [<ffffffff88143bb5>] :sym53c8xx:sym53c8xx_queue_command+0xf8/0x106
 [<ffffffff88100c83>] :scsi_mod:scsi_dispatch_cmd+0x290/0x322
 [<ffffffff88106103>] :scsi_mod:scsi_request_fn+0x2c5/0x39c
 [<ffffffff80336156>] blk_execute_rq_nowait+0x89/0xa0
 [<ffffffff88105b83>] :scsi_mod:scsi_execute_async+0x356/0x399
 [<ffffffff88171df5>] :st:st_do_scsi+0x1c6/0x221
 [<ffffffff88171656>] :st:st_sleep_done+0x0/0x60
 [<ffffffff88173181>] :st:check_tape+0x2bd/0x564
 [<ffffffff88173d1d>] :st:st_open+0x1ce/0x20c
 [<ffffffff8024b2fe>] chrdev_open+0x14d/0x183
 [<ffffffff8024b1b1>] chrdev_open+0x0/0x183
 [<ffffffff8021edc8>] __dentry_open+0xd9/0x1dc
 [<ffffffff80227bca>] do_filp_open+0x2a/0x38
 [<ffffffff8029595f>] recalc_sigpending_and_wake+0x9/0x1a
 [<ffffffff8021a270>] do_sys_open+0x44/0xbe
 [<ffffffff802602f9>] tracesys+0xab/0xb6
Code: 0f 0b 68 f6 7e 49 80 c2 9f 00 48 8b 3b 48 2b 3d eb c1 46 00
RIP  [<ffffffff802726f1>] dma_map_sg+0x143/0x1ae
 RSP <ffff880003de1b58>
<0>Kernel panic - not syncing: Fatal exception
A second try was more successful. So perhaps just flaky?
Hmm, if it's working sometimes but not other times, then maybe the swiotlb is too small? Its size can be increased from the default (64M) with swiotlb=<size>.

Do you get any other logs in the guest's dmesg, the host's dmesg, or 'xm dmesg' when attempting to use it?
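(If it comes to that, a sketch of how the size could be passed on the PV guest's kernel command line -- the kernel version and root device below are placeholders, and the unit that swiotlb= expects on this kernel (slab count vs. megabytes) is an assumption worth double-checking:)

# in the guest's grub.conf; 65536 slabs of 2 KB each would be 128 MB
kernel /vmlinuz-2.6.18-194.3.1.el5xen ro root=/dev/VolGroup00/LogVol00 swiotlb=65536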
As a sanity check, can you see if I'm using this correctly? I have /etc/modprobe.d/pciback with:

options pciback hide=(0000:04:05.0)(0000:04:09.0)

And in my /etc/xen/domain file:

pci = [ '04:09.0' ]

However, when I boot my dom0, pciback isn't loaded, and my dom0 has loaded the sym53c8xx driver. Presumably I need to force the loading of pciback early and perhaps disable loading sym53c8xx?
Okay, figured out how to preload the pciback module in the initrd, so up and running there. Will do some more testing.
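(For reference, one way to preload pciback in the initrd on RHEL 5 -- a sketch, not necessarily what was done here; the kernel version is a placeholder:)

# rebuild the dom0 initrd so pciback is loaded (and claims the device)
# before sym53c8xx gets a chance to bind
mkinitrd --preload=pciback -f /boot/initrd-2.6.18-164.15.1.el5xen.img 2.6.18-164.15.1.el5xen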
Getting lots of this in xm dmesg:

(XEN) mm.c:630:d6 Non-privileged (6) attempt to map I/O space 000000f0
(XEN) mm.c:630:d3 Non-privileged (3) attempt to map I/O space 000000f0
Booted domU with swiotlb=force, so far so good.
Thanks for the information! Please reopen if there are problems.
Same situation here, 53c875 controller being passed through to a PV VM.

xen-3.0.3-132.el5_7.2
kernel-xen-2.6.18-274.18.1.el5

PCI devices:

02:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
02:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)

modprobe.conf:

options pciback hide=(0000:00:05.0)(0000:02:05.0)

Booting the virtual machine without swiotlb=force results in the following trace:

Starting udev: Fatal DMA error!
Please use 'swiotlb=force'
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at arch/x86_64/kernel/../../i386/kernel/pci-dma-xen.c:161
invalid opcode: 0000 [1] SMP
last sysfs file: /block/xvda/xvda1/dev
CPU 0
Modules linked in: sym53c8xx scsi_transport_spi scsi_mod pcspkr xennet dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod xenblk ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 18, comm: kblockd/0 Not tainted 2.6.18-274.18.1.el5xen #1
RIP: e030:[<ffffffff802724f4>]  [<ffffffff802724f4>] dma_map_sg+0x140/0x1ad
RSP: e02b:ffff88003fa49d50  EFLAGS: 00010086
RAX: 000000000000002f RBX: 0000000000000002 RCX: ffff88003fa5c070
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
RBP: 0000000000000001 R08: 0000000000000010 R09: 000000003f6f2200
R10: 0000000000000001 R11: 00000000ffffffff R12: ffff88003da96c80
R13: ffff88003fa5c070 R14: 0000000000000000 R15: ffffffff80338a9c
FS:  00002b7052c22710(0000) GS:ffffffff80631000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process kblockd/0 (pid: 18, threadinfo ffff88003fa48000, task ffff88003f9eb040)
Stack:  ffff88003d4f1c00 ffff88003deabc80 ffff88003deabc80 ffff88003d4f1000
 0000000000000002 ffffffff8816c888 ffff88003d4f1000 ffff88003deabc80
 ffff88003fed8000 0000000000000000
Call Trace:
 [<ffffffff8816c888>] :sym53c8xx:sym_setup_data_and_start+0x13b/0x2c3
 [<ffffffff8816bbbb>] :sym53c8xx:sym53c8xx_queue_command+0xf8/0x106
 [<ffffffff88128db2>] :scsi_mod:scsi_dispatch_cmd+0x2ac/0x366
 [<ffffffff8812e421>] :scsi_mod:scsi_request_fn+0x2c7/0x39e
 [<ffffffff8025c91d>] generic_unplug_device+0x22/0x37
 [<ffffffff8024f1b0>] run_workqueue+0x9e/0xfb
 [<ffffffff8024ba3d>] worker_thread+0x0/0x122
 [<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8024bb2d>] worker_thread+0xf0/0x122
 [<ffffffff80289619>] default_wake_function+0x0/0xe
 [<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff802339b7>] kthread+0xfe/0x132
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff8029d53e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff802338b9>] kthread+0x0/0x132
 [<ffffffff8025fb22>] child_rip+0x0/0x12
Code: 0f 0b 68 65 10 4a 80 c2 a1 00 eb fe 48 2b 15 a9 a4 5e 00 48
RIP  [<ffffffff802724f4>] dma_map_sg+0x140/0x1ad
 RSP <ffff88003fa49d50>
<0>Kernel panic - not syncing: Fatal exception

What is interesting, however: the hypervisor freezes at this point and does not even reply to ICMP echo requests anymore.

Enabling swiotlb=force works: the virtual machine boots and everything is fine. When the virtual machine is shut down, the hypervisor locks up the same way as above, right after the "System halted." message is printed by the VM. No ICMP responses, nothing from the hypervisor.

This was working fine with older RHEL versions but is not working with RHEL 5.7 anymore. Is there any good advice on getting debugging data for this from the hypervisor?
Can you please:

1) attach a sosreport of the host with no running guest;
2) attach a serial console to the machine and gather output from the hypervisor just before it freezes.

Also, when it freezes, type ^A^A^A (Ctrl-A three times) followed by "q", "z" and "i", and gather output from those as well.

Thanks!
(In reply to comment #11)
> Enabling swiotlb=force works: the virtual machine boots and everything is
> fine. When the virtual machine is shut down, the hypervisor locks up the
> same way as above, right after the "System halted." message is printed by
> the VM. No ICMP responses, nothing from the hypervisor.

Can you please check that on the Xen serial console? In the grub.conf entry, add

  com1=115200,8n1 loglvl=all guest_loglvl=all sync_console

to the xen.gz command line, and

  console=ttyS0,115200 ignore_loglevel

to the vmlinuz command line. After reboot, if you press ^A six times in minicom (= 3x ^A for the hypervisor), it will switch to debug-keys mode. You should see a short message informing you about that:

  (XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0).

This would confirm the hypervisor is responsive.

You can also find in the KnowledgeBase how to set up kdump in dom0. Then (after switching the serial console to Xen debug-keys like above) pressing 'C' in minicom during the hang would dump a Xen+dom0 vmcore. Opening a case with Global Support Services could give the vmcore a first-round analysis.

> This was working fine with older RHEL versions but is not working with
> RHEL 5.7 anymore.

Since we don't seem to have such a controller handy, can you please test it with some earlier 5.7.z kernel versions? (The stackdump in your comment is from 2.6.18-274.18.1.el5xen.) Or did you mean 5.6 as the last working version?

Thank you.
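For illustration, a sketch of what the dom0 grub.conf entry might look like with those options added (kernel versions, paths, and the root device are placeholders; if the hypervisor console is not already on serial, console=com1 would typically be needed on the xen.gz line as well):

title Red Hat Enterprise Linux Server (2.6.18-274.18.1.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-274.18.1.el5 com1=115200,8n1 loglvl=all guest_loglvl=all sync_console
        module /vmlinuz-2.6.18-274.18.1.el5xen ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200 ignore_loglevel
        module /initrd-2.6.18-274.18.1.el5xen.img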
Setting needinfo for comment 12 & 13.
Thanks for the comments. I'm in the process of having the test machine wired up to a Cyclades. ETA currently is Thursday.
Created attachment 568641 [details]
SOS Report for the hypervisor.

I used the opportunity to reinstall the machine with RHEL 5.8 + latest upgrades from RHN. This is basically a fresh minimum installation on the box to rule out any side effects...

[root@sysiphus2 ~]# sosreport

sosreport (version 1.7)

This utility will collect some detailed information about the hardware and setup of your Red Hat Enterprise Linux system. The information is collected and an archive is packaged under /tmp, which you can send to a support representative. Red Hat will use this information for diagnostic purposes ONLY and it will be considered confidential information.

This process may take a while to complete. No changes will be made to your system.

Press ENTER to continue, or CTRL-C to quit.

Please enter your first initial and last name [sysiphus2]:
Please enter the case number that you are generating this report for:

could not setup plugin auditd <type 'instance'>: ("local variable 'flog' referenced before assignment",)
could not setup plugin hardware <type 'instance'>: ('list index out of range',)
 plugin yum finished
...
Completed.

Creating compressed archive...

Your sosreport has been generated and saved in:
  /tmp/sosreport-sysiphus2-550849-1c0d9a.tar.bz2

The md5sum is: 45002e2abbcd92ff4aa0c630c91c0d9a

Please send this file to your support representative.

[root@sysiphus2 ~]#
Thank you for the sosreport. (I assume the problem persists after the upgrade.) Please provide the serial console output and the vmcore as well, if possible.

Articles that could prove useful:
- https://access.redhat.com/knowledge/solutions/6038
- https://access.redhat.com/knowledge/solutions/2112
- https://access.redhat.com/knowledge/articles/69964

Thank you.
I now have the serial console connected and configured according to your instructions.

If booting without the swiotlb=force option, the machine will still lock up with the same traceback printed. All I saw on the console at this point is the following:

pciback: vpci: 0000:02:05.0: assign to virtual slot 0
pciback: vpci: 0000:02:05.1: assign to virtual slot 0 func 1
device vif1.0 entered promiscuous mode
type=1700 audit(1331240822.377:13): dev=vif1.0 prom=256 old_prom=0 auid=4294967295 ses=4294967295
kernel direct mapping tables up to 100800000 @ 12b4000-22c8000
(XEN) mm.c:630:d1 Non-privileged (1) attempt to map I/O space 00000000
(XEN) mm.c:630:d1 Non-privileged (1) attempt to map I/O space 00000000
blkback: ring-ref 9, event-channel 9, protocol 1 (x86_64-abi)
PCI: Enabling device 0000:02:05.0 (0000 -> 0003)
ACPI: PCI Interrupt 0000:02:05.0[A] -> GSI 20 (level, low) -> IRQ 21
xenbr0: port 3(vif1.0) entering forwarding state
PCI: Enabling device 0000:02:05.1 (0000 -> 0003)
ACPI: PCI Interrupt 0000:02:05.1[B] -> GSI 21 (level, low) -> IRQ 20

Interestingly, CTRL-A A A doesn't do anything at this point anymore. During normal operation, however, typing that key combination gets me the expected

(XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0).

message.

Running with swiotlb=force makes the bootup succeed, but powerdown of the virtual machine leads to a lockup of the hypervisor.

xenbr0: port 3(vif1.0) entering disabled state
device vif1.0 left promiscuous mode
type=1700 audit(1331241023.994:14): dev=vif1.0 prom=0 old_prom=256 auid=4294967295 ses=4294967295
xenbr0: port 3(vif1.0) entering disabled state

is all I've gotten at that point before the whole machine (hypervisor included) locks up. The last messages on the virtual machine are the following:

Turning off swap:
Unmounting file systems:
Halting system...
md: stopping all md devices.
System halted.
We should try rmmod-ing the module in the domU before shutting down the domU.
- Interesting module parameters (a sketch of passing them follows below):

  parm: verb:0 for minimal verbosity, 1 for normal, 2 for excessive (byte)
  parm: debug:Set bits to enable debugging (uint)
  parm: safe:Set other settings to a "safe mode" (charp)

  "safe=y" implies "verb=2".

- Debug bits:

  #define DEBUG_ALLOC    (0x0001)
  #define DEBUG_PHASE    (0x0002)
  #define DEBUG_POLL     (0x0004)
  #define DEBUG_QUEUE    (0x0008)
  #define DEBUG_RESULT   (0x0010)
  #define DEBUG_SCATTER  (0x0020)
  #define DEBUG_SCRIPT   (0x0040)
  #define DEBUG_TINY     (0x0080)
  #define DEBUG_TIMING   (0x0100)
  #define DEBUG_NEGO     (0x0200)
  #define DEBUG_TAGS     (0x0400)
  #define DEBUG_POINTER  (0x0800)

- What if the driver doesn't unmap all bounce buffers?

- We should try passthrough to a fullvirt guest, on an IOMMU-enabled machine.
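A sketch of how those parameters could be applied in the domU (the debug value chosen here, 0x0020 = DEBUG_SCATTER, is just an illustrative pick, not a recommendation settled on in this bug):

# inside the guest: reload the driver with full verbosity and
# scatter/gather debugging enabled
rmmod sym53c8xx
modprobe sym53c8xx verb=2 debug=0x0020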
(In reply to comment #19)
> We should try rmmod-ing the module in the domU before shutting down the domU.

Tried it:

[root@amanda ~]# rmmod sym53c8xx
sym1: detaching ...
sym1: resetting chip
sym0: detaching ...
sym0: resetting chip
[root@amanda ~]#

But no change so far. Hypervisor still dies without any messages printed.

I'll try some of the module parameters.

I am not sure I can actually find an IOMMU-enabled machine which will be able to house the card. It's the only Symbios controller card I have around and it's PCI. All my IOMMU machines are PCIe though. Will make certain however.
(In reply to comment #22)
> [root@amanda ~]# rmmod sym53c8xx
> sym1: detaching ...
> sym1: resetting chip
> sym0: detaching ...
> sym0: resetting chip
> [root@amanda ~]#
>
> But no change so far. Hypervisor still dies without any messages printed.

When does it freeze? A little bit after you remove the module in the domU, or when you shut down the domU (after the rmmod)?

[root@amanda ~]# rmmod sym53c8xx
/* ... */                          <---- does it lock up here?
[root@amanda ~]# shutdown -h now
/* ... */                          <---- or here?

> I'll try some of the module parameters.

Thanks!

> I am not sure I can actually find an IOMMU enabled machine which will be able
> to house the card.

Okay, let's postpone that for now; we have other ideas to try.

Here's a further option: if the box survives the rmmod (indefinitely, not just for a few seconds), we could grab a dom0/hv vmcore and check the swiotlb (bounce buffer) status. It should say "everything free", since the module is absent.
The box freezes after the shutdown, when it actually says "System halted.". The rmmod succeeds and the box keeps working fine. So we can grab the vmcore and see what we find.
The host is RHEL-5.8. The guest was originally RHEL-5.7.z (see comment 11). Now I could only test with a CentOS (...) -308 (and -308.1.1) build. Those didn't even boot (with swiotlb=force). They bug out in:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lib/../arch/i386/kernel/swiotlb.c:160
invalid opcode: 0000 [1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.18-308.1.1.el5xen #1
RIP: e030:[<ffffffff8034e39e>]  [<ffffffff8034e39e>] swiotlb_init_with_default_size+0xa0/0x1a0
RSP: e02b:ffffffff80755f40  EFLAGS: 00010282
RAX: 00000000fffffff4 RBX: 0000000000000100 RCX: 00000000004ef406
RDX: ffffffffff578000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000004000000 R08: 0000000000001000 R09: ffffffff807c9ac0
R10: 0000000000000000 R11: 0000000000000048 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff80633000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff80754000, task ffffffff80501b80)
Stack:  0000000000000314 0000000002020800 0000000000000000 ffffffff80276f37
 0000000000000000 ffffffff80769b2b 0000000000000000 0000000002020800
 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff80276f37>] pci_swiotlb_init+0x9/0x2d
 [<ffffffff80769b2b>] mem_init+0x13/0x1e8
 [<ffffffff8075ea7b>] start_kernel+0x189/0x224
 [<ffffffff8075e1e5>] _sinittext+0x1e5/0x1eb
Code: 0f 0b 68 a1 43 4b 80 c2 a0 00 eb fe 48 83 eb 80 48 8b 05 33
RIP  [<ffffffff8034e39e>] swiotlb_init_with_default_size+0xa0/0x1a0
 RSP <ffffffff80755f40>
<0>Kernel panic - not syncing: Fatal exception
----------- [cut here ] --------- [please bite here ] ---------

independently of whether or not I added dom0_mem=-128M to the Xen command line (see below why I tried that).

The bug hits here (... at least looking at the RHEL-5.8.z source -- we'll need a RHEL guest here, not a CentOS one):

swiotlb_init_with_default_size() [arch/i386/kernel/swiotlb.c]

        for (i = 0; i < iotlb_nslabs; i += IO_TLB_SEGSIZE) {
                int rc = xen_create_contiguous_region(
                        (unsigned long)iotlb_virt_start + (i << IO_TLB_SHIFT),
                        get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT),
                        IO_TLB_DMA_BITS);
                BUG_ON(rc);
        }

The default size is 64 MB [lib/swiotlb.c], IO_TLB_SHIFT is 11. So (because we didn't pass "swiotlb=NNNN,force"):

  iotlb_nslabs = (64 * (1 << 20)) >> 11 = 64 * 512 = 32768

It is then rounded up to be a multiple of IO_TLB_SEGSIZE (128) -- no change. We try to allocate 32768 / 128 = 256 contiguous regions, each 256 KB in size. One of those xen_create_contiguous_region() calls fails in the loop, the guest kernel panics, and the hypervisor freezes (^A^A^A over serial ceases working).

If I try hard, I can squeeze these symptoms into the original problem -- or my idea thereof:

(a) The hypervisor freezes when a PV domain with *unreleased* contiguous regions (bounce buffers) is destroyed.

(b) Originally (= 5.7.z guest), this situation was produced by the sym53c8xx driver: when it was removed, some bounce buffers (sg scatter-gather areas) remained unreleased. We didn't see any problems until the domain was shut down. Then the hypervisor froze.

(c) With the 5.8 guest, we trigger the problem during boot: we allocate *some* bounce buffers, then kill the domain (for whatever reason) while still owning those bounce buffers.
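(A quick shell sanity check of the arithmetic above -- nothing kernel-specific, just the same numbers:)

$ echo $(( (64 * (1 << 20)) >> 11 ))
32768
$ echo $(( 32768 / 128 ))
256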
I'm wrong about the bounce buffer allocation/leak causing the freeze. I specified "swiotlb=8192,force" on the domU command line, which caused the domU to panic in alloc_bootmem_low_pages(), right before the loop quoted above -- before the first call to xen_create_contiguous_region(). The hypervisor still froze.
"xm pci-list-assignable-devices" doesn't list either of the devices (0000:02:05.0 0000:02:05.1). I think this is due to the fact that both devices / functions have non-page-aligned BARs -- and the pci-list-assignable-devices command seems to exclude those. Search "tools/python/xen/util/pci.py" for "has_non_page_aligned_bar" -- methods check_mmio_bar and PciDevice::__init__. xm_pci_list_assignable_devices lists only what passes through check_mmio_bar [tools/python/xen/xm/main.py]. 02:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14) Region 0: I/O ports at 7000 [size=256] Region 1: Memory at ff0ff800 (32-bit, non-prefetchable) [size=256] Region 2: Memory at ff0fd000 (32-bit, non-prefetchable) [size=4K] Kernel modules: sym53c8xx 02:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14) Region 0: I/O ports at 8000 [size=256] Region 1: Memory at ff0ffc00 (32-bit, non-prefetchable) [size=256] Region 2: Memory at ff0fe000 (32-bit, non-prefetchable) [size=4K] Kernel modules: sym53c8xx Each iomem region's start address should be a multiple of 0x1000 (in order to qualify as page-aligned). Region 1 of the first function has remainder 0x800, while Region 1 of the second function has remainder 0xc00. Note that upstream xend plainly refuses to pass through such devices with "non-page-aligned MMIO BAR found". The linux-2.6.18-xen kernel was extended to fix up the alignment: http://wiki.xen.org/wiki/Xen_PCI_Passthrough --> drivers/pci/guestdev.c (we don't have it) It looks like a WONTFIX to me. I don't know how it could have worked earlier, but "xm pci-list-assignable-devices" not listing it is certainly disturbing. Referring back to the wiki link above, it says: 4. Run "xm pci-list-assignable-devices" and verify the PCI device is available for passthru 5. *Note important: Make sure the device shows up in the "xm pci-list-assignable-devices" list! Don't continue before you've gotten that properly working.* Backporting "drivers/pci/guestdev.c" seems risky, it is not isolated (search the upstream hg repo for "pci_is_reassigndev"). Paolo, what do you think?
(In reply to comment #28)
> Note that upstream xend plainly refuses to pass through such devices with
> "non-page-aligned MMIO BAR found". The linux-2.6.18-xen kernel was extended to
> fix up the alignment:
>
>   http://wiki.xen.org/wiki/Xen_PCI_Passthrough
>   --> drivers/pci/guestdev.c (we don't have it)

(Search the wiki page for "reassign_resources".)
I'm replacing the guest with a RHEL guest and will try older kernels to see when it worked. I do remember 5.4ish being okay.
I reinstalled the guest as RHEL 5.8. With the kernel 2.6.18-308.el5xen the machine boots and lspci sees the passed-through cards, but the hypervisor crashes at poweroff. Same if the virtual machine is killed with xm destroy.

With 2.6.18-308.1.1.el5xen running on the guest, the machine crashes right at power-on with the message from comment #28. Starting 2.6.18-308.1.1.el5xen _without_ the swiotlb=force command line makes the guest boot correctly, but it exhibits the previous behaviour where the hypervisor freezes at shutdown of the guest.
Need to look more closely at the upstream patches, but freezing the HV seems like a no-no to me. *Something* needs to be done, even if the fix is just to forbid passthrough at Xend time.
$ git log --oneline --reverse 2.6.18-308.el5..2.6.18-308.1.1.el5
0b4d341 [fs] prevent lock contention in shrink_dcache_sb via private list
2b468ab [net] igb: reset PHY after recovering from PHY power down
c606d4b [kernel] sysctl: restrict write access to dmesg_restrict
266669d [usb] cdc-acm: make lock use interrupt safe
dc5c7a3 [net] bnx2x: add fan failure event handling
06422a2 [net] bnx2x: make bnx2x_close static again
e7c3ca9 [net] tg3: Fix 4k tx and recovery code
2492abd Revert: [scsi] qla2xxx: fix IO failure during chip reset
02db1d8 Revert: [scsi] qla2xxx: avoid SCSI host_lock dep in queuecommand
bc84712 tag: kernel-2.6.18-308.1.1.el5

Nothing seems to be related.

Andreas, if you remove the pci = [ ... ] stanza from the domU config, does that "fix" the domU? (Just a sanity check.) I'd try it myself but you seem to be logged in.

....

Regarding catching it in Xend:

http://xenbits.xensource.com/hg/xen-unstable.hg/rev/18046
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/18414
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/18965
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/20324
The two functions share a single page:

  Region 1: Memory at ff0ff800 (32-bit, non-prefetchable) [size=256]
  Region 1: Memory at ff0ffc00 (32-bit, non-prefetchable) [size=256]
                      ^^^^^

page(frame) 0xff0ff doesn't seem to be shared with anything else. I wonder if this is a "double free" ("double revoke") kind of problem... XEN_DOMCTL_iomem_permission
(In reply to comment #34)
> Andreas, if you remove the pci = [ ... ] stanza from the domU config, does that
> "fix" the domU? (Just a sanity check.) I'd try it myself but you seem to be
> logged in.

Ahh, right. I was trying different kernels and left the connection open.

As for the different kernels, here's a little table:

  Guest kernel            pci   swiotlb   boots   freezes   comment
  2.6.18-308.1.1.el5xen   no    no        yes     no        looks fine
  2.6.18-308.1.1.el5xen   no    force     no      no        crash at boot, hv ok
  2.6.18-308.1.1.el5xen   yes   no        yes     yes       hv freezes at shutdown
  2.6.18-308.1.1.el5xen   yes   force     no      yes       hv freezes at crash
  2.6.18-308.el5xen       no    no        yes     no        looks fine
  2.6.18-308.el5xen       no    force     no      no        crash at boot, hv ok
  2.6.18-308.el5xen       yes   no        yes     yes       hv freezes at shutdown
  2.6.18-308.el5xen       yes   force     no      yes       hv freezes at crash

I tried a few different dom0 kernels but changing these seems to have no effect.
I did some additional testing. Passthrough always causes the hv to freeze (either at guest crash or guest shutdown time). swiotlb=force always causes the guest to crash.

Here's the same bug for Debian:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=446148

Something else I found:

http://old-list-archives.xen.org/archives/html/xen-users/2010-03/msg00213.html

but the devices in question don't seem to share interrupts.
(In reply to comment #35)
> The two functions share a single page:
>
>   Region 1: Memory at ff0ff800 (32-bit, non-prefetchable) [size=256]
>   Region 1: Memory at ff0ffc00 (32-bit, non-prefetchable) [size=256]
>                       ^^^^^
>
> page(frame) 0xff0ff doesn't seem to be shared with anything else. I wonder if
> this is a "double free" ("double revoke") kind of problem...
> XEN_DOMCTL_iomem_permission

I wrote a trivial hv patch (for -308.1.1) to log this domctl; this was the output:

(XEN) iomem_permit_access domid=1 mfn=[0xff0ff, 0xff0ff]    -> fn0 region1
(XEN) iomem_permit_access domid=1 mfn=[0xff0fd, 0xff0fd]    -> fn0 region2
(XEN) iomem_permit_access domid=1 mfn=[0xff0d0, 0xff0df]    -> fn0 expansion ROM
(XEN) iomem_permit_access domid=1 mfn=[0xff0ff, 0xff0ff]    -> fn1 region1
(XEN) iomem_permit_access domid=1 mfn=[0xff0fe, 0xff0fe]    -> fn1 region2
(XEN) iomem_permit_access domid=1 mfn=[0xff0e0, 0xff0ef]    -> fn1 expansion ROM

mfn=[0xff0ff, 0xff0ff] is logged twice. I saw no "iomem_deny_access" (ie. revoke) messages before the freeze (... although the domctl may not be exercised at domain destroy time at all).
(In reply to comment #37)
> I did some additional testing. Passthrough always causes the hv to freeze
> (either at guest crash or guest shutdown time). swiotlb=force always causes
> the guest to crash.

A small addition here. Passthrough always causes the hv to freeze _when_ using the Symbios SCSI card. When using another card in the same hardware, everything works fine:

[root@guest ~]# cat /proc/cmdline
ro root=/dev/VolGroup00/LogVol00 console=tty0 console=xvc0
[root@guest ~]# lspci
00:00.0 Communication controller: Cyclades Corporation Cyclades-Z above first megabyte (rev 01)
[root@guest ~]# modprobe cyclades
[root@guest ~]# dmesg | tail -n 4
Cyclades driver 2.3.2.20 2004/02/25 18:14:16 built Jan 3 2012 16:52:16
PCI: Enabling device 0000:00:00.0 (0000 -> 0003)
Cyclades-8Zo/PCI #1: 0xff500000-0xff57ffff, 8 channels starting from port 0.
[root@guest ~]#

When shutting down this machine, everything keeps working and the hypervisor does not freeze.

Notes to reproduce on the test machine:

# modprobe pciback
# echo 0000:00:05.0 > /sys/bus/pci/drivers/pciback/new_slot
# xm create -c test-cyclades
... which is exactly why finding a solution is not easy. As you say, this card works OK. However, it's also not listed by "xm pci-list-assignable-devices", for the same reason:

00:05.0 Communication controller: Cyclades Corporation Cyclades-Z above first megabyte (rev 01)
        Region 0: Memory at ff6eac00 (32-bit, non-prefetchable) [disabled] [size=128]
        Region 1: I/O ports at ec00 [disabled] [size=128]
        Region 2: Memory at ff500000 (32-bit, non-prefetchable) [disabled] [size=1M]
        Expansion ROM at ff690000 [disabled] [size=4K]

Region 0 is not page-aligned. So now we have three functions (across two cards) with one non-page-aligned region each. One function works, the two others can hang the hypervisor.

If we default-disable passthrough for these, we break the working case. If we additionally introduce an override (to keep the working case working, against the default disable), it has to be card specific.

OTOH I do notice Region 0 is disabled for the Cyclades card...
Andreas, can you please check
- if there's a newer BIOS for the machine,
- if there's an option in the BIOS to set memory ranges of PCI devices?

Thanks,
Laszlo
I believe the BIOS is current. I'll verify though. Same for the memory range.
The hypervisor log contains messages like:

(XEN) PCI add device 00:18.2
(XEN) traps.c:1910:d0 Domain attempted WRMSR 0000000000000410 from 00000000:00000000 to 00000000:00003bff.
(XEN) traps.c:1910:d0 Domain attempted WRMSR 0000000000000410 from 00000000:00000000 to 00000000:00003bff.
(XEN) PCI add device 00:19.2
(XEN) traps.c:1910:d0 Domain attempted WRMSR 0000000000000410 from 00000000:00000000 to 00000000:00003bff.
(XEN) traps.c:1910:d0 Domain attempted WRMSR 0000000000000410 from 00000000:00000000 to 00000000:00003bff.

#define MSR_K8_MC4_CTL 0x410

(The box has two dual-core Opteron 275 CPUs (Italy), so "K8" makes sense.)

The kernel tries to write to this MSR in "drivers/edac/k8_edac.c" -- "MC support for AMD K8 memory controllers".

00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller

#define K8_MSR_MC4CTL  0x0410 /* North Bridge Check report ctl (64b) */
#define K8_MSR_MC4STAT 0x0411 /* North Bridge status (64b) */
#define K8_MSR_MC4ADDR 0x0412 /* North Bridge Address (64b) */

k8_init_one == k8_driver.k8_init_one
  k8_probe1
    k8_enable_error_reporting
(In reply to comment #43)
> I believe the BIOS is current. I'll verify though. Same for the memory range.

Thank you!

(In reply to comment #30)
> I'm replacing the guest with a RHEL guest and will try older kernels to see
> when it worked. I do remember 5.4ish being okay.

Do you mean a RHEL-5.4 guest worked with a brand new (RHEL-5.8) dom0?
(In reply to comment #41)
> Andreas, can you please check
> - if there's a newer BIOS for the machine,

Nope. 1.06 is the current one.

> - if there's an option in the BIOS to set memory ranges of PCI devices?

Didn't see anything. But feel free to look around. The BIOS is available via the serial console when pressing F4 during POST.
(In reply to comment #22)
> I am not sure I can actually find an IOMMU enabled machine which will be
> able to house the card. It's the only symbios controller card I have
> around and it's PCI. All my IOMMU machines are PCIe though. Will make
> certain however.

(I found such an option:

  - Chipset | NorthBridge Configuration | IOMMU Option Menu | IOMMU Mode

But it seems to be GART related, not AMD-Vi.)
... and the following were apparently not accepted:

http://old-list-archives.xen.org/archives/html/xen-devel/2006-01/msg00435.html
http://old-list-archives.xen.org/archives/html/xen-devel/2006-01/msg00434.html
Hello Andreas, can you please confirm this is a regression? Ie. can you pinpoint a hv/dom0/domU combination where the card worked? I've done some rough bisection back to RHEL-5.0 / RHEL-5.1 on the host side (comment 37), and no combination I tried worked.

If we can prove this is a regression, we have a chance of bisecting it (which I tried and failed at). If not, then we may have to close the BZ as INSU.

I agree with Paolo's comment 33, but at this point we don't even seem to have enough information to exclude the device from passthrough. We could check the PCI vendor and device ID, but if we change the default for it, we'll break passthrough for people who use this card without problems -- they'll have to override manually.
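(For reference, the numeric vendor:device pair such a check would key on can be read with "lspci -n"; the 1000:000f pair below is taken from the original report, and the exact output format depends on the pciutils version:)

# numeric IDs of the first function (class 0100 = SCSI storage controller)
lspci -n -s 02:05.0
02:05.0 Class 0100: 1000:000f (rev 14)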
This bug appears to have gone cold. Comment 49 asked for information from the reporter almost three months ago. We can leave it open another couple weeks in case this comment wakes it back up, otherwise it should be closed as INSU.
I'm not using this configuration anymore, so I can't help with this now.
Closing as INSU. Thank you for your cooperation!
Reopening. I still have the hardware and am now again in a position to test things.
Hi Andreas, could you please revisit comment 49? Thank you, Laszlo
Please reopen only with a linked Customer Portal case. Thank you.