Description of problem:
kernel-xen-2.6.19-1.2911.6.5.fc6.i686 and newer kernels OOPS immediately upon plugging in any sbp2/firewire attached hard drives.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.19-1.2911.6.5.fc6.i686 and all newer kernels released officially by the Fedora project as of this posting.

How reproducible:
Happens every single time, reliably.

Steps to Reproduce:
1. If my external firewire chassis is plugged in during boot of the xen kernel, the crash occurs during boot (during sbp2 init).
2. The kernel boots fine, and dom0 also boots fine, if my external firewire chassis is NOT plugged in during boot. All applications run normally. However, the second I plug in my external firewire chassis, xen panics and hangs.
3.

Actual results:
kernel BUG at lib/../arch/i386/kernel/swiotlb.c:394!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: sbp2 bridge netloop netbk blktap blkbk ipv6 dm_mirror dm_mod raid1 raid0 video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport sg iTCO_wdt ahci ide_cd ohci1394 i2c_i801 ieee1394 i2c_core cdrom serial_core pl2303 pcspkr sky2 usbserial floppy ata_piix libata sd_mod scsi

Expected results:
Smooth sailin'

Additional info:
Please know that NON-xen kernels work very happily on this system, and firewire is no trouble whatsoever.

Gigabyte GA-965P-DS3 motherboard (rev 1.0, bios F10)
E6400 Core-2-Duo CPU
4GB DDR2-800
ICH8 Northbridge
PCI-Express 1394a FireWire card.

Please see the attached file for the complete OOPS (sbp2 initialization).

This is what a normal init looks like on a non-xen kernel:

ieee1394: Initialized config rom entry `ip1394'
ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 17
ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[17] MMIO=[f5004000-f50047ff] Max Packet=[2048] IR/IT contexts=[4/8]
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ieee1394: Node added: ID:BUS[0-00:1023] GUID[0012100200000523]
ieee1394: Node added: ID:BUS[0-01:1023] GUID[0012100200000522]
ieee1394: Node added: ID:BUS[0-02:1023] GUID[0012100200000521]
ieee1394: Node added: ID:BUS[0-03:1023] GUID[0012100200000520]
ieee1394: Node added: ID:BUS[0-04:1023] GUID[001210020000051f]
ieee1394: Node added: ID:BUS[0-05:1023] GUID[001210020000051e]
ieee1394: Node changed: 0-00:1023 -> 0-06:1023
ieee1394: sbp2: Driver forced to serialize I/O (serialize_io=1)
ieee1394: sbp2: Try serialize_io=0 for better performance
scsi4 : SBP-2 IEEE-1394

And then it goes on to detect all of my drives happily.
Created attachment 151899 [details] OOPS during plugging in a firewire drive
This was perhaps fixed in kernel.org's 2.6.21-rcX by patch "[IA64] make swiotlb use bus_to_virt/virt_to_bus"
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=93fbff63e62b87fe450814db41f859d60b048fb8

and in kernel.org's 2.6.19.6, 2.6.20.2, and 2.6.16.44 by patch "Missing critical phys_to_virt in lib/swiotlb.c"
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commit;h=e16b67f9a0ac6d9f89f680b7f3b439abfb1dac5e
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.19.y.git;a=commit;h=bcaaa45c3feb2fcc36a247011970d5026c286154
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.16.y.git;a=commit;h=d4705d6dc74016619a1a6565dd54c7c5269c25d0
kernel-xen-2.6.20-1.2943.fc6 from testing repo still crashes in the same fashion, and I believe it to include the patches noted in comment #2 for 2.6.20.2.
Could you check the kernel source whether it really has the patch? If it does, could you build a kernel with the following patch applied? "ieee1394: sbp2: enforce 32bit DMA mapping" http://git.kernel.org/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=commit;h=f8ab7cc6e5457670145e31af6571eb3a584dfddb If you need help with that, give me the URL of the RPM with kernel-xen-... sources. (But not the SRPM please.)
I applied the following patch to the 2943 kernel as suggested by comment #4, however the problem still exists and the crash still happens in the same way:

*** a/drivers/ieee1394/sbp2.c 2007-02-04 13:44:54.000000000 -0500
--- b/drivers/ieee1394/sbp2.c 2007-04-08 21:10:01.000000000 -0400
***************
*** 765,770 ****
--- 765,775 ----
  		SBP2_ERR("failed to register lower 4GB address range");
  		goto failed_alloc;
  	}
+ #else
+ 	if (dma_set_mask(hi->host->device.parent, DMA_32BIT_MASK)) {
+ 		SBP2_ERR("failed to set 4GB DMA mask");
+ 		goto failed_alloc;
+ 	}
  #endif
  }
A naive newbie question: Is it the host OS or a guest OS that explodes?
(In reply to comment #6)
> A naive newbie question: Is it the host OS or a guest OS that explodes?

Both crash. As best as I can tell, dom0 crashes first, followed shortly by xen itself.
Created attachment 152155 [details] Xen Crash 2943 kernel sbp2
Forgive my ignorance, but in whose context is the sbp2scsi_queuecommand / sync_single run? The host's or the guest's?
(In reply to comment #9)
> Forgive my ignorance, but in whose context is the sbp2scsi_queuecommand /
> sync_single run? The host's or the guest's?

I am a newbie to xen.. but I'm guessing the context would be dom0.. the guest.
Created attachment 152624 [details]
fedora-xen-2944 crash during boot

I just tested the newly released Fedora-xen-2944 kernel, and unfortunately this bug still exists. FEDORA + XEN + SBP2 drives = OOPS. :(
Created attachment 153998 [details]
fedora-xen-2948 crash during boot

I just tried the newly released 2.6.20-1.2948 xen kernel.. unfortunately, it still crashes during sbp2 init. help :(
Created attachment 154003 [details]
ieee1394: sbp2: move some memory allocations into non-atomic context and use GFP_DMA32

Re comment #3: Could you attach lib/swiotlb.c here? Make sure it is the one used in the kernel you are running.

Also, you could try the attached patch. I took it from the last upstream patch submission round (post 2.6.21), so you might get conflicts when applying it to 2.6.20-something... The original patch as it went into mainline only did a GFP_ATOMIC -> GFP_KERNEL switch; in the attached version I also added GFP_DMA32 to the affected allocation to steer clear of swiotlb bounce buffers.
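To illustrate the idea behind GFP_DMA32 and the 32-bit DMA mask: a buffer only needs a bounce buffer if some part of it lies above what the device's DMA mask can address. The following is a minimal userspace sketch of that check, not kernel or Xen code. The names bounce_needs_bounce and BOUNCE_DMA_32BIT_MASK are made up for illustration, and the real checks in the kernel and in Xen's swiotlb are more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit DMA mask (all addresses below 4GB). */
#define BOUNCE_DMA_32BIT_MASK 0x00000000ffffffffULL

/*
 * Return 1 if the buffer [bus_addr, bus_addr + size) has any byte that the
 * device cannot address with the given mask, 0 otherwise.
 * Only the last byte needs checking because the mask is contiguous.
 */
static int bounce_needs_bounce(uint64_t bus_addr, size_t size, uint64_t dma_mask)
{
    assert(size > 0); /* an empty buffer makes no sense here */
    uint64_t last_byte = bus_addr + (uint64_t)size - 1;
    return (last_byte & ~dma_mask) != 0;
}
```

In this model, a buffer allocated entirely below 4GB never needs bouncing for a device with a 32-bit mask, which is the effect the GFP_DMA32 allocation is aiming for.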
This is a problem with Xen's swiotlb, which doesn't handle sync_single with DMA_BIDIRECTIONAL. There are three solutions. Preferable is to use DMA_TO_DEVICE or DMA_FROM_DEVICE, but if that buffer gets written by both sides, that's not an option. Next is to simply disable this module in the Xen build. Last is to fix Xen's swiotlb to handle DMA_BIDIRECTIONAL in this case. The last option is the best; however, it is unclear what is needed to make this fix (so the 2nd option is most likely the one to use).
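As a rough illustration of the failure mode described above, here is a simplified userspace model of a bounce-buffer sync that only knows how to copy in one direction. It is a sketch under assumptions, not the Xen source: model_sync_single and the model_dma_dir enum are hypothetical names, and the -1 return stands in for the point where the kernel would hit its BUG.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's DMA direction constants. */
enum model_dma_dir {
    MODEL_DMA_BIDIRECTIONAL = 0,
    MODEL_DMA_TO_DEVICE = 1,
    MODEL_DMA_FROM_DEVICE = 2
};

/*
 * Sync 'size' bytes between the driver's original buffer and the bounce
 * buffer the device actually sees.
 * Returns 0 on success, -1 where the modeled code would trap.
 */
static int model_sync_single(char *orig, char *bounce, size_t size,
                             enum model_dma_dir dir)
{
    assert(orig != NULL && bounce != NULL);

    switch (dir) {
    case MODEL_DMA_TO_DEVICE:
        /* CPU wrote the data: push it out to the bounce buffer. */
        memcpy(bounce, orig, size);
        return 0;
    case MODEL_DMA_FROM_DEVICE:
        /* Device wrote the data: pull it back into the original buffer. */
        memcpy(orig, bounce, size);
        return 0;
    default:
        /* Bidirectional: the copy direction is ambiguous, so give up. */
        return -1;
    }
}
```

In this model, a caller that passes MODEL_DMA_BIDIRECTIONAL gets the error path every time, which matches the "crashes reliably" behaviour reported in this bug.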
All bidirectional DMA mappings in drivers/ieee1394/sbp2.c were unnecessary. They were recently converted to the more specific DMA directions. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2446a79f4f0a5e88e5d8316dac407d66ac10f70d I will look up the 2.6.20-1.2948 sources and post a refreshed version of that commit circa tomorrow.
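The rule of thumb behind choosing a more specific direction can be sketched as below. This is only an illustration in plain userspace C; sketch_pick_dir and the sketch_dma_dir enum are hypothetical names, not anything from sbp2.c or the DMA API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's DMA direction constants. */
enum sketch_dma_dir {
    SKETCH_DMA_BIDIRECTIONAL,
    SKETCH_DMA_TO_DEVICE,
    SKETCH_DMA_FROM_DEVICE,
    SKETCH_DMA_NONE
};

/*
 * Pick the narrowest direction that still covers who writes the buffer
 * while it is mapped.
 */
enum sketch_dma_dir sketch_pick_dir(bool cpu_writes, bool device_writes)
{
    if (cpu_writes && device_writes)
        return SKETCH_DMA_BIDIRECTIONAL; /* only when both sides really write */
    if (cpu_writes)
        return SKETCH_DMA_TO_DEVICE;     /* CPU fills it, device only reads */
    if (device_writes)
        return SKETCH_DMA_FROM_DEVICE;   /* device fills it, CPU only reads */
    return SKETCH_DMA_NONE;              /* nobody writes: nothing to sync */
}

/* Usage example: a table the CPU fills in and the device only reads. */
void sketch_demo(void)
{
    assert(sketch_pick_dir(true, false) == SKETCH_DMA_TO_DEVICE);
}
```

For example, a scatter/gather table that the driver writes and the device only reads would come out as TO_DEVICE under this rule.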
Created attachment 154609 [details]
[PATCH 2.6.20] ieee1394: sbp2: optimize DMA direction of s/g tables

I can't find a suitable archive of 2.6.20-1.2948 sources. So here you have the patch against vanilla 2.6.20.
I am happy to report that the patch provided in comment #16 has fixed my firewire issues. :) XEN is running happily now. Thanks!
I have also seen this against the new drivers/firewire in rawhide, however it's from a different path. The problem is in drivers/firewire/fw-ohci.c with the Async Receive Contexts setup in ar_context_add_page. While the buffer may be bidirectional, I don't see the point of the bidirectional sync for device (which is what's causing the problem for Xen in rawhide) in that spot. The buffer is written by the CPU and needs to be sync'd to the device there, AFAICT. Shouldn't that just be DMA_TO_DEVICE?
Removing the dependency that was added automatically by bugzilla cloning. The FC6 bug is not dependent on the FC7 bug.
Patch from comment #16 included in CVS.