Created attachment 124741: Picture of the oops
Description of problem:

I mirror (RAID 1) my notebook's internal HD to an external Firewire/USB enclosure. If I boot with Xen, it won't bring up the RAID members from the sbp2 disk early enough. Shortly after I add them back to the RAID set, I get errors like these in /var/log/messages:

ieee1394: sbp2: aborting sbp2 command
sd 0:0:1:0: command: Write (10): 2a 00 05 1f eb a5 00 00 80 00
ieee1394: sbp2: aborting sbp2 command
sd 0:0:1:0: command: Test Unit Ready: 00 00 00 00 00 00
ieee1394: sbp2: reset requested
ieee1394: sbp2: Generating sbp2 fetch agent reset
ieee1394: sbp2: aborting sbp2 command
sd 0:0:1:0: command: Test Unit Ready: 00 00 00 00 00 00
sd 0:0:1:0: scsi: Device offlined - not ready after error recovery
sd 0:0:1:0: SCSI error: return code = 0x50000

Needless to say, the RAID resync didn't get very far. In another session, *right* after I re-added the external-disk partition to the RAID set, I got an oops, as shown in the attached picture.

Version-Release number of selected component (if applicable):

kernel-xen-hypervisor-2.6.15-1.1948_FC5
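For reference, the Write (10) command bytes and the return code in the log above can be decoded by hand. This is only a minimal Python sketch. It assumes the standard SCSI WRITE(10) CDB layout and the Linux SCSI result-word layout (status, message, host and driver bytes from low to high). The function names are illustrative and not part of any kernel interface.

```python
def decode_write10(cdb_hex):
    """Return (lba, blocks) from a WRITE(10) CDB given as a hex string."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


def split_scsi_result(result):
    """Split a 32-bit Linux SCSI result word into its four bytes."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


lba, blocks = decode_write10("2a 00 05 1f eb a5 00 00 80 00")
fields = split_scsi_result(0x50000)
print(hex(lba), blocks, fields)
```

With the values from the log, this gives a 128-block write at LBA 0x51feba5 and a host byte of 0x05, which Linux defines as DID_ABORT. That is consistent with the "aborting sbp2 command" messages, but it says nothing about why the command was aborted.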
This oops doesn't look Xen-specific, and I've seen sbp2 errors like this (though not with an oops) on older non-xen kernels. Are you sure it's only happening with Xen, or does the non-xen 1948 kernel show the same problem? Also, why does the RAID set not get constructed early enough? What errors do you get when it attempts to do so?
The non-Xen kernels work perfectly fine. No such errors are produced (although I have seen `command time out' errors with earlier kernels, especially when doing RAID over two Firewire disks, which is no longer the case). I don't know why the sbp2 RAID members were not started when booting with Xen; the sbp2 module was loaded, and so was the usb-storage module, which introduces an 8-second delay in the boot, enough for USB and Firewire devices to be recognized (even when I don't have any USB disks plugged in :-). That works with non-Xen kernels. I didn't see any errors fly by during Xen boot-up, but I didn't notice whether it recognized the device early enough. I was actually very surprised that it booted into the Xen kernel by default. Maybe it didn't, and that would explain why the RAID members didn't come up. I'll try the Xen kernel again momentarily.
I have a pretty strong theory on why raid didn't come up the first time: my kickstart file would rebuild initrd.img with sbp2 for `uname -r`, not for the Xen hypervisor kernel. mkinitrd should still have set it up by default, but I don't think I've ever tested its recent magic within the installer. Anyhow, I tried with 1.1955_FC5hypervisor and a `du -ks /' was enough to trigger the sbp2 errors after a minute or so. With the non-Xen kernel, it works just fine.
Do you get the same errors with the normal SMP kernel? We've seen sbp2 errors like this before on SMP (upstream SMP seems to have problems with sbp2), and dom0 is implicitly an SMP kernel.
No, tried that earlier today, and sbp2 appears to be rock solid on my other box, with an Athlon64X2 processor booted with both cpus enabled (I've used maxcpus=1 to work around other random problems, but sbp2 does not appear to suffer from the same sort of problems that usb-storage, for example, does). Not sure whether you meant SMP as in kernel built for SMP or as in running on an actual SMP box. On x86_64, the default kernel is SMP, so that's what I've been claiming to work fine on the notebook on which I tried the xen kernel all along.
change QA contact
This report targets FC5, which is now end-of-life. Please re-test against Fedora 7 or later, and if the issue persists, open a new bug. Thanks