181747 – sbp2 oopses on kernel-xen-hypervisor

Summary: sbp2 oopses on kernel-xen-hypervisor

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel-xen
Sub Component:
Version:	rawhide
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Juan Quintela
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	179269

Reported:	2006-02-16 06:22 UTC by Alexandre Oliva
Modified:	2009-12-14 20:42 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-02-26 22:58:08 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Picture of the oops (54.55 KB, image/jpeg) 2006-02-16 06:22 UTC, Alexandre Oliva

Description Alexandre Oliva 2006-02-16 06:22:52 UTC

Created attachment 124741
Picture of the oops

Comment 1 Alexandre Oliva 2006-02-16 06:22:52 UTC

Description of problem:
I mirror (RAID 1) my notebook's internal HD to an external Firewire/USB
enclosure.  If I boot with Xen, it won't bring up the RAID members from the sbp2
disk early enough.  Shortly after I add them back to the RAID set, I get errors
like these in /var/log/messages:

ieee1394: sbp2: aborting sbp2 command
sd 0:0:1:0:
        command: Write (10): 2a 00 05 1f eb a5 00 00 80 00
ieee1394: sbp2: aborting sbp2 command
sd 0:0:1:0:
        command: Test Unit Ready: 00 00 00 00 00 00
ieee1394: sbp2: reset requested
ieee1394: sbp2: Generating sbp2 fetch agent reset
ieee1394: sbp2: aborting sbp2 command
sd 0:0:1:0:
        command: Test Unit Ready: 00 00 00 00 00 00
sd 0:0:1:0: scsi: Device offlined - not ready after error recovery
sd 0:0:1:0: SCSI error: return code = 0x50000

Needless to say, the RAID resync didn't get very far.  In another session,
*right* after I re-added the external-disk partition to the RAID set, I got an
oops, as shown in the attached picture.

Version-Release number of selected component (if applicable):
kernel-xen-hypervisor-2.6.15-1.1948_FC5

Comment 2 Stephen Tweedie 2006-02-17 18:01:01 UTC

This oops doesn't look Xen-specific, and I've seen sbp2 errors like this (though
not with an oops) on older non-xen kernels.  Are you sure it's only happening
with Xen, or does the non-xen 1948 kernel show the same problem?

Also, why does the RAID set not get constructed early enough?  What errors do
you get when it attempts to do so?

Comment 3 Alexandre Oliva 2006-02-17 18:39:47 UTC

The non-xen kernels work perfectly fine.  No such errors are produced (although
I have seen `command time out´ errors with earlier kernels, especially when
doing RAID over two Firewire disks, that is no longer the case).

I don't know why the sbp2 raid members were not started when booting with Xen;
the sbp2 module was loaded, and so was the usb-storage module, which introduces
an 8-second delay in the boot, enough for usb and firewire devices to be
recognized (even when I don't have any USB disks plugged in :-)  That works with
non-Xen kernels.  I didn't see any errors fly by during Xen boot up, but I
didn't notice whether it recognized the device early enough.  I was actually
very surprised it did boot into the Xen kernel by default.  Maybe it didn't, and
that would explain why the raid members in it didn't come up.  I'll try the Xen
kernel again momentarily.

Comment 4 Alexandre Oliva 2006-02-17 19:19:52 UTC

I have a pretty strong theory on why raid didn't come up the first time: my
kickstart file would rebuild initrd.img with sbp2 for `uname -r`, not for the
Xen hypervisor kernel.  mkinitrd should still have set it up by default, but I
don't think I've ever tested its recent magic within the installer.

Anyhow, I tried with 1.1955_FC5hypervisor and a `du -ks /' was enough to trigger
the sbp2 errors after a minute or so.  With the non-Xen kernel, it works just fine.

Comment 5 Stephen Tweedie 2006-02-17 20:35:01 UTC

Do you get the same errors with the normal SMP kernel?  We've seen sbp2 errors
like this before on SMP (upstream SMP seems to have problems with sbp2), and
dom0 is implicitly an SMP kernel.

Comment 6 Alexandre Oliva 2006-02-17 23:55:11 UTC

No; I tried that earlier today, and sbp2 appears to be rock solid on my other box,
with an Athlon64X2 processor booted with both cpus enabled (I've used maxcpus=1
to work around other random problems, but sbp2 does not appear to suffer from
the same sort of problems that usb-storage, for example, does).

I'm not sure whether you meant SMP as in a kernel built for SMP or as in running
on an actual SMP box.  On x86_64 the default kernel is SMP, so that is the kernel
I've been saying all along works fine on the notebook where I tried the Xen kernel.

Comment 8 Red Hat Bugzilla 2007-07-25 01:30:43 UTC

change QA contact

Comment 9 Chris Lalancette 2008-02-26 22:58:08 UTC

This report targets FC5, which is now end-of-life.

Please re-test against Fedora 7 or later, and if the issue persists, open a new bug.

Thanks

Note You need to log in before you can comment on or make changes to this bug.