Bug 436493 - F-9 pv_ops_xen: kernel doesn't see the disks some of the time (intermittent boot failure)
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel-xen-2.6
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Xen Maintainance List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PvOpsTracker
 
Reported: 2008-03-07 15:54 UTC by Michael Young
Modified: 2008-03-20 14:11 UTC
2 users

Fixed In Version: kernel-xen-2.6-2.6.25-0.4.rc4.fc9
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-03-20 14:11:49 UTC
Type: ---



Description Michael Young 2008-03-07 15:54:31 UTC
I have a Fedora 9 Alpha Xen guest (upgraded from Fedora 8 Xen by updating the
packages) which I am trying to boot with the 2.6.25-0.0.rc4.fc9xen kernel.
However, it frequently fails to boot, and the boot messages end with:
Loading xenblk module
blkfront: xvda: barriers enabled
 xvda: xvda1 xvda2 xvda3 xvda4 < xvda5 xvda6 >
Trying to resume from /sys/block/hda//hda3
Unable to access resume device (/sys/block/hda//hda3)
Creating root device.
Mounting root filesystem.
mount: could not find filesystem '/dev/root'
Setting up other filesystems.
Setting up new root fs
setuproot: moving /dev failed: No such file or directory
no fstab.sys, mounting internal defaults
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
Mount failed for selinuxfs on /selinux:  No such file or directory
Switching to new root and running init.
unmounting old /dev
unmounting old /proc
unmounting old /sys
switchroot: mount failed: No such file or directory
Booting has failed.

This is on a Fedora 7 Xen host running xen-3.1.2-2.fc7 and
kernel-xen-2.6.21-7.fc7. While debugging this I built a new initrd, replacing
/sys/block/hda//hda3 with /sys/block/xvda//xvda3, and on unsuccessful boots I
get something like:
Loading xenblk module
blkfront: xvda: barriers enabled
 xvda: xvda1 xvda2 xvda3 xvda4 < xvda5Trying to resume from /sys/block/xvda//xvda3
Unable to access resume device (/sys/block/xvda//xvda3)
Creating root device.
...
Booting has failed.
 xvda6 >

and for a successful boot I get
Loading xenblk module
blkfront: xvda: barriers enabled
 xvda: xvda1 xvda2 xvda3 xvda4 < xvda5 xvda6 >
Trying to resume from /sys/block/xvda//xvda3
Creating root device.
Mounting root filesystem.
kjournald starting.  Commit interval 5 seconds
...

This suggests to me that on some boots the disks aren't accessible yet, or
perhaps haven't finished being set up: when the boot fails,
/sys/block/xvda//xvda3 can't be accessed, but when it succeeds it can.

Comment 1 Daniel Berrangé 2008-03-07 16:12:04 UTC
So it looks like the initrd is continuing before the kernel has finished
scanning the disk for partitions, and is therefore failing to find the
partitions it wants. This was fixed in the original Xen kernels a little while
back by making the block frontend driver wait until partition scanning is done.
I can't remember the changeset offhand right now, though...
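The race described in Comment 1 can be illustrated with a small shell simulation. Everything in it is hypothetical: a temporary directory stands in for /sys/block, and a background job stands in for the kernel's asynchronous partition scan. A check run immediately after "loading the driver" misses the partition, while the same check run after waiting for the scan finds it, which is roughly the guarantee a blocking driver load provides.

```shell
#!/bin/sh
# Hypothetical simulation of the boot race; not real Xen or initrd code.
# A temp directory stands in for /sys/block, and a background job
# stands in for the kernel's asynchronous partition scan.
set -u

sysdir=$(mktemp -d)
mkdir -p "$sysdir/xvda"

# "Load the driver": start the partition scan in the background and
# return straight away, like the racy driver load.
load_driver_async() {
    ( sleep 1; mkdir "$sysdir/xvda/xvda3" ) &
    scan_pid=$!
}

# Report whether the simulated partition entry is visible yet.
check_partition() {
    if [ -e "$sysdir/xvda/xvda3" ]; then echo found; else echo missing; fi
}

load_driver_async
# The initrd carries on at once, as in the failing boots.
first=$(check_partition)

# Block until the scan has completed, which is the idea behind the fix.
wait "$scan_pid"
second=$(check_partition)

echo "before the scan finished: $first"
echo "after the scan finished:  $second"
rm -rf "$sysdir"
```

The only difference between the two checks is the wait on the background scan, which mirrors why the failing and successful boot logs above differ only in timing.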

Comment 2 Mark McLoughlin 2008-03-07 16:15:57 UTC
Presumably you weren't seeing this issue before updating to the pv_ops kernel?

Anything interesting in the /var/log/xen logs?

Comment 3 Mark McLoughlin 2008-03-07 16:38:51 UTC
Thanks Dan, looks like this is it:

  http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/11483a00c017

See bug #241793 and bug #247265

Comment 4 Michael Young 2008-03-07 16:41:43 UTC
Re Comment #2:
Right, I didn't see this before switching to the pv_ops kernel. The logs aren't
interesting; there is no significant difference between a successful and an
unsuccessful boot. A timing issue does seem likely, though: I have hacked the
initrd again to add a "sleep 5" line after the "modprobe -q xenblk" line, and it
now boots much more reliably.
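A fixed "sleep 5" either wastes time on fast boots or may still be too short on slow ones. A somewhat more robust initrd workaround would poll for the partition and give up after a timeout. This is a sketch under stated assumptions: it assumes a POSIX sh-based initrd (the Fedora 9 initrd script is run by nash, which may not support loops like this), and wait_for_path is a hypothetical helper, not part of nash or mkinitrd.

```shell
#!/bin/sh
# Hypothetical helper: poll until a path (e.g. a partition's sysfs entry)
# exists, instead of sleeping for a fixed time.
# Usage: wait_for_path PATH [TIMEOUT_SECONDS]
# Returns 0 as soon as PATH exists, or 1 after about TIMEOUT_SECONDS.
wait_for_path() {
    # POSIX sh has no "local", so prefix variable names to avoid clobbering.
    wfp_path=$1
    wfp_timeout=${2:-10}
    wfp_waited=0
    while [ ! -e "$wfp_path" ]; do
        if [ "$wfp_waited" -ge "$wfp_timeout" ]; then
            return 1
        fi
        sleep 1
        wfp_waited=$((wfp_waited + 1))
    done
    return 0
}
```

In an initrd it would be called right after loading the driver, for example "modprobe -q xenblk" followed by "wait_for_path /sys/block/xvda/xvda3 10". Because it returns as soon as the partition appears, successful boots are not slowed by the full timeout; it is still only a workaround for the underlying race in the driver.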

Comment 5 Mark McLoughlin 2008-03-20 14:11:49 UTC
Should be fixed in rawhide now

* Thu Mar 20 2008 Mark McLoughlin <markmc@redhat.com>
- Make xen-blkfront module load wait until backend is
  connected; fixes intermittent boot failure (bug #436493)


