I have a Fedora 9 Alpha Xen guest (updated from Fedora 8 Xen by updating the packages) which I am trying to boot with the 2.6.25-0.0.rc4.fc9xen Xen kernel. However, it frequently fails to boot, with the boot messages ending:

Loading xenblk module
blkfront: xvda: barriers enabled
xvda: xvda1 xvda2 xvda3 xvda4 < xvda5 xvda6 >
Trying to resume from /sys/block/hda//hda3
Unable to access resume device (/sys/block/hda//hda3)
Creating root device.
Mounting root filesystem.
mount: could not find filesystem '/dev/root'
Setting up other filesystems.
Setting up new root fs
setuproot: moving /dev failed: No such file or directory
no fstab.sys, mounting internal defaults
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
Mount failed for selinuxfs on /selinux: No such file or directory
Switching to new root and running init.
unmounting old /dev
unmounting old /proc
unmounting old /sys
switchroot: mount failed: No such file or directory
Booting has failed.

The host is a Fedora 7 Xen machine running xen-3.1.2-2.fc7 and kernel-xen-2.6.21-7.fc7.

While debugging this I built a new initrd, replacing /sys/block/hda//hda3 with /sys/block/xvda//xvda3. On unsuccessful boots I now get something like:

Loading xenblk module
blkfront: xvda: barriers enabled
xvda: xvda1 xvda2 xvda3 xvda4 < xvda5Trying to resume from /sys/block/xvda//xvda3
Unable to access resume device (/sys/block/xvda//xvda3)
Creating root device.
...
Booting has failed.
xvda6 >

and on a successful boot I get:

Loading xenblk module
blkfront: xvda: barriers enabled
xvda: xvda1 xvda2 xvda3 xvda4 < xvda5 xvda6 >
Trying to resume from /sys/block/xvda//xvda3
Creating root device.
Mounting root filesystem.
kjournald starting.  Commit interval 5 seconds
...

This suggests to me that on some boots the disks aren't accessible yet, or perhaps haven't finished being set up: when the boot fails it can't access /sys/block/xvda//xvda3, but when it succeeds it can.
So it looks like the initrd is continuing before the kernel has finished scanning the disk for partitions, and is therefore failing to find the partitions it wants. This was fixed in the original Xen kernels a little while back by making the module load block until partition scanning is done. I can't remember the changeset offhand right now, though...
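The ordering problem, and the kind of fix described above, can be modelled in a few lines. This is a minimal, purely illustrative Python sketch: `FakeDisk`, `load_module` and `scan_delay` are hypothetical names, the delay stands in for the backend handshake and partition scan, and nothing here reflects the actual blkfront code or the referenced changeset. It only shows why returning early loses the partitions while blocking on a completion event does not.

```python
import threading
import time


class FakeDisk:
    """Hypothetical stand-in for a Xen virtual disk whose partitions are
    discovered asynchronously, some time after the driver is loaded."""

    def __init__(self, scan_delay):
        self.partitions = []
        self.scan_done = threading.Event()
        self._scan_delay = scan_delay

    def start_probe(self):
        def probe():
            # Stands in for the backend handshake plus the partition scan.
            time.sleep(self._scan_delay)
            self.partitions = ["xvda1", "xvda2", "xvda3", "xvda4", "xvda5", "xvda6"]
            self.scan_done.set()

        threading.Thread(target=probe, daemon=True).start()


def load_module(disk, wait, timeout=10.0):
    """Simulate loading the block driver; return the partitions visible when it returns."""
    disk.start_probe()
    if wait:
        # Fixed behaviour: block until the scan has finished (or the timeout expires).
        disk.scan_done.wait(timeout)
    # With wait=False we return at once, like the initrd carrying on
    # before the partitions exist.
    return list(disk.partitions)


racy = load_module(FakeDisk(scan_delay=0.2), wait=False)
fixed = load_module(FakeDisk(scan_delay=0.2), wait=True)
print("without waiting:", racy)
print("with waiting:   ", fixed)
```

With `wait=False` the returned list is normally empty, mirroring the failed boots; with `wait=True` it includes `xvda3`, mirroring the successful ones.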
Presumably you weren't seeing this issue before updating to the pv_ops kernel? Anything interesting in the /var/log/xen logs?
Thanks, Dan. It looks like this is it: http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/11483a00c017 See bug #241793 and bug #247265.
Re comment #2: Yes, I didn't see this before the pv_ops kernel. The logs aren't interesting; there is no significant difference between a successful and an unsuccessful boot. But the timing issue does seem likely: I have hacked the initrd again to add a sleep 5 line after the modprobe -q xenblk line, and it now boots much more reliably.
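A fixed sleep 5 either wastes time on fast boots or may still be too short on a slow host. An alternative is to poll for the resume device with a timeout. This is an illustrative Python sketch under stated assumptions: `wait_for_path` is a hypothetical helper, it assumes the partition appears in sysfs once scanning has finished, and a real initrd would need the equivalent logic expressed in its own init script.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.1):
    """Poll until `path` exists.

    Returns True as soon as the path appears, or False once `timeout`
    seconds have passed without it appearing (hypothetical helper).
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_path("/sys/block/xvda/xvda3", timeout=10)` would return as soon as the partition from the boot log shows up, and the caller gets an explicit failure instead of silently carrying on if it never does.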
Should be fixed in Rawhide now:

* Thu Mar 20 2008 Mark McLoughlin <markmc>
- Make xen-blkfront module load wait until backend is connected;
  fixes intermittent boot failure (bug #436493)