Bug 436493

Summary: F-9 pv_ops_xen: kernel doesn't see the disks some of the time (intermittent boot failure)
Product: [Fedora] Fedora Reporter: Michael Young <m.a.young>
Component: kernel-xen-2.6 Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low Docs Contact:
Priority: low    
Version: rawhide CC: berrange, ehabkost
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-xen-2.6-2.6.25-0.4.rc4.fc9 Doc Type: Bug Fix
Last Closed: 2008-03-20 14:11:49 UTC Type: ---
Bug Blocks: 434756    

Description Michael Young 2008-03-07 15:54:31 UTC
I have a Fedora 9 alpha Xen machine (upgraded by updating the packages from
Fedora 8 Xen) which I am trying to boot with the 2.6.25-0.0.rc4.fc9xen Xen
kernel. However, it frequently fails to boot, with the boot messages ending in:
Loading xenblk module
blkfront: xvda: barriers enabled
 xvda: xvda1 xvda2 xvda3 xvda4 < xvda5 xvda6 >
Trying to resume from /sys/block/hda//hda3
Unable to access resume device (/sys/block/hda//hda3)
Creating root device.
Mounting root filesystem.
mount: could not find filesystem '/dev/root'
Setting up other filesystems.
Setting up new root fs
setuproot: moving /dev failed: No such file or directory
no fstab.sys, mounting internal defaults
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
Mount failed for selinuxfs on /selinux:  No such file or directory
Switching to new root and running init.
unmounting old /dev
unmounting old /proc
unmounting old /sys
switchroot: mount failed: No such file or directory
Booting has failed.

This is on a Fedora 7 Xen host machine running xen-3.1.2-2.fc7 and
kernel-xen-2.6.21-7.fc7. While trying to debug this I built a new initrd,
replacing /sys/block/hda//hda3 with /sys/block/xvda//xvda3, and on unsuccessful
boots I get something like:
Loading xenblk module
blkfront: xvda: barriers enabled
 xvda: xvda1 xvda2 xvda3 xvda4 < xvda5Trying to resume from /sys/block/xvda//xvda3
Unable to access resume device (/sys/block/xvda//xvda3)
Creating root device.
...
Booting has failed.
 xvda6 >

and for a successful boot I get:
Loading xenblk module
blkfront: xvda: barriers enabled
 xvda: xvda1 xvda2 xvda3 xvda4 < xvda5 xvda6 >
Trying to resume from /sys/block/xvda//xvda3
Creating root device.
Mounting root filesystem.
kjournald starting.  Commit interval 5 seconds
...

which suggests to me that on some boots the disks aren't accessible, or maybe
haven't finished being mounted: when the boot fails it can't access
/sys/block/xvda//xvda3, but when it succeeds it can.

Comment 1 Daniel Berrangé 2008-03-07 16:12:04 UTC
So it looks like the initrd is continuing before the kernel has finished
scanning the disk for partitions & thus failing to find the partitions it wants.
This was fixed in the original Xen kernels a little while back by making it
block until partition scanning is done. I can't remember the changeset offhand,
though...

Comment 2 Mark McLoughlin 2008-03-07 16:15:57 UTC
Presumably you weren't seeing this issue before updating to the pv_ops kernel?

Anything interesting in the /var/log/xen logs?

Comment 3 Mark McLoughlin 2008-03-07 16:38:51 UTC
Thanks Dan, looks like this is it:

  http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/11483a00c017

See bug #241793 and bug #247265

Comment 4 Michael Young 2008-03-07 16:41:43 UTC
Re Comment #2
Yes, I didn't see this before the pv_ops kernel. The logs aren't interesting,
with no significant difference between a successful and an unsuccessful boot.
But the timing issue does seem likely. I have hacked the initrd again to put a
"sleep 5" line after the "modprobe -q xenblk" one, and it boots much more
reliably.
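
A fixed sleep only narrows the race window. As a rough sketch of a slightly
more robust workaround, the initrd could instead poll for the partition node
and give up after a timeout. The helper name wait_for_path and its arguments
are hypothetical, and this assumes the initrd's script interpreter supports
POSIX sh functions and arithmetic, which the real initrd may not:

```shell
#!/bin/sh
# Hypothetical helper (not part of the real initrd): wait until PATH exists,
# polling once per second, and give up after TIMEOUT seconds (default 5).
# Usage: wait_for_path PATH [TIMEOUT]
wait_for_path() {
    path=$1
    timeout=${2:-5}
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # the node never appeared within the timeout
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0            # the node exists; safe to continue booting
}
```

In the initrd this would be called right after the "modprobe -q xenblk" line,
for example as "wait_for_path /sys/block/xvda/xvda3 5". It is still only a
workaround; the proper fix is for the driver itself to block until the disk is
ready.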

Comment 5 Mark McLoughlin 2008-03-20 14:11:49 UTC
Should be fixed in rawhide now:

* Thu Mar 20 2008 Mark McLoughlin <markmc>
- Make xen-blkfront module load wait until backend is
  connected; fixes intermittent boot failure (bug #436493)