Description of problem:

When the xen block frontend driver is built as a module, the module load is only synchronous up to the point where the frontend and the backend become connected, rather than up to the point where the disk is added. This means there can be a race on boot between loading the module, loading the dm-* modules, and scanning for LVM physical volumes (all in the initrd). In the failure case the disk is not present until after the scan for physical volumes is complete.

Version-Release number of selected component (if applicable):

2.6.18-8.1.6.EL

Also applies to RHEL4u5 2.6.9-55.0.2.EL

Upstream fix is:
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/11483a00c017
http://xenbits.xensource.com/kernels/rhel4x.hg?rev/156e3eaca552

How reproducible:

Not precisely determined. Seems to be related to high load in domain 0 or in multiple other domains, implying delays in scheduling the backend driver.

Actual results:

Loading xenblk.ko module
Registering block device major 202
 xvda:Loading dm-mod.ko module
<6>device-mapper: ioctl: 4.11.0-ioctl (2006-09-14) initialised: dm-devel
Loading dm-mirror.ko module
Loading dm-zero.ko module
Loading dm-snapshot.ko module
Making device-mapper control node
Scanning logical volumes
  Reading all physical volumes.  This may take a while...
 xvda1 xvda2
  No volume groups found
Activating logical volumes
  Volume group "VolGroup00" not found
Creating root device.
Mounting root filesystem.
mount: could not find filesystem '/dev/root'
Setting up other filesystems.
Setting up new root fs
setuproot: moving /dev failed: No such file or directory
no fstab.sys, mounting internal defaults
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
Switching to new root and running init.
unmounting old /dev
unmounting old /proc
unmounting old /sys
switchroot: mount failed: No such file or directory
Kernel panic - not syncing: Attempted to kill init!
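The race above can also be mitigated generically in the initrd: since modprobe returns before the disk exists, poll for the expected device node (with a timeout) before running the LVM scan. A minimal sketch, not the actual initrd code; the device path, timeout, and `wait_for_dev` helper name are illustrative assumptions:

```shell
#!/bin/sh
# Sketch of a generic mitigation for the race described above: modprobe
# returns before the disk is added, so poll for the device node (with a
# timeout) before scanning for LVM physical volumes.  The path, timeout,
# and function name are illustrative, not the actual initrd code.
wait_for_dev() {
    dev=$1; tries=${2:-40}            # 40 polls * 0.25s = 10s timeout
    while [ "$tries" -gt 0 ]; do
        [ -e "$dev" ] && return 0
        sleep 0.25
        tries=$((tries - 1))
    done
    return 1
}

# Demo with a temp file standing in for /dev/xvda1: it appears only
# after a short delay, much as the backend creates the disk late.
tmp=$(mktemp -u)
( sleep 0.5; : > "$tmp" ) &
found=no
wait_for_dev "$tmp" 40 && found=yes
wait
rm -f "$tmp"
echo "device found: $found"
```

Note this only covers the device node itself; as discussed below, the partitions on the disk must also have been scanned before the physical-volume scan runs.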
Expected results:

Registering block device major 202
 xvda: xvda1 xvda2
Loading dm-mod.ko module
device-mapper: ioctl: 4.11.0-ioctl (2006-09-14) initialised: dm-devel
Loading dm-mirror.ko module
Loading dm-zero.ko module
Loading dm-snapshot.ko module
Making device-mapper control node
Scanning logical volumes
  Reading all physical volumes.  This may take a while...
  Found volume group "VolGroup00" using metadata type lvm2
Activating logical volumes
  2 logical volume(s) in volume group "VolGroup00" now active

Additional info:
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
AFAICS, this only solves half the problem. Waiting for the disk to be added only gets you so far: you also need to wait for the partitions on the disk to be scanned. See 241793 comment 15 (and onwards).
Bug 241793 comment 15 (Bugzilla _really_ needs a preview feature).
The proposed patch causes the module load to wait until add_disk() has returned. In 2.6.18, at least, this calls down to rescan_partitions() synchronously (add_disk -> register_disk -> blkdev_get -> do_open -> rescan_partitions), so the partition scan is covered as well.
Ran into this issue too while trying to make RHEL5 boot with pv-on-hvm drivers. Fixed it this way:

--- /sbin/mkinitrd.kraxel	2007-07-11 13:25:06.000000000 +0200
+++ /sbin/mkinitrd	2007-07-12 12:58:16.000000000 +0200
@@ -1239,9 +1239,11 @@

 unset usb_mounted

 if [ -n "$scsi" ]; then
-    emit "echo Waiting for driver initialization."
+    emit "echo Waiting for driver initialization (scsi)."
     emit "stabilized --hash --interval 250 /proc/scsi/scsi"
 fi
+emit "echo Waiting for driver initialization (partitions)."
+emit "stabilized --hash --interval 250 /proc/partitions"

 if [ -n "$vg_list" ]; then
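For context, nash's `stabilized --hash --interval 250 FILE` re-reads and hashes FILE until two consecutive samples match, i.e. until its contents stop changing. A minimal shell sketch of that polling loop (the `wait_stable` helper and the use of `md5sum` are illustrative assumptions, not nash's actual implementation):

```shell
#!/bin/sh
# Poll a file until its contents stop changing between samples.
# Illustrative stand-in for nash's "stabilized --hash --interval 250 FILE";
# the function name and use of md5sum are assumptions for the sketch.
wait_stable() {
    file=$1; interval=${2:-0.25}
    prev=
    while :; do
        cur=$(md5sum "$file" | cut -d' ' -f1)
        [ -n "$prev" ] && [ "$cur" = "$prev" ] && break
        prev=$cur
        sleep "$interval"
    done
}

# Demo on a regular file (in the initrd this would be /proc/partitions):
tmp=$(mktemp)
echo "major minor  #blocks  name" > "$tmp"
wait_stable "$tmp" 0.1
echo "stable"
rm -f "$tmp"
```

Note that this only detects that the file stopped changing between two samples; a backend that is slow to attach the disk at all could still slip past it, which is why the driver-side fix is the more robust approach.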
I have now tested the Xen upstream fix and it works.
Created attachment 159140 [details] Upstream patch against 2.6.20-2925.10
Created attachment 159143 [details] Patch against RHEL 5.1 2.6.18-32.el5 kernel This patch applies cleanly against the RHEL 5.1 2.6.18-32.el5 kernel (you will need %patch... -p2).
change QA contact
I'd ping dzickus about this one; his last comment about the patch was a request for a -p1 repost, which Rik did on 7/20, but it doesn't yet appear to have been pulled into the kernel build.
In 2.6.18-37.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5
*** Bug 241793 has been marked as a duplicate of this bug. ***
*** Bug 230561 has been marked as a duplicate of this bug. ***
The feedback from Fujitsu:
------------------------------------
We tested this issue on a PV domain with RHEL 5.1 snapshot 2, and the result is fine. But we also tested it on an HVM domain, and there the result is not good. We understand that currently the system volume is a qemu-dm device, so we do not hit this problem; when we make the system volume a VBD device, we hit the same problem. Does Red Hat plan to support the system volume as a VBD device? On an HVM domain whose system volume is a VBD device, it is difficult to fix this problem on the driver side. We think some waiting logic is needed in the initial boot sequence of the Linux OS. Thank you. Nishi

This event sent from IssueTracker by mmatsuya issue 125723
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0959.html