Bug 370961 - race between loading xenblk.ko and scanning for LVM partitions etc.
Status: CLOSED DUPLICATE of bug 248024
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel-xen
Version: 4.5
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assigned To: Xen Maintenance List
QA Contact: Martin Jenner
Depends On:
Blocks:
Reported: 2007-11-08 04:37 EST by Ian Campbell
Modified: 2007-11-16 20:14 EST
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-11-08 09:56:40 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Ian Campbell 2007-11-08 04:37:28 EST
This problem also affects RHEL4u5; cloning as a new bug rather than reopening
the original.

+++ This bug was initially created as a clone of Bug #247265 +++

Description of problem:
When the xen block frontend driver is built as a module, the module load is
synchronous only up to the point where the frontend and the backend become
connected, rather than up to the point where the disk is added.

This means that there can be a race on boot between loading the module and
loading the dm-* modules and doing the scan for LVM physical volumes (all in the
initrd). In the failure case the disk is not present until after the scan for
physical volumes is complete.
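
For illustration, here is a minimal C sketch of the ordering the description
implies (hypothetical, simplified names; not the actual xenblk source). The
module load blocks only until the frontend reports Connected, and in the
failing case the gendisk is registered after that point, so modprobe (and
then the initrd's LVM scan) can run before xvda and its partitions exist.

  /* Hypothetical, simplified fragment illustrating the racy ordering; not
   * the RHEL driver source. */
  struct blkfront_info {
          struct xenbus_device *xbdev;    /* the frontend's xenbus device */
          struct gendisk *gd;             /* gendisk for xvda, set up earlier */
  };

  static void blkfront_connect(struct blkfront_info *info)
  {
          /* Report Connected first -- this is as far as the module load
           * waits, so "modprobe xenblk" returns once this state is set. */
          xenbus_switch_state(info->xbdev, XenbusStateConnected);

          /* The disk is only registered afterwards; the initrd may already
           * be scanning for LVM physical volumes by the time xvda1/xvda2
           * show up. */
          add_disk(info->gd);
  }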

Version-Release number of selected component (if applicable): 2.6.18-8.1.6.EL
Also applies to RHEL4u5 2.6.9-55.0.2.EL

Upstream fixes:
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/11483a00c017
http://xenbits.xensource.com/kernels/rhel4x.hg?rev/156e3eaca552

How reproducible:
Not precisely determined. Seems to be related to high load in domain 0 or in
multiple other domains, implying delays in scheduling the backend driver.

Actual results:
  Loading xenblk.ko module
  Registering block device major 202
  xvda:Loading dm-mod.ko module
  <6>device-mapper: ioctl: 4.11.0-ioctl (2006-09-14) initialised: dm-devel@redhat.com
  Loading dm-mirror.ko module
  Loading dm-zero.ko module
  Loading dm-snapshot.ko module
  Making device-mapper control node
  Scanning logical volumes
  Reading all physical volumes. This may take a while...
  xvda1 xvda2
  No volume groups found
  Activating logical volumes
  Volume group "VolGroup00" not found
  Creating root device.
  Mounting root filesystem.
  mount: could not find filesystem '/dev/root'
  Setting up other filesystems.
  Setting up new root fs
  setuproot: moving /dev failed: No such file or directory
  no fstab.sys, mounting internal defaults
  setuproot: error mounting /proc: No such file or directory
  setuproot: error mounting /sys: No such file or directory
  Switching to new root and running init.
  unmounting old /dev
  unmounting old /proc
  unmounting old /sys
  switchroot: mount failed: No such file or directory
  Kernel panic - not syncing: Attempted to kill init!

Expected results:
  Registering block device major 202
  xvda: xvda1 xvda2
  Loading dm-mod.ko module
  device-mapper: ioctl: 4.11.0-ioctl (2006-09-14) initialised: dm-devel@redhat.com
  Loading dm-mirror.ko module
  Loading dm-zero.ko module
  Loading dm-snapshot.ko module
  Making device-mapper control node
  Scanning logical volumes
  Reading all physical volumes. This may take a while...
  Found volume group "VolGroup00" using metadata type lvm2
  Activating logical volumes
  2 logical volume(s) in volume group "VolGroup00" now active

Additional info:

-- Additional comment from pm-rhel@redhat.com on 2007-07-06 12:23 EST --
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

-- Additional comment from rjones@redhat.com on 2007-07-07 03:49 EST --
AFAICS, this only solves half the problem.  Waiting for the disk to be
added gets you so far.  You also need to wait for the partitions on
the disk to be scanned.

See bug 241793 comment 15 (and onwards).

-- Additional comment from rjones@redhat.com on 2007-07-07 03:50 EST --
Bug 241793 comment 15
(Bugzilla _really_ needs a preview feature).

-- Additional comment from ijc@hellion.org.uk on 2007-07-09 04:39 EST --
The proposed patch causes the module load to wait until add_disk() has returned.
In 2.6.18 at least this calls down to rescan_partitions in a synchronous manner.

(add_disk->register_disk->blkdev_get->do_open->rescan_partitions).
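
A rough sketch of the ordering such a patch would establish (using the same
illustrative names as the earlier sketch; not the verbatim upstream or RHEL
change): doing add_disk() before the switch to Connected means the existing
wait in the module-load path also covers the synchronous partition rescan.

  /* Illustrative fragment; field and function names are assumptions drawn
   * from the comments in this bug, not the actual patch. */
  static void blkfront_connect(struct blkfront_info *info)
  {
          /* Register the gendisk first.  In 2.6.18 this synchronously walks
           * add_disk -> register_disk -> blkdev_get -> do_open ->
           * rescan_partitions, so xvda1/xvda2 exist when it returns. */
          add_disk(info->gd);

          /* Only now report Connected.  The module-load path, which blocks
           * until this state is reached, therefore also waits for the
           * partition scan to finish. */
          xenbus_switch_state(info->xbdev, XenbusStateConnected);
  }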

-- Additional comment from kraxel@redhat.com on 2007-07-12 09:46 EST --
trapped into this issue too while trying to make rhel5 boot with pv-on-hvm
drivers.  Fixed it this way:

--- /sbin/mkinitrd.kraxel       2007-07-11 13:25:06.000000000 +0200
+++ /sbin/mkinitrd      2007-07-12 12:58:16.000000000 +0200
@@ -1239,9 +1239,11 @@
 unset usb_mounted
 
 if [ -n "$scsi" ]; then
-    emit "echo Waiting for driver initialization."
+    emit "echo Waiting for driver initialization (scsi)."
     emit "stabilized --hash --interval 250 /proc/scsi/scsi"
 fi
+emit "echo Waiting for driver initialization (partitions)."
+emit "stabilized --hash --interval 250 /proc/partitions"
 
 
 if [ -n "$vg_list" ]; then


-- Additional comment from rjones@redhat.com on 2007-07-12 10:30 EST --
I have now tested the Xen upstream fix and it works.

-- Additional comment from rjones@redhat.com on 2007-07-13 05:05 EST --
Created an attachment (id=159140)
Upstream patch against 2.6.20-2925.10


-- Additional comment from rjones@redhat.com on 2007-07-13 05:44 EST --
Created an attachment (id=159143)
Patch against RHEL 5.1 2.6.18-32.el5 kernel

This patch applies cleanly against the RHEL 5.1 2.6.18-32.el5 kernel
(you will need %patch... -p2).

-- Additional comment from bugzilla@redhat.com on 2007-07-24 20:53 EST --
change QA contact

-- Additional comment from jwilson@redhat.com on 2007-07-27 09:49 EST --
I'd ping dzickus about this one, last comment from him about the patch was a
request for a -p1 repost, 
which Rik did on 7/20, but it doesn't yet appear to have been pulled into the
kernel build.

-- Additional comment from dzickus@redhat.com on 2007-07-31 17:14 EST --
in 2.6.18-37.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

-- Additional comment from clalance@redhat.com on 2007-08-14 09:36 EST --
*** Bug 241793 has been marked as a duplicate of this bug. ***

-- Additional comment from clalance@redhat.com on 2007-08-14 11:33 EST --
*** Bug 230561 has been marked as a duplicate of this bug. ***

-- Additional comment from tao@redhat.com on 2007-08-29 07:03 EST --
The feedback from Fujitsu:
------------------------------------
We tested this issue on a PV domain with RHEL5.1 snapshot 2, and the result
is fine. However, we also tested it on an HVM domain, and there the result
is not good. We understand that the system volume is currently a qemu-dm
device, in which case we do not hit this problem; when we make the system
volume a VBD device instead, we hit the same problem. Does Red Hat support
making the system volume a VBD device?

On an HVM domain whose system volume is a VBD device, it is difficult to fix
this problem on the driver side. We think some waiting logic is needed in
the initial boot sequence of the Linux OS.

Thank you.                         Nishi


This event sent from IssueTracker by mmatsuya 
 issue 125723

-- Additional comment from errata-xmlrpc@redhat.com on 2007-11-07 14:55 EST --

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html
Comment 1 Chris Lalancette 2007-11-08 09:56:40 EST
Ian,
     We already fixed this for 4.6 in BZ 248024; closing this one as a dup.  As
always, thanks for the heads up.

Chris Lalancette

*** This bug has been marked as a duplicate of 248024 ***
