Bug 151172

Summary: Updating kernel on node with lvm root vols renders machine unbootable if you have clustered lvm.
Product: Red Hat Enterprise Linux 4
Reporter: Dean Jansa <djansa>
Component: mkinitrd
Assignee: Peter Jones <pjones>
Status: CLOSED ERRATA
QA Contact: David Lawrence <dkl>
Severity: high
Priority: medium
Version: 4.0
CC: agk, ccaulfie, djansa, kanderso, k.georgiou, mwesley, shillman
Hardware: All
OS: Linux
Fixed In Version: RHBA-2005-328
Doc Type: Bug Fix
Last Closed: 2005-06-09 12:47:35 UTC
Bug Blocks: 137160

Description Dean Jansa 2005-03-15 16:53:06 UTC
Updating your kernel rpm without first editing the lvm.conf file to
turn cluster locking off will create an initrd which fails to boot,
because lvm tries to load the cluster locking library.

Also, if you are using non-default filters you may need to change the
filter line as well.  I had local disks filtered out, and even after
fixing the lvm.conf file to use local locking I still could not boot.
I needed to set the filter back to the default (accept everything).

The current workaround of editing lvm.conf, installing the kernel
(or rebuilding your initrd), and then re-editing the conf file is a bit
of a mess and at the very least needs to be well documented so users
avoid shooting a foot off.

A solution that would not require the user to mess with their
lvm.conf file would be ideal.
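
For reference, a rough sketch of that workaround (assuming the usual
/etc/lvm/lvm.conf location and locking_type = 2 for cluster locking, as in
our setup; the kernel package name is just a placeholder):

    # 1. temporarily switch lvm.conf to local locking
    sed -i.bak 's/^\([[:space:]]*locking_type[[:space:]]*=\).*/\1 1/' /etc/lvm/lvm.conf

    # 2. install the new kernel (or rebuild the initrd by hand with mkinitrd)
    rpm -ivh kernel-smp-<version>.i686.rpm

    # 3. put cluster locking back afterwards
    mv /etc/lvm/lvm.conf.bak /etc/lvm/lvm.conf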

Comment 1 Alasdair Kergon 2005-03-15 17:11:12 UTC
Needs more intelligence adding to mkinitrd so it doesn't blindly copy
inappropriate config settings across?

Comment 2 Alasdair Kergon 2005-03-15 17:21:53 UTC
Changing filter lines should not be an issue in real life - people
shouldn't filter out local disks while they're using them.

So I think it's correct for mkinitrd to copy them.

What error do you get exactly?

There's a flag, --ignorelockingfailure, which was meant for use at
boot time.  Is this being used?  It seems to work for me:

# vgscan  --ignorelockingfailure
  Unable to open external locking library lvm2_locking.so
  Reading all physical volumes.  This may take a while...
  Found volume group "vgc" using metadata type lvm1
  Found volume group "vgb" using metadata type lvm2


Comment 3 Dean Jansa 2005-03-15 18:01:41 UTC
Changing filter lines may not be an issue; we have local disks filtered
out so our automated tests don't "see" those vols and try to clean them
up.  So I agree, in practice the filter issue is low risk, if a risk at all.


It seems like --ignorelockingfailure is not being used at boot
time; I see:

.
.
.
 File descriptor 3 left open
   Unable to open external locking library /usr/lib/liblvm2clusterlock.so
   Locking type 2 initialisation failed.
 ERROR: /bin/lvm exited abnormally!
 Activating logical volumes
 File descriptor 3 left open
   Kernel panic - not syncing: Attempted to kill init!
 




Comment 4 Alasdair Kergon 2005-03-15 18:03:08 UTC
mkinitrd in CVS has:

    echo "echo Making device-mapper control node" >> $RCFILE
    echo "mkdmnod" >> $RCFILE
    echo "echo Scanning logical volumes" >> $RCFILE
    echo "lvm vgscan" >> $RCFILE
    echo "echo Activating logical volumes" >> $RCFILE
    echo "lvm vgchange -ay" >> $RCFILE
    echo "echo Making device nodes" >> $RCFILE
    echo "lvm vgmknodes" >> $RCFILE


Comment 5 Alasdair Kergon 2005-03-15 18:21:05 UTC
Ideally this should be:

  mkdmnod    # Optional since device-mapper 1.00.21
  lvm vgscan --mknodes --ignorelockingfailure
  lvm lvchange -ay --ignorelockingfailure <lvs>

where <lvs> is the list of logical volumes required at this stage of
booting.  vgchange -ay --ignorelockingfailure will attempt to activate
every visible logical volume, which might well include some logical
volumes (incl. clustered ones) that should not be activated yet.

A compromise would be 'lvm vgchange -ay --ignorelockingfailure
VolGrp00', grabbing the root volume group name from $rootdev.
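
A rough sketch of how mkinitrd could pull the volume group out of $rootdev
for that (variable handling hypothetical; assumes $rootdev has the form
/dev/VolGrp00/LogVol00):

    rootvg=${rootdev#/dev/}    # drop the leading /dev/
    rootvg=${rootvg%%/*}       # keep only the volume group component
    echo "lvm vgchange -ay --ignorelockingfailure $rootvg" >> $RCFILE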


Comment 6 Alasdair Kergon 2005-03-15 18:25:07 UTC
At some point I'll fix XXchange to understand /dev/mapper/vg-lv args
so that 'lvchange $rootdev' would work directly, to save the shell
having to extract VG from $rootdev [easy until someone puts a hyphen
in the name].
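
To illustrate the hyphen problem (names made up): LVM2 doubles a '-' that is
part of a VG or LV name in the /dev/mapper name, so a naive split on the
first '-' guesses wrong:

    # /dev/mapper/vg00-root    ->  VG "vg00",  LV "root"
    # /dev/mapper/my--vg-root  ->  VG "my-vg", LV "root"
    dmname=${rootdev#/dev/mapper/}
    vg=${dmname%%-*}           # gives "vg00" for the first case, but "my" for the second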


Comment 7 Alasdair Kergon 2005-03-15 18:27:57 UTC
BTW --ignorelockingfailure has been there since 2003.  It was
originally added to avoid the need for /var/lock/lvm to be writable
during the early stages of booting.

Comment 8 Alasdair Kergon 2005-03-15 18:29:38 UTC
Transferring from clvm to mkinitrd.

Comment 9 Peter Jones 2005-03-15 21:15:08 UTC
Packages are built in dist-4E-U1-HEAD.  If this needs to be blocking
other bugs, somebody should update this one to say so.

Comment 13 Alasdair Kergon 2005-03-16 20:49:15 UTC
Clarifying subject line: this problem exists if you have shared storage and
therefore installed the cluster-aware version of lvm2.

Without this change, mkinitrd builds the installed clustered features into the
initrd - but those clustered features can't possibly work at such an early stage
of the booting process, so booting fails.


Comment 14 Alasdair Kergon 2005-03-16 21:02:27 UTC
The change does two things:

  Adds a flag to the lvm command in the initrd to tell lvm2 to override the fact
that cluster locking will fail.

  Only activates the root logical volume (which is never clustered).  Clustered
logical volumes get activated in the initscripts, later in the boot process.
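
So, roughly, the commands run from the initrd end up like this (a sketch, not
the literal mkinitrd output; VolGrp00 stands in for whatever volume group
holds the root filesystem):

    mkdmnod
    lvm vgscan --ignorelockingfailure
    lvm vgchange -ay --ignorelockingfailure VolGrp00
    lvm vgmknodes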

Comment 15 Jeremy Katz 2005-04-25 22:04:24 UTC
mkinitrd-4.2.1.3 should have the fix for this.

Comment 16 Dean Jansa 2005-05-19 19:26:25 UTC
I just ran into this again.  Unless I missed an obvious step?

I have mkinitrd-4.2.1.3-1 installed.

I upgraded to kernel-smp-2.6.9-10.EL.i686 without touching my lvm.conf file
(which has cluster locking turned on).

I installed the cluster-i686-2005-05-19-1032 rpms (with the
lvm2-2.01.08-1.0.RHEL4.i386.rpm and lvm2-cluster-2.01.09-3.0.RHEL4.i386.rpm
packages included).

I rebooted my nodes and hit:
Scanning logical volumes
File descriptor 3 left open
  Unable to open external locking library /usr/lib/liblvm2clustecdrom: open failed.
rlock.so
  Reading all physical volumes.  This may take a while...
  Found volume group "gfs" using metadata type lvm2
Activating logical volumes
File descriptor 3 left open
  Unable to open external locking library /usr/lib/liblvm2clusterlock.so
cdrom: open failed.
  Unable to find vKernel panic - not syncing: Attempted to kill init!



Comment 17 Dean Jansa 2005-05-19 20:23:55 UTC
Sorry -- I forgot that we filter out /dev/hda* on our test nodes.  This
causes the root vols not to be found (see comment #3).

Changing the filter back to lvm.conf defaults is the workaround.
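
For anyone else who hits this, the relevant lvm.conf lines look roughly like
so (the restrictive filter is only an example of the kind of setting that
hides the local disk from the initrd):

    # filter that hides /dev/hda* (as on our test nodes) - root vols not found:
    filter = [ "r|/dev/hda.*|", "a|.*|" ]

    # lvm.conf default, which accepts every device:
    filter = [ "a/.*/" ]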




Comment 19 Tim Powers 2005-06-09 12:47:35 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-328.html