Bug 151172
| Summary: | Updating kernel on node with lvm root vols renders machine unbootable if you have clustered lvm. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Dean Jansa <djansa> |
| Component: | mkinitrd | Assignee: | Peter Jones <pjones> |
| Status: | CLOSED ERRATA | QA Contact: | David Lawrence <dkl> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.0 | CC: | agk, ccaulfie, djansa, kanderso, k.georgiou, mwesley, shillman |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | RHBA-2005-328 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2005-06-09 12:47:35 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 137160 | | |
Description
Dean Jansa
2005-03-15 16:53:06 UTC
Needs more intelligence adding to mkinitrd so it doesn't blindly copy inappropriate config settings across? Changing filter lines should not be an issue in real life: people shouldn't filter out local disks while they're using them, so I think it's correct for mkinitrd to copy them. What error do you get exactly?

There's a flag, --ignorelockingfailure, which was meant for use at boot time. Is this being used? It seems to work for me:

```
# vgscan --ignorelockingfailure
Unable to open external locking library lvm2_locking.so
Reading all physical volumes. This may take a while...
Found volume group "vgc" using metadata type lvm1
Found volume group "vgb" using metadata type lvm2
```

Changing filter lines may not be an issue; we have them filtered out so our automated tests don't "see" those vols and try to clean them up. So I agree, in use the filter issue is low risk, if a risk at all. It seems like --ignorelockingfailure is not being used at boot time. I see:

```
. . .
File descriptor 3 left open
Unable to open external locking library /usr/lib/liblvm2clusterlock.so
Locking type 2 initialisation failed.
ERROR: /bin/lvm exited abnormally!
Activating logical volumes
File descriptor 3 left open
Kernel panic - not syncing: Attempted to kill init!
```

mkinitrd in CVS has:

```
echo "echo Making device-mapper control node" >> $RCFILE
echo "mkdmnod" >> $RCFILE
echo "echo Scanning logical volumes" >> $RCFILE
echo "lvm vgscan" >> $RCFILE
echo "echo Activating logical volumes" >> $RCFILE
echo "lvm vgchange -ay" >> $RCFILE
echo "echo Making device nodes" >> $RCFILE
echo "lvm vgmknodes" >> $RCFILE
```

Ideally this should be:

```
mkdmnod    # Optional since device-mapper 1.00.21
lvm vgscan --mknodes --ignorelockingfailure
lvm lvchange -ay --ignorelockingfailure <lvs>
```

where <lvs> is the list of logical volumes required at this stage of booting. vgchange -ay --ignorelockingfailure will attempt to activate every visible logical volume, which might well include some logical volumes (incl. clustered ones) that should not be activated yet. A compromise would be 'lvm vgchange -ay --ignorelockingfailure VolGrp00', grabbing the root volume group from $rootdev. At some point I'll fix XXchange to understand /dev/mapper/vg-lv args so that 'lvchange $rootdev' would work directly, to save the shell having to extract the VG from $rootdev [easy until someone puts a hyphen in the name]. BTW, --ignorelockingfailure has been there since 2003. It was originally added to avoid the need for /var/lock/lvm to be writable during the early stages of booting.

Transferring from clvm to mkinitrd.

Packages are built in dist-4E-U1-HEAD. If this needs to be blocking other bugs, somebody should update this one to say so.

Clarifying subject line: this problem exists if you have shared storage and therefore installed the cluster-aware version of lvm2. Without this change, mkinitrd builds the installed clustered features into the initrd, but those clustered features can't possibly work at such an early stage of the booting process, so booting fails. The change does two things:

- Adds a flag to the lvm command in the initrd to tell lvm2 to override the fact that cluster locking will fail.
- Only activates the root logical volume (which is never clustered). Clustered logical volumes get activated in the initscripts, later in the boot process.

mkinitrd-4.2.1.3 should have the fix for this.

I just ran into this again. Unless I missed an obvious step? I have mkinitrd-4.2.1.3-1 installed. I upgraded to kernel-smp-2.6.9-10.EL.i686 without touching my lvm.conf file (which has cluster locking turned on). I installed the cluster-i686-2005-05-19-1032 rpms (with the lvm2-2.01.08-1.0.RHEL4.i386.rpm and lvm2-cluster-2.01.09-3.0.RHEL4.i386.rpm packages included). I reboot my nodes and hit:

```
Scanning logical volumes
File descriptor 3 left open
Unable to open external locking library /usr/lib/liblvm2clusterlock.so
cdrom: open failed.
Reading all physical volumes. This may take a while...
Found volume group "gfs" using metadata type lvm2
Activating logical volumes
File descriptor 3 left open
Unable to open external locking library /usr/lib/liblvm2clusterlock.so
cdrom: open failed.
Unable to find v
Kernel panic - not syncing: Attempted to kill init!
```

Sorry -- I forgot that we filter out /dev/hda* on our test nodes. This causes the root vols to not be found. (See comment #3.) Changing the filter back to lvm.conf defaults is the workaround.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-328.html
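For reference, the two-part fix described above (pass --ignorelockingfailure, and activate only the root volume group) would leave the generated initrd script looking roughly like this. This is only an illustrative sketch assembled from the commands discussed in the comments, not the literal output of mkinitrd-4.2.1.3, and VolGrp00 stands in for whatever root VG mkinitrd derives from $rootdev:

```shell
echo "Making device-mapper control node"
mkdmnod
echo "Scanning logical volumes"
lvm vgscan --ignorelockingfailure
echo "Activating logical volumes"
# Activate only the root VG; clustered VGs are activated later by the
# initscripts, once cluster locking can actually work.
lvm vgchange -ay --ignorelockingfailure VolGrp00
echo "Making device nodes"
lvm vgmknodes
```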
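As an aside on the remark in an earlier comment that extracting the VG from $rootdev is "easy until someone puts a hyphen in the name": device-mapper exposes nodes as /dev/mapper/<vg>-<lv>, with any literal hyphen inside a VG or LV name doubled, so the VG/LV boundary is the first hyphen that is not part of a "--" pair. A minimal sketch of such an extraction follows; the helper name is hypothetical and not part of mkinitrd:

```shell
#!/bin/sh
# Hypothetical helper (not part of mkinitrd): recover the VG name from a
# /dev/mapper/<vg>-<lv> path, assuming device-mapper's convention of
# doubling literal hyphens that occur inside VG or LV names.
vg_from_dm_name() {
    name=${1#/dev/mapper/}
    printf '%s\n' "$name" | awk '{
        gsub(/--/, "\001")   # protect doubled hyphens (a literal "-" in a name)
        sub(/-.*/, "")       # cut at the first remaining hyphen: the VG/LV boundary
        gsub("\001", "-")    # restore the literal hyphens
        print
    }'
}

vg_from_dm_name /dev/mapper/VolGrp00-LogVol00   # prints: VolGrp00
vg_from_dm_name /dev/mapper/my--vg-my--lv       # prints: my-vg
```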