When upgrading from 5.4 (kernel 2.6.18-164.15.1) to 5.5 (kernel 2.6.18-194), I hit a regression: the system no longer boots.

My setup:

/dev/md2 -> boot
/dev/md3 -> root
/dev/md100 -> PV -> VG -> logical volumes

Snippets from fstab:
"""
/dev/md3 / ext3 noatime,nodiratime 1 1
/dev/md2 /boot ext3 noatime,nodiratime 1 2
LABEL=XENSAVE /var/lib/xen/save/ ext3 noatime,nodiratime,noexec 1 2
"""

# lvs |grep -i xensave
xensave vg_store02 -wi-ao 8.00G

In /etc/lvm/lvm.conf, if I uncomment the log file setting on line 151, the system doesn't boot with the 5.5 kernel. The system logs that / is read-only and that lvm2.log cannot be opened. In the rescue shell I found that vgchange, vgs, and lvs fail because of the log file open error. Since the logical volumes are not available, /var/lib/xen/save is not mounted and the boot stops.

139 log {
140
141     # Controls the messages sent to stdout or stderr.
142     # There are three levels of verbosity, 3 being the most verbose.
143     verbose = 0
144
145     # Should we send log messages through syslog?
146     # 1 is yes; 0 is no.
147     syslog = 1
148
149     # Should we log error and debug messages to a file?
150     # By default there is no log file.
151     file = "/var/log/lvm2.log"
152
153     # Should we overwrite the log file each time the program is run?
154     # By default we append.
155     overwrite = 0
...

Facts:
* works OK with all 5.4 kernels
* works OK if the root partition is on LVM; LVM then gets activated (though the read-only filesystem error is still printed)
* doesn't boot with the 5.5 kernel when root is NOT on LVM
* commenting out line 151 makes the system bootable in all configurations I tried

kernel-2.6.18-194.3.1.el5
lvm2-2.02.56-8.el5_5.1
Review create_toolcontext(). Are the liblvm requirements *really* different from the normal command line tool ones? What is the 'if (stored_errno)' test actually meant for, given that the function already returns NULL on failure? Should the field be cleared after operations we don't care about failing?
I don't understand why we added this code in init_lvm():

    if (stored_errno()) {
        destroy_toolcontext(cmd);
        return_NULL;
    }

liblvm returns cmd in this case - it does not tear down the context. So the tools seem to have become more restrictive than liblvm, which is the bug IMO. We should revert the above code.
I take back comment #3 about reverting the code. I agree with comment #2 and the IRC discussion between agk and kabi: we should call reset_lvm_errno(1) at various points in create_toolcontext(), after init functions that do not return an error or whose error is ignored. Perhaps the reset should go inside the specific init function.
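To illustrate the reset proposed in this comment, here is a minimal self-contained sketch in C. It is not the lvm2 source: every sketch_* name is hypothetical and only mimics the idea of a stored errno that a later check treats as fatal. It assumes the log file is an optional resource, so the init step that fails to open it clears the stored error itself and context creation can still succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the stored-errno bookkeeping; not lvm2 code. */
static int sketch_errno_store = 0;

/* Remember the first error seen, as a logging layer might. */
void sketch_store_errno(int err)
{
    if (!sketch_errno_store)
        sketch_errno_store = err;
}

int sketch_stored_errno(void)
{
    return sketch_errno_store;
}

void sketch_reset_errno(void)
{
    sketch_errno_store = 0;
}

/* Hypothetical context; the real toolcontext holds far more state. */
struct sketch_context {
    FILE *log_file;
};

/* Optional init step: a log file that cannot be opened is reported,
 * but the stored error is cleared because the failure is not fatal. */
void sketch_init_log_file(struct sketch_context *ctx, const char *path)
{
    assert(ctx != NULL && path != NULL);

    ctx->log_file = fopen(path, "a");
    if (!ctx->log_file) {
        sketch_store_errno(errno);
        fprintf(stderr, "%s: fopen failed\n", path);
        sketch_reset_errno(); /* we do not care about this failure */
    }
}

void sketch_destroy_context(struct sketch_context *ctx)
{
    if (!ctx)
        return;
    if (ctx->log_file)
        fclose(ctx->log_file);
    free(ctx);
}

/* Context creation still treats any error left in the store as fatal. */
struct sketch_context *sketch_create_context(const char *log_path)
{
    struct sketch_context *ctx = calloc(1, sizeof(*ctx));

    if (!ctx)
        return NULL;

    sketch_init_log_file(ctx, log_path);

    if (sketch_stored_errno()) {
        sketch_destroy_context(ctx);
        return NULL;
    }
    return ctx;
}
```

Without the sketch_reset_errno() call inside sketch_init_log_file(), the final check in sketch_create_context() would tear the context down whenever the log path is unwritable, which mirrors the boot failure reported here with a read-only root filesystem.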
Two patches have been checked in upstream: one resolves this issue, and the second fixes a related init issue (if init_rand fails).
Fixed in lvm2-2.02.56-12.el5.
Testing mentioned in comment #10 passed in the latest rpm (lvm2-2.02.74-1.el5). Marking verified.

[root@grant-01 tmp]# pvscan
  /tmp/log/bar/foo/coreys_fake_file.log: fopen failed: No such file or directory
  Logging initialised at Mon Nov 8 17:27:05 2010
  Set umask to 0077
  read_urandom: /dev/urandom: open failed: No such file or directory
  Wiping cache of LVM-capable devices
  Wiping internal VG cache
  Walking through all physical volumes
  PV /dev/sdc1 VG centipede lvm2 [54.49 GB / 54.49 GB free]
  PV /dev/sdc2 VG centipede lvm2 [54.49 GB / 54.49 GB free]
  PV /dev/sdc3 VG centipede lvm2 [54.48 GB / 54.48 GB free]
  PV /dev/sdc5 VG centipede lvm2 [54.49 GB / 54.49 GB free]
  PV /dev/sdc6 VG centipede lvm2 [54.48 GB / 54.48 GB free]
  PV /dev/sdb1 VG centipede lvm2 [40.87 GB / 40.87 GB free]
  PV /dev/sdb2 VG centipede lvm2 [40.87 GB / 40.87 GB free]
  PV /dev/sdb3 VG centipede lvm2 [40.87 GB / 40.87 GB free]
  PV /dev/sdb5 VG centipede lvm2 [40.88 GB / 40.88 GB free]
  PV /dev/sda2 VG VolGroup00 lvm2 [74.38 GB / 0 free]
  Total: 10 [510.30 GB] / in use: 10 [510.30 GB] / in no VG: 0 [0 ]
  Wiping internal VG cache
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0052.html