Bug 1337977
| Summary: | When /var is a separate filesystem, File-based locking initialization fails due to inability to create /var/lock/lvm | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | John Pittman <jpittman> |
| Component: | lvm2 | Assignee: | Peter Rajnoha <prajnoha> |
| lvm2 sub component: | libdevmapper (RHEL6) | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | agk, baumanmo, bubrown, chakumar, franz.brauneder, gjose, heinzm, jbrassow, jpittman, loberman, mgandhi, msnitzer, prajnoha, prockai, rbednar, rmadhuso, sreber, zkabelac |
| Version: | 6.8 | Keywords: | Reopened |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.02.143-9.el6 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-03-21 12:02:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1269194 | | |
| Attachments: | | | |
You can configure the locking dir anywhere you want in /etc/lvm/lvm.conf:

global { locking_dir = "/run/lock" }

It's worth noting that recent releases of lvm2 already use this location, and if you install a new 'lvm.conf', that location should be there by default.

What command failed, in what context, and what is the locking protecting? /var/lock/lvm is installed by rpm, so how is it missing? (/var/lock is installed by the 'filesystem' rpm.) Is the error from running lvm before /var is mounted, or was the rpm installed without /var mounted?

Thanks for the workaround Zdenek; one of the guys made a public article detailing the steps to switch the locking_dir in case customers see the message.
To add to my original 'How reproducible', the issue can be reproduced on a fresh install of 6.8; the only criterion is that /var is a separate filesystem. Upgrading from 6.7 to 6.8 is not a requirement.
Searching showed that the 'Failed to create directory' message comes from the function dm_create_dir in libdm/libdm-file.c. It calls _create_dir_recursive, which runs three mkdir calls in this case. The return code and errno of each mkdir are shown below, with added log_verbose messages marked as RH_DEBUG.
Creating directory "/var/lock/lvm"
RH_DEBUG: First rc is -1 and errno is 17
RH_DEBUG: First rc is -1 and errno is 30
RH_DEBUG: Second rc is -1 and errno is 2
All return codes from mkdir indicate failure. For /var we get 17 which is EEXIST, for /var/lock we get 30 which is EROFS, and for /var/lock/lvm we get 2 which is ENOENT.
Also, for completeness, the return code of _create_dir_recursive was indeed 0, which satisfied the condition that produces the failure message.
RH_DEBUG: Return of _create_dir_recursive is 0.
So as far as I can tell, it does look as if the trouble is due to /var not being mounted rw at the time we try to create the directories.
That's as far as I've gotten for now, will post more if I'm able.
John
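To make the mechanics above easier to follow, here is a minimal standalone sketch of the recursive-mkdir pattern being described. It is an approximation for illustration only, not the libdm source; run on a box where /var is still mounted read-only, its output mirrors the errno values reported above.

```c
/*
 * Minimal standalone sketch (NOT the libdm source) of the recursive mkdir
 * pattern described above: each path component is created in turn and EEXIST
 * is tolerated.  With /var still mounted read-only, the printed rc/errno
 * values mirror the RH_DEBUG output: EEXIST (17) for /var, EROFS (30) for
 * /var/lock and ENOENT (2) for /var/lock/lvm.
 */
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

static int create_dir_recursive_sketch(const char *dir)
{
	char path[PATH_MAX];
	size_t len = strlen(dir);
	int rc = -1, err = 0;

	if (!len || len >= sizeof(path))
		return 0;
	memcpy(path, dir, len + 1);

	for (size_t i = 1; i <= len; i++) {
		if (path[i] != '/' && path[i] != '\0')
			continue;
		char saved = path[i];
		path[i] = '\0';
		rc = mkdir(path, 0777);		/* same mode the lvm code path uses */
		err = (rc < 0) ? errno : 0;
		printf("mkdir(\"%s\") rc=%d errno=%d (%s)\n",
		       path, rc, err, rc < 0 ? strerror(err) : "ok");
		path[i] = saved;
	}

	/* Only the final component decides the result; EEXIST still counts. */
	return (rc == 0 || err == EEXIST) ? 1 : 0;
}

int main(void)
{
	if (!create_dir_recursive_sketch("/var/lock/lvm"))
		fprintf(stderr, "Failed to create directory /var/lock/lvm.\n");
	return 0;
}
```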
(In reply to John Pittman from comment #3)
> Thanks for the workaround Zdenek; One of the guys made a public article
> detailing the steps to switch the locking_dir in case customers see the
> message.

Note - it's not a 'workaround': the locking dir is a configurable setting, so when a user does something 'unusual' like putting /var on a separate mount, he likely needs to further 'reconfigure' other parts of his system. The /var/lock dir used to be regular content of the root filesystem - the only 'mountable' part used to be the /usr dir. That changed further with RHEL 7 and usrmove. Recent versions of lvm2 should automatically pick /run/lock.

Closing as not a bug - it's just a configuration issue.

(In reply to Alasdair Kergon from comment #2)
> Is the error from running lvm before /var is mounted

So the answer was 'yes'. But this key question remains unanswered:

> What command failed and in what context and what is the locking protecting?

In other words, is the lock *necessary* at that exact point during booting, or is it harmless if it gets skipped silently? Also, exactly which change between 6.7 and 6.8 caused this?

Hi Alasdair, sorry for the wait. It looks as if the addition of logging an error based on the return of _create_dir_recursive is the only reason we're seeing this now. A snip from the relevant diff between 143 and 118:

diff LVM2.2.02.143/libdm/libdm-file.c ../temp4/LVM2.2.02.118/libdm/libdm-file.c
---
>       if (stat(dir, &info) < 0)
>               return _create_dir_recursive(dir);
100,103c72,73
<       if (!_create_dir_recursive(dir)) {
<               log_error("Failed to create directory %s.", dir);
<               return 0;
<       }
---

From LVM2.2.02.143/WHATS_NEW_DM:

Version 1.02.109 - 22nd September 2016
======================================
.... snip
Check dir path components are valid if using dm_create_dir, error out if not.
.... snip

https://www.redhat.com/archives/lvm-devel/2015-September/msg00120.html

The locking protection selected is read-only locking, so we do seem to have protection at the time.

dracut: Setting global/locking_type to 4
........
dracut: Logging initialised at Wed Jun 15 14:00:21 2016
dracut: Set umask from 0022 to 0077
dracut: Read-only locking selected. Only read operations permitted.

Stated in lvm.conf:
    # 4
    # LVM uses read-only locking which forbids any operations that
    # might change metadata

If I understand you correctly, the actual command that is failing is the mkdir within _create_dir_recursive. The final mkdir fails because the one before it failed, since the /var filesystem is not read-write. Code path for reference:

init_locking --> init_file_locking --> dm_create_dir --> _create_dir_recursive --> mkdir(dir, 0777)

So it doesn't seem that the file-based lock is necessary at that point; I would think it's OK to skip silently. There were points in my research where I was unsure, like trying to find when we actually do end up setting locking type to 1 for the first time and creating the lock dir. I hope that helps.

John

locking_type=4 should not be creating any locking dir. Could you please attach a full -vvvv trace of the problematic failing command?

Note: when locking_type is 1 and creation of the locking dir fails, it should fall back to read-only locking mode when --ignorelockingfailure is specified. Moreover, failing on lock-dir creation means /var would have to be sitting on a read-only filesystem??
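As an illustration of the fallback behaviour just mentioned, a rough sketch follows. The function names are hypothetical and this is not the lvm2 source; it only mirrors the decision logic implied by the two messages quoted in the Description below ("File-based locking initialisation failed." / "Locking disabled - only read operations permitted.").

```c
/*
 * Rough sketch of the described fallback (hypothetical names, NOT the lvm2
 * source): with locking_type=1, a failed file-locking setup is a hard error
 * only when --ignorelockingfailure (implied by --sysinit) is not given;
 * otherwise the command drops to read-only locking.
 */
#include <stdio.h>

/* Stub standing in for the real setup of the file-based locking directory. */
static int init_file_locking_stub(void)
{
	return 0;	/* pretend /var/lock/lvm could not be created */
}

static int init_locking_sketch(int locking_type, int ignore_locking_failure)
{
	if (locking_type == 1 && !init_file_locking_stub()) {
		fprintf(stderr, "File-based locking initialisation failed.\n");
		if (!ignore_locking_failure)
			return 0;	/* would abort the command */
		fprintf(stderr,
			"Locking disabled - only read operations permitted.\n");
		/* continue with read-only locking from here on */
	}
	return 1;
}

int main(void)
{
	/* 'vgchange -a ay --sysinit' implies --ignorelockingfailure, so boot continues. */
	return init_locking_sketch(1, 1) ? 0 : 1;
}
```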
Created attachment 1169982 [details]
vgchange -vvvv command showing failure
Command showing the failure was '/sbin/lvm vgchange -a ay --sysinit --ignoreskippedcluster' from /etc/rc.d/rc.sysinit. File vgchange.out attached with -vvvv output.

(In reply to John Pittman from comment #12)
> Command showing the failure was '/sbin/lvm vgchange -a ay --sysinit
> --ignoreskippedcluster' from /etc/rc.d/rc.sysinit. File vgchange.out
> attached with -vvvv output.

Yep, it's being used with 'locking_type=1'. So if the user wants to activate devices on a not yet fully initialized system, he may use locking_type=4 for this initialization - or --ignorelockingfailure - whatever fits better.

RHEL 7.x is using the /run path. RHEL 6.x is reconfigurable via lvm.conf when the user needs it.

It's unclear how that could have ever worked in RHEL 6.7 - this behavior is consistent and without any change AFAIK. So is the user suggesting the system worked without any changes in 6.7? We would need to see 'lvmdump -a' from both systems. Also note 'vgchange -ay' != 'vgchange -aay'.

Thanks Zdenek; in 6.7, from what I understand of the issue, it worked the same, we just didn't know about the lock dir creation failure because we didn't log it. Attached: lvmdump.tgz

Created attachment 1170314 [details]
'lvmdump -a' from all systems.
Just checking in here. Any change after I provided the latest info? Are we going to suppress the message?

I'm too busy to look at this at the moment. I'm not sure my questions have been answered yet.
It sounds like you identified this commit?
> commit 6c0b4a2769067048fa144814e298a3272564c475
> Author: Peter Rajnoha <prajnoha>
> Date: Thu Sep 17 14:29:51 2015 +0200
If the filesystem is mounted read-only and the code has fallen back to a read-only locking mode, there should certainly be no error messages appearing, as this is a fully supported configuration.
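For reference, here is a simplified before/after sketch of the change described in the next comment, assembled from the diff quoted earlier in this bug. The helper stub and exact structure are approximations, not verbatim code from commit eac0706761e628532ffcd27f3e4d7fa559a5f818.

```c
/*
 * Simplified before/after sketch of the fix (approximation assembled from
 * the diff quoted earlier; NOT verbatim code from the fix commit).
 */
#include <stdio.h>

/* Stub for the recursive helper; the real one logs each failing component
 * itself and treats the read-only-filesystem case quietly. */
static int _create_dir_recursive(const char *dir)
{
	(void) dir;
	return 0;	/* pretend creation failed, e.g. EROFS under /var */
}

/* lvm2-2.02.143 behaviour: an extra, unconditional error on any failure,
 * which is the "Failed to create directory" line seen in boot.log. */
static int dm_create_dir_before(const char *dir)
{
	if (!_create_dir_recursive(dir)) {
		fprintf(stderr, "Failed to create directory %s.\n", dir);
		return 0;
	}
	return 1;
}

/* After the fix: only the return value propagates; any messages come from
 * _create_dir_recursive, so a read-only /var no longer spams the boot log. */
static int dm_create_dir_after(const char *dir)
{
	return _create_dir_recursive(dir);
}

int main(void)
{
	dm_create_dir_before("/var/lock/lvm");	/* prints the extra error */
	dm_create_dir_after("/var/lock/lvm");	/* silent at this level */
	return 0;
}
```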
I've removed the "Failed to create directory" message from the dm_create_dir function - there are detailed messages printed inside _create_dir_recursive, which dm_create_dir calls (and which handles the EROFS case like anywhere else in the code): https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=eac0706761e628532ffcd27f3e4d7fa559a5f818

Marking verified. The error is no longer shown during boot while having /var as a separate file system and the locking dir set to "/var/lock/lvm". Changing locking_dir to "/run/lock" also removes the error message on affected systems, as described in this article: https://access.redhat.com/solutions/2333821

# df -h
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/vg_virt137-lv_root  7.0G  2.1G  4.6G  32% /
tmpfs                           499M     0  499M   0% /dev/shm
/dev/vda1                       477M   37M  416M   8% /boot
/dev/mapper/vg_virt137-lv_var   847M  461M  343M  58% /var

# egrep "locking_type|locking_dir" /etc/lvm/lvm.conf | egrep -v "^\s*#"
    locking_type = 3
    locking_dir = "/var/lock/lvm"

------------------------------------------------------------------------------------------------
RHEL 6.7, no errors in boot log while having /var as a separate file system:
lvm2-2.02.118-3.el6_7.4

# grep -i "logical volume management" /var/log/boot.log
Setting up Logical Volume Management:   2 logical volume(s) in volume group "vg_virt137" now active

------------------------------------------------------------------------------------------------
After update to RHEL 6.8 or the equivalent lvm2 package, the error message is present in boot.log:
lvm2-2.02.143-7.el6

# grep -i "logical volume management" /var/log/boot.log
Setting up Logical Volume Management: Failed to create directory /var/lock/lvm.

------------------------------------------------------------------------------------------------
After fix:

# grep -i "logical volume management" /var/log/boot.log
Setting up Logical Volume Management:   2 logical volume(s) in volume group "vg_virt137" now active

Tested with:
2.6.32-573.35.2.el6.x86_64

lvm2-2.02.143-9.el6                               BUILT: Thu Nov 10 10:21:10 CET 2016
lvm2-libs-2.02.143-9.el6                          BUILT: Thu Nov 10 10:21:10 CET 2016
lvm2-cluster-2.02.143-9.el6                       BUILT: Thu Nov 10 10:21:10 CET 2016
udev-147-2.63.el6_7.1                             BUILT: Thu Nov 12 17:11:28 CET 2015
device-mapper-1.02.117-9.el6                      BUILT: Thu Nov 10 10:21:10 CET 2016
device-mapper-libs-1.02.117-9.el6                 BUILT: Thu Nov 10 10:21:10 CET 2016
device-mapper-event-1.02.117-9.el6                BUILT: Thu Nov 10 10:21:10 CET 2016
device-mapper-event-libs-1.02.117-9.el6           BUILT: Thu Nov 10 10:21:10 CET 2016
device-mapper-persistent-data-0.6.2-0.1.rc7.el6   BUILT: Tue Mar 22 14:58:09 CET 2016
cmirror-2.02.143-9.el6                            BUILT: Thu Nov 10 10:21:10 CET 2016

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0798.html
Description of problem:
When /var is a separate filesystem, File-based locking initialization fails due to inability to create /var/lock/lvm. The issue can be worked around by keeping /var in the same filesystem as /, or by downgrading lvm and dependents from the levels mentioned below.

Version-Release number of selected component (if applicable):
lvm2-2.02.143-7.el6.x86_64
lvm2-libs-2.02.143-7.el6.x86_64

# Additional packages provided just in case
device-mapper-multipath-libs-0.4.9-93.el6.x86_64
device-mapper-libs-1.02.117-7.el6.x86_64
device-mapper-persistent-data-0.6.2-0.1.rc7.el6.x86_64
device-mapper-1.02.117-7.el6.x86_64
device-mapper-event-1.02.117-7.el6.x86_64
device-mapper-event-libs-1.02.117-7.el6.x86_64
kernel-2.6.32-642.el6.x86_64

How reproducible:
1. Create a RHEL 6.7 system with /var as a separate filesystem
2. Upgrade to RHEL 6.8
3. Reboot
4. Issue will be shown in /var/log/boot.log

Actual results:
From /var/log/boot.log with lvm verbosity at 1:

Setting up Logical Volume Management:   Logging initialised at Fri May 20 10:54:58 2016
    Set umask from 0022 to 0077
    Creating directory "/var/lock/lvm"
  Failed to create directory /var/lock/lvm.
  File-based locking initialisation failed.
  Locking disabled - only read operations permitted.

Expected results:
Initialization should succeed

Additional info:
I have a test system recreation and will be glad to provide any information needed. Please let me know.

[root@localhost ~]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-LogVol00  3.8G  2.0G  1.7G  55% /
tmpfs                          495M     0  495M   0% /dev/shm
/dev/sda1                      477M   67M  385M  15% /boot
/dev/mapper/VolGroup-LogVol01  969M   93M  826M  11% /home
/dev/mapper/VolGroup-LogVol02  673M  716K  638M   1% /tmp
/dev/mapper/VolGroup-LogVol03  969M  586M  333M  64% /var
/dev/mapper/VolGroup-LogVol04  283M  2.2M  266M   1% /var/log/audit