Bug 843013

Summary: obscure error "Write locks are prohibited with read-only locking." doesn't tell you how to diagnose or fix the problem.
Product: [Fedora] Fedora
Reporter: Richard W.M. Jones <rjones>
Component: lvm2
Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: agk, bmarzins, bmr, dwysocha, heinzm, jonathan, lvm-team, msnitzer, prajnoha, prockai, zkabelac
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: lvm2-2.02.97-1.fc18
Doc Type: Bug Fix
Last Closed: 2012-08-14 06:47:16 UTC
Type: Bug

Description Richard W.M. Jones 2012-07-25 10:09:07 UTC
Description of problem:

$ sudo lvcreate -L 32G -n F17x64 /dev/vg_data
  Write locks are prohibited with read-only locking.
  Can't get lock for vg_data

$ sudo lvremove /dev/vg_data/F16x64
  Write locks are prohibited with read-only locking.
  Can't get lock for vg_data
  Skipping volume group vg_data

The more important point here is that the error message gives
no indication of how to diagnose or recover from the problem.

Version-Release number of selected component (if applicable):

lvm2-2.02.96-4.fc18.x86_64
device-mapper-1.02.75-4.fc18.x86_64
running kernel 3.5.0-0.rc5.git3.1.fc18.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Run any lv* operation that changes metadata (for example lvcreate or lvremove).
Actual results:

Obscure error message about something being locked.

Expected results:

Should perform the operation.

Comment 1 Peter Rajnoha 2012-07-25 10:24:49 UTC
Have you changed your LVM configuration? Please check /etc/lvm/lvm.conf and the global/locking_type setting. By default, after installation, it is set to 1 (file-based locking). Type 4 (read-only locking) can be used in specific situations, but it is definitely not the default, because this locking type does not allow metadata changes.
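
For anyone who wants to script the check suggested above, here is a minimal sketch in Python. It assumes the setting appears as a plain "locking_type = N" line and does not verify that the line sits inside the global section. The helper names read_locking_type and describe are made up for this illustration and are not part of LVM; the type descriptions are the ones quoted in this bug.

```python
import re

# Descriptions as quoted in this bug report. Type 2 is left out because
# it is not described anywhere in the thread.
LOCKING_TYPES = {
    1: "file-based locking",
    3: "built-in clustered locking",
    4: "read-only locking (metadata changes forbidden)",
}


def read_locking_type(conf_text):
    """Return the last uncommented locking_type value, or None if absent.

    Hypothetical helper: a line scan, not a real lvm.conf parser.
    """
    value = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # ignore comments
        match = re.match(r"\s*locking_type\s*=\s*(\d+)", line)
        if match:
            value = int(match.group(1))
    return value


def describe(value):
    """Turn a locking_type number into a short human-readable label."""
    if value is None:
        return "locking_type not set in this text"
    return LOCKING_TYPES.get(value, "locking type %d" % value)


# Sample text modelled on the lvm.conf excerpt in this bug.
SAMPLE = """
global {
    # Type 4 uses read-only locking which forbids any operations that might
    # change metadata.
    locking_type = 4
    wait_for_locks = 1
}
"""

value = read_locking_type(SAMPLE)
print("locking_type = %s (%s)" % (value, describe(value)))
```

To check a real system, pass the contents of /etc/lvm/lvm.conf to read_locking_type() instead of SAMPLE.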

Comment 2 Richard W.M. Jones 2012-07-25 10:34:54 UTC
Problem persists after a reboot.

Comment 3 Peter Rajnoha 2012-07-25 10:37:32 UTC
...and do you have "locking_type=1" set in /etc/lvm/lvm.conf?

Comment 4 Richard W.M. Jones 2012-07-25 10:39:21 UTC
(In reply to comment #1)
> Have you changed your lvm configuration? Please, check /etc/lvm/lvm.conf and
> the global/locking_type setting. By default and after installation it's set
> to 1 (the "file-based locking"), one can use 4 (the "read-only locking") in
> specific situations, but it's definitely not the default one used as this
> locking does not allow metadata change...

Not knowingly, but there is an rpmnew file.  Below are
the differences.  Perhaps some rpm %post or the /run move
tries to edit the file by hand?

--- /etc/lvm/lvm.conf	2012-07-08 18:38:00.131335147 +0100
+++ /etc/lvm/lvm.conf.rpmnew	2012-07-04 10:22:02.000000000 +0100
@@ -366,7 +366,7 @@
     # Type 3 uses built-in clustered locking.
     # Type 4 uses read-only locking which forbids any operations that might 
     # change metadata.
-    locking_type = 4
+    locking_type = 1
 
     # Set to 0 to fail when a lock request cannot be satisfied immediately.
     wait_for_locks = 1
@@ -386,7 +386,7 @@
 
     # Local non-LV directory that holds file-based locks while commands are
     # in progress.  A directory like /tmp that may get wiped on reboot is OK.
-    locking_dir = "/var/lock/lvm"
+    locking_dir = "/run/lock/lvm"
 
     # Whenever there are competing read-only and read-write access requests for
     # a volume group's metadata, instead of always granting the read-only
@@ -477,7 +477,7 @@
     # The thin tools are available as part of the device-mapper-persistent-data
     # package from https://github.com/jthornber/thin-provisioning-tools.
     #
-    thin_check_executable = "/sbin/thin_check"
+    thin_check_executable = "/usr/sbin/thin_check"
 
     # String with options passed with thin_check command. By default,
     # option '-q' is for quiet output.
@@ -550,6 +550,15 @@
     #
     # volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ]
 
+    # If auto_activation_volume_list is defined, each LV that is to be
+    # activated is checked against the list while using the autoactivation
+    # option (--activate ay/-a ay), and if it matches, it is activated.
+    #   "vgname" and "vgname/lvname" are matched exactly.
+    #   "@tag" matches any tag set in the LV or VG.
+    #   "@*" matches if any tag defined on the host is also set in the LV or VG
+    #
+    # auto_activation_volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ]
+
     # If read_only_volume_list is defined, each LV that is to be activated 
     # is checked against the list, and if it matches, it as activated
     # in read-only mode.  (This overrides '--permission rw' stored in the
@@ -778,5 +787,5 @@
 
     # Full path of the dmeventd binary.
     #
-    # executable = "/sbin/dmeventd"
+    # executable = "/usr/sbin/dmeventd"
 }
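
The same comparison can be narrowed to active settings only, which hides the comment-only hunks. The following is a rough Python sketch and not an LVM tool: active_settings and changed_settings are hypothetical helpers, and section nesting is ignored, so same-named keys in different sections would overwrite each other. The sample values are copied from the diff above.

```python
import re

# Matches uncommented "name = value" lines.
SETTING = re.compile(r"^\s*(\w+)\s*=\s*(.*\S)\s*$")


def active_settings(conf_text):
    """Map setting name -> value for uncommented 'name = value' lines.

    Hypothetical helper: sections are ignored, so this is only good
    enough for a quick look, not a real parser.
    """
    settings = {}
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines, including commented-out settings
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def changed_settings(current_text, packaged_text):
    """Return {name: (current, packaged)} for settings whose values differ."""
    current = active_settings(current_text)
    packaged = active_settings(packaged_text)
    return {
        name: (current.get(name), packaged.get(name))
        for name in sorted(set(current) | set(packaged))
        if current.get(name) != packaged.get(name)
    }


# Values taken from the lvm.conf / lvm.conf.rpmnew diff in this comment.
CURRENT = '''
    locking_type = 4
    wait_for_locks = 1
    locking_dir = "/var/lock/lvm"
'''
PACKAGED = '''
    locking_type = 1
    wait_for_locks = 1
    locking_dir = "/run/lock/lvm"
'''

for name, (cur, new) in changed_settings(CURRENT, PACKAGED).items():
    print("%s: %s -> %s" % (name, cur, new))
```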

Comment 5 Richard W.M. Jones 2012-07-25 10:40:57 UTC
Setting locking_type = 1 does fix the problem.

Changing the bug summary.

Comment 6 Peter Rajnoha 2012-07-25 10:44:58 UTC
(In reply to comment #4)
> -    locking_type = 4
> +    locking_type = 1
>  

Well, I've looked at the rpms directly (http://koji.fedoraproject.org/koji/buildinfo?buildID=336465) and it's "locking_type = 1" there. So the conf must have been edited by someone (it would be a huge mistake to put that in an rpm otherwise!).

Comment 7 Peter Rajnoha 2012-07-25 10:46:12 UTC
I'll double check though...

Comment 8 Richard W.M. Jones 2012-07-25 10:49:37 UTC
The timestamps on the files are interesting:

--- /etc/lvm/lvm.conf	2012-07-08 18:38:00.131335147 +0100

^ This date corresponds to a large yum install from Rawhide.
Specifically I installed oprofile and the kernel debuginfo,
but it also updated many other packages including systemd,
libvirt, libmount, util-linux, pulseaudio, xorg, selinux-policy
and more.

+++ /etc/lvm/lvm.conf.rpmnew	2012-07-04 10:22:02.000000000 +0100

^ This date presumably comes from the RPM file itself, so doesn't
really mean anything in the context of this error.

Comment 9 Peter Rajnoha 2012-07-25 11:01:25 UTC
Yes, there's been a mass rebuild in rawhide recently. This must have been changed from "1" to "4" by manually editing the lvm.conf.

Now, regarding the error message, I think it's quite adequate: it says exactly what's wrong, namely that the wrong locking type is in use. Read-only locking is supposed to behave this way for any command trying to modify metadata (and thus trying to take the read-write lock).

!!!BUT!!! This might be a consequence of the recent bugs in dracut, which installed everything to the root fs instead of into the initramfs image. Yeah, that's certainly the case here as well, I bet!

Comment 11 Milan Broz 2012-07-25 11:03:23 UTC
There was a bug in dracut that installed everything into the root filesystem instead of into the ramdisk.
I guess this is just a side effect of it. I think you even replied to that thread :)

Comment 12 Peter Rajnoha 2012-07-25 11:06:48 UTC
Yes. That's exactly the case. Dracut uses locking_type=4 for LVM :) So I'm closing this bug then.

Comment 13 Richard W.M. Jones 2012-07-25 11:18:00 UTC
Yes I agree this was the dracut problem.

However the error message is obscure.  I'm an expert in most
things Linux, but I could not work out even where to begin
fixing this bug, even with extensive searching on Google
*and* examining the upstream source code.  How would anyone
else be expected to diagnose it?

All you need to do to improve LVM is to edit the error message
to point out that the 'global/locking_type' value in lvm.conf may
need to be adjusted.

Comment 14 Peter Rajnoha 2012-07-25 11:37:55 UTC
(In reply to comment #13)
> Yes I agree this was the dracut problem.
> 
> However the error message is obscure.  I'm an expert in most
> things Linux, but I could not work out even where to begin
> fixing this bug, even with extensive searching on Google
> *and* examining the upstream source code.  How would anyone
> else be expected to diagnose it?
> 

Looking at lvm.conf?

> All you need to do to improve LVM is to edit the error message
> to point out that the 'global/locking_type' value in lvm.conf may
> need to be adjusted.

Yes, but then we'd need to add such messages for all the other settings. The lvm.conf is the first place to look. If that conf file had stayed untouched, everything would have worked well. It's just bad luck that dracut changed it. If a user changes lvm.conf by hand, they can see all the accompanying comments directly in lvm.conf, which in the case of this setting say:

  # Type 4 uses read-only locking which forbids any operations that might 
  # change metadata.

Anyway, how about rewording the error message to:

  "Read-only locking configured/set. Write locks are prohibited."

The word "set" or "configured" would signal that this is a matter of configuration (and so point the user towards lvm.conf).

(I'd like to avoid using the "lvm.conf" keyword, as the configuration could be overridden by the "--config" argument, which can be given to each lvm command separately.)

Comment 15 Peter Rajnoha 2012-07-25 11:44:49 UTC
The verbose log says a little bit more:

[0] rawhide/~ # lvcreate -l1 -vvv vg
      Setting activation/monitoring to 1
        Processing: lvcreate -l1 -vvv vg
        O_DIRECT will be used
      Setting global/locking_type to 4
      Setting global/wait_for_locks to 1
    Read-only locking selected. Only read operations permitted.
    ...
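
If the verbose output is captured to a file, the effective settings can be pulled out mechanically. Below is a small Python sketch that assumes the "Setting <name> to <value>" line format shown in the log above; other lvm2 versions may format verbose output differently. The function name settings_from_verbose_log is made up for this illustration.

```python
import re

# Matches lines such as "      Setting global/locking_type to 4".
SETTING_LINE = re.compile(r"^\s*Setting (\S+) to (.+?)\s*$")


def settings_from_verbose_log(log_text):
    """Collect 'Setting <name> to <value>' pairs from -vvv output.

    Hypothetical helper; the line format is assumed from the log
    quoted in this comment.
    """
    settings = {}
    for line in log_text.splitlines():
        match = SETTING_LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


# Log excerpt copied from this comment.
LOG = """\
      Setting activation/monitoring to 1
        Processing: lvcreate -l1 -vvv vg
        O_DIRECT will be used
      Setting global/locking_type to 4
      Setting global/wait_for_locks to 1
    Read-only locking selected. Only read operations permitted.
"""

found = settings_from_verbose_log(LOG)
print("global/locking_type =", found.get("global/locking_type"))
```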

Comment 16 Richard W.M. Jones 2012-07-25 11:45:06 UTC
Better.  Can we get 'global/locking_type' into the error message too?
It will give them something to grep -r /etc with.

Comment 17 Peter Rajnoha 2012-07-25 11:50:56 UTC
How about this:

  "Read-only locking configured via global/locking_type setting. Write locks are prohibited."

Comment 18 Richard W.M. Jones 2012-07-25 11:53:46 UTC
Yes, that is much clearer.

If you look at the Google results you'll see this would
have been much easier to diagnose:

https://encrypted.google.com/search?q=global%2Flocking_type

Comment 20 Peter Rajnoha 2012-08-14 06:47:16 UTC
Well, in the end I had to change it to something close to what was proposed in comment #14:

  "Read-only locking type set. Write locks are prohibited."

I couldn't write "configured via <concrete_setting>" because there is also an automatic fallback from cluster and file locking to read-only locking. It is the last thing we try so that the command can still pass even if the other types of locking fail (not allowing metadata changes in that case, of course).
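
To illustrate why the message cannot name a specific setting, here is a toy Python model of the behaviour described above. It is not LVM's actual code (LVM is written in C): select_locking, request_write_lock and the init_ok callback are hypothetical names, and the conditions under which the real fallback happens are simplified to "the configured backend failed to initialise".

```python
READ_ONLY = 4  # locking_type value for read-only locking, per lvm.conf


def select_locking(configured_type, init_ok):
    """Return (effective_type, reason) for a configured locking type.

    init_ok is a hypothetical callable that stands in for initialising
    the configured locking backend and reports success or failure.
    """
    if configured_type == READ_ONLY:
        return READ_ONLY, "configured"
    if init_ok(configured_type):
        return configured_type, "configured"
    # Last resort: read-only locking lets read-only commands still run.
    return READ_ONLY, "fallback"


def request_write_lock(effective_type):
    """Refuse write locks under read-only locking, however it was chosen."""
    if effective_type == READ_ONLY:
        raise PermissionError(
            "Read-only locking type set. Write locks are prohibited."
        )
    return True


# Three cases: working file locking, configured read-only, failed file locking.
for configured, ok in [(1, True), (4, True), (1, False)]:
    effective, reason = select_locking(configured, lambda _type, ok=ok: ok)
    try:
        request_write_lock(effective)
        outcome = "write lock granted"
    except PermissionError as err:
        outcome = str(err)
    print("configured=%d effective=%d (%s): %s"
          % (configured, effective, reason, outcome))
```

In the third case the configured value is 1, yet write locks are still refused, so a message saying "configured via global/locking_type" would point the user at a setting that is not the cause.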