Bug 684896 - clvmd sets LVs to 'active' when starting, does not set them to inactive on stop
Status: ASSIGNED
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 23
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Assigned To: Peter Rajnoha
QA Contact: Fedora Extras Quality Assurance
: Reopened
Depends On:
Blocks:
Reported: 2011-03-14 14:22 EDT by digimer
Modified: 2015-07-15 11:16 EDT
CC: 14 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-03-22 10:08:03 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments
clvmd init.d script from lvm2-cluster-2.02.56-7.el5_5.4 (2.76 KB, text/plain)
2011-03-14 16:13 EDT, digimer

Description digimer 2011-03-14 14:22:43 EDT
Description of problem:

When you start /etc/init.d/clvmd, the LVs get set to active. When stopping clvmd, though, they remain flagged as active rather than being set to inactive (the equivalent of 'lvchange -an /path/to/dev'). This means that the underlying storage is "locked" until and unless clvmd is restarted and the LVs are manually set to inactive. This can be seen easily by creating a simple Primary/Primary DRBD resource and using it as a PV. When clvmd is stopped, you cannot stop DRBD.

Version-Release number of selected component (if applicable):

cman-2.0.115-34.el5_5.4
lvm2-cluster-2.02.56-7.el5_5.4

How reproducible:

100%

Steps to Reproduce:
1. Create a simple cluster and start cman.
2. Run 'lvmconf --enable-cluster'.
3. Create a DRBD resource (ie: /dev/drbd0).
4. Add the DRBD resource as the CLVM's PV. Create a VG and a simple LV.
5. Start clvmd. Check that the LV is 'active'.
6. Stop clvmd.
7. Try to change the DRBD resource to 'Secondary'; it will fail.
8. Restart clvmd, manually deactivate the LV via 'lvchange -an ...'
9. Stop clvmd.
10. Again try to change the DRBD resource to 'Secondary'; it will work this time.
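
For clarity, here is roughly what I run for these steps (a sketch; names match the output later in this report, sizes are illustrative, and the DRBD resource r0 is assumed to already be Primary on this node):

==============
service cman start
lvmconf --enable-cluster
pvcreate /dev/drbd0
vgcreate drbd0_vg0 /dev/drbd0             # VG created before clvmd is running
lvcreate -L 10G -n xen_shared drbd0_vg0
/etc/init.d/clvmd start                   # LV becomes 'active'
lvdisplay /dev/drbd0_vg0/xen_shared
/etc/init.d/clvmd stop                    # LV stays 'active'
drbdadm secondary r0                      # fails: device is held open
==============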
  
Actual results:

Underlying storage is locked after clvmd starts/stops.

Expected results:

Underlying storage is not locked after clvmd stops.

Additional info:

I do not believe this is limited to DRBD, but I do not have access to SAN/iSCSI to test.
Comment 1 Milan Broz 2011-03-14 14:34:49 EDT
"service clvmd stop" should deactivate all *clustered* LVs (IOW LVs in clustered VGs) before it stops the clvmd operation. Non-clustered LVs remains active. (There is fallback to local locking so it still works even without clvmd).

What exactly here is going wrong?

Please test it with the latest release:
lvm2-2.02.74-5.el5_6.1	lvm2-cluster-2.02.74-3.el5_6.1
Comment 2 digimer 2011-03-14 15:26:25 EDT
The version I've got is the most recent available for EL5.5. I can try to find the updated version outside of the repos though. I'll do so and report back.

Before I start clvmd, I see this:

==============
# lvdisplay 
  connect() failed on local socket: No such file or directory
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  --- Logical volume ---
  LV Name                /dev/drbd0_vg0/xen_shared
  VG Name                drbd0_vg0
  LV UUID                EftI2k-MrDW-k10a-6dDB-Ib2B-J2iK-A0cdrc
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                10.00 GB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
==============

Then I start clvmd and it becomes:

==============
# /etc/init.d/clvmd start
Starting clvmd:                                            [  OK  ]
Activating VGs:   2 logical volume(s) in volume group "drbd0_vg0" now active
                                                           [  OK  ]
# lvdisplay 
  --- Logical volume ---
  LV Name                /dev/drbd0_vg0/xen_shared
  VG Name                drbd0_vg0
  LV UUID                EftI2k-MrDW-k10a-6dDB-Ib2B-J2iK-A0cdrc
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                10.00 GB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     768
  Block device           253:0
==============

Then I stop 'clvmd' and check the 'lvdisplay' again, and the LV is still 'available'.

==============
# /etc/init.d/clvmd stop
Stopping clvm:                                             [  OK  ]
# lvdisplay 
  connect() failed on local socket: No such file or directory
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  --- Logical volume ---
  LV Name                /dev/drbd0_vg0/xen_shared
  VG Name                drbd0_vg0
  LV UUID                EftI2k-MrDW-k10a-6dDB-Ib2B-J2iK-A0cdrc
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                10.00 GB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     768
  Block device           253:0
==============

Now if I try to 'lvchange' the LV, it fails, obviously, as does an attempt to change the underlying DRBD resource.

==============
# lvchange -an /dev/drbd0_vg0/xen_shared 
  connect() failed on local socket: No such file or directory
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
# drbdadm secondary r0
0: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 0 secondary' terminated with exit code 11
==============

Now if I restart 'clvmd', manually 'lvchange' the LV and stop 'clvmd' again, things are as I would expect them to be (ignore the second LV, it's the same so I've skipped it for brevity's sake).

==============
# /etc/init.d/clvmd start
Starting clvmd:                                            [  OK  ]
Activating VGs:   2 logical volume(s) in volume group "drbd0_vg0" now active
                                                           [  OK  ]
# lvchange -an /dev/drbd0_vg0/xen_shared
# lvchange -an /dev/drbd0_vg0/an_vm1 
# /etc/init.d/clvmd stop
Stopping clvm:                                             [  OK  ]
[root@xenmaster013 ~]# lvdisplay 
  connect() failed on local socket: No such file or directory
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  --- Logical volume ---
  LV Name                /dev/drbd0_vg0/xen_shared
  VG Name                drbd0_vg0
  LV UUID                EftI2k-MrDW-k10a-6dDB-Ib2B-J2iK-A0cdrc
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                10.00 GB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
==============

And of course, I can now alter the underlying storage:

==============
# drbdadm secondary r0
[root@xenmaster013 ~]# cat /proc/drbd 
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:27
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:36 nr:36 dw:72 dr:12300 al:3 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
==============

Thanks for looking into this.
Comment 3 Milan Broz 2011-03-14 15:55:24 EDT
So local LVs are wrongly activated when clvmd starts; I think one line is missing in the clvmd initscript.

stop is doing this:

[ -z "$LVM_VGS" ] && LVM_VGS="$(clustered_vgs)"
if [ -n "$LVM_VGS" ]; then
  action "Deactivating clustered VG(s):" ${lvm_vgchange} -anl $LVM_VGS || return $?

but start is missing LVM_VGS="$(clustered_vgs)", so it activates everything, while stop later deactivates only the clustered ones.
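
The start branch would then mirror this, roughly (a sketch of the idea, not the exact script):

[ -z "$LVM_VGS" ] && LVM_VGS="$(clustered_vgs)"
if [ -n "$LVM_VGS" ]; then
  action "Activating clustered VG(s):" ${lvm_vgchange} -ayl $LVM_VGS || return $?
fi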
Comment 4 digimer 2011-03-14 16:05:58 EDT
I got the 2.02.74-5.el5_6.1 RPMs, but to install them I would need to also
update device-mapper, which I can't do without a --force. I'm not too
comfortable going to such lengths. Was the information above enough to give a
clearer picture of the problem? If so, is there no fix available for v5.5
proper?

Thanks.
Comment 5 digimer 2011-03-14 16:06:53 EDT
In my case, the only two LVs are both clustered. The base system does not use LVM at all. I will try those changes and report back shortly.
Comment 6 digimer 2011-03-14 16:13:52 EDT
Created attachment 484297 [details]
clvmd init.d script from lvm2-cluster-2.02.56-7.el5_5.4

I may be lacking coffee, but my /etc/init.d/clvmd doesn't seem to match that. I'm attaching it if you wouldn't mind taking a peek.
Comment 7 digimer 2011-03-14 16:45:47 EDT
Ok, I think this is not a bug. I looked closer at what the init.d script does, and realized that nothing was coming out of:

/usr/sbin/vgdisplay 2> /dev/null | awk 'BEGIN {RS="VG Name"} {if (/Clustered/) print ;}'

I deleted the LVs and the VG, then re-created the VG explicitly using the 'vgcreate -c y ...' switch. This seems to have fixed it. So now the question is:

I created the VG and LVs *after* the cluster was created, but without the 'clvmd' daemon running. This seems to mean that the man page is somewhat inaccurate:

       -c, --clustered {y|n}
              If clustered locking is enabled, this defaults to  y  indicating
              that  this  Volume Group is shared with other nodes in the clus‐
              ter.

              If the new Volume Group contains only local disks that  are  not
              visible  on the other nodes, you must specify --clustered n.  If
              the cluster infrastructure is unavailable on a  particular  node
              at  a  particular time, you may still be able to use such Volume
              Groups.

I would interpret "If clustered locking is enabled" to mean that 'lvmconf --enable-cluster' had been run. Perhaps this should be changed to something like "If clustered locking is enabled and the clvmd daemon is running, this defaults to 'y'". Perhaps just mentioning 'clvmd' would suffice, though, if it depends on 'lvmconf --enable-cluster' having been run.
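
For the record, re-creating the VG as clustered looked roughly like this on my setup (sizes illustrative):

==============
vgcreate -c y drbd0_vg0 /dev/drbd0        # clvmd and cman running this time
lvcreate -L 10G -n xen_shared drbd0_vg0
vgdisplay drbd0_vg0 | grep -i clustered   # confirm the clustered flag is set
==============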
Comment 8 Milan Broz 2011-03-22 09:42:58 EDT
"If clustered locking is enabled" means if locking_type in lvm.conf is set to cluster locking. lvmconf --enable-cluster is just wrapper to edit this paramater.

If you have the fallback to local locking set and run vgcreate, it prints an error that cluster locking is not available and uses local locking (and creates a local VG). So clustered locking is in fact not enabled in this situation; I think the description is sufficient here.

(Currently clvmd is the only provider of cluster locking, but in fact it could be a 3rd-party program, so mentioning clvmd explicitly is not a good idea.)
Comment 9 Milan Broz 2011-03-22 10:07:45 EDT
Although I think that the clvmd script should not activate local volumes (given that it doesn't deactivate them when stopping the service), this change is quite problematic; there can be other services depending on this behaviour.

You can work around that by defining the LVM_VGS environment variable; the initscript will then activate only those VGs.
(Note I am describing the clvmd initscript in the 5.6 release; it has changed slightly, please update.)
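
A rough example of the workaround (a sketch; how LVM_VGS reaches the initscript depends on the version, so either run the script directly so the exported variable is visible to it, or check whether your script reads it from a sysconfig file):

export LVM_VGS="drbd0_vg0"
/etc/init.d/clvmd start
/etc/init.d/clvmd stop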

Closing this WONTFIX for now; if you need this, please open a bug against Fedora (or request it through RHEL support as a new feature), thanks.
Comment 10 digimer 2011-03-22 14:02:53 EDT
Milan,

  Thanks kindly for the feedback. I'll adapt to ensure manual enabling and disabling of VGs. Might this be something that could be discussed in the [c]lvm man page, to provide clarity?
Comment 11 Milan Broz 2011-03-22 14:22:56 EDT
OK, let's move this bug against Fedora. I think we should change the clvmd initscript to not activate everything, but such a change is not possible in RHEL5 (at least not now).

If you have some idea of what should be changed in the manual page, please add a comment here, thanks.
Comment 12 digimer 2011-03-22 14:36:23 EDT
As is, I'd probably add a note to the DESCRIPTION; something like:

clvmd is the daemon that distributes LVM metadata updates around a cluster. It must be running on all nodes in the cluster and will give an error if a node in the cluster does not have this daemon running. Note that clvmd is not responsible for making volumes active on start or deactivating volumes on stop.

As a second point, might it be worth adding an option (in future versions) that would tell clvmd to *try* to de/activate LVs? Perhaps something like:

-m

     Have clvmd manage clustered logical volumes by attempting to activate LVs on start and deactivate LVs on stop.

If this is adopted, a decision would need to be made as to whether a failure to de/activate LVs should cause the init script to fail. If not, a clause should be added to the above alerting the user that they are responsible for ensuring successful de/activation. Personally, I'd argue that the use of a switch like this should cause the init script to fail with an appropriate error message when de/activation does not succeed.
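
To make the idea concrete, here is a purely hypothetical fragment of what the stop path might look like with such a switch (the CLVMD_MANAGE_LVS variable is made up for illustration; the action/vgchange pieces mirror the existing script):

==============
if [ "$CLVMD_MANAGE_LVS" = "yes" ]; then
    action "Deactivating clustered VG(s):" ${lvm_vgchange} -anl $LVM_VGS \
        || { echo "clvmd: failed to deactivate clustered LVs" >&2; return 1; }
fi
==============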

  Thanks.
Comment 13 Fedora End Of Life 2013-04-03 15:31:56 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during the Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
Comment 14 Jan Kurik 2015-07-15 11:16:56 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could also affect pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during the Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23
