Bug 684896 - clvmd sets LVs to 'active' when starting, does not set them to inactive on stop
Summary: clvmd sets LVs to 'active' when starting, does not set them to inactive on stop
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 23
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Peter Rajnoha
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-03-14 18:22 UTC by Madison Kelly
Modified: 2016-12-20 12:06 UTC
CC List: 14 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-12-20 12:06:25 UTC
Type: ---
Embargoed:


Attachments
clvmd init.d script from lvm2-cluster-2.02.56-7.el5_5.4 (2.76 KB, text/plain)
2011-03-14 20:13 UTC, Madison Kelly

Description Madison Kelly 2011-03-14 18:22:43 UTC
Description of problem:

When you start /etc/init.d/clvmd, the LVs get set to active. When stopping clvmd, though, they remain flagged as active rather than being set to inactive (the equivalent of 'lvchange -an /path/to/dev'). This means that the underlying storage is "locked" unless clvmd is restarted and the LVs are manually set to inactive. This can be seen easily by creating a simple Primary/Primary DRBD resource and using it as a PV: when clvmd is stopped, you cannot stop DRBD.
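A quick way to confirm that it is an active LV holding the DRBD device open (a rough sketch, run as root):

==============
# Show LV activation state; the 5th lv_attr character is 'a' when active.
lvs -o vg_name,lv_name,lv_attr

# Show the device-mapper dependency tree; an active LV appears as a DM
# device stacked on top of /dev/drbd0, which is what keeps it open.
dmsetup ls --tree
==============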

Version-Release number of selected component (if applicable):

cman-2.0.115-34.el5_5.4
lvm2-cluster-2.02.56-7.el5_5.4

How reproducible:

100%

Steps to Reproduce:
1. Create a simple cluster and start cman.
2. Run 'lvmconf --enable-cluster'.
3. Create a DRBD resource (e.g. /dev/drbd0).
4. Add the DRBD resource as a clustered LVM PV, then create a VG and a simple LV (a command-level sketch follows this list).
5. Start clvmd. Check that the LV is 'active'.
6. Stop clvmd.
7. Try to change the DRBD resource to 'Secondary'; it will fail.
8. Restart clvmd, manually deactivate the LV via 'lvchange -an ...'
9. Stop clvmd.
10. Again try to change the DRBD resource to 'Secondary'; it will work this time.
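A command-level sketch of the steps above (run as root; resource, VG and LV names are only illustrative, and the DRBD resource is assumed to already be configured and Primary on both nodes):

==============
service cman start
lvmconf --enable-cluster                  # switch lvm.conf to cluster locking
pvcreate /dev/drbd0
vgcreate drbd0_vg0 /dev/drbd0             # VG/LV created before clvmd is running
lvcreate -L 10G -n xen_shared drbd0_vg0
service clvmd start                       # the LV becomes active
service clvmd stop                        # the LV is left active
drbdadm secondary r0                      # fails: device is held open
service clvmd start
lvchange -an /dev/drbd0_vg0/xen_shared    # deactivate manually
service clvmd stop
drbdadm secondary r0                      # now succeeds
==============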
  
Actual results:

Underlying storage remains locked after clvmd has been started and then stopped.

Expected results:

Underlying storage is not locked once clvmd has been stopped.

Additional info:

I do not believe this is limited to DRBD, but I do not have access to SAN/iSCSI to test.

Comment 1 Milan Broz 2011-03-14 18:34:49 UTC
"service clvmd stop" should deactivate all *clustered* LVs (IOW LVs in clustered VGs) before it stops the clvmd operation. Non-clustered LVs remains active. (There is fallback to local locking so it still works even without clvmd).

What exactly here is going wrong?

Please test it with last release:
lvm2-2.02.74-5.el5_6.1	lvm2-cluster-2.02.74-3.el5_6.1

Comment 2 Madison Kelly 2011-03-14 19:26:25 UTC
The version I've got is the most recent available for EL5.5. I can try to find the updated version outside of the repos though. I'll do so and report back.

Before I start clvmd, I see this:

==============
# lvdisplay 
  connect() failed on local socket: No such file or directory
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  --- Logical volume ---
  LV Name                /dev/drbd0_vg0/xen_shared
  VG Name                drbd0_vg0
  LV UUID                EftI2k-MrDW-k10a-6dDB-Ib2B-J2iK-A0cdrc
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                10.00 GB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
==============

Then I start clvmd and it becomes:

==============
# /etc/init.d/clvmd start
Starting clvmd:                                            [  OK  ]
Activating VGs:   2 logical volume(s) in volume group "drbd0_vg0" now active
                                                           [  OK  ]
# lvdisplay 
  --- Logical volume ---
  LV Name                /dev/drbd0_vg0/xen_shared
  VG Name                drbd0_vg0
  LV UUID                EftI2k-MrDW-k10a-6dDB-Ib2B-J2iK-A0cdrc
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                10.00 GB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     768
  Block device           253:0
==============

Then I stop 'clvmd' and check the 'lvdisplay' again, and the LV is still 'available'.

==============
# /etc/init.d/clvmd stop
Stopping clvm:                                             [  OK  ]
# lvdisplay 
  connect() failed on local socket: No such file or directory
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  --- Logical volume ---
  LV Name                /dev/drbd0_vg0/xen_shared
  VG Name                drbd0_vg0
  LV UUID                EftI2k-MrDW-k10a-6dDB-Ib2B-J2iK-A0cdrc
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                10.00 GB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     768
  Block device           253:0
==============

Now if I try to 'lvchange' the LV it fails, obviously. As does an attempt to change the underlying DRBD resource.

==============
# lvchange -an /dev/drbd0_vg0/xen_shared 
  connect() failed on local socket: No such file or directory
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
# drbdadm secondary r0
0: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 0 secondary' terminated with exit code 11
==============

Now if I restart 'clvmd', manually 'lvchange' the LV and stop 'clvmd' again, things are as I would expect them to be (ignore the second LV, it's the same so I've skipped it for brevity's sake).

==============
# /etc/init.d/clvmd start
Starting clvmd:                                            [  OK  ]
Activating VGs:   2 logical volume(s) in volume group "drbd0_vg0" now active
                                                           [  OK  ]
# lvchange -an /dev/drbd0_vg0/xen_shared
# lvchange -an /dev/drbd0_vg0/an_vm1 
# /etc/init.d/clvmd stop
Stopping clvm:                                             [  OK  ]
[root@xenmaster013 ~]# lvdisplay 
  connect() failed on local socket: No such file or directory
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  --- Logical volume ---
  LV Name                /dev/drbd0_vg0/xen_shared
  VG Name                drbd0_vg0
  LV UUID                EftI2k-MrDW-k10a-6dDB-Ib2B-J2iK-A0cdrc
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                10.00 GB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
==============

And of course, I can now alter the underlying storage:

==============
# drbdadm secondary r0
[root@xenmaster013 ~]# cat /proc/drbd 
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild.org, 2010-06-04 08:04:27
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:36 nr:36 dw:72 dr:12300 al:3 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
==============

Thanks for looking into this.

Comment 3 Milan Broz 2011-03-14 19:55:24 UTC
So local LVs are wrongly activated when clvmd starts; I think one line is missing in the clvmd initscript.

stop is doing this:

[ -z "$LVM_VGS" ] && LVM_VGS="$(clustered_vgs)"
if [ -n "$LVM_VGS" ]; then
  action "Deactivating clustered VG(s):" ${lvm_vgchange} -anl $LVM_VGS || return $?

but start is missing LVM_VGS="$(clustered_vgs)", so it activates everything, while stop later deactivates only the clustered ones.
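In other words, the start path would presumably need the same guard; a sketch of what that could look like, reusing the variable names from the stop() excerpt above (not the actual patch):

[ -z "$LVM_VGS" ] && LVM_VGS="$(clustered_vgs)"
if [ -n "$LVM_VGS" ]; then
  action "Activating clustered VG(s):" ${lvm_vgchange} -ayl $LVM_VGS || return $?
fi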

Comment 4 Madison Kelly 2011-03-14 20:05:58 UTC
I got the 2.02.74-5.el5_6.1 RPMs, but to install them I would need to also
update device-mapper, which I can't do without a --force. I'm not too
comfortable going to such lengths. Was the information above enough to give a
clearer picture of the problem? If so, is there no fix available for v5.5
proper?

Thanks.

Comment 5 Madison Kelly 2011-03-14 20:06:53 UTC
In my case, the only two LVs are both clustered. The base system does not use LVM at all. I will try those changes and report back shortly.

Comment 6 Madison Kelly 2011-03-14 20:13:52 UTC
Created attachment 484297 [details]
clvmd init.d script from lvm2-cluster-2.02.56-7.el5_5.4

I may be lacking coffee, but my /etc/init.d/clvmd does not seem to match the snippet above. I'm attaching it, if you wouldn't mind taking a peek.

Comment 7 Madison Kelly 2011-03-14 20:45:47 UTC
Ok, I think this is not a bug. I looked closer at the init.d script and realized that nothing was coming out of:

/usr/sbin/vgdisplay 2> /dev/null | awk 'BEGIN {RS="VG Name"} {if (/Clustered/) print;}'

I deleted the LVs and the VG, then re-created the VG explicitly using the 'vgcreate -c y ...' switch. This seems to have fixed it. So now the question is:

I created the VG and LVs *after* the cluster was created, but without the 'clvmd' daemon running. This seems to mean that the man page is somewhat inaccurate:

       -c, --clustered {y|n}
              If clustered locking is enabled, this defaults to  y  indicating
              that  this  Volume Group is shared with other nodes in the clus‐
              ter.

              If the new Volume Group contains only local disks that  are  not
              visible  on the other nodes, you must specify --clustered n.  If
              the cluster infrastructure is unavailable on a  particular  node
              at  a  particular time, you may still be able to use such Volume
              Groups.

I would interpret "If clustered locking is enabled" to mean that 'lvmconf --enable-cluster' had been run. Perhaps this should be changed to something like "If clustered locking is enabled and the clvmd daemon is running, this defaults to 'y'". Perhaps, though, just mentioning 'clvmd' would suffice, if it depends on 'lvmconf --enable-cluster' having been run.

Comment 8 Milan Broz 2011-03-22 13:42:58 UTC
"If clustered locking is enabled" means if locking_type in lvm.conf is set to cluster locking. lvmconf --enable-cluster is just wrapper to edit this paramater.

If you have the fallback to local locking set and run vgcreate, it prints an error that cluster locking is not available and uses local locking (creating a local VG). So clustered locking is in fact not enabled in this situation, and I think the description is sufficient here.

(Currently clvmd is the only provider of cluster locking, but in principle it could be a third-party program, so mentioning clvmd explicitly is not a good idea.)
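For completeness, the setting in question can be inspected directly (locking_type = 3 is the built-in clustered locking used with clvmd, 1 is local file-based locking):

==============
# Show the current locking type in lvm.conf; 'lvmconf --enable-cluster'
# simply rewrites this value (3 = clustered locking, 1 = local file-based).
grep locking_type /etc/lvm/lvm.conf
==============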

Comment 9 Milan Broz 2011-03-22 14:07:45 UTC
Although I think the clvmd script should not activate local volumes (given that it does not deactivate them when stopping the service), this change is quite problematic: there can be other services depending on the current behaviour.

You can work around that by defining the LVM_VGS environment variable; the initscript will then activate only those VGs.
(Note that I am describing the clvmd initscript in the 5.6 release; it is slightly changed, so please update.)
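A minimal sketch of that workaround, assuming the initscript sources /etc/sysconfig/clvmd (check which file your release's script actually reads):

==============
# /etc/sysconfig/clvmd -- path assumed, verify against your initscript.
# Restrict the initscript to the clustered VG(s); it will then activate
# and deactivate only these on start/stop.
LVM_VGS="drbd0_vg0"
==============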

Closing this WONTFIX for now. If you need this, please open a bug against Fedora (or request it through RHEL support as a new feature), thanks.

Comment 10 Madison Kelly 2011-03-22 18:02:53 UTC
Milan,

  Thanks kindly for the feedback. I'll adapt to ensure manual enabling and disabling of VGs. Might this be something that could be discussed in the [c]lvm man page, to provide clarity?

Comment 11 Milan Broz 2011-03-22 18:22:56 UTC
OK, let's move this bug against Fedora. I think we should change the clvmd initscript to not activate everything, but such a change is not possible in RHEL5 (at least not now).

If you have some idea what should be changed in manual page, please add a comment here, thanks.

Comment 12 Madison Kelly 2011-03-22 18:36:23 UTC
As is, I'd probably add a comment to the DESCRIPTION, something like:

clvmd is the daemon that distributes LVM metadata updates around a cluster. It must be running on all nodes in the cluster and will give an error if a node in the cluster does not have this daemon running. Note that clvmd is not responsible for making volumes active on start or deactivating volumes on stop.

As a second point, might it be worth adding an option (in future versions), that would tell clvmd to *try* to de/activate LVs? Perhaps something like:

-m

     Have clvmd manage clustered logical volumes by attempting to activate LVs on start and deactivate LVs on stop.

If this is adopted, a decision would need to be made as to whether a failure to de/activate LVs should cause the init script to fail. If not, a clause should be added to the above, alerting the user that they are responsible for ensuring successful de/activation. Personally, I'd argue that the use of a switch like this should cause a failure in the init script with an appropriate error message.
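As a purely hypothetical sketch of that last point (nothing here exists today; the option and variable names are only my proposal), the stop path could fail loudly when deactivation fails:

# Hypothetical only -- illustrates failing the init script when
# deactivation does not succeed.
if [ "$CLVMD_MANAGE_LVS" = "yes" ]; then
  action "Deactivating clustered VG(s):" ${lvm_vgchange} -anl $LVM_VGS || {
    echo "Failed to deactivate clustered LV(s); refusing to stop clvmd." >&2
    return 1
  }
fi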

  Thanks.

Comment 13 Fedora End Of Life 2013-04-03 19:31:56 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 14 Jan Kurik 2015-07-15 15:16:56 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could affect also pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

Comment 15 Fedora End Of Life 2016-11-24 10:30:00 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to this bug being closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 16 Fedora End Of Life 2016-12-20 12:06:25 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

