Description of problem:
When you start /etc/init.d/clvmd, the LVs get set to active. When clvmd is stopped, though, they remain flagged as active rather than being set to inactive (the equivalent of 'lvchange -an /path/to/dev'). This means that the underlying storage stays "locked" until clvmd is restarted and the LVs are manually set to inactive. This can be seen easily by creating a simple Primary/Primary DRBD resource and using it as a PV. When clvmd is stopped, you cannot stop DRBD.

Version-Release number of selected component (if applicable):
cman-2.0.115-34.el5_5.4
lvm2-cluster-2.02.56-7.el5_5.4

How reproducible:
100%

Steps to Reproduce:
1. Create a simple cluster and start cman.
2. Run 'lvmconf --enable-cluster'.
3. Create a DRBD resource (ie: /dev/drbd0).
4. Add the DRBD resource as the CLVM PV. Create a VG and a simple LV.
5. Start clvmd. Check that the LV is 'active'.
6. Stop clvmd.
7. Try to change the DRBD resource to 'Secondary'; it will fail.
8. Restart clvmd, manually deactivate the LV via 'lvchange -an ...'
9. Stop clvmd.
10. Again try to change the DRBD resource to 'Secondary'; it will work this time.

Actual results:
Underlying storage is locked after clvmd starts/stops.

Expected results:
Underlying storage is not locked.

Additional info:
I do not believe this is limited to DRBD, but I do not have access to SAN/iSCSI to test.
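For convenience, a rough command sketch of the reproduction steps above; it is only a sketch, and the DRBD resource name (r0), device (/dev/drbd0) and VG/LV names (drbd0_vg0, xen_shared) are assumptions taken from the output quoted later in this bug, so adjust them to your setup.

==============
service cman start                        # step 1: cluster already configured
lvmconf --enable-cluster                  # step 2: switch lvm.conf to cluster locking
drbdadm primary r0                        # step 3: Primary/Primary DRBD resource
pvcreate /dev/drbd0                       # step 4: DRBD device as the PV
vgcreate drbd0_vg0 /dev/drbd0
lvcreate -L 10G -n xen_shared drbd0_vg0
service clvmd start                       # step 5: LV becomes active
service clvmd stop                        # step 6: LV is still active
drbdadm secondary r0                      # step 7: fails, device is held open
==============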
"service clvmd stop" should deactivate all *clustered* LVs (IOW LVs in clustered VGs) before it stops the clvmd operation. Non-clustered LVs remains active. (There is fallback to local locking so it still works even without clvmd). What exactly here is going wrong? Please test it with last release: lvm2-2.02.74-5.el5_6.1 lvm2-cluster-2.02.74-3.el5_6.1
The version I've got is the most recent available for EL5.5. I can try to find the updated version outside of the repos though. I'll do so and report back.

Before I start clvmd, I see this:

==============
# lvdisplay
  connect() failed on local socket: No such file or directory
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  --- Logical volume ---
  LV Name                /dev/drbd0_vg0/xen_shared
  VG Name                drbd0_vg0
  LV UUID                EftI2k-MrDW-k10a-6dDB-Ib2B-J2iK-A0cdrc
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                10.00 GB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
==============

Then I start clvmd and it becomes:

==============
# /etc/init.d/clvmd start
Starting clvmd:                                            [  OK  ]
Activating VGs:   2 logical volume(s) in volume group "drbd0_vg0" now active
                                                           [  OK  ]
# lvdisplay
  --- Logical volume ---
  LV Name                /dev/drbd0_vg0/xen_shared
  VG Name                drbd0_vg0
  LV UUID                EftI2k-MrDW-k10a-6dDB-Ib2B-J2iK-A0cdrc
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                10.00 GB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     768
  Block device           253:0
==============

Then I stop 'clvmd' and check the 'lvdisplay' again, and the LV is still 'available'.

==============
# /etc/init.d/clvmd stop
Stopping clvm:                                             [  OK  ]
# lvdisplay
  connect() failed on local socket: No such file or directory
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  --- Logical volume ---
  LV Name                /dev/drbd0_vg0/xen_shared
  VG Name                drbd0_vg0
  LV UUID                EftI2k-MrDW-k10a-6dDB-Ib2B-J2iK-A0cdrc
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                10.00 GB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     768
  Block device           253:0
==============

Now if I try to 'lvchange' the LV it fails, obviously. As does an attempt to change the underlying DRBD resource.

==============
# lvchange -an /dev/drbd0_vg0/xen_shared
  connect() failed on local socket: No such file or directory
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
# drbdadm secondary r0
0: State change failed: (-12) Device is held open by someone
Command 'drbdsetup 0 secondary' terminated with exit code 11
==============

Now if I restart 'clvmd', manually 'lvchange' the LV and stop 'clvmd' again, things are as I would expect them to be (ignore the second LV, it's the same so I've skipped it for brevity's sake).

==============
# /etc/init.d/clvmd start
Starting clvmd:                                            [  OK  ]
Activating VGs:   2 logical volume(s) in volume group "drbd0_vg0" now active
                                                           [  OK  ]
# lvchange -an /dev/drbd0_vg0/xen_shared
# lvchange -an /dev/drbd0_vg0/an_vm1
# /etc/init.d/clvmd stop
Stopping clvm:                                             [  OK  ]
[root@xenmaster013 ~]# lvdisplay
  connect() failed on local socket: No such file or directory
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  --- Logical volume ---
  LV Name                /dev/drbd0_vg0/xen_shared
  VG Name                drbd0_vg0
  LV UUID                EftI2k-MrDW-k10a-6dDB-Ib2B-J2iK-A0cdrc
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                10.00 GB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
==============

And of course, I can now alter the underlying storage:

==============
# drbdadm secondary r0
[root@xenmaster013 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild.org, 2010-06-04 08:04:27
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:36 nr:36 dw:72 dr:12300 al:3 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
==============

Thanks for looking into this.
So local LVs are wrongly activated when clvmd starts; I think one line is missing in the clvmd initscript. stop does this:

==============
[ -z "$LVM_VGS" ] && LVM_VGS="$(clustered_vgs)"
if [ -n "$LVM_VGS" ]; then
        action "Deactivating clustered VG(s):" ${lvm_vgchange} -anl $LVM_VGS || return $?
==============

but start is missing LVM_VGS="$(clustered_vgs)", so it activates everything and later deactivates only the clustered ones...
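For illustration only, a sketch of how the start path could mirror the stop path so that only clustered VGs are activated; this is an assumed fix, not the shipped script, and it reuses the clustered_vgs, lvm_vgchange and action helpers quoted above:

==============
# hypothetical start-path counterpart of the stop snippet above
[ -z "$LVM_VGS" ] && LVM_VGS="$(clustered_vgs)"
if [ -n "$LVM_VGS" ]; then
        action "Activating clustered VG(s):" ${lvm_vgchange} -aly $LVM_VGS || return $?
fi
==============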
I got the 2.02.74-5.el5_6.1 RPMs, but to install them I would need to also update device-mapper, which I can't do without a --force. I'm not too comfortable going to such lengths. Was the information above enough to give a clearer picture of the problem? If so, is there no fix available for v5.5 proper? Thanks.
In my case, the only two LVs are both clustered. The base system does not use LVM at all. I will try those changes and report back shortly.
Created attachment 484297 [details]
clvmd init.d script from lvm2-cluster-2.02.56-7.el5_5.4

I may be lacking coffee, but my /etc/init.d/clvmd doesn't seem to match what you describe. I'm attaching it, if you wouldn't mind taking a peek.
OK, I think this is not a bug. I looked closer at the init.d script and realized that nothing was coming out of:

==============
/usr/sbin/vgdisplay 2> /dev/null | awk 'BEGIN {RS="VG Name"} {if (/Clustered/) print ;}'
==============

I deleted the LVs and the VG, then re-created the VG explicitly using the 'vgcreate -c y ...' switch. This seems to have fixed it.

So now the question is: I created the VG and LVs *after* the cluster was created, but without the 'clvmd' daemon running. This seems to mean that the man page is somewhat inaccurate:

  -c, --clustered {y|n}
         If clustered locking is enabled, this defaults to y indicating that
         this Volume Group is shared with other nodes in the cluster. If the
         new Volume Group contains only local disks that are not visible on
         the other nodes, you must specify --clustered n. If the cluster
         infrastructure is unavailable on a particular node at a particular
         time, you may still be able to use such Volume Groups.

I would interpret "If clustered locking is enabled" to mean that 'lvmconf --enable-cluster' had been run. Perhaps this should be changed to something like "If clustered locking is enabled and the clvmd daemon is running, this defaults to 'y'". Perhaps just mentioning 'clvmd' would suffice, though, if it depends on 'lvmconf --enable-cluster' having been run.
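For anyone who ends up in the same situation, a short sketch of fixing the flag on an existing VG instead of re-creating it; the VG name is the one from this report, and clvmd must be running under cluster locking for the change to be accepted:

==============
# Mark an existing VG as clustered (alternative to re-creating it with 'vgcreate -c y')
vgchange -c y drbd0_vg0
# Confirm the flag; this is the same "Clustered" field the initscript greps for
vgdisplay drbd0_vg0 | grep -i clustered
==============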
"If clustered locking is enabled" means if locking_type in lvm.conf is set to cluster locking. lvmconf --enable-cluster is just wrapper to edit this paramater. If you have set fallback to local locking and run vgcreate, it prints error that cluster locking is not available and uses local locking (and creates local VG). So clustered locking is not enabled in this situation in fact, I think the description is sufficient here. (Currently is clvmd the only provider of cluster locking but in fact it can be 3rd party program, so mentioning clvmd explicitly is not good idea.)
Although I think the clvmd script should not activate local volumes (given that it does not deactivate them when the service is stopped), this change is quite problematic: there can be other services depending on the current behaviour. You can work around it by defining the LVM_VGS environment variable; the initscript will then activate only those VGs. (Note I am describing the clvmd initscript in the 5.6 release; it has changed slightly, so please update.)

Closing this WONTFIX for now. If you need this, please open a bug against Fedora (or request it through RHEL support as a new feature), thanks.
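A sketch of that workaround; it assumes the initscript sources /etc/sysconfig/clvmd for its LVM_VGS setting (verify against your own copy of the script, and if it does not, export the variable by whatever mechanism the script documents). The VG name is the one from this report:

==============
# /etc/sysconfig/clvmd (assumed location; restricts the initscript to these VGs)
LVM_VGS="drbd0_vg0"
==============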
Milan,

Thanks kindly for the feedback. I'll adapt to ensure manual enabling and disabling of VGs. Might this be something that could be discussed in the [c]lvm man page, to provide clarity?
OK, let's move this bug against Fedora. I think we should change the clvmd initscript to not activate everything, but such a change is not possible in RHEL5 (at least not now). If you have some idea of what should be changed in the manual page, please add a comment here, thanks.
As is, I'd probably add a comment to the DESCRIPTION; something like:

  clvmd is the daemon that distributes LVM metadata updates around a
  cluster. It must be running on all nodes in the cluster and will give an
  error if a node in the cluster does not have this daemon running. Note
  that clvmd is not responsible for making volumes active on start or
  deactivating volumes on stop.

As a second point, might it be worth adding an option (in future versions) that would tell clvmd to *try* to de/activate LVs? Perhaps something like:

  -m     Have clvmd manage clustered logical volumes by attempting to
         activate LVs on start and deactivate LVs on stop.

If this is adopted, a decision would need to be made on whether a failure to de/activate LVs should cause the init script to fail. If not, a clause should be added to the above, alerting the user that they are responsible for ensuring successful de/activation. Personally, I'd argue that the use of a switch like this should cause a failure in the init script with an appropriate error message.

Thanks.
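Purely to illustrate that proposal: the -m switch above does not exist, and the CLVMD_MANAGE_LVS variable and this clause are hypothetical, but a fail-on-error stop path along the lines I am arguing for might look like:

==============
# hypothetical stop-path clause; not part of any shipped clvmd initscript
if [ "$CLVMD_MANAGE_LVS" = "yes" ]; then
        [ -z "$LVM_VGS" ] && LVM_VGS="$(clustered_vgs)"
        if [ -n "$LVM_VGS" ]; then
                action "Deactivating clustered VG(s):" ${lvm_vgchange} -anl $LVM_VGS || {
                        echo "ERROR: could not deactivate clustered LVs, refusing to stop clvmd"
                        return 1
                }
        fi
fi
==============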
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could also affect pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and the reason for this action are here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle. Changing version to '23'. (As we did not run this process for some time, it could also affect pre-Fedora 23 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.) More information and the reason for this action are here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23
This message is a reminder that Fedora 23 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 23. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '23'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue, and we are sorry that we were not able to fix it before Fedora 23 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.