Bug 1320520

| Summary: | LVM status checks for clustered LVM in Red Hat Cluster Suite gives false results. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Vedran Zivicnjak <zvedran> |
| Component: | resource-agents | Assignee: | Oyvind Albrigtsen <oalbrigt> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.7 | CC: | agk, cluster-maint, fdinitto, jruemker, mnovacek, prajnoha |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | resource-agents-3.9.5-37.el6 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-03-21 09:27:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Also consider variants like lvs -o active (instead of using the attr column) or lvs -o+active -S active!="" (doing the matching internally). For example, this should do it as well:

  test -n "$(lvs --noheadings -o active -S active!='' vgname/lvname)"    (device is active)

-S active!='' will select only LVs for which the "active" field is not empty, and hence the LV is active. Similarly the other way round. See also lvs -S help for more info on the various fields you can use in selection and on the selection operators. The -S|--select option is available since lvm2 v2.02.107.

The "lv_active" field displays string values like "active", "local exclusive", "remote exclusive", "remotely", "locally". But there are also separate binary fields:

- lv_active_locally
- lv_active_remotely
- lv_active_exclusively

These display "" (a blank string) or a string representation of the value, which means "yes". If you use the new --binary option, you get the result directly as 0 or 1. For example:

  $ lvs vg -o+active_locally
    LV    VG   Attr       LSize ActLocal
    lvol0 vg   -wi------- 4.00m
    lvol1 vg   -wi-a----- 4.00m active locally

  $ lvs vg -o+active_locally --binary
    LV    VG   Attr       LSize ActLocal
    lvol0 vg   -wi------- 4.00m        0
    lvol1 vg   -wi-a----- 4.00m        1

  $ lvs vg -o+active_locally -S 'active_locally=1'
    LV    VG   Attr       LSize ActLocal
    lvol1 vg   -wi-a----- 4.00m active locally

  $ lvs vg -o+active_locally -S 'active_locally=1' --binary
    LV    VG   Attr       LSize ActLocal
    lvol1 vg   -wi-a----- 4.00m        1

  $ lvs vg -o+active_locally -S 'active_locally="active locally"'
    LV    VG   Attr       LSize ActLocal
    lvol1 vg   -wi-a----- 4.00m active locally

  $ lvs vg -o+active_locally -S 'active_locally=0'
    LV    VG   Attr       LSize ActLocal
    lvol0 vg   -wi------- 4.00m

So it is worth a try, as it can simplify the script a bit and make it more compatible with future versions of lvm2 (e.g. the number of lv_attr bits may change). Also, if needed, any combination is possible, for example:

  $ lvs -S 'active_locally=1 && vg_name=vg && (name=lvol0 || name=lvol1)'
  $ lvs -S 'active_locally=1 || active_remotely=1'

So any field listed in lvs -S help can be used. Maybe give it a few tries first; you will then have a more complete picture and can choose what best suits your script.
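As a minimal sketch of how this selection approach could replace the attr regex inside the agent (an illustration only, assuming lvm2 >= 2.02.107 with -S support and the OCF_RESKEY_vg_name/OCF_RESKEY_lv_name variables that lvm.sh already exports; the shipped fix below keeps the attr-based check, and the helper name is invented):

  # Hypothetical helper: succeeds only if the requested LV is active locally.
  lv_is_active_locally() {
      # -S 'active_locally=1' prints a row only when the LV is active on this node,
      # so a non-empty result means "active".
      test -n "$(lvs --noheadings -o active_locally -S 'active_locally=1' \
          "$OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name" 2>/dev/null)"
  }

The agent would then map the exit status to $OCF_SUCCESS or $OCF_NOT_RUNNING as it does today.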
I have verified that lvm.sh is able to correctly report status for multiple vgs
with resource-agents-3.9.5-43.
----
common setup:
* cluster running on all nodes
* clvmd running on all nodes
* four clustered VGs created, but only two active (commands producing this layout are sketched after the lvs output below)
[root@virt-009 ~]# vgs
  VG         #PV #LV #SN Attr   VSize   VFree
  vg5          1   1   0 wz--nc 500,00m    0
  vg6          1   1   0 wz--nc 500,00m    0
  vg7          1   1   0 wz--nc 500,00m    0
  vg8          1   1   0 wz--nc 500,00m    0
  vg_virt009   1   2   0 wz--n-   7,83g    0
[root@virt-009 ~]# lvs
  LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv5     vg5        -wi------- 500,00m
  lv6     vg6        -wi------- 500,00m
  lv7     vg7        -wi-a----- 500,00m
  lv8     vg8        -wi-a----- 500,00m
  lv_root vg_virt009 -wi-ao----   7,00g
  lv_swap vg_virt009 -wi-ao---- 852,00m
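For reference, a minimal sketch of how a clustered test VG/LV like the ones above can be created and left inactive (the device path /dev/sdX is a placeholder, and clvmd must already be running on all nodes):

  vgcreate -cy vg5 /dev/sdX        # -cy marks the VG as clustered (the "c" bit in wz--nc)
  lvcreate -n lv5 -l 100%FREE vg5  # create the LV inside the clustered VG
  lvchange -an vg5/lv5             # deactivate it so the negative status test applies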
before the fix (resource-agents-3.9.5-24.el6)
=============================================
[root@virt-009 ~]# export OCF_RESKEY_lv_name=lv5;
[root@virt-009 ~]# export OCF_RESKEY_vg_name=vg5
[root@virt-009 ~]# bash -x /usr/share/cluster/lvm.sh status
...
[lvm.sh] Getting status
+ '[' -z lv5 ']'
+ lv_status
++ vgs -o attr --noheadings vg5
+ [[ wz--nc =~ .....c ]]
+ lv_status_clustered
++ lvs -o attr --noheadings
+ [[ ! -wi-------
-wi-------
-wi-a-----
-wi-a-----
-wi-ao----
-wi-ao---- =~ ....a. ]]
>> + return 0
>> + exit 0
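Why this returns 0: with $lv_path unset, lvs prints the attr column for every LV on the node, and the unanchored pattern ....a. matches as soon as any one of those strings carries the "a" (active) flag, even though lv5 itself is inactive. A minimal stand-alone illustration (the attr strings are copied from the trace above; the variable name is invented for the example):

  # Combined attr output as produced by "lvs -o attr --noheadings" with no LV argument
  all_attrs=$'  -wi-------\n  -wi-------\n  -wi-a-----\n  -wi-a-----\n  -wi-ao----\n  -wi-ao----'
  if [[ ! "$all_attrs" =~ ....a. ]]; then
      echo "not running"   # what the agent should report for lv5
  else
      echo "running"       # what it actually reports, because lv7/lv8/lv_root/lv_swap are active
  fi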
after the fix (resource-agents-3.9.5-40.el6)
============================================
[root@virt-009 ~]# export OCF_RESKEY_lv_name=lv5;
[root@virt-009 ~]# export OCF_RESKEY_vg_name=vg5
[root@virt-009 ~]# bash -x /usr/share/cluster/lvm.sh status
...
[lvm.sh] Getting status
+ '[' -z lv5 ']'
+ lv_status
++ vgs -o attr --noheadings vg5
+ [[ wz--nc =~ .....c ]]
+ lv_status_clustered
+ declare lv_path=vg5/lv5
++ lvs -o attr --noheadings vg5/lv5
+ [[ ! -wi------- =~ ....a. ]]
>> + return 7
>> + exit 7
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0602.html
Created attachment 1139529 [details]
Output of bash -x lvm.sh status command

Description of problem:

In Red Hat Cluster Suite on RHEL 6.7 with a clustered (clvm) HA-LVM setup, volume groups are created with vgcreate -cy <vgname> <pvpath> and locking_type 3 in /etc/lvm/lvm.conf. In such a setup, each LV that is part of a cluster resource is listed in cluster.conf so that rgmanager checks it periodically. rgmanager invokes /usr/share/cluster/lvm.sh status with environment variables set to check the status of each cluster-managed logical volume.

If the status check is called with the $OCF_RESKEY_lv_name variable set, /usr/share/cluster/lvm.sh status calls the lv_status function found in the /usr/share/cluster/lvm_by_lv.sh script:

    status|monitor)
        ocf_log notice "Getting status"

        if [ -z "$OCF_RESKEY_lv_name" ]; then
            vg_status
            exit $?
        else
            lv_status
            exit $?
        fi
        ;;

The lv_status function then checks whether the volume group named by the $OCF_RESKEY_vg_name environment variable has the "c" (clustered) flag set, and if so calls the lv_status_clustered function within the same script:

    function lv_status
    {
        # We pass in the VG name to see if the logical volume is clustered
        if [[ $(vgs -o attr --noheadings $OCF_RESKEY_vg_name) =~ .....c ]]; then
            lv_status_clustered
        else
            lv_status_single
        fi
    }

lv_status_clustered should check whether the given logical volume in the volume group is active, as defined in the function:

    lv_status_clustered()
    {
        #
        # Check if device is active
        #
        if [[ ! "$(lvs -o attr --noheadings $lv_path)" =~ ....a. ]]; then
            return $OCF_NOT_RUNNING
        fi

        return $OCF_SUCCESS
    }

The problem is that the variable $lv_path is never defined on the /usr/share/cluster/lvm.sh status code path for this environment, so the conditional effectively runs:

    if [[ ! "$(lvs -o attr --noheadings)" =~ ....a. ]]; then

which prints the attributes of ALL logical volumes on the system and checks whether ANY of them has the active flag set. This behaviour invalidates the intended per-LV status check performed by rgmanager.

Proposed fix:

    lv_status_clustered()
    {
    +   declare lv_path="$OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name"
        #
        # Check if device is active
        #
        if [[ ! "$(lvs -o attr --noheadings $lv_path)" =~ ....a. ]]; then
            return $OCF_NOT_RUNNING
        fi

        return $OCF_SUCCESS
    }

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-24.el6_7.1.x86_64

How reproducible:
Always

Steps to Reproduce:
The simplest way is to set up some number of LVs:

# lvs
  LV          VG          Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_testnim1 vg_testnim1 -wi-ao---- 96.00m
  lv_testnim2 vg_testnim2 -wi-ao---- 96.00m
  lv_testnim3 vg_testnim3 -wi------- 96.00m
  lv_testnim4 vg_testnim4 -wi------- 96.00m
  lv_testnim5 vg_testnim5 -wi------- 96.00m

# vgs
  VG          #PV #LV #SN Attr   VSize  VFree
  vg_testnim1   1   1   0 wz--nc 96.00m    0
  vg_testnim2   1   1   0 wz--nc 96.00m    0
  vg_testnim3   1   1   0 wz--nc 96.00m    0
  vg_testnim4   1   1   0 wz--nc 96.00m    0
  vg_testnim5   1   1   0 wz--nc 96.00m    0

and observe the output of the following commands:

export OCF_RESKEY_lv_name=lv_testnim4; export OCF_RESKEY_vg_name=vg_testnim4; bash -x /usr/share/cluster/lvm.sh status

Check the attachment for the given example.
Actual results:
The attributes of every logical volume are printed in the conditional statement, and status returns 0 for a logical volume that is not active.

Expected results:
The attributes of only the specified logical volume are printed in the conditional statement, and status returns non-zero for a logical volume that is not active.

Additional info:
This bug possibly puts an even greater burden on the DLM component, as cluster locks are communicated n*n times for each LVM status check instead of only n times, depending on the DLM internals.
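For completeness, the behaviour the proposed fix is meant to produce can be exercised outside the agent with a small stand-alone script (a sketch only, using the vg_testnim4/lv_testnim4 volume from the reproduction steps; 0 and 7 are the OCF_SUCCESS and OCF_NOT_RUNNING codes seen in the traces above):

  # Stand-alone version of the proposed lv_status_clustered check.
  OCF_SUCCESS=0
  OCF_NOT_RUNNING=7
  OCF_RESKEY_vg_name=vg_testnim4
  OCF_RESKEY_lv_name=lv_testnim4

  lv_path="$OCF_RESKEY_vg_name/$OCF_RESKEY_lv_name"
  if [[ ! "$(lvs -o attr --noheadings "$lv_path")" =~ ....a. ]]; then
      echo "status: $OCF_NOT_RUNNING (LV is not active)"
  else
      echo "status: $OCF_SUCCESS (LV is active)"
  fi

With the LV path passed explicitly, lvs reports only the monitored volume, so the regex reflects the state of that LV alone.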