Bug 1198681
Summary: | clvmd gets killed (does not start) when there is a broken LV in a VG | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Nenad Peric <nperic> |
Component: | resource-agents | Assignee: | Fabio Massimo Di Nitto <fdinitto> |
Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.2 | CC: | agk, cluster-maint, fdinitto, heinzm, jbrassow, mnovacek, msnitzer, prajnoha, prockai, sbradley, thornber, zkabelac |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | resource-agents-3.9.5-43.el7 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-11-19 04:46:52 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | |
Description
Nenad Peric
2015-03-04 15:49:15 UTC
It does not appear that the pacemaker resource-agent has any control over whether or not clvmd starts in the event of a broken LV. I'm reassigning this to the clvmd-related component.

(In reply to David Vossel from comment #2)
> It does not appear that the pacemaker resource-agent has any control over
> whether or not the clvmd starts in the event of a broken LV. I'm reassigning
> this to the clvmd related component.

clvmd starts every time, regardless of whether there are broken/incomplete LVs. It is the controlling script that stops clvmd directly during its processing of the "start" action if it finds that some LVs were not activated.

I mean, clvmd is started first, then "vgchange -aay" is called to activate volumes - so two steps. If the second step fails to activate some volumes, we should probably keep going and we shouldn't kill the clvmd we started.

(In reply to Peter Rajnoha from comment #4)
> I mean, clvmd is started first, then "vgchange -aay" is called to activate
> volumes - so two steps. If the second step fails to activate some volumes,
> we should probably keep going and we shouldn't kill the clvmd we started.

I see. This is a bit tricky. The clvm agent is expected to enable all cluster volume groups during the start action. The only way I could allow clvmd to start and the VGs to remain inactive is by adding a new option to the clvm resource agent that would explicitly prevent activating VGs after start - something like 'activate_vgs=false'.

People who use this option will have to either enable it only when there is an issue with activating clvmd at startup, or use it coupled with the ocf:LVM agent to manage VG activation outside of the clvm agent.

Adding this option is the right thing to do. I've seen a couple of issues caused by coupling clvmd initialization with VG activation in the ocf:clvm agent. I like the idea of being able to separate the two.
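The two-step start sequence discussed above can be sketched in plain shell. This is an illustrative sketch only, not the shipped agent code: the function names are hypothetical and stubs stand in for the real clvmd startup and `vgchange -aay` calls. It shows the fixed behaviour, where a failed activation step no longer kills the daemon that was just started.

```shell
#!/bin/sh
# Illustrative sketch of the clvm agent's start action (NOT the shipped code).
# Stubs simulate the two steps: start clvmd, then activate clustered VGs.

start_clvmd() {
    # stub: pretend the clvmd daemon started successfully
    CLVMD_RUNNING=1
    return 0
}

activate_volumes() {
    # stub: simulate "vgchange -aay" failing because one LV is broken
    return 5
}

clvm_start() {
    start_clvmd || return 1

    activate_volumes
    if [ $? -ne 0 ]; then
        # Old behaviour (the bug): kill the clvmd we just started.
        # Fixed behaviour: report the failure but leave clvmd running.
        echo "warning: some volumes failed to activate; clvmd left running"
    fi
    return 0
}

clvm_start
echo "clvmd running: ${CLVMD_RUNNING}"
```

The point of the sketch is the branch in `clvm_start`: before the fix, a non-zero status from the activation step propagated into stopping the daemon; after it, the start action succeeds as long as clvmd itself came up.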
-- David

I have verified that clvm has the new parameter "activate_vgs" and that this parameter works as expected for clvm in resource-agents-3.9.5-50.el7.x86_64.

-----

Cluster is configured this way (1).

```
[root@virt-151 ~]# rpm -q resource-agents
resource-agents-3.9.5-50.el7.x86_64

[root@virt-151 ~]# vgdisplay shared
  --- Volume group ---
  VG Name               shared
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  Clustered             yes     <-----
  Shared                no
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               4,99 GiB
  PE Size               4,00 MiB
  Total PE              1278
  Alloc PE / Size       1278 / 4,99 GiB
  Free  PE / Size       0 / 0
  VG UUID               r4wHR1-Y1mb-kIVx-tH97-kUVr-wNgs-2gitP9
```

BEFORE THE FIX (resource-agents-3.9.5-40.el7.x86_64)
----------------------------------------------------

```
[root@virt-151 ~]# pcs resource describe clvm
ocf:heartbeat:clvm - clvmd

This agent manages the clvmd daemon.

Resource options:
  with_cmirrord: Start with cmirrord (cluster mirror log daemon).
  daemon_options: Options to clvmd. Refer to clvmd.8 for detailed descriptions.

[root@virt-151 ~]# pcs resource update clvmd activate_vgs=true
Error: resource option(s): 'activate_vgs', are not recognized for resource type: 'ocf:heartbeat:clvm' (use --force to override)
```

AFTER THE FIX (resource-agents-3.9.5-50.el7.x86_64)
---------------------------------------------------

```
[root@virt-151 ~]# pcs resource describe clvm
ocf:heartbeat:clvm - clvmd

This agent manages the clvmd daemon.

Resource options:
  with_cmirrord: Start with cmirrord (cluster mirror log daemon).
  daemon_options: Options to clvmd. Refer to clvmd.8 for detailed descriptions.
  activate_vgs: Whether or not to activate all cluster volume groups after
    starting the clvmd or not. Note that clustered volume groups will always
    be deactivated before the clvmd stops regardless of what this option is
    set to.
```
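The describe output above makes an asymmetry explicit: `activate_vgs` gates only start-time activation, while the stop action always deactivates clustered VGs before clvmd goes down. A minimal sketch of that behaviour, again with hypothetical function names and stubs standing in for clvmd and vgchange (the `OCF_RESKEY_` prefix follows the usual resource-agents convention for passing parameters):

```shell
#!/bin/sh
# Illustrative sketch, NOT the shipped agent: activate_vgs gates activation
# at start, but stop always deactivates clustered VGs before killing clvmd.

ACTIVATED=0
DEACTIVATED=0
DAEMON=0

clvm_start() {
    DAEMON=1                       # stub for starting clvmd
    if [ "$OCF_RESKEY_activate_vgs" = "true" ]; then
        ACTIVATED=1                # stub for "vgchange -aay" (only if enabled)
    fi
}

clvm_stop() {
    DEACTIVATED=1                  # stub for deactivating clustered VGs
    DAEMON=0                       # stub for stopping clvmd (always both)
}

OCF_RESKEY_activate_vgs=false
clvm_start
echo "daemon=$DAEMON activated=$ACTIVATED"
clvm_stop
echo "daemon=$DAEMON deactivated=$DEACTIVATED"
```

With `activate_vgs=false` the daemon comes up without touching the VGs, yet stop still deactivates them unconditionally, matching the note in the parameter description.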
```
[root@virt-151 ~]# lvdisplay shared/shared0
  --- Logical volume ---
  LV Path                /dev/shared/shared0
  LV Name                shared0
  VG Name                shared
  LV UUID                BLLRea-63Tn-3c1T-Jbr6-Tu14-UeJj-OyzCnh
  LV Write Access        read/write
  LV Creation host, time virt-151.cluster-qe.lab.eng.brq.redhat.com, 2015-08-11 00:08:23 +0200
  LV Status              available   <----------------------------------
  # open                 0
  LV Size                4,99 GiB
  Current LE             1278
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:2

[root@virt-151 ~]# pcs resource disable clvmd
[root@virt-151 ~]# pcs resource update clvmd activate_vgs=false
[root@virt-151 ~]# pcs resource enable clvmd

[root@virt-151 ~]# pcs resource show clvmd
 Resource: clvmd (class=ocf provider=heartbeat type=clvm)
  Attributes: with_cmirrord=1 activate_vgs=false
  Operations: start interval=0s timeout=90 (clvmd-start-timeout-90)
              stop interval=0s timeout=90 (clvmd-stop-timeout-90)
              monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)

[root@virt-151 ~]# lvdisplay shared/shared0
  --- Logical volume ---
  LV Path                /dev/shared/shared0
  LV Name                shared0
  VG Name                shared
  LV UUID                BLLRea-63Tn-3c1T-Jbr6-Tu14-UeJj-OyzCnh
  LV Write Access        read/write
  LV Creation host, time virt-151.cluster-qe.lab.eng.brq.redhat.com, 2015-08-11 00:08:23 +0200
  LV Status              NOT available   <------------------------
  LV Size                4,99 GiB
  Current LE             1278
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
```

-----

(1)

```
[root@virt-151 ~]# pcs config
Cluster Name: STSRHTS14613
Corosync Nodes:
 virt-151 virt-152 virt-157
Pacemaker Nodes:
 virt-151 virt-152 virt-157

Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: start interval=0s timeout=90 (dlm-start-timeout-90)
               stop interval=0s timeout=100 (dlm-stop-timeout-100)
               monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1
   Operations: start interval=0s timeout=90 (clvmd-start-timeout-90)
               stop interval=0s timeout=90 (clvmd-stop-timeout-90)
               monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)

Stonith Devices:
 Resource: fence-virt-151 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-151 pcmk_host_map=virt-151:virt-151.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-virt-151-monitor-interval-60s)
 Resource: fence-virt-152 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-152 pcmk_host_map=virt-152:virt-152.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-virt-152-monitor-interval-60s)
 Resource: fence-virt-157 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-157 pcmk_host_map=virt-157:virt-157.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-virt-157-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS14613
 dc-version: 1.1.13-44eb2dd
 have-watchdog: false
 last-lrm-refresh: 1439368691
 no-quorum-policy: freeze

[root@virt-151 ~]# pcs status
Cluster name: STSRHTS14613
Last updated: Wed Aug 12 10:46:24 2015
Last change: Wed Aug 12 10:39:27 2015 by root via cibadmin on virt-151
Stack: corosync
Current DC: virt-157 (version 1.1.13-44eb2dd) - partition with quorum
3 nodes and 9 resources configured

Online: [ virt-151 virt-152 virt-157 ]

Full list of resources:

 fence-virt-151 (stonith:fence_xvm):    Started virt-157
 fence-virt-152 (stonith:fence_xvm):    Started virt-151
 fence-virt-157 (stonith:fence_xvm):    Started virt-152
 Clone Set: dlm-clone [dlm]
     Started: [ virt-151 virt-152 virt-157 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-151 virt-152 virt-157 ]

Failed Actions:
* fence-virt-151_start_0 on virt-151 'unknown error' (1): call=48, status=Error, exitreason='none',
    last-rc-change='Wed Aug 12 10:36:55 2015', queued=0ms, exec=5303ms

PCSD Status:
  virt-151: Online
  virt-152: Online
  virt-157: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2190.html