Bug 1198681
| Summary: | clvmd gets killed (does not start) when there is a broken LV in a VG | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Nenad Peric <nperic> |
| Component: | resource-agents | Assignee: | Fabio Massimo Di Nitto <fdinitto> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.2 | CC: | agk, cluster-maint, fdinitto, heinzm, jbrassow, mnovacek, msnitzer, prajnoha, prockai, sbradley, thornber, zkabelac |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | resource-agents-3.9.5-43.el7 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-11-19 04:46:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Nenad Peric 2015-03-04 15:49:15 UTC
It does not appear that the pacemaker resource-agent has any control over whether or not the clvmd starts in the event of a broken LV. I'm reassigning this to the clvmd related component.

-- David

(In reply to David Vossel from comment #2)
> It does not appear that the pacemaker resource-agent has any control over
> whether or not the clvmd starts in the event of a broken LV. I'm reassigning
> this to the clvmd related component.

The clvmd starts all the time, no matter whether there are broken/incomplete LVs or not. It is the controlling script that stops the clvmd directly during its processing of the "start" action if it finds that some LVs were not activated.

I mean, clvmd is started first, then "vgchange -aay" is called to activate volumes - so two steps. If the second step fails to activate some volumes, we should probably keep going and we shouldn't kill the clvmd we started.

(In reply to Peter Rajnoha from comment #4)
> I mean, clvmd is started first, then "vgchange -aay" is called to activate
> volumes - so two steps. If the second step fails to activate some volumes,
> we should probably keep going and we shouldn't kill the clvmd we started.

I see. This is a bit tricky. The clvm agent is expected to enable all cluster volume groups during the start action. The only way I could allow the clvmd to start and the vgs to remain inactive is by adding a new option to the clvm resource-agent that would explicitly prevent activating vgs after start. Something like 'activate_vgs=false'.

People who use this option will have to either only enable it when there is an issue with activating clvmd at startup, or use this option coupled with the ocf:LVM agent to manage activating the vgs outside of the clvm agent.

Adding this option is the right thing to do. I've seen a couple of issues with trying to couple the clvmd initialization with vg activation in the ocf:clvm agent. I like the idea of being able to separate the two.

-- David
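To illustrate the decoupled approach described above, here is a hypothetical sketch (not a configuration from this bug): clvmd is started with activation disabled, and a separate ocf:heartbeat:LVM resource activates one specific VG. The resource name "shared-vg" and the VG name "shared" are assumptions for the example.

# clvmd itself, with VG activation left to a dedicated resource
pcs resource create clvmd ocf:heartbeat:clvm activate_vgs=false \
    op monitor interval=30s on-fail=fence clone interleave=true ordered=true

# activation of one clustered VG, managed outside the clvm agent
pcs resource create shared-vg ocf:heartbeat:LVM volgrpname=shared exclusive=false \
    op monitor interval=30s clone interleave=true

# clvmd must be running before the VG can be activated
pcs constraint order start clvmd-clone then start shared-vg-clone
pcs constraint colocation add shared-vg-clone with clvmd-clone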
I have verified that the clvm agent has the new parameter "activate_vgs" and that this
parameter works as expected in resource-agents-3.9.5-50.el7.x86_64.
-----
Cluster is configured this way (1).
[root@virt-151 ~]# rpm -q resource-agents
resource-agents-3.9.5-50.el7.x86_64
[root@virt-151 ~]# vgdisplay shared
--- Volume group ---
VG Name shared
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 2
VG Access read/write
VG Status resizable
Clustered yes <-----
Shared no
MAX LV 0
Cur LV 1
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 4,99 GiB
PE Size 4,00 MiB
Total PE 1278
Alloc PE / Size 1278 / 4,99 GiB
Free PE / Size 0 / 0
VG UUID r4wHR1-Y1mb-kIVx-tH97-kUVr-wNgs-2gitP9
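For reference, a clustered VG like the one shown above is typically created along these lines; the device path is a placeholder, not taken from this report:

pvcreate /dev/sdb                        # placeholder PV
vgcreate --clustered y shared /dev/sdb   # marks the VG clustered, as vgdisplay shows above
lvcreate -l 100%FREE -n shared0 shared   # the single LV used in the test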
BEFORE THE FIX (resource-agents-3.9.5-40.el7.x86_64)
----------------------------------------------------
[root@virt-151 ~]# pcs resource describe clvm
ocf:heartbeat:clvm - clvmd
This agent manages the clvmd daemon.
Resource options:
with_cmirrord: Start with cmirrord (cluster mirror log daemon).
daemon_options: Options to clvmd. Refer to clvmd.8 for detailed descriptions.
[root@virt-151 ~]# pcs resource update clvmd activate_vgs=true
Error: resource option(s): 'activate_vgs', are not recognized for resource type: 'ocf:heartbeat:clvm' (use --force to override)
AFTER THE FIX (resource-agents-3.9.5-50.el7.x86_64)
---------------------------------------------------
[root@virt-151 ~]# pcs resource describe clvm
ocf:heartbeat:clvm - clvmd
This agent manages the clvmd daemon.
Resource options:
with_cmirrord: Start with cmirrord (cluster mirror log daemon).
daemon_options: Options to clvmd. Refer to clvmd.8 for detailed descriptions.
activate_vgs: Whether or not to activate all cluster volume groups after
starting the clvmd or not. Note that clustered volume groups will always be
deactivated before the clvmd stops regardless of what this option is set
to.
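Based on that parameter description, the start action presumably behaves roughly like the sketch below. This is a simplified illustration, not the agent's literal code; the OCF_RESKEY_* variables correspond to the parameters listed above.

# simplified view of the agent's start path
clvmd $OCF_RESKEY_daemon_options             # 1. start the daemon
if ocf_is_true "$OCF_RESKEY_activate_vgs"; then
    vgchange -aay                            # 2. auto-activate clustered VGs only if requested
fi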
[root@virt-151 ~]# lvdisplay shared/shared0
--- Logical volume ---
LV Path /dev/shared/shared0
LV Name shared0
VG Name shared
LV UUID BLLRea-63Tn-3c1T-Jbr6-Tu14-UeJj-OyzCnh
LV Write Access read/write
LV Creation host, time virt-151.cluster-qe.lab.eng.brq.redhat.com, 2015-08-11 00:08:23 +0200
LV Status available <----------------------------------
# open 0
LV Size 4,99 GiB
Current LE 1278
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:2
[root@virt-151 ~]# pcs resource disable clvmd
[root@virt-151 ~]# pcs resource update clvmd activate_vgs=false
[root@virt-151 ~]# pcs resource enable clvmd
[root@virt-151 ~]# pcs resource show clvmd
Resource: clvmd (class=ocf provider=heartbeat type=clvm)
Attributes: with_cmirrord=1 activate_vgs=false
Operations: start interval=0s timeout=90 (clvmd-start-timeout-90)
stop interval=0s timeout=90 (clvmd-stop-timeout-90)
monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
[root@virt-151 ~]# lvdisplay shared/shared0
--- Logical volume ---
LV Path /dev/shared/shared0
LV Name shared0
VG Name shared
LV UUID BLLRea-63Tn-3c1T-Jbr6-Tu14-UeJj-OyzCnh
LV Write Access read/write
LV Creation host, time virt-151.cluster-qe.lab.eng.brq.redhat.com, 2015-08-11 00:08:23 +0200
LV Status NOT available <------------------------
LV Size 4,99 GiB
Current LE 1278
Segments 1
Allocation inherit
Read ahead sectors auto
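With activate_vgs=false the LV stays deactivated after clvmd starts, so activation has to happen outside the clvm resource, for example manually once clvmd is running (or via a dedicated LVM resource as discussed in the comments above):

vgchange -aay shared             # activate all LVs in the "shared" VG
lvs -o lv_name,lv_attr shared    # the fifth attribute character is "a" when the LV is active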
-----
(1)
[root@virt-151 ~]# pcs config
Cluster Name: STSRHTS14613
Corosync Nodes:
virt-151 virt-152 virt-157
Pacemaker Nodes:
virt-151 virt-152 virt-157
Resources:
Clone: dlm-clone
Meta Attrs: interleave=true ordered=true
Resource: dlm (class=ocf provider=pacemaker type=controld)
Operations: start interval=0s timeout=90 (dlm-start-timeout-90)
stop interval=0s timeout=100 (dlm-stop-timeout-100)
monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
Clone: clvmd-clone
Meta Attrs: interleave=true ordered=true
Resource: clvmd (class=ocf provider=heartbeat type=clvm)
Attributes: with_cmirrord=1
Operations: start interval=0s timeout=90 (clvmd-start-timeout-90)
stop interval=0s timeout=90 (clvmd-stop-timeout-90)
monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
Stonith Devices:
Resource: fence-virt-151 (class=stonith type=fence_xvm)
Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-151 pcmk_host_map=virt-151:virt-151.cluster-qe.lab.eng.brq.redhat.com
Operations: monitor interval=60s (fence-virt-151-monitor-interval-60s)
Resource: fence-virt-152 (class=stonith type=fence_xvm)
Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-152 pcmk_host_map=virt-152:virt-152.cluster-qe.lab.eng.brq.redhat.com
Operations: monitor interval=60s (fence-virt-152-monitor-interval-60s)
Resource: fence-virt-157 (class=stonith type=fence_xvm)
Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-157 pcmk_host_map=virt-157:virt-157.cluster-qe.lab.eng.brq.redhat.com
Operations: monitor interval=60s (fence-virt-157-monitor-interval-60s)
Fencing Levels:
Location Constraints:
Ordering Constraints:
start dlm-clone then start clvmd-clone (kind:Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
Colocation Constraints:
clvmd-clone with dlm-clone (score:INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: STSRHTS14613
dc-version: 1.1.13-44eb2dd
have-watchdog: false
last-lrm-refresh: 1439368691
no-quorum-policy: freeze
[root@virt-151 ~]# pcs status
Cluster name: STSRHTS14613
Last updated: Wed Aug 12 10:46:24 2015 Last change: Wed Aug 12 10:39:27 2015 by root via cibadmin on virt-151
Stack: corosync
Current DC: virt-157 (version 1.1.13-44eb2dd) - partition with quorum
3 nodes and 9 resources configured
Online: [ virt-151 virt-152 virt-157 ]
Full list of resources:
fence-virt-151 (stonith:fence_xvm): Started virt-157
fence-virt-152 (stonith:fence_xvm): Started virt-151
fence-virt-157 (stonith:fence_xvm): Started virt-152
Clone Set: dlm-clone [dlm]
Started: [ virt-151 virt-152 virt-157 ]
Clone Set: clvmd-clone [clvmd]
Started: [ virt-151 virt-152 virt-157 ]
Failed Actions:
* fence-virt-151_start_0 on virt-151 'unknown error' (1): call=48, status=Error, exitreason='none',
last-rc-change='Wed Aug 12 10:36:55 2015', queued=0ms, exec=5303ms
PCSD Status:
virt-151: Online
virt-152: Online
virt-157: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
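For completeness, a configuration equivalent to (1) can be built with roughly the following commands (fencing resources and explicit timeouts omitted; resource names match the dump above):

pcs property set no-quorum-policy=freeze
pcs resource create dlm ocf:pacemaker:controld \
    op monitor interval=30s on-fail=fence clone interleave=true ordered=true
pcs resource create clvmd ocf:heartbeat:clvm with_cmirrord=1 \
    op monitor interval=30s on-fail=fence clone interleave=true ordered=true
pcs constraint order start dlm-clone then start clvmd-clone
pcs constraint colocation add clvmd-clone with dlm-clone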
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2190.html