Bug 1378817
| Summary: | ClusterMon will not kill crm_mon process correctly | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Oyvind Albrigtsen <oalbrigt> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.2 | CC: | abeekhof, agk, cfeist, cluster-maint, cluster-qe, fdinitto, kgaillot, mnovacek, pzimek |
| Target Milestone: | rc | | |
| Target Release: | 7.4 | | |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | pacemaker-1.1.16-1.el7 | Doc Type: | No Doc Update |
| Doc Text: | This rare and minor issue was not reported by a customer and does not need to be in the 7.4 release notes. | Story Points: | --- |
| Clone Of: | 1360234 | Environment: | |
| Last Closed: | 2017-08-01 17:54:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1360234, 1385753 | | |
| Bug Blocks: | | | |
Description
Oyvind Albrigtsen 2016-09-23 10:22:29 UTC

*** Bug 1385753 has been marked as a duplicate of this bug. ***

Fixed upstream by commit 7b303943

I have verified that the ClusterMon resource agent is correctly recognized as
failed in pacemaker-1.1.16-9.
---
Common setup:
* configure cluster with fencing and ClusterMon resource [1]
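For reference, the cmon resource from [1] can be created along these lines (a sketch only; the pcs syntax and the resource/node names are inferred from the config in footnote [1] below):

# ClusterMon wraps crm_mon as a cluster resource; operation values match [1].
pcs resource create cmon ocf:pacemaker:ClusterMon \
    op monitor interval=10 timeout=20
# Pin it to one node, as the location constraint in [1] does.
pcs constraint location cmon prefers virt-136=INFINITY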
before the fix (pacemaker-1.1.15-11.el7.x86_64)
===============================================
[root@virt-136 ~]# pcs resource
...
cmon (ocf::pacemaker:ClusterMon): Started virt-136
[root@virt-136 ~]# ps axf | grep crm_mon
15327 pts/0 S+ 0:00 \_ grep --color=auto crm_mon
15284 ? S 0:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_cmon.pid -d -i 15 -h /tmp/ClusterMon_cmon.html
[root@virt-136 ~]# cat /tmp/ClusterMon_cmon.pid
15284
[root@virt-136 ~]# pcs node maintenance virt-136
[root@virt-136 ~]# pcs resource
...
cmon (ocf::pacemaker:ClusterMon): Started virt-136 (unmanaged)
[root@virt-136 ~]# kill -9 15284
[root@virt-136 ~]# echo 1 > /tmp/ClusterMon_cmon.pid
[root@virt-136 ~]# pcs node unmaintenance virt-136
[root@virt-136 ~]# pcs resource debug-monitor cmon
Operation monitor for cmon (ocf:pacemaker:ClusterMon) returned 0
[root@virt-136 ~]# pcs resource
...
cmon (ocf::pacemaker:ClusterMon): Started virt-136
[root@virt-136 ~]# ps axf | grep crm_mon
15546 pts/0 S+ 0:00 \_ grep --color=auto crm_mon
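Why the monitor still returned 0 here: the agent starts crm_mon as a daemon (-d) with a pidfile (-p), and its monitor only tested whether some process with the PID from the pidfile existed. Once the pidfile is overwritten with 1, that test always passes, because PID 1 (systemd/init) always exists. A minimal illustration of the naive check (a sketch, not the agent's exact code):

# kill -0 only tests that *a* process with this PID exists,
# not that it is still crm_mon; it always succeeds for pid=1.
pid=$(cat /tmp/ClusterMon_cmon.pid)
kill -0 "$pid" 2>/dev/null && echo "monitor: running (rc=0)"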
after the fix (pacemaker-1.1.16-9.el7.x86_64)
=============================================
[root@virt-136 ~]# pcs resource
cmon (ocf::pacemaker:ClusterMon): Started virt-136
[root@virt-136 ~]# ps axf | grep crm_mon
10637 pts/0 S+ 0:00 \_ grep --color=auto crm_mon
10570 ? S 0:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_cmon.pid -d -i 15 -h /tmp/ClusterMon_cmon.html
[root@virt-136 ~]# cat /tmp/ClusterMon_cmon.pid
10570
[root@virt-136 ~]# pcs node maintenance virt-136
[root@virt-136 ~]# kill -9 10570
[root@virt-136 ~]# echo 1 > /tmp/ClusterMon_cmon.pid
[root@virt-136 ~]# pcs node unmaintenance virt-136
[root@virt-136 ~]# pcs resource debug-monitor cmon
Operation monitor for cmon (ocf:pacemaker:ClusterMon) returned 0
[root@virt-136 ~]# pcs resource
cmon (ocf::pacemaker:ClusterMon): Started virt-136
[root@virt-136 ~]# ps axf | grep crm_mon
10783 pts/0 S+ 0:00 \_ grep --color=auto crm_mon
10743 ? S 0:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_cmon.pid -d -i 15 -h /tmp/ClusterMon_cmon.html
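Note the difference: debug-monitor returns 0 in both runs, but after the fix a fresh crm_mon (PID 10743) is running again, i.e. the recurring monitor evidently detected the failure after unmaintenance and the resource was recovered, whereas before the fix no crm_mon process was left. The fix amounts to a stricter liveness check along these lines (a sketch of the idea only; the actual change is upstream commit 7b303943):

# Check not just that the PID exists, but that it still names a
# crm_mon process, so a recycled or bogus PID is treated as dead.
pid=$(cat /tmp/ClusterMon_cmon.pid)
if kill -0 "$pid" 2>/dev/null && ps -p "$pid" -o comm= | grep -q crm_mon; then
    echo "monitor: running (rc=0)"
else
    echo "monitor: not running (rc=7)"    # OCF_NOT_RUNNING
fi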
-----
[1] pcs config
[root@virt-136 ~]# pcs config
Cluster Name: STSRHTS2420
Corosync Nodes:
virt-134 virt-135 virt-136
Pacemaker Nodes:
virt-134 virt-135 virt-136
Resources:
Clone: dlm-clone
Meta Attrs: interleave=true ordered=true
Resource: dlm (class=ocf provider=pacemaker type=controld)
Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
start interval=0s timeout=90 (dlm-start-interval-0s)
stop interval=0s timeout=100 (dlm-stop-interval-0s)
Clone: clvmd-clone
Meta Attrs: interleave=true ordered=true
Resource: clvmd (class=ocf provider=heartbeat type=clvm)
Attributes: with_cmirrord=1
Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
start interval=0s timeout=90 (clvmd-start-interval-0s)
stop interval=0s timeout=90 (clvmd-stop-interval-0s)
Resource: cmon (class=ocf provider=pacemaker type=ClusterMon)
Operations: monitor interval=10 timeout=20 (cmon-monitor-interval-10)
start interval=0s timeout=20 (cmon-start-interval-0s)
stop interval=0s timeout=20 (cmon-stop-interval-0s)
Stonith Devices:
Resource: fence-virt-134 (class=stonith type=fence_xvm)
Attributes: pcmk_host_check=static-list pcmk_host_list=virt-134 pcmk_host_map=virt-134:virt-134.cluster-qe.lab.eng.brq.redhat.com
Operations: monitor interval=60s (fence-virt-134-monitor-interval-60s)
Resource: fence-virt-135 (class=stonith type=fence_xvm)
Attributes: pcmk_host_check=static-list pcmk_host_list=virt-135 pcmk_host_map=virt-135:virt-135.cluster-qe.lab.eng.brq.redhat.com
Operations: monitor interval=60s (fence-virt-135-monitor-interval-60s)
Resource: fence-virt-136 (class=stonith type=fence_xvm)
Attributes: pcmk_host_check=static-list pcmk_host_list=virt-136 pcmk_host_map=virt-136:virt-136.cluster-qe.lab.eng.brq.redhat.com
Operations: monitor interval=60s (fence-virt-136-monitor-interval-60s)
Fencing Levels:
Location Constraints:
Resource: cmon
Enabled on: virt-136 (score:INFINITY) (id:location-cmon-virt-136-INFINITY)
Ordering Constraints:
start dlm-clone then start clvmd-clone (kind:Mandatory)
Colocation Constraints:
clvmd-clone with dlm-clone (score:INFINITY)
Ticket Constraints:
Alerts:
No alerts defined
Resources Defaults:
No defaults set
Operations Defaults:
No defaults set
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: STSRHTS2420
dc-version: 1.1.15-11.el7-e174ec8
have-watchdog: false
no-quorum-policy: freeze
Quorum:
Options:
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1862