Bug 1436696

Summary: pacemaker can catch systemd unit reload as not running
Product: Red Hat Enterprise Linux 7
Reporter: Markus Frosch <markus.frosch>
Component: pacemaker
Assignee: Ken Gaillot <kgaillot>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Priority: unspecified
Version: 7.3
CC: abeekhof, cluster-maint, mnovacek
Target Milestone: rc
Target Release: 7.4
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pacemaker-1.1.16-3.el7
Doc Type: Bug Fix
Doc Text:
Cause: If a systemd-based cluster resource were in the "reloading" state when monitored by Pacemaker, Pacemaker would consider that to be a monitor failure. Consequence: A systemd-based resource could be unnecessarily recovered by the cluster. Fix: Pacemaker now considers "reloading" to be a successful monitor state for a systemd-based resource. Result: Reloads of a systemd-based resource do not trigger recovery.
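The fix described above amounts to a small change in how a unit's systemd ActiveState is mapped to an OCF monitor result. A minimal sketch of that mapping (illustrative only; the function and flag names are hypothetical, not Pacemaker's actual code):

```python
# OCF return codes relevant here (per the OCF resource agent conventions)
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7

def monitor_result(active_state, reloading_ok=True):
    """Map a systemd ActiveState string to an OCF monitor result.

    Before the fix, only "active" counted as running, so a unit caught
    mid-reload ("reloading") was reported as OCF_NOT_RUNNING and the
    cluster treated it as a monitor failure.
    """
    running_states = {"active"}
    if reloading_ok:
        # the pacemaker-1.1.16 behavior: a reload is not a failure
        running_states.add("reloading")
    return OCF_SUCCESS if active_state in running_states else OCF_NOT_RUNNING
```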
Last Closed: 2017-08-01 17:54:39 UTC
Type: Bug

Description Markus Frosch 2017-03-28 13:00:47 UTC
Description of problem:

When a reload via systemd runs at the same time as Pacemaker's monitor action, Pacemaker detects the resource as "not running".

This results in an increased fail-count and Pacemaker restarting the service.

This has been addressed upstream, but the fix has not yet been released: https://github.com/ClusterLabs/pacemaker/pull/1191

Is there a chance to include the patch in an upcoming pacemaker update?

Comment 2 Ken Gaillot 2017-03-28 14:15:04 UTC
Yes, actually the fix is already in the build planned for 7.4.

QA: To test, create a dummy systemd service on all nodes with:

yum install systemd-python
# Quote the heredoc delimiter so the shell does not expand $MAINPID here;
# systemd must expand it at runtime in ExecStop.
cat > /usr/lib/systemd/system/bz1436696.service <<'EOF'
[Unit]
Description=BZ#1436696 Test Unit

[Service]
Type=notify
ExecStart=/usr/bin/python -c 'import time, systemd.daemon; systemd.daemon.notify("READY=1"); time.sleep(86400)'
ExecStop=/bin/sh -c '[ -n "$MAINPID" ] && kill -s KILL $MAINPID'
ExecReload=/bin/sh -c 'sleep 10'
EOF
systemctl daemon-reload

Then configure a resource:

# pcs resource create bz1436696test systemd:bz1436696 op monitor interval=9s

Then on the node running the service:

# systemctl reload bz1436696

Before the fix, it should result in a monitor error; after the fix, there should be no error.

Comment 5 michal novacek 2017-05-24 15:30:39 UTC
I have verified that reloading a systemd unit while its monitor operation runs
does not cause the monitor action to fail in pacemaker-1.1.16-9.

-----

1/ Create a new dummy systemd resource whose reload action (a sleep) takes
    longer than the resource's monitor interval [1]

2/ Create a cluster with fencing and the newly created systemd resource, with a
    monitor interval shorter than the dummy resource's reload duration [2]

3/ Start the resource 
[root@virt-135 ~]# pcs resource
...
 bz1436696test  (systemd:bz1436696):    Started virt-135

[root@virt-135 ~]# systemctl is-active bz1436696
active
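The timing in steps 1 and 2 guarantees the race: a 10-second reload spans at least one tick of the 9-second monitor, however the two clocks are aligned. A back-of-the-envelope helper (illustrative only, not part of the test procedure):

```python
import math

def monitors_during_reload(reload_duration, monitor_interval):
    """Minimum number of recurring monitor ticks that must fall inside a
    reload window of the given length, regardless of clock alignment."""
    return math.floor(reload_duration / monitor_interval)
```

With the values used here, monitors_during_reload(10, 9) is 1, so at least one monitor is certain to observe the unit in the "reloading" state.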


before the patch (pacemaker-1.1.15-11.el7)
==========================================


[root@virt-135 ~]# systemctl reload bz1436696
Job for bz1436696.service canceled.

[root@virt-135 ~]# date
Wed May 24 17:19:15 CEST 2017

[root@virt-135 ~]# grep bz1436696test_monitor_9000 /var/log/cluster/corosync.log
May 24 17:18:35 [8480] virt-135  cib:     info: cib_perform_op: +  /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='bz1436696test']/lrm_rsc_op[@id='bz1436696test_monitor_9000']:  @transition-key=5:18:0:415724a7-08e1-4dea-b809-56372ffe1866, @transition-magic=0:0;5:18:0:415724a7-08e1-4dea-b809-56372ffe1866, @call-id=93, @last-rc-change=1495639115, @exec-time=3, @queue-time=1
> May 24 17:19:20 [8485] virt-135 crmd:     info: process_lrm_event:      Result of monitor operation for bz1436696test on virt-135: 7 (not running) | call=93 key=bz1436696test_monitor_9000 confirmed=false cib-update=93
May 24 17:19:20 [8485] virt-135 crmd:     info: process_lrm_event:      Result of monitor operation for bz1436696test on virt-135: Cancelled | call=93 key=bz1436696test_monitor_9000 confirmed=true


after the patch (pacemaker-1.1.16-9.el7)
========================================

[root@virt-135 ~]# date
Wed May 24 17:26:47 CEST 2017
[root@virt-135 ~]# systemctl reload bz1436696
[root@virt-135 ~]# echo $?
0

[root@virt-135 ~]# grep bz1436696test_monitor_9000 /var/log/cluster/corosync.log
May 24 17:25:30 [31737] virt-135 crmd:   notice: te_rsc_command:        Initiating monitor operation bz1436696test_monitor_0 on virt-134 | action 15
May 24 17:25:30 [31732] virt-135  cib:     info: cib_perform_op:        ++                                                                <lrm_rsc_op id="bz1436696test_last_0" operation_key="bz1436696test_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.12" transition-key="15:1:7:ffa92c87-7fe1-4557-bdcb-3b3610c343e7" transition-magic="0:7;15:1:7:ffa92c87-7fe1-4557-bdcb-3b3610c343e7" on_node="virt-134" call-id="30" rc-code="7" op-stat
May 24 17:25:30 [31737] virt-135 crmd:     info: match_graph_event:     Action bz1436696test_monitor_0 (15) confirmed on virt-134 (rc=7)

(no monitor failures after systemctl reload was run)
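The grep-based check above can also be done mechanically. A sketch of a log filter for non-zero monitor results (the regex is an assumption based on the crmd log lines quoted in this report, not a stable log format):

```python
import re

# Matches crmd "Result of monitor operation" log lines, e.g.:
#   ... Result of monitor operation for bz1436696test on virt-135: 7 (not running) | ...
_RESULT_RE = re.compile(
    r"Result of monitor operation for (?P<rsc>\S+) on (?P<node>\S+): "
    r"(?P<rc>\d+) \((?P<status>[^)]+)\)"
)

def monitor_failures(log_lines, resource):
    """Return (rc, status) tuples for non-zero monitor results of a resource.

    Lines with a non-numeric result (e.g. "Cancelled") are ignored.
    """
    failures = []
    for line in log_lines:
        m = _RESULT_RE.search(line)
        if m and m.group("rsc") == resource and int(m.group("rc")) != 0:
            failures.append((int(m.group("rc")), m.group("status")))
    return failures
```

Run against the pre-fix log, this reports the 7 (not running) result; against the post-fix log, it returns nothing.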

---

(1) systemd unit
[root@virt-135 ~]# yum -y install systemd-python

[root@virt-135 ~]# systemctl cat bz1436696
# /usr/lib/systemd/system/bz1436696.service
[Unit]
Description=BZ#1436696 Test Unit

[Service]
Type=notify
ExecStart=/usr/bin/python -c 'import time, systemd.daemon; systemd.daemon.notify("READY=1"); time.sleep(86400)'
ExecStop=/bin/sh -c '[ -n "" ] && kill -s KILL '
ExecReload=/bin/sh -c 'sleep 10'

# /run/systemd/system/bz1436696.service.d/50-pacemaker.conf
[Unit]
Description=Cluster Controlled bz1436696
Before=pacemaker.service

[Service]
Restart=no


(2) resource setup
[root@virt-135 ~]# pcs resource
 Clone Set: dlm-clone [dlm]
     Started: [ virt-134 virt-135 virt-136 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-134 virt-135 virt-136 ]
 bz1436696test  (systemd:bz1436696):    Started virt-135

[root@virt-135 ~]# pcs resource --full
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
               start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1
   Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
               start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
 Resource: bz1436696test (class=systemd type=bz1436696)
  Operations: monitor interval=9s (bz1436696test-monitor-interval-9s)
              start interval=0s timeout=100 (bz1436696test-start-interval-0s)
              stop interval=0s timeout=100 (bz1436696test-stop-interval-0s)

Comment 6 errata-xmlrpc 2017-08-01 17:54:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1862