Bug 1704870

| Summary: | A pacemaker_remoted node fails monitor (probe) and stop/start operations on a resource because it returns "rc=189" | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Shane Bradley <sbradley> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED ERRATA | QA Contact: | pkomarov |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.6 | CC: | abeekhof, aherr, cfeist, cluster-maint, ctowsley, jruemker, kgaillot, lmiccini, phagara, pkomarov, rcadova, sbradley, tkajinam, ykulkarn |
| Target Milestone: | pre-dev-freeze | Keywords: | ZStream |
| Target Release: | 7.8 | | |
| Hardware: | Unspecified | OS: | Unspecified |
| Whiteboard: | | | |
| Fixed In Version: | pacemaker-1.1.21-1.el7 | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | : | 1721198 1732335 (view as bug list) |
| Environment: | | Last Closed: | 2020-03-31 19:41:51 UTC |
| Type: | Bug | Regression: | --- |
| Mount Type: | --- | Documentation: | --- |
| CRM: | | Verified Versions: | |
| Category: | --- | oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | | Cloudforms Team: | --- |
| Target Upstream Version: | | Embargoed: | |
| Bug Depends On: | | Bug Blocks: | 1703946, 1721198, 1729984, 1732335 |

Doc Text:

Cause: Pacemaker implicitly ordered all stops needed on a Pacemaker Remote node before the stop of the node's Pacemaker Remote connection, including stops that were implied by fencing of the node. Also, Pacemaker scheduled actions on Pacemaker Remote nodes with a failed connection so that the actions could be performed once the connection was recovered, even if the connection was not being recovered (for example, if the node was shutting down when the failure occurred).

Consequence: If a Pacemaker Remote node needed to be fenced while it was in the process of shutting down, Pacemaker scheduled probes on the node once the fencing completed. The probes failed because the connection was not actually active. Due to the failed probe, a stop was scheduled, which also failed, leading to fencing of the node again, and the situation repeated indefinitely.

Fix: Pacemaker Remote connection stops are no longer ordered after implied stops, and actions are not scheduled on Pacemaker Remote nodes when the connection has failed and is not being started again.

Result: A Pacemaker Remote node that needs to be fenced while it is in the process of shutting down is fenced only once, without the fencing repeating indefinitely.
Comment 8
Ken Gaillot
2019-06-03 22:30:25 UTC
This has been fixed in the upstream 1.1 branch by these commits:

c20f8920 - don't order implied stops relative to a remote connection
938e99f2 - remote state is failed if node is shutting down with connection failure

QA: Reproducer is:

1. Configure a cluster with a bare metal Pacemaker Remote node whose connection resource has a reconnect_interval set (the OpenStack setup does this for compute nodes), and a resource that will always run on the remote node (OpenStack has the compute-unfence-trigger clone).

2. pacemaker-remoted's shutdown must take long enough to be interrupted by the power cut. The OpenStack setup seems to do this reliably, but one way to ensure it is to create a dummy resource with a 5-second delay constrained to run on the remote node (pacemaker-remoted will wait for the dummy resource to stop before completing shutdown).

3. Start a graceful shutdown of the Pacemaker Remote daemon, and before it can complete, hard power down the node. The easiest way to do this is to leave acpid enabled in the remote node OS and configure IPMI to send the power-cut signal to the OS and then cut power a few seconds later; pressing the power button (or cutting power from the IPMI interface, if it is configured to behave similarly) will then cause acpid to notify systemd, which will tell pacemaker-remoted to stop.

Before: problems such as fencing loops of the remote node, and failures with return code 189 reported on the remote node.

After: the cluster recovers from the power cut.

QA: Updating comment 13, here is the reproducer with more detail:

1. Configure a cluster with a bare metal Pacemaker Remote node whose connection resource has a reconnect_interval set, and a resource that will always run on the remote node. The cluster-recheck-interval should be the same as or lower than the reconnect_interval. (The OpenStack setup does this for compute nodes, with reconnect_interval and cluster-recheck-interval of 60s, and the compute-unfence-trigger clone running on all compute nodes.)

2. pacemaker-remoted's shutdown must take long enough to be interrupted by the power cut. The OpenStack setup seems to do this reliably, but one way to ensure it is to create an ocf:pacemaker:Dummy resource with an op_sleep delay (long enough to manually see the resource is stopping and then kill the node), constrained to run on the remote node (pacemaker-remoted will wait for the dummy resource to stop before completing shutdown).

3. Start a graceful shutdown of the Pacemaker Remote daemon, and before it can complete, hard power down the node. For a physical node, you can leave acpid enabled in the remote node OS, so that pressing and holding the power button makes acpid notify systemd, which will tell pacemaker-remoted to stop, and then the power will drop. For a virtual node, you can run "systemctl stop pacemaker_remote" on the virtual node, then virsh destroy it from the host. In either case, you can confirm in pcs status that the delayed resource is "Stopping" when the node is lost.

4. After some time, the remote connection will be detected as lost. Before the fix, there will be recovery problems such as fencing loops of the remote node, and the logs will show actions scheduled for the remote node repeatedly failing with return code 189 (these failures do not show up in pcs status). After the fix, the remote node will be fenced once and the cluster will recover properly. There may be rc=189 entries associated with the initial connection loss, but none afterward.
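As an illustration of step 1 only (the resource name and address below are placeholders, not taken from this bug, and the exact pcs syntax may vary by version), the remote connection's reconnect_interval and a matching cluster-recheck-interval could be set along these lines:

#sketch: create a remote connection resource with reconnect_interval, and match cluster-recheck-interval
[root@controller-0 ~]# pcs resource create <remote-node> ocf:pacemaker:remote server=<remote-node-address> reconnect_interval=60s
[root@controller-0 ~]# pcs property set cluster-recheck-interval=60s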
Verified:

#check hotfix is in place:
[stack@undercloud-0 ~]$ ansible overcloud -mshell -b -a'rpm -qa|grep pacemaker'
[WARNING]: Found both group and host with same name: undercloud
[WARNING]: Consider using the yum, dnf or zypper module rather than running rpm. If you need to use command because yum, dnf or zypper is insufficient you
can add warn=False to this command task or set command_warnings=False in ansible.cfg to get rid of this message.
compute-0 | SUCCESS | rc=0 >>
ansible-pacemaker-1.0.4-0.20180220234310.0e4d7c0.el7ost.noarch
pacemaker-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-libs-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-remote-1.1.19-8.el7_6.5.0.1.x86_64
puppet-pacemaker-0.7.2-0.20180423212257.el7ost.noarch
pacemaker-cli-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-cluster-libs-1.1.19-8.el7_6.5.0.1.x86_64
compute-1 | SUCCESS | rc=0 >>
ansible-pacemaker-1.0.4-0.20180220234310.0e4d7c0.el7ost.noarch
pacemaker-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-libs-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-remote-1.1.19-8.el7_6.5.0.1.x86_64
puppet-pacemaker-0.7.2-0.20180423212257.el7ost.noarch
pacemaker-cli-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-cluster-libs-1.1.19-8.el7_6.5.0.1.x86_64
controller-0 | SUCCESS | rc=0 >>
ansible-pacemaker-1.0.4-0.20180220234310.0e4d7c0.el7ost.noarch
pacemaker-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-libs-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-remote-1.1.19-8.el7_6.5.0.1.x86_64
puppet-pacemaker-0.7.2-0.20180423212257.el7ost.noarch
pacemaker-cli-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-cluster-libs-1.1.19-8.el7_6.5.0.1.x86_64
controller-2 | SUCCESS | rc=0 >>
ansible-pacemaker-1.0.4-0.20180220234310.0e4d7c0.el7ost.noarch
pacemaker-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-libs-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-remote-1.1.19-8.el7_6.5.0.1.x86_64
puppet-pacemaker-0.7.2-0.20180423212257.el7ost.noarch
pacemaker-cli-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-cluster-libs-1.1.19-8.el7_6.5.0.1.x86_64
controller-1 | SUCCESS | rc=0 >>
ansible-pacemaker-1.0.4-0.20180220234310.0e4d7c0.el7ost.noarch
pacemaker-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-libs-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-remote-1.1.19-8.el7_6.5.0.1.x86_64
puppet-pacemaker-0.7.2-0.20180423212257.el7ost.noarch
pacemaker-cli-1.1.19-8.el7_6.5.0.1.x86_64
pacemaker-cluster-libs-1.1.19-8.el7_6.5.0.1.x86_64
#create a resource with an operation delay:
[root@compute-1 ~]# pcs config |grep -A 1 test_resource
Resource: test_resource (class=ocf provider=pacemaker type=Dummy)
Attributes: op_sleep=10
Operations: migrate_from interval=0s timeout=20 (test_resource-migrate_from-interval-0s)
migrate_to interval=0s timeout=20 (test_resource-migrate_to-interval-0s)
monitor interval=10 timeout=20 (test_resource-monitor-interval-10)
reload interval=0s timeout=20 (test_resource-reload-interval-0s)
start interval=0s timeout=20 (test_resource-start-interval-0s)
stop interval=0s timeout=20 (test_resource-stop-interval-0s)
--
Resource: test_resource
Enabled on: compute-1 (score:INFINITY) (id:location-test_resource-compute-1-INFINITY)
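For reference, a resource and constraint like the ones shown above could be created with pcs roughly as follows (a sketch only; pcs syntax may differ slightly between versions):

[root@controller-0 ~]# pcs resource create test_resource ocf:pacemaker:Dummy op_sleep=10
[root@controller-0 ~]# pcs constraint location test_resource prefers compute-1=INFINITY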
#test_resource is started on compute-1:
pcs status:
[...]
test_resource (ocf::pacemaker:Dummy): Started compute-1
#verification steps:
#current date:
Thu Jul 18 11:05:15 UTC 2019
#on the remote node we run:
[root@compute-1 ~]# date && systemctl stop pacemaker_remote
#then, before the cluster can respond, we run:
[root@hac-01 ~]# virsh destroy compute-1
Domain compute-1 destroyed
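Between the two commands above, one way to confirm that the delayed resource is still mid-stop when the node is killed would be the following (a sketch, run from any cluster node; test_resource is the dummy resource configured earlier):

[root@controller-0 ~]# pcs status resources | grep test_resource
#expected to show test_resource as Stopping just before the virsh destroy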
#now in the logs we see the cluster responding with rc=189, but only initially...
Jul 18 11:10:26 [904700] controller-2 cib: info: cib_perform_op: + /cib: @num_updates=9
Jul 18 11:10:26 [904700] controller-2 cib: info: cib_perform_op: ++ /cib/status/node_state[@id='compute-1']: <transient_attributes id="compute-1"/>
Jul 18 11:10:26 [904700] controller-2 cib: info: cib_perform_op: ++ <instance_attributes id="status-compute-1">
Jul 18 11:10:26 [904700] controller-2 cib: info: cib_perform_op: ++ <nvpair id="status-compute-1-shutdown" name="shutdown" value="1563448226"/>
Jul 18 11:11:18 [904700] controller-2 cib: info: cib_perform_op: + /cib/status/node_state[@id='compute-1']/lrm[@id='compute-1']/lrm_resources/lrm_resource[@id='test_resource']/lrm_rsc_op[@id='test_resource_last_0']: @transition-magic=4:189;197:38:0:5d309b52-17a3-44eb-b037-124924b092ac, @call-id=208, @rc-code=189, @op-status=4
Jul 18 11:11:18 [904703] controller-2 attrd: info: write_attribute: Sent update 24 with 1 changes for fail-count-compute-1#monitor_60000, id=<n/a>, set=(null)
Jul 18 11:11:18 [904700] controller-2 cib: info: cib_perform_op: ++ /cib/status/node_state[@id='compute-1']/lrm[@id='compute-1']/lrm_resources/lrm_resource[@id='test_resource']: <lrm_rsc_op id="test_resource_last_failure_0" operation_key="test_resource_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="197:38:0:5d309b52-17a3-44eb-b037-124924b092ac" transition-magic="4:189;197:38:0:5d309b52-17a3-44eb-b037-124924b092ac" exit-reason="" on_node="
Jul 18 11:11:18 [904703] controller-2 attrd: info: attrd_peer_update: Setting last-failure-compute-1#monitor_60000[controller-0]: (null) -> 1563448278 from controller-2
Jul 18 11:11:18 [904703] controller-2 attrd: info: write_attribute: Sent update 25 with 1 changes for last-failure-compute-1#monitor_60000, id=<n/a>, set=(null)
Jul 18 11:11:18 [904700] controller-2 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=controller-0/crmd/231, version=0.98.12)
Jul 18 11:11:18 [904700] controller-2 cib: info: cib_process_request: Forwarding cib_modify operation for section status to all (origin=local/attrd/24)
Jul 18 11:11:18 [904700] controller-2 cib: info: cib_process_request: Forwarding cib_modify operation for section status to all (origin=local/attrd/25)
Jul 18 11:11:18 [904705] controller-2 crmd: warning: status_from_rc: Action 197 (test_resource_stop_0) on compute-1 failed (target: 0 vs. rc: 189): Error
Jul 18 11:11:18 [904705] controller-2 crmd: info: abort_transition_graph: Transition aborted by operation test_resource_stop_0 'modify' on controller-0: Event failed | magic=4:189;197:38:0:5d309b52-17a3-44eb-b037-124924b092ac cib=0.98.12 source=match_graph_event:308 complete=false
Jul 18 11:11:18 [904705] controller-2 crmd: info: match_graph_event: Action test_resource_stop_0 (197) confirmed on compute-1 (rc=189)
Jul 18 11:11:18 [904705] controller-2 crmd: info: update_failcount: Updating failcount for test_resource on compute-1 after failed stop: rc=189 (update=INFINITY, time=1563448278)
Jul 18 11:11:18 [904705] controller-2 crmd: info: process_graph_event: Detected action (38.197) test_resource_stop_0.208=unknown: failed
#a couple of minutes later there are no new rc=189 entries in the log (only the initial ones from 11:11:18):
[root@controller-2 cluster]# date
Thu Jul 18 11:16:12 UTC 2019
[root@controller-2 cluster]# grep 'rc=189' corosync.log
Jul 18 11:11:18 [904705] controller-2 crmd: info: match_graph_event: Action test_resource_stop_0 (197) confirmed on compute-1 (rc=189)
Jul 18 11:11:18 [904705] controller-2 crmd: info: update_failcount: Updating failcount for test_resource on compute-1 after failed stop: rc=189 (update=INFINITY, time=1563448278)
Jul 18 11:11:18 [904704] controller-2 pengine: warning: unpack_rsc_op_failure: Processing failed stop of test_resource on compute-1: unknown | rc=189
Jul 18 11:11:18 [904704] controller-2 pengine: warning: unpack_rsc_op_failure: Processing failed stop of test_resource on compute-1: unknown | rc=189
Jul 18 11:11:18 [904704] controller-2 pengine: warning: unpack_rsc_op_failure: Processing failed stop of test_resource on compute-1: unknown | rc=189
Jul 18 11:11:18 [904704] controller-2 pengine: warning: unpack_rsc_op_failure: Processing failed stop of test_resource on compute-1: unknown | rc=189
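A simple way to confirm that no new rc=189 failures accumulate after the initial connection loss is to repeat the same grep later and check that the match count does not grow (a sketch):

[root@controller-2 cluster]# grep -c 'rc=189' corosync.log
#the count should stay the same on subsequent runs; any growth would indicate the pre-fix probe/fencing loop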
Verified, details in https://bugzilla.redhat.com/show_bug.cgi?id=1704870#c27

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1032