Bug 1570130
Summary: Pending notify actions are recorded as complete

Product: Red Hat Enterprise Linux 7
Component: pacemaker
Version: 7.5
Hardware: All
OS: All
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: Ken Gaillot <kgaillot>
Assignee: Ken Gaillot <kgaillot>
QA Contact: Patrik Hagara <phagara>
CC: abeekhof, aherr, amit.banerjee, chjones, cluster-maint, dkinkead, mnovacek, phagara
Keywords: ZStream
Target Milestone: rc
Target Release: 7.6
Fixed In Version: pacemaker-1.1.19-2.el7
Type: Bug
Bug Blocks: 1570618 (view as bug list)
Last Closed: 2018-10-30 07:57:56 UTC

Doc Type: Bug Fix
Doc Text:
  Cause: In order to ignore the result of clone notify actions, Pacemaker always recorded them as successful.
  Consequence: When record-pending is true (the default as of 7.5), pending clone notifications would appear to have completed successfully, and dependent actions could proceed before the notify action had actually finished.
  Fix: Pacemaker now records only completed clone notify actions, not pending ones, as successful.
  Result: The correct ordering of actions is preserved even when record-pending is true.
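The record-pending behavior referenced in the Doc Text is an operation option (default true as of 7.5). As an illustrative sketch only (not a substitute for the fixed package, and assuming a pcs version that supports operation defaults), recording of pending operations could be disabled cluster-wide:

> # Sketch: disable recording of pending operations cluster-wide (workaround, not the fix)
> [root@host ~]# pcs resource op defaults record-pending=false
> # Confirm the current operation defaults
> [root@host ~]# pcs resource op defaults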
Description: Ken Gaillot, 2018-04-20 16:50:43 UTC
Created attachment 1424619 [details]
Newer version of ocf:pacemaker:Stateful that supports a notify delay

Fixed upstream by commit 8e9a9d1a.

The original fix for this turned out to be incomplete, so the latest build incorporates a complete fix. The original fix, which recorded pending clone notifications as pending in the CIB, solved the immediate issue but left the pending entries semi-permanently in the CIB, because the cluster does not record clone notification completions there. The latest build avoids recording pending clone notifications in the CIB at all.

QA: The reproducer does not change. If desired, you can check that the live CIB has no lrm_rsc_op entries with operation="notify", neither while a notification is in progress nor after it has completed.

before:
=======

> [root@virt-264 ~]# rpm -q pacemaker
> pacemaker-1.1.18-11.el7.x86_64
> [root@virt-264 ~]# wget -qO /usr/lib/ocf/resource.d/pacemaker/notify-test https://bugzilla.redhat.com/attachment.cgi?id=1424619
> [root@virt-264 ~]# chmod 755 /usr/lib/ocf/resource.d/pacemaker/notify-test
> [root@virt-264 ~]# scp /usr/lib/ocf/resource.d/pacemaker/notify-test virt-265:/usr/lib/ocf/resource.d/pacemaker/
> [root@virt-264 ~]# scp /usr/lib/ocf/resource.d/pacemaker/notify-test virt-266:/usr/lib/ocf/resource.d/pacemaker/
> [root@virt-264 ~]# pcs resource create bz1570130 ocf:pacemaker:notify-test notify_delay=4
> Warning: changing a monitor operation interval from 10 to 11 to make the operation unique
> [root@virt-264 ~]# pcs resource clone bz1570130 notify=true
> [root@virt-264 ~]# pcs status
> Cluster name: bzzt
> Stack: corosync
> Current DC: virt-265.cluster-qe.lab.eng.brq.redhat.com (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
> Last updated: Tue Jul 31 14:58:04 2018
> Last change: Tue Jul 31 14:58:00 2018 by root via cibadmin on virt-264.cluster-qe.lab.eng.brq.redhat.com
>
> 3 nodes configured
> 3 resources configured
>
> Online: [ virt-264.cluster-qe.lab.eng.brq.redhat.com virt-265.cluster-qe.lab.eng.brq.redhat.com virt-266.cluster-qe.lab.eng.brq.redhat.com ]
>
> Full list of resources:
>
>  Clone Set: bz1570130-clone [bz1570130]
>      Started: [ virt-264.cluster-qe.lab.eng.brq.redhat.com virt-265.cluster-qe.lab.eng.brq.redhat.com virt-266.cluster-qe.lab.eng.brq.redhat.com ]
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> [root@virt-264 ~]# grep "Old event" /var/log/cluster/corosync.log
> [root@virt-264 ~]# echo $?
> 1
> [root@virt-264 ~]# ssh virt-265 'grep "Old event" /var/log/cluster/corosync.log'
> Jul 31 14:58:04 [15956] virt-265.cluster-qe.lab.eng.brq.redhat.com crmd: info: abort_transition_graph: Transition aborted by lrm_rsc_op.bz1570130_pre_notify_start_0: Old event | cib=0.0.0 source=process_graph_event:498 path=/create_request_adv/crm_xml/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='bz1570130']/lrm_rsc_op[@id='bz1570130_pre_notify_start_0'] complete=true
> Jul 31 14:58:04 [15956] virt-265.cluster-qe.lab.eng.brq.redhat.com crmd: info: abort_transition_graph: Transition aborted by lrm_rsc_op.bz1570130_post_notify_start_0: Old event | cib=0.0.0 source=process_graph_event:498 path=/create_request_adv/crm_xml/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='bz1570130']/lrm_rsc_op[@id='bz1570130_post_notify_start_0'] complete=true
> Jul 31 14:58:04 [15956] virt-265.cluster-qe.lab.eng.brq.redhat.com crmd: info: abort_transition_graph: Transition aborted by lrm_rsc_op.bz1570130_post_notify_start_0: Old event | cib=0.0.0 source=process_graph_event:498 path=/create_request_adv/crm_xml/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='bz1570130']/lrm_rsc_op[@id='bz1570130_post_notify_start_0'] complete=true
> Jul 31 14:58:08 [15956] virt-265.cluster-qe.lab.eng.brq.redhat.com crmd: info: abort_transition_graph: Transition aborted by lrm_rsc_op.bz1570130_post_notify_start_0: Old event | cib=0.0.0 source=process_graph_event:498 path=/create_request_adv/crm_xml/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='bz1570130']/lrm_rsc_op[@id='bz1570130_post_notify_start_0'] complete=true
> [root@virt-264 ~]# echo $?
> 0
> [root@virt-264 ~]# ssh virt-266 'grep "Old event" /var/log/cluster/corosync.log'
> [root@virt-264 ~]# echo $?
> 1
> [root@virt-264 ~]# pcs cluster cib | gxpp -x '/cib/status/node_state/lrm/lrm_resources/lrm_resource[@id="bz1570130"]/lrm_rsc_op[@operation="notify"]' | wc -l
> 4

Before the fix, the post-start notify transition is aborted on the DC with an "Old event" message in its logs, and the live CIB contains notify operations for the resource.

after:
======

> [root@virt-265 ~]# rpm -q pacemaker
> pacemaker-1.1.19-4.el7.x86_64
> [root@virt-265 ~]# wget -qO /usr/lib/ocf/resource.d/pacemaker/notify-test https://bugzilla.redhat.com/attachment.cgi?id=1424619
> [root@virt-265 ~]# chmod 755 /usr/lib/ocf/resource.d/pacemaker/notify-test
> [root@virt-265 ~]# scp /usr/lib/ocf/resource.d/pacemaker/notify-test virt-266:/usr/lib/ocf/resource.d/pacemaker/
> [root@virt-265 ~]# scp /usr/lib/ocf/resource.d/pacemaker/notify-test virt-267:/usr/lib/ocf/resource.d/pacemaker/
> [root@virt-265 ~]# pcs resource create bz1570130 ocf:pacemaker:notify-test notify_delay=4
> Warning: changing a monitor operation interval from 10 to 11 to make the operation unique
> [root@virt-265 ~]# pcs resource clone bz1570130 notify=true
> [root@virt-265 ~]# pcs status
> Cluster name: bzzt
> Stack: corosync
> Current DC: virt-266.cluster-qe.lab.eng.brq.redhat.com (version 1.1.19-4.el7-c3c624ea3d) - partition with quorum
> Last updated: Wed Aug 1 11:30:23 2018
> Last change: Wed Aug 1 11:30:17 2018 by root via cibadmin on virt-265.cluster-qe.lab.eng.brq.redhat.com
>
> 3 nodes configured
> 3 resources configured
>
> Online: [ virt-265.cluster-qe.lab.eng.brq.redhat.com virt-266.cluster-qe.lab.eng.brq.redhat.com virt-267.cluster-qe.lab.eng.brq.redhat.com ]
>
> Full list of resources:
>
>  Clone Set: bz1570130-clone [bz1570130]
>      Started: [ virt-265.cluster-qe.lab.eng.brq.redhat.com virt-266.cluster-qe.lab.eng.brq.redhat.com virt-267.cluster-qe.lab.eng.brq.redhat.com ]
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> [root@virt-265 ~]# grep "Old event" /var/log/cluster/corosync.log
> [root@virt-265 ~]# echo $?
> 1
> [root@virt-265 ~]# ssh virt-266 'grep "Old event" /var/log/cluster/corosync.log'
> [root@virt-265 ~]# echo $?
> 1
> [root@virt-265 ~]# ssh virt-267 'grep "Old event" /var/log/cluster/corosync.log'
> [root@virt-265 ~]# echo $?
> 1
> [root@virt-265 ~]# pcs cluster cib | gxpp -x '/cib/status/node_state/lrm/lrm_resources/lrm_resource[@id="bz1570130"]/lrm_rsc_op[@operation="notify"]' | wc -l
> 0

After the fix, there is no "Old event" log message on any of the cluster nodes and no notify operation in the CIB. Marking verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3055
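The QA verification runs an XPath query over the live CIB (via the `gxpp` helper) and counts matching lrm_rsc_op entries. As an illustrative sketch only, not part of the original reproducer, the same check can be expressed with Python's standard xml.etree module over a saved CIB dump (e.g. from `pcs cluster cib > cib.xml`); the embedded XML here is a hypothetical, trimmed CIB fragment for demonstration:

```python
# Sketch: count lrm_rsc_op entries with operation="notify" for a resource in a
# CIB dump. On a cluster with the fix, this count should be 0. SAMPLE_CIB is a
# hypothetical, trimmed CIB fragment, not real cluster output.
import xml.etree.ElementTree as ET

SAMPLE_CIB = """
<cib>
  <status>
    <node_state id="1">
      <lrm id="1">
        <lrm_resources>
          <lrm_resource id="bz1570130">
            <lrm_rsc_op id="bz1570130_post_notify_start_0" operation="notify"/>
            <lrm_rsc_op id="bz1570130_monitor_11000" operation="monitor"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
  </status>
</cib>
"""

def count_notify_ops(cib_xml, rsc_id="bz1570130"):
    """Return the number of recorded notify operations for rsc_id."""
    root = ET.fromstring(cib_xml)
    path = ("status/node_state/lrm/lrm_resources/"
            f"lrm_resource[@id='{rsc_id}']/lrm_rsc_op[@operation='notify']")
    return len(root.findall(path))

print(count_notify_ops(SAMPLE_CIB))  # 1: a notify op is recorded (pre-fix behavior)
```

The sample fragment records one notify operation, mirroring the pre-fix state; on a patched cluster the same query should return 0.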