Bug 2125344
Summary: During a rolling upgrade, monitor operations are not being communicated between nodes as expected.
Product: Red Hat Enterprise Linux 9
Reporter: Ken Gaillot <kgaillot>
Component: pacemaker
Assignee: Ken Gaillot <kgaillot>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: urgent
Docs Contact: Steven J. Levine <slevine>
Priority: urgent
Version: 9.1
CC: cfeist, cluster-maint, cluster-qe, cnewsom, jobaker, jrehova, mjuricek, nwahl, sbradley, slevine
Target Milestone: rc
Keywords: Regression, Triaged, ZStream
Target Release: 9.2
Flags: pm-rhel: mirror+
Hardware: All
OS: All
Fixed In Version: pacemaker-2.1.5-1.el9
Doc Type: Bug Fix
Doc Text:
.OCF resource agent metadata actions can now call `crm_node` without causing unexpected fencing
As of RHEL 8.5, OCF resource agent metadata actions blocked the controller, and `crm_node` queries were performed as controller requests. As a result, if an agent's metadata action called `crm_node`, the call blocked the controller for 30 seconds, until the metadata action timed out. This could cause other actions to fail and the node to be fenced.
With this fix, the controller performs metadata actions asynchronously, so an OCF resource agent metadata action can now call `crm_node` without issue.
Clone Of: 2121852
Last Closed: 2023-05-09 07:18:17 UTC
Type: Bug
Target Upstream Version: 2.1.5
Bug Depends On: 2121852
Bug Blocks: 2128036, 2128037
Description
Ken Gaillot
2022-09-08 16:53:15 UTC
QA:

Reproducer:
* Configure a cluster.
* Edit or copy /usr/lib/ocf/resource.d/pacemaker/Dummy and add "crm_node -l" to the metadata action.
* Create a resource using Dummy (or the copy).

Actual results: The cluster hangs for 30 seconds, and the controller then logs a metadata action timeout.

Expected results: The cluster proceeds normally.

Fixed in upstream main branch as of commit bc852fe3

Verification:
* 2-node cluster

Version of pacemaker:
> [root@virt-261 ~]# rpm -q pacemaker
> pacemaker-2.1.5-2.el9.x86_64

Edited /usr/lib/ocf/resource.d/pacemaker/Dummy to add "crm_node -l >/dev/null 2>&1":
> [root@virt-261 pacemaker]# vim Dummy
> [root@virt-261 pacemaker]# grep -A1 '^meta_data()' /usr/lib/ocf/resource.d/pacemaker/Dummy
> meta_data() {
> crm_node -l >/dev/null 2>&1

Created pcs resource:
> [root@virt-261 log]# date && pcs resource create rsc_dummy ocf:pacemaker:Dummy
> Mon Dec 5 04:33:04 PM CET 2022

Result: the cluster proceeds normally (/var/log/messages):
> [root@virt-261 log]# grep -A2 'rsc_dummy' ./messages
> Dec 5 16:33:06 virt-261 pacemaker-controld[56224]: notice: Requesting local execution of probe operation for rsc_dummy on virt-261
> Dec 5 16:33:06 virt-261 pacemaker-controld[56224]: notice: Result of probe operation for rsc_dummy on virt-261: not running
> Dec 5 16:33:06 virt-261 pacemaker-controld[56224]: notice: Requesting local execution of start operation for rsc_dummy on virt-261
> Dec 5 16:33:06 virt-261 pacemaker-controld[56224]: notice: Result of start operation for rsc_dummy on virt-261: ok
> Dec 5 16:33:06 virt-261 pacemaker-controld[56224]: notice: Requesting local execution of monitor operation for rsc_dummy on virt-261
> Dec 5 16:33:06 virt-261 pacemaker-controld[56224]: notice: Result of monitor operation for rsc_dummy on virt-261: ok
> [root@virt-261 log]# pcs resource
>  * rsc_dummy (ocf:pacemaker:Dummy): Started virt-261

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2150
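For convenience, the reproducer above can be condensed into a short shell sequence. This is a sketch, not part of the original report: it assumes an already-configured pcs cluster, GNU sed, and that the agent's meta_data() function begins exactly with "meta_data() {" as shown in the grep output above. QA made the edit with vim; the sed one-liner here is an equivalent shortcut.

    # Run "crm_node -l" at the start of the Dummy agent's meta_data()
    # function (same change as the QA edit above).
    sed -i '/^meta_data() {/a crm_node -l >/dev/null 2>&1' \
        /usr/lib/ocf/resource.d/pacemaker/Dummy

    # Create a resource using the edited agent, then check the logs.
    # With pacemaker-2.1.5, the probe/start/monitor results appear within
    # seconds; on affected versions, the controller instead blocked for
    # 30 seconds and logged a metadata action timeout.
    pcs resource create rsc_dummy ocf:pacemaker:Dummy
    grep 'rsc_dummy' /var/log/messages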