Bug 1543278 - `crm_resource --wait` waits indefinitely after upgrade of one of the nodes
Summary: `crm_resource --wait` waits indefinitely after upgrade of one of the nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: 7.4-Alt
Assignee: Ken Gaillot
QA Contact: Patrik Hagara
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-02-08 06:50 UTC by Ondrej Faměra
Modified: 2021-12-10 15:38 UTC
CC List: 6 users

Fixed In Version: pacemaker-1.1.18-12.el7
Doc Type: No Doc Update
Doc Text:
Trivial change
Clone Of:
Environment:
Last Closed: 2018-10-30 07:57:56 UTC
Target Upstream Version:
Embargoed:


Attachments
pe-input from Master node (8.99 KB, application/x-gzip), uploaded 2018-02-09 04:03 UTC by Ondrej Faměra
outputs from cluster with timestamps reproducing the issue (20.00 KB, text/plain), uploaded 2018-02-09 04:04 UTC by Ondrej Faměra


Links
Red Hat Knowledge Base (Solution) 3347761, last updated 2018-02-08 06:51:10 UTC
Red Hat Product Errata RHBA-2018:3055, last updated 2018-10-30 07:59:04 UTC

Description Ondrej Faměra 2018-02-08 06:50:14 UTC
=== Description of problem:
`crm_resource --wait` waits indefinitely after one of the nodes in the cluster is
updated from version 1.1.16-12.el7 or 1.1.16-12.el7_4.2 to a newer version.
The cluster must also contain fence_scsi and a resource whose monitor operation has
the 'on-fail=fence' option. See 'Steps to Reproduce' for more details.

=== Version-Release number of selected component (if applicable):
- 1.1.16-12.el7
- 1.1.16-12.el7_4.2
Affected when updating from the versions above to any of the versions below:
- 1.1.16-12.el7_4.7
- 1.1.18-10.el7

=== How reproducible:
Always

=== Steps to Reproduce:
1. Install cluster with one of following pacemaker packages
  # yum install pacemaker-1.1.16-12.el7 pacemaker-cli-1.1.16-12.el7 pacemaker-cluster-libs-1.1.16-12.el7 pacemaker-libs-1.1.16-12.el7 pcs resource-agents fence-agents-scsi
  # yum install pacemaker-1.1.16-12.el7_4.2 pacemaker-cli-1.1.16-12.el7_4.2 pacemaker-cluster-libs-1.1.16-12.el7_4.2 pacemaker-libs-1.1.16-12.el7_4.2 pcs resource-agents fence-agents-scsi
2. Create cluster and start cluster
  # pcs cluster setup --name='justtest' fastvm-rhel-7-4-113 fastvm-rhel-7-4-114
  # pcs cluster start --all
3. Configure fence_scsi and one resource with 'on-fail=fence' (the controld resource is used in the example below; /dev/sdb is a shared drive between the nodes)
  # pcs stonith create fence_scsi fence_scsi devices=/dev/sdb pcmk_host_list="fastvm-rhel-7-4-113 fastvm-rhel-7-4-114" pcmk_reboot_action=off pcmk_monitor_action=metadata meta provides="unfencing"
  # pcs resource create dlm controld op monitor interval=30s on-fail=fence meta interleave=true ordered=true --clone
4. Wait for the cluster to stabilize (`crm_resource --wait` should return within a few seconds)
5. Stop one of the nodes
  # pcs cluster stop
6. Update the pacemaker packages on this node to one of the newer versions mentioned above (or any later packages)
  # yum update pacemaker pacemaker-cli pacemaker-cluster-libs pacemaker-libs
7. Start the cluster on this node after the upgrade and let it rejoin
  # pcs cluster start
8. Run `crm_resource --wait` to wait for the cluster to stabilize

=== Actual results:
`crm_resource --wait` waits indefinitely (or until the specified timeout expires; the default is 1 hour).
The output looks similar to the following:

    info: determine_online_status_fencing:      Node fastvm-rhel-7-4-113 is active
    info: determine_online_status:      Node fastvm-rhel-7-4-113 is online
    info: determine_online_status_fencing:      Node fastvm-rhel-7-4-114 is active
    info: determine_online_status:      Node fastvm-rhel-7-4-114 is online
    info: unpack_node_loop:     Node 1 is already processed
    info: unpack_node_loop:     Node 2 is already processed
    info: unpack_node_loop:     Node 1 is already processed
    info: unpack_node_loop:     Node 2 is already processed
    info: common_print: fence_scsi      (stonith:fence_scsi):   Started fastvm-rhel-7-4-113
    info: clone_print:   Clone Set: dlm-clone [dlm]
    info: short_print:       Started: [ fastvm-rhel-7-4-113 fastvm-rhel-7-4-114 ]
    info: common_print: test    (ocf::pacemaker:Dummy): Started fastvm-rhel-7-4-114
   error: StopRsc:      Stopping dlm:0 until fastvm-rhel-7-4-113 can be unfenced
   error: StopRsc:      Stopping dlm:1 until fastvm-rhel-7-4-114 can be unfenced
   error: StopRsc:      Stopping test until fastvm-rhel-7-4-114 can be unfenced
  notice: LogNodeActions:        * Fence (on) fastvm-rhel-7-4-114 'Required by dlm:1'
  notice: LogNodeActions:        * Fence (on) fastvm-rhel-7-4-113 'Required by dlm:0'
    info: LogActions:   Leave   fence_scsi      (Started fastvm-rhel-7-4-113)
  notice: LogAction:     * Restart    dlm:0   ( fastvm-rhel-7-4-113 )   due to required stonith
  notice: LogAction:     * Restart    dlm:1   ( fastvm-rhel-7-4-114 )   due to required stonith
  notice: LogAction:     * Restart    test    ( fastvm-rhel-7-4-114 )   due to required stonith
    info: wait_till_stable:     Waiting up to 3596 seconds for cluster actions to complete
  ...

=== Expected results:
`crm_resource --wait` finishes within a few seconds, indicating that the cluster is in a stable state.
  # crm_resource --wait -VVVV
    info: wait_till_stable:     Waiting up to 3600 seconds for cluster actions to complete
    info: determine_online_status_fencing:      Node fastvm-rhel-7-4-113 is active
    info: determine_online_status:      Node fastvm-rhel-7-4-113 is online
    info: determine_online_status_fencing:      Node fastvm-rhel-7-4-114 is active
    info: determine_online_status:      Node fastvm-rhel-7-4-114 is online
    info: unpack_node_loop:     Node 1 is already processed
    info: unpack_node_loop:     Node 2 is already processed
    info: unpack_node_loop:     Node 1 is already processed
    info: unpack_node_loop:     Node 2 is already processed
    info: common_print: fence_scsi      (stonith:fence_scsi):   Started fastvm-rhel-7-4-113
    info: clone_print:   Clone Set: dlm-clone [dlm]
    info: short_print:       Started: [ fastvm-rhel-7-4-113 fastvm-rhel-7-4-114 ]
  notice: LogNodeActions:        * Fence fastvm-rhel-7-4-114
  notice: LogNodeActions:        * Fence fastvm-rhel-7-4-113
    info: LogActions:   Leave   fence_scsi      (Started fastvm-rhel-7-4-113)
    info: LogActions:   Leave   dlm:0   (Started fastvm-rhel-7-4-113)
    info: LogActions:   Leave   dlm:1   (Started fastvm-rhel-7-4-114)
    info: crm_xml_cleanup:      Cleaning up memory from libxml2

=== Additional info:
The issue doesn't appear without a resource configured with 'on-fail=fence'.
The issue doesn't appear if fence_scsi is not used (even if 'on-fail=fence' is set).
Either of the above can be used to work around the problem (see the sketch at the end of this section).
The issue doesn't appear when upgrading from 1.1.16-12.el7_4.4 to a newer version, or when
upgrading from 1.1.15-11.el7_3.5 to 1.1.16-12.el7.
It also doesn't appear when upgrading between 1.1.18-8.el7 and 1.1.18-10.el7 on RHEL 7.5 beta.
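
A minimal sketch of the first workaround (dropping 'on-fail=fence' from the monitor operation so
a monitor failure no longer requires fencing). The resource name and interval match the reproducer
above; the exact pcs syntax may differ between pcs versions:

  # pcs resource update dlm op monitor interval=30s on-fail=restart
  # crm_resource --wait -VVVV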

Comment 2 Ken Gaillot 2018-02-08 15:30:12 UTC
Can you attach a policy engine file from after the upgrade? One of the older nodes will have messages from around that time like "Calculated transition ..., saving inputs in ..." with the filename. Any such file during the time the wait was hanging should be sufficient.
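
For reference, a minimal sketch of how to locate and collect those files on the DC; the log path
and pengine directory below are the usual RHEL 7 defaults and are assumptions, not something
confirmed in this report:

  # grep "saving inputs" /var/log/cluster/corosync.log
  # ls -lt /var/lib/pacemaker/pengine/ | head
  # tar czf pengine-inputs.tar.gz -C /var/lib/pacemaker/pengine .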

Comment 3 Ondrej Faměra 2018-02-09 04:03:34 UTC
Created attachment 1393523 [details]
pe-input from Master node

'pengine-node113.tar.gz' archive with pe-input* files from node 113

Comment 4 Ondrej Faměra 2018-02-09 04:04:40 UTC
Created attachment 1393524 [details]
outputs from cluster with timestamps reproducing the issue

Comment 5 Reid Wahl 2018-02-12 00:46:18 UTC
As noted earlier in the case: in pacemaker-1.1.16-12.el7_4.4.src.rpm, file 089-unfencing.patch, "[PATCH 02/12] Support unfencing of remote nodes", I see some changes made to the StopRsc and StartRsc functions in native.c that may be relevant. Let me know if I'm barking up the wrong tree. For example:

diff --git a/pengine/native.c b/pengine/native.c
index dd5ff18..bc59405 100644
--- a/pengine/native.c
+++ b/pengine/native.c
@@ -1355,7 +1355,7 @@ native_internal_constraints(resource_t * rsc, pe_working_set_t * data_set)

         g_hash_table_iter_init(&iter, rsc->allowed_nodes);
         while (g_hash_table_iter_next(&iter, NULL, (void **)&node)) {
-            action_t *unfence = pe_fence_op(node, "on", TRUE, data_set);
+            action_t *unfence = pe_fence_op(node, "on", TRUE, __FUNCTION__, data_set);

             crm_debug("Ordering any stops of %s before %s, and any starts after",
                       rsc->id, unfence->uuid);
@@ -2455,6 +2455,16 @@ StopRsc(resource_t * rsc, node_t * next, gboolean optional, pe_working_set_t * d
         if (is_set(data_set->flags, pe_flag_remove_after_stop)) {
             DeleteRsc(rsc, current, optional, data_set);
         }
+
+        if(is_set(rsc->flags, pe_rsc_needs_unfencing)) {
+            action_t *unfence = pe_fence_op(current, "on", TRUE, __FUNCTION__, data_set);
+            const char *unfenced = g_hash_table_lookup(current->details->attrs, XML_NODE_IS_UNFENCED);
+
+            order_actions(stop, unfence, pe_order_implies_first);
+            if (unfenced == NULL || safe_str_eq("0", unfenced)) {
+                pe_proc_err("Stopping %s until %s can be unfenced", rsc->id, current->details->uname);
+            }
+        }
     }

     return TRUE;
@@ -2468,9 +2478,25 @@ StartRsc(resource_t * rsc, node_t * next, gboolean optional, pe_working_set_t *
     CRM_ASSERT(rsc);
     pe_rsc_trace(rsc, "%s on %s %d", rsc->id, next ? next->details->uname : "N/A", optional);
     start = start_action(rsc, next, TRUE);
+
+    if(is_set(rsc->flags, pe_rsc_needs_unfencing)) {
+        action_t *unfence = pe_fence_op(next, "on", TRUE, __FUNCTION__, data_set);
+        const char *unfenced = g_hash_table_lookup(next->details->attrs, XML_NODE_IS_UNFENCED);
+
+        order_actions(unfence, start, pe_order_implies_then);
+
+        if (unfenced == NULL || safe_str_eq("0", unfenced)) {
+            char *reason = crm_strdup_printf("Required by %s", rsc->id);
+            trigger_unfencing(NULL, next, reason, NULL, data_set);
+            free(reason);
+        }
+    }
+
     if (is_set(start->flags, pe_action_runnable) && optional == FALSE) {
         update_action_flags(start, pe_action_optional | pe_action_clear, __FUNCTION__, __LINE__);
     }
+
+
     return TRUE;
 }

@@ -2820,14 +2846,9 @@ native_create_probe(resource_t * rsc, node_t * node, action_t * complete,
      * probed, we know the state of all resources that require
      * unfencing and that unfencing occurred.
      */
-    if(is_set(rsc->flags, pe_rsc_fence_device) && is_set(data_set->flags, pe_flag_enable_unfencing)) {
-        trigger_unfencing(NULL, node, "node discovery", probe, data_set);
-        probe->priority = INFINITY; /* Ensure this runs if unfencing succeeds */
-
-    } else if(is_set(rsc->flags, pe_rsc_needs_unfencing)) {
-        action_t *unfence = pe_fence_op(node, "on", TRUE, data_set);
-
-        order_actions(probe, unfence, pe_order_optional);
+    if(is_set(rsc->flags, pe_rsc_needs_unfencing)) {
+        action_t *unfence = pe_fence_op(node, "on", TRUE, __FUNCTION__, data_set);
+        order_actions(unfence, probe, pe_order_optional);
     }

     /*

Comment 6 Ken Gaillot 2018-02-13 00:19:53 UTC
(In reply to Reid Wahl from comment #5)
> As noted earlier in the case: in pacemaker-1.1.16-12.el7_4.4.src.rpm, file
> 089-unfencing.patch, "[PATCH 02/12] Support unfencing of remote nodes", I
> see some changes made to the StopRsc and StartRsc functions in native.c that
> may be relevant. Let me know if I'm barking up the wrong tree.

It's indirectly relevant.

Pacemaker's "policy engine" is the code that decides what (if anything) needs to be done at the moment. The node with the longest uptime (always the older node in the attached policy engine files) will be elected the "DC", which runs the policy engine as needed.

"crm_resource --wait" also runs the policy engine on the current configuration, and returns once no more actions are seen to be necessary.

Based on the output in the Description, "crm_resource --wait" is being run on the upgraded node.

The older policy engine (run by the DC to determine actual actions) is generating a different list of required actions than the newer policy engine run by crm_resource. Thus, crm_resource is waiting for additional actions that the older DC will never do.

This raises an interesting (and so far unaddressed) question of whether "crm_resource --wait" can be supported on nodes with a different pacemaker version than the DC. It's likely impossible to guarantee any reliability of that.

I think the best we can do is make "crm_resource --wait" print a warning about the possibility if it is run in that situation. We can also update any documentation around upgrades to mention the issue.
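
Until such a warning exists, the mixed-version situation can be spotted manually by comparing the
DC's advertised version against the local packages; a rough sketch (output formats vary by release):

  # crm_mon -1 | grep "Current DC"   # DC node and the pacemaker version it is running
  # rpm -q pacemaker                 # pacemaker version installed on the local node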

Separately, I'm not sure the actions chosen by the newer version are ideal. I'm going to take a closer look at that as well.

Comment 7 Ken Gaillot 2018-02-13 00:43:05 UTC
A possible full solution would be to reimplement the crm_resource --wait code as a new crmd operation, which the local crmd would pass to the DC. That's obviously a bigger project though, and could have undesirable side effects (additional load on the DC, special handling needed if the DC election changes during the wait, etc.), so I'm not sure how feasible it is.

Comment 8 Ken Gaillot 2018-02-13 01:17:22 UTC
(In reply to Ken Gaillot from comment #6)
> Separately, I'm not sure the actions chosen by the newer version are ideal.
> I'm going to take a closer look at that as well.

The new actions are fine. The newer code utilizes a new node attribute to record the time of last unfencing. Since the configuration doesn't have that attribute, it will decide to unfence both nodes. That would be an expected side effect of an upgrade across those versions.
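
One way to check whether that unfencing attribute has been recorded yet is to dump the CIB and
grep for it; the pattern below is an assumption about the attribute name, which may differ
between versions:

  # cibadmin --query | grep -i unfenced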

Comment 10 Ken Gaillot 2018-03-02 23:16:14 UTC
Warning implemented by upstream commit 8978473

Comment 11 Ken Gaillot 2018-04-24 00:00:24 UTC
QA: Test procedure:

* Configure a cluster with at least two nodes using 7.4 pacemaker packages.
* Upgrade pacemaker on one of the nodes (to 7.5 before fix, 7.6 after fix).
* Run "crm_resource --wait" on the upgraded node.

The 7.5 pacemaker packages will not give any indication that anything is wrong. The 7.6 packages will print a warning about mixed-version clusters.

Comment 14 Patrik Hagara 2018-08-20 13:04:47 UTC
before:
=======

Both cluster nodes running version 1.1.16-12.el7, one node upgraded to 1.1.18-10.el7 (7.5 packages).

> [root@virt-253 ~]# pcs status
> Cluster name: bzzt
> Stack: corosync
> Current DC: virt-254.cluster-qe.lab.eng.brq.redhat.com (version 1.1.16-12.el7-94ff4df) - partition with quorum
> Last updated: Mon Aug 20 14:40:07 2018
> Last change: Mon Aug 20 14:40:04 2018 by root via cibadmin on virt-253.cluster-qe.lab.eng.brq.redhat.com
> 
> 2 nodes configured
> 1 resource configured
> 
> Online: [ virt-253.cluster-qe.lab.eng.brq.redhat.com virt-254.cluster-qe.lab.eng.brq.redhat.com ]
> 
> Full list of resources:
> 
>  dummy	(ocf::pacemaker:Dummy):	Started virt-253.cluster-qe.lab.eng.brq.redhat.com
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> [root@virt-253 ~]# pcs node standby virt-254.cluster-qe.lab.eng.brq.redhat.com --wait
> [root@virt-254 ~]# yum install -y /mnt/redhat/brewroot/packages/pacemaker/1.1.18/10.el7/x86_64/pacemaker{,-cli,-cluster-libs,-libs}-1.1.18-10.el7*.rpm
> [root@virt-253 ~]# pcs node unstandby virt-254.cluster-qe.lab.eng.brq.redhat.com --wait
> [root@virt-254 ~]# crm_resource --wait
> [root@virt-254 ~]# echo $?
> 0

After upgrading one of the nodes to 7.5 pacemaker packages, "crm_resource --wait" does not print any warning when invoked from the upgraded node. Depending on the cluster configuration, the command may exit cleanly or wait for a cluster transition that will never happen.


after:
======

Both cluster nodes running version 1.1.16-12.el7, one node upgraded to 1.1.19-7.el7 (7.6 packages).

> [root@virt-253 ~]# pcs node standby virt-254.cluster-qe.lab.eng.brq.redhat.com --wait
> [root@virt-254 ~]# yum install -y /mnt/redhat/brewroot/packages/pacemaker/1.1.19/7.el7/x86_64/pacemaker{,-cli,-cluster-libs,-libs}-1.1.19-7.el7*.rpm
> [root@virt-253 ~]# pcs node unstandby virt-254.cluster-qe.lab.eng.brq.redhat.com --wait
> [root@virt-254 ~]# crm_resource --wait
> warning: --wait command may not work properly in mixed-version cluster
> [root@virt-254 ~]# echo $?
> 0

With the 7.6 packages, the "crm_resource --wait" command behaves the same as before (i.e. it might wait indefinitely), but a new warning is printed on every invocation.

Marking verified in 1.1.19-7.el7.

Comment 16 errata-xmlrpc 2018-10-30 07:57:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3055

