Bug 2121852 - During a rolling upgrade, monitor operations are not being communicated between nodes as expected.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pacemaker
Version: 8.6
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 8.8
Assignee: Ken Gaillot
QA Contact: cluster-qe@redhat.com
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Depends On:
Blocks: 2125344 2125587 2125588
 
Reported: 2022-08-26 20:39 UTC by Chad Newsom
Modified: 2023-05-16 09:53 UTC
CC List: 9 users

Fixed In Version: pacemaker-2.1.5-1.el8
Doc Type: Bug Fix
Doc Text:
.OCF resource agent metadata actions can now call `crm_node` without causing unexpected fencing

As of RHEL 8.5, OCF resource agent metadata actions blocked the controller and `crm_node` queries performed controller requests. As a result, if an agent's metadata action called `crm_node`, it blocked the controller for 30 seconds until the action timed out. This could cause other actions to fail and the node to be fenced. With this fix, the controller now performs metadata actions asynchronously. An OCF resource agent metadata action can now call `crm_node` without issue.
Clone Of:
: 2125344 2125587 2125588 (view as bug list)
Environment:
Last Closed: 2023-05-16 08:35:22 UTC
Type: Bug
Target Upstream Version: 2.1.5
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CLUSTERQE-6257 0 None None None 2022-12-01 17:11:42 UTC
Red Hat Issue Tracker RHELPLAN-132499 0 None None None 2022-08-26 20:46:28 UTC
Red Hat Knowledge Base (Solution) 6973540 0 None None None 2022-08-30 18:39:23 UTC
Red Hat Product Errata RHBA-2023:2818 0 None None None 2023-05-16 08:36:10 UTC

Description Chad Newsom 2022-08-26 20:39:40 UTC
Description of problem:

During a rolling upgrade, when the updated node rejoins, the non-updated node logs `Node nodeA did not send monitor result (via controller) within 120000ms (action timeout plus cluster-delay)`. This results in a monitor failure, then a stop failure, and finally fencing.


Aug 26 14:32:35 nodeB pacemaker-controld[50166]: error: Node nodeA did not send monitor result (via controller) within 120000ms (action timeout plus cluster-delay)
Aug 26 14:32:35 nodeB pacemaker-controld[50166]: error: [Action   55]: In-flight resource op lsnr_pruledbprd_monitor_0    on nodeA (priority: 0, waiting: (null))
Aug 26 14:32:35 nodeB pacemaker-controld[50166]: warning: rsc_op 55: lsnr_pruledbprd_monitor_0 on nodeA timed out



It seems that nodeB is not receiving monitor results from nodeA within the expected timeframe, which is causing resource failures that lead to fence events. Based on the timing of the log messages, this appears to be a bug. Perhaps there is a difference in how monitor operations are scheduled between pacemaker-2.1.0-8.el8 and pacemaker-2.1.2-4.el8_6.2.

For example, at 14:30:35, node B initiates probes of all the resources on node A. That's when the timer starts counting down. But node A runs them serially; during this 30-second window, for instance, it is running only ONE of the probes:

Aug 26 14:30:35 nodeA pacemaker-controld[47747]: notice: Requesting local execution of probe operation for srdf_pruledb_PRD on nodeA
Aug 26 14:31:05 nodeA pacemaker-controld[47747]: error: Failed to retrieve meta-data for ocf:sky:emc_srdf

It fails to get metadata for several of these resources, and each of them takes 30 seconds to time out. So it never even tries to run the lsnr probe (the resource that ultimately caused the failure message on node B) within the 120 seconds allowed for node B to receive the result. Here, "action timeout plus cluster delay" means the 60-second action timeout plus an additional 60-second (by default) grace period in which node B must receive the result after the timeout expires, for 120 seconds total.
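As a hedged aside for anyone tracing this: the two values that make up that 120-second window can be inspected on a live cluster. The resource name below is taken from the logs above, and the exact pcs syntax varies by pcs version (older releases use `pcs resource show`, newer ones `pcs property config`):

```
# Effective cluster-delay (prints nothing or an error if the property is unset,
# in which case the 60-second default applies):
pcs property show cluster-delay
crm_attribute --type crm_config --name cluster-delay --query

# Timeouts configured on the resource's operations (monitor/probe):
pcs resource config lsnr_pruledbprd
```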



Version-Release number of selected component (if applicable):

Node not receiving the monitor result is on pacemaker-2.1.0-8.el8

Updated node is on pacemaker-2.1.2-4.el8_6.2

Comment 1 Reid Wahl 2022-08-27 00:00:11 UTC
There are two issues being reported here:
1. The probe operations are initiated in parallel by the DC (running pacemaker-2.1.0-8.el8), but they appear to be executed in a roughly serial manner on node A (looks like two at a time).
2. The metadata actions are timing out, which should not happen.

More of the relevant logs are below. Each failed metadata call times out after the hard-coded 30-second timeout. I omitted the "timed out after 30000 ms" messages below, but they're in pacemaker.log.
```
# # # Node B (DC, pacemaker-2.1.0) # # #
# The DC (node B) initiates probes for all resources on node A
Aug 26 14:30:35 nodeB pacemaker-controld  [50166] (te_rsc_command)   notice: Initiating monitor operation nodeA-ilo_monitor_0 on nodeA | action 28
Aug 26 14:30:35 nodeB pacemaker-controld  [50166] (te_rsc_command)   notice: Initiating monitor operation nodeB-ilo_monitor_0 on nodeA | action 29
Aug 26 14:30:35 nodeB pacemaker-controld  [50166] (te_rsc_command)   notice: Initiating monitor operation srdf_pruledb_PRD_monitor_0 on nodeA | action 30
Aug 26 14:30:35 nodeB pacemaker-controld  [50166] (te_rsc_command)   notice: Initiating monitor operation srdf_pruledbarc_PRD_monitor_0 on nodeA | action 31
Aug 26 14:30:35 nodeB pacemaker-controld  [50166] (te_rsc_command)   notice: Initiating monitor operation srdf_pruledbredo_PRD_monitor_0 on nodeA | action 32
Aug 26 14:30:35 nodeB pacemaker-controld  [50166] (te_rsc_command)   notice: Initiating monitor operation srdf_pruledbfra_PRD_monitor_0 on nodeA | action 33
Aug 26 14:30:35 nodeB pacemaker-controld  [50166] (te_rsc_command)   notice: Initiating monitor operation srdf_pruledbgg_PRD_monitor_0 on nodeA | action 34
Aug 26 14:30:35 nodeB pacemaker-controld  [50166] (te_rsc_command)   notice: Initiating monitor operation vg_pruledbredo_PRD_monitor_0 on nodeA | action 35
Aug 26 14:30:35 nodeB pacemaker-controld  [50166] (te_rsc_command)   notice: Initiating monitor operation vg_pruledbarc_PRD_monitor_0 on nodeA | action 36
... and so on ...

# We quickly start seeing lost actions. This means node B hasn't received any result (success or failure) from node A
# within (monitor timeout + 60-second cluster delay).
Aug 26 14:31:55 nodeB pacemaker-controld  [50166] (action_timer_callback)    error: Node nodeA did not send monitor result (via controller) within 80000ms (action timeout plus cluster-delay)
Aug 26 14:31:55 nodeB pacemaker-controld  [50166] (log_synapse_action)       error: [Action   30]: In-flight resource op srdf_pruledb_PRD_monitor_0   on nodeA (priority: 0, waiting: (null))
Aug 26 14:31:55 nodeB pacemaker-controld  [50166] (abort_transition_graph)   notice: Transition 6 aborted: Action lost | source=action_timer_callback:690 complete=false
Aug 26 14:31:55 nodeB pacemaker-controld  [50166] (controld_record_action_timeout)   warning: rsc_op 30: srdf_pruledb_PRD_monitor_0 on nodeA timed out

# The problem report refers to this one:
Aug 26 14:32:35 nodeB pacemaker-controld  [50166] (action_timer_callback)    error: Node nodeA did not send monitor result (via controller) within 120000ms (action timeout plus cluster-delay)
Aug 26 14:32:35 nodeB pacemaker-controld  [50166] (log_synapse_action)       error: [Action   55]: In-flight resource op lsnr_pruledbprd_monitor_0    on nodeA (priority: 0, waiting: (null))
Aug 26 14:32:35 nodeB pacemaker-controld  [50166] (abort_transition_graph)   info: Transition 6 aborted: Action lost | source=action_timer_callback:690 complete=false
Aug 26 14:32:35 nodeB pacemaker-controld  [50166] (controld_record_action_timeout)   warning: rsc_op 55: lsnr_pruledbprd_monitor_0 on nodeA timed out


# # # Node A (pacemaker-2.1.2) # # #
# probe running for srdf_pruledb_PRD
Aug 26 14:30:35 nodeA pacemaker-controld  [47747] (do_lrm_rsc_op)    notice: Requesting local execution of probe operation for srdf_pruledb_PRD on nodeA | transition_key=30:6:7:57303c79-0b90-48d2-b386-87a9f518edd3 op_key=srdf_pruledb_PRD_monitor_0

Aug 26 14:31:05 nodeA pacemaker-controld  [47747] (controld_get_rsc_metadata)        warning: Failed to get metadata for srdf_pruledb_PRD (ocf:sky:emc_srdf): Input/output error

# probes running for srdf_pruledb_PRD and srdf_pruledbarc_PRD
Aug 26 14:31:05 nodeA pacemaker-controld  [47747] (do_lrm_rsc_op)    notice: Requesting local execution of probe operation for srdf_pruledbarc_PRD on nodeA | transition_key=31:6:7:57303c79-0b90-48d2-b386-87a9f518edd3 op_key=srdf_pruledbarc_PRD_monitor_0

Aug 26 14:31:35 nodeA pacemaker-controld  [47747] (controld_get_rsc_metadata)        warning: Failed to get metadata for srdf_pruledbarc_PRD (ocf:sky:emc_srdf): Input/output error

Aug 26 14:32:05 nodeA pacemaker-controld  [47747] (controld_get_rsc_metadata)        warning: Failed to get metadata for srdf_pruledb_PRD (ocf:sky:emc_srdf): Input/output error

Aug 26 14:32:35 nodeA pacemaker-controld  [47747] (controld_get_rsc_metadata)        warning: Failed to get metadata for srdf_pruledbarc_PRD (ocf:sky:emc_srdf): Input/output error

# It seems we give up on srdf_pruledb_PRD and srdf_pruledbarc_PRD; we never see them again
# probe running for srdf_pruledbredo_PRD
Aug 26 14:32:35 nodeA pacemaker-controld  [47747] (do_lrm_rsc_op)    notice: Requesting local execution of probe operation for srdf_pruledbredo_PRD on nodeA | transition_key=32:6:7:57303c79-0b90-48d2-b386-87a9f518edd3 op_key=srdf_pruledbredo_PRD_monitor_0

Aug 26 14:33:05 nodeA pacemaker-controld  [47747] (controld_get_rsc_metadata)        warning: Failed to get metadata for srdf_pruledbredo_PRD (ocf:sky:emc_srdf): Input/output error

# probes running for srdf_pruledbredo_PRD and srdf_pruledbfra_PRD
Aug 26 14:33:05 nodeA pacemaker-controld  [47747] (do_lrm_rsc_op)    notice: Requesting local execution of probe operation for srdf_pruledbfra_PRD on nodeA | transition_key=33:6:7:57303c79-0b90-48d2-b386-87a9f518edd3 op_key=srdf_pruledbfra_PRD_monitor_0
Aug 26 14:33:05  emc_srdf(srdf_pruledbredo_PRD)[48554]:    INFO: Validate check of symclipath value /opt/emc/SYMCLI/bin ok
Aug 26 14:33:05  emc_srdf(srdf_pruledbredo_PRD)[48554]:    INFO: Validate check of VG pruledbredo_PRD ok
Aug 26 14:33:05  emc_srdf(srdf_pruledbredo_PRD)[48554]:    INFO: vg_sysid = nodeB
Aug 26 14:33:05  emc_srdf(srdf_pruledbredo_PRD)[48554]:    INFO: Volume group pruledbredo_PRD not active on this host
Aug 26 14:33:05  emc_srdf(srdf_pruledbredo_PRD)[48554]:    INFO: Not running on this host

Aug 26 14:33:35 nodeA pacemaker-controld  [47747] (controld_get_rsc_metadata)        warning: Failed to get metadata for srdf_pruledbfra_PRD (ocf:sky:emc_srdf): Input/output error

Aug 26 14:34:05 nodeA pacemaker-controld  [47747] (controld_get_rsc_metadata)        warning: Failed to get metadata for srdf_pruledbredo_PRD (ocf:sky:emc_srdf): Input/output error

Aug 26 14:34:35 nodeA pacemaker-controld  [47747] (controld_get_rsc_metadata)        warning: Failed to get metadata for srdf_pruledbfra_PRD (ocf:sky:emc_srdf): Input/output error

# It seems we give up on srdf_pruledbredo_PRD and srdf_pruledbfra_PRD; we never see them again
# probe running for srdf_pruledbgg_PRD
Aug 26 14:34:35 nodeA pacemaker-controld  [47747] (do_lrm_rsc_op)    notice: Requesting local execution of probe operation for srdf_pruledbgg_PRD on nodeA | transition_key=34:6:7:57303c79-0b90-48d2-b386-87a9f518edd3 op_key=srdf_pruledbgg_PRD_monitor_0

Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (controld_get_rsc_metadata)        warning: Failed to get metadata for srdf_pruledbgg_PRD (ocf:sky:emc_srdf): Input/output error
Aug 26 14:35:05  emc_srdf(srdf_pruledbgg_PRD)[48951]:    INFO: Validate check of symclipath value /opt/emc/SYMCLI/bin ok
Aug 26 14:35:05  emc_srdf(srdf_pruledbgg_PRD)[48951]:    INFO: Validate check of VG pruledbgg_PRD ok
Aug 26 14:35:05  emc_srdf(srdf_pruledbgg_PRD)[48951]:    INFO: vg_sysid = nodeA
Aug 26 14:35:05  emc_srdf(srdf_pruledbgg_PRD)[48951]:    INFO: Volume group pruledbgg_PRD not active on this host
Aug 26 14:35:05  emc_srdf(srdf_pruledbgg_PRD)[48951]:    INFO: Not running on this host

# probes running for srdf_pruledbgg_PRD and vg_pruledbredo_PRD
Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (do_lrm_rsc_op)    notice: Requesting local execution of probe operation for vg_pruledbredo_PRD on nodeA | transition_key=35:6:7:57303c79-0b90-48d2-b386-87a9f518edd3 op_key=vg_pruledbredo_PRD_monitor_0
Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (wait_for_sync_result)     info: LVM-activate_meta-data_0[48952] exited with status 0
Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (do_lrm_rsc_op)    notice: Requesting local execution of probe operation for vg_pruledbarc_PRD on nodeA | transition_key=36:6:7:57303c79-0b90-48d2-b386-87a9f518edd3 op_key=vg_pruledbarc_PRD_monitor_0
Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (do_lrm_rsc_op)    notice: Requesting local execution of probe operation for vg_pruledb_PRD on nodeA | transition_key=37:6:7:57303c79-0b90-48d2-b386-87a9f518edd3 op_key=vg_pruledb_PRD_monitor_0
Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (do_lrm_rsc_op)    notice: Requesting local execution of probe operation for vg_pruledbgg_PRD on nodeA | transition_key=38:6:7:57303c79-0b90-48d2-b386-87a9f518edd3 op_key=vg_pruledbgg_PRD_monitor_0
Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (do_lrm_rsc_op)    notice: Requesting local execution of probe operation for vg_pruledbfra_PRD on nodeA | transition_key=39:6:7:57303c79-0b90-48d2-b386-87a9f518edd3 op_key=vg_pruledbfra_PRD_monitor_0
Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (do_lrm_rsc_op)    notice: Requesting local execution of probe operation for fs_pruledbprd on nodeA | transition_key=40:6:7:57303c79-0b90-48d2-b386-87a9f518edd3 op_key=fs_pruledbprd_monitor_0
Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (wait_for_sync_result)     info: Filesystem_meta-data_0[48983] exited with status 0

Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (log_executor_event)       notice: Result of probe operation for vg_pruledbredo_PRD on nodeA: not running | CIB update 27, graph action confirmed; call=33 key=vg_pruledbredo_PRD_monitor_0 rc=7
Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (log_executor_event)       notice: Result of probe operation for vg_pruledbarc_PRD on nodeA: not running | CIB update 28, graph action confirmed; call=37 key=vg_pruledbarc_PRD_monitor_0 rc=7
Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (log_executor_event)       notice: Result of probe operation for vg_pruledb_PRD on nodeA: not running | CIB update 29, graph action confirmed; call=41 key=vg_pruledb_PRD_monitor_0 rc=7
Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (log_executor_event)       notice: Result of probe operation for vg_pruledbgg_PRD on nodeA: not running | CIB update 30, graph action confirmed; call=45 key=vg_pruledbgg_PRD_monitor_0 rc=7
Aug 26 14:35:05 nodeA pacemaker-controld  [47747] (log_executor_event)       notice: Result of probe operation for vg_pruledbfra_PRD on nodeA: not running | CIB update 31, graph action confirmed; call=49 key=vg_pruledbfra_PRD_monitor_0 rc=7

# LVM and Filesystem probes are finishing quickly and may be running in parallel, more than two at a time...
# ... maybe metadata actions are throttled but the probes themselves aren't? Not sure
# logs continue in a similar fashion
```

-----

My key takeaways:
1. Something does seem to be serialized by the executor on node A. I don't know whether the probes are inherently
   serialized, or whether they're forced to bottleneck if two of them have a pending metadata action. I'm skeptical
   that they're inherently serialized, since a bunch of LVM and Filesystem probes appeared to run simultaneously.
   Regardless, the DC (node B) initiated them all at once and started their timers simultaneously. This might be related
   to BZ 1858462, and it should be investigated further. We can keep this BZ open for now to see if it's the same thing
   or something different.
   
2. These are custom resource agents, and it now seems obvious that the metadata actions are doing things that they're
   not supposed to. Note messages like the following:
   
      Aug 26 14:33:05  emc_srdf(srdf_pruledbredo_PRD)[48554]:    INFO: Validate check of symclipath value /opt/emc/SYMCLI/bin ok
      Aug 26 14:33:05  emc_srdf(srdf_pruledbredo_PRD)[48554]:    INFO: Validate check of VG pruledbredo_PRD ok
      Aug 26 14:33:05  emc_srdf(srdf_pruledbredo_PRD)[48554]:    INFO: vg_sysid = nodeB
      Aug 26 14:33:05  emc_srdf(srdf_pruledbredo_PRD)[48554]:    INFO: Volume group pruledbredo_PRD not active on this host
      Aug 26 14:33:05  emc_srdf(srdf_pruledbredo_PRD)[48554]:    INFO: Not running on this host
   
   That suggests a bug in the resource agent. A resource agent's metadata action shouldn't perform validation. It
   especially shouldn't perform host-specific validation (like checking for a volume group being active); such host-
   specific validation checks should be reserved for the validate action with OCF_CHECK_LEVEL=10. I only showed logs
   from ocf:sky:emc_srdf, but the ocf:sky:CrossSiteIP resource agent shows similar behavior. (A minimal sketch of the expected action-dispatch structure follows the reading list below.)
   
   Further reading:
   - https://github.com/ClusterLabs/OCF-spec/blob/main/ra/1.1/resource-agent-api.md#mandatory-actions (for what the
     metadata action should do)
   - https://github.com/ClusterLabs/OCF-spec/blob/main/ra/1.1/resource-agent-api.md#check-levels (for the validate
     action with OCF_CHECK_LEVEL=10)
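
For contrast, here is a minimal, illustrative OCF agent skeleton (a hedged sketch using a hypothetical agent named "example", not the ocf:sky:emc_srdf code): the meta-data action prints static XML and exits before any validation or host-specific checks run, which is the structure the OCF spec expects.

```
#!/bin/sh
# Hypothetical skeleton for illustration only.

meta_data() {
    # meta-data must only emit XML; no validation, no crm_node or other cluster calls.
    cat <<'END'
<?xml version="1.0"?>
<resource-agent name="example" version="0.1">
  <version>1.0</version>
  <shortdesc lang="en">Illustrative example agent</shortdesc>
  <parameters/>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="stop" timeout="20s"/>
    <action name="monitor" timeout="20s" interval="10s"/>
    <action name="validate-all" timeout="20s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
END
}

example_validate() {
    # Host-specific checks (such as "is the volume group active here?") belong
    # under OCF_CHECK_LEVEL=10 in validate-all, never in meta-data.
    return 0
}

example_start()   { return 0; }   # stub
example_stop()    { return 0; }   # stub
example_monitor() { return 7; }   # stub: OCF_NOT_RUNNING

case "$1" in
    meta-data)     meta_data; exit 0 ;;                          # handled before anything else
    validate-all)  example_validate; exit $? ;;
    start)         example_validate && example_start; exit $? ;;
    stop)          example_stop; exit $? ;;
    monitor)       example_validate && example_monitor; exit $? ;;
    *)             exit 3 ;;                                      # OCF_ERR_UNIMPLEMENTED
esac
```

The key design point is simply dispatch order: meta-data (and usage) are answered and the agent exits before anything that touches the host or the cluster runs.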

Comment 2 Reid Wahl 2022-08-27 00:04:08 UTC
Follow-up: We need to try to reproduce this and establish whether this is a change between 2.1.0 and 2.1.2, or an issue that was already present and happened to appear during the rolling upgrade due to the metadata action taking so long.

Comment 3 Ken Gaillot 2022-08-29 20:15:51 UTC
The actions aren't being serialized; meta-data actions are always synchronous, which blocks anything else from being done. Also, meta-data actions consume the timeout of the probe they're being run for. (Bug 1850506 would fix both of these issues.)

Meta-data actions calling validation is suboptimal but relatively common. It generally indicates the agent calls validation first, before even looking at the action. A more important question is why are the meta-data actions timing out; they aren't logging for more than a few seconds, but it appears the agent isn't exiting.

I'm guessing these are just the system logs? The pacemaker.log might have more detail about why the actions aren't completing.

Comment 6 Reid Wahl 2022-08-29 21:16:57 UTC
(In reply to Ken Gaillot from comment #3)
> The actions aren't being serialized; meta-data actions are always
> synchronous, which blocks anything else from being done. Also, meta-data
> actions consume the timeout of the probe they're being run for. (Bug 1850506
> would fix both of these issues.)

Right, I saw that meta-data actions are always synchronous and figured they would consume the probe timeout.

I missed that these resources were in a group, so what was less clear to me was why at most two metadata actions (as part of probe actions) are running at a time. Two will run simultaneously, and another one won't start until we give up on one of those. That mostly makes sense now that I see they're grouped... although I would expect rsc2 to wait for rsc1 to time out, instead of going in batches of "probe rsc1 and rsc2, fail, probe rsc3 and rsc4, fail, ..."


> Meta-data actions calling validation is suboptimal but relatively common. It
> generally indicates the agent calls validation first, before even looking at
> the action.
>
> A more important question is why are the meta-data actions
> timing out; they aren't logging for more than a few seconds, but it appears
> the agent isn't exiting.

That's what I expected to find. However, the customer gave us the resource agent script and I don't see anything calling validate in the meta-data action. I attached the RA privately to this BZ.

They also said those validation and "vg_sysid =" outputs are from the monitor operation. It looks that way to me too. So it's as though the metadata action isn't completing until the whole probe is complete. (This RA doesn't distinguish between probes and recurring monitors.)


> I'm guessing these are just the system logs? The pacemaker.log might have
> more detail about why the actions aren't completing.

I'll upload the pacemaker.log files.

Comment 9 Reid Wahl 2022-08-30 00:26:13 UTC
Side note on the customer's situation -- since their RA's metadata depends on the output of `crm_node -l`, it adds two new required attributes to the metadata any time a node joins and runs the meta-data action...
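
For illustration of why that is awkward, node-name-dependent metadata generation of this sort typically looks roughly like the hedged sketch below (the parameter names are hypothetical, not taken from the customer's agent):

```
# Hypothetical sketch: metadata whose required parameters depend on the
# current node list, so the metadata changes every time a node joins.
for node in $(crm_node -l | awk '{print $2}'); do   # crm_node -l prints "id name state"
    cat <<END
  <parameter name="device_${node}" required="1">
    <content type="string"/>
  </parameter>
  <parameter name="group_${node}" required="1">
    <content type="string"/>
  </parameter>
END
done
```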

Comment 10 Ken Gaillot 2022-09-06 22:58:17 UTC
Observations:

* The agent (via get_local_param() in sky-common.sh) assumes that uname -n is the local node name, but that's not necessarily true. It would be better to use crm_node -n.

* The agent metadata includes <version>0.1</version>. The version element is intended for the version of the OCF standard that the metadata complies with (1.0 would be appropriate for the existing agent code), rather than the version of the agent itself, which would be specified like <resource-agent name="emc_srdf" version="0.1">. Recent versions of pcs require a known OCF version, so it could cause problems down the road.

* The metadata action runs crm_node -l (via get_node_list()) to generate the node-name-dependent parameters. This makes sense for what the agent needs, but is problematic. Metadata should ideally be obtainable even if the cluster is not running, or on a machine that is not part of the cluster (for example, a host running a GUI front-end to the cluster). I don't have any suggestions, but ideally the node names would be parameter values somehow and not part of parameter names.

* The crm_node command might be what's hanging the metadata. That could be verified by replacing crm_node -l with just echoing the node names in a test cluster (see the sketch after these observations). If that is the problem, adding -VVVVVV to the crm_node command (in a test cluster) would help narrow down what's wrong.

* I suspect the monitor result timeouts are an artifact of the metadata calls being synchronous, and not a problem with the monitor itself. nodeA records the timeout in the CIB immediately after nodeB's timer pops. Increasing the monitor timeout by 5 seconds might avoid that particular log error, though it will still be recorded as a timeout, and it's possible stop would still fail and lead to fencing.
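
A hedged sketch of that test (node names are illustrative; get_node_list() and sky-common.sh are the agent pieces described in these comments):

```
# In a test-cluster copy of sky-common.sh, replace the cluster query with a
# hard-coded node list (adjust the output format to whatever the agent
# expects from crm_node -l):
get_node_list() {
    printf '%s\n' nodeA nodeB      # was: crm_node -l
}

# If metadata then completes promptly, crm_node is the culprit. To narrow down
# where it hangs, restore the call with extra verbosity instead:
#   get_node_list() { crm_node -VVVVVV -l; }
```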

Chad, can you pass along the comments about the agent code and monitor timeout? Would it be possible to reproduce this in a test cluster so we can see if crm_node is the problem?

The fix for Bug 1850506 would give a much better recovery from metadata failure, and would probably avoid the monitor/stop failures and fencing here. We might close this as a duplicate of that. But figuring out why the metadata calls are timing out is the most important thing now.

Comment 12 Reid Wahl 2022-09-07 01:32:46 UTC
Now that I can look at this again, it's trivial to reproduce. Add `crm_node -l >/dev/null 2>&1` to the beginning of ocf:pacemaker:Dummy's meta_data() function. Then create a Dummy resource.

Environment: Pacemaker-2.1.4 (from an upstream main source build) on RHEL 9. (I have only one node online, for simplicity.)

# grep -A1 '^meta_data()' /usr/lib/ocf/resource.d/pacemaker/Dummy 
meta_data() {
    crm_node -l >/dev/null 2>&1

# date && pcs resource create dummy1 ocf:pacemaker:Dummy
Tue Sep  6 05:24:35 PM PDT 2022

Logs:

Sep 06 17:24:35.650 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (do_state_transition)         notice: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph
Sep 06 17:24:35.653 fastvm-rhel-9-0-42 pacemaker-schedulerd[1469] (determine_online_status)     info: Node fastvm-rhel-9-0-42 is online
Sep 06 17:24:35.653 fastvm-rhel-9-0-42 pacemaker-schedulerd[1469] (log_list_item)       notice: Actions: Start      xvm        ( fastvm-rhel-9-0-42 )  due to no quorum (blocked)
Sep 06 17:24:35.653 fastvm-rhel-9-0-42 pacemaker-schedulerd[1469] (log_list_item)       notice: Actions: Start      dummy1     ( fastvm-rhel-9-0-42 )  due to no quorum (blocked)
Sep 06 17:24:35.653 fastvm-rhel-9-0-42 pacemaker-schedulerd[1469] (pcmk__log_transition_summary)        notice: Calculated transition 10, saving inputs in /var/lib/pacemaker/pengine/pe-input-123.bz2

Sep 06 17:24:35.664 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (do_state_transition)         info: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_reply
Sep 06 17:24:35.664 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (do_te_invoke)        info: Processing graph 10 (ref=pe_calc-pacemaker-controld-1662510275-10) derived from /var/lib/pacemaker/pengine/pe-input-123.bz2
Sep 06 17:24:35.664 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (execute_rsc_action)  notice: Initiating monitor operation dummy1_monitor_0 locally on fastvm-rhel-9-0-42 | action 1
Sep 06 17:24:35.664 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (do_lrm_rsc_op)       notice: Requesting local execution of probe operation for dummy1 on fastvm-rhel-9-0-42 | transition_key=1:10:7:4902cc63-ad7b-4837-a7c4-5f556ddfb178 op_key=dummy1_monitor_0
Sep 06 17:24:39.219 fastvm-rhel-9-0-42 pacemakerd          [1464] (pcmk__ipc_is_authentic_process_active)       info: Could not connect to crmd IPC: timeout
Sep 06 17:24:39.219 fastvm-rhel-9-0-42 pacemakerd          [1464] (check_next_subdaemon)        notice: pacemaker-controld[1859] is unresponsive to ipc after 1 tries
Sep 06 17:24:40.656 fastvm-rhel-9-0-42 pacemaker-based     [1465] (cib_process_ping)    info: Reporting our current digest to fastvm-rhel-9-0-42: 386ca5445db08dabcb6e4ab8e1a85557 for 0.110.0 (0x558946d9bd50 0)
Sep 06 17:24:46.220 fastvm-rhel-9-0-42 pacemakerd          [1464] (pcmk__ipc_is_authentic_process_active)       info: Could not connect to crmd IPC: timeout
Sep 06 17:24:46.220 fastvm-rhel-9-0-42 pacemakerd          [1464] (check_next_subdaemon)        notice: pacemaker-controld[1859] is unresponsive to ipc after 2 tries
Sep 06 17:24:53.220 fastvm-rhel-9-0-42 pacemakerd          [1464] (pcmk__ipc_is_authentic_process_active)       info: Could not connect to crmd IPC: timeout
Sep 06 17:24:53.220 fastvm-rhel-9-0-42 pacemakerd          [1464] (check_next_subdaemon)        notice: pacemaker-controld[1859] is unresponsive to ipc after 3 tries
Sep 06 17:25:00.221 fastvm-rhel-9-0-42 pacemakerd          [1464] (pcmk__ipc_is_authentic_process_active)       info: Could not connect to crmd IPC: timeout
Sep 06 17:25:00.221 fastvm-rhel-9-0-42 pacemakerd          [1464] (check_next_subdaemon)        notice: pacemaker-controld[1859] is unresponsive to ipc after 4 tries
Sep 06 17:25:05.695 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (wait_for_sync_result)        info: Dummy_meta-data_0[1922] timed out after 30000ms
Sep 06 17:25:05.696 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (lrmd_api_get_metadata_params)        error: Failed to retrieve meta-data for ocf:pacemaker:Dummy
Sep 06 17:25:05.697 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (controld_get_rsc_metadata)   warning: Failed to get metadata for dummy1 (ocf:pacemaker:Dummy): Input/output error
Sep 06 17:25:05.698 fastvm-rhel-9-0-42 pacemaker-based     [1465] (cib_process_request)         info: Forwarding cib_modify operation for section status to all (origin=local/crmd/31)
Sep 06 17:25:05.699 fastvm-rhel-9-0-42 pacemaker-based     [1465] (cib_perform_op)      info: Diff: --- 0.110.0 2
Sep 06 17:25:05.699 fastvm-rhel-9-0-42 pacemaker-based     [1465] (cib_perform_op)      info: Diff: +++ 0.110.1 (null)
Sep 06 17:25:05.699 fastvm-rhel-9-0-42 pacemaker-based     [1465] (cib_perform_op)      info: +  /cib:  @num_updates=1
Sep 06 17:25:05.699 fastvm-rhel-9-0-42 pacemaker-based     [1465] (cib_perform_op)      info: ++ /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources:  <lrm_resource id="dummy1" type="Dummy" class="ocf" provider="pacemaker"/>
Sep 06 17:25:05.699 fastvm-rhel-9-0-42 pacemaker-based     [1465] (cib_perform_op)      info: ++                                                                <lrm_rsc_op id="dummy1_last_0" operation_key="dummy1_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.16.0" transition-key="1:10:7:4902cc63-ad7b-4837-a7c4-5f556ddfb178" transition-magic="-1:193;1:10:7:4902cc63-ad7b-4837-a7c4-5f556ddfb178" exit-reason="" on_node="fastvm-rhel-9-0-42" call-id="-1" rc-code="193" op-status="-1" interval="0" last-rc-change="1662510275" exec-time="0" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
Sep 06 17:25:05.699 fastvm-rhel-9-0-42 pacemaker-based     [1465] (cib_perform_op)      info: ++                                                              </lrm_resource>
Sep 06 17:25:05.700 fastvm-rhel-9-0-42 pacemaker-based     [1465] (cib_process_request)         info: Completed cib_modify operation for section status: OK (rc=0, origin=fastvm-rhel-9-0-42/crmd/31, version=0.110.1)
Sep 06 17:25:05.702 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (pcmk__procfs_pid_of)         info: Found pacemaker-based active as process 1465
Sep 06 17:25:05.702 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (throttle_send_command)       info: New throttle mode: negligible load (was undetermined)
Sep 06 17:25:10.704 fastvm-rhel-9-0-42 pacemaker-based     [1465] (cib_process_ping)    info: Reporting our current digest to fastvm-rhel-9-0-42: 7d0fddd9c0de27d6137b64a850415c60 for 0.110.1 (0x558946d9bd50 0)
Sep 06 17:25:13.221 fastvm-rhel-9-0-42 pacemakerd          [1464] (pcmk__ipc_is_authentic_process_active)       info: Could not connect to crmd IPC: timeout
Sep 06 17:25:13.221 fastvm-rhel-9-0-42 pacemakerd          [1464] (check_next_subdaemon)        notice: pacemaker-controld[1859] is unresponsive to ipc after 1 tries
Sep 06 17:25:20.220 fastvm-rhel-9-0-42 pacemakerd          [1464] (pcmk__ipc_is_authentic_process_active)       info: Could not connect to crmd IPC: timeout
Sep 06 17:25:20.220 fastvm-rhel-9-0-42 pacemakerd          [1464] (check_next_subdaemon)        notice: pacemaker-controld[1859] is unresponsive to ipc after 2 tries
Sep 06 17:25:27.220 fastvm-rhel-9-0-42 pacemakerd          [1464] (pcmk__ipc_is_authentic_process_active)       info: Could not connect to crmd IPC: timeout
Sep 06 17:25:27.220 fastvm-rhel-9-0-42 pacemakerd          [1464] (check_next_subdaemon)        notice: pacemaker-controld[1859] is unresponsive to ipc after 3 tries
Sep 06 17:25:34.220 fastvm-rhel-9-0-42 pacemakerd          [1464] (pcmk__ipc_is_authentic_process_active)       info: Could not connect to crmd IPC: timeout
Sep 06 17:25:34.220 fastvm-rhel-9-0-42 pacemakerd          [1464] (check_next_subdaemon)        notice: pacemaker-controld[1859] is unresponsive to ipc after 4 tries
Sep 06 17:25:35.731 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (wait_for_sync_result)        info: Dummy_meta-data_0[1938] timed out after 30000ms
Sep 06 17:25:35.732 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (lrmd_api_get_metadata_params)        error: Failed to retrieve meta-data for ocf:pacemaker:Dummy
Sep 06 17:25:35.732 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (controld_get_rsc_metadata)   warning: Failed to get metadata for dummy1 (ocf:pacemaker:Dummy): Input/output error
Sep 06 17:25:35.732 fastvm-rhel-9-0-42 pacemaker-controld  [1859] (log_executor_event)  notice: Result of probe operation for dummy1 on fastvm-rhel-9-0-42: not running | CIB update 32, graph action confirmed; call=39 key=dummy1_monitor_0 rc=7

Comment 13 Ken Gaillot 2022-09-07 19:40:09 UTC
No further information is needed from the customer.

Since upstream Pacemaker release 2.1.0 (RHEL 8.5 and 9.0 beta), crm_node -l gets the node list via the controller, rather than querying Corosync directly.

Since metadata calls are executed synchronously in the controller, metadata calls block all other activity by the controller. Thus, if the metadata action runs crm_node, crm_node will be unable to connect to the controller, and the metadata action will hang until it times out.

The long-term solution in Bug 1850506 will be better, but for a quicker fix, I'm thinking we can make the metadata call asynchronous without the rest of that solution. The metadata call would still eat into the probe timeout, but since the controller could handle the crm_node connection, it can proceed and should complete quickly.

Comment 14 Ken Gaillot 2022-09-07 19:48:31 UTC
A workaround would be to avoid calling crm_node for the metadata action.
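
As a hedged fragment (not the customer's actual code) of what that workaround can look like inside an agent that otherwise needs the node list: skip the cluster query whenever the requested action is meta-data, since the controller is blocked on the synchronous metadata call and crm_node would hang until the 30-second metadata timeout expires.

```
# Illustrative fragment, placed after the agent's function definitions:
case "$1" in
    meta-data)
        meta_data                  # print static XML only; no crm_node here
        exit 0
        ;;
    *)
        NODE_LIST=$(crm_node -l)   # safe for other actions
        ;;
esac
```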

Comment 15 Ken Gaillot 2022-09-08 16:56:57 UTC
QA: Reproducer:

* Configure a cluster.
* Edit or copy /usr/lib/ocf/resource.d/pacemaker/Dummy and add "crm_node -l" to the metadata action.
* Create a resource using Dummy (or the copy).

Actual results: The cluster hangs for 30 seconds, and then the controller logs a metadata action timeout.

Expected results: The cluster proceeds normally.

Comment 18 Ken Gaillot 2022-09-20 19:50:32 UTC
Fixed in upstream main branch as of commit bc852fe3

Comment 21 Markéta Smazová 2022-12-19 13:37:16 UTC
Tested using reproducer from Comment 12 and Comment 15.


after fix:
---------

>   [root@virt-507 ~]# rpm -q pacemaker
>   pacemaker-2.1.5-4.el8.x86_64

Configure a cluster:

>   [root@virt-507 ~]# pcs status
>   Cluster name: STSRHTS29018
>   Status of pacemakerd: 'Pacemaker is running' (last updated 2022-12-19 13:51:03 +01:00)
>   Cluster Summary:
>     * Stack: corosync
>     * Current DC: virt-507 (version 2.1.5-4.el8-a3f44794f94) - partition with quorum
>     * Last updated: Mon Dec 19 13:51:03 2022
>     * Last change:  Mon Dec 19 13:24:06 2022 by root via cibadmin on virt-507
>     * 2 nodes configured
>     * 2 resource instances configured

>   Node List:
>     * Online: [ virt-507 virt-508 ]

>   Full List of Resources:
>     * fence-virt-507	(stonith:fence_xvm):	 Started virt-507
>     * fence-virt-508	(stonith:fence_xvm):	 Started virt-508

>   Daemon Status:
>     corosync: active/enabled
>     pacemaker: active/enabled
>     pcsd: active/enabled

Edit /usr/lib/ocf/resource.d/pacemaker/Dummy and add "crm_node -l" to the metadata action:

>   [root@virt-507 ~]# grep -A1 '^meta_data()' /usr/lib/ocf/resource.d/pacemaker/Dummy
>   meta_data() {
>       crm_node -l >/dev/null 2>&1

Create a resource using Dummy:

>   [root@virt-507 ~]# date && pcs resource create dummy2 ocf:pacemaker:Dummy
>   Mon 19 Dec 13:57:07 CET 2022

>   [root@virt-507 ~]# pcs resource
>     * dummy2	(ocf::pacemaker:Dummy):	 Started virt-507

>   [root@virt-507 ~]# tail -f /var/log/messages 

>   Dec 19 13:57:08 virt-507 pacemaker-controld[101660]: notice: State transition S_IDLE -> S_POLICY_ENGINE
>   Dec 19 13:57:09 virt-507 pacemaker-schedulerd[101659]: notice: Actions: Start      dummy2             ( virt-507 )
>   Dec 19 13:57:09 virt-507 pacemaker-schedulerd[101659]: notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-4.bz2
>   Dec 19 13:57:09 virt-507 pacemaker-controld[101660]: notice: Initiating monitor operation dummy2_monitor_0 on virt-508
>   Dec 19 13:57:09 virt-507 pacemaker-controld[101660]: notice: Initiating monitor operation dummy2_monitor_0 locally on virt-507
>   Dec 19 13:57:09 virt-507 pacemaker-controld[101660]: notice: Requesting local execution of probe operation for dummy2 on virt-507
>   Dec 19 13:57:09 virt-507 pacemaker-controld[101660]: notice: Result of probe operation for dummy2 on virt-507: not running
>   Dec 19 13:57:09 virt-507 pacemaker-controld[101660]: notice: Initiating start operation dummy2_start_0 locally on virt-507
>   Dec 19 13:57:09 virt-507 pacemaker-controld[101660]: notice: Requesting local execution of start operation for dummy2 on virt-507
>   Dec 19 13:57:09 virt-507 pacemaker-controld[101660]: notice: Result of start operation for dummy2 on virt-507: ok
>   Dec 19 13:57:09 virt-507 pacemaker-controld[101660]: notice: Initiating monitor operation dummy2_monitor_10000 locally on virt-507
>   Dec 19 13:57:09 virt-507 pacemaker-controld[101660]: notice: Requesting local execution of monitor operation for dummy2 on virt-507
>   Dec 19 13:57:09 virt-507 pacemaker-controld[101660]: notice: Result of monitor operation for dummy2 on virt-507: ok
>   Dec 19 13:57:09 virt-507 pacemaker-controld[101660]: notice: Transition 5 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-4.bz2): Complete


Result: The cluster proceeds normally.

marking VERIFIED in pacemaker-2.1.5-4.el8

Comment 28 errata-xmlrpc 2023-05-16 08:35:22 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2818

