Bug 2181019 - azure-events-az fails with pacemaker >= 2.1 with missing transition summary (RHEL8)
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: resource-agents
Version: 8.6
Hardware: x86_64
OS: Linux
Target Milestone: rc
Target Release: 8.6
Assignee: Oyvind Albrigtsen
QA Contact: Brandon Perkins
URL:
Whiteboard:
Depends On:
Blocks: 2182415 2182763 2182761 2182762
Reported: 2023-03-22 21:11 UTC by robbiro
Modified: 2023-07-18 16:26 UTC (History)

Fixed In Version: resource-agents-4.9.0-42.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2182415 2182761 2182762 2182763 (view as bug list)
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CLUSTERQE-6665 0 None None None 2023-04-28 17:09:48 UTC
Red Hat Issue Tracker RHELPLAN-152837 0 None None None 2023-03-22 21:14:05 UTC

Description robbiro 2023-03-22 21:11:58 UTC
Description of problem:
The resource-agents package provides the azure-events-az agent. This agent executes 'crm_simulate -Ls' and parses the output. With pacemaker 2.1 and newer, the output of crm_simulate changed and no longer contains the text 'Transition Summary:', resulting in an error in azure-events-az.

Version-Release number of selected component (if applicable):
RHEL 8.6 (and newer)
resource-agents-4.9.0-16.el8_6.8.x86_64
pacemaker-cli-2.1.2-4.el8_6.5.x86_64

How reproducible:
The error in azure-events-az only appears after an Azure scheduled event is triggered, for example a VM redeploy or reboot. See https://learn.microsoft.com/en-us/azure/virtual-machines/linux/scheduled-events

Executing azure-events-az, either as a monitor within pacemaker or manually with 'export OCF_ROOT=/usr/lib/ocf; export OCF_RESKEY_verbose=1; /usr/lib/ocf/resource.d/heartbeat/azure-events-az monitor', causes an error:

ocf-exit-reason:object of type 'bool' has no len()
ocf.py(None)[2963061]:  Mar 22 21:04:03 ERROR: object of type 'bool' has no len()

Steps to Reproduce:
See above section

Actual results:
Within /usr/lib/ocf/resource.d/heartbeat/azure-events-az, the method transitionSummary() (line 296) calls crm_simulate -Ls (line 311) and expects to find "Transition Summary:" in the output. Up to and including RHEL 8.4 (pacemaker 2.0.5) this was true. In pacemaker 2.1.x this line is no longer part of the output.

As a result, the method allResourcesStoppedOnNode(node) in azure-events-az then fails on line 372 with the errors shown above.
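
The failure mode can be illustrated with a minimal sketch. This is not the agent's code: find_summary() and count_pending() are hypothetical stand-ins. It assumes the parser returns False when the marker is absent, which would be consistent with the 'bool' in the error message, and that the caller then applies len() to the result.

```python
MARKER = "Transition Summary:"


def find_summary(output):
    """Hypothetical parser: list of action lines, or False if the marker is absent."""
    idx = output.find(MARKER)
    if idx == -1:
        return False  # assumed behaviour that leads to the reported error
    tail = output[idx + len(MARKER):]
    return [line.strip() for line in tail.splitlines() if line.strip()]


def count_pending(output):
    """Hypothetical caller: assumes find_summary() always returns a sequence."""
    return len(find_summary(output))


# Old-style output (marker present) works; output without the marker raises.
old_style = "Transition Summary:\n\t* Stop    rsc_a     (node0)\n"
new_style = "Current cluster status:\n  * Node List: ...\n"
print(count_pending(old_style))
try:
    count_pending(new_style)
except TypeError as err:
    print("ERROR:", err)
```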


Expected results:
# crm_simulate -Ls
Transition Summary:
	* Promote rsc_SAPHana_HN1_HDB03:0      (Slave -> Master hsr3-db1)
	* Stop    rsc_SAPHana_HN1_HDB03:1      (hsr3-db0)
	* Move    rsc_ip_HN1_HDB03     (Started hsr3-db0 -> hsr3-db1)
	* Start   rsc_nc_HN1_HDB03     (hsr3-db1)
# Expected result when there are no pending actions:
Transition Summary:

Additional info:

Comment 3 Oyvind Albrigtsen 2023-03-23 14:38:36 UTC
@kgaillot is there a crm_feature_set version I should check against so we can use new format where needed, and fallback to the old way for older versions?

Comment 4 Ken Gaillot 2023-03-23 14:54:47 UTC
(In reply to Oyvind Albrigtsen from comment #3)
> @kgaillot is there a crm_feature_set version I should check
> against so we can use new format where needed, and fallback to the old way
> for older versions?

3.7.4, which is also when --output-as=xml is available for crm_simulate
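
A version gate on crm_feature_set could be sketched as below. The function name feature_set_at_least() is hypothetical, and how the feature set string is obtained (it is recorded in the CIB) is left to the caller. The point of the sketch is that the comparison should be numeric per component, since a plain string comparison would rank "3.11.0" below "3.7.4".

```python
def parse_version(text):
    """Turn a dotted version string such as '3.7.4' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def feature_set_at_least(current, minimum="3.7.4"):
    """Return True if the given crm_feature_set is at or above the minimum.

    3.7.4 is the threshold named in this comment; both arguments are
    dotted strings supplied by the caller.
    """
    return parse_version(current) >= parse_version(minimum)


for candidate in ("3.6.1", "3.7.4", "3.11.0"):
    print(candidate, feature_set_at_least(candidate))
```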

Comment 5 Oyvind Albrigtsen 2023-03-28 15:04:08 UTC
Fix to treat no "Transition Summary" as no actions: https://github.com/ClusterLabs/resource-agents/pull/1854

--output-as=xml is quite new in crm_simulate, so relying on it would not be as backward compatible.
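
The idea of treating a missing "Transition Summary" as "no actions" can be sketched as follows. This is an illustration of the approach only, not the code from the pull request; transition_actions() is a hypothetical name, and the sample text is the expected output quoted in the description.

```python
MARKER = "Transition Summary:"


def transition_actions(output):
    """Return the pending actions listed after the marker.

    An absent marker and an empty summary both yield an empty list,
    so callers can always use len() or iterate over the result.
    """
    lines = output.splitlines()
    for index, line in enumerate(lines):
        if line.strip().startswith(MARKER):
            break
    else:
        return []  # no marker: treat as no pending actions

    actions = []
    for line in lines[index + 1:]:
        stripped = line.strip()
        if stripped.startswith("* "):
            actions.append(stripped[2:].strip())
    return actions


sample = (
    "Transition Summary:\n"
    "\t* Promote rsc_SAPHana_HN1_HDB03:0      (Slave -> Master hsr3-db1)\n"
    "\t* Stop    rsc_SAPHana_HN1_HDB03:1      (hsr3-db0)\n"
    "\t* Move    rsc_ip_HN1_HDB03     (Started hsr3-db0 -> hsr3-db1)\n"
    "\t* Start   rsc_nc_HN1_HDB03     (hsr3-db1)\n"
)
for action in transition_actions(sample):
    print(action)
```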

Comment 8 Oyvind Albrigtsen 2023-05-01 09:12:25 UTC
Additional patch to improve logic: https://github.com/ClusterLabs/resource-agents/pull/1864

Comment 9 robbiro 2023-05-03 15:40:09 UTC
Thank you for the fix. Will await release of package in repository channels for full integration tests on our side. Quick tests on RH9.0 and 8.6 are positive with the checked in modifications.

To clarify on the proposed fix: is there any concern about using crm_simulate -LS and depending on the output containing the expected text string in future releases, potentially breaking again down the line? Certainly, future changes cannot be predicted; I am trying to understand whether any changes are already anticipated. Since the 'Transition Summary:' text block in the -Ls output was likely unintended and thus removed in the later/current releases, can we assume that -LS will keep it going forward? Any guidance on this would be appreciated.

Comment 10 Ken Gaillot 2023-05-03 16:12:21 UTC
(In reply to robbiro from comment #9)
> Thank you for the fix. Will await release of package in repository channels
> for full integration tests on our side. Quick tests on RH9.0 and 8.6 are
> positive with the checked in modifications.
> 
> To clarify on the proposed fix - is there any concern of using crm_simulate
> -LS and depending on the output containing the expected text string in
> future releases, potentially breaking again down the line again? Certainly,
> future changes cannot be predicted, am trying to understand if there are
> assumed changes. Since -Ls output containing the 'Transition Summary:" text
> block was likely unintended and thus removed in the later/current releases,
> assuming -LS keeping it going forward. Any guidance on this would be
> appreciated.

To address such issues, Pacemaker has been gradually adding support for XML output for all command-line tools. The idea is that the text output may change from release to release, but the XML output will change as little as possible, and remain backward-compatible as much as possible, for parsing by scripts. All commands will take the same --output-as option, which may be set to "none", "text", or "xml".

The schema for the XML output is installed as /usr/share/pacemaker/api/api-result.rng (which includes RNGs for each individual command). You can use that to figure out what to parse.
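
As a rough sketch of what parsing the XML output might look like, the snippet below checks the overall result status with Python's standard xml.etree.ElementTree. The sample document is hand-written for illustration, not captured from a real run, and the element and attribute names used here (a pacemaker-result root with a status child carrying code and message) are assumptions that should be verified against the installed api-result.rng. command_succeeded() is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; verify the real structure against api-result.rng.
SAMPLE = """<pacemaker-result api-version="2.9" request="crm_simulate --output-as=xml">
  <status code="0" message="OK"/>
</pacemaker-result>"""


def command_succeeded(xml_text):
    """Return True if the (assumed) status element reports exit code 0."""
    root = ET.fromstring(xml_text)
    status = root.find("status")
    if status is None:
        return False  # no status element: do not assume success
    return status.get("code") == "0"


print(command_succeeded(SAMPLE))
```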

crm_simulate supports --output-as as of the Pacemaker 2.1.0 release (RHEL 8.5 and later, and all of RHEL 9). Most agents haven't switched to parsing XML yet in order to remain compatible with older versions, but if that's not a concern, I'd recommend the XML.

FYI, we've also been gradually adding high-level C APIs corresponding to each command-line tool, and those generate the same XML output that --output-as=xml would.

Commands that currently support --output-as=xml:
* Since 2.0.2 (8.1+/9.0+): stonith_admin
* Since 2.0.3 (8.2+/9.0+): crm_mon
* Since 2.1.0 (8.5+/9.0+): crmadmin, crm_resource, crm_simulate, crm_verify
* Since 2.1.3 (8.7+/9.1+): attrd_updater, crm_attribute, crm_rule
* Since 2.1.5 (8.8+/9.2+): crm_error
* Since 2.1.6 (8.9+/9.3+): crm_shadow
* Not yet supported: cibadmin, crm_diff, crm_node, crm_ticket, iso8601
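
For scripts that need to decide at run time whether XML output can be used, the list above can be encoded as a small lookup. This is only a sketch: supports_xml_output() is a hypothetical helper, the table reflects the versions listed in this comment, and tools listed as not yet supported are simply absent from it.

```python
# First Pacemaker release with --output-as=xml, per the list above.
XML_OUTPUT_SINCE = {
    "stonith_admin": (2, 0, 2),
    "crm_mon": (2, 0, 3),
    "crmadmin": (2, 1, 0),
    "crm_resource": (2, 1, 0),
    "crm_simulate": (2, 1, 0),
    "crm_verify": (2, 1, 0),
    "attrd_updater": (2, 1, 3),
    "crm_attribute": (2, 1, 3),
    "crm_rule": (2, 1, 3),
    "crm_error": (2, 1, 5),
    "crm_shadow": (2, 1, 6),
}


def supports_xml_output(tool, pacemaker_version):
    """Return True if the tool accepts --output-as=xml in this Pacemaker version.

    pacemaker_version is a dotted string such as '2.1.2'; tools not in the
    table (for example cibadmin) are reported as unsupported.
    """
    since = XML_OUTPUT_SINCE.get(tool)
    if since is None:
        return False
    current = tuple(int(part) for part in pacemaker_version.split("."))
    return current >= since


print(supports_xml_output("crm_simulate", "2.0.5"))
print(supports_xml_output("crm_simulate", "2.1.2"))
```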

