Bug 2181019

Summary: azure-events-az fails with pacemaker >= 2.1 with missing transition summary (RHEL8)
Product: Red Hat Enterprise Linux 8
Reporter: robbiro <robert.biro>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: Brandon Perkins <bperkins>
Severity: unspecified
Docs Contact: Steven J. Levine <slevine>
Priority: unspecified
Version: 8.6
CC: agk, bperkins, cfeist, cluster-maint, fdinitto, kgaillot, ksatarin, nwahl, radeltch, slevine
Target Milestone: rc
Keywords: Triaged, ZStream
Target Release: 8.6
Flags: pm-rhel: mirror+
Hardware: x86_64
OS: Linux
Fixed In Version: resource-agents-4.9.0-42.el8
Doc Type: Bug Fix
Doc Text:
.The `azure-events-az` resource agent no longer produces an error with Pacemaker 2.1 and later
The `azure-events-az` resource agent executes the `crm_simulate -Ls` command and parses the output. With Pacemaker 2.1 and later, the output of the `crm_simulate` command no longer contains the text `Transition Summary:`, which resulted in an error. With this fix, the agent no longer yields an error when this text is missing.
Clones: 2182415 2182761 2182762 2182763
Last Closed: 2023-11-14 15:22:28 UTC
Type: Bug
Bug Blocks: 2182415, 2182761, 2182762, 2182763    

Description robbiro 2023-03-22 21:11:58 UTC
Description of problem:
The resource-agents package provides the azure-events-az agent. This agent executes 'crm_simulate -Ls' and parses its output. With pacemaker 2.1 and newer, the output of crm_simulate changed and no longer contains the text 'Transition Summary:', resulting in an error in azure-events-az.

Version-Release number of selected component (if applicable):
RHEL 8.6 (and newer)
resource-agents-4.9.0-16.el8_6.8.x86_64
pacemaker-cli-2.1.2-4.el8_6.5.x86_64

How reproducible:
The error in azure-events-az only shows up after an Azure scheduled event is triggered. Examples of such events are a VM redeploy or a reboot. https://learn.microsoft.com/en-us/azure/virtual-machines/linux/scheduled-events

Running azure-events-az, either as a monitor within pacemaker or manually with 'export OCF_ROOT=/usr/lib/ocf; export OCF_RESKEY_verbose=1; /usr/lib/ocf/resource.d/heartbeat/azure-events-az monitor', causes an error:

ocf-exit-reason:object of type 'bool' has no len()
ocf.py(None)[2963061]:  Mar 22 21:04:03 ERROR: object of type 'bool' has no len()
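The manual check above can also be scripted, for example to repeat it after each scheduled event. This is a minimal sketch that reuses the agent path and environment variables from the command above; the helper names `run_monitor` and `ocf_code_name` are hypothetical. The table lists the standard OCF return codes 0 to 9. It sticks to `stdout=PIPE` and `universal_newlines=True` rather than `capture_output`/`text` so that it also runs on the Python 3.6 shipped as platform-python in RHEL 8.

```python
import os
import subprocess

# Standard OCF resource agent exit codes.
OCF_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

AGENT = "/usr/lib/ocf/resource.d/heartbeat/azure-events-az"


def ocf_code_name(code):
    """Translate a numeric exit code into its OCF name."""
    return OCF_CODES.get(code, "UNKNOWN({})".format(code))


def run_monitor(agent=AGENT):
    """Run the agent's monitor action with the same environment as the
    manual command; return (OCF code name, combined output)."""
    env = dict(os.environ, OCF_ROOT="/usr/lib/ocf", OCF_RESKEY_verbose="1")
    proc = subprocess.run(
        [agent, "monitor"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so error lines are kept
        universal_newlines=True,
    )
    return ocf_code_name(proc.returncode), proc.stdout
```

`run_monitor()` only makes sense on a cluster node where the agent is installed. `ocf_code_name()` can be used on its own to label exit codes collected elsewhere.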

Steps to Reproduce:
See above section

Actual results:
Within /usr/lib/ocf/resource.d/heartbeat/azure-events-az, the method transitionSummary() on line 296 calls crm_simulate -Ls (line 311). This method expects to find "Transition Summary:" in the output. Up to and including RHEL 8.4 this was true (pacemaker 2.0.5 in RHEL 8.4). In pacemaker 2.1.x this line is no longer part of the output.

As a result, the method allResourcesStoppedOnNode(node) in azure-events-az then fails on line 372 with the errors above.
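The message "object of type 'bool' has no len()" is what Python raises whenever `len()` is applied to a boolean. The following minimal sketch reproduces that failure mode. It assumes, consistent with the error but not confirmed by the agent source in this report, that the parsing helper returns `False` when the marker is absent and that the caller then calls `len()` on the result. `transition_summary_old` is a hypothetical stand-in, not the agent's actual code, and the sample output string is illustrative.

```python
MARKER = "Transition Summary:"


def transition_summary_old(output):
    """Hypothetical pre-fix behaviour: return the text after the marker,
    or False when the marker is missing."""
    if MARKER not in output:
        return False
    return output.split(MARKER, 1)[1].strip()


# Illustrative Pacemaker 2.1-style output with no "Transition Summary:" line.
new_style_output = "Current cluster status:\n"

try:
    # A caller that assumes a string result fails here.
    len(transition_summary_old(new_style_output))
except TypeError as exc:
    print(exc)  # object of type 'bool' has no len()
```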


Expected results:
# crm_simulate -Ls
Transition Summary:
	* Promote rsc_SAPHana_HN1_HDB03:0      (Slave -> Master hsr3-db1)
	* Stop    rsc_SAPHana_HN1_HDB03:1      (hsr3-db0)
	* Move    rsc_ip_HN1_HDB03     (Started hsr3-db0 -> hsr3-db1)
	* Start   rsc_nc_HN1_HDB03     (hsr3-db1)
# Expected result when there are no pending actions:
Transition Summary:
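To illustrate what a parser has to extract from the expected output above, this sketch splits each `* Action resource (details)` line into a tuple. It is not the agent's parser. The regular expression is an assumption fitted only to the sample lines in this report, so other line shapes (for example, trailing text after the parentheses) would return `None`.

```python
import re

# Matches summary lines such as
# "* Move    rsc_ip_HN1_HDB03     (Started hsr3-db0 -> hsr3-db1)".
ACTION_RE = re.compile(r"^\*\s+(\w+)\s+(\S+)\s+\((.*)\)$")


def parse_action_line(line):
    """Split one Transition Summary line into (action, resource, details);
    return None if the line does not look like an action."""
    match = ACTION_RE.match(line.strip())
    return match.groups() if match else None


# Sample copied from the expected results above.
sample = """Transition Summary:
\t* Promote rsc_SAPHana_HN1_HDB03:0      (Slave -> Master hsr3-db1)
\t* Stop    rsc_SAPHana_HN1_HDB03:1      (hsr3-db0)
\t* Move    rsc_ip_HN1_HDB03     (Started hsr3-db0 -> hsr3-db1)
\t* Start   rsc_nc_HN1_HDB03     (hsr3-db1)
"""

for line in sample.splitlines():
    parsed = parse_action_line(line)
    if parsed:
        print(parsed)
```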

Additional info:

Comment 3 Oyvind Albrigtsen 2023-03-23 14:38:36 UTC
@kgaillot is there a crm_feature_set version I should check against so we can use new format where needed, and fallback to the old way for older versions?

Comment 4 Ken Gaillot 2023-03-23 14:54:47 UTC
(In reply to Oyvind Albrigtsen from comment #3)
> @kgaillot is there a crm_feature_set version I should check
> against so we can use new format where needed, and fallback to the old way
> for older versions?

3.7.4, which is also when --output-as=xml is available for crm_simulate
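A version gate like this needs a numeric rather than lexical comparison of the dotted feature-set string, because "3.10.0" sorts before "3.7.4" as text. This is a minimal sketch. It does not read the value itself; taking it from the `crm_feature_set` attribute on the CIB root element is one option, but that is an assumption about how an agent would obtain it. The function names are hypothetical.

```python
NEW_FORMAT_FEATURE_SET = (3, 7, 4)


def parse_version(version):
    """Turn a dotted version string such as '3.7.4' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def uses_new_output_format(feature_set):
    """True if this crm_feature_set is at or above 3.7.4, the threshold
    given above for the new crm_simulate output."""
    return parse_version(feature_set) >= NEW_FORMAT_FEATURE_SET
```

Comparing integer tuples makes "3.10.0" count as newer than "3.7.4", which a plain string comparison would get wrong.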

Comment 5 Oyvind Albrigtsen 2023-03-28 15:04:08 UTC
Fix to treat a missing "Transition Summary" as no actions: https://github.com/ClusterLabs/resource-agents/pull/1854

--output-as=xml is quite new in crm_simulate, so relying on it would be less backward compatible.
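The idea of treating a missing "Transition Summary" as no actions can be sketched as follows. This illustrates the approach only; it is not the code merged in the pull request, and `transition_summary_lines` is a hypothetical name. Returning an empty list instead of `False` keeps `len()` and iteration valid for callers. The approach assumes, as the fix implicitly does, that the marker still appears whenever there are pending actions.

```python
MARKER = "Transition Summary:"


def transition_summary_lines(output):
    """Return the non-empty lines after 'Transition Summary:'.

    Pacemaker 2.1 and later may omit the marker entirely; that case is
    treated the same as an empty summary, i.e. no pending actions."""
    _, found, rest = output.partition(MARKER)
    if not found:
        return []
    return [line.strip() for line in rest.splitlines() if line.strip()]


old_style = "Transition Summary:\n\t* Start   rsc_nc_HN1_HDB03     (hsr3-db1)\n"
old_style_idle = "Transition Summary:\n"
new_style_idle = "Current cluster status:\n"  # illustrative, marker absent

print(len(transition_summary_lines(old_style)))       # 1
print(len(transition_summary_lines(old_style_idle)))  # 0
print(len(transition_summary_lines(new_style_idle)))  # 0
```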

Comment 8 Oyvind Albrigtsen 2023-05-01 09:12:25 UTC
Additional patch to improve logic: https://github.com/ClusterLabs/resource-agents/pull/1864

Comment 9 robbiro 2023-05-03 15:40:09 UTC
Thank you for the fix. Will await release of package in repository channels for full integration tests on our side. Quick tests on RH9.0 and 8.6 are positive with the checked in modifications.

To clarify the proposed fix: is there any concern about using crm_simulate -LS and depending on its output containing the expected text string in future releases, potentially breaking again down the line? Future changes certainly cannot be predicted; I am trying to understand whether any changes are anticipated. Since the 'Transition Summary:' text block in the -Ls output was likely unintended and was therefore removed in later/current releases, I assume -LS will keep it going forward. Any guidance on this would be appreciated.

Comment 10 Ken Gaillot 2023-05-03 16:12:21 UTC
(In reply to robbiro from comment #9)
> Thank you for the fix. Will await release of package in repository channels
> for full integration tests on our side. Quick tests on RH9.0 and 8.6 are
> positive with the checked in modifications.
> 
> To clarify the proposed fix: is there any concern about using crm_simulate
> -LS and depending on its output containing the expected text string in
> future releases, potentially breaking again down the line? Future changes
> certainly cannot be predicted; I am trying to understand whether any changes
> are anticipated. Since the 'Transition Summary:' text block in the -Ls output
> was likely unintended and was therefore removed in later/current releases, I
> assume -LS will keep it going forward. Any guidance on this would be
> appreciated.

To address such issues, Pacemaker has been gradually adding support for XML output for all command-line tools. The idea is that the text output may change from release to release, but the XML output will change as little as possible, and remain backward-compatible as much as possible, for parsing by scripts. All commands will take the same --output-as option, which may be set to "none", "text", or "xml".

The schema for the XML output is installed as /usr/share/pacemaker/api/api-result.rng (which includes RNGs for each individual command). You can use that to figure out what to parse.

crm_simulate supports --output-as as of the Pacemaker 2.1.0 release (RHEL 8.5 and later, and all of RHEL 9). Most agents haven't switched to parsing XML yet in order to remain compatible with older versions, but if that's not a concern, I'd recommend the XML.
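This is a minimal sketch of the XML route, using only the Python standard library. It assumes the usual Pacemaker XML envelope: a `<pacemaker-result>` root element with a `<status code=... message=...>` child. It deliberately does not guess the element names for the simulation results themselves; those should be taken from /usr/share/pacemaker/api/api-result.rng. The `-LS` flags mirror the discussion above. The function names are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET


def run_crm_simulate_xml():
    """Run crm_simulate against the live cluster with XML output
    (Pacemaker 2.1.0 and later)."""
    proc = subprocess.run(
        ["crm_simulate", "-LS", "--output-as=xml"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.stdout


def result_status(xml_text):
    """Return (code, message) from the <status> element of a
    <pacemaker-result> document."""
    root = ET.fromstring(xml_text)
    if root.tag != "pacemaker-result":
        raise ValueError("unexpected root element: " + root.tag)
    status = root.find("status")
    if status is None:
        raise ValueError("no <status> element")
    return int(status.get("code")), status.get("message")
```

With a made-up minimal document such as `<pacemaker-result api-version="2.0" request="crm_simulate"><status code="0" message="OK"/></pacemaker-result>`, `result_status()` returns `(0, 'OK')`. Checking the status first lets a script report a failed command (or fall back to text parsing) before it looks for any action elements.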

FYI, we've also been gradually adding high-level C APIs corresponding to each command-line tool, and those generate the same XML output that --output-as=xml would.

Commands that currently support --output-as=xml:
* Since 2.0.2 (8.1+/9.0+): stonith_admin
* Since 2.0.3 (8.2+/9.0+): crm_mon
* Since 2.1.0 (8.5+/9.0+): crmadmin, crm_resource, crm_simulate, crm_verify
* Since 2.1.3 (8.7+/9.1+): attrd_updater, crm_attribute, crm_rule
* Since 2.1.5 (8.8+/9.2+): crm_error
* Since 2.1.6 (8.9+/9.3+): crm_shadow
* Not yet supported: cibadmin, crm_diff, crm_node, crm_ticket, iso8601
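A script can turn the list above into a lookup table and decide, per command, whether to request XML. This is a sketch: the table is copied from the list, and the caller is assumed to supply the upstream Pacemaker version as a plain dotted string (for example "2.1.2" for the pacemaker-cli-2.1.2-4.el8_6.5 package in this report). `supports_xml_output` is a hypothetical name.

```python
# Minimum Pacemaker release with --output-as=xml, per the list above.
XML_SINCE = {
    "stonith_admin": (2, 0, 2),
    "crm_mon": (2, 0, 3),
    "crmadmin": (2, 1, 0),
    "crm_resource": (2, 1, 0),
    "crm_simulate": (2, 1, 0),
    "crm_verify": (2, 1, 0),
    "attrd_updater": (2, 1, 3),
    "crm_attribute": (2, 1, 3),
    "crm_rule": (2, 1, 3),
    "crm_error": (2, 1, 5),
    "crm_shadow": (2, 1, 6),
}


def supports_xml_output(command, pacemaker_version):
    """True if `command` accepts --output-as=xml in the given Pacemaker
    release, e.g. '2.1.2'. Commands not in the table (cibadmin, crm_node,
    ...) return False."""
    since = XML_SINCE.get(command)
    if since is None:
        return False
    version = tuple(int(part) for part in pacemaker_version.split("."))
    return version >= since
```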

Comment 19 errata-xmlrpc 2023-11-14 15:22:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (resource-agents bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6899