Bug 2099331

Summary: crm_attribute default output changed to "(null)" instead of empty, breaks redis resource agent
Product: Red Hat Enterprise Linux 9
Reporter: Damien Ciabrini <dciabrin>
Component: pacemaker
Assignee: Chris Lumens <clumens>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: high
Priority: urgent
Version: 9.0
CC: cluster-maint, kgaillot, lmiccini, msmazova, peljasz, tkajinam
Target Milestone: rc
Target Release: 9.1
Keywords: Regression, Triaged
Flags: pm-rhel: mirror+
Hardware: All
OS: All
Fixed In Version: pacemaker-2.1.4-2.el9
Doc Type: No Doc Update
Doc Text: This issue was not in a released build
Last Closed: 2022-11-15 09:49:38 UTC
Type: Bug
Target Upstream Version: 2.1.5

Description Damien Ciabrini 2022-06-20 15:10:54 UTC
Description of problem:
In OpenStack CI, we're consuming the latest pacemaker, pacemaker-2.1.3-2.el9.x86_64, and our deployment can no longer promote the Redis resource managed by the redis resource agent.

After further inspection, it looks like recent versions of crm_attribute now return "(null)" instead of an empty string when an attribute is not found in the CIB. E.g.:

# crm_attribute --type crm_config --name REDIS_REPL_INFO -s redis_replication --query -q 2>/dev/null
(null)

or

# crm_attribute --promotion -n nonexisting-attribbute -N standalone -G --quiet
(null)
crm_attribute: Error performing operation: No such device or address


This may confuse a lot of resource agents. So far we've confirmed that the redis resource agent cannot cope with this behaviour, as it expects an empty string in order to cycle from started to promotable state.
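A minimal sketch of why the change breaks such agents (hypothetical agent logic, not the actual redis agent code; the variables stand in for the output of the crm_attribute queries above):

```shell
# Simulate crm_attribute output for a missing attribute (hypothetical values):
old_output=""        # pacemaker <= 2.1.2: missing attribute -> empty string
new_output="(null)"  # pacemaker 2.1.3:    missing attribute -> literal "(null)"

# Typical resource-agent style check: "no value" means "no master recorded yet"
if [ -z "$old_output" ]; then
    echo "old: no master recorded, proceed with promotion"
fi

# The same emptiness check against the new output takes the wrong branch:
if [ -z "$new_output" ]; then
    echo "new: no master recorded, proceed with promotion"
else
    echo "new: treating literal '(null)' as a master address"
fi
```

The agent never sees an empty string anymore, so it treats the literal text "(null)" as real data and never initiates promotion.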


Version-Release number of selected component (if applicable):
pacemaker-2.1.3-2.el9.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Deploy a standalone OpenStack cloud in a VM. (This creates a containerized master/slave redis resource.)

Actual results:
The redis-bundle resource stays in the Unpromoted state.

Expected results:
The redis-bundle resource should go to the Promoted state automatically.

Additional info:
The old behaviour is known to work up to pacemaker-2.1.2-4.el9.x86_64.

Comment 1 Takashi Kajinami 2022-06-20 15:20:01 UTC
I guess the output change was made by https://github.com/ClusterLabs/pacemaker/commit/3f1565b95d5e5314c9bbb6edc91aa949a6c05935 , which is present in pacemaker 2.1.3 (and beyond).

Comment 2 Ken Gaillot 2022-06-27 21:09:36 UTC
Fixed in upstream main branch as of commit 9853f4d05

Comment 4 Ken Gaillot 2022-06-29 14:32:05 UTC
QA: Only the redis and rabbitmq agents are known to be potentially affected by this issue, but the issue itself is in the crm_attribute tool, so the only test needed is:

    crm_attribute --query --quiet --name $NAME --node $NODE 2>/dev/null

where $NAME is the name of an attribute that does not exist, and $NODE is any node in the cluster. Before the fix, with the 2.1.3 or 2.1.4-1 packages, it will output "(null)"; after the fix, it will not output anything.
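The check above can be wrapped in a small pass/fail helper (a sketch; the function name is hypothetical, and a running cluster is assumed when feeding it real crm_attribute output):

```shell
# check_crm_attribute_fix: pass it the raw output of
#   crm_attribute --query --quiet --name "$NAME" --node "$NODE" 2>/dev/null
# for a nonexistent attribute. Returns 0 if the fix is present.
check_crm_attribute_fix() {
    out="$1"
    if [ "$out" = "(null)" ]; then
        echo "FAIL: crm_attribute still prints '(null)' for a missing attribute"
        return 1
    elif [ -z "$out" ]; then
        echo "PASS: missing attribute yields empty output"
        return 0
    else
        echo "UNEXPECTED: attribute appears to exist: '$out'"
        return 2
    fi
}
```

For example: check_crm_attribute_fix "$(crm_attribute --query --quiet --name $NAME --node $NODE 2>/dev/null)"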

Comment 5 lejeczek 2022-07-08 07:34:41 UTC
This affects CentOS Stream 8 too, with pacemaker-2.1.3-2.el8.x86_64. There I have to fool the cluster by creating a constraint with 'move --master'; if that constraint does not exist, the cluster logs:
...
3442363:S 07 Jul 2022 20:11:24.184 # Unable to connect to MASTER: (null)
3442363:S 07 Jul 2022 20:11:25.187 * Connecting to MASTER no-such-master:6379
...

It would be great to have the fixes sent to CentOS ASAP as well.
thanks, L.

Comment 9 Ken Gaillot 2022-07-11 17:09:13 UTC
(In reply to lejeczek from comment #5)
> That goes for c8s too with pacemaker-2.1.3-2.el8.x86_64, there the cluster I
> must fool by making a constraint with 'move --master' if that does not
> exists cluster logs:
> ...
> 3442363:S 07 Jul 2022 20:11:24.184 # Unable to connect to MASTER: (null)
> 3442363:S 07 Jul 2022 20:11:25.187 * Connecting to MASTER no-such-master:6379
> ...
> 
> Would be great to have fixes send to centOS asap as well.
> thanks, L.

The fix is also in the pacemaker-2.1.4-3.el8 build.

Comment 10 Markéta Smazová 2022-07-13 15:03:46 UTC
before fix
-----------
[root@virt-245 ~]# rpm -q pacemaker
pacemaker-2.1.4-1.el9.x86_64

[root@virt-245 ~]# pcs cluster status
Cluster Status:
 Cluster Summary:
   * Stack: corosync
   * Current DC: virt-245 (version 2.1.4-1.el9-dc6eb4362e) - partition with quorum
   * Last updated: Wed Jul 13 16:31:15 2022
   * Last change:  Tue Jul 12 09:58:47 2022 by root via cibadmin on virt-245
   * 2 nodes configured
   * 2 resource instances configured
 Node List:
   * Online: [ virt-245 virt-246 ]

PCSD Status:
  virt-245: Online
  virt-246: Online

[root@virt-245 ~]# crm_attribute --query --quiet --name test --node virt-246 2>/dev/null
(null)

[root@virt-245 ~]# crm_attribute --query --quiet --name test --node virt-246
(null)
crm_attribute: Error performing operation: No such device or address


after fix
----------
[root@virt-259 ~]# rpm -q pacemaker
pacemaker-2.1.4-2.el9.x86_64

[root@virt-259 ~]# pcs cluster status
Cluster Status:
 Cluster Summary:
   * Stack: corosync
   * Current DC: virt-260 (version 2.1.4-2.el9-dc6eb4362e) - partition with quorum
   * Last updated: Wed Jul 13 16:31:58 2022
   * Last change:  Wed Jul 13 15:54:22 2022 by root via cibadmin on virt-259
   * 2 nodes configured
   * 2 resource instances configured
 Node List:
   * Online: [ virt-259 virt-260 ]

PCSD Status:
  virt-260: Online
  virt-259: Online

[root@virt-259 ~]# crm_attribute --query --quiet --name test --node virt-259 2>/dev/null

[root@virt-259 ~]# crm_attribute --query --quiet --name test --node virt-259
crm_attribute: Error performing operation: No such device or address

marking verified in pacemaker-2.1.4-2.el9

Comment 12 errata-xmlrpc 2022-11-15 09:49:38 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7937