Bug 1420565

Summary: pgsql agent misuses crm_failcount
Product: Red Hat Enterprise Linux 7
Reporter: Ken Gaillot <kgaillot>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 7.3
CC: agk, cluster-maint, fdinitto, mnovacek
Target Milestone: rc
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version: resource-agents-3.9.5-88.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 14:57:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ken Gaillot 2017-02-09 00:34:22 UTC
Description of problem: The ocf:heartbeat:pgsql resource agent calls crm_failcount in a certain error situation. This is not the appropriate means of indicating a node-fatal error, and could lead to unexpected behavior in a Pacemaker cluster.

Instead of using crm_failcount, the agent should return a hard error code such as OCF_ERR_ARGS or OCF_ERR_PERM.
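
As a rough illustration of that suggestion (not the actual pgsql code or the eventual fix), the sketch below shows an agent action reporting a node-fatal condition purely through its exit status. The function name sketch_start and the lock-file check are hypothetical; the numeric values are the standard OCF return codes, defined locally here only so the snippet runs on its own (a real agent gets them by sourcing ocf-shellfuncs).

```shell
#!/bin/sh
# Standard OCF return codes, defined locally so this sketch is self-contained.
# A real resource agent obtains them by sourcing ocf-shellfuncs.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_ARGS=2
OCF_ERR_PERM=4

# Hypothetical start action. The lock file passed as $1 stands in for
# whatever condition the agent considers fatal on this node.
sketch_start() {
    lock_file="$1"
    if [ -e "$lock_file" ]; then
        echo "ERROR: $lock_file exists; refusing to start on this node" >&2
        # Hard error: signalled through the return code alone,
        # with no call to crm_failcount.
        return "$OCF_ERR_ARGS"
    fi
    return "$OCF_SUCCESS"
}
```

With this shape, the cluster decides how to react based on the return code, and the agent never needs to adjust fail counts itself.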

The version of Pacemaker planned for RHEL 7.4 includes a change to crm_failcount that will break this usage, so it is important to have a fix in the same time frame.

This was discussed on the upstream mailing list, and a user offered to submit a fix, so a pull request may be forthcoming:

http://lists.clusterlabs.org/pipermail/users/2017-February/004958.html

Comment 4 michal novacek 2017-06-05 14:41:37 UTC
I have verified using our internal "pacemaker,resource,Postgres" test that it passes even when /usr/sbin/crm_failcount has permissions 000 (resource-agents-3.9.5-104.el7).

---

[root@host-133 ~]# aq.sh 'ls -l /usr/sbin/crm_failcount'
----> using /usr/tests/resource-STSRHTS10447.xml
----------. 1 root root 2309 May  9 16:25 /usr/sbin/crm_failcount
----------. 1 root root 2309 May  9 16:25 /usr/sbin/crm_failcount
----------. 1 root root 2309 May  9 16:25 /usr/sbin/crm_failcount

/usr/tests/sts-rhel7.4/vedder/bin/vedder-ng -t pacemaker,resource,Postgres
...
------------------- Summary ---------------------
Testcase                                 Result    
--------                                 ------    
generic_setup                            PASS      
setup-clvmd                              PASS      
setup-pacemaker                          PASS      
setup_initscripts                        PASS      
pacemaker-resource-Postgres              PASS      
cleanup                                  PASS      
=================================================
Total Tests Run: 6
Total PASS:      6
Total FAIL:      0
Total TIMEOUT:   0
Total KILLED:    0
Total STOPPED:   0
Test output in /tmp/vedder.CHERRY.STSRHTS10447.201706050930
Killing XMLRPC server...
DEBUG:STSXMLRPC:Killing server with PID 12953 (SIGTERM)
INFO:STSXMLRPC:Server terminated.

Comment 5 errata-xmlrpc 2017-08-01 14:57:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1844