Bug 1292054

Summary: tmpfile leak in mysql resource agent
Product: Red Hat Enterprise Linux 6 Reporter: Oyvind Albrigtsen <oalbrigt>
Component: resource-agents Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.8 CC: abeekhof, agk, cfeist, cluster-maint, cluster-qe, djansa, fdinitto, frank.enderle, mjuricek, oalbrigt, rmccabe
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: resource-agents-3.9.5-28.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1263348 Environment:
Last Closed: 2016-05-10 19:15:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1263348    
Bug Blocks:    

Description Oyvind Albrigtsen 2015-12-16 10:59:55 UTC
+++ This bug was initially created as a clone of Bug #1263348 +++

Description of problem:
The mysql resource agent leaks the check_slave... tmpfile and thus floods the /var/run/resource-agents directory with check_slave.* files which are not cleaned up, eventually leading to a filled partition or inode shortage.

Version-Release number of selected component (if applicable):
bash-4.2.46-12.el7.x86_64 (CentOS 7.1.1503)
resource-agents-3.9.5-40.el7_1.6.x86_64 (CentOS 7.1.1503)

How reproducible:
always

Steps to Reproduce:
1. Setup mysql Master/Slave Pacemaker cluster 
2. Look at the /var/run/resource-agents directory

Actual results:
Hundreds and thousands of left over check_slave.* files in /var/run/resource-agents
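
To put a number on the leak, something like the following could be run on the slave node. This is only a sketch: count_check_slave is a hypothetical helper name (it is not part of resource-agents), and the directory defaults to the path given in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of resource-agents): count the
# check_slave.* files left in the given directory.
count_check_slave() {
    dir="${1:-/var/run/resource-agents}"
    # -maxdepth 1 restricts the search to the directory itself;
    # errors (e.g. missing directory) are suppressed and count as 0.
    find "$dir" -maxdepth 1 -type f -name 'check_slave.*' 2>/dev/null \
        | wc -l | tr -d ' '
}

echo "leftover check_slave files: $(count_check_slave)"
```

Running it twice, one monitor interval apart, should show whether the count keeps growing. Inode usage of the partition can be checked separately with df -i.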

Expected results:
None to a few check_slave.* files in /var/run/resource-agents

Additional info:
I did some debugging and the problem seems to boil down to some strange behaviour in /usr/lib/ocf/resource.d/heartbeat/mysql:

1. The function check_slave() creates the file through the get_slave_info() function in line 378
2. Later on the file should be deleted. Actually the execution goes through line 504 which should delete the file using rm -f $tmpfile
3. Debugging shows that the $tmpfile variable is empty when the rm -f $tmpfile happens. The $tmpfile variable seems to be scoped to the get_slave_info() function, though there is no 'local' declaration in the function.
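
For illustration only: one general way a shell variable can end up empty in the caller without any 'local' declaration is when the function that sets it runs in a subshell (command substitution, pipeline, or parentheses). Whether this is what happens in the agent is not established in this report; set_tmpfile below is a hypothetical stand-in, not the agent's get_slave_info().

```shell
#!/bin/sh
# Hypothetical stand-in for a function that sets a global variable
# as a side effect. Not the agent's code.
set_tmpfile() {
    tmpfile=$(mktemp "${TMPDIR:-/tmp}/check_slave.demo.XXXXXX")
}

# Case 1: called directly, the assignment is visible to the caller.
tmpfile=""
set_tmpfile
direct_value="$tmpfile"
rm -f "$tmpfile"

# Case 2: called inside a command substitution (a subshell), the
# assignment is lost once the subshell exits.
tmpfile=""
leaked=$(set_tmpfile; echo "$tmpfile")
subshell_value="$tmpfile"
rm -f $tmpfile      # expands to a bare "rm -f": no error, nothing removed

if [ -e "$leaked" ]; then
    echo "file survived the rm: $leaked"
fi
rm -f "$leaked"     # clean up the demo file
```

Because rm -f with an empty, unquoted variable succeeds silently, this kind of leak produces no error message in the logs.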

--- Additional comment from Fabio Massimo Di Nitto on 2015-09-15 19:33:31 CEST ---

Looking at the code, there are tons of conditions inside the agent that can lead to temporary file leaks besides the normal runtime monitoring issue.
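
As a general pattern (not the change made in the agent, and not taken from the upstream patch), cleanup can be tied to the exit of the code that created the file, so that early-return and error paths are covered too. A minimal sketch, with hypothetical names:

```shell
#!/bin/sh
# Sketch only: run_check is a hypothetical name, not agent code.
run_check() {
    (
        tmpfile=$(mktemp "${TMPDIR:-/tmp}/check_slave.demo.XXXXXX") || exit 1
        # Remove the file whenever this subshell exits, whatever the
        # exit status.
        trap 'rm -f "$tmpfile"' EXIT
        echo "status" > "$tmpfile"
        echo "$tmpfile"      # print the name so the caller can verify
        if [ "$1" = "fail" ]; then
            exit 1           # simulated error path
        fi
        exit 0
    )
}

used=$(run_check fail)
rc=$?
echo "exit status: $rc, tmpfile still present: $([ -e "$used" ] && echo yes || echo no)"
```

The subshell keeps the trap from replacing any EXIT trap the calling script may already have set.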

I have never used mysql in master/slave mode (where this code path is leaking). Can you please share your configuration so I can set it up easily to verify any fix?

Thanks

--- Additional comment from Frank Enderle on 2015-10-02 18:40:35 CEST ---

I'm currently unable to produce a test case. I will try to find the time in the next week, so please do not autoclose this bug.

--- Additional comment from Oyvind Albrigtsen on 2015-12-16 11:18:21 CET ---

OCF_CHECK_LEVEL=10 is required to reproduce the issue:
# pcs resource op add MySQL monitor interval=15s OCF_CHECK_LEVEL=10

I tested and verified that the following patch solves the issue:
https://github.com/ClusterLabs/resource-agents/pull/722

Comment 2 Oyvind Albrigtsen 2015-12-16 11:01:38 UTC
Tested and verified that the patch is working as expected on RHEL6 as well.

Comment 3 Oyvind Albrigtsen 2015-12-21 12:23:34 UTC
Create MySQL service with pcs and run the following command to be able to reproduce the issue:
# pcs resource op add MySQL monitor interval=60s OCF_CHECK_LEVEL=10

Before:
A new /var/run/resource-agents/check_slave.mysql.XXXXXX file is created on the slave node every minute (or at whatever monitor interval is configured).

After:
/var/run/resource-agents/check_slave.mysql.XXXXXX files are deleted after being used by the resource agent.

Comment 4 Oyvind Albrigtsen 2015-12-24 12:52:05 UTC
Setup:
# pcs resource create MySQL mysql
# pcs resource master MySQL
# pcs resource op add MySQL monitor interval=15s OCF_CHECK_LEVEL=10


Run on the slave node.

Before:
# rpm -q resource-agents
resource-agents-3.9.5-24.el6_7.1.x86_64
# ls /var/run/resource-agents/check_slave.*
/var/run/resource-agents/check_slave.MySQL.uCeh9f

Wait at least 15 seconds.
# ls /var/run/resource-agents/check_slave.*
/var/run/resource-agents/check_slave.MySQL.uCeh9f
/var/run/resource-agents/check_slave.MySQL.uxRvQo


After:
# ls /var/run/resource-agents/check_slave.*
/var/run/resource-agents/check_slave.MySQL.uCeh9f
/var/run/resource-agents/check_slave.MySQL.uxRvQo

Wait at least 15 seconds.
# ls /var/run/resource-agents/check_slave.*
/var/run/resource-agents/check_slave.MySQL.uCeh9f
/var/run/resource-agents/check_slave.MySQL.uxRvQo

The number of files stays the same (the files from before the patch still exist, but will be removed on reboot).
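
If a reboot is not convenient, the leftover files could presumably be removed by hand. The following is a sketch under assumptions: purge_stale_check_slave is a hypothetical helper, and the 60-minute default age is an arbitrary choice meant to leave alone any file a running monitor might still be using.

```shell
#!/bin/sh
# Hypothetical helper (not part of resource-agents): delete
# check_slave.* files older than the given number of minutes.
purge_stale_check_slave() {
    dir="${1:-/var/run/resource-agents}"
    min_age="${2:-60}"
    # -mmin +N matches files last modified more than N minutes ago.
    find "$dir" -maxdepth 1 -type f -name 'check_slave.*' \
        -mmin "+$min_age" -exec rm -f {} +
}

# Example invocation (commented out because it deletes files):
# purge_stale_check_slave /var/run/resource-agents 60
```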

Comment 8 errata-xmlrpc 2016-05-10 19:15:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0735.html