Bug 1292054

Summary:	tmpfile leak in mysql resource agent
Product:	Red Hat Enterprise Linux 6	Reporter:	Oyvind Albrigtsen <oalbrigt>
Component:	resource-agents	Assignee:	Oyvind Albrigtsen <oalbrigt>
Status:	CLOSED ERRATA	QA Contact:	cluster-qe <cluster-qe>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	6.8	CC:	abeekhof, agk, cfeist, cluster-maint, cluster-qe, djansa, fdinitto, frank.enderle, mjuricek, oalbrigt, rmccabe
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	resource-agents-3.9.5-28.el6	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:	1263348	Environment:
Last Closed:	2016-05-10 19:15:42 UTC	Type:	Bug
Bug Depends On:	1263348
Bug Blocks:

Description Oyvind Albrigtsen 2015-12-16 10:59:55 UTC

+++ This bug was initially created as a clone of Bug #1263348 +++

Description of problem:
The mysql resource agent leaks its check_slave.* temporary file, so the /var/run/resource-agents directory fills up with check_slave.* files that are never cleaned up, eventually filling the partition or exhausting its inodes.

Version-Release number of selected component (if applicable):
bash-4.2.46-12.el7.x86_64 (CentOS 7.1.1503)
resource-agents-3.9.5-40.el7_1.6.x86_64 (CentOS 7.1.1503)

How reproducible:
always

Steps to Reproduce:
1. Set up a mysql Master/Slave Pacemaker cluster
2. Look at the /var/run/resource-agents directory

Actual results:
Hundreds or thousands of leftover check_slave.* files in /var/run/resource-agents

Expected results:
None to a few check_slave.* files in /var/run/resource-agents

Additional info:
I did some debugging and the problem seems to boil down to a strange behaviour in /usr/lib/ocf/resource.d/heartbeat/mysql:

1. The function check_slave() creates the file through the get_slave_info() function in line 378
2. Later on, the file should be deleted. Execution does reach line 504, which should delete the file using rm -f $tmpfile
3. Debugging shows that the $tmpfile variable is empty when rm -f $tmpfile runs. The $tmpfile variable seems to be scoped to the get_slave_info() function, though there is no 'local' declaration in the function.
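
The symptom in step 3 is easy to miss because rm -f with no operands succeeds silently. The following is a minimal sketch in a scratch directory, not the agent's code; all variable names are illustrative. It shows that an empty $tmpfile leaves the file in place without any error:

```shell
#!/bin/sh
# Demonstration only: this is not the resource agent's code.
workdir=$(mktemp -d)
created=$(mktemp "$workdir/check_slave.demo.XXXXXX")

tmpfile=""            # simulate the variable being empty at cleanup time
rm -f $tmpfile        # expands to plain "rm -f": no operand, no diagnostic
rm_status=$?

# Count what is left in the scratch directory: the file created above.
leftover=$(ls "$workdir" | wc -l | tr -d ' ')
echo "rm exit status: $rm_status, files left: $leftover"

rm -rf "$workdir"     # clean up the demo directory
```

Because rm reports success, nothing in the agent's logs hints at the leak; only the growing number of files does.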

--- Additional comment from Fabio Massimo Di Nitto on 2015-09-15 19:33:31 CEST ---

Looking at the code, there are tons of conditions inside the agent that can lead to temporary file leaks, besides the normal runtime monitoring issue.

I have never used mysql in master/slave mode (where this code path is leaking). Can you please share your configuration so I can set it up easily to verify any fix?

Thanks

--- Additional comment from Frank Enderle on 2015-10-02 18:40:35 CEST ---

I'm currently unable to produce a test case. I will try to find the time in the next week, so please do not autoclose this bug.

--- Additional comment from Oyvind Albrigtsen on 2015-12-16 11:18:21 CET ---

OCF_CHECK_LEVEL=10 is required to reproduce the issue:
# pcs resource op add MySQL monitor interval=15s OCF_CHECK_LEVEL=10

I tested and verified that the following patch solves the issue:
https://github.com/ClusterLabs/resource-agents/pull/722

Comment 2 Oyvind Albrigtsen 2015-12-16 11:01:38 UTC

Tested and verified that the patch is working as expected on RHEL6 as well.

Comment 3 Oyvind Albrigtsen 2015-12-21 12:23:34 UTC

Create a MySQL service with pcs and run the following command to reproduce the issue:
# pcs resource op add MySQL monitor interval=60s OCF_CHECK_LEVEL=10

Before:
A new /var/run/resource-agents/check_slave.mysql.XXXXXX file is created on the slave node every minute (or at whatever interval you set).

After:
/var/run/resource-agents/check_slave.mysql.XXXXXX files are deleted after being used by the resource agent.
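
One general way to get the "After" behaviour in a shell script is to create and remove the temporary file in the same function, so that the path is in scope at cleanup time. The following is a minimal sketch of that pattern and is not the code from the upstream patch. The function name check_with_tmpfile, the RSCTMP variable and the placeholder file content are assumptions made for illustration:

```shell
#!/bin/sh
# Sketch of a create-use-remove pattern; not the upstream patch.
# RSCTMP stands in for the agent's temp directory (name chosen for this demo).
RSCTMP=$(mktemp -d)

check_with_tmpfile() {
    # Create the file in the same function that removes it, so the path
    # is known to be set when cleanup runs.
    tmpfile=$(mktemp "$RSCTMP/check_slave.demo.XXXXXX") || return 1

    # Placeholder for the real status query and its evaluation.
    echo "Slave_IO_Running: Yes" > "$tmpfile"
    grep -q 'Slave_IO_Running: Yes' "$tmpfile"
    rc=$?

    # Remove the file on the way out, whatever the check returned.
    rm -f "$tmpfile"
    return $rc
}

# Simulate three monitor runs, then count what is left behind.
check_with_tmpfile && check_with_tmpfile && check_with_tmpfile
check_status=$?
remaining=$(ls "$RSCTMP" | wc -l | tr -d ' ')
echo "check status: $check_status, files left: $remaining"
rmdir "$RSCTMP"
```

Quoting "$tmpfile" also means that an unexpectedly empty value is passed to rm as an explicit empty operand instead of vanishing from the command line.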

Comment 4 Oyvind Albrigtsen 2015-12-24 12:52:05 UTC

Setup:
# pcs resource create MySQL mysql
# pcs resource master MySQL
# pcs resource op add MySQL monitor interval=15s OCF_CHECK_LEVEL=10


Run on the slave node.

Before:
# rpm -q resource-agents
resource-agents-3.9.5-24.el6_7.1.x86_64
# ls /var/run/resource-agents/check_slave.*
/var/run/resource-agents/check_slave.MySQL.uCeh9f

Wait at least 15 seconds.
# ls /var/run/resource-agents/check_slave.*
/var/run/resource-agents/check_slave.MySQL.uCeh9f
/var/run/resource-agents/check_slave.MySQL.uxRvQo


After:
# ls /var/run/resource-agents/check_slave.*
/var/run/resource-agents/check_slave.MySQL.uCeh9f
/var/run/resource-agents/check_slave.MySQL.uxRvQo

Wait at least 15 seconds.
# ls /var/run/resource-agents/check_slave.*
/var/run/resource-agents/check_slave.MySQL.uCeh9f
/var/run/resource-agents/check_slave.MySQL.uxRvQo

The number of files stays the same (the files from before the patch still exist, but will be removed on reboot).
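
The manual check above can be scripted by sampling the file count twice, one monitor interval apart. This is a minimal sketch; files_grew is a hypothetical helper that is not part of resource-agents, and -maxdepth assumes GNU find:

```shell
#!/bin/sh
# files_grew DIR INTERVAL
# Hypothetical helper: returns 0 if the number of check_slave.* files in DIR
# grew during INTERVAL seconds, and non-zero otherwise.
files_grew() {
    dir=$1
    interval=$2
    before=$(find "$dir" -maxdepth 1 -type f -name 'check_slave.*' | wc -l | tr -d ' ')
    sleep "$interval"
    after=$(find "$dir" -maxdepth 1 -type f -name 'check_slave.*' | wc -l | tr -d ' ')
    echo "before=$before after=$after"
    [ "$after" -gt "$before" ]
}
```

On the slave node it could be run as files_grew /var/run/resource-agents 20, using an interval slightly longer than the 15s monitor interval; a zero exit status means new files appeared and the leak is still present.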

Comment 8 errata-xmlrpc 2016-05-10 19:15:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0735.html