Bug 1263348 - tmpfile leak in mysql resource agent
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: Unspecified
Platform: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Oyvind Albrigtsen
Depends On:
Blocks: 1292054
Reported: 2015-09-15 11:28 EDT by Frank Enderle
Modified: 2016-11-03 19:58 EDT
CC: 8 users

See Also:
Fixed In Version: resource-agents-3.9.5-61.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1292054
Last Closed: 2016-11-03 19:58:32 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:2174 normal SHIPPED_LIVE resource-agents bug fix and enhancement update 2016-11-03 09:16:36 EDT

Description Frank Enderle 2015-09-15 11:28:54 EDT
Description of problem:
The mysql resource agent leaks the check_slave.* tmpfile and thus floods the /var/run/resource-agents directory with check_slave.* files that are never cleaned up, eventually filling the partition or exhausting its inodes.

Version-Release number of selected component (if applicable):
bash-4.2.46-12.el7.x86_64 (CentOS 7.1.1503)
resource-agents-3.9.5-40.el7_1.6.x86_64 (CentOS 7.1.1503)

How reproducible:

Steps to Reproduce:
1. Setup mysql Master/Slave Pacemaker cluster 
2. Look at the /var/run/resource-agents directory

Actual results:
Hundreds to thousands of leftover check_slave.* files in /var/run/resource-agents

Expected results:
None to a few check_slave.* files in /var/run/resource-agents

Additional info:
I did some debugging and the problem seems to boil down to a strange scoping behaviour in /usr/lib/ocf/resource.d/heartbeat/mysql:

1. The function check_slave() creates the file via the get_slave_info() function at line 378.
2. Later on, the file should be deleted: execution does reach line 504, which is supposed to delete the file with rm -f $tmpfile.
3. Debugging shows that the $tmpfile variable is empty by the time rm -f $tmpfile runs. The $tmpfile variable appears to be scoped to the get_slave_info() function, even though there is no 'local' declaration in that function.
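The symptom in point 3 matches a classic bash pitfall: a variable assigned inside a function is lost when that function runs in a subshell (a pipeline stage, command substitution, or similar), even without a 'local' declaration. A minimal, hypothetical reduction of the behaviour (the function and file names only mimic the agent's; the pipeline stands in for whatever subshell context the agent actually uses):

```shell
#!/bin/bash
# Hypothetical reduction of the reported leak, not the real agent code.
workdir=$(mktemp -d)

get_slave_info() {
    tmpfile=$(mktemp "$workdir/check_slave.XXXXXX")   # file created here
}

check_slave_leaky() {
    # The pipeline runs get_slave_info in a subshell, so its $tmpfile
    # assignment is lost; 'rm -f' below then runs with no argument and
    # silently removes nothing.
    get_slave_info | cat
    rm -f $tmpfile
}

check_slave_fixed() {
    get_slave_info        # direct call: $tmpfile survives into this scope
    rm -f "$tmpfile"      # now removes the file that was actually created
}

check_slave_leaky
leaky_count=$(ls "$workdir" | wc -l | tr -d ' ')
echo "files left after leaky variant: $leaky_count"   # the tmpfile remains
check_slave_fixed
fixed_count=$(ls "$workdir" | wc -l | tr -d ' ')
echo "files left after fixed variant: $fixed_count"   # only the old leak remains
rm -rf "$workdir"
```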
Comment 2 Fabio Massimo Di Nitto 2015-09-15 13:33:31 EDT
Looking at the code, there are many conditions inside the agent that can lead to temporary file leaks besides the normal runtime monitoring issue.

I have never used mysql in master/slave mode (where this code path leaks). Can you please share your configuration so I can set it up easily to verify any fix?

Comment 3 Frank Enderle 2015-10-02 12:40:35 EDT
I'm currently unable to produce a test case. I will try to find the time in the next week, so please do not autoclose this bug.
Comment 5 Oyvind Albrigtsen 2015-12-16 05:18:21 EST
OCF_CHECK_LEVEL=10 is required to reproduce the issue:
# pcs resource op add MDB monitor interval=15s OCF_CHECK_LEVEL=10

I tested and verified that the following patch solves the issue:
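The patch itself is not quoted in this excerpt. As an illustration only (not the shipped fix), one leak-proof structure has the caller own the tmpfile and funnel every branch through a single cleanup point; the helper below is a hypothetical stand-in for the real status query:

```shell
#!/bin/bash
# Sketch of a leak-proof check_slave structure; names mirror the agent
# but the logic is simplified, and get_slave_info here is a fake.
workdir=$(mktemp -d)

get_slave_info() {
    # Stand-in for the real query: writes fake slave status into $1.
    printf 'Slave_IO_Running: Yes\n' > "$1"
}

check_slave() {
    local tmpfile rc=1
    tmpfile=$(mktemp "$workdir/check_slave.XXXXXX") || return 1
    get_slave_info "$tmpfile"
    if grep -q 'Slave_IO_Running: Yes' "$tmpfile"; then
        rc=0    # healthy slave
    fi
    rm -f "$tmpfile"    # single cleanup point reached on every branch
    return $rc
}

check_slave; echo "check_slave rc=$?"
left=$(ls "$workdir" | wc -l | tr -d ' ')
echo "tmpfiles left: $left"
rm -rf "$workdir"
```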
Comment 7 Oyvind Albrigtsen 2016-03-01 04:34:55 EST
# pcs resource create MySQL mysql
# pcs resource master MySQL
# pcs resource op add MySQL monitor interval=15s OCF_CHECK_LEVEL=10

Run the following on the slave node.

# rpm -q resource-agents
# ls /var/run/resource-agents/check_slave.*

Wait at least 15 seconds.
# ls /var/run/resource-agents/check_slave.*

# rpm -q resource-agents
# ls /var/run/resource-agents/check_slave.*

Wait at least 15 seconds.
# ls /var/run/resource-agents/check_slave.*

The same number of files from before the patch still exist, but they will be removed on reboot.
Comment 8 michal novacek 2016-09-02 10:15:21 EDT
I have verified that there are no more leftover files on the slave after the patch,
in resource-agents-3.9.5-81.el7.x86_64.


common setup
 * setup cluster (1)
 * setup mysql resource as master/slave clone and verify that mysql is 
    running (2)

before the patch (resource-agents-3.9.5-54.el7.x86_64)
> on slave virt-022, there are a lot of leftover files

[root@virt-022 ~]# find /var/run/resource-agents

after the patch (resource-agents-3.9.5-81.el7.x86_64)

> when mysql resource is active on virt-005, virt-022 is slave:

[root@virt-022 ~]# find /var/run/resource-agents


>> (1)
[root@virt-005 ~]# pcs status
Cluster name: STSRHTS2803
Stack: corosync
Current DC: virt-021 (version 1.1.15-10.el7-e174ec8) - partition with quorum
Last updated: Fri Sep  2 15:40:09 2016          Last change: Fri Sep  2 15:34:26 2016 by root via cibadmin on virt-005

4 nodes and 14 resources configured

Online: [ virt-004 virt-005 virt-021 virt-022 ]

Full list of resources:

 fence-virt-004 (stonith:fence_xvm):    Started virt-004
 fence-virt-005 (stonith:fence_xvm):    Started virt-005
 fence-virt-021 (stonith:fence_xvm):    Started virt-021
 fence-virt-022 (stonith:fence_xvm):    Started virt-022
 Clone Set: dlm-clone [dlm]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Master/Slave Set: mysql-clone [mysql]
     Masters: [ virt-005 ]
     Slaves: [ virt-022 ]

Failed Actions:

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

>> (2)
[root@virt-005 ~]# pcs resource show --full
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
               monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1
   Operations: start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
               monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
 Master: mysql-clone
  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
  Resource: mysql (class=ocf provider=heartbeat type=mysql)
   Operations: start interval=0s timeout=240 (mysql-start-interval-0s)
               stop interval=0s timeout=120 (mysql-stop-interval-0s)
               promote interval=0s timeout=120 (mysql-promote-interval-0s)
               demote interval=0s timeout=120 (mysql-demote-interval-0s)
               monitor interval=30s (mysql-monitor-interval-30s)
               monitor interval=15s OCF_CHECK_LEVEL=10 (mysql-monitor-interval-15s)
Comment 10 errata-xmlrpc 2016-11-03 19:58:32 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

