Bug 1263348 - tmpfile leak in mysql resource agent
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Oyvind Albrigtsen
QA Contact: cluster-qe@redhat.com
Blocks: 1292054
Reported: 2015-09-15 11:28 EDT by Frank Enderle
Modified: 2016-11-03 19:58 EDT
CC: 8 users

Fixed In Version: resource-agents-3.9.5-61.el7
Doc Type: Bug Fix
Clones: 1292054
Last Closed: 2016-11-03 19:58:32 EDT
Type: Bug


External Trackers:
  Tracker ID: Red Hat Product Errata RHBA-2016:2174
  Priority: normal
  Status: SHIPPED_LIVE
  Summary: resource-agents bug fix and enhancement update
  Last Updated: 2016-11-03 09:16:36 EDT

Description Frank Enderle 2015-09-15 11:28:54 EDT
Description of problem:
The mysql resource agent leaks the check_slave.* tmpfile it creates and thus floods the /var/run/resource-agents directory with check_slave.* files that are never cleaned up, eventually filling the partition or exhausting its inodes.

Version-Release number of selected component (if applicable):
bash-4.2.46-12.el7.x86_64 (CentOS 7.1.1503)
resource-agents-3.9.5-40.el7_1.6.x86_64 (CentOS 7.1.1503)

How reproducible:
always

Steps to Reproduce:
1. Set up a mysql master/slave Pacemaker cluster
2. Watch the /var/run/resource-agents directory (for example with the command shown below)
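
A convenient way to watch the leak grow is to count the files once per monitor interval, for example (the 15-second interval matches the monitor interval used later in this bug):

# watch -n 15 'ls /var/run/resource-agents/check_slave.* 2>/dev/null | wc -l'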

Actual results:
Hundreds to thousands of leftover check_slave.* files in /var/run/resource-agents

Expected results:
None, or at most a few, check_slave.* files in /var/run/resource-agents

Additional info:
I did some debugging and the problem seems to boil down to a variable-scoping behaviour in /usr/lib/ocf/resource.d/heartbeat/mysql:

1. The check_slave() function creates the file via the get_slave_info() function at line 378.
2. Later the file should be deleted: execution does reach line 504, which removes the file with rm -f $tmpfile.
3. Debugging shows that the $tmpfile variable is empty by the time rm -f $tmpfile runs. $tmpfile appears to be scoped to the get_slave_info() function, even though there is no 'local' declaration in the function (see the sketch below).
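
For illustration, here is a minimal sketch of the scoping pitfall that would produce exactly this symptom. It mimics the agent's structure but is hypothetical code, not the agent's actual source; the leak occurs whenever the inner function's tmpfile (whether through an explicit 'local' or a subshell invocation) is invisible to the caller:

get_slave_info() {
    local tmpfile    # confines the path to this function
    tmpfile=$(mktemp /var/run/resource-agents/check_slave.XXXXXX)
    # ... write SHOW SLAVE STATUS output into $tmpfile and parse it ...
}

check_slave() {
    get_slave_info
    # $tmpfile is empty in this scope, so the next line expands to a
    # bare "rm -f" that removes nothing; the temp file is leaked
    rm -f $tmpfile
}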
Comment 2 Fabio Massimo Di Nitto 2015-09-15 13:33:31 EDT
Looking at the code, there are many conditions inside the agent that can lead to temporary file leaks besides the normal runtime monitoring issue.

I have never used mysql in master/slave mode (which is where this code path leaks). Can you please share your configuration so I can set it up easily to verify any fix?

Thanks
Comment 3 Frank Enderle 2015-10-02 12:40:35 EDT
I'm currently unable to produce a test case. I will try to find the time in the next week, so please do not autoclose this bug.
Comment 5 Oyvind Albrigtsen 2015-12-16 05:18:21 EST
OCF_CHECK_LEVEL=10 is required to reproduce the issue, since only the deeper monitor action exercises the leaking check_slave() code path:
# pcs resource op add MDB monitor interval=15s OCF_CHECK_LEVEL=10

I tested and verified that the following patch solves the issue:
https://github.com/ClusterLabs/resource-agents/pull/722
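
The pull request has the details; schematically (a sketch of the idea, not the verbatim diff), the fix is to make the tmpfile path visible to the caller that removes it:

get_slave_info() {
    local mysql_options    # tmpfile deliberately not declared local, so
                           # check_slave() can still remove it afterwards
    tmpfile=$(mktemp "${HA_RSCTMP}/check_slave.${OCF_RESOURCE_INSTANCE}.XXXXXX")
    # ... populate and parse $tmpfile as before ...
}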
Comment 7 Oyvind Albrigtsen 2016-03-01 04:34:55 EST
Setup:
# pcs resource create MySQL mysql
# pcs resource master MySQL
# pcs resource op add MySQL monitor interval=15s OCF_CHECK_LEVEL=10


Run on the slave node.

Before:
# rpm -q resource-agents
resource-agents-3.9.5-54.el7_2.6.x86_64
# ls /var/run/resource-agents/check_slave.*
/var/run/resource-agents/check_slave.MySQL.uCeh9f

Wait at least 15 seconds.
# ls /var/run/resource-agents/check_slave.*
/var/run/resource-agents/check_slave.MySQL.uCeh9f
/var/run/resource-agents/check_slave.MySQL.uxRvQo


After:
# rpm -q resource-agents
resource-agents-3.9.5-61.el7.x86_64
# ls /var/run/resource-agents/check_slave.*
/var/run/resource-agents/check_slave.MySQL.uCeh9f
/var/run/resource-agents/check_slave.MySQL.uxRvQo

Wait at least 15 seconds.
# ls /var/run/resource-agents/check_slave.*
/var/run/resource-agents/check_slave.MySQL.uCeh9f
/var/run/resource-agents/check_slave.MySQL.uxRvQo

The same files as before: those created prior to the patch still exist (they will be removed on reboot), but no new files are being created.
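
If you don't want to wait for a reboot, the pre-patch leftovers can also be removed by hand once the patched agent is installed:

# rm -f /var/run/resource-agents/check_slave.*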
Comment 8 michal novacek 2016-09-02 10:15:21 EDT
I have verified that there are no more leftover files on the slave after the patch, in resource-agents-3.9.5-81.el7.x86_64.

---

common setup
 * set up cluster (1)
 * set up mysql resource as a master/slave clone and verify that mysql is running (2)

before the patch (resource-agents-3.9.5-54.el7.x86_64)
======================================================
> on slave virt-022, there are a lot of leftover files

[root@virt-022 ~]# find /var/run/resource-agents
/var/run/resource-agents
/var/run/resource-agents/check_slave.mysql.FtU5zA
/var/run/resource-agents/check_slave.mysql.vIqLEy
/var/run/resource-agents/check_slave.mysql.rGDFFZ
/var/run/resource-agents/check_slave.mysql.8X9GpU
/var/run/resource-agents/check_slave.mysql.Nns5S0
/var/run/resource-agents/check_slave.mysql.jRlGo9
/var/run/resource-agents/check_slave.mysql.pHXcvw
/var/run/resource-agents/check_slave.mysql.YM8ofW
/var/run/resource-agents/check_slave.mysql.ej6his
/var/run/resource-agents/check_slave.mysql.BFzDwJ
/var/run/resource-agents/check_slave.mysql.lvI3t5
/var/run/resource-agents/check_slave.mysql.iQr2Xt
/var/run/resource-agents/check_slave.mysql.4unyu5
/var/run/resource-agents/threads.mysql.XGarNE
/var/run/resource-agents/master_status.mysql
/var/run/resource-agents/cmirrord-clvmd.pid
/var/run/resource-agents/clvmd-clvmd.pid


after the patch (resource-agents-3.9.5-81.el7.x86_64)
=====================================================

> when mysql resource is active on virt-005, virt-022 is slave:

[root@virt-022 ~]# find /var/run/resource-agents
/var/run/resource-agents
/var/run/resource-agents/cmirrord-clvmd.pid
/var/run/resource-agents/clvmd-clvmd.pid


-----

>> (1)
[root@virt-005 ~]# pcs status
Cluster name: STSRHTS2803
Stack: corosync
Current DC: virt-021 (version 1.1.15-10.el7-e174ec8) - partition with quorum
Last updated: Fri Sep  2 15:40:09 2016          Last change: Fri Sep  2 15:34:26 2016 by root via cibadmin on virt-005

4 nodes and 14 resources configured

Online: [ virt-004 virt-005 virt-021 virt-022 ]

Full list of resources:

 fence-virt-004 (stonith:fence_xvm):    Started virt-004
 fence-virt-005 (stonith:fence_xvm):    Started virt-005
 fence-virt-021 (stonith:fence_xvm):    Started virt-021
 fence-virt-022 (stonith:fence_xvm):    Started virt-022
 Clone Set: dlm-clone [dlm]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Master/Slave Set: mysql-clone [mysql]
     Masters: [ virt-005 ]
     Slaves: [ virt-022 ]

Failed Actions:

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

>> (2)
[root@virt-005 ~]# pcs resource show --full
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
               monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1
   Operations: start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
               monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
 Master: mysql-clone
  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
  Resource: mysql (class=ocf provider=heartbeat type=mysql)
   Operations: start interval=0s timeout=240 (mysql-start-interval-0s)
               stop interval=0s timeout=120 (mysql-stop-interval-0s)
               promote interval=0s timeout=120 (mysql-promote-interval-0s)
               demote interval=0s timeout=120 (mysql-demote-interval-0s)
               monitor interval=30s (mysql-monitor-interval-30s)
               monitor interval=15s OCF_CHECK_LEVEL=10 (mysql-monitor-interval-15s)
Comment 10 errata-xmlrpc 2016-11-03 19:58:32 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2174.html
