Red Hat Bugzilla – Bug 1263444
Memory leak in pacemaker_remote's proxy dispatch function
Last modified: 2015-12-10 16:00:02 EST
Description of problem: During normal cluster operation, pacemaker_remote leaks memory on remote nodes.
Version-Release number of selected component (if applicable): 1.1.12-22.el7_1.4
How reproducible: Run a pacemaker cluster that includes a remote node
Steps to Reproduce:
1. Set up a cluster that includes a remote node.
2. Use valgrind to run pacemaker_remote:
2a. yum install valgrind
2b. Uncomment VALGRIND_OPTS in /etc/sysconfig/pacemaker_remote
2c. mkdir /etc/systemd/system/pacemaker_remote.service.d
2d. cat >/etc/systemd/system/pacemaker_remote.service.d/valgrind.conf <<EOF
2e. If the cluster is running, disable the remote node resource, then run "systemctl restart pacemaker_remote" and re-enable the remote node resource if needed.
3. Start the cluster and perform routine cluster actions. I tried actions such as disabling and enabling a resource running on the remote node; migrating a resource to and from the remote node; setting and unsetting node attributes for the remote node; disabling and enabling the remote node resource itself (from another node in the cluster); and running various CLI commands (crm_attribute, attrd_updater, stonith_admin, crm_mon, etc.) on the remote node itself. You don't need to do all of them; a few are enough.
4. Disable the remote node resource, then run "systemctl stop pacemaker_remote".
5. Examine the valgrind output on the remote node (in /var/lib/pacemaker/valgrind-* by default).
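Step 2d does not include the body of the drop-in file. The following is a minimal sketch of one plausible drop-in, not the reporter's exact file. The binary paths /usr/sbin/pacemaker_remoted and /usr/bin/valgrind and the helper name write_valgrind_dropin are assumptions. The sketch relies on two documented behaviors: an empty ExecStart= line in a systemd drop-in clears the unit's original command, and valgrind reads default options from the VALGRIND_OPTS environment variable. It also assumes that the unit loads that variable from /etc/sysconfig/pacemaker_remote after step 2b.

```shell
#!/bin/sh
# Hypothetical helper: writes a systemd drop-in that runs pacemaker_remoted
# under valgrind. ASSUMPTIONS: binary paths below are typical for RHEL 7 but
# are not taken from the bug report. valgrind picks up $VALGRIND_OPTS from
# the environment, which the unit is assumed to load from
# /etc/sysconfig/pacemaker_remote (step 2b).
set -eu

write_valgrind_dropin() {
    # $1: systemd unit directory (defaults to /etc/systemd/system)
    unit_dir=${1:-/etc/systemd/system}
    dropin_dir="$unit_dir/pacemaker_remote.service.d"
    mkdir -p "$dropin_dir"
    cat >"$dropin_dir/valgrind.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the command line inherited from the unit
ExecStart=
ExecStart=/usr/bin/valgrind /usr/sbin/pacemaker_remoted
EOF
}
```

To use it, run write_valgrind_dropin as root with no argument. Then run "systemctl daemon-reload" so that systemd picks up the new drop-in before the restart in step 2e.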
Actual results: The valgrind output shows a nonzero total of "definitely lost", "indirectly lost", and "possibly lost" bytes, and the leak backtraces contain "crm_ipcs_recv".
Expected results: No memory lost.
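The comparison between the actual and expected results above can be scripted. The following is a minimal sketch. It assumes the logs are the /var/lib/pacemaker/valgrind-* files named in step 5 and that they contain valgrind's standard LEAK SUMMARY lines, such as "definitely lost: 1,024 bytes in 1 blocks". The function name summarize_leaks is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sums the "definitely lost", "indirectly lost" and
# "possibly lost" byte counts from valgrind LEAK SUMMARY lines across all
# given log files, and counts backtrace lines that mention crm_ipcs_recv.
set -eu

summarize_leaks() {
    # Arguments: valgrind log files, e.g. /var/lib/pacemaker/valgrind-*
    awk '
        # Only summary lines have a colon right after "lost"; per-record
        # lines read "... are definitely lost in loss record ..."
        /definitely lost:|indirectly lost:|possibly lost:/ {
            for (i = 1; i < NF; i++) {
                if ($i == "lost:") {
                    n = $(i + 1)
                    gsub(",", "", n)   # strip thousands separators
                    total += n
                }
            }
        }
        /crm_ipcs_recv/ { hits++ }
        END { printf "lost_bytes=%.0f crm_ipcs_recv_frames=%d\n", total, hits }
    ' "$@"
}
```

For example, run summarize_leaks /var/lib/pacemaker/valgrind-* on the remote node after step 4. A run that matches the expected results would report lost_bytes=0. The totals cover every process that wrote a log, so unrelated leaks would also be counted.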
Additional info: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Remote/index.html
Fixed upstream as of commit 1019d3e
The leak seems serious: it occurs every time the remote node proxies a connection to the Pacemaker components on its hosting cluster node. The number of bytes lost appears to grow with each occurrence, but I haven't investigated whether that is real or an artifact of how valgrind reports it. If it is real, the loss quickly reaches tens of MBs when commands are run continuously.
The fix for this was included in the pacemaker packages released with 7.2.