Bug 1263444 - Memory leak in pacemaker_remote's proxy dispatch function
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 7.2
Assigned To: Andrew Beekhof
Depends On:
Reported: 2015-09-15 15:24 EDT by Ken Gaillot
Modified: 2015-12-10 16:00 EST

See Also:
Fixed In Version: 1.1.13-8
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2015-12-03 18:48:23 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Ken Gaillot 2015-09-15 15:24:45 EDT
Description of problem: In normal cluster operation, remote nodes running pacemaker_remote will exhibit memory leaks.

Version-Release number of selected component (if applicable): 1.1.12-22.el7_1.4

How reproducible: Run a pacemaker cluster that includes a remote node

Steps to Reproduce:
1. Set up a cluster that includes a remote node.
2. Use valgrind to run pacemaker_remote:
2a. yum install valgrind
2b. Uncomment VALGRIND_OPTS in /etc/sysconfig/pacemaker_remote
2c. mkdir /etc/systemd/system/pacemaker_remote.service.d
2d. cat >/etc/systemd/system/pacemaker_remote.service.d/valgrind.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/valgrind /usr/sbin/pacemaker_remoted
EOF
(The [Service] header is required in systemd drop-in files, and the empty ExecStart= line clears the packaged command before overriding it.)
2e. Run "systemctl daemon-reload". Disable the remote node resource if the cluster is running, then "systemctl restart pacemaker_remote", and re-enable the remote node resource if needed.
3. Start the cluster and perform routine cluster actions. Examples: disabling and enabling a resource running on the remote node, migrating a resource to and from the remote node, setting and unsetting node attributes for the remote node, disabling and enabling the remote node resource itself (from another node in the cluster), and running various CLI commands (crm_attribute, attrd_updater, stonith_admin, crm_mon, etc.) from the remote node itself. A few of these are enough; you don't need to do all of them.
4. Disable the remote node resource, then "systemctl stop pacemaker_remote"
5. Examine the valgrind output on the remote node (in /var/lib/pacemaker/valgrind-* by default).
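Step 5 can be sketched as a pair of greps over the logs: look for nonzero "lost" counts in the leak summary and for "crm_ipcs_recv" frames in the backtraces. The directory, process IDs, addresses, and byte counts below are fabricated stand-ins so the sketch is self-contained; on a real remote node, point LOGDIR at /var/lib/pacemaker and use its valgrind-* files.

```shell
# Sketch of step 5: scan valgrind logs for leaked bytes and for
# "crm_ipcs_recv" in the recorded backtraces.
# LOGDIR and the sample log are illustrative stand-ins; on a remote
# node use LOGDIR=/var/lib/pacemaker instead.
LOGDIR=$(mktemp -d)
cat >"$LOGDIR/valgrind-sample" <<'EOF'
==1234==    at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==1234==    by 0x4E8F2A1: crm_ipcs_recv (ipc.c:100)
==1234== LEAK SUMMARY:
==1234==    definitely lost: 1,024 bytes in 8 blocks
==1234==    indirectly lost: 512 bytes in 4 blocks
==1234==      possibly lost: 0 bytes in 0 blocks
EOF
# Nonzero "lost" counts plus crm_ipcs_recv frames indicate this bug.
grep -E '(definitely|indirectly|possibly) lost' "$LOGDIR"/valgrind-*
grep -c 'crm_ipcs_recv' "$LOGDIR"/valgrind-*
```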

Actual results:
Valgrind output will show a nonzero total of "definitely lost" + "indirectly lost" + "possibly lost" bytes, and the backtraces will contain "crm_ipcs_recv".

Expected results: No memory lost.

Additional info: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Remote/index.html
Comment 1 Ken Gaillot 2015-09-15 15:27:20 EDT
Fixed upstream as of commit 1019d3e
Comment 2 Ken Gaillot 2015-09-15 15:37:30 EDT
The leak seems serious. It occurs every time the remote node proxies a connection to its hosting cluster node's pacemaker components. The number of bytes lost appears to increase with each occurrence, though I haven't investigated whether that is actually the case or an artifact of how valgrind reports it. If accurate, the loss quickly reaches tens of MBs when commands are run continuously.
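One way to gauge how large the cumulative loss has grown is to sum the "definitely lost" counts across all logs. This is a sketch: the directory and byte counts below are fabricated stand-ins; on a remote node run the awk against /var/lib/pacemaker/valgrind-* instead.

```shell
# Sum "definitely lost" byte counts across valgrind logs to gauge the
# total leak. The directory and numbers are fabricated stand-ins; on a
# remote node point the awk at /var/lib/pacemaker/valgrind-*.
LOGDIR=$(mktemp -d)
printf '==1== definitely lost: 1,024 bytes in 8 blocks\n' > "$LOGDIR/valgrind-1"
printf '==2== definitely lost: 2,048 bytes in 16 blocks\n' > "$LOGDIR/valgrind-2"
# $4 is the byte count; strip thousands separators before summing.
awk '/definitely lost:/ { b = $4; gsub(",", "", b); total += b }
     END { print total " bytes definitely lost" }' "$LOGDIR"/valgrind-*
```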
Comment 4 Ken Gaillot 2015-09-22 09:53:47 EDT
Fixed upstream as of commit 1019d3e.
Comment 5 Ken Gaillot 2015-12-03 18:48:23 EST
The fix for this was included in the pacemaker packages released with 7.2.
