Bug 1331973

Summary: memory leak in erroneous derived-metrics
Product: [Fedora] Fedora Reporter: Frank Ch. Eigler <fche>
Component: pcpAssignee: Mark Goodwin <mgoodwin>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rawhideCC: brolley, fche, lberk, mbenitez, mgoodwin, nathans, pcp, scox
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pcp-3.11.3-1 pcp-3.11.3-1.fc24 pcp-3.11.3-1.fc22 pcp-3.11.3-1.fc23 pcp-3.11.3-1.el5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-06-27 18:28:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
patch to fix memory leak in derived metric error handling none

Description Frank Ch. Eigler 2016-04-30 17:54:55 UTC
In pcp 3.11.2, derived metrics are automatically added as a part of context initialization.  However, if there is an error, the derive.c code does not clean up fully.

% mkdir foo
% echo 'foo.bar = 100 * delta(non.existent)' > foo/derived.conf
% PCP_DERIVED_CONFIG=`pwd`/foo valgrind --leak-check=full pminfo -f foo.bar

==20471== 144 (64 direct, 80 indirect) bytes in 1 blocks are definitely lost in loss record 35 of 40
==20471==    at 0x4C28C50: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==20471==    by 0x4E7F751: newnode (derive.c:532)
==20471==    by 0x4E80D13: bind_expr (derive.c:569)
==20471==    by 0x4E80D27: bind_expr (derive.c:571)
==20471==    by 0x4E8329A: __dmopencontext (derive.c:2130)
==20471==    by 0x4E4A612: pmNewContext (context.c:1151)
==20471==    by 0x4017AD: main (pminfo.c:646)

derive.c bind_expr() is clearly implicated.  If the left or right bind_expr()
subcalls fail, the new newnode() is never freed.  But one should audit all
the related code to guarantee a proper cleanup on an error.

Comment 1 Frank Ch. Eigler 2016-05-22 21:46:14 UTC
Please note that this memory leak affects every pm$CLIENT that, e.g., deals with archives that lack some of the input metrics to satisfy any derived metric, even:

% valgrind --leak-check=full pminfo -f -a $PCP/qa/archives/19970807.09.54

pmwebd (graphite charting) is particularly severely affected.

Comment 2 Frank Ch. Eigler 2016-06-14 15:43:24 UTC
If you have no plan to correct this regression, would you at least accept a patch to remove the iostat.conf file from the default distribution?

Comment 3 Nathan Scott 2016-06-14 23:26:49 UTC
I have no plans, but chatting to mgoodwin last week I understand he is planning to tackle this one at some point.

A workaround would not be helpful, no - its 100% reproducible, easily regression tested - it should just be fixed properly.

Comment 4 Mark Goodwin 2016-06-15 07:53:10 UTC
Created attachment 1168237 [details]
patch to fix memory leak in derived metric error handling

Error handling needs to recursively free the current node, since it may have been built recursively, so call free_expr() instead of free() in the appropriate places where a derived expression fails. The patch also fortifies free_expr() itself a bit.

Passes qa for group 'derive', and Frank's valgrind repro script passes now too. A new QA test should probably be added.

BTW, this is not a regression per-se - the leak has always been there - it's just more noticeable now that global derived metrics defs are loaded by default.

Comment 5 Mark Goodwin 2016-06-15 08:08:50 UTC
Posted upstream patch for review http://oss.sgi.com/pipermail/pcp/2016-June/010756.html

Comment 6 Frank Ch. Eigler 2016-06-16 19:00:21 UTC
Thanks, Mark, that looks good.  (I haven't had the chance yet to try it against a larger sustained pmwebd load.)

Comment 7 Fedora Update System 2016-06-17 09:16:10 UTC
pcp-3.11.3-1.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-0b45289bc4

Comment 8 Fedora Update System 2016-06-17 09:17:48 UTC
pcp-3.11.3-1.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-331bd3e9bf

Comment 9 Fedora Update System 2016-06-17 09:18:21 UTC
pcp-3.11.3-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-4745f3e292

Comment 10 Fedora Update System 2016-06-17 10:56:26 UTC
pcp-3.11.3-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-aad44ac639

Comment 11 Fedora Update System 2016-06-18 05:23:49 UTC
pcp-3.11.3-1.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-331bd3e9bf

Comment 12 Fedora Update System 2016-06-18 05:24:18 UTC
pcp-3.11.3-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-aad44ac639

Comment 13 Fedora Update System 2016-06-18 06:17:21 UTC
pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-4745f3e292

Comment 14 Fedora Update System 2016-06-18 16:26:04 UTC
pcp-3.11.3-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-0b45289bc4

Comment 15 Fedora Update System 2016-06-27 18:27:55 UTC
pcp-3.11.3-1.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 16 Fedora Update System 2016-06-27 22:52:55 UTC
pcp-3.11.3-1.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

Comment 17 Fedora Update System 2016-06-27 22:56:41 UTC
pcp-3.11.3-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 18 Fedora Update System 2016-07-09 20:18:50 UTC
pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.