Red Hat Bugzilla – Bug 1331973
memory leak in erroneous derived-metrics
Last modified: 2016-07-09 16:19:19 EDT
In pcp 3.11.2, derived metrics are automatically added as a part of context initialization. However, if there is an error, the derive.c code does not clean up fully. % mkdir foo % echo 'foo.bar = 100 * delta(non.existent)' > foo/derived.conf % PCP_DERIVED_CONFIG=`pwd`/foo valgrind --leak-check=full pminfo -f foo.bar ==20471== 144 (64 direct, 80 indirect) bytes in 1 blocks are definitely lost in loss record 35 of 40 ==20471== at 0x4C28C50: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20471== by 0x4E7F751: newnode (derive.c:532) ==20471== by 0x4E80D13: bind_expr (derive.c:569) ==20471== by 0x4E80D27: bind_expr (derive.c:571) ==20471== by 0x4E8329A: __dmopencontext (derive.c:2130) ==20471== by 0x4E4A612: pmNewContext (context.c:1151) ==20471== by 0x4017AD: main (pminfo.c:646) derive.c bind_expr() is clearly implicated. If the left or right bind_expr() subcalls fail, the new newnode() is never freed. But one should audit all the related code to guarantee a proper cleanup on an error.
Please note that this memory leak affects every pm$CLIENT that, e.g., deals with archives that lack some of the input metrics to satisfy any derived metric, even: % valgrind --leak-check=full pminfo -f -a $PCP/qa/archives/19970807.09.54 pmwebd (graphite charting) is particularly severely affected.
If you have no plan to correct this regression, would you at least accept a patch to remove the iostat.conf file from the default distribution?
I have no plans, but chatting to mgoodwin last week I understand he is planning to tackle this one at some point. A workaround would not be helpful, no - its 100% reproducible, easily regression tested - it should just be fixed properly.
Created attachment 1168237 [details] patch to fix memory leak in derived metric error handling Error handling needs to recursively free the current node, since it may have been built recursively, so call free_expr() instead of free() in the appropriate places where a derived expression fails. The patch also fortifies free_expr() itself a bit. Passes qa for group 'derive', and Frank's valgrind repro script passes now too. A new QA test should probably be added. BTW, this is not a regression per-se - the leak has always been there - it's just more noticeable now that global derived metrics defs are loaded by default.
Posted upstream patch for review http://oss.sgi.com/pipermail/pcp/2016-June/010756.html
Thanks, Mark, that looks good. (I haven't had the chance yet to try it against a larger sustained pmwebd load.)
pcp-3.11.3-1.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-0b45289bc4
pcp-3.11.3-1.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-331bd3e9bf
pcp-3.11.3-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-4745f3e292
pcp-3.11.3-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-aad44ac639
pcp-3.11.3-1.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-331bd3e9bf
pcp-3.11.3-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-aad44ac639
pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-4745f3e292
pcp-3.11.3-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-0b45289bc4
pcp-3.11.3-1.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.11.3-1.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.11.3-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.