pcp-3.11.9-1.fc25.x86_64 and pcp-3.11.3-4.el7.x86_64: % valgrind pminfo -K add,3,/var/lib/pcp/pmdas/proc/pmda_proc.so,proc_init -L proc ==11540== Invalid free() / delete / delete[] / realloc() ==11540== at 0x4C2ED4A: free (vg_replace_malloc.c:530) ==11540== by 0x94B1D32: pmdaDynamicMetricTable (in /usr/lib64/libpcp_pmda.so.3) ==11540== by 0x94B1E47: pmdaDynamicLookupName (in /usr/lib64/libpcp_pmda.so.3) ==11540== by 0x9B03D14: ??? (in /var/lib/pcp/pmdas/proc/pmda_proc.so) ==11540== by 0x4E5E2C1: pmGetChildrenStatus (in /usr/lib64/libpcp.so.3) ==11540== by 0x4E5E409: ??? (in /usr/lib64/libpcp.so.3) ==11540== by 0x4E5E441: ??? (in /usr/lib64/libpcp.so.3) ==11540== by 0x4E5E72A: ??? (in /usr/lib64/libpcp.so.3) ==11540== by 0x109F7B: ??? (in /usr/bin/pminfo) ==11540== by 0x50EB400: (below main) (libc-start.c:289) ==11540== Address 0x9d1e1e0 is in the Data segment of /var/lib/pcp/pmdas/proc/pmda_proc.so
With debuginfo-install pcp it is slightly easier to see what is going on: ==16791== Invalid free() / delete / delete[] / realloc() ==16791== at 0x4C28CDD: free (vg_replace_malloc.c:530) ==16791== by 0x82BB26C: pmdaDynamicMetricTable (dynamic.c:234) ==16791== by 0x82BB31C: pmdaDynamicLookupName (dynamic.c:127) ==16791== by 0x88D5EF4: proc_children (pmda.c:3210) ==16791== by 0x4E571C2: pmGetChildrenStatus (pmns.c:2056) ==16791== by 0x4E572D8: TraversePMNS_local (pmns.c:2533) ==16791== by 0x4E57318: TraversePMNS_local (pmns.c:2551) ==16791== by 0x4E575DB: TraversePMNS (pmns.c:2585) ==16791== by 0x40188B: main (pminfo.c:685) ==16791== Address 0x8af1600 is 0 bytes inside data symbol "metrictab" So it looks like this code in libpcp_pmda/src/dynamic.c: if (pmda->e_metrics != fixed) free(pmda->e_metrics); Is supposed to prevent this from happening. But somehow e_metrics == metrictab, which != fixed. Running with vgdb we can see the following in gdb: $ valgrind --vgdb-error=0 pminfo -K add,3,/var/lib/pcp/pmdas/proc/pmda_proc.so,proc_init -L proc [...] $ gdb pminfo (gdb) target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17265 (gdb) c Program received signal SIGTRAP, Trace/breakpoint trap. 0x0000000004c28cdd in _vgr10050ZU_VgSoSynsomalloc_free ( p=0x8af1600 <metrictab>) at m_replacemalloc/vg_replace_malloc.c:530 530 m_replacemalloc/vg_replace_malloc.c: No such file or directory. Missing separate debuginfos, use: debuginfo-install avahi-libs-0.6.31-17.el7.x86_64 cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 (gdb) up #1 0x00000000082bb26d in pmdaDynamicMetricTable (pmda=pmda@entry=0x7ca64e0) at dynamic.c:234 234 free(pmda->e_metrics); (gdb) p pmda->e_metrics $3 = (pmdaMetric *) 0x8af1600 <metrictab> (gdb) p fixed $4 = (pmdaMetric *) 0x829d460 <metrictab> O, that is weird. The addresses are different, but they both point to different metrictab? (gdb) p *pmda->e_metrics $5 = {m_user = 0x0, m_desc = {pmid = 12591203, type = 1, indom = 4294967295, sem = 3, units = {pad = 0, scaleCount = 0, scaleTime = 0, scaleSpace = 0, dimCount = 0, dimTime = 0, dimSpace = 0}}} (gdb) p *fixed $6 = {m_user = 0x0, m_desc = {pmid = 251658240, type = 3, indom = 251658240, sem = 1, units = {pad = 0, scaleCount = 0, scaleTime = 2, scaleSpace = 0, dimCount = 0, dimTime = 1, dimSpace = 0}}} (gdb) info symbol metrictab metrictab in section .data of /var/lib/pcp/pmdas/linux/pmda_linux.so (gdb) info symbol pmda->e_metrics metrictab in section .data of /var/lib/pcp/pmdas/proc/pmda_proc.so (gdb) info symbol fixed metrictab in section .data of /var/lib/pcp/pmdas/linux/pmda_linux.so Unfortunately I know too little about this code to really understand what is going on. A bit earlier we have: fixed = dynamic[0].metrics; /* fixed metrics */ (gdb) print dynamic[0] $7 = {prefix = 0x8097cf7 "kernel.percpu.interrupts", prefixlen = 24, mtabcount = 2, extratrees = 18, pmnsupdate = 0x808ede0 <refresh_interrupts>, textupdate = 0x808e3c0 <interrupts_text>, mtabupdate = 0x808e8b0 <refresh_metrictable>, mtabcounts = 0x808e810 <size_metrictable>, pmns = 0x7d312d0, metrics = 0x829d460 <metrictab>, nmetrics = 837, clustermask = 0} Does that help anybody better debug this issue?
| Does that help anybody better debug this issue? It does, thanks Mark! This little hidden gem is the key ... | (gdb) print dynamic[0] | $7 = {prefix = 0x8097cf7 "kernel.percpu.interrupts", prefixlen = 24, This is quite unexpected. We're querying proc.* metric names, but that's a kernel.* dynamic name - these come from different PMDAs. I think the problem is something to do with interference between multiple DSO agents being loaded at once, and more than one of them accessing the dynamic names table. Looks like this scenario wasn't considered when this code was written, and it assumes only one PMDA will be using the dynamic names table at a a time. This is confirmed by adding the "-Kclear" option to the start of the pminfo invocation, and it all starts working (in that case, only the proc PMDA is loaded). This will take a fair bit of fixing unfortunately - either the dynamic names table needs to grow per-PMDA knowledge (e.g. a pmdaExt*) or the table itself could perhaps be made per-PMDA and moved into the e_ext_t structure.
That seems to work. But note that even that produces a (maybe unrelated) valgrind warning: $ valgrind --track-origins=yes -q pminfo -Kclear -K add,3,/var/lib/pcp/pmdas/proc/pmda_proc.so,proc_init -L proc.psinfo.processor ==31028== Conditional jump or move depends on uninitialised value(s) ==31028== at 0x4E80D82: __dmdesc (derive.c:2201) ==31028== by 0x4E49ACC: pmLookupDesc (desc.c:78) ==31028== by 0x4E7E668: bind_expr (derive.c:627) ==31028== by 0x4E7E517: bind_expr (derive.c:578) ==31028== by 0x4E7E517: bind_expr (derive.c:578) ==31028== by 0x4E7E517: bind_expr (derive.c:578) ==31028== by 0x4E809CD: __dmopencontext (derive.c:2144) ==31028== by 0x4E486C2: pmNewContext (context.c:1154) ==31028== by 0x4017A7: main (pminfo.c:646) ==31028== Uninitialised value was created by a heap allocation ==31028== at 0x4C27BE3: malloc (vg_replace_malloc.c:299) ==31028== by 0x4E808D6: __dmopencontext (derive.c:2110) ==31028== by 0x4E486C2: pmNewContext (context.c:1154) ==31028== by 0x4017A7: main (pminfo.c:646) ==31028== proc.psinfo.processor
The "conditional jump relies on uninitialized values" issue no longer seems to occur with latest pcp-3.12.2-1 sources : $ valgrind --track-origins=yes -q pminfo -Kclear -K add,3,/var/lib/pcp/pmdas/proc/pmda_proc.so,proc_init -L proc.psinfo.processor proc.psinfo.processor However the original recipe in Comment #0 still produces an invalid free. In any case, using the programmatic equivalent of "-Kclear" could be viable and a better alternative than Commit 8fde77dd5bf5fe44 in pcpfans fche/pmfuntools : commit 8fde77dd5bf5fe445ac5d2f1e4bece92a25f2e16 (HEAD -> fche/pmfuntools, origin/fche/pmfuntools) Author: Frank Ch. Eigler <fche> Date: Mon Sep 4 09:19:41 2017 -0400 paper over issues/339 - libpcp_pmns bad free() crash, leak memory instead diff --git a/src/libpcp_pmda/src/dynamic.c b/src/libpcp_pmda/src/dynamic.c index f340ae03e..2ffeb1144 100644 --- a/src/libpcp_pmda/src/dynamic.c +++ b/src/libpcp_pmda/src/dynamic.c @@ -218,7 +218,7 @@ pmdaDynamicMetricTable(pmdaExt *pmda) /* Fits into the default metric table - reset it to original values */ fallback: if (pmda->e_metrics != fixed) - free(pmda->e_metrics); + /*free(pmda->e_metrics)*/; pmdaRehash(pmda, fixed, total); } else { resize += total; @@ -229,7 +229,7 @@ fallback: for (i = 0; i < dynamic_count; i++) offset = dynamic_metric_table(i, offset, pmda); if (pmda->e_metrics != fixed) - free(pmda->e_metrics); + /*free(pmda->e_metrics)*/; pmdaRehash(pmda, mtab, resize); } }
(In reply to Mark Goodwin from comment #4) > The "conditional jump relies on uninitialized values" issue no longer seems > to occur with latest pcp-3.12.2-1 sources : > > $ valgrind --track-origins=yes -q pminfo -Kclear -K > add,3,/var/lib/pcp/pmdas/proc/pmda_proc.so,proc_init -L proc.psinfo.processor > proc.psinfo.processor Note that it's not pminfo that's interesting here, but /usr/libexec/pcp/bin/pmcd. It does not take -K options.
This message is a reminder that Fedora 25 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 25. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '25'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 25 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Resolved upstream now, fix will be in pcp-4.0.0. Author: Nathan Scott <nathans> Date: Wed Dec 13 12:27:01 2017 +1100 libpcp_pmda: fix multiple DSO PMDAs using the dynamic metric table When multiple DSO PMDAs are active, both using the global dynamic metrics table, there are situations where one can step on the other resulting in metric table corruption. By moving the dynamic table into per-PMDA structures, we resolve this in a straightforward fashion. QA test 1221 exercises the fix using valgrind. Resolves Fedora BZ #1451475. Resolves github issue #339.
pcp-4.0.0-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-5d50796a5d
pcp-4.0.0-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2018-1a78dc89ec
pcp-4.0.0-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-1a78dc89ec
pcp-4.0.0-1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-5d50796a5d
pcp-4.0.0-2.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-84c14418ef
pcp-4.0.0-2.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2018-6e29ed8f6d
This bug appears to have been reported against 'rawhide' during the Fedora 28 development cycle. Changing version to '28'.
pcp-4.0.0-2.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-6e29ed8f6d
pcp-4.0.0-2.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-84c14418ef
pcp-4.0.0-2.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
pcp-4.0.0-2.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.