Bug 1451475 - proc pmda crashes in -K/-L mode
Summary: proc pmda crashes in -K/-L mode
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: pcp
Version: 28
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Nathan Scott
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-05-16 18:01 UTC by Frank Ch. Eigler
Modified: 2018-02-27 17:21 UTC (History)
8 users (show)

Fixed In Version: pcp-4.0.0-2.fc26 pcp-4.0.0-2.fc27
Clone Of:
Environment:
Last Closed: 2018-02-27 16:55:51 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Frank Ch. Eigler 2017-05-16 18:01:37 UTC
pcp-3.11.9-1.fc25.x86_64 and pcp-3.11.3-4.el7.x86_64:

% valgrind pminfo -K add,3,/var/lib/pcp/pmdas/proc/pmda_proc.so,proc_init -L proc
==11540== Invalid free() / delete / delete[] / realloc()
==11540==    at 0x4C2ED4A: free (vg_replace_malloc.c:530)
==11540==    by 0x94B1D32: pmdaDynamicMetricTable (in /usr/lib64/libpcp_pmda.so.3)
==11540==    by 0x94B1E47: pmdaDynamicLookupName (in /usr/lib64/libpcp_pmda.so.3)
==11540==    by 0x9B03D14: ??? (in /var/lib/pcp/pmdas/proc/pmda_proc.so)
==11540==    by 0x4E5E2C1: pmGetChildrenStatus (in /usr/lib64/libpcp.so.3)
==11540==    by 0x4E5E409: ??? (in /usr/lib64/libpcp.so.3)
==11540==    by 0x4E5E441: ??? (in /usr/lib64/libpcp.so.3)
==11540==    by 0x4E5E72A: ??? (in /usr/lib64/libpcp.so.3)
==11540==    by 0x109F7B: ??? (in /usr/bin/pminfo)
==11540==    by 0x50EB400: (below main) (libc-start.c:289)
==11540==  Address 0x9d1e1e0 is in the Data segment of /var/lib/pcp/pmdas/proc/pmda_proc.so

Comment 1 Mark Wielaard 2017-05-16 23:05:35 UTC
With debuginfo-install pcp it is slightly easier to see what is going on:

==16791== Invalid free() / delete / delete[] / realloc()
==16791==    at 0x4C28CDD: free (vg_replace_malloc.c:530)
==16791==    by 0x82BB26C: pmdaDynamicMetricTable (dynamic.c:234)
==16791==    by 0x82BB31C: pmdaDynamicLookupName (dynamic.c:127)
==16791==    by 0x88D5EF4: proc_children (pmda.c:3210)
==16791==    by 0x4E571C2: pmGetChildrenStatus (pmns.c:2056)
==16791==    by 0x4E572D8: TraversePMNS_local (pmns.c:2533)
==16791==    by 0x4E57318: TraversePMNS_local (pmns.c:2551)
==16791==    by 0x4E575DB: TraversePMNS (pmns.c:2585)
==16791==    by 0x40188B: main (pminfo.c:685)
==16791==  Address 0x8af1600 is 0 bytes inside data symbol "metrictab"

So it looks like this code in libpcp_pmda/src/dynamic.c:

        if (pmda->e_metrics != fixed)
            free(pmda->e_metrics);

Is supposed to prevent this from happening. But somehow e_metrics == metrictab, which != fixed.

Running with vgdb we can see the following in gdb:

$ valgrind --vgdb-error=0 pminfo -K add,3,/var/lib/pcp/pmdas/proc/pmda_proc.so,proc_init -L proc

[...]

$ gdb pminfo
(gdb) target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17265
(gdb) c
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000004c28cdd in _vgr10050ZU_VgSoSynsomalloc_free (
    p=0x8af1600 <metrictab>) at m_replacemalloc/vg_replace_malloc.c:530
530	m_replacemalloc/vg_replace_malloc.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install avahi-libs-0.6.31-17.el7.x86_64 cyrus-sasl-lib-2.1.26-20.el7_2.x86_64
(gdb) up
#1  0x00000000082bb26d in pmdaDynamicMetricTable (pmda=pmda@entry=0x7ca64e0)
    at dynamic.c:234
234		    free(pmda->e_metrics);
(gdb) p pmda->e_metrics 
$3 = (pmdaMetric *) 0x8af1600 <metrictab>
(gdb) p fixed
$4 = (pmdaMetric *) 0x829d460 <metrictab>

O, that is weird. The addresses are different, but they both point to different metrictab?

(gdb) p *pmda->e_metrics
$5 = {m_user = 0x0, m_desc = {pmid = 12591203, type = 1, indom = 4294967295, 
    sem = 3, units = {pad = 0, scaleCount = 0, scaleTime = 0, scaleSpace = 0, 
      dimCount = 0, dimTime = 0, dimSpace = 0}}}
(gdb) p *fixed
$6 = {m_user = 0x0, m_desc = {pmid = 251658240, type = 3, indom = 251658240, 
    sem = 1, units = {pad = 0, scaleCount = 0, scaleTime = 2, scaleSpace = 0, 
      dimCount = 0, dimTime = 1, dimSpace = 0}}}

(gdb) info symbol metrictab
metrictab in section .data of /var/lib/pcp/pmdas/linux/pmda_linux.so
(gdb) info symbol pmda->e_metrics
metrictab in section .data of /var/lib/pcp/pmdas/proc/pmda_proc.so
(gdb) info symbol fixed
metrictab in section .data of /var/lib/pcp/pmdas/linux/pmda_linux.so

Unfortunately I know too little about this code to really understand what is going on. A bit earlier we have:

     fixed = dynamic[0].metrics;         /* fixed metrics */

(gdb) print dynamic[0]
$7 = {prefix = 0x8097cf7 "kernel.percpu.interrupts", prefixlen = 24, 
  mtabcount = 2, extratrees = 18, pmnsupdate = 0x808ede0 <refresh_interrupts>, 
  textupdate = 0x808e3c0 <interrupts_text>, 
  mtabupdate = 0x808e8b0 <refresh_metrictable>, 
  mtabcounts = 0x808e810 <size_metrictable>, pmns = 0x7d312d0, 
  metrics = 0x829d460 <metrictab>, nmetrics = 837, clustermask = 0}

Does that help anybody better debug this issue?

Comment 2 Nathan Scott 2017-05-17 03:48:15 UTC
| Does that help anybody better debug this issue?

It does, thanks Mark!  This little hidden gem is the key ...

| (gdb) print dynamic[0]
| $7 = {prefix = 0x8097cf7 "kernel.percpu.interrupts", prefixlen = 24, 

This is quite unexpected.  We're querying proc.* metric names, but that's a kernel.* dynamic name - these come from different PMDAs.

I think the problem is something to do with interference between multiple DSO agents being loaded at once, and more than one of them accessing the dynamic names table.  Looks like this scenario wasn't considered when this code was written, and it assumes only one PMDA will be using the dynamic names table at a a time.

This is confirmed by adding the "-Kclear" option to the start of the pminfo invocation, and it all starts working (in that case, only the proc PMDA is loaded).

This will take a fair bit of fixing unfortunately - either the dynamic names table needs to grow per-PMDA knowledge (e.g. a pmdaExt*) or the table itself could perhaps be made per-PMDA and moved into the e_ext_t structure.

Comment 3 Mark Wielaard 2017-05-17 09:30:50 UTC
That seems to work. But note that even that produces a (maybe unrelated) valgrind warning:

$ valgrind --track-origins=yes -q pminfo -Kclear -K add,3,/var/lib/pcp/pmdas/proc/pmda_proc.so,proc_init -L proc.psinfo.processor
==31028== Conditional jump or move depends on uninitialised value(s)
==31028==    at 0x4E80D82: __dmdesc (derive.c:2201)
==31028==    by 0x4E49ACC: pmLookupDesc (desc.c:78)
==31028==    by 0x4E7E668: bind_expr (derive.c:627)
==31028==    by 0x4E7E517: bind_expr (derive.c:578)
==31028==    by 0x4E7E517: bind_expr (derive.c:578)
==31028==    by 0x4E7E517: bind_expr (derive.c:578)
==31028==    by 0x4E809CD: __dmopencontext (derive.c:2144)
==31028==    by 0x4E486C2: pmNewContext (context.c:1154)
==31028==    by 0x4017A7: main (pminfo.c:646)
==31028==  Uninitialised value was created by a heap allocation
==31028==    at 0x4C27BE3: malloc (vg_replace_malloc.c:299)
==31028==    by 0x4E808D6: __dmopencontext (derive.c:2110)
==31028==    by 0x4E486C2: pmNewContext (context.c:1154)
==31028==    by 0x4017A7: main (pminfo.c:646)
==31028== 
proc.psinfo.processor

Comment 4 Mark Goodwin 2017-09-26 07:33:38 UTC
The "conditional jump relies on uninitialized values" issue no longer seems to occur with latest pcp-3.12.2-1 sources :

$ valgrind --track-origins=yes -q pminfo -Kclear -K add,3,/var/lib/pcp/pmdas/proc/pmda_proc.so,proc_init -L proc.psinfo.processor
proc.psinfo.processor

However the original recipe in Comment #0 still produces an invalid free.

In any case, using the programmatic equivalent of "-Kclear" could be viable and a better alternative than Commit 8fde77dd5bf5fe44 in pcpfans fche/pmfuntools :

commit 8fde77dd5bf5fe445ac5d2f1e4bece92a25f2e16 (HEAD -> fche/pmfuntools, origin/fche/pmfuntools)
Author: Frank Ch. Eigler <fche>
Date:   Mon Sep 4 09:19:41 2017 -0400

    paper over issues/339 - libpcp_pmns bad free() crash, leak memory instead

diff --git a/src/libpcp_pmda/src/dynamic.c b/src/libpcp_pmda/src/dynamic.c
index f340ae03e..2ffeb1144 100644
--- a/src/libpcp_pmda/src/dynamic.c
+++ b/src/libpcp_pmda/src/dynamic.c
@@ -218,7 +218,7 @@ pmdaDynamicMetricTable(pmdaExt *pmda)
        /* Fits into the default metric table - reset it to original values */
 fallback:
        if (pmda->e_metrics != fixed)
-           free(pmda->e_metrics);
+           /*free(pmda->e_metrics)*/;
        pmdaRehash(pmda, fixed, total);
     } else {
        resize += total;
@@ -229,7 +229,7 @@ fallback:
        for (i = 0; i < dynamic_count; i++)
            offset = dynamic_metric_table(i, offset, pmda);
        if (pmda->e_metrics != fixed)
-           free(pmda->e_metrics);
+           /*free(pmda->e_metrics)*/;
        pmdaRehash(pmda, mtab, resize);
     }
 }

Comment 5 Frank Ch. Eigler 2017-10-04 13:16:39 UTC
(In reply to Mark Goodwin from comment #4)
> The "conditional jump relies on uninitialized values" issue no longer seems
> to occur with latest pcp-3.12.2-1 sources :
> 
> $ valgrind --track-origins=yes -q pminfo -Kclear -K
> add,3,/var/lib/pcp/pmdas/proc/pmda_proc.so,proc_init -L proc.psinfo.processor
> proc.psinfo.processor

Note that it's not pminfo that's interesting here, but /usr/libexec/pcp/bin/pmcd.  It does not take -K options.

Comment 6 Fedora End Of Life 2017-11-16 19:04:46 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged  change the 'version' to a later Fedora
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 7 Nathan Scott 2017-12-13 01:32:43 UTC
Resolved upstream now, fix will be in pcp-4.0.0.

Author: Nathan Scott <nathans>
Date:   Wed Dec 13 12:27:01 2017 +1100

    libpcp_pmda: fix multiple DSO PMDAs using the dynamic metric table
    
    When multiple DSO PMDAs are active, both using the global
    dynamic metrics table, there are situations where one can
    step on the other resulting in metric table corruption.
    
    By moving the dynamic table into per-PMDA structures, we
    resolve this in a straightforward fashion.  QA test 1221
    exercises the fix using valgrind.
    
    Resolves Fedora BZ #1451475.
    Resolves github issue #339.

Comment 8 Fedora Update System 2018-02-16 04:47:23 UTC
pcp-4.0.0-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-5d50796a5d

Comment 9 Fedora Update System 2018-02-16 04:47:55 UTC
pcp-4.0.0-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2018-1a78dc89ec

Comment 10 Fedora Update System 2018-02-16 15:58:33 UTC
pcp-4.0.0-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-1a78dc89ec

Comment 11 Fedora Update System 2018-02-16 16:28:25 UTC
pcp-4.0.0-1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-5d50796a5d

Comment 12 Fedora Update System 2018-02-19 21:40:16 UTC
pcp-4.0.0-2.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-84c14418ef

Comment 13 Fedora Update System 2018-02-19 21:41:06 UTC
pcp-4.0.0-2.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2018-6e29ed8f6d

Comment 14 Fedora End Of Life 2018-02-20 15:34:32 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 28 development cycle.
Changing version to '28'.

Comment 15 Fedora Update System 2018-02-20 17:49:46 UTC
pcp-4.0.0-2.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-6e29ed8f6d

Comment 16 Fedora Update System 2018-02-20 18:18:19 UTC
pcp-4.0.0-2.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-84c14418ef

Comment 17 Fedora Update System 2018-02-27 16:55:51 UTC
pcp-4.0.0-2.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 18 Fedora Update System 2018-02-27 17:21:00 UTC
pcp-4.0.0-2.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.