Bug 1262721 - pcp -a archive uptime core dumps
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcp
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Nathan Scott
QA Contact: Miloš Prchlík
Depends On:
Blocks:
Reported: 2015-09-14 04:00 EDT by Marko Myllynen
Modified: 2016-11-04 00:22 EDT
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 00:22:36 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
pcp uptime core dump (1.01 MB, application/x-gzip)
2015-09-14 04:00 EDT, Marko Myllynen

Description Marko Myllynen 2015-09-14 04:00:56 EDT
Created attachment 1073121
pcp uptime core dump

Description of problem:
$ pcp -a ~/20150823.10.41 uptime
zsh: segmentation fault (core dumped)  pcp -a ~/20150823.10.41 uptime

A gzipped core dump is attached. The archive can be processed just fine with other tools.

Version-Release number of selected component (if applicable):
pcp-3.10.6-2.el7.x86_64
pcp-system-tools-3.10.6-2.el7.x86_64
Comment 2 Mark Goodwin 2015-09-14 09:55:51 EDT
Looks like a bad pmValue has been passed into pmExtractValue():

[mgoodwin@goblin ~]$ ulimit -c unlimited
[mgoodwin@goblin ~]$ pcp -a /var/log/pcp/pmlogger/goblin/20150914.19.31 uptime
Segmentation fault (core dumped)
[mgoodwin@goblin ~]$ file core.14442 
core.14442: ELF 64-bit LSB core file x86-64, version 1 (SYSV), too many program headers (222)
[mgoodwin@goblin ~]$ file -Pelf_phnum=10000 core.14442
core.14442: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'python3 /usr/libexec/pcp/bin/pcp-uptime'
[mgoodwin@goblin ~]$ gdb `which python3` core.14442 
GNU gdb (GDB) Fedora 7.9.1-17.fc22
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/python3...Reading symbols from /usr/bin/python3...(no debugging symbols found)...done.
(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 14442]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `python3 /usr/libexec/pcp/bin/pcp-uptime'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f00d1544254 in pmExtractValue (valfmt=<optimized out>, 
    ival=0xee7570, itype=4, oval=0x7f00ce21cbc0, otype=4) at units.c:945
945			if (ival->value.pval->vlen != PM_VAL_HDR_SIZE + sizeof(float)
Missing separate debuginfos, use: dnf debuginfo-install python3-3.4.2-6.fc22.x86_64
(gdb) where
#0  0x00007f00d1544254 in pmExtractValue (valfmt=<optimized out>, 
    ival=0xee7570, itype=4, oval=0x7f00ce21cbc0, otype=4) at units.c:945
#1  0x00007f00cefb5db0 in ffi_call_unix64 () from /lib64/libffi.so.6
#2  0x00007f00cefb5818 in ffi_call () from /lib64/libffi.so.6
#3  0x00007f00cf1c877f in _ctypes_callproc ()
   from /usr/lib64/python3.4/lib-dynload/_ctypes.cpython-34m.so
#4  0x00007f00cf1c26c9 in PyCFuncPtr_call ()
   from /usr/lib64/python3.4/lib-dynload/_ctypes.cpython-34m.so
#5  0x00007f00d901df31 in PyObject_Call () from /lib64/libpython3.4m.so.1.0
#6  0x00007f00d90cde0a in PyEval_EvalFrameEx ()
   from /lib64/libpython3.4m.so.1.0
#7  0x00007f00d90d5162 in PyEval_EvalFrameEx ()
   from /lib64/libpython3.4m.so.1.0
#8  0x00007f00d90d5162 in PyEval_EvalFrameEx ()
   from /lib64/libpython3.4m.so.1.0
#9  0x00007f00d90d5de6 in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#10 0x00007f00d90d5e8b in PyEval_EvalCode () from /lib64/libpython3.4m.so.1.0
#11 0x00007f00d90f1ef4 in run_mod () from /lib64/libpython3.4m.so.1.0
#12 0x00007f00d90f4135 in PyRun_FileExFlags () from /lib64/libpython3.4m.so.1.0
#13 0x00007f00d90f51b3 in PyRun_SimpleFileExFlags ()
   from /lib64/libpython3.4m.so.1.0
#14 0x00007f00d910bd74 in Py_Main () from /lib64/libpython3.4m.so.1.0
#15 0x0000000000400ad7 in main ()
(gdb) p ival
$1 = (const pmValue *) 0xee7570
(gdb) p *ival
$2 = {inst = -649913344, value = {pval = 0x51, lval = 81}}
(gdb) p ival->value
$3 = {pval = 0x51, lval = 81}
(gdb) p ival->value.pval
$4 = (pmValueBlock *) 0x51
(gdb) p ival->value.pval->vlen
Cannot access memory at address 0x51
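
The gdb output above prints one storage slot two ways: as a pointer (pval = 0x51) and as an integer (lval = 81). Here is a minimal Python ctypes sketch, assuming a simplified stand-in for the pmValue layout (the class names are illustrative, not the PCP python bindings). It shows that both numbers are the same bytes, so 0x51 is almost certainly leftover integer data rather than an address that can be dereferenced:

```python
import ctypes
import sys

# Simplified stand-ins for the pmValue layout seen in the gdb session.
# These are illustrative classes, not the real PCP python bindings.
class ValueUnion(ctypes.Union):
    _fields_ = [("pval", ctypes.c_void_p),  # pointer to a value block
                ("lval", ctypes.c_int)]     # value stored in place

class Value(ctypes.Structure):
    _fields_ = [("inst", ctypes.c_int),
                ("value", ValueUnion)]

v = Value()
v.value.pval = 0x51          # the "pointer" gdb reported

as_pointer = v.value.pval
print(hex(as_pointer))       # 0x51
if sys.byteorder == "little":
    # Same bytes read back as an int, matching gdb's lval on x86-64.
    print(v.value.lval)      # 81
```

The sketch never dereferences the pointer. In the core dump, the C code did follow it, which matches the "Cannot access memory at address 0x51" message.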
Comment 5 Frank Ch. Eigler 2015-09-14 16:19:10 EDT
% pcp -Dall -a 20150823.10.41.meta uptime 2>&1 | tail -10
__pmLogFetchInterp: log reads: forward 3 (+1 cached) backwards 1 (+1 cached)
pmFetch returns ...
pmResult dump from 0x12c4bf0 timestamp: 1440315703.207307 03:41:43.207 numpmid: 3
  60.26.0 (kernel.all.uptime): No values returned!
  60.25.0 (kernel.all.nusers): No values returned!
  60.2.0 (kernel.all.load): No values returned!
pmUseContext(0) -> 0
pmExtractValue:  145 [U32] -> 145 [U32]
pmExtractValue:  33 [U32] -> 33 [U32]
pmExtractValue: [1]    17781 segmentation fault (core dumped)

That is, the pmFetch in src/pcp/uptime/pcp-uptime.py returns a pmResult structure with numpmid=3, but each pmValueSet has numval=0. Calling get_vlist() against a nonexistent pmValue[] entry then hands garbage to pmExtractValue(), which is what crashes.

Chances are src/python/pcp/pmapi.py get_vlist() should do a check on vset_idx, comparing it against get_numval(), and throw an exception if get_numval() is <= the index (i.e. the index is out of range).
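
A minimal sketch of the kind of range check suggested above. It uses plain Python stand-ins rather than the real ctypes-backed classes in pmapi.py. The class names, the two-index get_vlist() signature, and the choice of IndexError are assumptions for illustration only:

```python
class ValueSet:
    """Hypothetical stand-in for a pmValueSet: a count plus a value list."""
    def __init__(self, values):
        self.vlist = list(values)
        self.numval = len(self.vlist)

class Result:
    """Hypothetical stand-in for a pmResult wrapper."""
    def __init__(self, value_sets):
        self.vset = list(value_sets)
        self.numpmid = len(self.vset)

    def get_numval(self, vset_idx):
        return self.vset[vset_idx].numval

    def get_vlist(self, vset_idx, vlist_idx):
        # Refuse any index that does not name an existing value set.
        if not 0 <= vset_idx < self.numpmid:
            raise IndexError("value set index %d out of range" % vset_idx)
        # numval of 0 (or a negative error code) means there is no value
        # to hand out, so every vlist_idx is rejected in that case.
        numval = self.get_numval(vset_idx)
        if not 0 <= vlist_idx < numval:
            raise IndexError("value index %d out of range (numval=%d)"
                             % (vlist_idx, numval))
        return self.vset[vset_idx].vlist[vlist_idx]

# Three metrics fetched, none with any values, as in the debug trace.
empty_result = Result([ValueSet([]), ValueSet([]), ValueSet([])])
try:
    empty_result.get_vlist(0, 0)
    outcome = "returned a value"
except IndexError as err:
    outcome = "rejected: %s" % err
print(outcome)
```

With a check like this, a caller that forgets to test numval gets a Python exception it can report, instead of passing an invalid pmValue down into the C library.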
Comment 6 Nathan Scott 2015-09-15 02:35:04 EDT
(In reply to Marko Myllynen from comment #0)
> Created attachment 1073121
> pcp uptime core dump
> 
> Description of problem:
> $ pcp -a ~/20150823.10.41 uptime
> zsh: segmentation fault (core dumped)  pcp -a ~/20150823.10.41 uptime
> 

The problem is related to the offset of the initial fetch - if you use:

$ pcp -O@+1 -a ~/20150823.10.41 uptime

it works.  The reason is there is a pmResult record at the head of the log, recorded before any record containing the metrics pcp-uptime is looking for.  So when pcp-uptime fetches at offset zero it gets no data (even though the metrics exist in the log, and if it did a second fetch it would find 'em).
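
A toy model of the situation described above, assuming an archive can be pictured as a plain list of records. The record contents, the helper names, and the "step forward until complete" strategy are all invented for illustration. This is not the actual pcp-uptime fix, and real archive fetches do not work on Python dictionaries:

```python
METRICS = ("kernel.all.uptime", "kernel.all.nusers", "kernel.all.load")

# Toy archive: a list of records, each mapping metric name -> values.
# Record contents are invented for illustration only.
archive = [
    {"some.other.metric": [1]},              # head record, no uptime data
    {"kernel.all.uptime": [100],
     "kernel.all.nusers": [2],
     "kernel.all.load": [0.1, 0.2, 0.3]},
]

def fetch(records, position, metrics):
    """Return {metric: values} for one record; missing metrics get []."""
    record = records[position]
    return {name: list(record.get(name, [])) for name in metrics}

def first_complete_fetch(records, metrics):
    """Step forward until every requested metric has at least one value."""
    for position in range(len(records)):
        values = fetch(records, position, metrics)
        if all(values[name] for name in metrics):
            return position, values
    return None

head = fetch(archive, 0, METRICS)
print(all(len(v) == 0 for v in head.values()))   # True: offset zero is empty
found = first_complete_fetch(archive, METRICS)
print(found[0])                                  # 1: second record has data
```

The first fetch mirrors the "No values returned!" lines in the debug trace. The second shows why starting slightly later in the archive avoids the problem.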

The existing qa/742 happens to test with an archive that does not exhibit this characteristic (first pmResult record contains all the data needed), so the initial fetch works just fine there and the test passes.

I have a trivial fix for pcp-uptime ready, and have updated that test to exercise both cases now.

Since there is an easy workaround (via -O option), I'll mark this one for 7.3.

Thanks, Marko!
Comment 7 Frank Ch. Eigler 2015-09-15 06:49:35 EDT
(In reply to Nathan Scott from comment #6)

> I have a trivial fix for pcp-uptime ready, and have updated that test to
> exercise both cases now.

That check is worthwhile, but adding range checking to the python pcp bindings seems even more important (to protect against future bugs).
Comment 9 Miloš Prchlík 2016-08-20 07:56:47 EDT
Verified for build pcp-3.11.3-3.el7.
Comment 11 errata-xmlrpc 2016-11-04 00:22:36 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2344.html
