Bug 1262721
Summary: | pcp -a archive uptime core dumps | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Marko Myllynen <myllynen>
Component: | pcp | Assignee: | Nathan Scott <nathans>
Status: | CLOSED ERRATA | QA Contact: | Miloš Prchlík <mprchlik>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 7.2 | CC: | brolley, fche, lberk, mbenitez, mcermak, mgoodwin, mprchlik, myllynen, nathans
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-11-04 04:22:36 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Looks like a bad pmValue has been passed into pmExtractValue():

[mgoodwin@goblin ~]$ ulimit -c unlimited
[mgoodwin@goblin ~]$ pcp -a /var/log/pcp/pmlogger/goblin/20150914.19.31 uptime
Segmentation fault (core dumped)
[mgoodwin@goblin ~]$ file core.14442
core.14442: ELF 64-bit LSB core file x86-64, version 1 (SYSV), too many program headers (222)
[mgoodwin@goblin ~]$ file -Pelf_phnum=10000 core.14442
core.14442: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'python3 /usr/libexec/pcp/bin/pcp-uptime'
[mgoodwin@goblin ~]$ gdb `which python3` core.14442
GNU gdb (GDB) Fedora 7.9.1-17.fc22
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/python3...Reading symbols from /usr/bin/python3...(no debugging symbols found)...done.
(no debugging symbols found)...done.
warning: core file may not match specified executable file.
[New LWP 14442]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `python3 /usr/libexec/pcp/bin/pcp-uptime'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f00d1544254 in pmExtractValue (valfmt=<optimized out>, ival=0xee7570, itype=4, oval=0x7f00ce21cbc0, otype=4) at units.c:945
945 if (ival->value.pval->vlen != PM_VAL_HDR_SIZE + sizeof(float)
Missing separate debuginfos, use: dnf debuginfo-install python3-3.4.2-6.fc22.x86_64
(gdb) where
#0 0x00007f00d1544254 in pmExtractValue (valfmt=<optimized out>, ival=0xee7570, itype=4, oval=0x7f00ce21cbc0, otype=4) at units.c:945
#1 0x00007f00cefb5db0 in ffi_call_unix64 () from /lib64/libffi.so.6
#2 0x00007f00cefb5818 in ffi_call () from /lib64/libffi.so.6
#3 0x00007f00cf1c877f in _ctypes_callproc () from /usr/lib64/python3.4/lib-dynload/_ctypes.cpython-34m.so
#4 0x00007f00cf1c26c9 in PyCFuncPtr_call () from /usr/lib64/python3.4/lib-dynload/_ctypes.cpython-34m.so
#5 0x00007f00d901df31 in PyObject_Call () from /lib64/libpython3.4m.so.1.0
#6 0x00007f00d90cde0a in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#7 0x00007f00d90d5162 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#8 0x00007f00d90d5162 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#9 0x00007f00d90d5de6 in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#10 0x00007f00d90d5e8b in PyEval_EvalCode () from /lib64/libpython3.4m.so.1.0
#11 0x00007f00d90f1ef4 in run_mod () from /lib64/libpython3.4m.so.1.0
#12 0x00007f00d90f4135 in PyRun_FileExFlags () from /lib64/libpython3.4m.so.1.0
#13 0x00007f00d90f51b3 in PyRun_SimpleFileExFlags () from /lib64/libpython3.4m.so.1.0
#14 0x00007f00d910bd74 in Py_Main () from /lib64/libpython3.4m.so.1.0
#15 0x0000000000400ad7 in main ()
(gdb) p ival
$1 = (const pmValue *) 0xee7570
(gdb) p *ival
$2 = {inst = -649913344, value = {pval = 0x51, lval = 81}}
(gdb) p ival->value
$3 = {pval = 0x51, lval = 81}
(gdb) p ival->value.pval
$4 = (pmValueBlock *) 0x51
(gdb) p ival->value.pval->vlen
Cannot access memory at address 0x51

% pcp -Dall -a 20150823.10.41.meta uptime 2>&1 | tail -10
__pmLogFetchInterp: log reads: forward 3 (+1 cached) backwards 1 (+1 cached)
pmFetch returns ...
pmResult dump from 0x12c4bf0 timestamp: 1440315703.207307 03:41:43.207 numpmid: 3
60.26.0 (kernel.all.uptime): No values returned!
60.25.0 (kernel.all.nusers): No values returned!
60.2.0 (kernel.all.load): No values returned!
pmUseContext(0) -> 0
pmExtractValue: 145 [U32] -> 145 [U32]
pmExtractValue: 33 [U32] -> 33 [U32]
pmExtractValue: [1] 17781 segmentation fault (core dumped)

i.e., the pmFetch in src/pcp/uptime/pcp-uptime.py returns a pmResult structure with numpmid=3, but a pmValueSet.numval=0. A get_vlist() against a nonexistent pmValue[] gives us the crashy situation.

Chances are src/python/pcp/pmapi.py get_vlist() should do a check on vset_idx, comparing it against get_numval(), and throw an exception if <=.

(In reply to Marko Myllynen from comment #0)
> Created attachment 1073121 [details]
> pcp uptime core dump
>
> Description of problem:
> $ pcp -a ~/20150823.10.41 uptime
> zsh: segmentation fault (core dumped) pcp -a ~/20150823.10.41 uptime
>

The problem is related to the offset of the initial fetch - if you use:

$ pcp -O@+1 -a ~/20150823.10.41 uptime

it works. The reason is there is a pmResult record at the head of the log, recorded before any record containing the metrics pcp-uptime is looking for. So when pcp-uptime fetches at offset zero it gets no data (even though the metrics exist in the log, and if it did a second fetch it would find 'em).

The existing qa/742 happens to test with an archive that does not exhibit this characteristic (first pmResult record contains all the data needed), so the initial fetch works just fine there and the test passes.

I have a trivial fix for pcp-uptime ready, and have updated that test to exercise both cases now.

Since there is an easy workaround (via -O option), I'll mark this one for 7.3.

thanks Marko!

(In reply to Nathan Scott from comment #6)
> I have a trivial fix for pcp-uptime ready, and have updated that test to
> exercise both cases now.
That check is worthwhile, but the addition of python-pcp-binding range checking seems even more important (to protect against future bugs).

Verified for build pcp-3.11.3-3.el7.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2344.html
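For readers following the range-checking suggestion in the comments, here is a minimal, self-contained sketch of the idea. It is not the actual src/python/pcp/pmapi.py code: Value, ValueSet, checked_vlist and make_vset are hypothetical, simplified stand-ins for pmValue, pmValueSet and get_vlist(), and the numbers are illustrative only. The point it shows is that indexing a ctypes pointer is not bounds-checked, so the comparison against numval has to be explicit.

```python
import ctypes


class Value(ctypes.Structure):
    # Hypothetical, simplified stand-in for pmValue (instance id + 32-bit value).
    _fields_ = [("inst", ctypes.c_int), ("lval", ctypes.c_int)]


class ValueSet(ctypes.Structure):
    # Hypothetical, simplified stand-in for pmValueSet.
    _fields_ = [("numval", ctypes.c_int),
                ("vlist", ctypes.POINTER(Value))]


def checked_vlist(vset, idx):
    """Return vset.vlist[idx], or raise IndexError if idx is out of range.

    ctypes pointer indexing performs no bounds checking, so the check
    against numval is done here before the pointer is touched.
    A zero or negative numval rejects every index.
    """
    if not 0 <= idx < vset.numval:
        raise IndexError(
            "value index %d out of range (numval=%d)" % (idx, vset.numval))
    return vset.vlist[idx]


def make_vset(values):
    """Build a ValueSet from a list of ints; also return the backing array
    so the caller can keep it alive for as long as the ValueSet is used."""
    backing = (Value * max(len(values), 1))()
    for i, v in enumerate(values):
        backing[i] = Value(i, v)
    vset = ValueSet(len(values), ctypes.cast(backing, ctypes.POINTER(Value)))
    return vset, backing


populated, _keep1 = make_vset([10, 20])   # illustrative values
empty, _keep2 = make_vset([])             # models "No values returned!"

print(checked_vlist(populated, 1).lval)
try:
    checked_vlist(empty, 0)
except IndexError as err:
    print("refused:", err)
```

With a check of this kind, a caller that asks for a value from an empty value set gets a Python exception it can handle, rather than a pointer into memory that was never populated.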
Created attachment 1073121 [details]
pcp uptime core dump

Description of problem:
$ pcp -a ~/20150823.10.41 uptime
zsh: segmentation fault (core dumped) pcp -a ~/20150823.10.41 uptime

gzipped core dump attached. The archive can be processed just fine with other tools.

Version-Release number of selected component (if applicable):
pcp-3.10.6-2.el7.x86_64
pcp-system-tools-3.10.6-2.el7.x86_64
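The archive behaviour described in the comments (a record at the head of the log that holds none of the metrics pcp-uptime wants, followed by records that do) can be modelled with a small sketch. This is only an illustration under stated assumptions, not the fix that went into pcp-uptime: first_complete_fetch is a hypothetical helper, each record is modelled as a plain dict rather than a pmResult, and the values are made up.

```python
def first_complete_fetch(records, wanted):
    """Return the first record holding at least one value for every wanted metric.

    records: iterable of dicts mapping metric name -> list of values,
             a hypothetical model of successive records in an archive.
    wanted:  metric names the caller needs before it can report anything.
    """
    for record in records:
        # A missing key or an empty list both mean "no values returned".
        if all(record.get(name) for name in wanted):
            return record
    raise LookupError("no record contains values for: " + ", ".join(wanted))


WANTED = ("kernel.all.uptime", "kernel.all.nusers", "kernel.all.load")

# Illustrative archive: the first record lacks the wanted metrics,
# the second one has them all.
archive = [
    {"some.other.metric": [1]},
    {"kernel.all.uptime": [1000],
     "kernel.all.nusers": [2],
     "kernel.all.load": [0.1, 0.2, 0.3]},
]

record = first_complete_fetch(archive, WANTED)
print(record["kernel.all.uptime"][0])
```

The design choice here is to treat "no values" as a normal outcome of a fetch that the caller checks for, and to fail with a clear error only if no record ever supplies the data.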