Bug 1319288

Summary: segv in libpcp during discovery error processing
Product: [Fedora] Fedora
Component: pcp
Version: rawhide
Status: CLOSED ERRATA
Keywords: Reopened
Reporter: Frank Ch. Eigler <fche>
Assignee: Nathan Scott <nathans>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: brolley, fche, lberk, mgoodwin, nathans, pcp, scox, zcerza
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pcp-3.11.2-2.fc24, pcp-3.11.2-1.fc22, pcp-3.11.2-2.fc23, pcp-3.11.3-1.el5
Doc Type: Bug Fix
Last Closed: 2016-07-09 20:19:44 UTC
Type: Bug
Attachments:
  all-thread backtrace (flags: none)

Description Frank Ch. Eigler 2016-03-18 17:50:40 UTC
With pcp 3.11.0:

bash% ulimit -n 20
bash% pmfind -m probe=192.168.1.0/24,maxThreads=256
Segmentation fault.

A backtrace shows:

Core was generated by `pmfind -m probe=192.168.1.0/24,maxThreads=255'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pmconfig (formatter=formatter@entry=0x2b67588de180 <posix_formatter>, fatal=fatal@entry=0)
    at config.c:185
185	    PM_INIT_LOCKS();
(gdb) bt
#0  __pmconfig (formatter=formatter@entry=0x2b67588de180 <posix_formatter>, fatal=fatal@entry=0)
    at config.c:185
#1  0x00002b67588de6b5 in pmgetconfig (name=name@entry=0x2b67589023d0 "PCP_TMPFILE_DIR", 
    fatal=fatal@entry=0) at config.c:251
#2  0x00002b67588de88f in pmGetOptionalConfig (name=name@entry=0x2b67589023d0 "PCP_TMPFILE_DIR")
    at config.c:288
#3  0x00002b67588c65e4 in vpmprintf (msg=0x2b6758902775 "[%.19s] %s(%d) %s: ", 
    arg=arg@entry=0x2b67586f33e0) at util.c:1352
#4  0x00002b67588c8b69 in pmprintf (msg=msg@entry=0x2b6758902775 "[%.19s] %s(%d) %s: ")
    at util.c:1421
#5  0x00002b67588c92ff in __pmNotifyErr (priority=priority@entry=4, 
    message=message@entry=0x2b67589095c0 "__pmProbeDiscoverServices: Unable to create socket for address %s") at util.c:150
#6  0x00002b67588f623d in attemptConnections (arg=0x7ffc67379d90) at probe.c:150
#7  0x00002b6759fde555 in start_thread (arg=0x2b67586f4700) at pthread_create.c:333
#8  0x00002b6758c23ded in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

There is apparently a race condition in the PM_INIT_LOCKS facility.  Continuing analysis.

Note that fixing bug #1229494 would have corrected this, by eschewing all that pmgetconfig etc. effort.

Comment 1 Frank Ch. Eigler 2016-03-18 18:29:43 UTC
Created attachment 1137843 [details]
all-thread backtrace

Comment 2 Frank Ch. Eigler 2016-03-18 20:27:10 UTC
A few things jump out in that backtrace collection.

- pmNotifyErr() does PM_LOCK* work for no obvious reason. The syslog(3) facility doesn't need it, and the "stderr equivalent" block doesn't manipulate shared data, except perhaps through the pmprintf* functions. However, pmprintf* uses locks internally, and the locking at the pmNotifyErr level doesn't properly protect it anyway, since a PM_UNLOCK is placed too early.

- The actual crash appears to occur during a callq instruction, as it's writing the return-pc into the stack.  I don't have a theory as to why that should be bad; the stack pointer etc. look ok.  Continuing investigation.

Comment 3 Dave Brolley 2016-03-18 20:31:31 UTC
(In reply to Frank Ch. Eigler from comment #2)
> - The actual crash appears to occur during a callq instruction, as it's
> writing the return-pc into the stack.  I don't have a theory as to why that
> should be bad; the stack pointer etc. look ok.  Continuing investigation.

Because the program potentially uses up to 1024 threads, the stack size for each thread has been limited (to PTHREAD_STACK_MIN). Unexpected and nondeterministic behaviour can occur if the stack is too small.

Comment 4 Frank Ch. Eigler 2016-03-30 22:44:24 UTC
brolley's intuition is right: our MAXPATHLEN values were larger than I expected, and the thread stacks were indeed being blown out. Patch posted.

http://oss.sgi.com/pipermail/pcp/2016-March/010092.html

Comment 5 Fedora Update System 2016-04-29 02:53:38 UTC
pcp-3.11.2-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-394320f755

Comment 6 Fedora Update System 2016-04-29 17:22:01 UTC
pcp-3.11.2-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-bad5995fe9

Comment 7 Fedora Update System 2016-04-30 01:50:11 UTC
pcp-3.11.2-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-394320f755

Comment 8 Fedora Update System 2016-04-30 02:23:09 UTC
pcp-3.11.2-1.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-f8f919a355

Comment 9 Fedora Update System 2016-04-30 02:23:47 UTC
pcp-3.11.2-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-53282a0c5a

Comment 10 Fedora Update System 2016-05-09 00:04:50 UTC
pcp-3.11.2-2.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 11 Fedora Update System 2016-05-10 17:53:12 UTC
pcp-3.11.2-1.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2016-05-10 17:59:44 UTC
pcp-3.11.2-2.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 13 Fedora Update System 2016-06-18 06:17:49 UTC
pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-4745f3e292

Comment 14 Fedora Update System 2016-07-09 20:19:15 UTC
pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.