Bug 1319288 - segv in libpcp during discovery error processing
Summary: segv in libpcp during discovery error processing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: pcp
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nathan Scott
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-03-18 17:50 UTC by Frank Ch. Eigler
Modified: 2016-07-09 20:19 UTC (History)

Fixed In Version: pcp-3.11.2-2.fc24 pcp-3.11.2-1.fc22 pcp-3.11.2-2.fc23 pcp-3.11.3-1.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-09 20:19:44 UTC


Attachments
all-thread backtrace (38.88 KB, text/plain)
2016-03-18 18:29 UTC, Frank Ch. Eigler

Description Frank Ch. Eigler 2016-03-18 17:50:40 UTC
With pcp 3.11.0:

bash% ulimit -n 20
bash% pmfind -m probe=192.168.1.0/24,maxThreads=256
Segmentation fault.

A backtrace shows:

Core was generated by `pmfind -m probe=192.168.1.0/24,maxThreads=255'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pmconfig (formatter=formatter@entry=0x2b67588de180 <posix_formatter>, fatal=fatal@entry=0)
    at config.c:185
185	    PM_INIT_LOCKS();
(gdb) bt
#0  __pmconfig (formatter=formatter@entry=0x2b67588de180 <posix_formatter>, fatal=fatal@entry=0)
    at config.c:185
#1  0x00002b67588de6b5 in pmgetconfig (name=name@entry=0x2b67589023d0 "PCP_TMPFILE_DIR", 
    fatal=fatal@entry=0) at config.c:251
#2  0x00002b67588de88f in pmGetOptionalConfig (name=name@entry=0x2b67589023d0 "PCP_TMPFILE_DIR")
    at config.c:288
#3  0x00002b67588c65e4 in vpmprintf (msg=0x2b6758902775 "[%.19s] %s(%d) %s: ", 
    arg=arg@entry=0x2b67586f33e0) at util.c:1352
#4  0x00002b67588c8b69 in pmprintf (msg=msg@entry=0x2b6758902775 "[%.19s] %s(%d) %s: ")
    at util.c:1421
#5  0x00002b67588c92ff in __pmNotifyErr (priority=priority@entry=4, 
    message=message@entry=0x2b67589095c0 "__pmProbeDiscoverServices: Unable to create socket for address %s") at util.c:150
#6  0x00002b67588f623d in attemptConnections (arg=0x7ffc67379d90) at probe.c:150
#7  0x00002b6759fde555 in start_thread (arg=0x2b67586f4700) at pthread_create.c:333
#8  0x00002b6758c23ded in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

There is apparently a race condition in the PM_INIT_LOCKS facility.  Continuing analysis.

Note that fixing bug #1229494 would have corrected this, by eschewing all that pmgetconfig etc. effort.
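
The "Unable to create socket" error path in frame #5 is what the low "ulimit -n" provokes. The following is a minimal sketch of that effect in isolation, not libpcp code: the function name socket_errno_under_fd_limit and the MAX_ATTEMPTS bound are hypothetical, and it assumes a POSIX system where the soft RLIMIT_NOFILE limit can be lowered and restored.

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_ATTEMPTS 64   /* hypothetical bound, comfortably above the limit used */

/*
 * Lower the soft descriptor limit to `limit`, then create sockets until
 * socket() fails. Returns the errno of the failing call, 0 if no call
 * failed within MAX_ATTEMPTS, or -1 if the limit could not be changed.
 */
int socket_errno_under_fd_limit(rlim_t limit)
{
    struct rlimit saved, lowered;
    int fds[MAX_ATTEMPTS];
    int nfds = 0;
    int err = 0;

    if (getrlimit(RLIMIT_NOFILE, &saved) < 0)
        return -1;
    lowered = saved;
    lowered.rlim_cur = limit;               /* same effect as "ulimit -n" */
    if (setrlimit(RLIMIT_NOFILE, &lowered) < 0)
        return -1;

    while (nfds < MAX_ATTEMPTS) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            err = errno;                    /* expected: EMFILE */
            break;
        }
        fds[nfds++] = fd;
    }

    /* Release the descriptors and restore the original soft limit. */
    while (nfds > 0)
        close(fds[--nfds]);
    setrlimit(RLIMIT_NOFILE, &saved);
    return err;
}
```

With a limit of 20, as in the reproducer, the failure arrives after only a handful of sockets, so many probe threads reach the error-reporting code at about the same time.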

Comment 1 Frank Ch. Eigler 2016-03-18 18:29:43 UTC
Created attachment 1137843
all-thread backtrace

Comment 2 Frank Ch. Eigler 2016-03-18 20:27:10 UTC
A few things jump out in that backtrace collection.

- __pmNotifyErr() does PM_LOCK* work for no obvious reason.  The syslog(3) facility doesn't need it.  The "stderr equivalent" block doesn't manipulate shared data, except perhaps through the pmprintf* routines.  But pmprintf* takes locks internally, and at the __pmNotifyErr level it isn't properly protected anyway, since a PM_UNLOCK is placed too early.

- The actual crash appears to occur during a callq instruction, as it's writing the return-pc into the stack.  I don't have a theory as to why that should be bad; the stack pointer etc. look ok.  Continuing investigation.
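
To illustrate the "PM_UNLOCK is placed too early" point in general terms, here is a minimal sketch using a plain pthread mutex. It is not the libpcp code: msg_lock, msg_buf, append_locked and notify are hypothetical names, and the "flush" is just a copy into a caller-supplied buffer. The idea shown is that the lock covers the whole append-then-flush sequence, so another thread cannot interleave its text between the two appends or before the flush.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t msg_lock = PTHREAD_MUTEX_INITIALIZER;
static char msg_buf[256];   /* shared by all threads */
static size_t msg_len;      /* bytes currently used in msg_buf */

/* Append s to the shared buffer, truncating if needed. Caller holds msg_lock. */
static void append_locked(const char *s)
{
    size_t room = sizeof(msg_buf) - msg_len - 1;
    size_t n = strlen(s);

    if (n > room)
        n = room;
    memcpy(msg_buf + msg_len, s, n);
    msg_len += n;
    msg_buf[msg_len] = '\0';
}

/* Build "prefix" + "text" in the shared buffer and flush it into out. */
void notify(const char *prefix, const char *text, char *out, size_t outlen)
{
    pthread_mutex_lock(&msg_lock);
    append_locked(prefix);
    append_locked(text);
    snprintf(out, outlen, "%s", msg_buf);   /* the "flush" step */
    msg_len = 0;
    msg_buf[0] = '\0';
    pthread_mutex_unlock(&msg_lock);        /* released only after the flush */
}
```

Unlocking between the two appends, or before the flush, would leave the shared buffer exposed in exactly the window where it is half-built.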

Comment 3 Dave Brolley 2016-03-18 20:31:31 UTC
(In reply to Frank Ch. Eigler from comment #2)
> - The actual crash appears to occur during a callq instruction, as it's
> writing the return-pc into the stack.  I don't have a theory as to why that
> should be bad; the stack pointer etc. look ok.  Continuing investigation.

Because the program potentially uses 1024 threads, the stack size of each thread has been limited (PTHREAD_STACK_MIN). Unexpected and nondeterministic behaviour can occur if the stack is too small.

Comment 4 Frank Ch. Eigler 2016-03-30 22:44:24 UTC
brolley's intuition was right: our MAXPATHLEN buffers were larger than I expected, and the thread stacks were indeed being blown out.  Patch posted:

http://oss.sgi.com/pipermail/pcp/2016-March/010092.html

Comment 5 Fedora Update System 2016-04-29 02:53:38 UTC
pcp-3.11.2-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-394320f755

Comment 6 Fedora Update System 2016-04-29 17:22:01 UTC
pcp-3.11.2-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-bad5995fe9

Comment 7 Fedora Update System 2016-04-30 01:50:11 UTC
pcp-3.11.2-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-394320f755

Comment 8 Fedora Update System 2016-04-30 02:23:09 UTC
pcp-3.11.2-1.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-f8f919a355

Comment 9 Fedora Update System 2016-04-30 02:23:47 UTC
pcp-3.11.2-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-53282a0c5a

Comment 10 Fedora Update System 2016-05-09 00:04:50 UTC
pcp-3.11.2-2.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 11 Fedora Update System 2016-05-10 17:53:12 UTC
pcp-3.11.2-1.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2016-05-10 17:59:44 UTC
pcp-3.11.2-2.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 13 Fedora Update System 2016-06-18 06:17:49 UTC
pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-4745f3e292

Comment 14 Fedora Update System 2016-07-09 20:19:15 UTC
pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.

