Bug 1319288 - segv in libpcp during discovery error processing
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: pcp
Version: rawhide
Hardware: Unspecified Unspecified
Priority: unspecified Severity: unspecified
Assigned To: Nathan Scott
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened
Depends On:
Blocks:
Reported: 2016-03-18 13:50 EDT by Frank Ch. Eigler
Modified: 2016-07-09 16:19 EDT (History)

See Also:
Fixed In Version: pcp-3.11.2-2.fc24 pcp-3.11.2-1.fc22 pcp-3.11.2-2.fc23 pcp-3.11.3-1.el5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-09 16:19:44 EDT
Type: Bug


Attachments
all-thread backtrace (38.88 KB, text/plain)
2016-03-18 14:29 EDT, Frank Ch. Eigler

Description Frank Ch. Eigler 2016-03-18 13:50:40 EDT
With pcp 3.11.0:

bash% ulimit -n 20
bash% pmfind -m probe=192.168.1.0/24,maxThreads=256
Segmentation fault.

A backtrace shows:

Core was generated by `pmfind -m probe=192.168.1.0/24,maxThreads=255'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pmconfig (formatter=formatter@entry=0x2b67588de180 <posix_formatter>, fatal=fatal@entry=0)
    at config.c:185
185	    PM_INIT_LOCKS();
(gdb) bt
#0  __pmconfig (formatter=formatter@entry=0x2b67588de180 <posix_formatter>, fatal=fatal@entry=0)
    at config.c:185
#1  0x00002b67588de6b5 in pmgetconfig (name=name@entry=0x2b67589023d0 "PCP_TMPFILE_DIR", 
    fatal=fatal@entry=0) at config.c:251
#2  0x00002b67588de88f in pmGetOptionalConfig (name=name@entry=0x2b67589023d0 "PCP_TMPFILE_DIR")
    at config.c:288
#3  0x00002b67588c65e4 in vpmprintf (msg=0x2b6758902775 "[%.19s] %s(%d) %s: ", 
    arg=arg@entry=0x2b67586f33e0) at util.c:1352
#4  0x00002b67588c8b69 in pmprintf (msg=msg@entry=0x2b6758902775 "[%.19s] %s(%d) %s: ")
    at util.c:1421
#5  0x00002b67588c92ff in __pmNotifyErr (priority=priority@entry=4, 
    message=message@entry=0x2b67589095c0 "__pmProbeDiscoverServices: Unable to create socket for address %s") at util.c:150
#6  0x00002b67588f623d in attemptConnections (arg=0x7ffc67379d90) at probe.c:150
#7  0x00002b6759fde555 in start_thread (arg=0x2b67586f4700) at pthread_create.c:333
#8  0x00002b6758c23ded in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

There is apparently a race condition in the PM_INIT_LOCKS facility.  Continuing analysis.

Note that fixing bug #1229494 would have corrected this, by eschewing all that pmgetconfig etc. effort.
Comment 1 Frank Ch. Eigler 2016-03-18 14:29 EDT
Created attachment 1137843 [details]
all-thread backtrace
Comment 2 Frank Ch. Eigler 2016-03-18 16:27:10 EDT
A few things jump out in that backtrace collection.

- __pmNotifyErr() does PM_LOCK* work for no obvious reason.  The syslog(3) facility doesn't need it.  The "stderr equivalent" block doesn't manipulate shared data, except perhaps the pmprintf* stuff.  But pmprintf* uses locks internally, and at the __pmNotifyErr level it isn't properly protected anyway, since a PM_UNLOCK is placed too early.

- The actual crash appears to occur during a callq instruction, as it's writing the return-pc into the stack.  I don't have a theory as to why that should be bad; the stack pointer etc. look ok.  Continuing investigation.
Comment 3 Dave Brolley 2016-03-18 16:31:31 EDT
(In reply to Frank Ch. Eigler from comment #2)
> - The actual crash appears to occur during a callq instruction, as it's
> writing the return-pc into the stack.  I don't have a theory as to why that
> should be bad; the stack pointer etc. look ok.  Continuing investigation.

Because the program potentially uses 1024 threads, the stack size for each thread has been limited (PTHREAD_STACK_MIN). Unexpected and nondeterministic behaviour can occur if the stack is too small.
Comment 4 Frank Ch. Eigler 2016-03-30 18:44:24 EDT
brolley's intuition's right; our MAXPATHLENs were larger than I expected and indeed thread stacks were being blown out.  Patch posted.

http://oss.sgi.com/pipermail/pcp/2016-March/010092.html
Comment 5 Fedora Update System 2016-04-28 22:53:38 EDT
pcp-3.11.2-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-394320f755
Comment 6 Fedora Update System 2016-04-29 13:22:01 EDT
pcp-3.11.2-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-bad5995fe9
Comment 7 Fedora Update System 2016-04-29 21:50:11 EDT
pcp-3.11.2-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-394320f755
Comment 8 Fedora Update System 2016-04-29 22:23:09 EDT
pcp-3.11.2-1.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-f8f919a355
Comment 9 Fedora Update System 2016-04-29 22:23:47 EDT
pcp-3.11.2-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-53282a0c5a
Comment 10 Fedora Update System 2016-05-08 20:04:50 EDT
pcp-3.11.2-2.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.
Comment 11 Fedora Update System 2016-05-10 13:53:12 EDT
pcp-3.11.2-1.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
Comment 12 Fedora Update System 2016-05-10 13:59:44 EDT
pcp-3.11.2-2.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
Comment 13 Fedora Update System 2016-06-18 02:17:49 EDT
pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-4745f3e292
Comment 14 Fedora Update System 2016-07-09 16:19:15 EDT
pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.
