with pcp 3.11.0

bash% ulimit -n 20
bash% pmfind -m probe=192.168.1.0/24,maxThreads=256
Segmentation fault.

A backtrace shows:

Core was generated by `pmfind -m probe=192.168.1.0/24,maxThreads=255'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __pmconfig (formatter=formatter@entry=0x2b67588de180 <posix_formatter>, fatal=fatal@entry=0) at config.c:185
185 PM_INIT_LOCKS();
(gdb) bt
#0 __pmconfig (formatter=formatter@entry=0x2b67588de180 <posix_formatter>, fatal=fatal@entry=0) at config.c:185
#1 0x00002b67588de6b5 in pmgetconfig (name=name@entry=0x2b67589023d0 "PCP_TMPFILE_DIR", fatal=fatal@entry=0) at config.c:251
#2 0x00002b67588de88f in pmGetOptionalConfig (name=name@entry=0x2b67589023d0 "PCP_TMPFILE_DIR") at config.c:288
#3 0x00002b67588c65e4 in vpmprintf (msg=0x2b6758902775 "[%.19s] %s(%d) %s: ", arg=arg@entry=0x2b67586f33e0) at util.c:1352
#4 0x00002b67588c8b69 in pmprintf (msg=msg@entry=0x2b6758902775 "[%.19s] %s(%d) %s: ") at util.c:1421
#5 0x00002b67588c92ff in __pmNotifyErr (priority=priority@entry=4, message=message@entry=0x2b67589095c0 "__pmProbeDiscoverServices: Unable to create socket for address %s") at util.c:150
#6 0x00002b67588f623d in attemptConnections (arg=0x7ffc67379d90) at probe.c:150
#7 0x00002b6759fde555 in start_thread (arg=0x2b67586f4700) at pthread_create.c:333
#8 0x00002b6758c23ded in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

There is apparently a race condition in the PM_INIT_LOCKS facility. Continuing analysis.

Note that fixing bug #1229494 would have corrected this, by eschewing all that pmgetconfig etc. effort.
Created attachment 1137843: all-thread backtrace
A few things jump out in that backtrace collection.

- pmNotifyErr() does PM_LOCK* work for no obvious reason. The syslog(3) facility doesn't need it. The "stderr equivalent" block doesn't manipulate shared data, except perhaps the pmprintf* stuff. Except pmprintf* uses locks internally, and at the pmNotifyErr level isn't properly protected anyway, since a PM_UNLOCK is placed too early.

- The actual crash appears to occur during a callq instruction, as it's writing the return-pc into the stack. I don't have a theory as to why that should be bad; the stack pointer etc. look ok. Continuing investigation.
(In reply to Frank Ch. Eigler from comment #2)
> - The actual crash appears to occur during a callq instruction, as it's
> writing the return-pc into the stack. I don't have a theory as to why that
> should be bad; the stack pointer etc. look ok. Continuing investigation.

Because the program uses potentially 1024 threads, the stack size of each thread has been limited (PTHREAD_STACK_MIN). Unexpected and nondeterministic behaviour can occur if the stack is too small.
brolley's intuition's right; our MAXPATHLENs were larger than I expected and indeed thread stacks were being blown out. Patch posted. http://oss.sgi.com/pipermail/pcp/2016-March/010092.html
pcp-3.11.2-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-394320f755
pcp-3.11.2-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-bad5995fe9
pcp-3.11.2-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-394320f755
pcp-3.11.2-1.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-f8f919a355
pcp-3.11.2-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-53282a0c5a
pcp-3.11.2-2.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.11.2-1.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.11.2-2.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-4745f3e292
pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.