| Summary: | segv in libpcp during discovery error processing | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Frank Ch. Eigler <fche> |
| Component: | pcp | Assignee: | Nathan Scott <nathans> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rawhide | CC: | brolley, fche, lberk, mgoodwin, nathans, pcp, scox, zcerza |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | pcp-3.11.2-2.fc24 pcp-3.11.2-1.fc22 pcp-3.11.2-2.fc23 pcp-3.11.3-1.el5 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-07-09 20:19:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |

Created attachment 1137843: all-thread backtrace
A few things jump out in that backtrace collection:

- pmNotifyErr() does PM_LOCK* work for no obvious reason. The syslog(3) facility doesn't need it. The "stderr equivalent" block doesn't manipulate shared data, except perhaps the pmprintf* stuff. But pmprintf* uses locks internally, and at the pmNotifyErr level it isn't properly protected anyway, since a PM_UNLOCK is placed too early.
- The actual crash appears to occur during a callq instruction, as it's writing the return-pc into the stack. I don't have a theory as to why that should be bad; the stack pointer etc. look ok. Continuing investigation.

(In reply to Frank Ch. Eigler from comment #2)
> - The actual crash appears to occur during a callq instruction, as it's
> writing the return-pc into the stack. I don't have a theory as to why that
> should be bad; the stack pointer etc. look ok. Continuing investigation.

Because the program can use up to 1024 threads, the stack size of each thread has been limited (PTHREAD_STACK_MIN). Unexpected and nondeterministic behaviour can occur if the stack is too small.

brolley's intuition is right: our MAXPATHLENs were larger than I expected, and thread stacks were indeed being blown out. Patch posted: http://oss.sgi.com/pipermail/pcp/2016-March/010092.html

pcp-3.11.2-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-394320f755

pcp-3.11.2-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-bad5995fe9

pcp-3.11.2-1.el5 has been pushed to the Fedora EPEL 5 testing repository. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-394320f755

pcp-3.11.2-1.fc22 has been pushed to the Fedora 22 testing repository. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-f8f919a355

pcp-3.11.2-2.fc23 has been pushed to the Fedora 23 testing repository. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-53282a0c5a

pcp-3.11.2-2.fc24 has been pushed to the Fedora 24 stable repository.

pcp-3.11.2-1.fc22 has been pushed to the Fedora 22 stable repository.

pcp-3.11.2-2.fc23 has been pushed to the Fedora 23 stable repository.

pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 testing repository. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-4745f3e292

pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 stable repository.
With pcp 3.11.0:

```
bash% ulimit -n 20
bash% pmfind -m probe=192.168.1.0/24,maxThreads=256
Segmentation fault.
```

A backtrace shows:

```
Core was generated by `pmfind -m probe=192.168.1.0/24,maxThreads=255'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __pmconfig (formatter=formatter@entry=0x2b67588de180 <posix_formatter>, fatal=fatal@entry=0) at config.c:185
185 PM_INIT_LOCKS();
(gdb) bt
#0 __pmconfig (formatter=formatter@entry=0x2b67588de180 <posix_formatter>, fatal=fatal@entry=0) at config.c:185
#1 0x00002b67588de6b5 in pmgetconfig (name=name@entry=0x2b67589023d0 "PCP_TMPFILE_DIR", fatal=fatal@entry=0) at config.c:251
#2 0x00002b67588de88f in pmGetOptionalConfig (name=name@entry=0x2b67589023d0 "PCP_TMPFILE_DIR") at config.c:288
#3 0x00002b67588c65e4 in vpmprintf (msg=0x2b6758902775 "[%.19s] %s(%d) %s: ", arg=arg@entry=0x2b67586f33e0) at util.c:1352
#4 0x00002b67588c8b69 in pmprintf (msg=msg@entry=0x2b6758902775 "[%.19s] %s(%d) %s: ") at util.c:1421
#5 0x00002b67588c92ff in __pmNotifyErr (priority=priority@entry=4, message=message@entry=0x2b67589095c0 "__pmProbeDiscoverServices: Unable to create socket for address %s") at util.c:150
#6 0x00002b67588f623d in attemptConnections (arg=0x7ffc67379d90) at probe.c:150
#7 0x00002b6759fde555 in start_thread (arg=0x2b67586f4700) at pthread_create.c:333
#8 0x00002b6758c23ded in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
```

There is apparently a race condition in the PM_INIT_LOCKS facility. Continuing analysis.

Note that fixing bug #1229494 would have corrected this, by eschewing all that pmgetconfig etc. effort.