Bug 1319288

Summary: segv in libpcp during discovery error processing
Product: [Fedora] Fedora
Reporter: Frank Ch. Eigler <fche>
Component: pcp
Assignee: Nathan Scott <nathans>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: brolley, fche, lberk, mgoodwin, nathans, pcp, scox, zcerza
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: pcp-3.11.2-2.fc24 pcp-3.11.2-1.fc22 pcp-3.11.2-2.fc23 pcp-3.11.3-1.el5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-09 20:19:44 UTC
Type: Bug

Attachments:

Description	Flags
all-thread backtrace	none

Description Frank Ch. Eigler 2016-03-18 17:50:40 UTC

With pcp 3.11.0:

bash% ulimit -n 20
bash% pmfind -m probe=192.168.1.0/24,maxThreads=256
Segmentation fault.

A backtrace shows:

Core was generated by `pmfind -m probe=192.168.1.0/24,maxThreads=255'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pmconfig (formatter=formatter@entry=0x2b67588de180 <posix_formatter>, fatal=fatal@entry=0)
    at config.c:185
185	    PM_INIT_LOCKS();
(gdb) bt
#0  __pmconfig (formatter=formatter@entry=0x2b67588de180 <posix_formatter>, fatal=fatal@entry=0)
    at config.c:185
#1  0x00002b67588de6b5 in pmgetconfig (name=name@entry=0x2b67589023d0 "PCP_TMPFILE_DIR", 
    fatal=fatal@entry=0) at config.c:251
#2  0x00002b67588de88f in pmGetOptionalConfig (name=name@entry=0x2b67589023d0 "PCP_TMPFILE_DIR")
    at config.c:288
#3  0x00002b67588c65e4 in vpmprintf (msg=0x2b6758902775 "[%.19s] %s(%d) %s: ", 
    arg=arg@entry=0x2b67586f33e0) at util.c:1352
#4  0x00002b67588c8b69 in pmprintf (msg=msg@entry=0x2b6758902775 "[%.19s] %s(%d) %s: ")
    at util.c:1421
#5  0x00002b67588c92ff in __pmNotifyErr (priority=priority@entry=4, 
    message=message@entry=0x2b67589095c0 "__pmProbeDiscoverServices: Unable to create socket for address %s") at util.c:150
#6  0x00002b67588f623d in attemptConnections (arg=0x7ffc67379d90) at probe.c:150
#7  0x00002b6759fde555 in start_thread (arg=0x2b67586f4700) at pthread_create.c:333
#8  0x00002b6758c23ded in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

There is apparently a race condition in the PM_INIT_LOCKS facility.  Continuing analysis.

Note that fixing bug #1229494 would have corrected this, by eschewing all that pmgetconfig etc. effort.
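
The reproduction above depends on the low descriptor limit pushing the probe threads into their "Unable to create socket" error path. The following is a minimal standalone sketch of that precondition only, not pcp code: `first_socket_errno` is a hypothetical helper that lowers the soft RLIMIT_NOFILE in-process (the equivalent of `ulimit -n 20`) and reports the errno of the first socket(2) call that fails.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of libpcp.
 * Lower the soft descriptor limit to `limit`, then open TCP sockets until
 * socket(2) fails.  Returns the errno of the first failure, or 0 if all
 * `attempts` sockets could be created.  Every socket opened here is closed
 * and the original limit is restored before returning.
 */
int first_socket_errno(rlim_t limit, int attempts)
{
    struct rlimit saved, lowered;
    int fds[256];
    int n = 0, err = 0;

    assert(attempts >= 0 && attempts <= 256);

    if (getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return errno;
    lowered = saved;
    if (limit < lowered.rlim_cur)
        lowered.rlim_cur = limit;          /* only ever lower the soft limit */
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return errno;

    while (n < attempts) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            err = errno;                   /* the error path under test */
            break;
        }
        fds[n++] = fd;
    }

    while (n > 0)
        close(fds[--n]);
    setrlimit(RLIMIT_NOFILE, &saved);      /* best-effort restore */
    return err;
}
```

Calling `first_socket_errno(20, 256)` on Linux would normally be expected to return EMFILE once the descriptor numbers reach the limit. In pmfind, each such failure is reported through __pmNotifyErr, which is frame #5 of the backtrace.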

Comment 1 Frank Ch. Eigler 2016-03-18 18:29:43 UTC

Created attachment 1137843
all-thread backtrace

Comment 2 Frank Ch. Eigler 2016-03-18 20:27:10 UTC

A few things jump out in that backtrace collection.

- __pmNotifyErr() does PM_LOCK* work for no obvious reason.  The syslog(3) path doesn't need it, and the "stderr equivalent" block doesn't manipulate shared data, except perhaps via the pmprintf* routines.  But pmprintf* does its own locking internally, and the locking at the __pmNotifyErr level doesn't properly protect it anyway, since a PM_UNLOCK is placed too early.

- The actual crash appears to occur during a callq instruction, as it's writing the return-pc into the stack.  I don't have a theory as to why that should be bad; the stack pointer etc. look ok.  Continuing investigation.
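
To illustrate the first point (a lock only protects shared data if it is held until that data has been consumed), here is a generic pthread sketch. It is not the libpcp code: `msg_lock`, `msg_buf`, `notify` and `run_notifiers` are hypothetical names, and a plain mutex stands in for the PM_LOCK/PM_UNLOCK macros.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* All names below are illustrative stand-ins, not libpcp symbols. */
static pthread_mutex_t msg_lock = PTHREAD_MUTEX_INITIALIZER;
static char msg_buf[128];        /* shared scratch buffer */
static unsigned long msg_ok;     /* messages read back intact */

struct job { int id; int count; };

/* Format into the shared buffer and consume it in ONE critical section. */
static void notify(int id)
{
    char expect[128];

    snprintf(expect, sizeof(expect), "worker %d: unable to create socket", id);

    pthread_mutex_lock(&msg_lock);
    snprintf(msg_buf, sizeof(msg_buf), "worker %d: unable to create socket", id);
    /* Unlocking at this point would let another thread overwrite msg_buf
     * before it is consumed below. */
    if (strcmp(msg_buf, expect) == 0)
        msg_ok++;
    pthread_mutex_unlock(&msg_lock);
}

static void *worker(void *arg)
{
    const struct job *job = arg;

    for (int i = 0; i < job->count; i++)
        notify(job->id);
    return NULL;
}

/* Run up to 16 workers; return how many messages survived intact. */
unsigned long run_notifiers(int nthreads, int per_thread)
{
    pthread_t tids[16];
    struct job jobs[16];
    int started = 0;

    assert(nthreads > 0 && nthreads <= 16);
    msg_ok = 0;
    for (int i = 0; i < nthreads; i++) {
        jobs[i].id = i;
        jobs[i].count = per_thread;
        if (pthread_create(&tids[i], NULL, worker, &jobs[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return msg_ok;
}
```

With the unlock moved above the `strcmp`, the count returned by `run_notifiers` could fall short of the number of calls, which is the kind of exposure the misplaced PM_UNLOCK leaves.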

Comment 3 Dave Brolley 2016-03-18 20:31:31 UTC

(In reply to Frank Ch. Eigler from comment #2)
> - The actual crash appears to occur during a callq instruction, as it's
> writing the return-pc into the stack.  I don't have a theory as to why that
> should be bad; the stack pointer etc. look ok.  Continuing investigation.

Because the program potentially uses up to 1024 threads, the stack size for each thread has been limited (to PTHREAD_STACK_MIN). Unexpected and nondeterministic behaviour can occur if the stack is too small.

Comment 4 Frank Ch. Eigler 2016-03-30 22:44:24 UTC

brolley's intuition was right: our MAXPATHLENs were larger than I expected, and the thread stacks were indeed being blown out.  Patch posted:

http://oss.sgi.com/pipermail/pcp/2016-March/010092.html
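
The arithmetic behind this can be sketched as follows. The function is a hypothetical back-of-the-envelope helper, and the figures in the usage note (a 16384-byte minimum stack and 4096-byte path buffers, as commonly found on Linux x86_64) are assumptions rather than values taken from this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stack budget: bytes left on a thread stack of `stack_bytes`
 * after `nbuffers` path buffers of `path_max` bytes each, plus `other_bytes`
 * of remaining frame usage.  A negative result means the stack is overrun.
 */
long long stack_headroom(size_t stack_bytes, size_t path_max,
                         size_t nbuffers, size_t other_bytes)
{
    /* Guard the multiplication against size_t overflow. */
    assert(path_max == 0 || nbuffers <= SIZE_MAX / path_max);

    return (long long)stack_bytes
         - (long long)(nbuffers * path_max)
         - (long long)other_bytes;
}
```

Under those assumed sizes, `stack_headroom(16384, 4096, 2, 1024)` gives 7168, while four such buffers across the call chain give `stack_headroom(16384, 4096, 4, 1024)`, which is -1024: the stack is exhausted before the deepest frame has finished its prologue.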

Comment 5 Fedora Update System 2016-04-29 02:53:38 UTC

pcp-3.11.2-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-394320f755

Comment 6 Fedora Update System 2016-04-29 17:22:01 UTC

pcp-3.11.2-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-bad5995fe9

Comment 7 Fedora Update System 2016-04-30 01:50:11 UTC

pcp-3.11.2-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-394320f755

Comment 8 Fedora Update System 2016-04-30 02:23:09 UTC

pcp-3.11.2-1.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-f8f919a355

Comment 9 Fedora Update System 2016-04-30 02:23:47 UTC

pcp-3.11.2-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-53282a0c5a

Comment 10 Fedora Update System 2016-05-09 00:04:50 UTC

pcp-3.11.2-2.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 11 Fedora Update System 2016-05-10 17:53:12 UTC

pcp-3.11.2-1.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2016-05-10 17:59:44 UTC

pcp-3.11.2-2.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 13 Fedora Update System 2016-06-18 06:17:49 UTC

pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-4745f3e292

Comment 14 Fedora Update System 2016-07-09 20:19:15 UTC

pcp-3.11.3-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.