Bug 506298

Summary:

Crash in Poller::wait()

Product:

Red Hat Enterprise MRG

Reporter:

Gordon Sim <gsim>

Component:

qpid-cpp

Assignee:

Andrew Stitcher <astitcher>

Status:

CLOSED ERRATA

QA Contact:

Jan Sarenik <jsarenik>

Severity:

urgent

Docs Contact:

Priority:

urgent

Version:

1.1.1

CC:

freznice, jsarenik

Target Milestone:

1.1.6

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2009-07-14 17:31:54 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
Combination of fixes that solve this issue	none
reproducer lines	none

Description Gordon Sim 2009-06-16 16:14:48 UTC

Description of problem:

Core was generated by `/usr/sbin/qpidd --daemon --port=5672 --data-dir=/app/qgw_tt2/common/data/1 --pi'.
Program terminated with signal 11, Segmentation fault.
[New process 29927]
[New process 29928]
[New process 29926]
[New process 29924]
[New process 29923]
[New process 29922]
[New process 29921]
#0  0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x00000036ad6c3920 in qpid::sys::Mutex::lock () from /usr/lib64/libqpidbroker.so.0
#2  0x00000036ad1768e2 in qpid::sys::Poller::wait () from /usr/lib64/libqpidcommon.so.0
#3  0x00000036ad177667 in qpid::sys::Poller::run () from /usr/lib64/libqpidcommon.so.0
#4  0x00000036ad16e64a in ?? () from /usr/lib64/libqpidcommon.so.0
#5  0x0000003c1a006367 in start_thread () from /lib64/libpthread.so.0
#6  0x00000030364d30ad in clone () from /lib64/libc.so.6
(gdb) info threads
  7 process 29921  0x0000003c1a00b5b5 in pthread_sigmask () from /lib64/libpthread.so.0
  6 process 29922  0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 process 29923  0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 process 29924  0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 process 29926  0x00000036ad18f4ab in qpid::SessionId::operator< () from /usr/lib64/libqpidcommon.so.0
  2 process 29928  0x00002aef4c7d2850 in qpid::cluster::EventHeader::EventHeader () from /usr/lib64/qpid/daemon/cluster.so
* 1 process 29927  0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0



Version-Release number of selected component (if applicable):

1.1.2 (qpidd-0.5.752581-17.el5)

How reproducible:

Seems quite frequent.

Steps to Reproduce:

As yet unknown

Comment 1 Andrew Stitcher 2009-06-17 19:24:09 UTC

This appears to be the broker side analog of BZ505231

There is a broker race that can crash the broker if the client heartbeat timeout timer fires at the same time as the client disconnects.

Reproducer:

Run the server thusly on one terminal:

src/qpidd --auth no --port 21022 --no-data-dir --worker-threads 1 & while true; do sleep 1; kill -STOP %%; sleep 2; kill -CONT %%; done

[This runs a broker and stops and continues it, this will make clients disconnect due to heartbeat loss, and start the broker again as the disconnect is going on]

On another terminal run:

while true; do src/tests/perftest --port 21022 --heartbeat 1 & sleep 2 ; kill -STOP %% ; sleep 4 ; kill -CONT %%; done

[This will simultaneously run perftest with heartbeats in a loop, stopping it long enough that the broker will disconnect it due to heartbeat failures]

This test failed within a few minutes in my testing

Comment 2 Andrew Stitcher 2009-06-17 19:25:23 UTC

*** Bug 506102 has been marked as a duplicate of this bug. ***

Comment 3 Andrew Stitcher 2009-06-17 19:25:56 UTC

*** Bug 506498 has been marked as a duplicate of this bug. ***

Comment 4 Gordon Sim 2009-06-22 15:17:41 UTC

Fixed in -19

Comment 5 Andrew Stitcher 2009-06-23 02:54:28 UTC

Created attachment 349030 [details]
Combination of fixes that solve this issue

Comment 6 Jan Sarenik 2009-07-03 13:01:33 UTC

Reproduced on RHEL5, qpid build -17. It took no more than
a minute using the 'while' loops to get a segfault.

No signs of segfault even after 20 minutes of the same
loops running on RHEL5, 0.5.752581-22, i386 and x86_64;
and RHEL4, 0.5.752581-21, i386 and x86_64.

Comment 7 Jan Sarenik 2009-07-03 13:18:26 UTC

Created attachment 350431 [details]
reproducer lines

nothing more than what is already written above

Comment 9 errata-xmlrpc 2009-07-14 17:31:54 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1153.html