Bug 506298 - Crash in Poller::wait()
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
: 1.1.6
Assigned To: Andrew Stitcher
QA Contact: Jan Sarenik
Duplicates: 506102 506498
Reported: 2009-06-16 12:14 EDT by Gordon Sim
Modified: 2009-07-14 13:31 EDT
Doc Type: Bug Fix
Last Closed: 2009-07-14 13:31:54 EDT

Attachments
Combination of fixes that solve this issue (8.25 KB, patch)
2009-06-22 22:54 EDT, Andrew Stitcher
reproducer lines (266 bytes, application/x-sh)
2009-07-03 09:18 EDT, Jan Sarenik

Description Gordon Sim 2009-06-16 12:14:48 EDT
Description of problem:

Core was generated by `/usr/sbin/qpidd --daemon --port=5672 --data-dir=/app/qgw_tt2/common/data/1 --pi'.
Program terminated with signal 11, Segmentation fault.
[New process 29927]
[New process 29928]
[New process 29926]
[New process 29924]
[New process 29923]
[New process 29922]
[New process 29921]
#0  0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x00000036ad6c3920 in qpid::sys::Mutex::lock () from /usr/lib64/libqpidbroker.so.0
#2  0x00000036ad1768e2 in qpid::sys::Poller::wait () from /usr/lib64/libqpidcommon.so.0
#3  0x00000036ad177667 in qpid::sys::Poller::run () from /usr/lib64/libqpidcommon.so.0
#4  0x00000036ad16e64a in ?? () from /usr/lib64/libqpidcommon.so.0
#5  0x0000003c1a006367 in start_thread () from /lib64/libpthread.so.0
#6  0x00000030364d30ad in clone () from /lib64/libc.so.6
(gdb) info threads
  7 process 29921  0x0000003c1a00b5b5 in pthread_sigmask () from /lib64/libpthread.so.0
  6 process 29922  0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 process 29923  0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 process 29924  0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 process 29926  0x00000036ad18f4ab in qpid::SessionId::operator< () from /usr/lib64/libqpidcommon.so.0
  2 process 29928  0x00002aef4c7d2850 in qpid::cluster::EventHeader::EventHeader () from /usr/lib64/qpid/daemon/cluster.so
* 1 process 29927  0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0

Version-Release number of selected component (if applicable):

1.1.2 (qpidd-0.5.752581-17.el5)

How reproducible:

Seems quite frequent.

Steps to Reproduce:

As yet unknown
Comment 1 Andrew Stitcher 2009-06-17 15:24:09 EDT
This appears to be the broker-side analog of BZ 505231.

There is a broker race that can crash the broker if the client heartbeat timeout timer fires at the same time as the client disconnects.
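
For illustration only, here is a minimal C++11 sketch of one common way to make a timer callback safe against the object it targets being destroyed first. It is not taken from the qpid sources or from the fix for this bug: `Connection` and `makeTimeoutTask` are hypothetical names, and the weak-pointer guard is an assumed technique, not necessarily what the broker does.

```cpp
#include <functional>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a broker-side connection; not a qpid class.
class Connection {
public:
    // Would be called from the timer thread when heartbeats stop arriving.
    void onHeartbeatTimeout() {
        std::lock_guard<std::mutex> guard(lock_);
        timedOut_ = true;
    }

    bool timedOut() const {
        std::lock_guard<std::mutex> guard(lock_);
        return timedOut_;
    }

private:
    mutable std::mutex lock_;
    bool timedOut_ = false;
};

// Builds a timer callback that holds only a weak reference. If the
// connection was already destroyed by a disconnect, the callback does
// nothing instead of locking a mutex inside freed memory.
std::function<bool()> makeTimeoutTask(const std::shared_ptr<Connection>& conn) {
    std::weak_ptr<Connection> weak(conn);
    return [weak]() -> bool {
        if (std::shared_ptr<Connection> strong = weak.lock()) {
            strong->onHeartbeatTimeout();  // object kept alive for this call
            return true;
        }
        return false;  // connection already gone; nothing to touch
    };
}
```

The callback returns `true` when it reached a live connection and `false` when the connection had already been released, so a timer firing after the disconnect becomes a no-op rather than a lock on a destroyed mutex.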


Run the broker as follows in one terminal:

src/qpidd --auth no --port 21022 --no-data-dir --worker-threads 1 & while true; do sleep 1; kill -STOP %%; sleep 2; kill -CONT %%; done

[This runs a broker and repeatedly stops and continues it. Stopping the broker makes clients disconnect due to heartbeat loss, and the broker is resumed while the disconnect is still in progress.]

On another terminal run:

while true; do src/tests/perftest --port 21022 --heartbeat 1 & sleep 2 ; kill -STOP %% ; sleep 4 ; kill -CONT %%; done

[This will simultaneously run perftest with heartbeats in a loop, stopping it long enough that the broker will disconnect it due to heartbeat failures]

This test crashed the broker within a few minutes in my testing.
Comment 2 Andrew Stitcher 2009-06-17 15:25:23 EDT
*** Bug 506102 has been marked as a duplicate of this bug. ***
Comment 3 Andrew Stitcher 2009-06-17 15:25:56 EDT
*** Bug 506498 has been marked as a duplicate of this bug. ***
Comment 4 Gordon Sim 2009-06-22 11:17:41 EDT
Fixed in -19
Comment 5 Andrew Stitcher 2009-06-22 22:54:28 EDT
Created attachment 349030 [details]
Combination of fixes that solve this issue
Comment 6 Jan Sarenik 2009-07-03 09:01:33 EDT
Reproduced on RHEL5, qpid build -17. It took no more than
a minute using the 'while' loops to get a segfault.

No signs of segfault even after 20 minutes of the same
loops running on RHEL5, 0.5.752581-22, i386 and x86_64;
and RHEL4, 0.5.752581-21, i386 and x86_64.
Comment 7 Jan Sarenik 2009-07-03 09:18:26 EDT
Created attachment 350431 [details]
reproducer lines

nothing more than what is already written above
Comment 9 errata-xmlrpc 2009-07-14 13:31:54 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

