Bug 506298 - Crash in Poller::wait()
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1.1
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
: 1.1.6
: ---
Assigned To: Andrew Stitcher
QA Contact: Jan Sarenik
Duplicates: 506102 506498
Reported: 2009-06-16 12:14 EDT by Gordon Sim
Modified: 2009-07-14 13:31 EDT
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2009-07-14 13:31:54 EDT


Attachments
Combination of fixes that solve this issue (8.25 KB, patch)
2009-06-22 22:54 EDT, Andrew Stitcher
reproducer lines (266 bytes, application/x-sh)
2009-07-03 09:18 EDT, Jan Sarenik

Description Gordon Sim 2009-06-16 12:14:48 EDT
Description of problem:

Core was generated by `/usr/sbin/qpidd --daemon --port=5672 --data-dir=/app/qgw_tt2/common/data/1 --pi'.
Program terminated with signal 11, Segmentation fault.
[New process 29927]
[New process 29928]
[New process 29926]
[New process 29924]
[New process 29923]
[New process 29922]
[New process 29921]
#0  0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x00000036ad6c3920 in qpid::sys::Mutex::lock () from /usr/lib64/libqpidbroker.so.0
#2  0x00000036ad1768e2 in qpid::sys::Poller::wait () from /usr/lib64/libqpidcommon.so.0
#3  0x00000036ad177667 in qpid::sys::Poller::run () from /usr/lib64/libqpidcommon.so.0
#4  0x00000036ad16e64a in ?? () from /usr/lib64/libqpidcommon.so.0
#5  0x0000003c1a006367 in start_thread () from /lib64/libpthread.so.0
#6  0x00000030364d30ad in clone () from /lib64/libc.so.6
(gdb) info threads
  7 process 29921  0x0000003c1a00b5b5 in pthread_sigmask () from /lib64/libpthread.so.0
  6 process 29922  0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 process 29923  0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 process 29924  0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 process 29926  0x00000036ad18f4ab in qpid::SessionId::operator< () from /usr/lib64/libqpidcommon.so.0
  2 process 29928  0x00002aef4c7d2850 in qpid::cluster::EventHeader::EventHeader () from /usr/lib64/qpid/daemon/cluster.so
* 1 process 29927  0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0



Version-Release number of selected component (if applicable):

1.1.2 (qpidd-0.5.752581-17.el5)

How reproducible:

Seems quite frequent.

Steps to Reproduce:

As yet unknown
Comment 1 Andrew Stitcher 2009-06-17 15:24:09 EDT
This appears to be the broker-side analog of BZ505231.

There is a race in the broker that can crash it if a client's heartbeat timeout timer fires at the same time as that client disconnects.

Reproducer:

Run the broker in one terminal:

src/qpidd --auth no --port 21022 --no-data-dir --worker-threads 1 & while true; do sleep 1; kill -STOP %%; sleep 2; kill -CONT %%; done

[This runs a broker and repeatedly stops and continues it. While the broker is stopped, clients disconnect due to heartbeat loss; continuing it lets the broker run again while those disconnects are still in progress.]
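The broker-side loop above uses the `%%` job spec, which bash resolves to the shell's current job; that works interactively because the broker is the only job. When the loop is turned into a script, recording the PID with `$!` is more explicit. A minimal sketch, assuming bash; the function name `stop_cont_loop` and its arguments are hypothetical (not part of qpid), and the timings (run 1 s, stop 2 s) mirror the one-liner.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of qpid): repeatedly freeze and resume a process.
# Arguments: PID, seconds to let it run, seconds to keep it stopped, number of cycles.
stop_cont_loop() {
    local pid=$1 run_secs=$2 stop_secs=$3 cycles=$4 i
    for ((i = 0; i < cycles; i++)); do
        sleep "$run_secs"
        # Freeze the process; give up if the signal cannot be delivered.
        kill -STOP "$pid" 2>/dev/null || return 1
        sleep "$stop_secs"
        # Resume it; clients that missed heartbeats may be disconnecting now.
        kill -CONT "$pid" 2>/dev/null || return 1
    done
}

# Example, assuming a qpid build tree as in the comment above:
#   src/qpidd --auth no --port 21022 --no-data-dir --worker-threads 1 &
#   broker_pid=$!
#   stop_cont_loop "$broker_pid" 1 2 600
```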

In another terminal, run:

while true; do src/tests/perftest --port 21022 --heartbeat 1 & sleep 2 ; kill -STOP %% ; sleep 4 ; kill -CONT %%; done

[While the broker loop is running, this repeatedly starts perftest with heartbeats enabled and stops each instance long enough that the broker disconnects it for missed heartbeats.]

In my testing, this reproducer failed within a few minutes.
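Written out with comments, the client-side loop from the reproducer starts a fresh perftest each cycle, lets it run for 2 s, freezes it for 4 s so that it misses heartbeats, then resumes it. A minimal sketch, assuming bash; `client_cycle` and the global `client_pid` are hypothetical names, and the perftest options are copied from the comment above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of qpid): start one client, let it run,
# then freeze it long enough to miss heartbeats before resuming it.
# Usage: client_cycle RUN_SECS STOP_SECS CMD [ARGS...]
client_cycle() {
    local run_secs=$1 stop_secs=$2
    shift 2
    "$@" &                    # start the client in the background
    client_pid=$!             # global, so the caller can inspect or reap it
    sleep "$run_secs"
    # If the client has already exited there is nothing to freeze.
    kill -STOP "$client_pid" 2>/dev/null || return 0
    sleep "$stop_secs"        # the broker should time out the connection here
    kill -CONT "$client_pid" 2>/dev/null
}

# Example, mirroring the client loop above:
#   while true; do
#       client_cycle 2 4 src/tests/perftest --port 21022 --heartbeat 1
#   done
```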
Comment 2 Andrew Stitcher 2009-06-17 15:25:23 EDT
*** Bug 506102 has been marked as a duplicate of this bug. ***
Comment 3 Andrew Stitcher 2009-06-17 15:25:56 EDT
*** Bug 506498 has been marked as a duplicate of this bug. ***
Comment 4 Gordon Sim 2009-06-22 11:17:41 EDT
Fixed in -19
Comment 5 Andrew Stitcher 2009-06-22 22:54:28 EDT
Created attachment 349030 [details]
Combination of fixes that solve this issue
Comment 6 Jan Sarenik 2009-07-03 09:01:33 EDT
Reproduced on RHEL5 with qpid build -17; the 'while' loops
produced a segfault in under a minute.

No sign of a segfault even after 20 minutes of running the
same loops on RHEL5 (0.5.752581-22, i386 and x86_64) and
RHEL4 (0.5.752581-21, i386 and x86_64).
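Verifying the fix amounts to running the reproducer for a fixed period and confirming the broker never died. One way to script that check, assuming bash, is to poll the broker's PID once a second with `kill -0`, which sends no signal and only tests whether the process exists. The function name `survives_for` is hypothetical; 1200 s corresponds to the 20-minute runs described above.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of qpid): succeed only if PID is still
# alive after SECS seconds. A stopped (SIGSTOP) process still exists,
# so the reproducer's pause/resume cycles do not cause a false failure.
survives_for() {
    local pid=$1 secs=$2 t
    for ((t = 0; t < secs; t++)); do
        kill -0 "$pid" 2>/dev/null || return 1   # process is gone
        sleep 1
    done
    kill -0 "$pid" 2>/dev/null
}

# Example:
#   if survives_for "$broker_pid" 1200; then echo "broker survived"; else echo "broker died"; fi
```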
Comment 7 Jan Sarenik 2009-07-03 09:18:26 EDT
Created attachment 350431 [details]
reproducer lines

Nothing more than what is already written above.
Comment 9 errata-xmlrpc 2009-07-14 13:31:54 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1153.html
