Bug 506298 - Crash in Poller::wait()
Summary: Crash in Poller::wait()
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1.1
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.1.6
Target Release: ---
Assignee: Andrew Stitcher
QA Contact: Jan Sarenik
URL:
Whiteboard:
Duplicates: 506102 506498
Depends On:
Blocks:
 
Reported: 2009-06-16 16:14 UTC by Gordon Sim
Modified: 2009-07-14 17:31 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-07-14 17:31:54 UTC
Target Upstream Version:
Embargoed:


Attachments
Combination of fixes that solve this issue (8.25 KB, patch)
2009-06-23 02:54 UTC, Andrew Stitcher
reproducer lines (266 bytes, application/x-sh)
2009-07-03 13:18 UTC, Jan Sarenik


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2009:1153 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging bug fixing update 2009-07-14 17:31:48 UTC

Description Gordon Sim 2009-06-16 16:14:48 UTC
Description of problem:

Core was generated by `/usr/sbin/qpidd --daemon --port=5672 --data-dir=/app/qgw_tt2/common/data/1 --pi'.
Program terminated with signal 11, Segmentation fault.
[New process 29927]
[New process 29928]
[New process 29926]
[New process 29924]
[New process 29923]
[New process 29922]
[New process 29921]
#0  0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x00000036ad6c3920 in qpid::sys::Mutex::lock () from /usr/lib64/libqpidbroker.so.0
#2  0x00000036ad1768e2 in qpid::sys::Poller::wait () from /usr/lib64/libqpidcommon.so.0
#3  0x00000036ad177667 in qpid::sys::Poller::run () from /usr/lib64/libqpidcommon.so.0
#4  0x00000036ad16e64a in ?? () from /usr/lib64/libqpidcommon.so.0
#5  0x0000003c1a006367 in start_thread () from /lib64/libpthread.so.0
#6  0x00000030364d30ad in clone () from /lib64/libc.so.6
(gdb) info threads
  7 process 29921  0x0000003c1a00b5b5 in pthread_sigmask () from /lib64/libpthread.so.0
  6 process 29922  0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 process 29923  0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 process 29924  0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 process 29926  0x00000036ad18f4ab in qpid::SessionId::operator< () from /usr/lib64/libqpidcommon.so.0
  2 process 29928  0x00002aef4c7d2850 in qpid::cluster::EventHeader::EventHeader () from /usr/lib64/qpid/daemon/cluster.so
* 1 process 29927  0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0



Version-Release number of selected component (if applicable):

1.1.2 (qpidd-0.5.752581-17.el5)

How reproducible:

Seems quite frequent.

Steps to Reproduce:

As yet unknown

Comment 1 Andrew Stitcher 2009-06-17 19:24:09 UTC
This appears to be the broker-side analog of BZ505231.

There is a broker race that can crash the broker if the client heartbeat timeout timer fires at the same time as the client disconnects.

Reproducer:

Run the server thusly on one terminal:

src/qpidd --auth no --port 21022 --no-data-dir --worker-threads 1 & while true; do sleep 1; kill -STOP %%; sleep 2; kill -CONT %%; done

[This runs a broker and repeatedly stops and continues it. Stopping the broker makes clients disconnect due to heartbeat loss, and the broker is resumed while the disconnect is in progress.]

On another terminal run:

while true; do src/tests/perftest --port 21022 --heartbeat 1 & sleep 2 ; kill -STOP %% ; sleep 4 ; kill -CONT %%; done

[This runs perftest with heartbeats in a loop at the same time, stopping it each time for long enough that the broker disconnects it due to missed heartbeats.]

This test failed within a few minutes in my testing.
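
The broker loop above keeps running after the broker has died, so a crash is easy to miss. The following is a minimal sketch of a wrapper that stops as soon as the broker exits and reports whether it was killed by a signal (a segfault is signal 11, which a shell reports as status 139). It assumes bash; the function names are hypothetical and not part of qpid, and the qpidd flags are copied unchanged from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: the function names below are hypothetical helpers.

# Print the signal number encoded in a shell exit status, or 0 if the
# status is 128 or less. By shell convention a process killed by signal N
# is reported as 128 + N (a process can also exit with such a code itself).
signal_from_status() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        echo $((status - 128))
    else
        echo 0
    fi
}

# Start the broker with the flags from the reproducer, stop and continue
# it from a background loop, and report how the broker exited.
run_broker_until_exit() {
    src/qpidd --auth no --port 21022 --no-data-dir --worker-threads 1 &
    local broker=$!
    # Same timing as the original loop; the loop ends once kill fails,
    # which happens after the broker has exited and been waited for.
    ( while true; do
          sleep 1; kill -STOP "$broker" || exit
          sleep 2; kill -CONT "$broker" || exit
      done ) 2>/dev/null &
    local toggler=$!
    wait "$broker"
    local status=$?
    kill "$toggler" 2>/dev/null
    echo "broker exit status $status, signal $(signal_from_status "$status")"
}
```

Calling run_broker_until_exit from the qpid build directory in one terminal, with the perftest loop running in another, should print the broker's exit status once the broker goes away. This relies on wait not returning while the broker is merely stopped, which is the expected behavior for a non-interactive bash script but has not been verified against this broker.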

Comment 2 Andrew Stitcher 2009-06-17 19:25:23 UTC
*** Bug 506102 has been marked as a duplicate of this bug. ***

Comment 3 Andrew Stitcher 2009-06-17 19:25:56 UTC
*** Bug 506498 has been marked as a duplicate of this bug. ***

Comment 4 Gordon Sim 2009-06-22 15:17:41 UTC
Fixed in -19

Comment 5 Andrew Stitcher 2009-06-23 02:54:28 UTC
Created attachment 349030
Combination of fixes that solve this issue

Comment 6 Jan Sarenik 2009-07-03 13:01:33 UTC
Reproduced on RHEL5, qpid build -17. It took no more than
a minute using the 'while' loops to get a segfault.

No signs of segfault even after 20 minutes of the same
loops running on RHEL5, 0.5.752581-22, i386 and x86_64;
and RHEL4, 0.5.752581-21, i386 and x86_64.

Comment 7 Jan Sarenik 2009-07-03 13:18:26 UTC
Created attachment 350431
reproducer lines

Nothing more than what is already written above.

Comment 9 errata-xmlrpc 2009-07-14 17:31:54 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1153.html

