Description of problem:

Core was generated by `/usr/sbin/qpidd --daemon --port=5672 --data-dir=/app/qgw_tt2/common/data/1 --pi'.
Program terminated with signal 11, Segmentation fault.
[New process 29927]
[New process 29928]
[New process 29926]
[New process 29924]
[New process 29923]
[New process 29922]
[New process 29921]
#0 0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) bt
#0 0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1 0x00000036ad6c3920 in qpid::sys::Mutex::lock () from /usr/lib64/libqpidbroker.so.0
#2 0x00000036ad1768e2 in qpid::sys::Poller::wait () from /usr/lib64/libqpidcommon.so.0
#3 0x00000036ad177667 in qpid::sys::Poller::run () from /usr/lib64/libqpidcommon.so.0
#4 0x00000036ad16e64a in ?? () from /usr/lib64/libqpidcommon.so.0
#5 0x0000003c1a006367 in start_thread () from /lib64/libpthread.so.0
#6 0x00000030364d30ad in clone () from /lib64/libc.so.6
(gdb) info threads
  7 process 29921 0x0000003c1a00b5b5 in pthread_sigmask () from /lib64/libpthread.so.0
  6 process 29922 0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 process 29923 0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 process 29924 0x0000003c1a00ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 process 29926 0x00000036ad18f4ab in qpid::SessionId::operator< () from /usr/lib64/libqpidcommon.so.0
  2 process 29928 0x00002aef4c7d2850 in qpid::cluster::EventHeader::EventHeader () from /usr/lib64/qpid/daemon/cluster.so
* 1 process 29927 0x0000003c1a008293 in pthread_mutex_lock () from /lib64/libpthread.so.0

Version-Release number of selected component (if applicable):
1.1.2 (qpidd-0.5.752581-17.el5)

How reproducible:
Seems quite frequent.

Steps to Reproduce:
As yet unknown.
This appears to be the broker-side analog of BZ505231.

There is a race in the broker that can crash it if a client's heartbeat timeout timer fires at the same time as that client disconnects.

Reproducer:

In one terminal, run the broker:

src/qpidd --auth no --port 21022 --no-data-dir --worker-threads 1 &
while true; do sleep 1; kill -STOP %%; sleep 2; kill -CONT %%; done

[This runs a broker and repeatedly stops and continues it. While the broker is stopped, clients disconnect due to heartbeat loss, and the broker is resumed while those disconnects are still in progress.]

In another terminal, run:

while true; do src/tests/perftest --port 21022 --heartbeat 1 & sleep 2 ; kill -STOP %% ; sleep 4 ; kill -CONT %%; done

[This runs perftest with heartbeats in a loop, concurrently with the broker loop, stopping each perftest instance long enough that the broker disconnects it due to heartbeat failures.]

This test failed within a few minutes in my testing.
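The race above can be illustrated with a small, self-contained C++ sketch. It is not the qpid code and not the attached fix; `Connection`, `HeartbeatTask`, and `raceOnce` are hypothetical names. It assumes a common pattern for this kind of bug: the timer task holds only a `std::weak_ptr` to the connection, and both the disconnect path and the timeout path take the same lock and check a state flag, so whichever path runs second does nothing instead of touching a torn-down connection.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a broker-side client connection (not qpid code).
class Connection {
public:
    // I/O thread: the client has disconnected.
    void closed() {
        std::lock_guard<std::mutex> guard(lock_);
        open_ = false;
    }

    // Timer thread: no heartbeat arrived in time. Returns true only if this
    // call performed the abort; false if the disconnect got there first.
    bool heartbeatTimeout() {
        std::lock_guard<std::mutex> guard(lock_);
        if (!open_) return false;
        open_ = false;  // abort the connection exactly once
        return true;
    }

private:
    std::mutex lock_;
    bool open_ = true;
};

// Hypothetical timer task. It holds only a weak reference, so it neither
// keeps a torn-down connection alive nor dereferences a destroyed one.
class HeartbeatTask {
public:
    explicit HeartbeatTask(std::weak_ptr<Connection> conn) : conn_(std::move(conn)) {}

    // Returns true if the timeout was delivered to a still-open connection.
    bool fire() {
        std::shared_ptr<Connection> c = conn_.lock();  // keeps the object alive while in use
        if (!c) return false;                          // connection already destroyed
        return c->heartbeatTimeout();
    }

private:
    std::weak_ptr<Connection> conn_;
};

// Races a client disconnect against a heartbeat timeout once.
// Returns how many timeouts took effect: 0 or 1, never a crash.
int raceOnce() {
    auto conn = std::make_shared<Connection>();
    HeartbeatTask task{conn};
    std::atomic<int> fired{0};
    std::thread timer([&] { if (task.fire()) ++fired; });
    std::thread io([&] { conn->closed(); conn.reset(); });
    timer.join();
    io.join();
    return fired.load();
}
```

Calling `raceOnce()` in a loop loosely mirrors the shell reproducer: whichever thread takes the lock second sees the connection already closed and returns early. If the timer instead kept a raw pointer, it could run after `conn.reset()` had freed the object, which is a use-after-free that can crash the process.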
*** Bug 506102 has been marked as a duplicate of this bug. ***
*** Bug 506498 has been marked as a duplicate of this bug. ***
Fixed in -19
Created attachment 349030: Combination of fixes that solve this issue
Reproduced on RHEL5, qpid build -17. Using the 'while' loops, it took no more than a minute to get a segfault. No segfault was observed after 20 minutes of running the same loops on RHEL5, 0.5.752581-22, i386 and x86_64, or on RHEL4, 0.5.752581-21, i386 and x86_64.
Created attachment 350431: reproducer lines (nothing more than what is already written above)
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1153.html