Description of problem:
A clustered broker crash was observed while retesting bug 510475 with the original bz506758 reproducer. The analyzed core dump shows the following backtrace:

Thread 1 (Thread 9587):
#0  0x0000003f112819b0 in vtable for qpid::sys::OutputTask () from /usr/lib64/libqpidbroker.so.2
#1  0x0000003f109f0d49 in qpid::sys::AsynchIOHandler::disconnect (this=0x2aaab0276a10) at qpid/sys/AsynchIOHandler.cpp:194
#2  0x0000003f109f1049 in qpid::sys::AsynchIOHandler::eof (this=0x2aaaaca240a0, a=...) at qpid/sys/AsynchIOHandler.cpp:177
#3  0x0000003f1092118f in boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> >::operator() (this=0x2aaaac588650, a0=...) at /usr/include/boost/function/function_template.hpp:576
#4  0x0000003f10920a93 in operator()<boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> > >, boost::_bi::list1<qpid::sys::DispatchHandle&> > (function_obj_ptr=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:149
#5  operator()<qpid::sys::DispatchHandle> (function_obj_ptr=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/bind/bind_template.hpp:32
...

This crash was detected just once in about one day of testing on virtualized RHEL 5.5 x86_64.

Version-Release number of selected component (if applicable):
python-qmf-0.7.946106-3.el5
python-qpid-0.7.946106-1.el5
qmf-0.7.946106-3.el5
qmf-devel-0.7.946106-3.el5
qpid-cpp-client-0.7.946106-3.el5
qpid-cpp-client-devel-0.7.946106-3.el5
qpid-cpp-client-devel-docs-0.7.946106-3.el5
qpid-cpp-client-rdma-0.7.946106-3.el5
qpid-cpp-client-ssl-0.7.946106-3.el5
qpid-cpp-mrg-debuginfo-0.7.946106-3.el5
qpid-cpp-server-0.7.946106-3.el5
qpid-cpp-server-cluster-0.7.946106-3.el5
qpid-cpp-server-devel-0.7.946106-3.el5
qpid-cpp-server-rdma-0.7.946106-3.el5
qpid-cpp-server-ssl-0.7.946106-3.el5
qpid-cpp-server-store-0.7.946106-3.el5
qpid-cpp-server-xml-0.7.946106-3.el5
qpid-java-client-0.7.946106-3.el5
qpid-java-common-0.7.946106-3.el5
qpid-tools-0.7.946106-4.el5
rh-qpid-cpp-tests-0.7.946106-3.el5
ruby-qmf-0.7.946106-3.el5

How reproducible:
Quite hard.

Steps to Reproduce:
1. Follow the steps from bug 510475.

In more detail, run the attached reproducer:
- run ./run.sh $((4+${RANDOM}%2)) $((8+${RANDOM}%6)) in the background
- while it is executing, check for core files; if one is found, exit
- once the reproducer has been executing for 10 minutes, kill the process and restart

Actual results:
The qpidd broker crashes.

Expected results:
The qpidd broker should not crash.

Additional info:
[root@dhcp-30-90 bz506758_ori]# cat dump_core.9579
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5_5.1)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/qpidd...Reading symbols from /usr/lib/debug/usr/sbin/qpidd.debug...
warning: section .gnu.liblist not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .dynbss not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .gnu.conflict not found in /usr/lib/debug/usr/sbin/qpidd.debug
done.
done.
[New Thread 9597]
[New Thread 9586]
[New Thread 9584]
[New Thread 9583]
[New Thread 9579]
warning: section .gnu.liblist not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .dynbss not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .gnu.conflict not found in /usr/lib/debug/usr/sbin/qpidd.debug
Reading symbols from /usr/lib64/libqpidbroker.so.2...Reading symbols from /usr/lib/debug/usr/lib64/libqpidbroker.so.2.0.0.debug...done.
done.
...
warning: section .gnu.liblist not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .dynbss not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .gnu.conflict not found in /usr/lib/debug/usr/sbin/qpidd.debug
Core was generated by `qpidd -p 5672 --auth no --log-enable info+ --cluster-name dhcp-30-90.brq.redhat'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003f112819b0 in vtable for qpid::sys::OutputTask () from /usr/lib64/libqpidbroker.so.2
(gdb)
rax            0x2aaaac588650   46912524289616
rbx            0x2aaab0276a10   46912588179984
rcx            0x3f109f1040     270861799488
rdx            0x2aaab0276a10   46912588179984
rsi            0x2aaab00a43a0   46912586269600
rdi            0x2aaaaca240a0   46912529121440
rbp            0x2aaab80bd5d0   0x2aaab80bd5d0
rsp            0x4303b188       0x4303b188
r8             0x1              1
r9             0x2573           9587
r10            0x0              0
r11            0x3f109f1040     270861799488
r12            0x3f1091e530     270860936496
r13            0x3f1101eb10     270868278032
r14            0x3f1101e960     270868277600
r15            0x2aaab0276a10   46912588179984
rip            0x3f112819b0     0x3f112819b0 <vtable for qpid::sys::OutputTask+16>
eflags         0x10206          [ PF IF RF ]
cs             0x33             51
ss             0x2b             43
ds             0x0              0
es             0x0              0
fs             0x63             99
gs             0x0              0
st0            0                (raw 0x00000000000000000000)
st1            0                (raw 0x00000000000000000000)
st2            0                (raw 0x00000000000000000000)
st3            0                (raw 0x00000000000000000000)
st4            0                (raw 0x00000000000000000000)
st5            0                (raw 0x00000000000000000000)
st6            0                (raw 0x00000000000000000000)
st7            0                (raw 0x00000000000000000000)
fctrl          0x37f            895
fstat          0x0              0
ftag           0xffff           65535
fiseg          0x0              0
fioff          0x0              0
foseg          0x0              0
fooff          0x0              0
fop            0x0              0
(gdb)
Using memory regions provided by the target.
There are no memory regions defined.
(gdb)
From                To                  Syms Read  Shared Object Library
0x0000003f10ea11b0  0x0000003f110203d8  Yes (*)    /usr/lib64/libqpidbroker.so.2
0x0000003f1090e980  0x0000003f10a0fea8  Yes (*)    /usr/lib64/libqpidcommon.so.2
0x0000003456610aa0  0x000000345662dae8  Yes (*)    /usr/lib64/libboost_program_options.so.2
0x0000003457004810  0x000000345700cff8  Yes (*)    /usr/lib64/libboost_filesystem.so.2
0x00000035d1601500  0x00000035d1602918  Yes (*)    /lib64/libuuid.so.1
0x00000035cca00e10  0x00000035cca01a08  Yes (*)    /lib64/libdl.so.2
0x00000035cda02220  0x00000035cda05cc8  Yes (*)    /lib64/librt.so.1
0x00000035cfa046e0  0x00000035cfa13be8  Yes (*)    /usr/lib64/libsasl2.so.2
0x00000035cf64f430  0x00000035cf6c3058  Yes (*)    /usr/lib64/libstdc++.so.6
0x00000035cce03e60  0x00000035cce43e38  Yes (*)    /lib64/libm.so.6
0x00000035cf201e50  0x00000035cf20b018  Yes (*)    /lib64/libgcc_s.so.1
0x00000035cc61d780  0x00000035cc709ff8  Yes (*)    /lib64/libc.so.6
0x00000035cc200a70  0x00000035cc21671e  Yes (*)    /lib64/ld-linux-x86-64.so.2
0x00000035cd2051f0  0x00000035cd210258  Yes (*)    /lib64/libpthread.so.0
0x00000035d06032a0  0x00000035d060e2d8  Yes (*)    /lib64/libresolv.so.2
0x00000035ce6009f0  0x00000035ce606918  Yes (*)    /lib64/libcrypt.so.1
0x00002b327a16bb70  0x00002b327a171738  Yes (*)    /usr/lib64/qpid/daemon/replicating_listener.so
0x00002b327a382f10  0x00002b327a3a34e8  Yes (*)    /usr/lib64/qpid/daemon/acl.so
0x00002b327a5e4fa0  0x00002b327a672da8  Yes (*)    /usr/lib64/qpid/daemon/msgstore.so
0x00002b327a8d65d0  0x00002b327a98f288  Yes (*)    /usr/lib64/libdb_cxx-4.3.so
0x00002b327abba510  0x00002b327abba6d1  Yes (*)    /usr/lib64/libaio.so.1
0x00002b327adc0610  0x00002b327adc4bd8  Yes (*)    /usr/lib64/qpid/daemon/replication_exchange.so
0x00002b327afcd5e0  0x00002b327afd0b68  Yes (*)    /usr/lib64/qpid/daemon/watchdog.so
0x00002b327b1dc570  0x00002b327b1e4728  Yes (*)    /usr/lib64/qpid/daemon/ssl.so
0x00002b327b405640  0x00002b327b41c0a8  Yes (*)    /usr/lib64/libsslcommon.so.2
0x0000003000e183b0  0x0000003000ef6f48  Yes (*)    /usr/lib64/libnss3.so
0x0000003001a085e0  0x0000003001a2b638  Yes (*)    /usr/lib64/libssl3.so
0x00000035d020cf30  0x00000035d022b738  Yes (*)    /usr/lib64/libnspr4.so
0x0000003001208340  0x0000003001212c38  Yes (*)    /usr/lib64/libnssutil3.so
0x00000035cee01370  0x00000035cee02978  Yes (*)    /usr/lib64/libplc4.so
0x00002b327b623e10  0x00002b327b624c08  Yes (*)    /usr/lib64/libplds4.so
0x00000035cd601fd0  0x00000035cd60cac8  Yes (*)    /usr/lib64/libz.so.1
0x00002b327b82eb00  0x00002b327b8365a8  Yes (*)    /usr/lib64/qpid/daemon/rdma.so
0x00002b327ba55270  0x00002b327ba62928  Yes (*)    /usr/lib64/librdmawrap.so.2
0x00002b327bc6b360  0x00002b327bc71e38  Yes (*)    /usr/lib64/libibverbs.so.1
0x00002b327be76580  0x00002b327be78f88  Yes (*)    /usr/lib64/librdmacm.so.1
0x00002b327c0827b0  0x00002b327c08c578  Yes (*)    /usr/lib64/qpid/daemon/xml.so
0x000000385d373070  0x000000385d4f4758  Yes (*)    /usr/lib64/libxerces-c.so.28
0x000000385db6d090  0x000000385dcf8b28  Yes (*)    /usr/lib64/libxqilla.so.3
0x00002b327c2d1270  0x00002b327c33b758  Yes (*)    /usr/lib64/qpid/daemon/cluster.so
0x00000038444013d0  0x0000003844403338  Yes (*)    /usr/lib64/openais/libcpg.so.2
0x00002b327c572110  0x00002b327c574b78  Yes (*)    /usr/lib64/libcman.so.2
0x00002b327c7b5d60  0x00002b327c82f298  Yes (*)    /usr/lib64/libqpidclient.so.2
0x00002aaaaaefa9f0  0x00002aaaaaf072f8  Yes (*)    /usr/lib64/qpid/client/sslconnector.so
0x00002aaaab117530  0x00002aaaab125a98  Yes (*)    /usr/lib64/qpid/client/rdmaconnector.so
0x00002aaaab32bfb0  0x00002aaaab32dbc8  Yes (*)    /usr/lib64/sasl2/libanonymous.so.2
0x00002aaaab54be60  0x00002aaaab5e9388  Yes (*)    /usr/lib64/sasl2/libsasldb.so.2
0x00002aaaab809fa0  0x00002aaaab80bd08  Yes (*)    /usr/lib64/sasl2/liblogin.so.2
0x00002aaaaba0dfb0  0x00002aaaaba0fd58  Yes (*)    /usr/lib64/sasl2/libplain.so.2
(*): Shared library is missing debugging information.
(gdb)
  6 Thread 9579  0x00000035cd207b35 in pthread_join () from /lib64/libpthread.so.0
  5 Thread 9583  0x00000035cd20b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 9584  0x00000035cd20b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 9586  0x00000035cd20aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 9597  0x00000035cc6d4108 in epoll_wait () from /lib64/libc.so.6
* 1 Thread 9587  0x0000003f112819b0 in vtable for qpid::sys::OutputTask () from /usr/lib64/libqpidbroker.so.2
(gdb)
Thread 6 (Thread 9579):
#0  0x00000035cd207b35 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000003f109239dd in qpid::sys::Thread::join (this=<value optimized out>) at qpid/sys/posix/Thread.cpp:70
#2  0x0000003f10f0ea83 in qpid::broker::Broker::run (this=<value optimized out>) at qpid/broker/Broker.cpp:342
#3  0x0000000000406ae6 in QpiddBroker::execute (this=<value optimized out>, options=0x13c5fe50) at posix/QpiddBroker.cpp:176
#4  0x00000000004055af in main (argc=11, argv=0x7fffd8863648) at qpidd.cpp:80

Thread 5 (Thread 9583):
#0  0x00000035cd20b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003f109fb7bf in qpid::sys::Timer::run (this=0x13c676f0) at ../include/qpid/sys/posix/Condition.h:69
#2  0x0000003f1092348a in qpid::sys::(anonymous namespace)::runRunnable (p=0x13c67724) at qpid/sys/posix/Thread.cpp:35
#3  0x00000035cd20673d in start_thread () from /lib64/libpthread.so.0
#4  0x00000035cc6d3d1d in clone () from /lib64/libc.so.6

Thread 4 (Thread 9584):
#0  0x00000035cd20b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003f109fb7bf in qpid::sys::Timer::run (this=0x13c84f00) at ../include/qpid/sys/posix/Condition.h:69
#2  0x0000003f1092348a in qpid::sys::(anonymous namespace)::runRunnable (p=0x13c84f34) at qpid/sys/posix/Thread.cpp:35
#3  0x00000035cd20673d in start_thread () from /lib64/libpthread.so.0
#4  0x00000035cc6d3d1d in clone () from /lib64/libc.so.6

Thread 3 (Thread 9586):
#0  0x00000035cd20aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003f109fb5e3 in wait (this=0x13c88140) at ../include/qpid/sys/posix/Condition.h:63
#2  wait (this=0x13c88140) at ../include/qpid/sys/Monitor.h:41
#3  qpid::sys::Timer::run (this=0x13c88140) at qpid/sys/Timer.cpp:98
#4  0x0000003f1092348a in qpid::sys::(anonymous namespace)::runRunnable (p=0x13c88174) at qpid/sys/posix/Thread.cpp:35
#5  0x00000035cd20673d in start_thread () from /lib64/libpthread.so.0
#6  0x00000035cc6d3d1d in clone () from /lib64/libc.so.6

Thread 2 (Thread 9597):
#0  0x00000035cc6d4108 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003f1092baef in qpid::sys::Poller::wait (this=0x13c8e0e0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:570
#2  0x0000003f1092c4e7 in qpid::sys::Poller::run (this=0x13c8e0e0) at qpid/sys/epoll/EpollPoller.cpp:517
#3  0x0000003f1092348a in qpid::sys::(anonymous namespace)::runRunnable (p=0x1b) at qpid/sys/posix/Thread.cpp:35
#4  0x00000035cd20673d in start_thread () from /lib64/libpthread.so.0
#5  0x00000035cc6d3d1d in clone () from /lib64/libc.so.6

Thread 1 (Thread 9587):
#0  0x0000003f112819b0 in vtable for qpid::sys::OutputTask () from /usr/lib64/libqpidbroker.so.2
#1  0x0000003f109f0d49 in qpid::sys::AsynchIOHandler::disconnect (this=0x2aaab0276a10) at qpid/sys/AsynchIOHandler.cpp:194
#2  0x0000003f109f1049 in qpid::sys::AsynchIOHandler::eof (this=0x2aaaaca240a0, a=...) at qpid/sys/AsynchIOHandler.cpp:177
#3  0x0000003f1092118f in boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> >::operator() (this=0x2aaaac588650, a0=...) at /usr/include/boost/function/function_template.hpp:576
#4  0x0000003f10920a93 in operator()<boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> > >, boost::_bi::list1<qpid::sys::DispatchHandle&> > (function_obj_ptr=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:149
#5  operator()<qpid::sys::DispatchHandle> (function_obj_ptr=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/bind/bind_template.hpp:32
#6  boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> > >, boost::_bi::list2<boost::_bi::value<qpid::sys::posix::AsynchIO*>, boost::_bi::value<boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> > > > >, void, qpid::sys::DispatchHandle&>::invoke (function_obj_ptr=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/function/function_template.hpp:136
#7  0x0000003f109f7f87 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() (this=0x2aaaac588650, a0=...) at /usr/include/boost/function/function_template.hpp:576
#8  0x0000003f109f4122 in qpid::sys::DispatchHandle::processEvent (this=0x2aaab00a43a8, type=<value optimized out>) at qpid/sys/DispatchHandle.cpp:309
#9  0x0000003f10929f5e in qpid::sys::HandleSet::cleanup (this=<value optimized out>) at qpid/sys/Poller.h:125
#10 0x0000003f1092c561 in qpid::sys::Poller::run (this=0x13c67e70) at qpid/sys/epoll/EpollPoller.cpp:528
#11 0x0000003f1092348a in qpid::sys::(anonymous namespace)::runRunnable (p=0x2aaaaca240a0) at qpid/sys/posix/Thread.cpp:35
#12 0x00000035cd20673d in start_thread () from /lib64/libpthread.so.0
#13 0x00000035cc6d3d1d in clone () from /lib64/libc.so.6

[root@dhcp-30-90 bz506758_ori]# uname -a
Linux dhcp-30-90.brq.redhat.com 2.6.18-194.3.1.el5 #1 SMP Sun May 2 04:17:42 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

./run.sh 4 10
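
For anyone who wants to regenerate a dump like the one above from the attached core, the sketch below prints a non-interactive gdb command line. The list of gdb commands is inferred from the output sections (registers, memory regions, shared libraries, threads, backtraces) and is not stated in this report. The core file name core.9579 and the helper name gdb_dump_cmd are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: print a gdb command line that dumps registers, memory
# regions, shared libraries, the thread list and all backtraces from a core.
# The command list is inferred from the dump in this report, not taken from it.
gdb_dump_cmd() {
    binary=$1
    core=$2
    printf 'gdb -batch'
    for cmd in 'info registers' 'info mem' 'info sharedlibrary' \
               'info threads' 'thread apply all bt'; do
        printf " -ex '%s'" "$cmd"
    done
    printf ' %s %s\n' "$binary" "$core"
}

# core.9579 is an assumed file name; substitute the real core file.
gdb_dump_cmd /usr/sbin/qpidd core.9579
```

The printed line can be reviewed and then pasted into a shell with its output redirected to a file. The register listing in the report also contains floating point registers, so the exact register command that was used may differ.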
Created attachment 425124
The issue reproducer, including the full core file dump and tailed broker logs from the failing run
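
The loop from "Steps to Reproduce" around the attached run.sh can be sketched roughly as follows. This is only an outline under assumptions: it assumes core files appear as core.* in the current directory, and the names CORE_GLOB, RUN_SECONDS and POLL_SECONDS are made up for this example. Only the run.sh arguments and the 10 minute limit come from the report.

```shell
#!/bin/bash
# Sketch of the retry loop from "Steps to Reproduce".
# Assumptions (not stated in the report): core files show up as core.* in the
# current directory; CORE_GLOB, RUN_SECONDS and POLL_SECONDS are made-up names.
CORE_GLOB=${CORE_GLOB:-core.*}
RUN_SECONDS=${RUN_SECONDS:-600}    # "10 minutes" from the report
POLL_SECONDS=${POLL_SECONDS:-5}

# Print the two randomized run.sh arguments, as in the report.
pick_args() {
    echo "$((4 + RANDOM % 2)) $((8 + RANDOM % 6))"
}

# Return success if at least one file matches CORE_GLOB.
core_found() {
    for f in $CORE_GLOB; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Run the reproducer repeatedly until a core file appears.
reproduce() {
    until core_found; do
        # Word splitting of the two arguments is intended here.
        ./run.sh $(pick_args) &
        pid=$!
        elapsed=0
        while [ "$elapsed" -lt "$RUN_SECONDS" ] && ! core_found; do
            sleep "$POLL_SECONDS"
            elapsed=$((elapsed + POLL_SECONDS))
        done
        # Either a core was found or the time limit passed: stop this run.
        kill "$pid" 2>/dev/null
        wait "$pid" 2>/dev/null
    done
    echo "core file found, stopping"
}

# Start looping only on request, so the functions can be reused or sourced.
if [ "${1:-}" = "--run" ]; then
    reproduce
fi
```

Saved next to run.sh under a name of your choice, it would be started with the single argument --run. Killing only the run.sh process may leave brokers it started still running, so extra cleanup may be needed between iterations.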
The machine used for the initial test was a single-core virtualized (KVM) RHEL 5.5 x86_64 guest. If you use multicore machines to trigger the issue, raise the number of cluster nodes and the number of parallel subscribe clients appropriately to reach a load of 3.5 * $(grep -ic ^processor /proc/cpuinfo).

Another machine, running RHEL 5.5 i386, was tested in parallel with the RHEL 5.5 x86_64 machine above and did not trigger the issue. The hypothesis is that the issue shows up more readily on 64-bit machines.
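
To make the load target concrete, here is a small sketch that computes 3.5 times the processor count and compares it with the current 1-minute load average. Reading "load" as the load average from /proc/loadavg is an assumption, and the function names are made up for this example.

```shell
#!/bin/sh
# Target load from the comment above: 3.5 times the number of processors.
# Function names are made up for this sketch.

# Count processors the same way the comment does.
cpu_count() {
    grep -ic '^processor' /proc/cpuinfo
}

# Print 3.5 * N with one decimal place.
target_load() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", 3.5 * n }'
}

# Succeed if the current load ($1) has reached the target ($2).
load_reached() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN { exit !(cur + 0 >= tgt + 0) }'
}

# On Linux, compare the 1-minute load average against the target.
# Assumption: "load" in the comment means the system load average.
if [ -r /proc/cpuinfo ] && [ -r /proc/loadavg ]; then
    n=$(cpu_count) || n=1
    tgt=$(target_load "$n")
    read -r cur rest < /proc/loadavg
    if load_reached "$cur" "$tgt"; then
        echo "load $cur has reached target $tgt"
    else
        echo "load $cur is below target $tgt: add cluster nodes or subscribe clients"
    fi
fi
```

On the single-core guest used for the initial test the target works out to 3.5. On a 4-core machine it would be 14.0.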
This looks like the issue for which https://bugzilla.redhat.com/show_bug.cgi?id=602198 was raised. The fixes for that were not in the beta3 packages (they were committed after the mrg_1.3_beta3 tag in the repo). I'm marking this as a duplicate; feel free to reopen if you disagree.

*** This bug has been marked as a duplicate of bug 602198 ***