Bug 605656 - clustered qpidd broker crash in qpid::sys::OutputTask
Summary: clustered qpidd broker crash in qpid::sys::OutputTask
Keywords:
Status: CLOSED DUPLICATE of bug 602198
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: Development
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.3
Target Release: ---
Assignee: messaging-bugs
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-06-18 13:51 UTC by Frantisek Reznicek
Modified: 2015-11-16 01:12 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-06-21 09:07:21 UTC
Target Upstream Version:
Embargoed:


Attachments
The issue reproducer including full core file dump and tailed broker logs from failure run (17.74 KB, application/x-tbz)
2010-06-18 13:52 UTC, Frantisek Reznicek
no flags

Description Frantisek Reznicek 2010-06-18 13:51:07 UTC
Description of problem:

A clustered broker crash was observed while retesting bug 510475 with the original bz506758 reproducer.

Analysis of the core dump shows the following backtrace:

  Thread 1 (Thread 9587):
  #0  0x0000003f112819b0 in vtable for qpid::sys::OutputTask ()
    from /usr/lib64/libqpidbroker.so.2
  #1  0x0000003f109f0d49 in qpid::sys::AsynchIOHandler::disconnect (
      this=0x2aaab0276a10) at qpid/sys/AsynchIOHandler.cpp:194
  #2  0x0000003f109f1049 in qpid::sys::AsynchIOHandler::eof (
      this=0x2aaaaca240a0, a=...) at qpid/sys/AsynchIOHandler.cpp:177
  #3  0x0000003f1092118f in boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> >::operator() (this=0x2aaaac588650, a0=...)
      at /usr/include/boost/function/function_template.hpp:576
  #4  0x0000003f10920a93 in operator()<boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> > >, boost::_bi::list1<qpid::sys::DispatchHandle&> > (
      function_obj_ptr=<value optimized out>, a0=<value optimized out>)
      at /usr/include/boost/bind/mem_fn_template.hpp:149
  #5  operator()<qpid::sys::DispatchHandle> (
      function_obj_ptr=<value optimized out>, a0=<value optimized out>)
      at /usr/include/boost/bind/bind_template.hpp:32
  ...

This crash was detected only once in about one day of testing on virtualized RHEL 5.5 x86_64.

Version-Release number of selected component (if applicable):
python-qmf-0.7.946106-3.el5
python-qpid-0.7.946106-1.el5
qmf-0.7.946106-3.el5
qmf-devel-0.7.946106-3.el5
qpid-cpp-client-0.7.946106-3.el5
qpid-cpp-client-devel-0.7.946106-3.el5
qpid-cpp-client-devel-docs-0.7.946106-3.el5
qpid-cpp-client-rdma-0.7.946106-3.el5
qpid-cpp-client-ssl-0.7.946106-3.el5
qpid-cpp-mrg-debuginfo-0.7.946106-3.el5
qpid-cpp-server-0.7.946106-3.el5
qpid-cpp-server-cluster-0.7.946106-3.el5
qpid-cpp-server-devel-0.7.946106-3.el5
qpid-cpp-server-rdma-0.7.946106-3.el5
qpid-cpp-server-ssl-0.7.946106-3.el5
qpid-cpp-server-store-0.7.946106-3.el5
qpid-cpp-server-xml-0.7.946106-3.el5
qpid-java-client-0.7.946106-3.el5
qpid-java-common-0.7.946106-3.el5
qpid-tools-0.7.946106-4.el5
rh-qpid-cpp-tests-0.7.946106-3.el5
ruby-qmf-0.7.946106-3.el5


How reproducible:
Quite hard (seen once in about one day of testing).

Steps to Reproduce:
1. Follow the steps from bug 510475.

In more detail, run the attached reproducer:
 - run ./run.sh $((4+${RANDOM}%2)) $((8+${RANDOM}%6)) in the background
 - while it runs, check for core files; if one is found, stop
 - once the reproducer has been running for 10 minutes, kill it and restart it
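The run/watch/restart loop from the steps above can be sketched as a small driver script. This is a minimal sketch, not part of the attached reproducer. It assumes that run.sh sits in the current directory and takes the cluster node count and subscribe client count as its two arguments (matching the invocation above), that qpidd core files are written to the current directory as core.* (this depends on /proc/sys/kernel/core_pattern), and that killing run.sh is enough to end a round. The helper names pick_nodes, pick_clients, core_found and main are hypothetical.

```shell
#!/bin/bash
# Hypothetical driver for the attached reproducer (not part of the attachment).
# Assumptions: run.sh is in the current directory and takes NODES CLIENTS;
# core files appear here as core.* (see /proc/sys/kernel/core_pattern);
# killing run.sh ends a round (it may leave spawned brokers running).
ROUND_SECONDS=600   # restart every 10 minutes

pick_nodes()   { echo $((4 + RANDOM % 2)); }   # 4 or 5 cluster nodes
pick_clients() { echo $((8 + RANDOM % 6)); }   # 8..13 subscribe clients

# Succeeds if at least one core.* file exists in the current directory.
core_found() { compgen -G 'core.*' > /dev/null; }

main() {
  [[ -x ./run.sh ]] || { echo "run.sh not found in $PWD" >&2; return 1; }
  until core_found; do
    ./run.sh "$(pick_nodes)" "$(pick_clients)" &
    local pid=$! start=$SECONDS
    # Watch this round until it exits, times out, or a core file shows up.
    while kill -0 "$pid" 2>/dev/null && (( SECONDS - start < ROUND_SECONDS )); do
      core_found && break
      sleep 5
    done
    kill "$pid" 2>/dev/null    # end the round (no-op if run.sh already exited)
    wait "$pid" 2>/dev/null
  done
  echo "core file(s) found:" core.*
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main "$@"
fi
```

Saved under a name such as drive.sh (hypothetical) in the reproducer directory, it keeps starting rounds until a core.* file appears and then lists the core files, which can be analysed with gdb as in the Additional info section.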
  
Actual results:
qpidd broker crashes.

Expected results:
qpidd broker should not crash.

Additional info:
[root@dhcp-30-90 bz506758_ori]# cat dump_core.9579
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5_5.1)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/qpidd...Reading symbols from /usr/lib/debug/usr/sbin/qpidd.debug...
warning: section .gnu.liblist not found in /usr/lib/debug/usr/sbin/qpidd.debug

warning: section .dynbss not found in /usr/lib/debug/usr/sbin/qpidd.debug

warning: section .gnu.conflict not found in /usr/lib/debug/usr/sbin/qpidd.debug
done.
done.
[New Thread 9597]
[New Thread 9586]
[New Thread 9584]
[New Thread 9583]
[New Thread 9579]

warning: section .gnu.liblist not found in /usr/lib/debug/usr/sbin/qpidd.debug

warning: section .dynbss not found in /usr/lib/debug/usr/sbin/qpidd.debug

warning: section .gnu.conflict not found in /usr/lib/debug/usr/sbin/qpidd.debug
Reading symbols from /usr/lib64/libqpidbroker.so.2...Reading symbols from /usr/lib/debug/usr/lib64/libqpidbroker.so.2.0.0.debug...done.
done.
...
warning: section .gnu.liblist not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .dynbss not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .gnu.conflict not found in /usr/lib/debug/usr/sbin/qpidd.debug
Core was generated by `qpidd -p 5672 --auth no --log-enable info+ --cluster-name dhcp-30-90.brq.redhat'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003f112819b0 in vtable for qpid::sys::OutputTask ()
   from /usr/lib64/libqpidbroker.so.2
(gdb) rax            0x2aaaac588650     46912524289616
rbx            0x2aaab0276a10   46912588179984
rcx            0x3f109f1040     270861799488
rdx            0x2aaab0276a10   46912588179984
rsi            0x2aaab00a43a0   46912586269600
rdi            0x2aaaaca240a0   46912529121440
rbp            0x2aaab80bd5d0   0x2aaab80bd5d0
rsp            0x4303b188       0x4303b188
r8             0x1 1
r9             0x2573   9587
r10            0x0 0
r11            0x3f109f1040     270861799488
r12            0x3f1091e530     270860936496
r13            0x3f1101eb10     270868278032
r14            0x3f1101e960     270868277600
r15            0x2aaab0276a10   46912588179984
rip            0x3f112819b0     0x3f112819b0 <vtable for qpid::sys::OutputTask+16>
eflags         0x10206  [ PF IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0 0
es             0x0 0
fs             0x63     99
gs             0x0 0
st0            0   (raw 0x00000000000000000000)
st1            0   (raw 0x00000000000000000000)
st2            0   (raw 0x00000000000000000000)
st3            0   (raw 0x00000000000000000000)
st4            0   (raw 0x00000000000000000000)
st5            0   (raw 0x00000000000000000000)
st6            0   (raw 0x00000000000000000000)
st7            0   (raw 0x00000000000000000000)
fctrl          0x37f    895
fstat          0x0 0
ftag           0xffff   65535
fiseg          0x0 0
fioff          0x0 0
foseg          0x0 0
fooff          0x0 0
fop            0x0 0
(gdb) Using memory regions provided by the target.
There are no memory regions defined.
(gdb) From                To                  Syms Read   Shared Object Library
0x0000003f10ea11b0  0x0000003f110203d8  Yes (*)     /usr/lib64/libqpidbroker.so.2
0x0000003f1090e980  0x0000003f10a0fea8  Yes (*)     /usr/lib64/libqpidcommon.so.2
0x0000003456610aa0  0x000000345662dae8  Yes (*)     /usr/lib64/libboost_program_options.so.2
0x0000003457004810  0x000000345700cff8  Yes (*)     /usr/lib64/libboost_filesystem.so.2
0x00000035d1601500  0x00000035d1602918  Yes (*)     /lib64/libuuid.so.1
0x00000035cca00e10  0x00000035cca01a08  Yes (*)     /lib64/libdl.so.2
0x00000035cda02220  0x00000035cda05cc8  Yes (*)     /lib64/librt.so.1
0x00000035cfa046e0  0x00000035cfa13be8  Yes (*)     /usr/lib64/libsasl2.so.2
0x00000035cf64f430  0x00000035cf6c3058  Yes (*)     /usr/lib64/libstdc++.so.6
0x00000035cce03e60  0x00000035cce43e38  Yes (*)     /lib64/libm.so.6
0x00000035cf201e50  0x00000035cf20b018  Yes (*)     /lib64/libgcc_s.so.1
0x00000035cc61d780  0x00000035cc709ff8  Yes (*)     /lib64/libc.so.6
0x00000035cc200a70  0x00000035cc21671e  Yes (*)     /lib64/ld-linux-x86-64.so.2
0x00000035cd2051f0  0x00000035cd210258  Yes (*)     /lib64/libpthread.so.0
0x00000035d06032a0  0x00000035d060e2d8  Yes (*)     /lib64/libresolv.so.2
0x00000035ce6009f0  0x00000035ce606918  Yes (*)     /lib64/libcrypt.so.1
0x00002b327a16bb70  0x00002b327a171738  Yes (*)     /usr/lib64/qpid/daemon/replicating_listener.so
0x00002b327a382f10  0x00002b327a3a34e8  Yes (*)     /usr/lib64/qpid/daemon/acl.so
0x00002b327a5e4fa0  0x00002b327a672da8  Yes (*)     /usr/lib64/qpid/daemon/msgstore.so
0x00002b327a8d65d0  0x00002b327a98f288  Yes (*)     /usr/lib64/libdb_cxx-4.3.so
0x00002b327abba510  0x00002b327abba6d1  Yes (*)     /usr/lib64/libaio.so.1
0x00002b327adc0610  0x00002b327adc4bd8  Yes (*)     /usr/lib64/qpid/daemon/replication_exchange.so
0x00002b327afcd5e0  0x00002b327afd0b68  Yes (*)     /usr/lib64/qpid/daemon/watchdog.so
0x00002b327b1dc570  0x00002b327b1e4728  Yes (*)     /usr/lib64/qpid/daemon/ssl.so
0x00002b327b405640  0x00002b327b41c0a8  Yes (*)     /usr/lib64/libsslcommon.so.2
0x0000003000e183b0  0x0000003000ef6f48  Yes (*)     /usr/lib64/libnss3.so
0x0000003001a085e0  0x0000003001a2b638  Yes (*)     /usr/lib64/libssl3.so
0x00000035d020cf30  0x00000035d022b738  Yes (*)     /usr/lib64/libnspr4.so
0x0000003001208340  0x0000003001212c38  Yes (*)     /usr/lib64/libnssutil3.so
0x00000035cee01370  0x00000035cee02978  Yes (*)     /usr/lib64/libplc4.so
0x00002b327b623e10  0x00002b327b624c08  Yes (*)     /usr/lib64/libplds4.so
0x00000035cd601fd0  0x00000035cd60cac8  Yes (*)     /usr/lib64/libz.so.1
0x00002b327b82eb00  0x00002b327b8365a8  Yes (*)     /usr/lib64/qpid/daemon/rdma.so
0x00002b327ba55270  0x00002b327ba62928  Yes (*)     /usr/lib64/librdmawrap.so.2
0x00002b327bc6b360  0x00002b327bc71e38  Yes (*)     /usr/lib64/libibverbs.so.1
0x00002b327be76580  0x00002b327be78f88  Yes (*)     /usr/lib64/librdmacm.so.1
0x00002b327c0827b0  0x00002b327c08c578  Yes (*)     /usr/lib64/qpid/daemon/xml.so
0x000000385d373070  0x000000385d4f4758  Yes (*)     /usr/lib64/libxerces-c.so.28
0x000000385db6d090  0x000000385dcf8b28  Yes (*)     /usr/lib64/libxqilla.so.3
0x00002b327c2d1270  0x00002b327c33b758  Yes (*)     /usr/lib64/qpid/daemon/cluster.so
0x00000038444013d0  0x0000003844403338  Yes (*)     /usr/lib64/openais/libcpg.so.2
0x00002b327c572110  0x00002b327c574b78  Yes (*)     /usr/lib64/libcman.so.2
0x00002b327c7b5d60  0x00002b327c82f298  Yes (*)     /usr/lib64/libqpidclient.so.2
0x00002aaaaaefa9f0  0x00002aaaaaf072f8  Yes (*)     /usr/lib64/qpid/client/sslconnector.so
0x00002aaaab117530  0x00002aaaab125a98  Yes (*)     /usr/lib64/qpid/client/rdmaconnector.so
0x00002aaaab32bfb0  0x00002aaaab32dbc8  Yes (*)     /usr/lib64/sasl2/libanonymous.so.2
0x00002aaaab54be60  0x00002aaaab5e9388  Yes (*)     /usr/lib64/sasl2/libsasldb.so.2
0x00002aaaab809fa0  0x00002aaaab80bd08  Yes (*)     /usr/lib64/sasl2/liblogin.so.2
0x00002aaaaba0dfb0  0x00002aaaaba0fd58  Yes (*)     /usr/lib64/sasl2/libplain.so.2
(*): Shared library is missing debugging information.
(gdb)   6 Thread 9579  0x00000035cd207b35 in pthread_join ()
   from /lib64/libpthread.so.0
  5 Thread 9583  0x00000035cd20b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  4 Thread 9584  0x00000035cd20b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  3 Thread 9586  0x00000035cd20aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  2 Thread 9597  0x00000035cc6d4108 in epoll_wait () from /lib64/libc.so.6
* 1 Thread 9587  0x0000003f112819b0 in vtable for qpid::sys::OutputTask ()
   from /usr/lib64/libqpidbroker.so.2
(gdb)
Thread 6 (Thread 9579):
#0  0x00000035cd207b35 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000003f109239dd in qpid::sys::Thread::join (this=<value optimized out>)
    at qpid/sys/posix/Thread.cpp:70
#2  0x0000003f10f0ea83 in qpid::broker::Broker::run (
    this=<value optimized out>) at qpid/broker/Broker.cpp:342
#3  0x0000000000406ae6 in QpiddBroker::execute (this=<value optimized out>,
    options=0x13c5fe50) at posix/QpiddBroker.cpp:176
#4  0x00000000004055af in main (argc=11, argv=0x7fffd8863648) at qpidd.cpp:80

Thread 5 (Thread 9583):
#0  0x00000035cd20b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003f109fb7bf in qpid::sys::Timer::run (this=0x13c676f0)
    at ../include/qpid/sys/posix/Condition.h:69
#2  0x0000003f1092348a in qpid::sys::(anonymous namespace)::runRunnable (
    p=0x13c67724) at qpid/sys/posix/Thread.cpp:35
#3  0x00000035cd20673d in start_thread () from /lib64/libpthread.so.0
#4  0x00000035cc6d3d1d in clone () from /lib64/libc.so.6

Thread 4 (Thread 9584):
#0  0x00000035cd20b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003f109fb7bf in qpid::sys::Timer::run (this=0x13c84f00)
    at ../include/qpid/sys/posix/Condition.h:69
#2  0x0000003f1092348a in qpid::sys::(anonymous namespace)::runRunnable (
    p=0x13c84f34) at qpid/sys/posix/Thread.cpp:35
#3  0x00000035cd20673d in start_thread () from /lib64/libpthread.so.0
#4  0x00000035cc6d3d1d in clone () from /lib64/libc.so.6

Thread 3 (Thread 9586):
#0  0x00000035cd20aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003f109fb5e3 in wait (this=0x13c88140)
    at ../include/qpid/sys/posix/Condition.h:63
#2  wait (this=0x13c88140) at ../include/qpid/sys/Monitor.h:41
#3  qpid::sys::Timer::run (this=0x13c88140) at qpid/sys/Timer.cpp:98
#4  0x0000003f1092348a in qpid::sys::(anonymous namespace)::runRunnable (
    p=0x13c88174) at qpid/sys/posix/Thread.cpp:35
#5  0x00000035cd20673d in start_thread () from /lib64/libpthread.so.0
#6  0x00000035cc6d3d1d in clone () from /lib64/libc.so.6

Thread 2 (Thread 9597):
#0  0x00000035cc6d4108 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003f1092baef in qpid::sys::Poller::wait (this=0x13c8e0e0,
    timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:570
#2  0x0000003f1092c4e7 in qpid::sys::Poller::run (this=0x13c8e0e0)
    at qpid/sys/epoll/EpollPoller.cpp:517
#3  0x0000003f1092348a in qpid::sys::(anonymous namespace)::runRunnable (
    p=0x1b) at qpid/sys/posix/Thread.cpp:35
#4  0x00000035cd20673d in start_thread () from /lib64/libpthread.so.0
#5  0x00000035cc6d3d1d in clone () from /lib64/libc.so.6

Thread 1 (Thread 9587):
#0  0x0000003f112819b0 in vtable for qpid::sys::OutputTask ()
   from /usr/lib64/libqpidbroker.so.2
#1  0x0000003f109f0d49 in qpid::sys::AsynchIOHandler::disconnect (
    this=0x2aaab0276a10) at qpid/sys/AsynchIOHandler.cpp:194
#2  0x0000003f109f1049 in qpid::sys::AsynchIOHandler::eof (
    this=0x2aaaaca240a0, a=...) at qpid/sys/AsynchIOHandler.cpp:177
#3  0x0000003f1092118f in boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> >::operator() (this=0x2aaaac588650, a0=...)
    at /usr/include/boost/function/function_template.hpp:576
#4  0x0000003f10920a93 in operator()<boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> > >, boost::_bi::list1<qpid::sys::DispatchHandle&> > (
    function_obj_ptr=<value optimized out>, a0=<value optimized out>)
    at /usr/include/boost/bind/mem_fn_template.hpp:149
#5  operator()<qpid::sys::DispatchHandle> (
    function_obj_ptr=<value optimized out>, a0=<value optimized out>)
    at /usr/include/boost/bind/bind_template.hpp:32
#6  boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> > >, boost::_bi::list2<boost::_bi::value<qpid::sys::posix::AsynchIO*>, boost::_bi::value<boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> > > > >, void, qpid::sys::DispatchHandle&>::invoke (function_obj_ptr=<value optimized out>,
    a0=<value optimized out>)
    at /usr/include/boost/function/function_template.hpp:136
#7  0x0000003f109f7f87 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() (this=0x2aaaac588650, a0=...)
    at /usr/include/boost/function/function_template.hpp:576
#8  0x0000003f109f4122 in qpid::sys::DispatchHandle::processEvent (
    this=0x2aaab00a43a8, type=<value optimized out>)
    at qpid/sys/DispatchHandle.cpp:309
#9  0x0000003f10929f5e in qpid::sys::HandleSet::cleanup (
    this=<value optimized out>) at qpid/sys/Poller.h:125
#10 0x0000003f1092c561 in qpid::sys::Poller::run (this=0x13c67e70)
    at qpid/sys/epoll/EpollPoller.cpp:528
#11 0x0000003f1092348a in qpid::sys::(anonymous namespace)::runRunnable (
    p=0x2aaaaca240a0) at qpid/sys/posix/Thread.cpp:35
#12 0x00000035cd20673d in start_thread () from /lib64/libpthread.so.0
#13 0x00000035cc6d3d1d in clone () from /lib64/libc.so.6

[root@dhcp-30-90 bz506758_ori]# uname -a
Linux dhcp-30-90.brq.redhat.com 2.6.18-194.3.1.el5 #1 SMP Sun May 2 04:17:42 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

./run.sh 4 10
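The contents of dump_core.9579 (crash frame, registers, memory regions, shared libraries, thread list and all-thread backtraces) look like the output of a batch gdb session over the core file. The report does not say which commands were used; the sketch below assumes the gdb commands whose output appears in the dump, and assumes the core file is named core.<pid> (e.g. core.9579). The function name dump_core is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write a text dump of a qpidd core file, similar in
# layout to dump_core.9579. The gdb commands are inferred from the sections
# visible in that dump, not taken from the report.
dump_core() {
  local binary=$1 core=$2
  if [[ ! -r $binary || ! -r $core ]]; then
    echo "usage: dump_core BINARY CORE (both files must be readable)" >&2
    return 2
  fi
  # core.9579 -> dump_core.9579 in the current directory
  local out="dump_${core##*/}"
  gdb -batch \
      -ex "info registers" \
      -ex "info mem" \
      -ex "info sharedlibrary" \
      -ex "info threads" \
      -ex "thread apply all bt" \
      "$binary" "$core" > "$out" 2>&1
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  dump_core "$@"
fi
```

For example, dump_core /usr/sbin/qpidd core.9579 would write dump_core.9579. Source file and line information in the frames, as seen in this report, requires the debuginfo package (here qpid-cpp-mrg-debuginfo) to be installed.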

Comment 1 Frantisek Reznicek 2010-06-18 13:52:26 UTC
Created attachment 425124
The issue reproducer including full core file dump and tailed broker logs from failure run

Comment 2 Frantisek Reznicek 2010-06-18 13:57:44 UTC
The machine used for the initial test was a single-core KVM guest running RHEL 5.5 x86_64. If you use multicore machines to trigger the issue, raise the number of cluster nodes and the number of parallel subscribe clients until the system load reaches about 3.5 * $(grep -ic ^processor /proc/cpuinfo), i.e. 3.5 per processor.
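The load target above can be checked with a small helper while the reproducer runs. This is a sketch, assuming "load" means the 1-minute load average from /proc/loadavg; the 3.5-per-processor factor and the grep come from the comment, while the function names cpu_count, target_load and load_reached are hypothetical.

```shell
#!/bin/bash
# Sketch: compare the 1-minute load average with the target from Comment 2
# (3.5 x number of processors). Function names are hypothetical.

# Number of processors, counted the same way as in the comment.
cpu_count() { grep -ic '^processor' /proc/cpuinfo; }

# target_load CPUS: prints 3.5 * CPUS with one decimal place.
target_load() { awk -v n="$1" 'BEGIN { printf "%.1f\n", 3.5 * n }'; }

# load_reached TARGET: succeeds if the 1-minute load average is >= TARGET.
load_reached() { awk -v t="$1" '{ exit !($1 + 0 >= t + 0) }' /proc/loadavg; }

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  target=$(target_load "$(cpu_count)")
  read -r load1 _ < /proc/loadavg
  if load_reached "$target"; then
    echo "load $load1 has reached the target $target"
  else
    echo "load $load1 is below the target $target: add cluster nodes or subscribe clients"
  fi
fi
```

On the single-core test machine the target works out to 3.5; on a four-processor machine it would be 14.0.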

Another machine, running RHEL 5.5 i386, was tested in parallel with the RHEL 5.5 x86_64 one above and did not trigger the issue.

One hypothesis is that the issue shows up more readily on 64-bit machines.

Comment 3 Gordon Sim 2010-06-21 09:07:21 UTC
This looks like the issue for which https://bugzilla.redhat.com/show_bug.cgi?id=602198 was raised. The fixes for that bug were not in the beta3 packages (they were committed after the mrg_1.3_beta3 tag in the repo). I'm marking this as a duplicate; feel free to reopen if you disagree.

*** This bug has been marked as a duplicate of bug 602198 ***

