Created attachment 335676 [details]
failover_soak test

Description of problem:

When running a slightly modified failover_soak test on the MRG packages, a qpidd crash was observed (most probably as a consequence of an aisexec assertion).

aisexec assertion:
aisexec: ../include/sq.h:171: sq_item_add: Assertion `sq->items_inuse[sq_position] == 0' failed.
A separate BZ will be filed for the aisexec assertion.

qpidd backtraces (2 observations, RHEL 5.3 i386 / x86_64):

[root@hp-dl385-01 fsoak]# file /root/_bzs/fsoak/core.8263
/root/_bzs/fsoak/core.8263: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from 'qpidd'
[root@hp-dl385-01 fsoak]# gdb `which qpidd` /root/_bzs/fsoak/core.8263
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(no debugging symbols found)

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /usr/lib/libqpidbroker.so.0...(no debugging symbols found)...done.
...
Loaded symbols for /usr/lib/librdmacm.so.1
Reading symbols from /usr/lib/libibverbs.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libibverbs.so.1
(no debugging symbols found)
Core was generated by `qpidd --no-module-dir --load-module /usr/lib/qpid/daemon/cluster.so --cluster-n'.
Program terminated with signal 11, Segmentation fault.
[New process 8263]
[New process 8265]
[New process 8264]
#0 0x0070a0ac in memcpy () from /lib/libc.so.6
(gdb) thread apply all bt

Thread 3 (process 8264):
#0 0x00ef4402 in __kernel_vsyscall ()
#1 0x007ee8c2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x00776b84 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3 0x00b24e70 in qpid::broker::Timer::run () from /usr/lib/libqpidbroker.so.0
#4 0x00381611 in ?? () from /usr/lib/libqpidcommon.so.0
#5 0x007ea49b in start_thread () from /lib/libpthread.so.0
#6 0x0076a42e in clone () from /lib/libc.so.6

Thread 2 (process 8265):
#0 0x00ef4402 in __kernel_vsyscall ()
#1 0x007ee8c2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x00776b84 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3 0x00b24e70 in qpid::broker::Timer::run () from /usr/lib/libqpidbroker.so.0
#4 0x00381611 in ?? () from /usr/lib/libqpidcommon.so.0
#5 0x007ea49b in start_thread () from /lib/libpthread.so.0
#6 0x0076a42e in clone () from /lib/libc.so.6

Thread 1 (process 8263):
#0 0x0070a0ac in memcpy () from /lib/libc.so.6
#1 0x00d50cb4 in std::string::_Rep::_M_clone () from /usr/lib/libstdc++.so.6
#2 0x00d51617 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string () from /usr/lib/libstdc++.so.6
#3 0x00a66e59 in qpid::broker::Exchange::propagateFedOp () from /usr/lib/libqpidbroker.so.0
#4 0x00a98075 in qpid::broker::DirectExchange::unbind () from /usr/lib/libqpidbroker.so.0
#5 0x00adbbc2 in qpid::broker::QueueBindings::unbind () from /usr/lib/libqpidbroker.so.0
#6 0x00a6b16e in qpid::broker::Queue::unbind () from /usr/lib/libqpidbroker.so.0
#7 0x00a723eb in qpid::broker::Queue::tryAutoDelete () from /usr/lib/libqpidbroker.so.0
#8 0x00b009f5 in qpid::broker::SemanticState::cancel () from /usr/lib/libqpidbroker.so.0
#9 0x00b01521 in qpid::broker::SemanticState::~SemanticState () from /usr/lib/libqpidbroker.so.0
#10 0x00b197b9 in qpid::broker::SessionState::~SessionState () from /usr/lib/libqpidbroker.so.0
#11 0x00b21a12 in qpid::broker::SessionHandler::~SessionHandler () from /usr/lib/libqpidbroker.so.0
#12 0x00a884af in qpid::broker::Connection::~Connection () from /usr/lib/libqpidbroker.so.0
#13 0x00e0f8cc in qpid::cluster::Connection::~Connection () from /usr/lib/qpid/daemon/cluster.so
#14 0x00a5aa95 in qpid::RefCounted::released () from /usr/lib/libqpidbroker.so.0
#15 0x00df8231 in std::_Rb_tree<qpid::cluster::ConnectionId, std::pair<qpid::cluster::ConnectionId const, boost::intrusive_ptr<qpid::cluster::Connection> >, std::_Select1st<std::pair<qpid::cluster::ConnectionId const, boost::intrusive_ptr<qpid::cluster::Connection> > >, std::less<qpid::cluster::ConnectionId>, std::allocator<std::pair<qpid::cluster::ConnectionId const, boost::intrusive_ptr<qpid::cluster::Connection> > > >::_M_erase () from /usr/lib/qpid/daemon/cluster.so
#16 0x00de141d in qpid::cluster::Cluster::~Cluster () from /usr/lib/qpid/daemon/cluster.so
---Type <return> to continue, or q <return> to quit---
#17 0x00dddf57 in qpid::cluster::Cluster::brokerShutdown () from /usr/lib/qpid/daemon/cluster.so
#18 0x00ded9c6 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, qpid::cluster::Cluster>, boost::_bi::list1<boost::_bi::value<qpid::cluster::Cluster*> > >, void>::invoke () from /usr/lib/qpid/daemon/cluster.so
#19 0x0039e57c in boost::function0<void, std::allocator<void> >::operator() () from /usr/lib/libqpidcommon.so.0
#20 0x0039da8d in ?? () from /usr/lib/libqpidcommon.so.0
#21 0x0039e3db in std::for_each<__gnu_cxx::__normal_iterator<boost::function<void ()(), std::allocator<void> >*, std::vector<boost::function<void ()(), std::allocator<void> >, std::allocator<boost::function<void ()(), std::allocator<void> > > > >, void (*)(boost::function<void ()(), std::allocator<void> >)> () from /usr/lib/libqpidcommon.so.0
#22 0x0039d9f9 in qpid::Plugin::Target::finalize () from /usr/lib/libqpidcommon.so.0
#23 0x00a533f0 in qpid::broker::Broker::~Broker () from /usr/lib/libqpidbroker.so.0
#24 0x00a5aa95 in qpid::RefCounted::released () from /usr/lib/libqpidbroker.so.0
#25 0x00b23379 in ?? () from /usr/lib/libqpidbroker.so.0
#26 0x006c4fe9 in __cxa_finalize () from /lib/libc.so.6
#27 0x00a06664 in ?? () from /usr/lib/libqpidbroker.so.0
#28 0x00b961a0 in ?? () from /usr/lib/libqpidbroker.so.0
#29 0x00000022 in ?? ()
#30 0x007da140 in ?? () from /lib/libc.so.6
#31 0x00a0663a in ?? () from /usr/lib/libqpidbroker.so.0
#32 0x00b9b684 in ?? () from /usr/lib/libqpidbroker.so.0
#33 0x00696240 in _rtld_local () from /lib/ld-linux.so.2
#34 0xbfd09d18 in ?? ()
#35 0x00b4a05c in _fini () from /usr/lib/libqpidbroker.so.0
Backtrace stopped: frame did not save the PC
(gdb)

--------------------

[root@hp-ml370g4-01 fsoak]# file /root/_bzs/fsoak/core.10220
/root/_bzs/fsoak/core.10220: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'qpidd'
[root@hp-ml370g4-01 fsoak]# gdb `which qpidd` /root/_bzs/fsoak/core.10220
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(no debugging symbols found)
Reading symbols from /usr/lib64/libqpidbroker.so.0...(no debugging symbols found)...done.
...
Reading symbols from /usr/lib64/libibverbs.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libibverbs.so.1
Core was generated by `qpidd --no-module-dir --load-module /usr/lib64/qpid/daemon/cluster.so --cluster'.
Program terminated with signal 11, Segmentation fault.
[New process 10220]
[New process 10222]
[New process 10221]
#0 0x0000003446e7b7ec in memcpy () from /lib64/libc.so.6
(gdb) thread apply all bt

Thread 3 (process 10221):
#0 0x0000003447a0ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000378e188e6f in qpid::broker::Timer::run () from /usr/lib64/libqpidbroker.so.0
#2 0x000000378d76ac4a in ?? () from /usr/lib64/libqpidcommon.so.0
#3 0x0000003447a06367 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003446ed30ad in clone () from /lib64/libc.so.6

Thread 2 (process 10222):
#0 0x0000003447a0ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000378e188e6f in qpid::broker::Timer::run () from /usr/lib64/libqpidbroker.so.0
#2 0x000000378d76ac4a in ?? () from /usr/lib64/libqpidcommon.so.0
#3 0x0000003447a06367 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003446ed30ad in clone () from /lib64/libc.so.6

Thread 1 (process 10220):
#0 0x0000003446e7b7ec in memcpy () from /lib64/libc.so.6
#1 0x0000003447e9c200 in std::string::_Rep::_M_clone () from /usr/lib64/libstdc++.so.6
#2 0x0000003447e9c8ff in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string () from /usr/lib64/libstdc++.so.6
#3 0x000000378e0de5cf in qpid::broker::Exchange::propagateFedOp () from /usr/lib64/libqpidbroker.so.0
#4 0x000000378e109c70 in qpid::broker::DirectExchange::unbind () from /usr/lib64/libqpidbroker.so.0
#5 0x000000378e146ac7 in qpid::broker::QueueBindings::unbind () from /usr/lib64/libqpidbroker.so.0
#6 0x000000378e0dffeb in qpid::broker::Queue::unbind () from /usr/lib64/libqpidbroker.so.0
#7 0x000000378e0e74b6 in qpid::broker::Queue::tryAutoDelete () from /usr/lib64/libqpidbroker.so.0
#8 0x000000378e161000 in qpid::broker::SemanticState::cancel () from /usr/lib64/libqpidbroker.so.0
#9 0x000000378e1696ec in qpid::broker::SemanticState::~SemanticState () from /usr/lib64/libqpidbroker.so.0
#10 0x000000378e17eeaa in qpid::broker::SessionState::~SessionState () from /usr/lib64/libqpidbroker.so.0
#11 0x000000378e185c55 in qpid::broker::SessionHandler::~SessionHandler () from /usr/lib64/libqpidbroker.so.0
#12 0x000000378e0f9f5f in qpid::broker::Connection::~Connection () from /usr/lib64/libqpidbroker.so.0
#13 0x00002b46b5eaae60 in qpid::cluster::Connection::~Connection () from /usr/lib64/qpid/daemon/cluster.so
#14 0x00002b46b5e926ba in std::_Rb_tree<qpid::cluster::ConnectionId, std::pair<qpid::cluster::ConnectionId const, boost::intrusive_ptr<qpid::cluster::Connection> >, std::_Select1st<std::pair<qpid::cluster::ConnectionId const, boost::intrusive_ptr<qpid::cluster::Connection> > >, std::less<qpid::cluster::ConnectionId>, std::allocator<std::pair<qpid::cluster::ConnectionId const, boost::intrusive_ptr<qpid::cluster::Connection> > > >::_M_erase () from /usr/lib64/qpid/daemon/cluster.so
#15 0x00002b46b5e847f1 in qpid::cluster::Cluster::~Cluster () from /usr/lib64/qpid/daemon/cluster.so
#16 0x00002b46b5e7bed5 in qpid::cluster::Cluster::brokerShutdown () from /usr/lib64/qpid/daemon/cluster.so
#17 0x000000378d785a2f in boost::function0<void, std::allocator<void> >::operator() () from /usr/lib64/libqpidcommon.so.0
#18 0x000000378d7858b6 in std::for_each<__gnu_cxx::__normal_iterator<boost::function<void ()(), std::allocator<void> >*, std::vector<boost::function<void ()(), std::allocator<void> >, std::allocator<boost::function<void ()(), std::allocator<void> > > > >, void (*)(boost::function<void ()(), std::allocator<void> >)> () from /usr/lib64/libqpidcommon.so.0
---Type <return> to continue, or q <return> to quit---
#19 0x000000378d784f80 in qpid::Plugin::Target::finalize () from /usr/lib64/libqpidcommon.so.0
#20 0x000000378e0c9b69 in qpid::broker::Broker::~Broker () from /usr/lib64/libqpidbroker.so.0
#21 0x0000003446e3363e in __cxa_finalize () from /lib64/libc.so.6
#22 0x000000378e088e06 in ?? () from /usr/lib64/libqpidbroker.so.0
#23 0x0000003446c1c000 in ?? () from /lib64/ld-linux-x86-64.so.2
#24 0x0000000000000000 in ?? ()
(gdb) quit

Version-Release number of selected component (if applicable):

[root@hp-ml370g4-01 fsoak]# rpm -qa | egrep '(ais|qpid|rhm)'
qpidc-0.5.752581-1.el5
qpidd-acl-0.5.752581-1.el5
rhm-docs-0.5.753238-1.el5
qpid-java-common-0.5.751061-1.el5
qpidc-rdma-0.5.752581-1.el5
qpidd-ssl-0.5.752581-1.el5
qpidc-perftest-0.5.752581-1.el5
openais-0.80.3-22.el5_3.3
qpidd-0.5.752581-1.el5
qpidc-devel-0.5.752581-1.el5
qpidd-rdma-0.5.752581-1.el5
rhm-0.5.3153-1.el5
python-qpid-0.5.752581-1.el5
qpid-java-client-0.5.751061-1.el5
qpidd-cluster-0.5.752581-1.el5
qpidc-ssl-0.5.752581-1.el5
qpidd-xml-0.5.752581-1.el5
qpidd-devel-0.5.752581-1.el5

How reproducible:
>50% (sometimes it shows up after ~30 runs, sometimes only after ~800 runs)

Steps to Reproduce:
1. Run the failover_soak test in a loop (see the attachment for the reproducer).
2. Watch the results.

Actual results:
(aisexec exits with the above-mentioned assertion)
The clustered qpidd crashes.

Expected results:
Both aisexec and qpidd should continue working without any issues.

Additional info:
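The "run in a loop and watch results" step from the reproduction steps can be sketched roughly as below. This is only an illustration, not the attached reproducer: the helper name run_until_failure is hypothetical, and it assumes that core files are written as core.<pid> into the current working directory (as the core.8263 / core.10220 names above suggest), which depends on the system's core pattern settings.

```shell
#!/bin/sh
# Hypothetical helper (not part of the attached reproducer):
# run a command repeatedly until it fails, a core file shows up in the
# current directory, or the run limit is reached.
# usage: run_until_failure MAX_RUNS COMMAND [ARGS...]
run_until_failure() {
    max_runs=$1
    shift
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # Stop on the first non-zero exit status of the test command.
        if ! "$@"; then
            echo "run $i: command exited with a non-zero status"
            return 1
        fi
        # Assumption: a crashed process leaves core.<pid> in the working directory.
        for f in core.*; do
            if [ -e "$f" ]; then
                echo "run $i: found core file $f"
                return 2
            fi
        done
        i=$((i + 1))
    done
    echo "completed $max_runs runs without a failure"
    return 0
}
```

It would be invoked with the actual failover_soak command line from the attachment in place of COMMAND, for example with a limit of 1000 runs given that the crash sometimes only shows up after ~800 runs.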
Bit more detail on Thread 1 above with debuginfo installed:

...
#22 0x0039d9f9 in qpid::Plugin::Target::finalize (this=<value optimized out>) at qpid/Plugin.cpp:45
#23 0x00a533f0 in ~Broker (this=<value optimized out>) at qpid/broker/Broker.cpp:337
#24 0x00a5aa95 in qpid::RefCounted::released (this=<value optimized out>) at qpid/RefCounted.h:48
#25 0x00b23379 in __tcf_1 () at qpid/RefCounted.h:42
#26 0x006c4fe9 in __cxa_finalize () from /lib/libc.so.6
#27 0x00a06664 in __do_global_dtors_aux () from /usr/lib/libqpidbroker.so.0
#28 0x00b4a05c in _fini () from /usr/lib/libqpidbroker.so.0
#29 0x006897ee in _dl_fini () from /lib/ld-linux.so.2
#30 0x006c4d39 in exit () from /lib/libc.so.6
#31 0x006aee94 in __libc_start_main () from /lib/libc.so.6
#32 0x0804c001 in _start ()
Fixed upstream in revision 823258.
*** Bug 509212 has been marked as a duplicate of this bug. ***
*** Bug 509436 has been marked as a duplicate of this bug. ***
The issue has been verified as fixed (no segfaults/aborts); retested on RHEL 5.5 i386 / x86_64 with the following packages:

python-qmf-0.7.946106-12.el5
python-qpid-0.7.946106-13.el5
qmf-0.7.946106-12.el5
qmf-devel-0.7.946106-12.el5
qpid-cpp-*-0.7.946106-12.el5
qpid-dotnet-0.4.738274-2.el5
qpid-java-*-0.7.946106-8.el5
qpid-tests-0.7.946106-1.el5
qpid-tools-0.7.946106-10.el5
ruby-qmf-0.7.946106-12.el5

-> VERIFIED
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
The qpidd service no longer terminates with a segmentation fault due to an aisexec assertion.
An advisory has been issued which should help resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html