Bug 433797

Summary: sporadic perftest core dump if broker is killed.
Product: Red Hat Enterprise MRG Reporter: Alan Conway <aconway>
Component: qpid-cppAssignee: Kim van der Riet <kim.vdriet>
Status: CLOSED WONTFIX QA Contact: Eric Sammons <esammons>
Severity: medium Docs Contact:
Priority: medium    
Version: 1.1CC: cctrieloff, gsim, iboverma, jross, tross
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-27 23:15:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
reproducer including failing logs and backtraces
none
The backtrace data from stress test. none

Description Alan Conway 2008-02-21 14:50:46 UTC
Description of problem:

perftest client core dumps sporadically if broker is killed while it is connected.

How reproducible: easy, within 10 iterations or so on a 4 core machine.

Steps to Reproduce:
cd <qpid-checkout>/qpid/cpp/src
rm core.*
until ls core.*; do ./qpidd -dp 12345 --data-dir ""; test\
s/perftest -p12345 & sleep 1 ; ./qpidd -qp 12345 ; done
  
Actual results:
Stops with a core file.

Expected results:
Repeats indefinitely

Comment 1 Gordon Sim 2008-11-10 18:12:31 UTC
I believe this was fixed by r712691 and some earlier fixes to races in the client library tackled when getting failover to work reliably. Certainly I can run 100 iterations of perftest interrupted by having the broker killed from beneath it without any cores now (even on e.g. mrg5), whereas I was getting them reliably before.

Comment 3 Frantisek Reznicek 2008-11-21 14:23:34 UTC
Similar observed anymore on RHEL 5.2 x86_64 using packages:
qpidd-0.3.714072-1.el5, rhm-0.3.2804-1.el5

For reproducer used loop above modified the way that qpidd is run in non-daemon mode and stopped by CTRL-C.

There were seen two core files (similar problem observed also on RHEL5.2 i386):

[root@dell-pe1420-01 manual_bz433797]# gdb `which perftest` core.3130
GNU gdb Red Hat Linux (6.5-37.el5rh)
...
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `perftest --log-enable info+'.
Program terminated with signal 6, Aborted.
#0  0x0000003017630155 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003017630155 in raise () from /lib64/libc.so.6
#1  0x0000003017631bf0 in abort () from /lib64/libc.so.6
#2  0x00000030176295d6 in __assert_fail () from /lib64/libc.so.6
#3  0x00002af6fadd30af in std::for_each<__gnu_cxx::__normal_iterator<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus**, std::vector<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus*, std::allocator<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus*> > >, qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::AllThreadsStatuses::handleAdder> (
    __first=<value optimized out>, __last={_M_current = 0x1529bf10}, __f={handle = {px = 0x434dfc00, pn = {pi_ = 0x1529bf10}}})
    at ./qpid/sys/posix/Mutex.h:116
#4  0x00002af6fadd319c in qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::markForDeletion (handle=0x1529d330)
    at ./qpid/sys/DeletionManager.h:130
#5  0x00002af6fadc2b55 in ~AsynchIO (this=0x1529d0b0) at qpid/sys/posix/AsynchIO.cpp:329
#6  0x00002af6fa9b8967 in qpid::client::TCPConnector::run (this=0x15286090) at qpid/client/Connector.cpp:401
#7  0x00002af6fadc9d3a in qpid::sys::(anonymous namespace)::runRunnable ()
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/basic_string.h:257
#8  0x00000030182062f7 in start_thread () from /lib64/libpthread.so.0
#9  0x00000030176d1b6d in clone () from /lib64/libc.so.6
(gdb) [root@dell-pe1420-01 manual_bz433797]#

[root@dell-pe1420-01 manual_bz433797]# gdb `which perftest` core.606
GNU gdb Red Hat Linux (6.5-37.el5rh)
...
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `perftest --log-enable info+'.
Program terminated with signal 6, Aborted.
#0  0x0000003017630155 in raise () from /lib64/libc.so.6
(gdb)
(gdb) bt
#0  0x0000003017630155 in raise () from /lib64/libc.so.6
#1  0x0000003017631bf0 in abort () from /lib64/libc.so.6
#2  0x00000030176295d6 in __assert_fail () from /lib64/libc.so.6
#3  0x00002ac39922e6aa in qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::markAllUnusedInThisThread ()
    at ./qpid/sys/posix/Mutex.h:116
#4  0x00002ac39922bc80 in qpid::sys::Poller::wait (this=0x400b330, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:299
#5  0x00002ac39927082d in qpid::sys::Dispatcher::run (this=0x40a54000) at qpid/sys/Dispatcher.cpp:38
#6  0x00002ac398e1395b in qpid::client::TCPConnector::run (this=0x400ced0) at qpid/client/Connector.cpp:400
#7  0x00002ac399224d3a in qpid::sys::(anonymous namespace)::runRunnable ()
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/basic_string.h:257
#8  0x00000030182062f7 in start_thread () from /lib64/libpthread.so.0
#9  0x00000030176d1b6d in clone () from /lib64/libc.so.6
(gdb) [root@dell-pe1420-01 manual_bz433797]#    

Perftest log files are both showing same:

Processing 1 messages from sub_ready . done.
Sending start 1 times to pub_start
Processing 1 messages from pub_done .Controller exception: Connection closed
PublishThread exception: Connection closed
SubscribeThread exception: Connection closed
21 08:58:57 warning Connection closed
2008-nov-21 08:58:57 warning Connection closed
2008-nov-21 08:58:57 warning Connection closed
Error in shutdown: Connection closed
Error in shutdown: Connection closed
Error in shutdown: Connection closed
Invalid argument
perftest: ./qpid/sys/posix/Mutex.h:116: void qpid::sys::Mutex::lock(): Assertion `0' failed.


See attached reproducer and incl. logfiles

->ASSIGNED

Comment 4 Frantisek Reznicek 2008-11-21 14:24:52 UTC
Created attachment 324304 [details]
reproducer including failing logs and backtraces

Comment 5 Frantisek Reznicek 2008-11-21 14:32:11 UTC
putting NEEDINFO flag

Comment 6 Frantisek Reznicek 2008-11-21 16:54:05 UTC
One more backtrace using the same reproducer from #3: (on RHEL5.3 i386)

[root@nec-em16 manual_bz433797]# gdb `which perftest` core.1698
GNU gdb Red Hat Linux (6.5-37.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
...
Loaded symbols for /lib/libnss_files.so.2
Core was generated by `perftest --log-enable info+'.
Program terminated with signal 11, Segmentation fault.
#0  0x00863d4f in std::for_each<__gnu_cxx::__normal_iterator<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus**, std::vector<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus*, std::allocator<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus*> > >, qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::AllThreadsStatuses::handleAdder> (__first=
      {_M_current = 0x88729f0}, __last={_M_current = 0x8872a00}, __f={handle = {px = 0xb6b3ffec, pn = {pi_ = 0xb6b40018}}})
    at /usr/include/boost/shared_ptr.hpp:106
106     {
(gdb) bt
#0  0x00863d4f in std::for_each<__gnu_cxx::__normal_iterator<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus**, std::vector<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus*, std::allocator<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus*> > >, qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::AllThreadsStatuses::handleAdder> (__first=
      {_M_current = 0x88729f0}, __last={_M_current = 0x8872a00}, __f={handle = {px = 0xb6b3ffec, pn = {pi_ = 0xb6b40018}}})
    at /usr/include/boost/shared_ptr.hpp:106
#1  0x00863f0d in qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::markForDeletion (handle=0x8650de8)
    at ./qpid/sys/DeletionManager.h:130
#2  0x00861200 in ~PollerHandle (this=0x86509a4) at qpid/sys/epoll/EpollPoller.cpp:128
#3  0x008b0483 in ~DispatchHandle (this=0x86509a4) at qpid/sys/DispatchHandle.cpp:33
#4  0x0085237a in ~AsynchIO (this=0x86509a0) at qpid/sys/posix/AsynchIO.cpp:329
#5  0x008afc7a in qpid::sys::DispatchHandle::doDelete (this=0x86509a4) at qpid/sys/DispatchHandle.cpp:312
#6  0x0084f610 in qpid::sys::posix::AsynchIO::queueForDeletion (this=0x86509a0) at qpid/sys/posix/AsynchIO.cpp:332
#7  0x00582ac3 in qpid::client::TCPConnector::run (this=0x8650c20) at qpid/client/Connector.cpp:401
#8  0x0085a0e1 in qpid::sys::(anonymous namespace)::runRunnable ()
    at /usr/lib/gcc/i386-redhat-linux/4.1.2/../../../../include/c++/4.1.2/ext/new_allocator.h:94
#9  0x0020445b in start_thread () from /lib/libpthread.so.0
#10 0x0036ac4e in clone () from /lib/libc.so.6
(gdb) [root@nec-em16 manual_bz433797]#

Comment 7 Frantisek Reznicek 2008-11-21 16:54:47 UTC
re-issuing NEEDINFO flag again

Comment 8 Andrew Stitcher 2008-11-21 19:58:21 UTC
I think this is likely to be an entirely different (though equally important) bug.

Comment 9 Frantisek Reznicek 2008-11-26 10:28:03 UTC
Please find below detected backtraces from reproducer runs on RHEL4.7 / 5.2.

Tested on rpms qpidd-0.3.719671-1.el5, rhm-0.3.2804-4.el5 where is now more difficult to trigger the issue than before.

The testing showed that all machines where tested:
1x RHEL 4.7 i386   1101 runs, 2 failures
1x RHEL 4.7 x86_64 1030 runs, 4 failures
1x RHEL 5.2 i386   1105 runs, 1 failure
3x RHEL 5.2 x86_64 
 a] 1031 runs, 4 failures
 b] 1104 runs, no failure
 c] 1049 runs, 4 failures



cores from RHEL4.7 i386 (nec-em16.rhts.bos.redhat.com) run:
  Red Hat Enterprise Linux AS release 4 (Nahant Update 7)

---------------------------------------------------- 1 -----
#0  0x002a019e in std::for_each<__gnu_cxx::__normal_iterator<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus**, std::vector<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus*, std::allocator<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus*> > >, qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::AllThreadsStatuses::handleAdder> ()
   from /usr/lib/libqpidcommon.so.0
(gdb) bt
#0  0x002a019e in std::for_each<__gnu_cxx::__normal_iterator<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus**, std::vector<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus*, std::allocator<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus*> > >, qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::AllThreadsStatuses::handleAdder> ()
   from /usr/lib/libqpidcommon.so.0
#1  0x002a09b0 in qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::markForDeletion () from /usr/lib/libqpidcommon.so.0
#2  0x0029d7ea in qpid::sys::PollerHandle::~PollerHandle$base () from /usr/lib/libqpidcommon.so.0
#3  0x002f635c in qpid::sys::DispatchHandle::~DispatchHandle$base () from /usr/lib/libqpidcommon.so.0
#4  0x002846c0 in qpid::sys::posix::AsynchIO::~AsynchIO$delete () from /usr/lib/libqpidcommon.so.0
#5  0x002f39a9 in qpid::sys::DispatchHandle::doDelete () from /usr/lib/libqpidcommon.so.0
#6  0x00283c92 in qpid::sys::posix::AsynchIO::queueForDeletion () from /usr/lib/libqpidcommon.so.0
#7  0x00c18d40 in qpid::client::TCPConnector::run () from /usr/lib/libqpidclient.so.0
#8  0x00290cb1 in qpid::sys::(anonymous namespace)::runRunnable () from /usr/lib/libqpidcommon.so.0
#9  0x00a413cc in start_thread () from /lib/tls/libpthread.so.0
#10 0x0095696e in clone () from /lib/tls/libc.so.6

---------------------------------------------------- 2 -----

#0  0x002e8c99 in std::for_each<boost::void_ptr_iterator<__gnu_cxx::__normal_iterator<void**, std::vector<void*, std::allocator<void*> > >, qpid::log::Logger::Output>, boost::_bi::bind_t<void, boost::_mfi::mf2<void, qpid::log::Logger::Output, qpid::log::Statement const&, std::string const&>, boost::_bi::list3<boost::arg<1>, boost::_bi::value<qpid::log::Statement>, boost::_bi::value<std::string> > > > ()
   from /usr/lib/libqpidcommon.so.0
(gdb) bt
#0  0x002e8c99 in std::for_each<boost::void_ptr_iterator<__gnu_cxx::__normal_iterator<void**, std::vector<void*, std::allocator<void*> > >, qpid::log::Logger::Output>, boost::_bi::bind_t<void, boost::_mfi::mf2<void, qpid::log::Logger::Output, qpid::log::Statement const&, std::string const&>, boost::_bi::list3<boost::arg<1>, boost::_bi::value<qpid::log::Statement>, boost::_bi::value<std::string> > > > ()
   from /usr/lib/libqpidcommon.so.0
#1  0x002e5d2b in qpid::log::Logger::log () from /usr/lib/libqpidcommon.so.0
#2  0x002ed71b in qpid::log::Statement::log () from /usr/lib/libqpidcommon.so.0
#3  0x002a410f in qpid::Exception::Exception$base () from /usr/lib/libqpidcommon.so.0
#4  0x00c073ae in qpid::client::ConnectionImpl::shutdown () from /usr/lib/libqpidclient.so.0
#5  0x00c15c3f in qpid::client::TCPConnector::handleClosed () from /usr/lib/libqpidclient.so.0
#6  0x00c15c63 in qpid::client::TCPConnector::eof () from /usr/lib/libqpidclient.so.0
#7  0x00c1f670 in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::client::TCPConnector, qpid::sys::AsynchIO&>, boost::_bi::list2<boost::_bi::value<qpid::client::TCPConnector*>, boost::arg<1> > >, void, qpid::sys::AsynchIO&>::invoke
    () from /usr/lib/libqpidclient.so.0
#8  0x0028cac2 in boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> >::operator() ()
   from /usr/lib/libqpidcommon.so.0
#9  0x0028c131 in qpid::sys::posix::AsynchIO::readable () from /usr/lib/libqpidcommon.so.0
#10 0x0028e26e in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, qpid::sys::DispatchHandle&>, boost::_bi::list2<boost::_bi::value<qpid::sys::posix::AsynchIO*>, boost::arg<1> > >, void, qpid::sys::DispatchHandle&>::invoke () from /usr/lib/libqpidcommon.so.0
#11 0x002f66fe in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() ()
   from /usr/lib/libqpidcommon.so.0
#12 0x002f2ab1 in qpid::sys::DispatchHandle::processEvent () from /usr/lib/libqpidcommon.so.0
#13 0x002f1fa5 in qpid::sys::Dispatcher::run () from /usr/lib/libqpidcommon.so.0
#14 0x00c18d2d in qpid::client::TCPConnector::run () from /usr/lib/libqpidclient.so.0
#15 0x00290cb1 in qpid::sys::(anonymous namespace)::runRunnable () from /usr/lib/libqpidcommon.so.0
#16 0x00a413cc in start_thread () from /lib/tls/libpthread.so.0
#17 0x0095696e in clone () from /lib/tls/libc.so.6



cores from RHEL4.7 x86_64 (nec-em13.rhts.bos.redhat.com) run:
  Red Hat Enterprise Linux AS release 4 (Nahant Update 7)

---------------------------------------------------- 1 -----
#0  0x0000003e72507c3c in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
(gdb) bt
#0  0x0000003e72507c3c in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#1  0x0000003e72ccd647 in std::for_each<__gnu_cxx::__normal_iterator<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus**, std::vector<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus*, std::allocator<qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::ThreadStatus*> > >, qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::AllThreadsStatuses::handleAdder> ()
   from /usr/lib64/libqpidcommon.so.0
#2  0x0000003e72ccdd22 in qpid::sys::DeletionManager<qpid::sys::PollerHandlePrivate>::markForDeletion () from /usr/lib64/libqpidcommon.so.0
#3  0x0000003e72ccb465 in qpid::sys::PollerHandle::~PollerHandle$base () from /usr/lib64/libqpidcommon.so.0
#4  0x0000003e72d1ac47 in qpid::sys::DispatchHandle::~DispatchHandle$base () from /usr/lib64/libqpidcommon.so.0
#5  0x0000003e72cb54ee in qpid::sys::posix::AsynchIO::~AsynchIO$delete () from /usr/lib64/libqpidcommon.so.0
#6  0x0000003e72d18724 in qpid::sys::DispatchHandle::doDelete () from /usr/lib64/libqpidcommon.so.0
#7  0x0000003e73589550 in qpid::client::TCPConnector::run () from /usr/lib64/libqpidclient.so.0
#8  0x0000003e72cc079a in qpid::sys::(anonymous namespace)::runRunnable () from /usr/lib64/libqpidcommon.so.0
#9  0x0000003e72506137 in start_thread () from /lib64/tls/libpthread.so.0
#10 0x0000003e718c9883 in clone () from /lib64/tls/libc.so.6
(gdb)

---------------------------------------------------- 2 -----

#0  0x0000003e72d0ec0f in std::for_each<boost::void_ptr_iterator<__gnu_cxx::__normal_iterator<void**, std::vector<void*, std::allocator<void*> > >, qpid::log::Logger::Output>, boost::_bi::bind_t<void, boost::_mfi::mf2<void, qpid::log::Logger::Output, qpid::log::Statement const&, std::string const&>, boost::_bi::list3<boost::arg<1>, boost::_bi::value<qpid::log::Statement>, boost::_bi::value<std::string> > > > ()
   from /usr/lib64/libqpidcommon.so.0
(gdb) bt
#0  0x0000003e72d0ec0f in std::for_each<boost::void_ptr_iterator<__gnu_cxx::__normal_iterator<void**, std::vector<void*, std::allocator<void*> > >, qpid::log::Logger::Output>, boost::_bi::bind_t<void, boost::_mfi::mf2<void, qpid::log::Logger::Output, qpid::log::Statement const&, std::string const&>, boost::_bi::list3<boost::arg<1>, boost::_bi::value<qpid::log::Statement>, boost::_bi::value<std::string> > > > ()
   from /usr/lib64/libqpidcommon.so.0
#1  0x0000003e72d0c44b in qpid::log::Logger::log () from /usr/lib64/libqpidcommon.so.0
#2  0x0000003e72d131e6 in qpid::log::Statement::log () from /usr/lib64/libqpidcommon.so.0
#3  0x0000003e72cd097d in qpid::Exception::Exception () from /usr/lib64/libqpidcommon.so.0
#4  0x0000003e735823f3 in qpid::sys::Mutex::~Mutex () from /usr/lib64/libqpidclient.so.0
#5  0x0000003e72d0bacf in qpid::log::Logger::~Logger () from /usr/lib64/libqpidcommon.so.0
#6  0x0000003e71830f95 in __cxa_finalize () from /lib64/tls/libc.so.6
#7  0x0000003e72c2fb93 in ?? () from /usr/lib64/libqpidcommon.so.0
#8  0x0000007fbffff830 in ?? ()
#9  0x0000003e72d1c021 in _fini () from /usr/lib64/libqpidcommon.so.0
#10 0x0000000000000017 in ?? ()
#11 0x0000003e7160b77b in _dl_fini () from /lib64/ld-linux-x86-64.so.2
Previous frame inner to this frame (corrupt stack?)

---------------------------------------------------- 3 -----

#0  0x0000003e72d0ebfc in std::for_each<boost::void_ptr_iterator<__gnu_cxx::__normal_iterator<void**, std::vector<void*, std::allocator<void*> > >, qpid::log::Logger::Output>, boost::_bi::bind_t<void, boost::_mfi::mf2<void, qpid::log::Logger::Output, qpid::log::Statement const&, std::string const&>, boost::_bi::list3<boost::arg<1>, boost::_bi::value<qpid::log::Statement>, boost::_bi::value<std::string> > > > ()
   from /usr/lib64/libqpidcommon.so.0
(gdb) bt
#0  0x0000003e72d0ebfc in std::for_each<boost::void_ptr_iterator<__gnu_cxx::__normal_iterator<void**, std::vector<void*, std::allocator<void*> > >, qpid::log::Logger::Output>, boost::_bi::bind_t<void, boost::_mfi::mf2<void, qpid::log::Logger::Output, qpid::log::Statement const&, std::string const&>, boost::_bi::list3<boost::arg<1>, boost::_bi::value<qpid::log::Statement>, boost::_bi::value<std::string> > > > ()
   from /usr/lib64/libqpidcommon.so.0
#1  0x0000003e72d0c44b in qpid::log::Logger::log () from /usr/lib64/libqpidcommon.so.0
#2  0x0000003e72d131e6 in qpid::log::Statement::log () from /usr/lib64/libqpidcommon.so.0
#3  0x0000003e72cd105d in qpid::Exception::Exception$base () from /usr/lib64/libqpidcommon.so.0
#4  0x0000003e735788d9 in qpid::client::ConnectionImpl::shutdown () from /usr/lib64/libqpidclient.so.0
#5  0x0000003e7358f641 in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::client::TCPConnector, qpid::sys::AsynchIO&>, boost::_bi::list2<boost::_bi::value<qpid::client::TCPConnector*>, boost::arg<1> > >, void, qpid::sys::AsynchIO&>::invoke () from /usr/lib64/libqpidclient.so.0
#6  0x0000003e72cbccc7 in boost::function1<void, qpid::sys::AsynchIO&, std::allocator<boost::function_base> >::operator() ()
   from /usr/lib64/libqpidcommon.so.0
#7  0x0000003e72cbc464 in qpid::sys::posix::AsynchIO::readable () from /usr/lib64/libqpidcommon.so.0
#8  0x0000003e72cbe3f1 in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, qpid::sys::DispatchHandle&>, boost::_bi::list2<boost::_bi::value<qpid::sys::posix::AsynchIO*>, boost::arg<1> > >, void, qpid::sys::DispatchHandle&>::invoke () from /usr/lib64/libqpidcommon.so.0
#9  0x0000003e72d1af37 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() ()
   from /usr/lib64/libqpidcommon.so.0
#10 0x0000003e72d17a4c in qpid::sys::DispatchHandle::processEvent () from /usr/lib64/libqpidcommon.so.0
#11 0x0000003e72d171b2 in qpid::sys::Dispatcher::run () from /usr/lib64/libqpidcommon.so.0
#12 0x0000003e73589543 in qpid::client::TCPConnector::run () from /usr/lib64/libqpidclient.so.0
#13 0x0000003e72cc079a in qpid::sys::(anonymous namespace)::runRunnable () from /usr/lib64/libqpidcommon.so.0
#14 0x0000003e72506137 in start_thread () from /lib64/tls/libpthread.so.0
#15 0x0000003e718c9883 in clone () from /lib64/tls/libc.so.6



cores from RHEL5.2 x86_64 (dhcp-lab-200.englab.brq.redhat.com) run:
  Red Hat Enterprise Linux Client release 5.2 (Tikanga)

---------------------------------------------------- 1-4 ----- [4 occurences of the same one]

Core was generated by `perftest --log-enable debug+'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000003db8bb43d8 in qpid::log::Logger::log () from /usr/lib64/libqpidcommon.so.0
(gdb) bt
#0  0x0000003db8bb43d8 in qpid::log::Logger::log () from /usr/lib64/libqpidcommon.so.0
#1  0x0000003db8bba80c in qpid::log::Statement::log () from /usr/lib64/libqpidcommon.so.0
#2  0x0000003db8b80a98 in qpid::Exception::Exception () from /usr/lib64/libqpidcommon.so.0
#3  0x0000003db905cd74 in qpid::sys::Mutex::~Mutex () from /usr/lib64/libqpidclient.so.0
#4  0x0000003db8bb4dd3 in qpid::log::Logger::~Logger () from /usr/lib64/libqpidcommon.so.0
#5  0x000000350b03328e in __cxa_finalize () from /lib64/libc.so.6
#6  0x0000003db8aefbc6 in qpid::framing::TransferContent::getData () from /usr/lib64/libqpidcommon.so.0
#7  0x000000350a21b000 in ?? () from /lib64/ld-linux-x86-64.so.2
#8  0x0000000000000000 in ?? ()


cores also observed on RHEL5.2 x86_64 (nec-em21.rhts.bos.redhat.com) run:
but backtrace failed...

Ex:
Program terminated with signal 11, Segmentation fault.
#0  0x00002ad2102f43d8 in ?? ()
(gdb) bt
#0  0x00002ad2102f43d8 in ?? ()
#1  0x00000000000000f0 in ?? ()
#2  0x0000000000000001 in ?? ()
#3  0x00002ad2103096c3 in ?? ()
#4  0x000000000000001f in ?? ()
#5  0x00002ad210309720 in ?? ()
#6  0x0000000000000001 in ?? ()
#7  0x00000000056cee48 in ?? ()
#8  0x00000000420b3cc0 in ?? ()
#9  0x00000000420b4380 in ?? ()
#10 0x00000000420b4370 in ?? ()
#11 0x0000003dd66e9518 in VTT for std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> > () from /usr/lib64/libstdc++.so.6
#12 0x0000003dd66e9500 in VTT for std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> > () from /usr/lib64/libstdc++.so.6
#13 0x0000003dd66e9540 in VTT for std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> > () from /usr/lib64/libstdc++.so.6
#14 0x0000000000625e98 in qpid::framing::TransferContent::getData ()
#15 0x0000000000626088 in qpid::framing::TransferContent::getData ()
#16 0x00000000006262b0 in qpid::framing::TransferContent::getData ()
#17 0x00002aaab4000f28 in ?? ()
#18 0x00002aaab4000f28 in ?? ()
#19 0x00002aaab4000f28 in ?? ()
#20 0x00002aaab4000f28 in ?? ()
#21 0x00002aaab4000f8a in ?? ()
#22 0x00002aaab4001128 in ?? ()
#23 0x0000003dd66ed720 in std::ctype<wchar_t>::id () from /usr/lib64/libstdc++.so.6
#24 0x00002ad200000010 in ?? ()
#25 0x00002aaab4000f28 in ?? ()
#26 0x00000000006260b0 in qpid::framing::TransferContent::getData ()
#27 0x0000000000000006 in ?? ()
#28 0x0000000000000000 in ?? ()

Comment 10 Andrew Stitcher 2008-12-03 16:46:30 UTC
I reproduced this bug (or a version of it) and the cause is that the client exits without joining the thread that is running the dispatch loop.

The abort happens (in the case I saw) because a static destructor has run in the main thread for something that is just about to be used in the despatch thread.

The fix is making sure that every connection dispatch thread is joined before exiting the main program - probably in connection close.

However this issue can only occur when exiting a client program (or unloading a dlopened library, which is more worrisome). So I've lowered the severity.

Comment 12 Gordon Sim 2010-04-15 22:21:33 UTC
I believe this will also be fixed by r934503 (I tested this case while verifying that fix).

Comment 13 Frantisek Reznicek 2010-10-01 13:50:53 UTC
The issue was retested and there were found another perftest shutdown issues.

Althought the original abort seems to be fixed, there are new ones which might block the issue. At the moment I'm keeping the data for this defect.


There are detected three different issues:
alpha] perftest is crashing (segfault) in malloc_consolidate() from many different paths coming from qpid namespace.
There is some chance that this is not MRG issue, but I'm unclear on that.

beta] perftest is aborting in qpid::client::TCPConnector::socketClosed
      seen on rhel4.8 i386 and also rhel 5.5 i386

gamma] perftest is aborting in qpid::client::Connection::~Connection
       seen on rhel 5.5 x86_64 only (rhel5.5x/dump_core.7979)


More in detail:

alpha] perftest is crashing (segfault) in malloc_consolidate()
--------------------------------------------------------------

case a] ---------------------------------

Core was generated by `qpid-perftest --base-name q8503 --log-enable info+'.
Program terminated with signal 11, Segmentation fault.
...
(gdb) * 1 process 10364  0x00b7bd96 in malloc_consolidate () from /lib/tls/libc.so.6
(gdb)
Thread 1 (process 10364):
#0  0x00b7bd96 in malloc_consolidate () from /lib/tls/libc.so.6
#1  0x00b7cfa7 in _int_malloc () from /lib/tls/libc.so.6
#2  0x00b7ee19 in calloc () from /lib/tls/libc.so.6
#3  0x00539a6f in PR_Calloc () from /usr/lib/libnspr4.so
#4  0x005385ea in PR_RWLock_Unlock () from /usr/lib/libnspr4.so
#5  0x0053ebe5 in PR_KillProcess () from /usr/lib/libnspr4.so
#6  0x0054cf5c in PR_GetCurrentThread () from /usr/lib/libnspr4.so
#7  0x0053df37 in PR_SetError () from /usr/lib/libnspr4.so
#8  0x046124bc in PORT_SetError_Util () from /usr/lib/libnssutil3.so
#9  0x0462f054 in NSS_Shutdown () from /usr/lib/libnss3.so
#10 0x00242d5f in qpid::sys::ssl::shutdownNSS () at qpid/sys/ssl/util.cpp:117
#11 0x0022099d in __tcf_1 () at qpid/client/SslConnector.cpp:164
#12 0x00b44697 in exit () from /lib/tls/libc.so.6
#13 0x00b2ee9d in __libc_start_main () from /lib/tls/libc.so.6
#14 0x0804f291 in _start ()

case b] ---------------------------------

Core was generated by `qpid-perftest --base-name q30 --log-enable info+'.
Program terminated with signal 11, Segmentation fault.

(gdb)   4 process 15328  0x00b7cca9 in free () from /lib/tls/libc.so.6
  3 process 15334  0x00b007a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  2 process 15339  0x00b007a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
* 1 process 15347  0x00b7bd96 in malloc_consolidate () from /lib/tls/libc.so.6
Thread 1 (process 15347):
#0  0x00b7bd96 in malloc_consolidate () from /lib/tls/libc.so.6
#1  0x00b7c56e in _int_free () from /lib/tls/libc.so.6
#2  0x00b7cc9a in free () from /lib/tls/libc.so.6
#3  0x00718041 in operator delete () from /usr/lib/libstdc++.so.6
#4  0x0071808d in operator delete[] () from /usr/lib/libstdc++.so.6
#5  0x001e5baa in ~Buff (this=0xb7400010) at qpid/client/TCPConnector.cpp:51
#6  0x003ae166 in std::for_each<std::_Deque_iterator<qpid::sys::AsynchIOBufferBase*, qpid::sys::AsynchIOBufferBase*&, qpid::sys::AsynchIOBufferBase
**>, qpid::sys::posix::deleter> (__first=
      {_M_cur = 0x8738558, _M_first = 0x87384e8, _M_last = 0x87386e8, _M_node = 0x873166c}, __last=
      {_M_cur = 0xb7443378, _M_first = 0xb7401bc0, _M_last = 0x0, _M_node = 0x0}) at qpid/sys/posix/AsynchIO.cpp:299
#7  0x003a6d55 in ~AsynchIO (this=0x8737ca8)
    at /usr/lib/gcc/i386-redhat-linux/3.4.6/../../../../include/c++/3.4.6/bits/stl_deque.h:761
#8  0x004af400 in qpid::sys::DispatchHandle::processEvent (this=0x8737cac,
    type=qpid::sys::Poller::DISCONNECTED) at qpid/sys/DispatchHandle.cpp:342
#9  0x003bfff9 in qpid::sys::Poller::run (this=0x85308e8)
    at ./qpid/sys/Poller.h:125
#10 0x003b1571 in qpid::sys::(anonymous namespace)::runRunnable (p=0xb7400048)
    at qpid/sys/posix/Thread.cpp:35
#11 0x00d7f5cc in start_thread () from /lib/tls/libpthread.so.0
#12 0x00be5f8e in clone () from /lib/tls/libc.so.6

case c] ---------------------------------

Core was generated by `qpid-perftest --base-name q4611 --log-enable info+'.
Program terminated with signal 11, Segmentation fault.

(gdb)   4 process 17533  0x00b007a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  3 process 17543  0x00b007a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  2 process 17546  0x00b007a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
* 1 process 17529  0x00b7bd96 in malloc_consolidate () from /lib/tls/libc.so.6
(gdb)
Thread 1 (process 17529):
#0  0x00b7bd96 in malloc_consolidate () from /lib/tls/libc.so.6
#1  0x00b7c56e in _int_free () from /lib/tls/libc.so.6
#2  0x00b7cc9a in free () from /lib/tls/libc.so.6
#3  0x00a81e87 in sasl_dispose () from /usr/lib/libsasl2.so.2
#4  0x001b803d in ~CyrusSasl (this=0xb7402580)
    at qpid/client/SaslFactory.cpp:220
#5  0x00176024 in qpid::client::ConnectionHandler::~ConnectionHandler ()
   from /usr/lib/libqpidclient.so.3
#6  0x0017e3fd in ~ConnectionImpl (this=0xb7401f28)
    at /usr/include/boost/checked_delete.hpp:34
#7  0x001774a3 in qpid::client::ConnectionImpl::release (this=0xb7401f28)
    at qpid/client/ConnectionImpl.cpp:389
#8  0x0018635d in boost::detail::sp_counted_impl_pd<qpid::client::ConnectionImpl*, boost::_bi::bind_t<void, boost::_mfi::mf0<void, qpid::client::Co
nnectionImpl>, boost::_bi::list1<boost::arg<1> > > >::dispose (this=0xb74021f0)
    at /usr/include/boost/bind/mem_fn_template.hpp:45
#9  0x0016735e in ~Connection (this=0xb7401b38)
    at ./boost/detail/sp_counted_base_gcc_x86.hpp:145
#10 0x08064a00 in ~SubscribeThread (this=0xb7401b30)
    at ../../include/qpid/client/AsyncSession_0_10.h:553
#11 0x080505a1 in main (argc=5, argv=0xbfe41554)
    at /usr/include/boost/checked_delete.hpp:34
#12 0x00b2ee93 in __libc_start_main () from /lib/tls/libc.so.6
#13 0x0804f291 in _start ()

case d] ---------------------------------

Core was generated by `qpid-perftest --base-name q6601 --log-enable info+'.
Program terminated with signal 11, Segmentation fault.
Thread 1 (process 12848):
#0  0x000000326bc688fa in malloc_consolidate () from /lib64/tls/libc.so.6
#1  0x000000326bc6914f in _int_free () from /lib64/tls/libc.so.6
#2  0x000000326bc69846 in free () from /lib64/tls/libc.so.6
#3  0x000000326feae29e in operator delete () from /usr/lib64/libstdc++.so.6
#4  0x000000326d1e5e48 in ~Buff (this=0x65cfd0)
    at qpid/client/TCPConnector.cpp:51
#5  0x000000326ce5d414 in std::for_each<std::_Deque_iterator<qpid::sys::AsynchIOBufferBase*, qpid::sys::AsynchIOBufferBase*&, qpid::sys::AsynchIOBu
fferBase**>, qpid::sys::posix::deleter> (__first=
      {_M_cur = 0x65e430, _M_first = 0x65e3e0, _M_last = 0x65e5e0, _M_node = 0x65cf08}, __last=
      {_M_cur = 0x65e528, _M_first = 0x65e3e0, _M_last = 0x65e5e0, _M_node = 0x65cf08}) at qpid/sys/posix/AsynchIO.cpp:299
#6  0x000000326ce56b06 in ~AsynchIO (this=0x65dd50)
    at /usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../../include/c++/3.4.6/bits/stl_deque.h:761
#7  0x000000326cf47f4a in qpid::sys::DispatchHandle::processEvent (
    this=0x65dd58, type=qpid::sys::Poller::DISCONNECTED)
    at qpid/sys/DispatchHandle.cpp:342
#8  0x000000326ce6c2ef in qpid::sys::Poller::run (this=0x65b450)
    at ./qpid/sys/Poller.h:125
#9  0x000000326ce6008a in qpid::sys::(anonymous namespace)::runRunnable (
    p=0x326be35680) at qpid/sys/posix/Thread.cpp:35
#10 0x000000326c906317 in start_thread () from /lib64/tls/libpthread.so.0
#11 0x000000326bcc9d83 in clone () from /lib64/tls/libc.so.6

....

beta] perftest is aborting in qpid::client::TCPConnector::socketClosed
seen on rhel4.8 i386 and also rhel 5.5 i386
--------------------------------------------------------------
Core was generated by `qpid-perftest --base-name q6651 --log-enable info+'.
Program terminated with signal 6, Aborted.
(gdb)   3 process 15115  0x00b007a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  2 process 15118  0x00b007a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
* 1 process 15119  0x00b007a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb)
Thread 1 (process 15119):
#0  0x00b007a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00b41915 in raise () from /lib/tls/libc.so.6
#2  0x00b43379 in abort () from /lib/tls/libc.so.6
#3  0x0071b23e in __gnu_cxx::__verbose_terminate_handler ()
   from /usr/lib/libstdc++.so.6
#4  0x00718ed1 in __cxa_call_unexpected () from /usr/lib/libstdc++.so.6
#5  0x00718f06 in std::terminate () from /usr/lib/libstdc++.so.6
#6  0x007195b5 in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#7  0x001dc466 in qpid::client::TCPConnector::socketClosed (this=0x9f671a8)
    at qpid/client/TCPConnector.cpp:155
#8  0x001e867f in boost::detail::function::void_function_obj_invoker2<boost::_bi::bind_t<void, boost::_mfi::mf2<void, qpid::client::TCPConnector, q
pid::sys::AsynchIO&, qpid::sys::Socket const&>, boost::_bi::list3<boost::_bi::value<qpid::client::TCPConnector*>, boost::arg<1>, boost::arg<2> > >,
 void, qpid::sys::AsynchIO&, qpid::sys::Socket const&>::invoke (function_obj_ptr=
      {obj_ptr = 0x9f67de8, const_obj_ptr = 0x9f67de8, func_ptr = 0x9f67de8, data = "."}, a0=@0x9f67ca0, a1=@0x9f6721c)
    at /usr/include/boost/bind/mem_fn_template.hpp:252
...

gamma] perftest is aborting in qpid::client::Connection::~Connection
seen on rhel 5.5 x86_64 only (rhel5.5x/dump_core.7979)
--------------------------------------------------------------
Core was generated by `qpid-perftest --base-name q7700 --log-enable info+'.
Program terminated with signal 6, Aborted.
(gdb)   4 Thread 7985  0x0000003b9180d4c4 in __lll_lock_wait ()
   from /lib64/libpthread.so.0
  3 Thread 7995  0x0000003b90cd4108 in epoll_wait () from /lib64/libc.so.6
  2 Thread 7997  0x0000003b90cd4108 in epoll_wait () from /lib64/libc.so.6
* 1 Thread 7979  0x0000003b90c30265 in raise () from /lib64/libc.so.6
(gdb)
...
Thread 1 (Thread 7979):
#0  0x0000003b90c30265 in raise () from /lib64/libc.so.6
#1  0x0000003b90c31d10 in abort () from /lib64/libc.so.6
#2  0x0000003b90c6a84b in __libc_message () from /lib64/libc.so.6
#3  0x0000003b90c72388 in _int_free () from /lib64/libc.so.6
#4  0x0000003b90c7276b in free () from /lib64/libc.so.6
#5  0x0000003b93256cdb in release (this=0x1b857950,
    __in_chrg=<value optimized out>)
    at /usr/include/boost/detail/sp_counted_base_gcc_x86.hpp:145
#6  ~shared_count (this=0x1b857950, __in_chrg=<value optimized out>)
    at /usr/include/boost/detail/shared_count.hpp:159
#7  ~shared_ptr (this=0x1b857950, __in_chrg=<value optimized out>)
    at /usr/include/boost/shared_ptr.hpp:106
#8  qpid::client::Connection::~Connection (this=0x1b857950,
    __in_chrg=<value optimized out>) at qpid/client/Connection.cpp:53
#9  0x000000000040f2c6 in qpid::tests::Client::~Client (this=0x1b857940,
    __in_chrg=<value optimized out>) at qpid-perftest.cpp:260
#10 0x00000000004165e1 in qpid::tests::SubscribeThread::~SubscribeThread (
    this=0x1b857940, __in_chrg=<value optimized out>) at qpid-perftest.cpp:562
#11 0x000000000040cff8 in ~reversible_ptr_container (
    argc=<value optimized out>, argv=<value optimized out>)
    at /usr/include/boost/checked_delete.hpp:34
#12 ~ptr_sequence_adapter (argc=<value optimized out>,
    argv=<value optimized out>)
    at /usr/include/boost/ptr_container/ptr_sequence_adapter.hpp:148
#13 ~ptr_vector (argc=<value optimized out>, argv=<value optimized out>)
    at /usr/include/boost/ptr_container/ptr_vector.hpp:35
#14 main (argc=<value optimized out>, argv=<value optimized out>)
    at qpid-perftest.cpp:740


-> ASSIGNED

Comment 14 Frantisek Reznicek 2010-10-01 13:54:15 UTC
Created attachment 451020 [details]
The backtrace data from stress test.

Th package contains the crash / abort dumps from gdb for all detected cases including the test and test log.

The test is slightly modified original test/reproducer.


The tests were executed on:
python-qmf-0.7.946106-13.el5
python-qpid-0.7.946106-14.el5
qmf-0.7.946106-17.el5
qmf-devel-0.7.946106-17.el5
qpid-cpp-*-0.7.946106-17.el5
qpid-dotnet-0.4.738274-2.el5
qpid-java-*-0.7.946106-10.el5
qpid-tools-0.7.946106-11.el5
ruby-qmf-0.7.946106-17.el5
ruby-qpid-0.7.946106-2.el5

Comment 15 Frantisek Reznicek 2010-10-01 14:21:18 UTC
The note about reproducibility on mrg-qe-0{1,2,3,7}:

alpha] 31 detected cases of about 3000 * 2-6 qpid-perftest runs
       seen on rhel 4.8 / 5.5 i386 / x86_64
beta]  2 detected cases of about 3000 * 2-6 qpid-perftest runs
       seen on rhel 4.8 / 5.5 i386
       rhel48i/dump_core.15115
       rhel55i/dump_core.25011

gamma] 1 detected case of about 3000 * 2-6 qpid-perftest runs
       rhel55x/dump_core.7979

Comment 17 Justin Ross 2013-02-25 01:46:30 UTC
Frantisek, what should we do with this?

Comment 18 Frantisek Reznicek 2013-10-11 11:09:25 UTC
I'm going to retest and update current status.

Comment 21 Red Hat Bugzilla 2023-09-14 01:11:59 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days