Bug 500822

Summary: clustered qpidd segfault in management thread due to unclean shutdown on aborted cluster node
Product: Red Hat Enterprise MRG
Component: qpid-cpp
Version: 1.1.1
Target Milestone: 1.1.2
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Hardware: All
OS: Linux
Doc Type: Bug Fix
Reporter: Frantisek Reznicek <freznice>
Assignee: Ted Ross <tross>
QA Contact: Frantisek Reznicek <freznice>
CC: esammons, gsim, tross
Last Closed: 2009-06-12 17:39:38 UTC
Attachments (flags: none on both):
  bz499872 reproducer, which can be used to reproduce this issue
  Patch (off of svn revision 752581) that addresses a possibly-related shutdown issue

Description Frantisek Reznicek 2009-05-14 12:22:14 UTC
Description of problem:

While trying to trigger bug 499872, I hit another issue: the qpidd brokers do not shut down cleanly.
The configuration is as follows: a 10-node cluster, with every node loading the same full set of plugins (including msgstore), and one instance of the C++ ping client running on one of the cluster nodes.
The qpidd brokers were launched as follows (--mgmt-pub-interval 1 is used to increase the probability of hitting bug 499872):
  qpidd -p ${qpidd_port} --auth no --cluster-name $(hostname)_cluster --log-enable ${loglevel}+ \
        --mgmt-pub-interval 1 --data-dir data_${i}  >qpidd_${i}.log 2>&1

I succeeded in triggering bug 499872, but afterwards 6 of the 10 brokers crashed while shutting down. See the backtraces below. All data is stored here:
mrg3.lab.bos.redhat.com:/root/bz499872_fail_rhel53_x86_64_090514.tar.bz2

Version-Release number of selected component (if applicable):
MRG 1.1.1 release:
[freznice@dhcp-lab-200 bz499872]$ rpm -qa | egrep '(openais|qpid|rhm)' | sort -u
openais-0.80.3-22.el5_3.4
openais-devel-0.80.3-22.el5_3.4
python-qpid-0.5.752581-1.el5
qpidc-0.5.752581-5.el5
qpidc-debuginfo-0.5.752581-5.el5
qpidc-devel-0.5.752581-5.el5
qpidc-rdma-0.5.752581-5.el5
qpidc-ssl-0.5.752581-5.el5
qpidd-0.5.752581-5.el5
qpidd-acl-0.5.752581-5.el5
qpidd-cluster-0.5.752581-5.el5
qpidd-devel-0.5.752581-5.el5
qpid-dotnet-0.4.738274-2.el5
qpidd-rdma-0.5.752581-5.el5
qpidd-ssl-0.5.752581-5.el5
qpidd-xml-0.5.752581-5.el5
qpid-java-client-0.5.751061-2.el5
qpid-java-common-0.5.751061-2.el5
rhm-0.5.3206-1.el5
rhm-docs-0.5.756148-1.el5


How reproducible:
About 20% (bug 499872 reproduced within 30 minutes to 2 hours of running; this issue occurred only rarely)

Steps to Reproduce:
1. Run the attached bz499872 reproducer (bz499872_reproducer.tar.bz2).
2. Repeat until you get core files.
  
Actual results (backtraces):

GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
...
Core was generated by `qpidd -p 10001 --auth no --cluster-name dhcp-lab-200.englab.brq.redhat.com_clus'.
Program terminated with signal 11, Segmentation fault.
[New process 6349]
[New process 6348]
#0  0x000000356407b6ce in memcpy () from /lib64/libc.so.6
(gdb) 
Thread 2 (process 6348):
#0  0x00000035640cc967 in fdatasync () from /lib64/libc.so.6
#1  0x00002adbd44efb5e in __os_fsync () from /usr/lib64/libdb_cxx-4.3.so
#2  0x00002adbd44e3b95 in __log_flush_int () from /usr/lib64/libdb_cxx-4.3.so
#3  0x00002adbd44e3f9a in __log_flush () from /usr/lib64/libdb_cxx-4.3.so
#4  0x00002adbd44fa702 in __txn_dbenv_refresh ()
   from /usr/lib64/libdb_cxx-4.3.so
#5  0x00002adbd44c9d11 in ?? () from /usr/lib64/libdb_cxx-4.3.so
#6  0x00002adbd44ca012 in __dbenv_close () from /usr/lib64/libdb_cxx-4.3.so
#7  0x00002adbd44ca15f in __dbenv_close_pp () from /usr/lib64/libdb_cxx-4.3.so
#8  0x00002adbd416721a in mrg::msgstore::MessageStoreImpl::~MessageStoreImpl ()
   from /usr/lib64/qpid/daemon/msgstore.so
#9  0x00000036787421db in ~MessageStoreModule (this=<value optimized out>)
    at qpid/broker/MessageStoreModule.cpp:39
#10 0x00000036786c9424 in ~Broker (this=<value optimized out>)
    at /usr/include/c++/4.1.2/memory:259
#11 0x000000356403363e in __cxa_finalize () from /lib64/libc.so.6
#12 0x0000003678688506 in __do_global_dtors_aux ()
   from /usr/lib64/libqpidbroker.so.0
#13 0x0000000000000000 in ?? ()

Thread 1 (process 6349):
#0  0x000000356407b6ce in memcpy () from /lib64/libc.so.6
#1  0x0000003569c9cfa8 in std::string::append () from /usr/lib64/libstdc++.so.6
#2  0x000000000040a7df in std::operator+<char, std::char_traits<char>, std::allocator<char> > (__lhs=0x36787b3e9f "console.obj.1.0.", __rhs=@0x3678a040a8)
    at /usr/include/c++/4.1.2/bits/basic_string.tcc:683
#3  0x00000036787a1e40 in qpid::management::ManagementBroker::periodicProcessing (this=<value optimized out>) at qpid/management/ManagementBroker.cpp:390
#4  0x00000036787a2608 in qpid::management::ManagementBroker::Periodic::fire (
    this=<value optimized out>) at qpid/management/ManagementBroker.cpp:252
#5  0x000000367878797a in qpid::broker::Timer::run (this=<value optimized out>)
    at qpid/broker/Timer.cpp:67
#6  0x000000367816c76a in runRunnable (p=<value optimized out>)
    at qpid/sys/posix/Thread.cpp:35
#7  0x0000003564c06367 in start_thread () from /lib64/libpthread.so.0
#8  0x00000035640d30ad in clone () from /lib64/libc.so.6
(gdb) quit
Core was generated by `qpidd -p 10002 --auth no --cluster-name dhcp-lab-200.englab.brq.redhat.com_clus'.
Program terminated with signal 11, Segmentation fault.
[New process 6368]
[New process 6367]
#0  0x000000356407b7ec in memcpy () from /lib64/libc.so.6
(gdb) 
Thread 2 (process 6367):
#0  0x00000035640cc967 in fdatasync () from /lib64/libc.so.6
#1  0x00002afc404f5b5e in __os_fsync () from /usr/lib64/libdb_cxx-4.3.so
#2  0x00002afc404e9b95 in __log_flush_int () from /usr/lib64/libdb_cxx-4.3.so
#3  0x00002afc404e9f9a in __log_flush () from /usr/lib64/libdb_cxx-4.3.so
#4  0x00002afc40500702 in __txn_dbenv_refresh ()
   from /usr/lib64/libdb_cxx-4.3.so
#5  0x00002afc404cfd11 in ?? () from /usr/lib64/libdb_cxx-4.3.so
#6  0x00002afc404d0012 in __dbenv_close () from /usr/lib64/libdb_cxx-4.3.so
#7  0x00002afc404d015f in __dbenv_close_pp () from /usr/lib64/libdb_cxx-4.3.so
#8  0x00002afc4016d21a in mrg::msgstore::MessageStoreImpl::~MessageStoreImpl ()
   from /usr/lib64/qpid/daemon/msgstore.so
#9  0x00000036787421db in ~MessageStoreModule (this=<value optimized out>)
    at qpid/broker/MessageStoreModule.cpp:39
#10 0x00000036786c9424 in ~Broker (this=<value optimized out>)
    at /usr/include/c++/4.1.2/memory:259
#11 0x000000356403363e in __cxa_finalize () from /lib64/libc.so.6
#12 0x0000003678688506 in __do_global_dtors_aux ()
   from /usr/lib64/libqpidbroker.so.0
#13 0x0000000000000000 in ?? ()

Thread 1 (process 6368):
#0  0x000000356407b7ec in memcpy () from /lib64/libc.so.6
#1  0x0000003569c9cfa8 in std::string::append () from /usr/lib64/libstdc++.so.6
#2  0x000000000040a7df in std::operator+<char, std::char_traits<char>, std::allocator<char> > (__lhs=0x36787b3e9f "console.obj.1.0.", __rhs=@0x3678a040a8)
    at /usr/include/c++/4.1.2/bits/basic_string.tcc:683
#3  0x00000036787a1e40 in qpid::management::ManagementBroker::periodicProcessing (this=<value optimized out>) at qpid/management/ManagementBroker.cpp:390
#4  0x00000036787a2608 in qpid::management::ManagementBroker::Periodic::fire (
    this=<value optimized out>) at qpid/management/ManagementBroker.cpp:252
#5  0x000000367878797a in qpid::broker::Timer::run (this=<value optimized out>)
    at qpid/broker/Timer.cpp:67
#6  0x000000367816c76a in runRunnable (p=<value optimized out>)
    at qpid/sys/posix/Thread.cpp:35
#7  0x0000003564c06367 in start_thread () from /lib64/libpthread.so.0
#8  0x00000035640d30ad in clone () from /lib64/libc.so.6
(gdb) quit
Core was generated by `qpidd -p 10003 --auth no --cluster-name dhcp-lab-200.englab.brq.redhat.com_clus'.
Program terminated with signal 11, Segmentation fault.
[New process 6387]
[New process 6393]
[New process 6386]
#0  0x000000356407b6ce in memcpy () from /lib64/libc.so.6
(gdb) 
Thread 3 (process 6386):
#0  0x00000035640c56db in write () from /lib64/libc.so.6
#1  0x00002b398b16def5 in __os_write () from /usr/lib64/libdb_cxx-4.3.so
#2  0x00002b398b15fbb7 in ?? () from /usr/lib64/libdb_cxx-4.3.so
#3  0x00002b398b160b2b in __log_flush_int () from /usr/lib64/libdb_cxx-4.3.so
#4  0x00002b398b160f9a in __log_flush () from /usr/lib64/libdb_cxx-4.3.so
#5  0x00002b398b16b33f in __memp_sync_int () from /usr/lib64/libdb_cxx-4.3.so
#6  0x00002b398b124d2b in __db_sync () from /usr/lib64/libdb_cxx-4.3.so
#7  0x00002b398b1241af in __db_refresh () from /usr/lib64/libdb_cxx-4.3.so
#8  0x00002b398b12430e in __db_close () from /usr/lib64/libdb_cxx-4.3.so
#9  0x00002b398b133440 in __db_close_pp () from /usr/lib64/libdb_cxx-4.3.so
#10 0x00002b398b0c247c in Db::close () from /usr/lib64/libdb_cxx-4.3.so
#11 0x00002b398ade4050 in mrg::msgstore::MessageStoreImpl::~MessageStoreImpl ()
   from /usr/lib64/qpid/daemon/msgstore.so
#12 0x00000036787421db in ~MessageStoreModule (this=<value optimized out>)
    at qpid/broker/MessageStoreModule.cpp:39
#13 0x00000036786c9424 in ~Broker (this=<value optimized out>)
    at /usr/include/c++/4.1.2/memory:259
#14 0x000000356403363e in __cxa_finalize () from /lib64/libc.so.6
#15 0x0000003678688506 in __do_global_dtors_aux ()
   from /usr/lib64/libqpidbroker.so.0
#16 0x0000000000000000 in ?? ()

Thread 2 (process 6393):
#0  0x0000003564c0ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003678787cbf in qpid::broker::Timer::run (this=<value optimized out>)
    at qpid/sys/posix/Condition.h:69
#2  0x000000367816c76a in runRunnable (p=<value optimized out>)
    at qpid/sys/posix/Thread.cpp:35
#3  0x0000003564c06367 in start_thread () from /lib64/libpthread.so.0
#4  0x00000035640d30ad in clone () from /lib64/libc.so.6

Thread 1 (process 6387):
#0  0x000000356407b6ce in memcpy () from /lib64/libc.so.6
#1  0x0000003569c9cfa8 in std::string::append () from /usr/lib64/libstdc++.so.6
#2  0x000000000040a7df in std::operator+<char, std::char_traits<char>, std::allocator<char> > (__lhs=0x36787b3e9f "console.obj.1.0.", __rhs=@0x3678a040a8)
    at /usr/include/c++/4.1.2/bits/basic_string.tcc:683
#3  0x00000036787a1e40 in qpid::management::ManagementBroker::periodicProcessing (this=<value optimized out>) at qpid/management/ManagementBroker.cpp:390
#4  0x00000036787a2608 in qpid::management::ManagementBroker::Periodic::fire (
    this=<value optimized out>) at qpid/management/ManagementBroker.cpp:252
#5  0x000000367878797a in qpid::broker::Timer::run (this=<value optimized out>)
    at qpid/broker/Timer.cpp:67
#6  0x000000367816c76a in runRunnable (p=<value optimized out>)
    at qpid/sys/posix/Thread.cpp:35
#7  0x0000003564c06367 in start_thread () from /lib64/libpthread.so.0
#8  0x00000035640d30ad in clone () from /lib64/libc.so.6
(gdb) quit
Core was generated by `qpidd -p 10004 --auth no --cluster-name dhcp-lab-200.englab.brq.redhat.com_clus'.
Program terminated with signal 11, Segmentation fault.
[New process 6415]
[New process 6419]
[New process 6414]
#0  0x000000356407b6ce in memcpy () from /lib64/libc.so.6
(gdb) 
Thread 3 (process 6414):
#0  0x00000035640c56db in write () from /lib64/libc.so.6
#1  0x00002afa9e005ef5 in __os_write () from /usr/lib64/libdb_cxx-4.3.so
#2  0x00002afa9dff7bb7 in ?? () from /usr/lib64/libdb_cxx-4.3.so
#3  0x00002afa9dff8b2b in __log_flush_int () from /usr/lib64/libdb_cxx-4.3.so
#4  0x00002afa9dff8f9a in __log_flush () from /usr/lib64/libdb_cxx-4.3.so
#5  0x00002afa9e00333f in __memp_sync_int () from /usr/lib64/libdb_cxx-4.3.so
#6  0x00002afa9dfbcd2b in __db_sync () from /usr/lib64/libdb_cxx-4.3.so
#7  0x00002afa9dfbc1af in __db_refresh () from /usr/lib64/libdb_cxx-4.3.so
#8  0x00002afa9dfbc30e in __db_close () from /usr/lib64/libdb_cxx-4.3.so
#9  0x00002afa9dfcb440 in __db_close_pp () from /usr/lib64/libdb_cxx-4.3.so
#10 0x00002afa9df5a47c in Db::close () from /usr/lib64/libdb_cxx-4.3.so
#11 0x00002afa9dc7c050 in mrg::msgstore::MessageStoreImpl::~MessageStoreImpl ()
   from /usr/lib64/qpid/daemon/msgstore.so
#12 0x00000036787421db in ~MessageStoreModule (this=<value optimized out>)
    at qpid/broker/MessageStoreModule.cpp:39
#13 0x00000036786c9424 in ~Broker (this=<value optimized out>)
    at /usr/include/c++/4.1.2/memory:259
#14 0x000000356403363e in __cxa_finalize () from /lib64/libc.so.6
#15 0x0000003678688506 in __do_global_dtors_aux ()
   from /usr/lib64/libqpidbroker.so.0
#16 0x0000000000000000 in ?? ()

Thread 2 (process 6419):
#0  0x0000003564c0ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000003678787cbf in qpid::broker::Timer::run (this=<value optimized out>)
    at qpid/sys/posix/Condition.h:69
#2  0x000000367816c76a in runRunnable (p=<value optimized out>)
    at qpid/sys/posix/Thread.cpp:35
#3  0x0000003564c06367 in start_thread () from /lib64/libpthread.so.0
#4  0x00000035640d30ad in clone () from /lib64/libc.so.6

Thread 1 (process 6415):
#0  0x000000356407b6ce in memcpy () from /lib64/libc.so.6
#1  0x0000003569c9cfa8 in std::string::append () from /usr/lib64/libstdc++.so.6
#2  0x000000000040a7df in std::operator+<char, std::char_traits<char>, std::allocator<char> > (__lhs=0x36787b3e9f "console.obj.1.0.", __rhs=@0x3678a040a8)
    at /usr/include/c++/4.1.2/bits/basic_string.tcc:683
#3  0x00000036787a1e40 in qpid::management::ManagementBroker::periodicProcessing (this=<value optimized out>) at qpid/management/ManagementBroker.cpp:390
#4  0x00000036787a2608 in qpid::management::ManagementBroker::Periodic::fire (
    this=<value optimized out>) at qpid/management/ManagementBroker.cpp:252
#5  0x000000367878797a in qpid::broker::Timer::run (this=<value optimized out>)
    at qpid/broker/Timer.cpp:67
#6  0x000000367816c76a in runRunnable (p=<value optimized out>)
    at qpid/sys/posix/Thread.cpp:35
#7  0x0000003564c06367 in start_thread () from /lib64/libpthread.so.0
#8  0x00000035640d30ad in clone () from /lib64/libc.so.6
(gdb) quit
Core was generated by `qpidd -p 10005 --auth no --cluster-name dhcp-lab-200.englab.brq.redhat.com_clus'.
Program terminated with signal 11, Segmentation fault.
[New process 6434]
[New process 6433]
#0  0x000000356407b7ec in memcpy () from /lib64/libc.so.6
(gdb) 
Thread 2 (process 6433):
#0  0x00000035640cc967 in fdatasync () from /lib64/libc.so.6
#1  0x00002b3f14c23b5e in __os_fsync () from /usr/lib64/libdb_cxx-4.3.so
#2  0x00002b3f14c17b95 in __log_flush_int () from /usr/lib64/libdb_cxx-4.3.so
#3  0x00002b3f14c17f9a in __log_flush () from /usr/lib64/libdb_cxx-4.3.so
#4  0x00002b3f14c2e702 in __txn_dbenv_refresh ()
   from /usr/lib64/libdb_cxx-4.3.so
#5  0x00002b3f14bfdd11 in ?? () from /usr/lib64/libdb_cxx-4.3.so
#6  0x00002b3f14bfe012 in __dbenv_close () from /usr/lib64/libdb_cxx-4.3.so
#7  0x00002b3f14bfe15f in __dbenv_close_pp () from /usr/lib64/libdb_cxx-4.3.so
#8  0x00002b3f1489b21a in mrg::msgstore::MessageStoreImpl::~MessageStoreImpl ()
   from /usr/lib64/qpid/daemon/msgstore.so
#9  0x00000036787421db in ~MessageStoreModule (this=<value optimized out>)
    at qpid/broker/MessageStoreModule.cpp:39
#10 0x00000036786c9424 in ~Broker (this=<value optimized out>)
    at /usr/include/c++/4.1.2/memory:259
#11 0x000000356403363e in __cxa_finalize () from /lib64/libc.so.6
#12 0x0000003678688506 in __do_global_dtors_aux ()
   from /usr/lib64/libqpidbroker.so.0
#13 0x0000000000000000 in ?? ()

Thread 1 (process 6434):
#0  0x000000356407b7ec in memcpy () from /lib64/libc.so.6
#1  0x0000003569c9cfa8 in std::string::append () from /usr/lib64/libstdc++.so.6
#2  0x000000000040a7df in std::operator+<char, std::char_traits<char>, std::allocator<char> > (__lhs=0x36787b3e9f "console.obj.1.0.", __rhs=@0x3678a040a8)
    at /usr/include/c++/4.1.2/bits/basic_string.tcc:683
#3  0x00000036787a1e40 in qpid::management::ManagementBroker::periodicProcessing (this=<value optimized out>) at qpid/management/ManagementBroker.cpp:390
#4  0x00000036787a2608 in qpid::management::ManagementBroker::Periodic::fire (
    this=<value optimized out>) at qpid/management/ManagementBroker.cpp:252
#5  0x000000367878797a in qpid::broker::Timer::run (this=<value optimized out>)
    at qpid/broker/Timer.cpp:67
#6  0x000000367816c76a in runRunnable (p=<value optimized out>)
    at qpid/sys/posix/Thread.cpp:35
#7  0x0000003564c06367 in start_thread () from /lib64/libpthread.so.0
#8  0x00000035640d30ad in clone () from /lib64/libc.so.6
(gdb) quit
Core was generated by `qpidd -p 10006 --auth no --cluster-name dhcp-lab-200.englab.brq.redhat.com_clus'.
Program terminated with signal 11, Segmentation fault.
[New process 6455]
[New process 6454]
#0  0x000000356407b7ec in memcpy () from /lib64/libc.so.6
(gdb) 
Thread 2 (process 6454):
#0  0x00000035640cc967 in fdatasync () from /lib64/libc.so.6
#1  0x00002ad5e9426b5e in __os_fsync () from /usr/lib64/libdb_cxx-4.3.so
#2  0x00002ad5e941ab95 in __log_flush_int () from /usr/lib64/libdb_cxx-4.3.so
#3  0x00002ad5e941af9a in __log_flush () from /usr/lib64/libdb_cxx-4.3.so
#4  0x00002ad5e9431702 in __txn_dbenv_refresh ()
   from /usr/lib64/libdb_cxx-4.3.so
#5  0x00002ad5e9400d11 in ?? () from /usr/lib64/libdb_cxx-4.3.so
#6  0x00002ad5e9401012 in __dbenv_close () from /usr/lib64/libdb_cxx-4.3.so
#7  0x00002ad5e940115f in __dbenv_close_pp () from /usr/lib64/libdb_cxx-4.3.so
#8  0x00002ad5e909e21a in mrg::msgstore::MessageStoreImpl::~MessageStoreImpl ()
   from /usr/lib64/qpid/daemon/msgstore.so
#9  0x00000036787421db in ~MessageStoreModule (this=<value optimized out>)
    at qpid/broker/MessageStoreModule.cpp:39
#10 0x00000036786c9424 in ~Broker (this=<value optimized out>)
    at /usr/include/c++/4.1.2/memory:259
#11 0x000000356403363e in __cxa_finalize () from /lib64/libc.so.6
#12 0x0000003678688506 in __do_global_dtors_aux ()
   from /usr/lib64/libqpidbroker.so.0
#13 0x0000000000000000 in ?? ()

Thread 1 (process 6455):
#0  0x000000356407b7ec in memcpy () from /lib64/libc.so.6
#1  0x0000003569c9cfa8 in std::string::append () from /usr/lib64/libstdc++.so.6
#2  0x000000000040a7df in std::operator+<char, std::char_traits<char>, std::allocator<char> > (__lhs=0x36787b3e9f "console.obj.1.0.", __rhs=@0x3678a040a8)
    at /usr/include/c++/4.1.2/bits/basic_string.tcc:683
#3  0x00000036787a1e40 in qpid::management::ManagementBroker::periodicProcessing (this=<value optimized out>) at qpid/management/ManagementBroker.cpp:390
#4  0x00000036787a2608 in qpid::management::ManagementBroker::Periodic::fire (
    this=<value optimized out>) at qpid/management/ManagementBroker.cpp:252
#5  0x000000367878797a in qpid::broker::Timer::run (this=<value optimized out>)
    at qpid/broker/Timer.cpp:67
#6  0x000000367816c76a in runRunnable (p=<value optimized out>)
    at qpid/sys/posix/Thread.cpp:35
#7  0x0000003564c06367 in start_thread () from /lib64/libpthread.so.0
#8  0x00000035640d30ad in clone () from /lib64/libc.so.6
(gdb) quit
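Every backtrace above follows the same pattern. The main thread is inside the global destructors (__cxa_finalize → ~Broker → ~MessageStoreModule → ~MessageStoreImpl). At the same moment, the Timer thread is still firing ManagementBroker::periodicProcessing and concatenating "console.obj.1.0." with another string. One plausible reading is that the periodic thread touches state that is already being torn down.

The C++11 sketch below shows the shutdown ordering that avoids this pattern: the periodic thread is stopped and joined before the data it reads is destroyed. PeriodicTimer, ManagementState, DemoBroker and run_demo are hypothetical names for illustration only. They are not qpid classes, and the sketch does not claim to be the actual qpid fix.

```cpp
#include <atomic>
#include <cassert>  // used by the usage checks
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical periodic timer (NOT qpid::broker::Timer): runs `task` every
// `interval` on its own thread until stop() is called.
class PeriodicTimer {
public:
    PeriodicTimer(std::chrono::milliseconds interval, std::function<void()> task)
        : interval_(interval), task_(std::move(task)), thread_([this] { run(); }) {}

    // Stopping here means the thread is joined before any member declared
    // earlier in the owning object is destroyed.
    ~PeriodicTimer() { stop(); }

    // Wakes the thread and waits for it to exit; safe to call more than once
    // from the owning thread.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        cond_.notify_all();
        if (thread_.joinable()) thread_.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopped_) {
            // true: stop() was requested; false: the interval elapsed.
            if (cond_.wait_for(lock, interval_, [this] { return stopped_; })) break;
            lock.unlock();  // do not hold the lock while the task runs
            task_();
            lock.lock();
        }
    }

    std::chrono::milliseconds interval_;
    std::function<void()> task_;
    std::mutex mutex_;
    std::condition_variable cond_;
    bool stopped_ = false;
    std::thread thread_;  // declared last, so it starts after the members above
};

// Hypothetical stand-in for the data the periodic task reads, echoing the
// "console.obj.1.0." + <string> concatenation seen in Thread 1.
struct ManagementState {
    const std::string prefix = "console.obj.1.0.";
    const std::string suffix = "example";
    std::atomic<int> publishes{0};
    std::string lastKey;  // written by the timer thread only
};

// Members are destroyed in reverse declaration order, so `timer` (declared
// last) is stopped and joined before `state` is destroyed.
struct DemoBroker {
    ManagementState state;
    PeriodicTimer timer{std::chrono::milliseconds(5), [this] {
        state.lastKey = state.prefix + state.suffix;
        ++state.publishes;
    }};
};

struct DemoResult {
    int afterStop;
    int later;
    std::string lastKey;
};

// Lets the timer fire for a while, shuts it down explicitly, then inspects state.
DemoResult run_demo() {
    DemoBroker broker;
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    broker.timer.stop();  // explicit, ordered shutdown step
    DemoResult r;
    r.afterStop = broker.state.publishes.load();
    std::this_thread::sleep_for(std::chrono::milliseconds(30));
    r.later = broker.state.publishes.load();  // unchanged: the thread has exited
    r.lastKey = broker.state.lastKey;         // safe to read after join()
    return r;
}
```

The design point is that the thread is gone before the data it uses. This holds whether stop() is called explicitly, as in run_demo(), or only implicitly through DemoBroker's member destruction order. In the crashes above, by contrast, the Timer thread was still inside periodicProcessing while ~Broker ran from the global destructors.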





Expected results:


Additional info:

Comment 1 Frantisek Reznicek 2009-05-14 12:23:10 UTC
Created attachment 343956 [details]
bz499872 reproducer, which can be used to reproduce this issue

Comment 2 Frantisek Reznicek 2009-05-14 12:30:27 UTC
The full data (core files, detailed qpidd logs, qpidd journals and the reproducer) has just been uploaded to mrg3.lab.bos.redhat.com:/root/bz499872_fail_rhel53_x86_64_090514.tar.bz2.
Use './run.sh 10' to run the reproducer.

Feel free to change the target milestone (currently 1.1.2) if needed.

Comment 3 Gordon Sim 2009-05-22 14:11:29 UTC
Possibly related?

(gdb) bt
#0  0x0000003c52e30215 in raise () from /lib64/libc.so.6
#1  0x0000003c52e31cc0 in abort () from /lib64/libc.so.6
#2  0x0000003c52e6a7fb in __libc_message () from /lib64/libc.so.6
#3  0x0000003c52e700b3 in malloc_consolidate () from /lib64/libc.so.6
#4  0x0000003c52e71a32 in _int_free () from /lib64/libc.so.6
#5  0x0000003c52e7590c in free () from /lib64/libc.so.6
#6  0x000000317229614e in ~Queue (this=<value optimized out>)
    at gen/qmf/org/apache/qpid/broker/Queue.cpp:74
#7  0x0000003172398a2f in ~ManagementBroker (this=<value optimized out>)
    at qpid/management/ManagementBroker.cpp:111
#8  0x0000003172397160 in ~Singleton (this=<value optimized out>)
    at qpid/management/ManagementBroker.cpp:66
#9  0x00000031722c9440 in ~Broker (this=<value optimized out>)
    at qpid/broker/Broker.cpp:341
#10 0x0000003c52e3363e in __cxa_finalize () from /lib64/libc.so.6
#11 0x0000003172288506 in __do_global_dtors_aux ()
   from /usr/lib64/libqpidbroker.so.0
#12 0x0000000000000000 in ?? ()

#0  0x0000003c52e30215 in raise () from /lib64/libc.so.6
#1  0x0000003c52e31cc0 in abort () from /lib64/libc.so.6
#2  0x0000003c52e6a7fb in __libc_message () from /lib64/libc.so.6
#3  0x0000003c52e700b3 in malloc_consolidate () from /lib64/libc.so.6
#4  0x0000003c52e71a32 in _int_free () from /lib64/libc.so.6
#5  0x0000003c52e7590c in free () from /lib64/libc.so.6
#6  0x00000031723a32bc in std::_Rb_tree<qpid::management::ObjectId, std::pair<qpid::management::ObjectId const, qpid::management::ManagementObject*>, std::_Select1st<std::pair<qpid::management::ObjectId const, qpid::management::ManagementObject*> >, std::less<qpid::management::ObjectId>, std::allocator<std::pair<qpid::management::ObjectId const, qpid::management::ManagementObject*> > >::_M_erase (this=<value optimized out>, __x=<value optimized out>)
    at /usr/include/c++/4.1.2/ext/new_allocator.h:94
#7  0x0000003172398a4b in ~ManagementBroker (this=<value optimized out>) at /usr/include/c++/4.1.2/bits/stl_tree.h:692
#8  0x0000003172397160 in ~Singleton (this=<value optimized out>) at qpid/management/ManagementBroker.cpp:66
#9  0x00000031722c9440 in ~Broker (this=<value optimized out>) at qpid/broker/Broker.cpp:341
#10 0x0000003c52e3363e in __cxa_finalize () from /lib64/libc.so.6
#11 0x0000003172288506 in __do_global_dtors_aux () from /usr/lib64/libqpidbroker.so.0
#12 0x0000000000000000 in ?? ()

#0  0x0000003c52e30215 in raise () from /lib64/libc.so.6
#1  0x0000003c52e31cc0 in abort () from /lib64/libc.so.6
#2  0x0000003c52e6a7fb in __libc_message () from /lib64/libc.so.6
#3  0x0000003c52e700b3 in malloc_consolidate () from /lib64/libc.so.6
#4  0x0000003c52e71a32 in _int_free () from /lib64/libc.so.6
#5  0x0000003c52e7590c in free () from /lib64/libc.so.6
#6  0x0000003c57a9db6a in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string ()
   from /usr/lib64/libstdc++.so.6
#7  0x00000031722bf2bf in ~Cluster (this=<value optimized out>) at gen/qmf/org/apache/qpid/cluster/Cluster.cpp:63
#8  0x0000003172398a2f in ~ManagementBroker (this=<value optimized out>) at qpid/management/ManagementBroker.cpp:111
#9  0x0000003172397160 in ~Singleton (this=<value optimized out>) at qpid/management/ManagementBroker.cpp:66
#10 0x00000031722c9440 in ~Broker (this=<value optimized out>) at qpid/broker/Broker.cpp:341
#11 0x0000003c52e3363e in __cxa_finalize () from /lib64/libc.so.6
#12 0x0000003172288506 in __do_global_dtors_aux () from /usr/lib64/libqpidbroker.so.0
#13 0x0000000000000000 in ?? ()

Comment 4 Ted Ross 2009-05-28 17:24:51 UTC
Created attachment 345800 [details]
Patch (off of svn revision 752581) that addresses a possibly-related shutdown issue

This patch fixes a problem in which the SignalHandler module holds a global, static intrusive_ptr to the Broker object. During a shutdown that is not induced by a signal, this reference keeps the Broker alive much longer than it should be for a clean shutdown. A static pointer is only released when the global destructors run.

I don't know if this fixes the problem in this BZ, but it might.
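The lifetime problem described in the patch can be illustrated with a minimal sketch. It assumes the fix amounts to releasing the static reference during an orderly shutdown; the attached patch itself may do this differently.

In the sketch, Broker, signal_handler, setBroker, clearBroker and brokerAliveAfterShutdown are hypothetical names. std::shared_ptr stands in for boost::intrusive_ptr, because both keep the pointee alive while any owning reference exists. While a static owner still holds the pointer, the object outlives main() and is destroyed only by the global destructors. That matches the __cxa_finalize → ~Broker frames in the backtraces above.

```cpp
#include <cassert>  // used by the usage checks
#include <memory>

// Hypothetical stand-in for qpid::broker::Broker.
struct Broker {
    // Store, management agent, timer threads, ... would live here.
};

// Hypothetical stand-in for the SignalHandler module. The patch describes a
// static boost::intrusive_ptr; std::shared_ptr is used here only as an
// equivalent owning pointer with static storage duration.
namespace signal_handler {
std::shared_ptr<Broker> registeredBroker;

void setBroker(const std::shared_ptr<Broker>& broker) { registeredBroker = broker; }

// Assumed shape of the fix: drop the static reference as part of an orderly
// shutdown instead of leaving it for the global destructors.
void clearBroker() { registeredBroker.reset(); }
}  // namespace signal_handler

// Simulates a non-signal shutdown in which main() releases its own reference.
// Returns true if the Broker is still alive afterwards.
bool brokerAliveAfterShutdown(bool releaseStaticRef) {
    std::weak_ptr<Broker> observer;
    {
        std::shared_ptr<Broker> broker = std::make_shared<Broker>();
        observer = broker;
        signal_handler::setBroker(broker);
        if (releaseStaticRef) signal_handler::clearBroker();
    }  // main()'s reference goes out of scope here
    bool alive = !observer.expired();
    signal_handler::clearBroker();  // reset so the demo can be repeated
    return alive;
}
```

brokerAliveAfterShutdown(false) returns true: the static pointer pins the Broker, so in a real process it would be destroyed only during static destruction, alongside whatever other threads are still running. brokerAliveAfterShutdown(true) returns false, because the Broker is destroyed as soon as main()'s own reference goes away.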

Comment 5 Gordon Sim 2009-05-29 15:11:29 UTC
Believed fixed in qpidd-0.5.752581-10

Comment 6 Frantisek Reznicek 2009-06-01 14:54:41 UTC
The long-running tests of bug 502193 and bug 499872, together with an extra test, showed that the issue is fixed on RHEL 5.3 i386 / x86_64 with the following packages:
[root@intel-greencity-01 bz499872]# rpm -qa | egrep '(qpid|openais)' | sort -u
openais-0.80.3-22.el5_3.7
openais-debuginfo-0.80.3-22.el5_3.7
openais-devel-0.80.3-22.el5_3.7
python-qpid-0.5.752581-1.el5
qpidc-0.5.752581-10.el5
qpidc-debuginfo-0.5.752581-10.el5
qpidc-devel-0.5.752581-10.el5
qpidc-perftest-0.5.752581-10.el5
qpidc-rdma-0.5.752581-10.el5
qpidc-ssl-0.5.752581-10.el5
qpidd-0.5.752581-10.el5
qpidd-acl-0.5.752581-10.el5
qpidd-cluster-0.5.752581-10.el5
qpidd-devel-0.5.752581-10.el5
qpid-dotnet-0.4.738274-2.el5
qpidd-rdma-0.5.752581-10.el5
qpidd-ssl-0.5.752581-10.el5
qpidd-xml-0.5.752581-10.el5
qpid-java-client-0.5.751061-4.el5
qpid-java-common-0.5.751061-4.el5

-> VERIFIED

Comment 8 errata-xmlrpc 2009-06-12 17:39:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1097.html