Bug 596765 - Shutdown order in broker causes invalid writes by ManagementObject in store
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: Development
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.3
Target Release: ---
Assignee: Alan Conway
QA Contact: Jan Sarenik
URL:
Whiteboard:
Depends On:
Blocks: 595438
Reported: 2010-05-27 13:26 UTC by Kim van der Riet
Modified: 2010-10-20 11:29 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-10-20 11:29:59 UTC
Target Upstream Version:
Embargoed:



Description Kim van der Riet 2010-05-27 13:26:05 UTC
If the broker is stopped with Ctrl-C while the cluster module is loaded, the broker can dump core. See bug 595438 for details.

Running this scenario under valgrind shows the following errors:

2010-05-27 08:59:48 notice Shut down
==3177== Invalid write of size 8
==3177==    at 0x54DDED5: qpid::management::ManagementObject::resourceDestroy() (ManagementObject.cpp:265)
==3177==    by 0x6C0F78D: mrg::msgstore::JournalImpl::~JournalImpl() (JournalImpl.cpp:129)
==3177==    by 0x6C50BE2: mrg::msgstore::TplJournalImpl::~TplJournalImpl() (JournalImpl.h:242)
==3177==    by 0x6C39448: void boost::checked_delete<mrg::msgstore::TplJournalImpl>(mrg::msgstore::TplJournalImpl*) (checked_delete.hpp:34)
==3177==    by 0x6C3B3E2: boost::detail::sp_counted_impl_p<mrg::msgstore::TplJournalImpl>::dispose() (sp_counted_impl.hpp:76)
==3177==    by 0x409599: boost::detail::sp_counted_base::release() (sp_counted_base_gcc_x86.hpp:145)
==3177==    by 0x4095C9: boost::detail::shared_count::~shared_count() (shared_count.hpp:159)
==3177==    by 0x6C37220: boost::shared_ptr<mrg::msgstore::TplJournalImpl>::~shared_ptr() (shared_ptr.hpp:106)
==3177==    by 0x6C2A044: mrg::msgstore::MessageStoreImpl::~MessageStoreImpl() (MessageStoreImpl.cpp:450)
==3177==    by 0x6C082C6: void boost::checked_delete<mrg::msgstore::MessageStoreImpl>(mrg::msgstore::MessageStoreImpl*) (checked_delete.hpp:34)
==3177==    by 0x6C08504: boost::detail::sp_counted_impl_p<mrg::msgstore::MessageStoreImpl>::dispose() (sp_counted_impl.hpp:76)
==3177==    by 0x409599: boost::detail::sp_counted_base::release() (sp_counted_base_gcc_x86.hpp:145)
==3177==  Address 0x5fc8060 is 16 bytes inside a block of size 296 free'd
==3177==    at 0x4A05743: operator delete(void*) (vg_replace_malloc.c:346)
==3177==    by 0x6CA3ABE: qmf::com::redhat::rhm::store::Journal::~Journal() (Journal.cpp:100)
==3177==    by 0x4F2C359: qpid::management::ManagementAgent::~ManagementAgent() (ManagementAgent.cpp:135)
==3177==    by 0x4E3278F: std::auto_ptr<qpid::management::ManagementAgent>::~auto_ptr() (memory:259)
==3177==    by 0x4E29B89: qpid::broker::Broker::~Broker() (Broker.cpp:364)
==3177==    by 0x4E259A6: qpid::RefCounted::released() const (RefCounted.h:48)
==3177==    by 0x40CD02: qpid::RefCounted::release() const (RefCounted.h:42)
==3177==    by 0x40CD1A: boost::intrusive_ptr_release(qpid::RefCounted const*) (RefCounted.h:57)
==3177==    by 0x40CD63: boost::intrusive_ptr<qpid::broker::Broker>::~intrusive_ptr() (intrusive_ptr.hpp:83)
==3177==    by 0x40AEF9: QpiddBroker::execute(QpiddOptions*) (QpiddBroker.cpp:176)
==3177==    by 0x40927C: main (qpidd.cpp:80)
==3177== 
==3177== Invalid write of size 1
==3177==    at 0x54DDEDD: qpid::management::ManagementObject::resourceDestroy() (ManagementObject.cpp:266)
==3177==    by 0x6C0F78D: mrg::msgstore::JournalImpl::~JournalImpl() (JournalImpl.cpp:129)
==3177==    by 0x6C50BE2: mrg::msgstore::TplJournalImpl::~TplJournalImpl() (JournalImpl.h:242)
==3177==    by 0x6C39448: void boost::checked_delete<mrg::msgstore::TplJournalImpl>(mrg::msgstore::TplJournalImpl*) (checked_delete.hpp:34)
==3177==    by 0x6C3B3E2: boost::detail::sp_counted_impl_p<mrg::msgstore::TplJournalImpl>::dispose() (sp_counted_impl.hpp:76)
==3177==    by 0x409599: boost::detail::sp_counted_base::release() (sp_counted_base_gcc_x86.hpp:145)
==3177==    by 0x4095C9: boost::detail::shared_count::~shared_count() (shared_count.hpp:159)
==3177==    by 0x6C37220: boost::shared_ptr<mrg::msgstore::TplJournalImpl>::~shared_ptr() (shared_ptr.hpp:106)
==3177==    by 0x6C2A044: mrg::msgstore::MessageStoreImpl::~MessageStoreImpl() (MessageStoreImpl.cpp:450)
==3177==    by 0x6C082C6: void boost::checked_delete<mrg::msgstore::MessageStoreImpl>(mrg::msgstore::MessageStoreImpl*) (checked_delete.hpp:34)
==3177==    by 0x6C08504: boost::detail::sp_counted_impl_p<mrg::msgstore::MessageStoreImpl>::dispose() (sp_counted_impl.hpp:76)
==3177==    by 0x409599: boost::detail::sp_counted_base::release() (sp_counted_base_gcc_x86.hpp:145)
==3177==  Address 0x5fc80a2 is 82 bytes inside a block of size 296 free'd
==3177==    at 0x4A05743: operator delete(void*) (vg_replace_malloc.c:346)
==3177==    by 0x6CA3ABE: qmf::com::redhat::rhm::store::Journal::~Journal() (Journal.cpp:100)
==3177==    by 0x4F2C359: qpid::management::ManagementAgent::~ManagementAgent() (ManagementAgent.cpp:135)
==3177==    by 0x4E3278F: std::auto_ptr<qpid::management::ManagementAgent>::~auto_ptr() (memory:259)
==3177==    by 0x4E29B89: qpid::broker::Broker::~Broker() (Broker.cpp:364)
==3177==    by 0x4E259A6: qpid::RefCounted::released() const (RefCounted.h:48)
==3177==    by 0x40CD02: qpid::RefCounted::release() const (RefCounted.h:42)
==3177==    by 0x40CD1A: boost::intrusive_ptr_release(qpid::RefCounted const*) (RefCounted.h:57)
==3177==    by 0x40CD63: boost::intrusive_ptr<qpid::broker::Broker>::~intrusive_ptr() (intrusive_ptr.hpp:83)
==3177==    by 0x40AEF9: QpiddBroker::execute(QpiddOptions*) (QpiddBroker.cpp:176)
==3177==    by 0x40927C: main (qpidd.cpp:80)

The cause of the error is that the store calls _mgmtObject->resourceDestroy() in its destructor (JournalImpl::~JournalImpl) after the broker has already destroyed the management agent, whose destructor deleted the management object.
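
The ordering problem can be illustrated with a minimal sketch. All names in it (ManagedObject, Agent, Journal, runOrder) are hypothetical stand-ins, not the real qpid or store classes, and the sketch consults a registry of live ids instead of actually writing to freed memory, so it is safe to run:

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Ids of management objects that are still allocated (hypothetical helper).
std::set<int>& liveIds() { static std::set<int> s; return s; }
// Ordered record of what happened during shutdown.
std::vector<std::string>& events() { static std::vector<std::string> v; return v; }

// Stand-in for a management object owned by the agent.
struct ManagedObject {
    int id;
    bool deleted = false;
    explicit ManagedObject(int i) : id(i) { liveIds().insert(id); }
    ~ManagedObject() { liveIds().erase(id); }
    void resourceDestroy() { deleted = true; }  // the write seen in the trace
};

// Stand-in for the management agent: it deletes every object it handed out.
struct Agent {
    std::vector<ManagedObject*> owned;
    int nextId = 0;
    ManagedObject* create() {
        ManagedObject* o = new ManagedObject(nextId++);
        owned.push_back(o);
        return o;
    }
    ~Agent() {
        for (ManagedObject* o : owned) delete o;
        events().push_back("agent destroyed");
    }
};

// Stand-in for the journal: keeps a raw pointer and uses it in its destructor.
struct Journal {
    ManagedObject* mgmt;
    int mgmtId;
    explicit Journal(Agent& a) : mgmt(a.create()), mgmtId(mgmt->id) {}
    ~Journal() {
        if (liveIds().count(mgmtId)) {
            mgmt->resourceDestroy();
            events().push_back("resourceDestroy ok");
        } else {
            // The real code has no such check and writes through the pointer.
            events().push_back("stale pointer: would write to freed memory");
        }
    }
};

// Destroys the two objects in the chosen order and returns the event record.
std::vector<std::string> runOrder(bool agentDestroyedFirst) {
    events().clear();
    std::unique_ptr<Agent> agent(new Agent);
    std::unique_ptr<Journal> journal(new Journal(*agent));
    if (agentDestroyedFirst) { agent.reset(); journal.reset(); }
    else                     { journal.reset(); agent.reset(); }
    return events();
}
```

Calling runOrder(true) models the failing shutdown reported here (agent first, then journal); runOrder(false) models the safe order, in which the journal is destroyed while the agent still exists.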

To reproduce on RHEL-5.5:
0. Create two data dirs: /tmp/c0 and /tmp/c1
1. Enable openais/clustering.
2. Start two brokers in two windows:

window 1:
./qpidd --load-module .libs/cluster.so --load-module /home/kpvdr/store/lib/.libs/msgstore.so --cluster-name XXX --data-dir /tmp/c0 --auth no --port 0 --truncate yes --log-enable info+

window 2:
valgrind .libs/lt-qpidd --load-module .libs/cluster.so --load-module /home/kpvdr/store/lib/.libs/msgstore.so --cluster-name XXX --data-dir /tmp/c1 --auth no --port 0 --truncate yes --log-enable info+

3. Kill the broker in window 2 using Ctrl-C.

It is possible that this is the cause of bug 595438.

Comment 1 Kim van der Riet 2010-05-27 13:45:56 UTC
I can confirm that if the cluster initialization fails and the broker is thus shut down, the same error as above results:

2010-05-27 09:42:09 critical Unexpected error: Cluster-ID mismatch. Stores belong to different clusters.
==3403== Invalid write of size 8
...
==3403== Invalid write of size 1
...

Comment 2 Alan Conway 2010-05-27 18:12:49 UTC
Fixed in store revision 3995.

Remove global shared_ptr to store in store plugin.

The global shared_ptr delays destruction of the store until after the broker is deleted, causing core dumps when unregistering management objects.
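
A minimal sketch of why an extra shared_ptr reference changes the destruction order follows. The names (Store, Broker, runShutdown, pluginRef) are hypothetical and this is not the code from the actual revision; a local variable that is reset last stands in for the plugin's global, which would normally be released during static destruction at process exit:

```cpp
#include <memory>
#include <string>
#include <vector>

// Ordered record of destructor calls.
std::vector<std::string>& trace() { static std::vector<std::string> v; return v; }

// Stand-in for the message store.
struct Store {
    ~Store() { trace().push_back("store destroyed"); }
};

// Stand-in for the broker: releases its store reference while it is still alive.
struct Broker {
    std::shared_ptr<Store> store;
    ~Broker() {
        store.reset();  // destroys the store only if this was the last reference
        trace().push_back("broker destroyed");
    }
};

// Runs one shutdown; keepGlobalRef models the plugin holding its own reference.
std::vector<std::string> runShutdown(bool keepGlobalRef) {
    trace().clear();
    std::shared_ptr<Store> pluginRef;  // models the plugin's global shared_ptr
    {
        Broker broker;
        broker.store = std::make_shared<Store>();
        if (keepGlobalRef) pluginRef = broker.store;
    }                   // broker is destroyed here
    pluginRef.reset();  // models static destruction, which happens last
    return trace();
}
```

With keepGlobalRef set, the store outlives the broker, which is the order that led to the invalid writes. Without it, the broker's reference is the last one and the store is destroyed while the broker is still shutting down.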

Comment 3 Jan Sarenik 2010-06-01 09:27:06 UTC
Verified on RHEL5.5 x86_64
  qpid-cpp-server-store-0.7.946106-2.el5
  qpid-cpp-server-cluster-0.7.946106-2.el5

The bug is very easily reproduced on the same system with the -1 build packages.

Comment 5 Jan Sarenik 2010-06-01 12:53:43 UTC
Also verified with the same package versions on i386 RHEL5.5.
