Dispatching a management method holds the userLock in ManagementBroker; it may then require the lock in the LinkRegistry if it is to declare a link or bridge. Linkregistry::notifyConnection() firsttakes the Linkregistry lock, then as part of establishing the connection it raises a management event which requires the managementBrokers userLock. Thread 5 (Thread -1208362080 (LWP 28312)): #0 0x45b8b410 in __kernel_vsyscall () #1 0x45f0b97e in __lll_mutex_lock_wait () from /lib/libpthread.so.0 #2 0x45f08247 in _L_mutex_lock_340 () from /lib/libpthread.so.0 #3 0x45f0819f in pthread_mutex_lock () from /lib/libpthread.so.0 #4 0x45c7dd29 in pthread_mutex_lock () from /lib/libc.so.6 #5 0x00edf33d in qpid::management::ManagementBroker::periodicProcessing () #6 0x00ee00a6 in qpid::management::ManagementBroker::Periodic::fire () #7 0x00eceafb in qpid::broker::Timer::run () #8 0x00a01c11 in qpid::sys::(anonymous namespace)::runRunnable () #9 0x45f0640b in start_thread () from /lib/libpthread.so.0 #10 0x45c71b7e in clone () from /lib/libc.so.6 Thread 4 (Thread -1218851936 (LWP 28313)): #0 0x45b8b410 in __kernel_vsyscall () #1 0x45f0b97e in __lll_mutex_lock_wait () from /lib/libpthread.so.0 #2 0x45f08247 in _L_mutex_lock_340 () from /lib/libpthread.so.0 #3 0x45f0819f in pthread_mutex_lock () from /lib/libpthread.so.0 #4 0x45c7dd29 in pthread_mutex_lock () from /lib/libc.so.6 #5 0x00e715cb in qpid::broker::LinkRegistry::periodicMaintenance () #6 0x00e71fa8 in qpid::broker::LinkRegistry::Periodic::fire () #7 0x00eceafb in qpid::broker::Timer::run () #8 0x00a01c11 in qpid::sys::(anonymous namespace)::runRunnable () #9 0x45f0640b in start_thread () from /lib/libpthread.so.0 #10 0x45c71b7e in clone () from /lib/libc.so.6 Thread 3 (Thread -1229341792 (LWP 28314)): #0 0x45b8b410 in __kernel_vsyscall () #1 0x45f0967c in pthread_cond_timedwait@@GLIBC_2.3.2 () #2 0x45c7dbf4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libc.so.6 #3 0x00ecebf0 in qpid::broker::Timer::run () #4 0x00a01c11 in qpid::sys::(anonymous namespace)::runRunnable () #5 0x45f0640b in start_thread () from /lib/libpthread.so.0 #6 0x45c71b7e in clone () from /lib/libc.so.6 Thread 2 (Thread -1239831648 (LWP 28317)): #0 0x45b8b410 in __kernel_vsyscall () #1 0x45f0b97e in __lll_mutex_lock_wait () from /lib/libpthread.so.0 #2 0x45f08247 in _L_mutex_lock_340 () from /lib/libpthread.so.0 #3 0x45f0819f in pthread_mutex_lock () from /lib/libpthread.so.0 #4 0x45c7dd29 in pthread_mutex_lock () from /lib/libc.so.6 #5 0x00e6f1bc in qpid::broker::LinkRegistry::declare () <--- LinkRegistry::lock required #6 0x00e6b2d1 in qpid::broker::Link::ManagementMethod () #7 0x00dd9ae5 in qmf::org::apache::qpid::broker::Link::doMethod () #8 0x00ede600 in qpid::management::ManagementBroker::handleMethodRequestLH () #9 0x00ee811c in qpid::management::ManagementBroker::dispatchAgentCommandLH () #10 0x00ee8650 in qpid::management::ManagementBroker::dispatchCommand () <--- ManagementBroker::userLock held #11 0x00eebbf3 in qpid::broker::ManagementExchange::route () #12 0x00eaab9d in qpid::broker::SemanticState::route () #13 0x00eabeea in qpid::broker::SemanticState::handle () #14 0x00ec3412 in qpid::broker::SessionState::handleContent () #15 0x00ec3a03 in qpid::broker::SessionState::handleIn () #16 0x00ec73eb in qpid::framing::Handler<qpid::framing::AMQFrame&>::MemFunRef<qp id::framing::Handler<qpid::framing::AMQFrame&>::InOutHandlerInterface, &(qpid::f raming::Handler<qpid::framing::AMQFrame&>::InOutHandlerInterface::handleIn(qpid: :framing::AMQFrame&))>::handle () #17 0x00a2ddaf in qpid::amqp_0_10::SessionHandler::handleIn () #18 0x00ec73eb in qpid::framing::Handler<qpid::framing::AMQFrame&>::MemFunRef<qp id::framing::Handler<qpid::framing::AMQFrame&>::InOutHandlerInterface, &(qpid::f raming::Handler<qpid::framing::AMQFrame&>::InOutHandlerInterface::handleIn(qpid: :framing::AMQFrame&))>::handle () #19 0x00e384ac in qpid::broker::Connection::received () #20 0x00bc3602 in ?? () #21 0xb380134c in ?? () #22 0x09d82608 in ?? () #23 0x09d851b8 in ?? () #24 0x45c7dd55 in pthread_mutex_unlock () from /lib/libc.so.6 #25 0x00ba3773 in ?? () #26 0xb38012a8 in ?? () #27 0x09d825f8 in ?? () #28 0x09d825f8 in ?? () #29 0xb6199dd8 in ?? () #30 0x000000c1 in ?? () #31 0x00ba47fd in ?? () #32 0x09d80568 in ?? () #33 0x09d825f8 in ?? () #34 0x00000002 in ?? () #35 0x00000001 in ?? () #36 0xb6199e68 in ?? () #37 0x00bac664 in ?? () #38 0x09d815e0 in ?? () #39 0x09d825f8 in ?? () #40 0xb6199e48 in ?? () #41 0x45c7dd55 in pthread_mutex_unlock () from /lib/libc.so.6 Thread 1 (Thread -1208158480 (LWP 28311)): #0 0x45b8b410 in __kernel_vsyscall () #1 0x45f0b97e in __lll_mutex_lock_wait () from /lib/libpthread.so.0 #2 0x45f08247 in _L_mutex_lock_340 () from /lib/libpthread.so.0 #3 0x45f0819f in pthread_mutex_lock () from /lib/libpthread.so.0 #4 0x45c7dd29 in pthread_mutex_lock () from /lib/libc.so.6 #5 0x00ee012a in qpid::management::ManagementBroker::raiseEvent () <--- ManagementBroker::userLock required #6 0x00e6ac23 in qpid::broker::Link::established () #7 0x00e70805 in qpid::broker::LinkRegistry::notifyConnection () <--- LinkRegistry::lock held #8 0x00e3b57d in qpid::broker::Connection::Connection () #9 0x00bc80ee in ?? () #10 0x09d84dcc in ?? () #11 0x09d84d68 in ?? () #12 0x09d7ef88 in ?? () #13 0x09d85b24 in ?? () #14 0x00000001 in ?? () #15 0x00000000 in ?? ()
Suggested fix for this specific case is to not hold the LinkRegistry lock across calls to the Link in notifyConnection() and similar calls. E.g.: Index: src/qpid/broker/LinkRegistry.cpp =================================================================== --- src/qpid/broker/LinkRegistry.cpp (revision 752399) +++ src/qpid/broker/LinkRegistry.cpp (working copy) @@ -238,67 +238,70 @@ return store; } -void LinkRegistry::notifyConnection(const std::string& key, Connection* c) +Link::shared_ptr LinkRegistry::findLink(const std::string& key) { Mutex::ScopedLock locker(lock); LinkMap::iterator l = links.find(key); - if (l != links.end()) - { - l->second->established(); - l->second->setConnection(c); - c->setUserId(str(format("%1%@%2%") % l->second->getUsername() % realm)); + if (l != links.end()) return l->second; + else return Link::shared_ptr(); +} + +void LinkRegistry::notifyConnection(const std::string& key, Connection* c) +{ + Link::shared_ptr link = findLink(key); + if (link) { + link->established(); + link->setConnection(c); + c->setUserId(str(format("%1%@%2%") % link->getUsername() % realm)); } } void LinkRegistry::notifyClosed(const std::string& key) { - Mutex::ScopedLock locker(lock); - LinkMap::iterator l = links.find(key); - if (l != links.end()) - l->second->closed(0, "Closed by peer"); + Link::shared_ptr link = findLink(key); + if (link) { + link->closed(0, "Closed by peer"); + } } void LinkRegistry::notifyConnectionForced(const std::string& key, const std::string& text) { - Mutex::ScopedLock locker(lock); - LinkMap::iterator l = links.find(key); - if (l != links.end()) - l->second->notifyConnectionForced(text); + Link::shared_ptr link = findLink(key); + if (link) { + link->notifyConnectionForced(text); + } } std::string LinkRegistry::getAuthMechanism(const std::string& key) { - Mutex::ScopedLock locker(lock); - LinkMap::iterator l = links.find(key); - if (l != links.end()) - return l->second->getAuthMechanism(); + Link::shared_ptr link = findLink(key); + if (link) + return link->getAuthMechanism(); return string("ANONYMOUS"); } std::string LinkRegistry::getAuthCredentials(const std::string& key) { - Mutex::ScopedLock locker(lock); - LinkMap::iterator l = links.find(key); - if (l == links.end()) + Link::shared_ptr link = findLink(key); + if (!link) return string(); string result; result += '\0'; - result += l->second->getUsername(); + result += link->getUsername(); result += '\0'; - result += l->second->getPassword(); + result += link->getPassword(); return result; } std::string LinkRegistry::getAuthIdentity(const std::string& key) { - Mutex::ScopedLock locker(lock); - LinkMap::iterator l = links.find(key); - if (l == links.end()) + Link::shared_ptr link = findLink(key); + if (!link) return string(); - return l->second->getUsername(); + return link->getUsername(); } Index: src/qpid/broker/LinkRegistry.h =================================================================== --- src/qpid/broker/LinkRegistry.h (revision 752399) +++ src/qpid/broker/LinkRegistry.h (working copy) @@ -70,6 +70,7 @@ void periodicMaintenance (); bool updateAddress(const std::string& oldKey, const TcpAddress& newAddress); + Link::shared_ptr findLink(const std::string& key); static std::string createKey(const TcpAddress& address); public: However it looks like there may be other cases, e.g. the Link::closed() method itself holds the Link's lock while raising a management event. The lock order there is the opposite of the lock order that might arise from a management method taking the Link objects lock. Perhaps therefore the following might be a more reliable fix in addition to the above: Index: src/qpid/management/ManagementBroker.cpp =================================================================== --- src/qpid/management/ManagementBroker.cpp (revision 752399) +++ src/qpid/management/ManagementBroker.cpp (working copy) @@ -518,6 +518,7 @@ else try { outBuffer.record(); + Mutex::ScopedUnlock u(userLock); iter->second->doMethod(methodName, inBuffer, outBuffer); } catch(exception& e) { outBuffer.restore();
Fixed by above patch, applied as r752510.
Very hard to reproduce; I saw it on one random occasion when running clustered_federation_test but I suspect it could happen under any test creating a link then a bridge. I can't offer any simple reproducer I'm afraid. Bug verified on dev side by code inspection.
The issue has been fixed. Validated on RHEL 4.7 / 5.3 using clustered federation_test and code review. ->VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-0434.html