Bug 489661 - Deadlock between ManagementBroker::userLock and LinkRegistry::lock
Summary: Deadlock between ManagementBroker::userLock and LinkRegistry::lock
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All
OS: Linux
urgent
high
Target Milestone: 1.1.1
: ---
Assignee: Gordon Sim
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-03-11 09:32 UTC by Gordon Sim
Modified: 2015-11-16 00:07 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-04-21 16:16:38 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2009:0434 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging and Grid Version 1.1.1 2009-04-21 16:15:50 UTC

Description Gordon Sim 2009-03-11 09:32:45 UTC
Dispatching a management method holds the userLock in ManagementBroker; it may then require the lock in the LinkRegistry if it is to declare a link or bridge.

Linkregistry::notifyConnection() firsttakes the Linkregistry lock, then as part of establishing the connection it raises a management event which requires the managementBrokers userLock.

Thread 5 (Thread -1208362080 (LWP 28312)):
#0 0x45b8b410 in __kernel_vsyscall ()
#1 0x45f0b97e in __lll_mutex_lock_wait () from /lib/libpthread.so.0
#2 0x45f08247 in _L_mutex_lock_340 () from /lib/libpthread.so.0
#3 0x45f0819f in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0x45c7dd29 in pthread_mutex_lock () from /lib/libc.so.6
#5 0x00edf33d in qpid::management::ManagementBroker::periodicProcessing ()
#6 0x00ee00a6 in qpid::management::ManagementBroker::Periodic::fire ()
#7 0x00eceafb in qpid::broker::Timer::run ()
#8 0x00a01c11 in qpid::sys::(anonymous namespace)::runRunnable ()
#9 0x45f0640b in start_thread () from /lib/libpthread.so.0
#10 0x45c71b7e in clone () from /lib/libc.so.6
Thread 4 (Thread -1218851936 (LWP 28313)):
#0 0x45b8b410 in __kernel_vsyscall ()
#1 0x45f0b97e in __lll_mutex_lock_wait () from /lib/libpthread.so.0
#2 0x45f08247 in _L_mutex_lock_340 () from /lib/libpthread.so.0
#3 0x45f0819f in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0x45c7dd29 in pthread_mutex_lock () from /lib/libc.so.6
#5 0x00e715cb in qpid::broker::LinkRegistry::periodicMaintenance ()
#6 0x00e71fa8 in qpid::broker::LinkRegistry::Periodic::fire ()
#7 0x00eceafb in qpid::broker::Timer::run ()
#8 0x00a01c11 in qpid::sys::(anonymous namespace)::runRunnable ()
#9 0x45f0640b in start_thread () from /lib/libpthread.so.0
#10 0x45c71b7e in clone () from /lib/libc.so.6
Thread 3 (Thread -1229341792 (LWP 28314)):
#0 0x45b8b410 in __kernel_vsyscall ()
#1 0x45f0967c in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#2 0x45c7dbf4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3 0x00ecebf0 in qpid::broker::Timer::run ()
#4 0x00a01c11 in qpid::sys::(anonymous namespace)::runRunnable ()
#5 0x45f0640b in start_thread () from /lib/libpthread.so.0
#6 0x45c71b7e in clone () from /lib/libc.so.6
Thread 2 (Thread -1239831648 (LWP 28317)):
#0 0x45b8b410 in __kernel_vsyscall ()
#1 0x45f0b97e in __lll_mutex_lock_wait () from /lib/libpthread.so.0
#2 0x45f08247 in _L_mutex_lock_340 () from /lib/libpthread.so.0
#3 0x45f0819f in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0x45c7dd29 in pthread_mutex_lock () from /lib/libc.so.6
#5 0x00e6f1bc in qpid::broker::LinkRegistry::declare () <--- LinkRegistry::lock required
#6 0x00e6b2d1 in qpid::broker::Link::ManagementMethod ()
#7 0x00dd9ae5 in qmf::org::apache::qpid::broker::Link::doMethod ()
#8 0x00ede600 in qpid::management::ManagementBroker::handleMethodRequestLH ()
#9 0x00ee811c in qpid::management::ManagementBroker::dispatchAgentCommandLH ()
#10 0x00ee8650 in qpid::management::ManagementBroker::dispatchCommand () <--- ManagementBroker::userLock held
#11 0x00eebbf3 in qpid::broker::ManagementExchange::route ()
#12 0x00eaab9d in qpid::broker::SemanticState::route ()
#13 0x00eabeea in qpid::broker::SemanticState::handle ()
#14 0x00ec3412 in qpid::broker::SessionState::handleContent ()
#15 0x00ec3a03 in qpid::broker::SessionState::handleIn ()
#16 0x00ec73eb in qpid::framing::Handler<qpid::framing::AMQFrame&>::MemFunRef<qp id::framing::Handler<qpid::framing::AMQFrame&>::InOutHandlerInterface, &(qpid::f raming::Handler<qpid::framing::AMQFrame&>::InOutHandlerInterface::handleIn(qpid: :framing::AMQFrame&))>::handle ()
#17 0x00a2ddaf in qpid::amqp_0_10::SessionHandler::handleIn ()
#18 0x00ec73eb in qpid::framing::Handler<qpid::framing::AMQFrame&>::MemFunRef<qp id::framing::Handler<qpid::framing::AMQFrame&>::InOutHandlerInterface, &(qpid::f raming::Handler<qpid::framing::AMQFrame&>::InOutHandlerInterface::handleIn(qpid: :framing::AMQFrame&))>::handle ()
#19 0x00e384ac in qpid::broker::Connection::received ()
#20 0x00bc3602 in ?? ()
#21 0xb380134c in ?? ()
#22 0x09d82608 in ?? ()
#23 0x09d851b8 in ?? ()
#24 0x45c7dd55 in pthread_mutex_unlock () from /lib/libc.so.6
#25 0x00ba3773 in ?? ()
#26 0xb38012a8 in ?? ()
#27 0x09d825f8 in ?? ()
#28 0x09d825f8 in ?? ()
#29 0xb6199dd8 in ?? ()
#30 0x000000c1 in ?? ()
#31 0x00ba47fd in ?? ()
#32 0x09d80568 in ?? ()
#33 0x09d825f8 in ?? ()
#34 0x00000002 in ?? ()
#35 0x00000001 in ?? ()
#36 0xb6199e68 in ?? ()
#37 0x00bac664 in ?? ()
#38 0x09d815e0 in ?? ()
#39 0x09d825f8 in ?? ()
#40 0xb6199e48 in ?? ()
#41 0x45c7dd55 in pthread_mutex_unlock () from /lib/libc.so.6
Thread 1 (Thread -1208158480 (LWP 28311)):
#0 0x45b8b410 in __kernel_vsyscall ()
#1 0x45f0b97e in __lll_mutex_lock_wait () from /lib/libpthread.so.0
#2 0x45f08247 in _L_mutex_lock_340 () from /lib/libpthread.so.0
#3 0x45f0819f in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0x45c7dd29 in pthread_mutex_lock () from /lib/libc.so.6
#5 0x00ee012a in qpid::management::ManagementBroker::raiseEvent () <--- ManagementBroker::userLock required
#6 0x00e6ac23 in qpid::broker::Link::established ()
#7 0x00e70805 in qpid::broker::LinkRegistry::notifyConnection () <--- LinkRegistry::lock held
#8 0x00e3b57d in qpid::broker::Connection::Connection ()
#9 0x00bc80ee in ?? ()
#10 0x09d84dcc in ?? ()
#11 0x09d84d68 in ?? ()
#12 0x09d7ef88 in ?? ()
#13 0x09d85b24 in ?? ()
#14 0x00000001 in ?? ()
#15 0x00000000 in ?? ()

Comment 1 Gordon Sim 2009-03-11 11:51:27 UTC
Suggested fix for this specific case is to not hold the LinkRegistry lock across calls to the Link in notifyConnection() and similar calls. E.g.:

Index: src/qpid/broker/LinkRegistry.cpp
===================================================================
--- src/qpid/broker/LinkRegistry.cpp    (revision 752399)
+++ src/qpid/broker/LinkRegistry.cpp    (working copy)
@@ -238,67 +238,70 @@
     return store;
 }

-void LinkRegistry::notifyConnection(const std::string& key, Connection* c)
+Link::shared_ptr LinkRegistry::findLink(const std::string& key)
 {
     Mutex::ScopedLock locker(lock);
     LinkMap::iterator l = links.find(key);
-    if (l != links.end())
-    {
-        l->second->established();
-        l->second->setConnection(c);
-        c->setUserId(str(format("%1%@%2%") % l->second->getUsername() % realm));
+    if (l != links.end()) return l->second;
+    else return Link::shared_ptr();
+}
+
+void LinkRegistry::notifyConnection(const std::string& key, Connection* c)
+{
+    Link::shared_ptr link = findLink(key);
+    if (link) {
+        link->established();
+        link->setConnection(c);
+        c->setUserId(str(format("%1%@%2%") % link->getUsername() % realm));
     }
 }

 void LinkRegistry::notifyClosed(const std::string& key)
 {
-    Mutex::ScopedLock locker(lock);
-    LinkMap::iterator l = links.find(key);
-    if (l != links.end())
-        l->second->closed(0, "Closed by peer");
+    Link::shared_ptr link = findLink(key);
+    if (link) {
+        link->closed(0, "Closed by peer");
+    }
 }

 void LinkRegistry::notifyConnectionForced(const std::string& key, const std::string& text)
 {
-    Mutex::ScopedLock locker(lock);
-    LinkMap::iterator l = links.find(key);
-    if (l != links.end())
-        l->second->notifyConnectionForced(text);
+    Link::shared_ptr link = findLink(key);
+    if (link) {
+        link->notifyConnectionForced(text);
+    }
 }

 std::string LinkRegistry::getAuthMechanism(const std::string& key)
 {
-    Mutex::ScopedLock locker(lock);
-    LinkMap::iterator l = links.find(key);
-    if (l != links.end())
-        return l->second->getAuthMechanism();
+    Link::shared_ptr link = findLink(key);
+    if (link)
+        return link->getAuthMechanism();
     return string("ANONYMOUS");
 }

 std::string LinkRegistry::getAuthCredentials(const std::string& key)
 {
-    Mutex::ScopedLock locker(lock);
-    LinkMap::iterator l = links.find(key);
-    if (l == links.end())
+    Link::shared_ptr link = findLink(key);
+    if (!link)
         return string();

     string result;
     result += '\0';
-    result += l->second->getUsername();
+    result += link->getUsername();
     result += '\0';
-    result += l->second->getPassword();
+    result += link->getPassword();

     return result;
 }

 std::string LinkRegistry::getAuthIdentity(const std::string& key)
 {
-    Mutex::ScopedLock locker(lock);
-    LinkMap::iterator l = links.find(key);
-    if (l == links.end())
+    Link::shared_ptr link = findLink(key);
+    if (!link)
         return string();

-    return l->second->getUsername();
+    return link->getUsername();
 }


Index: src/qpid/broker/LinkRegistry.h
===================================================================
--- src/qpid/broker/LinkRegistry.h      (revision 752399)
+++ src/qpid/broker/LinkRegistry.h      (working copy)
@@ -70,6 +70,7 @@

         void periodicMaintenance ();
         bool updateAddress(const std::string& oldKey, const TcpAddress& newAddress);
+        Link::shared_ptr findLink(const std::string& key);
         static std::string createKey(const TcpAddress& address);

     public:

However it looks like there may be other cases, e.g. the Link::closed() method itself holds the Link's lock while raising a management event. The lock order there is the opposite of the lock order that might arise from a management method taking the Link objects lock.

Perhaps therefore the following might be a more reliable fix in addition to the above:

Index: src/qpid/management/ManagementBroker.cpp
===================================================================
--- src/qpid/management/ManagementBroker.cpp    (revision 752399)
+++ src/qpid/management/ManagementBroker.cpp    (working copy)
@@ -518,6 +518,7 @@
         else
             try {
                 outBuffer.record();
+                Mutex::ScopedUnlock u(userLock);
                 iter->second->doMethod(methodName, inBuffer, outBuffer);
             } catch(exception& e) {
                 outBuffer.restore();

Comment 2 Gordon Sim 2009-03-11 16:44:02 UTC
Fixed by above patch, applied as r752510.

Comment 3 Gordon Sim 2009-03-11 16:50:43 UTC
Very hard to reproduce; I saw it on one random occasion when running clustered_federation_test but I suspect it could happen under any test creating a link then a bridge.

I can't offer any simple reproducer I'm afraid. Bug verified on dev side by code inspection.

Comment 5 Frantisek Reznicek 2009-03-25 08:36:38 UTC
The issue has been fixed. Validated on RHEL 4.7 / 5.3 using clustered federation_test and code review.

->VERIFIED

Comment 7 errata-xmlrpc 2009-04-21 16:16:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html


Note You need to log in before you can comment on or make changes to this bug.