Bug 697913 - Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-java
Version: Development
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: 2.0
: ---
Assignee: Rajith Attapattu
QA Contact: Petr Matousek
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-04-19 16:01 UTC by Rajith Attapattu
Modified: 2011-06-23 15:44 UTC

Fixed In Version: qpid-java-0.10-6
Doc Type: Bug Fix
Doc Text:
Cause: This happens when the application uses a synchronous operation and the broker reports an exception. The Qpid client tries to report the exception via the connection listener and also as a JMS exception thrown from the blocking method call.
Consequence: This bug causes a deadlock, which can make the application unresponsive.
Fix: The call to connection.exceptionReceived() is now made outside the scope of the current_exception_lock in AMQSession.java.
Result: The Qpid client no longer deadlocks.
Clone Of:
Environment:
Last Closed: 2011-06-23 15:44:41 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2011:0890 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging 2.0 Release 2011-06-23 15:42:41 UTC

Description Rajith Attapattu 2011-04-19 16:01:16 UTC
Description of problem:


The following thread dump shows a deadlock between the failover mutex in AMQConnection.java and the current_exception_lock in AMQSession.java.
This is a regression introduced in rev 985262.

Found one Java-level deadlock:
=============================
"IoReceiver - localhost/127.0.0.1:15672":
waiting to lock monitor 0x0000002ac2ea3b70 (object 0x0000002ab70156b0, a java.lang.Object),
which is held by "main"
"main":
waiting to lock monitor 0x0000002ac28db1b8 (object 0x0000002ab7048d70, a java.lang.Object),
which is held by "IoReceiver - localhost/127.0.0.1:15672"

Java stack information for the threads listed above:
===================================================
"IoReceiver - localhost/127.0.0.1:15672":
      at org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1297)
      - waiting to lock <0x0000002ab70156b0> (a java.lang.Object)
      at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1033)
      - locked <0x0000002ab7048d70> (a java.lang.Object)
      at org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:913)
      at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:156)
      at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32)
      at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112)
      at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50)
      at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32)
      at org.apache.qpid.transport.Method.delegate(Method.java:159)
      at org.apache.qpid.transport.Session.received(Session.java:528)
      at org.apache.qpid.transport.Connection.dispatch(Connection.java:404)
      at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64)
      at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40)
      at org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110)
      at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112)
      at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54)
      at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40)
      at org.apache.qpid.transport.Method.delegate(Method.java:159)
      at org.apache.qpid.transport.Connection.received(Connection.java:369)
      at org.apache.qpid.transport.Connection.received(Connection.java:59)
      at org.apache.qpid.transport.network.Assembler.emit(Assembler.java:95)
      at org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:196)
      at org.apache.qpid.transport.network.Assembler.frame(Assembler.java:129)
      at org.apache.qpid.transport.network.Frame.delegate(Frame.java:133)
      at org.apache.qpid.transport.network.Assembler.received(Assembler.java:100)
      at org.apache.qpid.transport.network.Assembler.received(Assembler.java:42)
      at org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:187)
      at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:103)
      at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:42)
      at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:128)
      at java.lang.Thread.run(Thread.java:619)
"main":
      at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1025)
      - waiting to lock <0x0000002ab7048d70> (a java.lang.Object)
      at org.apache.qpid.client.BasicMessageConsumer_0_10.sendCancel(BasicMessageConsumer_0_10.java:193)
      at org.apache.qpid.client.BasicMessageConsumer.close(BasicMessageConsumer.java:573)
      - locked <0x0000002ab70156b0> (a java.lang.Object)
      at org.apache.qpid.client.BasicMessageConsumer.close(BasicMessageConsumer.java:535)
      at org.apache.qpid.client.AMQQueueBrowser.close(AMQQueueBrowser.java:102)
      at org.apache.qpid.test.client.QueueBrowserAutoAckTest.testFailoverWithQueueBrowser(QueueBrowserAutoAckTest.java:501)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at junit.framework.TestCase.runTest(TestCase.java:154)
      at junit.framework.TestCase.runBare(TestCase.java:127)
      at org.apache.qpid.test.utils.QpidBrokerTestCase.runBare(QpidBrokerTestCase.java:234)
      at junit.framework.TestResult$1.protect(TestResult.java:106)
      at junit.framework.TestResult.runProtected(TestResult.java:124)
      at junit.framework.TestResult.run(TestResult.java:109)
      at junit.framework.TestCase.run(TestCase.java:118)
      at org.apache.qpid.test.utils.QpidTestCase.run(QpidTestCase.java:120)
      at junit.framework.TestSuite.runTest(TestSuite.java:208)
      at junit.framework.TestSuite.run(TestSuite.java:203)
      at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
      at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
      at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:546)

Found 1 deadlock
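
The cycle in the dump above is a lock-order inversion: the IoReceiver thread holds the monitor it locked in setCurrentException and waits for the one needed by AMQConnection.exceptionReceived, while main holds that second monitor (locked in BasicMessageConsumer.close) and waits for the first. The following is a minimal sketch of the same pattern, not the Qpid code. The class LockInversionSketch, its methods, and the thread names are hypothetical, and two plain Object monitors are assumed to stand in for the failover mutex and the current_exception_lock. It uses daemon threads so the JVM can still exit, and the standard ThreadMXBean API to confirm the deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical miniature of the inversion; none of these names exist in Qpid.
final class LockInversionSketch {

    /** Starts two daemon threads that take the same two monitors in opposite order. */
    static void startInvertedThreads() {
        // Stand-ins (assumption) for the failover mutex and current_exception_lock.
        final Object failoverMutex = new Object();
        final Object currentExceptionLock = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread receiver = new Thread(
                () -> lockInOrder(currentExceptionLock, failoverMutex, bothHoldFirstLock),
                "receiver-sketch");
        Thread closer = new Thread(
                () -> lockInOrder(failoverMutex, currentExceptionLock, bothHoldFirstLock),
                "closer-sketch");
        receiver.setDaemon(true); // daemon threads do not block JVM exit
        closer.setDaemon(true);
        receiver.start();
        closer.start();
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock too
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Not reached: the other thread holds 'second' and waits for 'first'.
            }
        }
    }

    /** Polls for monitor deadlocks; returns the number of threads found, or 0 on timeout. */
    static int awaitDeadlock(long timeoutMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        startInvertedThreads();
        System.out.println("deadlocked threads: " + awaitDeadlock(5000));
    }
}
```

The latch only makes the sketch deterministic; in the real client the window depends on timing, which is consistent with the test hanging "fairly regularly" rather than always.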


Version-Release number of selected component (if applicable):
All beta/RC packages created for the 2.0 release.

How reproducible:
When running QueueBrowserAutoAckTest it happens fairly regularly.

Steps to Reproduce:
1. Run ant test -Dprofile=cpp
2. When the tests seem to be stuck in QueueBrowserAutoAckTest, find the process and run kill -3 on it to get a thread dump.
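
When sending a signal to the test JVM is not convenient, a comparable dump can be produced from inside the process. This is only a sketch under that assumption: ThreadDumpSketch and dumpAllThreads() are hypothetical names, and the output format differs from the kill -3 dump (ThreadInfo.toString() also prints only the first few stack frames per thread).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper; not part of the Qpid test suite.
final class ThreadDumpSketch {

    /** Returns a textual dump of all live threads, with lock details where supported. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request lock details the JVM can provide, to avoid
        // UnsupportedOperationException on JVMs without that support.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            sb.append(info); // includes thread name, state, and held/waited-for monitors
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```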

Actual results:
Deadlock happens as per above description.

Expected results:
Should not deadlock

Comment 1 Gordon Sim 2011-04-26 11:26:54 UTC
See also https://bugzilla.redhat.com/show_bug.cgi?id=698657

Comment 2 Rajith Attapattu 2011-05-16 21:25:40 UTC
This is tracked upstream as QPID-3214.

A fix was committed upstream in rev 1099060:
http://svn.apache.org/viewvc?view=revision&revision=1099060

This was ported to the internal mrg_2.0.x release branch at:
http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=mrg_2.0.x&id=cd703ec7af8dd9c14c6dd10ceb47c445ac177c2b

Comment 3 Rajith Attapattu 2011-05-16 21:26:25 UTC
*** Bug 698657 has been marked as a duplicate of this bug. ***

Comment 4 Rajith Attapattu 2011-05-19 21:35:08 UTC
The fix is included in qpid-java-0.10-6.

Comment 5 Petr Matousek 2011-05-31 15:01:58 UTC
This issue has been fixed. 

Verified on RHEL5.6, RHEL6.1 architectures: i386, x86_64

Java unit tests from qpid-java-0.10-6 package were executed in loop, no deadlock has occurred.

During verification of this bug, I noticed that several Java unit tests fail; please see Bug 709383.

packages installed:
python-qpid-0.10-1.el5
python-qpid-qmf-0.10-9.el5
qpid-cpp-client-0.10-7.el5
qpid-cpp-client-devel-0.10-7.el5
qpid-cpp-client-devel-docs-0.10-7.el5
qpid-cpp-client-rdma-0.10-7.el5
qpid-cpp-client-ssl-0.10-7.el5
qpid-cpp-mrg-debuginfo-0.10-7.el5
qpid-cpp-server-0.10-7.el5
qpid-cpp-server-cluster-0.10-7.el5
qpid-cpp-server-devel-0.10-7.el5
qpid-cpp-server-rdma-0.10-7.el5
qpid-cpp-server-ssl-0.10-7.el5
qpid-cpp-server-store-0.10-7.el5
qpid-cpp-server-xml-0.10-7.el5
qpid-java-client-0.10-6.el5
qpid-java-common-0.10-6.el5
qpid-java-example-0.10-6.el5
qpid-qmf-0.10-9.el5
qpid-qmf-devel-0.10-9.el5
qpid-tools-0.10-5.el5

-> VERIFIED

Comment 6 Rajith Attapattu 2011-06-15 15:13:08 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause
   This happens when the application uses a synchronous operation and an exception is reported by the broker. The Qpid client tries to report the exception via the connection listener and also as a JMS exception thrown during the blocking method call. 

Consequence
    This bug causes a deadlock and could cause the application to become unresponsive.

Fix
    The call to connection.exceptionReceived() is done outside the scope of the current_exception_lock in AMQSession.java.

Result
    The Qpid client does not deadlock anymore.
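
The shape of the fix described in the note can be sketched as follows. This is an illustration of the general technique (update state under the lock, invoke the callback after releasing it), not the committed patch. The class NotifyOutsideLockSketch and its nested Connection and Session types are hypothetical stand-ins; only the names exceptionReceived, setCurrentException, the failover mutex, and current_exception_lock come from this report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the session/connection pair; not the Qpid classes.
final class NotifyOutsideLockSketch {

    /** Stand-in for the connection, whose callback needs the failover mutex. */
    static final class Connection {
        private final Object failoverMutex = new Object();
        final List<Exception> reported = new ArrayList<>();

        void exceptionReceived(Exception e) {
            synchronized (failoverMutex) {
                reported.add(e);
            }
        }
    }

    /** Stand-in for the session, which guards its current exception with its own lock. */
    static final class Session {
        private final Object currentExceptionLock = new Object();
        private final Connection connection;
        private Exception currentException;

        Session(Connection connection) {
            this.connection = connection;
        }

        void setCurrentException(Exception e) {
            synchronized (currentExceptionLock) {
                currentException = e; // only the state update happens under the lock
            }
            // The callback runs after the lock is released, so this thread never
            // holds currentExceptionLock while waiting for the failover mutex.
            connection.exceptionReceived(e);
        }

        Exception getCurrentException() {
            synchronized (currentExceptionLock) {
                return currentException;
            }
        }
    }

    public static void main(String[] args) {
        Connection connection = new Connection();
        Session session = new Session(connection);
        session.setCurrentException(new IllegalStateException("broker error"));
        System.out.println("reported to connection: " + connection.reported.size());
    }
}
```

With this ordering a thread holds at most one of the two locks at any time on this path, so the cycle seen in the thread dump cannot form here.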

Comment 7 errata-xmlrpc 2011-06-23 15:44:41 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0890.html

