Bug 633969
| Summary: | Resource locked exception thrown during failover of a JMS durable subscriber | | |
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Rajith Attapattu <rattapat+nobody> |
| Component: | qpid-cpp | Assignee: | Rajith Attapattu <rattapat+nobody> |
| Status: | CLOSED ERRATA | QA Contact: | Jeff Needle <jneedle> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | Development | CC: | gsim, rmusil |
| Target Milestone: | 1.3 | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-10-20 11:30:33 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Rajith Attapattu
2010-09-14 19:24:55 UTC
Created attachment 447328 [details]
Reproducer
(1) The attached test class points to localhost:7672 as the initial broker. Please modify it to suit your test environment.
(2) To reproduce this issue you need to sidestep the issue described in Bug 633942. Therefore pass -Dqpid.dest_syntax=BURL (JVM argument) when running the test client.
Created attachment 447601 [details]
Reproducer2
The same issue can be reproduced more easily with the attached reproducer2.
Extract the tar file and run the test-action.sh script to observe the error.
1. The same issue is observed with the current set of packages as well as trunk (as of rev 997048).
2. However, the test case (Reproducer2) passes on some machines. Perhaps there is a timing issue?
Another important point to note is that on the same machine Reproducer1 (the durable subscriber test) fails.
3. When running Reproducer2, in addition to the resource locked exception you may also see a series of channel not attached exceptions.
2010-09-15 18:47:48 error Channel exception: not-attached: Channel 12 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
2010-09-15 18:47:48 error Channel exception: not-attached: Channel 12 is not attached (qpid/amqp_0_10/SessionHandler.cpp:39)
However, you also see the following, which is essentially the same problem described above.
IoReceiver - /192.168.1.103:5672 2010-09-15 18:47:49,188 ERROR [apache.qpid.client.AMQConnectionDelegate_0_10] previous exception
org.apache.qpid.transport.ConnectionException: too many exceptions: ch=4 id=0 ExecutionException(errorCode=RESOURCE_LOCKED, commandId=24, classCode=4, commandCode=7, fieldIndex=0, description=resource-locked: Queue _ has an exclusive consumer. No more consumers allowed. (qpid/broker/Queue.cpp:459), errorInfo={}), ch=12 id=0 ExecutionException(errorCode=RESOURCE_LOCKED, commandId=24, classCode=4, commandCode=7, fieldIndex=0, description=resource-locked: Queue _ has an exclusive consumer. No more consumers allowed. (qpid/broker/Queue.cpp:459), errorInfo={})
The resource locked exception is due to duplicate subscriptions created on the same queue by the Java client during failover. These duplicate subscriptions happen at different layers in the client. The JMS layer recreates subscriptions after failover. However, the AMQP commands stored in the lower layer (for replay) could also contain message subscriptions. If they do, they get replayed after failover, and when the JMS layer then tries to re-create the subscription it will fail if the exclusive flag is set on the subscription.
The JMS layer sets the sync flag after creating a subscription, hence the broker would have sent the completion and the MessageSubscription command should have been removed from the queue. Therefore there might be an additional issue where the Java client does not remove commands from the internal command array once it receives the completions.
However, if we modify the Java client to not store any AMQP commands other than message transfers, we can easily prevent it from causing this issue and Bug 634794. Since the 0-10 client does not implement full session resume, there is no advantage in replaying anything other than message transfers.
This issue is tracked upstream via QPID-2876.
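The failure sequence described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyQueue`, `failover`, and the string command names are hypothetical stand-ins invented for illustration, not Qpid classes, and the real broker and client logic is considerably more involved.

```java
import java.util.Arrays;
import java.util.List;

public class FailoverDuplicateSubscribe {
    /** Hypothetical stand-in for a broker queue that admits one exclusive consumer. */
    static class ToyQueue {
        private String exclusiveOwner; // null while no exclusive consumer is attached

        void subscribe(String consumerTag, boolean exclusive) {
            if (exclusiveOwner != null) {
                throw new IllegalStateException(
                    "resource-locked: queue has an exclusive consumer");
            }
            if (exclusive) {
                exclusiveOwner = consumerTag;
            }
        }
    }

    /**
     * Models one failover: the lower layer replays its stored commands,
     * then the JMS layer recreates its exclusive subscription.
     */
    static String failover(List<String> storedCommands) {
        ToyQueue queue = new ToyQueue(); // queue as seen on the new broker
        try {
            for (String command : storedCommands) {
                if (command.equals("subscribe")) {
                    queue.subscribe("replayed", true); // replayed subscription
                }
                // replayed "transfer" commands do not touch consumers
            }
            queue.subscribe("jms", true); // JMS layer recreates the subscription
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Replay buffer still holds a subscribe: the second subscribe is rejected.
        System.out.println(failover(Arrays.asList("subscribe", "transfer")));
        // Replay buffer holds only transfers: the JMS layer subscribes cleanly.
        System.out.println(failover(Arrays.asList("transfer")));
    }
}
```

In this model the first call reports the resource-locked error and the second returns `ok`, which mirrors why restricting the replay buffer to message transfers avoids the duplicate subscription.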
Following is the proposed patch for this issue.
--- qpid/trunk/qpid/java/common/src/main/java/org/apache/qpid/transport/Session.java (original)
+++ qpid/trunk/qpid/java/common/src/main/java/org/apache/qpid/transport/Session.java Tue Sep 21 02:19:15 2010
@@ -645,7 +645,7 @@ public class Session extends SessionInvo
{
sessionCommandPoint(0, 0);
}
- if ((!closing && !m.isUnreliable()) || m.hasCompletionListener())
+ if ((!closing && m instanceof MessageTransfer) || m.hasCompletionListener())
{
commands[mod(next, commands.length)] = m;
commandBytes += m.getBodySize();
The patch ensures that we only store MessageTransfers for replay (unless the command has a completion listener).
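To make the effect of the changed condition concrete, here is a minimal sketch comparing the two retention rules. The classes below are simplified stand-ins written for this example (the real `Method`, `MessageTransfer`, and `MessageSubscribe` types in the Qpid Java client carry far more state), and the helper names `storedBeforePatch` and `storedAfterPatch` are hypothetical.

```java
public class ReplayRetention {
    /** Simplified stand-in for a session command; not the real Qpid class. */
    static class Method {
        final boolean unreliable;
        final boolean completionListener;

        Method(boolean unreliable, boolean completionListener) {
            this.unreliable = unreliable;
            this.completionListener = completionListener;
        }
    }

    static class MessageTransfer extends Method {
        MessageTransfer(boolean unreliable, boolean completionListener) {
            super(unreliable, completionListener);
        }
    }

    static class MessageSubscribe extends Method {
        MessageSubscribe(boolean unreliable, boolean completionListener) {
            super(unreliable, completionListener);
        }
    }

    /** Old rule: keep every reliable command for replay. */
    static boolean storedBeforePatch(Method m, boolean closing) {
        return (!closing && !m.unreliable) || m.completionListener;
    }

    /** New rule: keep only message transfers for replay. */
    static boolean storedAfterPatch(Method m, boolean closing) {
        return (!closing && m instanceof MessageTransfer) || m.completionListener;
    }

    public static void main(String[] args) {
        Method subscribe = new MessageSubscribe(false, false);
        Method transfer = new MessageTransfer(false, false);

        System.out.println("subscribe before patch: " + storedBeforePatch(subscribe, false));
        System.out.println("subscribe after patch:  " + storedAfterPatch(subscribe, false));
        System.out.println("transfer before patch:  " + storedBeforePatch(transfer, false));
        System.out.println("transfer after patch:   " + storedAfterPatch(transfer, false));
    }
}
```

Under these assumptions a reliable subscribe command is retained by the old rule but not by the new one, while a message transfer is retained by both.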
Created attachment 448766 [details]
Proposed fix as a patch against the 1.3.x release branch
The proposed patch contains the following changes
Session.java
=============
Changes:
Instead of storing any command that is marked reliable, we now only store message transfers.
The initial cherry-pick from trunk contained an additional boolean called isClosing within the if condition, which is related to a different commit.
Therefore the subsequent patch removes the isClosing variable introduced by the initial commit.
Risk:
Since we do not implement full session resume, there is no added advantage in replaying anything other than the message transfers. Therefore this change is low risk.
The patch was committed to the 1.3.x branch in the internal git repo.
http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?id=98e26823a4054972f6f5cd3f0db51f144a9b3015
http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?id=361a50de9bb50c30bd0dc6ca41dc167666f170f0
These changes are included in the 7.946106-10 package set.
Fixed in qpid-cpp-server-0.7.946106-17
Validated on RHEL5.5 i386 / x86_64
packages:
# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u
openais-0.80.6-16.el5_5.7
openais-devel-0.80.6-16.el5_5.7
python-qpid-0.7.946106-14.el5
qpid-cpp-client-0.7.946106-17.el5
qpid-cpp-client-devel-0.7.946106-17.el5
qpid-cpp-client-devel-docs-0.7.946106-17.el5
qpid-cpp-client-ssl-0.7.946106-17.el5
qpid-cpp-mrg-debuginfo-0.7.946106-14.el5
qpid-cpp-server-0.7.946106-17.el5
qpid-cpp-server-cluster-0.7.946106-17.el5
qpid-cpp-server-devel-0.7.946106-17.el5
qpid-cpp-server-ssl-0.7.946106-17.el5
qpid-cpp-server-store-0.7.946106-17.el5
qpid-cpp-server-xml-0.7.946106-17.el5
qpid-java-client-0.7.946106-10.el5
qpid-java-common-0.7.946106-10.el5
qpid-tools-0.7.946106-11.el5
rhm-docs-0.7.946106-5.el5
rh-tests-distribution-MRG-Messaging-qpid_common-1.6-53
->VERIFIED