Description of problem:
When relying on the automatic reconnection/failover logic in the JMS client, transactional sessions do not behave correctly.

Version-Release number of selected component (if applicable):
1.3

How reproducible:
100%

Steps to Reproduce:
1. Start a transactional session on a failover-enabled connection.
2. Send batches of messages in transactions.
3. Kill the cluster node the client is connected to, to trigger failover mid-transaction.

E.g. run the attached test code; there should never be messages enqueued outside of a full batch of 100. In reality, however, you will see parts of a batch enqueued.

Actual results:
The messages published within a batch at the point the connection fails over appear to be replayed outside of any transaction.

Expected results:
The whole transaction that was interrupted by failover should be rolled back.

Additional info:
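For reference, the transactional semantics the reproducer relies on can be modeled with a minimal, self-contained sketch. The stub classes below (BrokerStub, TxSessionStub) are hypothetical stand-ins for the broker and a transacted JMS session, for illustration only; the real test program uses the Qpid JMS client against a clustered broker.

```java
// Minimal model of the expected semantics: messages sent in a transaction
// are buffered until commit, and a failover mid-batch must roll the whole
// in-flight batch back. All names here are hypothetical.
import java.util.ArrayList;
import java.util.List;

class TxBatchModel {
    static final int BATCH = 100;

    static class BrokerStub {
        final List<String> queue = new ArrayList<>();
    }

    static class TxSessionStub {
        private final BrokerStub broker;
        private final List<String> pending = new ArrayList<>();
        TxSessionStub(BrokerStub broker) { this.broker = broker; }
        void send(String msg) { pending.add(msg); }               // buffered until commit
        void commit()         { broker.queue.addAll(pending); pending.clear(); }
        void rollback()       { pending.clear(); }                // what failover must do
    }

    // Sends `batches` batches of 100 messages; `failAt` is the absolute send
    // count at which a simulated failover interrupts the transaction (-1 = never).
    static int run(int batches, int failAt) {
        BrokerStub broker = new BrokerStub();
        TxSessionStub session = new TxSessionStub(broker);
        int attempts = 0;
        for (int b = 0; b < batches; b++) {
            boolean committed = false;
            while (!committed) {
                try {
                    for (int i = 0; i < BATCH; i++) {
                        if (attempts++ == failAt) throw new RuntimeException("failover");
                        session.send("message-" + (b * BATCH + i));
                    }
                    session.commit();
                    committed = true;
                } catch (RuntimeException failover) {
                    session.rollback();                 // expected: discard the partial batch
                    session = new TxSessionStub(broker); // app creates a new session and retries
                }
            }
        }
        return broker.queue.size();                     // always a multiple of BATCH
    }
}
```

Under these semantics the queue depth can only ever be a multiple of 100; the bug is that the real client violates this invariant.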
Created attachment 471883 [details] test program
This is tracked upstream via QPID-2994. A fix for this was committed upstream at rev 1057460 in Qpid trunk. The fix was also ported into the internal release repo:

http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=mrg_1.3.0.x&id=893b70bf87d1fab32db1f963c45d76b54d43e25f
http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=mrg_1.3.0.x&id=f24b514f7172ff2687a64dab51da74a0905888bd
http://mrg1.lab.bos.redhat.com/cgit/qpid.git/commit/?h=mrg_1.3.0.x&id=e114f074b353be8d6ed3a5565d2582b40e6d379c
The fix for this adds a simple check for whether the session is transactional before replaying message transfers after failover. The impact of the change on the code base is minimal; it affects only the resume function in the failover code.
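Illustratively, the check amounts to skipping transfer replay for transacted sessions during resume. The following is only a hedged sketch of the idea; the names (ResumeSketch, resume, isTransacted, replayBuffer) are hypothetical and not the actual Qpid source.

```java
// Sketch of the guard described above: during failover resume, replay
// buffered transfers only for non-transacted sessions; a transacted
// session's in-flight work must be rolled back rather than replayed
// outside the transaction. Names are illustrative.
import java.util.ArrayList;
import java.util.List;

class ResumeSketch {
    static List<String> resume(boolean isTransacted, List<String> replayBuffer) {
        List<String> replayed = new ArrayList<>();
        if (!isTransacted) {
            replayed.addAll(replayBuffer);  // re-send unacknowledged transfers
        }
        return replayed;                    // transacted: nothing is replayed
    }
}
```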
The fixes have been incorporated into qpid-java-0.7.946106-14 for RHEL 4/5/6.
The behavior seems OK; testing is still ongoing. There is an interesting off-by-one observation: in the log below of the first two runs of the test, you can see that before the failover is executed there are N messages (300), but after the failover there are N+1 on the other node, which is very interesting. Unfortunately I'm having a hard time analyzing what's going on due to bug 672270.

[root@dhcp-26-233 bz667428]# ./bz667428.sh 2
Firewall is stopped.
Stopping OpenAIS daemon (aisexec): [ OK ]
Starting OpenAIS daemon (aisexec): [ OK ]
qpidd: no process killed
[17:21:28] run(0) ----------------------------------------------------------begin-
Cluster Name: bz667428_at_dhcp-26-233.brq.redhat.com
Cluster Status: ACTIVE
Cluster Size: 1
Members: ID=192.168.1.4:26998 URL=amqp:tcp:10.34.26.233:5672,tcp:192.168.1.4:5672
[17:21:30] Brokers up:
Cluster Name: bz667428_at_dhcp-26-233.brq.redhat.com
Cluster Status: ACTIVE
Cluster Size: 2
Members: ID=192.168.1.4:26998 URL=amqp:tcp:10.34.26.233:5672,tcp:192.168.1.4:5672
      : ID=192.168.1.4:27019 URL=amqp:tcp:10.34.26.233:10000,tcp:192.168.1.4:10000
tcp 0 0 0.0.0.0:5672 0.0.0.0:* LISTEN 26998/qpidd
tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 27019/qpidd
[17:21:35] Check whether the queue is empty:
[17:21:36] Running tha client:
[17:21:36] Preparing the moment for the failover:
my-queUE msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
my-queUE msg|byteEnq/Deq:100/0|1282/0 unacked:0 consumer_cnt:0 binding_cnt:1
my-queUE msg|byteEnq/Deq:200/0|2782/0 unacked:0 consumer_cnt:0 binding_cnt:1
[17:21:42] 311 messages sent by client) ... going to kill
[17:21:42] Broker 5672 going to be killed...
[17:21:42] .killed
[17:21:42] .process gone
tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 27019/qpidd
my-queUE msg|byteEnq/Deq:301/0|4297/0 unacked:0 consumer_cnt:0 binding_cnt:1
[17:21:44] Waiting for client:
my-queUE msg|byteEnq/Deq:401/0|5797/0 unacked:0 consumer_cnt:0 binding_cnt:1
my-queUE msg|byteEnq/Deq:501/0|7297/0 unacked:0 consumer_cnt:0 binding_cnt:1
[17:21:47] End broker check I: aisexec (pid 26973) is running...
tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 27019/qpidd
[17:21:47] Kill all broker instances:
[17:21:49] End broker check II: aisexec (pid 26973) is running...
[17:21:49] run(0) ------------------------------------------------------------end-
[17:21:49] run(1) ----------------------------------------------------------begin-
Cluster Name: bz667428_at_dhcp-26-233.brq.redhat.com
Cluster Status: ACTIVE
Cluster Size: 1
Members: ID=192.168.1.4:27217 URL=amqp:tcp:10.34.26.233:5672,tcp:192.168.1.4:5672
[17:21:50] Brokers up:
Cluster Name: bz667428_at_dhcp-26-233.brq.redhat.com
Cluster Status: ACTIVE
Cluster Size: 2
Members: ID=192.168.1.4:27217 URL=amqp:tcp:10.34.26.233:5672,tcp:192.168.1.4:5672
      : ID=192.168.1.4:27238 URL=amqp:tcp:10.34.26.233:10000,tcp:192.168.1.4:10000
tcp 0 0 0.0.0.0:5672 0.0.0.0:* LISTEN 27217/qpidd
tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 27238/qpidd
[17:21:56] Check whether the queue is empty:
[17:21:57] Running tha client:
[17:21:57] Preparing the moment for the failover:
my-queUE msg|byteEnq/Deq:0/0|0/0 unacked:0 consumer_cnt:0 binding_cnt:1
my-queUE msg|byteEnq/Deq:100/0|1282/0 unacked:0 consumer_cnt:0 binding_cnt:1
my-queUE msg|byteEnq/Deq:200/0|2782/0 unacked:0 consumer_cnt:0 binding_cnt:1
my-queUE msg|byteEnq/Deq:300/0|4282/0 unacked:0 consumer_cnt:0 binding_cnt:1
[17:22:03] 362 messages sent by client) ... going to kill
[17:22:03] Broker 5672 going to be killed...
[17:22:03] .killed
[17:22:03] .process gone
tcp 0 0 0.0.0.0:5672 0.0.0.0:* LISTEN 27217/qpidd
tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 27238/qpidd
my-queUE msg|byteEnq/Deq:301/0|4297/0 unacked:0 consumer_cnt:0 binding_cnt:1
[17:22:04] Waiting for client:
my-queUE msg|byteEnq/Deq:401/0|5797/0 unacked:0 consumer_cnt:0 binding_cnt:1
my-queUE msg|byteEnq/Deq:601/0|8797/0 unacked:0 consumer_cnt:0 binding_cnt:1
[17:22:08] End broker check I: aisexec (pid 26973) is running...
tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 27238/qpidd
[17:22:08] Kill all broker instances:
[17:22:10] End broker check II: aisexec (pid 26973) is running...
[17:22:10] run(1) ------------------------------------------------------------end-
The off-by-one problem is now uncovered. The current behavior is very close to the expected one, but I can see that the transactional Java sender (which sends 1K messages, batched into 10 commits of 100 messages each) reliably sends 1001 messages, where the unexpected message is a duplicate processed at the failover moment. I believe (correct me if I'm wrong) that this behavior is acceptable when the sender is not transactional, but when we have a transactional sender which commits messages in batches, we should receive EXACTLY 1K messages, and the queue size should increase in EXACTLY 100 (message count) steps:

0 100 200 ... 800 900 1000

The behavior I can see is (again queue message size):

0 100 200 300 .failover event 301 401 ... 901 1001

Looking at the received messages (failover occurred in the 4th batch (300-400)):

...
Message(id=UUID('69a66d5f-ed73-3ec3-b56b-c397d67a3814'), user_id='guest', priority=4, durable=True, properties={'x-amqp-0-10.content-encoding': u'UTF-8', 'x-amqp-0-10.routing-key': u'my-queUE'}, content=u'message-300-299')
Message(id=UUID('a73496b3-57d7-3ef5-8300-38a7fe11ef67'), user_id='guest', priority=4, durable=True, properties={'x-amqp-0-10.content-encoding': u'UTF-8', 'x-amqp-0-10.routing-key': u'my-queUE'}, content=u'message-363-362')
Message(id=UUID('7f8e857a-3750-3a02-baea-7494076fe7e0'), user_id='guest', priority=4, durable=True, properties={'x-amqp-0-10.content-encoding': u'UTF-8', 'x-amqp-0-10.routing-key': u'my-queUE'}, content=u'message-301-300')
Message(id=UUID('7d90c590-108c-3b9b-ae7c-ee9562337f99'), user_id='guest', priority=4, durable=True, properties={'x-amqp-0-10.content-encoding': u'UTF-8', 'x-amqp-0-10.routing-key': u'my-queUE'}, content=u'message-302-301')
...
Message(id=UUID('a3fee323-18ac-3095-a1f4-c23f01a55c13'), user_id='guest', priority=4, durable=True, properties={'x-amqp-0-10.content-encoding': u'UTF-8', 'x-amqp-0-10.routing-key': u'my-queUE'}, content=u'message-363-362')
...
As seen, message 'message-363-362' is duplicated, and the receiver can receive two variants with different message UUIDs. I believe this behavior is incorrect.
Created attachment 475127 [details]
The issue reproducer and logs

This is the script which reproduces the off-by-one problem.

Dependencies:
- openais configured and running
- java-openjdk 1.6 installed
- main + devel qpid-java-* pkgs installed

How to run:
./bz667428.sh 2 &> log

After it finishes you should see two runs of the test, and the log should show the whole test process including received messages. Please verify that failover happens during the run of the Java transactional sender (grep org.apache.qpid.client.JMSAMQException cli.*.log); if not, adjust the fixed wait duration (4) at line bz667428.sh:84:

84 dur=$((4+(${RANDOM}%2)))

This reproducer also sometimes triggers another Qpid Java client problem at the end of the send operation (an exception); this will be analyzed and possibly raised as a separate problem.
The observed behavior was discussed and judged to be incorrect. -> ASSIGNED
Created attachment 475632 [details]
Bug analysis and logs to support it.

Attached are the logs/tools used for the following analysis.

Root cause
----------
This issue is not related to transactions; it is a general failover bug! However it is only easy to detect when using transactions.

The extra message(s*) that 'sneak in' are a result of the Java client sending message transfer(s*) after failover but before the new session is attached & resumed (and TxSelect being sent, if transactions are used). During this window (until resumed) the session is detached. Therefore:

(1) The Java client should not be sending any messages while the session is detached, i.e. until attached and resumed.
(2) Further, if transactional, the client should not be sending any messages until the session is marked transactional via TxSelect.

Please note that the messages that sneak in during the above window are not due to any replay. If transactional, there is no replay. If non-transactional there is replay, but that only happens after the session is attached, during 'resume'. Therefore these message transfer(s) are actually sent by the application via the JMS producer's "send" method during the above window.

So how many messages are sent while the session is detached (or outside of the transaction boundaries, if transactional) depends on the race between the following events:

1. In the case of transactions, the JMS layer figuring out that failover happened and the session is dirty, hence notifying the application via an exception. The app stops sending messages at this point.
(*) If not transactional, the client does provide transparent failover. That is, the application is unaware of the failover and continues to produce messages. Therefore some of these messages end up being sent while the session is actually 'detached'.
2. The session being attached and resumed (after failover and the connection being recreated).
3. The TxSelect being sent (in the case of transactions).
It's also worth investigating why the broker is allowing messages sent via a detached session. Again, it could be due to a race between:

1. The new broker (B) node detecting that the other (A) node failed.
2. The client detecting broker-A is no longer available and successfully reconnecting to broker-B.

Supporting evidence
-------------------
1. Client side.

From cli.0.log you can clearly see from the following snippet that the message transfer is being sent before 'attach' and 'resume' (and certainly before 'TxSelect') is sent.

Sending message: message-445-444 time: 1296138359606 abs_count: 445
====================
State (in failing over) DETACHED
Transfer ch=0
MessageTransfer(destination=, acceptMode=NONE, acquireMode=PRE_ACQUIRED)
DeliveryProperties(priority=MEDIUM, deliveryMode=PERSISTENT, timestamp=1296138359622, exchange=, routingKey=my-queUE)
MessageProperties(contentLength=0, messageId=eab70ca3-0df5-3829-a163-0cd0bf719901, contentType=text/plain, contentEncoding=UTF-8, userId=[B@7ab7b3f9, applicationHeaders={time=1296138359622, abs_count=446})
body="message-446-445"
Unreliable false
====================
==================================
Called attach 1296138359681
==================================
==================================
Calling resume 1296138359681
==================================
Sending message: message-446-445 time: 1296138359622 abs_count: 446
org.apache.qpid.client.JMSAMQException: Failover has occurred and session is dirty so unable to send.
        at org.apache.qpid.client.BasicMessageProducer.sendImpl(BasicMessageProducer.java:484)
        at org.apache.qpid.client.BasicMessageProducer.sendImpl(BasicMessageProducer.java:456)
        at org.apache.qpid.client.BasicMessageProducer.send(BasicMessageProducer.java:283)
        at TxSender.main(TxSender.java:40)
Caused by: org.apache.qpid.client.AMQSessionDirtyException: Failover has occurred and session is dirty so unable to send. [error code 506: resource error]
        ...
4 more
Creating new session; transaction rolledback due to exception: org.apache.qpid.client.JMSAMQException: Failover has occurred and session is dirty so unable to send.
Failover happened at : 1296138359732
==================================
Called attach 1296138359733
==================================
Sending message: message-401-400 time: 1296138359766 abs_count: 448

Notes

* Looking at the timestamps you can clearly see that "message-446-445" was sent before attach and resume. This is logged in the 'invoke' method in the Session.java class, which also contains the 'attach' and 'resume' methods, hence at the same layer.
* The state at the time the transfer is sent is "DETACHED".
* "Failover happened at : 1296138359732" is when the application got notified of the exception. This timestamp is very important because any message in the 400-499 range sent before this timestamp should NOT be in the queue. All messages in the 400-499 range that were sent before failover, while that transaction was in progress, should have been discarded, as the transaction should have been aborted.
* However, the client side logs obtained via "DrainQueue.java" and captured in the 'log' file show that "message-446-445" has indeed made it onto the queue. The timestamp on the message is earlier than the failover timestamp.

message-446-445 : 1296138359622 : 446 : ID:eab70ca3-0df5-3829-a163-0cd0bf719901

2. Broker side logs.

1. If you grep for "message-446-445" in qpidd.5672.0.log, you cannot see it. However, if you grep for it in qpidd2.10000.0.log, you can clearly see that it's present.
2. If you look at the events that preceded this transfer and the events that followed it, you can clearly see that this transfer was sent after failover but before 'attach', 'resume' and 'txselect' were sent.
A key point to note when analysing the broker logs is that message '446-445' was sent from "guest" while message '447-446' was sent from "guest". It is possible that the previous session still remains valid on the surviving broker due to:

1) Not having detected the failed broker node, hence believing that the session is still there.
2) Some sort of timeout associated with figuring out detached sessions.
The above log snippet I have pasted is indeed from two different sessions. The test program does create a new session when it receives an exception saying the transaction has been rolled back. That is why you see two different sessions. However, shouldn't the broker mark the previous session as detached? Perhaps there is a timeout associated with it?

From the client side logs you can see that the transfer is sent while the session is detached, and that "attach" and "resume" are sent after that. The following log snippet from another test run (which now includes the session ids) proves it.

====================
State (in failing over) DETACHED id "a6e806fc-1793-4775-966d-d8566ca31990"
Transfer ch=0
MessageTransfer(destination=, acceptMode=NONE, acquireMode=PRE_ACQUIRED)
DeliveryProperties(priority=MEDIUM, deliveryMode=PERSISTENT, timestamp=1296161668840, exchange=, routingKey=my-queUE)
MessageProperties(contentLength=0, messageId=d14a226a-6137-36eb-abff-77db4a9991d5, contentType=text/plain, contentEncoding=UTF-8, userId=[B@42ef83d3, applicationHeaders={time=1296161668840, abs_count=353})
body="message-353-352"
Unreliable false
====================
==================================
Called attach for "a6e806fc-1793-4775-966d-d8566ca31990" at 1296161668951
==================================
==================================
Calling resume for "a6e806fc-1793-4775-966d-d8566ca31990" at 1296161668951
==================================
Sending message: message-353-352 time: 1296161668840 abs_count: 353
org.apache.qpid.client.JMSAMQException: Failover has occurred and session is dirty so unable to send.
        at org.apache.qpid.client.BasicMessageProducer.sendImpl(BasicMessageProducer.java:484)
        at org.apache.qpid.client.BasicMessageProducer.sendImpl(BasicMessageProducer.java:456)
        at org.apache.qpid.client.BasicMessageProducer.send(BasicMessageProducer.java:283)
        at TxSender.main(TxSender.java:40)
Caused by: org.apache.qpid.client.AMQSessionDirtyException: Failover has occurred and session is dirty so unable to send. [error code 506: resource error]
        ... 4 more
Creating new session; transaction rolledback due to exception: org.apache.qpid.client.JMSAMQException: Failover has occurred and session is dirty so unable to send.
Failover happened at : 1296161669052
==================================
Called attach for "3fde99a9-819a-494e-8b64-32722d7c8a51" at 1296161669053
==================================

* The above logs clearly show that message "353-352" was sent while session "a6e806fc-1793-4775-966d-d8566ca31990" was in a detached state, and that the "attach" and "resume" for that session were sent only after the transfer was sent.
* The other attach command, for session "3fde99a9-819a-494e-8b64-32722d7c8a51", is due to the application creating a new session after receiving the exception.
The message transfer is not sent before the attach/resume, as stated previously. Although the timestamp was taken at the time 'invoke' (in Session.java) is called, the method actually blocks until the session is attached and resumed. However, the TxSelect is still sent after the message transfer, therefore the message is enqueued outside of any transaction.

One simple solution to the problem is to not attach/resume a transacted session and instead close the session. Since we don't provide transparent failover for transactions, this *may* be an acceptable solution. For this particular test case it is in fact acceptable, as it creates a new session when it receives an exception while a transaction is in progress. But if we allow a transacted session to be resumed after failover, then we would need to ensure no transfers are sent until TxSelect is sent.
The underlying root cause is as follows.

1. When a message transfer reaches the invoke method in Session.java and the session state is detached at that time, the thread waits until the session is OPEN or CLOSED.
2. If failover completes within the wait period and the session is resumed, thereby being marked OPEN, the message transfer in progress just resumes and reaches the broker.
3. At this point the session is still not marked transactional (and there is no logic in place to ever issue a TxSelect after failover either), so the message is enqueued.
4. In the meantime, the JMS session used by the application gets to know that failover happened, is marked dirty, and an exception is received.
5. If the application chooses to resume the session (ignoring the exception), then subsequent message transfers will get to the queue on the broker, but the session will get closed once it sends a commit (or a rollback), as the broker will complain that the session is not transactional.
6. If the application chooses to create a new session, then it will start sending subsequent messages within transaction boundaries and work as expected, but it will still have the extra one or two messages that sneaked in when the old session was reopened. If the application retries the aborted transaction, then this will result in duplicates due to the messages that sneaked in.

A reasonable solution to this issue is to:

1) Close a session marked transactional immediately when the session detaches, i.e. a transactional session is never resumed and a new session should be created to continue.
2) Document that clearly.

This issue is tracked upstream via https://issues.apache.org/jira/browse/QPID-2994

There were two other race conditions found during the investigation of this issue:
https://issues.apache.org/jira/browse/QPID-3042
https://issues.apache.org/jira/browse/QPID-3043

I am also descoping this issue from 1.3.2 and pushing it to the 2.0 release.
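The send-eligibility invariant implied by points 1-3 above can be modeled as a small state machine. This is a hedged sketch for illustration only; the class and method names below are hypothetical, not Qpid internals.

```java
// Models when a client may legally send a transfer: only while the session
// is OPEN, and, for transacted sessions, only after TxSelect has been
// (re-)issued. The bug was that transfers escaped during the DETACHED
// window, or after resume but before any TxSelect. Names are illustrative.
class SessionStateModel {
    enum State { OPEN, DETACHED }

    private State state = State.OPEN;
    private final boolean transacted;
    private boolean txSelected;

    SessionStateModel(boolean transacted) {
        this.transacted = transacted;
        this.txSelected = transacted;      // TxSelect issued at session creation
    }

    void failover()        { state = State.DETACHED; txSelected = false; }
    void attachAndResume() { state = State.OPEN; }   // does NOT re-issue TxSelect
    void txSelect()        { txSelected = true; }

    boolean canSend() {
        return state == State.OPEN && (!transacted || txSelected);
    }
}
```

Under this model, a transacted session that is resumed but never re-issues TxSelect can never legally send again, which is why closing the session and forcing the application to create a new one is the proposed fix.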
For QPID-2994 the following fixes were made upstream.
1. http://svn.apache.org/viewvc?view=rev&rev=1057460
2. http://svn.apache.org/viewvc?view=rev&rev=1057927
3. http://svn.apache.org/viewvc?view=rev&rev=1068661
4. http://svn.apache.org/viewvc?view=rev&rev=1069105

For QPID-3042 the following fixes were made upstream.
1. http://svn.apache.org/viewvc?view=rev&rev=1068696

For QPID-3043 the following fixes were made upstream.
1. http://svn.apache.org/viewvc?view=rev&rev=1069159
2. http://svn.apache.org/viewvc?view=rev&rev=1069858
The fixes are now included in RPMs from version qpid-java-0.9.1080013-2.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Transaction atomicity is violated by transparent failover.
Consequence: Messages may be enqueued outside of the transaction boundaries for transactions that were in progress during a transparent failover.
Fix: Transacted sessions will be closed, and any in-progress transactions will be rolled back, once the client is aware that failover has happened. An exception will be thrown back to the application.
Result: If an application is using a transacted session, it will receive an error if failover has happened. If there was any transaction in progress, it will be rolled back. The application then needs to create a new session to resume its operations.
The issue has been fixed, stress-tested on RHEL 5.6 / 6.1s5 i[36]86 / x86_64 on packages:

python-qpid-0.10-1.el5.noarch
python-qpid-qmf-0.10-6.el5.i386
qpid-cpp-client-0.10-4.el5.i386
qpid-cpp-client-devel-0.10-4.el5.i386
qpid-cpp-client-devel-docs-0.10-4.el5.i386
qpid-cpp-client-rdma-0.10-4.el5.i386
qpid-cpp-client-ssl-0.10-4.el5.i386
qpid-cpp-mrg-debuginfo-0.10-4.el5.i386
qpid-cpp-server-0.10-4.el5.i386
qpid-cpp-server-cluster-0.10-4.el5.i386
qpid-cpp-server-devel-0.10-4.el5.i386
qpid-cpp-server-rdma-0.10-4.el5.i386
qpid-cpp-server-ssl-0.10-4.el5.i386
qpid-cpp-server-store-0.10-4.el5.i386
qpid-cpp-server-xml-0.10-4.el5.i386
qpid-dotnet-0.4.738274-2.el5.i386
qpid-java-client-0.10-4.el5.noarch
qpid-java-common-0.10-4.el5.noarch
qpid-java-example-0.10-4.el5.noarch
qpid-java-jca-0.10-1.el5.noarch
qpid-qmf-0.10-6.el5.i386
qpid-qmf-debuginfo-0.10-6.el5.i386
qpid-qmf-devel-0.10-6.el5.i386
qpid-tests-0.10-1.el5.noarch
qpid-tools-0.10-4.el5.noarch
rh-qpid-cpp-tests-0.10-4.el5.i386
ruby-qpid-qmf-0.10-6.el5.i386
sesame-0.10-1.el5.i386
sesame-debuginfo-0.10-1.el5.i386

-> VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0890.html