Bug 882243 - Failover doesn't work properly with XA
Summary: Failover doesn't work properly with XA
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-java
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: 2.3
Target Release: ---
Assignee: Weston M. Price
QA Contact: Valiantsina Hubeika
URL:
Whiteboard:
Depends On:
Blocks: 917988
Reported: 2012-11-30 13:41 UTC by Gordon Sim
Modified: 2018-11-30 20:34 UTC
CC List: 10 users

Fixed In Version: qpid-java-0.18-7
Doc Type: Bug Fix
Doc Text:
Cause: Messages sent under an XA transaction are replayed on failover.
Consequence: Transaction atomicity is lost.
Fix: Such messages are no longer replayed.
Result: Transaction atomicity guarantees are honoured.
Clone Of:
Environment:
Last Closed: 2013-03-06 18:53:03 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch for XA/HA failover (8.20 KB, application/octet-stream)
2013-01-18 21:56 UTC, Weston M. Price
no flags


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Apache JIRA | QPID-4541 | 0 | None | None | None | Never
Red Hat Product Errata | RHSA-2013:0561 | 0 | normal | SHIPPED_LIVE | Moderate: Red Hat Enterprise MRG Messaging 2.3 security update | 2013-03-06 23:48:13 UTC

Description Gordon Sim 2012-11-30 13:41:29 UTC
Description of problem:

Atomicity of messages sent under XA may be lost on failover.

Version-Release number of selected component (if applicable):

qpid-java-client-0.18-2.el5
qpid-java-common-0.18-2.el5
qpid-java-example-0.18-2.el5
qpid-jca-0.18-2.el5
qpid-jca-xarecovery-0.18-2.el

How reproducible:

100%

Steps to Reproduce:
1. send a message under an XA transaction to a cluster
2. commit the transaction
3. kill the node connected to, triggering failover
  
Actual results:

The message that was sent under an XA transaction that was successfully completed is redelivered on reconnect.

Expected results:

No message redelivery for committed sends as this violates atomicity.

Additional info:

See https://issues.apache.org/jira/browse/QPID-2994, which was resolved for non-XA transactions but, from what I can make out, does not address the case where XA transactions are used.
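
A toy model of the reported behaviour, for illustration only. It is not the Qpid client code, and every class and method name here is hypothetical. It assumes the client keeps sent messages in a replay buffer so they can be resent after failover, and that the buffer is not discarded when the XA transaction commits. Under that assumption, a committed send shows up twice after failover.

```python
# Toy model, NOT the Qpid client: all names below are hypothetical.

class Broker:
    """Stands in for one cluster node. The queue list is shared by all nodes."""

    def __init__(self, queue):
        self.queue = queue      # committed messages, visible on every node
        self.pending = {}       # xid -> messages enqueued but not yet committed

    def enqueue(self, xid, msg):
        self.pending.setdefault(xid, []).append(msg)

    def commit(self, xid):
        self.queue.extend(self.pending.pop(xid, []))


class Client:
    def __init__(self, broker, clear_on_commit):
        self.broker = broker
        self.clear_on_commit = clear_on_commit
        self.replay_buffer = []  # sends kept for replay after failover

    def send(self, xid, msg):
        self.replay_buffer.append(msg)
        self.broker.enqueue(xid, msg)

    def commit(self, xid):
        self.broker.commit(xid)
        if self.clear_on_commit:
            # Assumed fix: committed sends must never be replayed.
            self.replay_buffer.clear()

    def failover(self, new_broker):
        self.broker = new_broker
        for msg in self.replay_buffer:
            # Replayed outside the original, already completed transaction.
            self.broker.queue.append(msg)


def run(clear_on_commit):
    queue = []
    client = Client(Broker(queue), clear_on_commit)
    client.send("xid-1", "m1")      # step 1: send under an XA transaction
    client.commit("xid-1")          # step 2: commit
    client.failover(Broker(queue))  # step 3: node killed, client reconnects
    return queue


print(run(clear_on_commit=False))  # ['m1', 'm1'] : duplicate delivery
print(run(clear_on_commit=True))   # ['m1']
```

The three calls in run() mirror the three reproduction steps above. The duplicate in the first case is the atomicity violation described under "Actual results".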

Comment 2 Justin Ross 2012-12-06 15:39:03 UTC
Weston, please assess.

Comment 3 Weston M. Price 2012-12-13 15:13:43 UTC
Currently reviewing. At the very least, this is an area where we need more testing in order to reproduce the problem consistently. However, I agree with Gordon's assessment: most likely something in the JMS client is not being handled correctly.

Comment 4 Weston M. Price 2012-12-14 18:17:05 UTC
Note: one blocker on this is that Gordon is on vacation, and he is the 'owner' of, or at least the expert on, the DTX code.

Comment 9 Weston M. Price 2013-01-16 16:07:13 UTC
My environment:

Broker OS:
Linux carthage 3.6.7-4.fc16.x86_64 #1 SMP Tue Nov 20 20:33:31 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Broker Build:
[wmprice@carthage ~]$ qpid-install/sbin/qpidd -v
qpidd (qpidc) version 0.18

built from 0.18-mrg branch in internal git repo on mrg1

Store Build:
[wmprice@carthage qpid-store]$ svn info
Path: .
URL: http://anonsvn.jboss.org/repos/rhmessaging/store/branches/qpid-0.18
Repository Root: http://anonsvn.jboss.org/repos/rhmessaging
Repository UUID: 06e15bec-b515-0410-bef0-cc27a458cf48
Revision: 4530
Node Kind: directory
Schedule: normal
Last Changed Author: mcressman
Last Changed Rev: 4527
Last Changed Date: 2013-01-02 15:15:03 -0500 (Wed, 02 Jan 2013)

Qpid JMS/JCA Build:
0.18-mrg branch from our internal git repository


JEE Server:
EAP 5.1

In my setup, I am running two brokers on the same OS instance with different ports. Each broker has its own data directory, and the brokers do not share a store. The app server is running on a separate OS (OSX), independent of the broker host.

I am using the 0.18 version of the JCA adapter, deploying the examples and running within EAP. 


Currently, when running in a cluster with XA, I am unable to reproduce this issue. However, this isn't saying much, as no DTX* type information is printed to the logs. That is pretty confusing, because within the debugger I can see the XA transaction complete successfully.

The client does fail over properly, but the messages sent to the previous node are not replayed. Again, I don't really trust this, as I can't see any XA/DTX information in the logs at all, so I am a bit miffed at this point.

At any rate, I have a repeatable environment that is automated to set up and run this scenario when Gordon returns.
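
A minimal sketch of how the two-broker layout described in this comment could be scripted. It only builds the command lines and does not start anything. The ports and data directories are made-up values, and cluster and store options are deliberately left out because the comment does not state them. --port and --data-dir are standard qpidd options.

```python
# Hypothetical helper: builds qpidd command lines for two brokers on one host.
# Ports and directories are illustrative assumptions, not values from this bug.

QPIDD = "qpid-install/sbin/qpidd"  # path as shown in the shell prompt above


def qpidd_command(port, data_dir):
    """Return the argv list for one broker with its own port and data directory."""
    return [QPIDD, "--port", str(port), "--data-dir", data_dir]


commands = [
    qpidd_command(5672, "/tmp/qpid-node1"),
    qpidd_command(5673, "/tmp/qpid-node2"),
]

for argv in commands:
    print(" ".join(argv))
```

Each list can be handed to subprocess.Popen once the cluster and store options for the environment under test have been added.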

Comment 10 Weston M. Price 2013-01-16 16:23:26 UTC
Adjusted the log settings. DTX info now shows up correctly, and the issue becomes apparent right away.

Comment 11 Weston M. Price 2013-01-16 21:41:41 UTC
Actually, I am only seeing the following type of info in the logs:

2013-01-16 15:18:58 [Broker] debug preparing: {Xid: format=131075; global-id=1--3f57fe9c:f13b:50f70b08:63; branch-id=-3f57fe9c:f13b:50f70b08:65; }
2013-01-16 15:19:04 [Broker] debug committing: {Xid: format=131075; global-id=1--3f57fe9c:f13b:50f70b08:63; branch-id=-3f57fe9c:f13b:50f70b08:65; }

I am not seeing any DtxSelect/DtxBegin/DtxEnd entries. I am not sure whether something has changed within the Broker logging or whether my settings are wrong. I am using:

 --log-enable trace+:Dtx --log-enable trace+:Protocol

I have tried various options to no avail. 

At any rate, I have also noticed that this issue seems to only occur when multiple XA resources are used within the same XA transaction. I am reviewing this further.
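
A small sketch for pairing up the "preparing" and "committing" lines quoted in this comment by Xid, which makes it easier to check that every prepared branch was also committed. The regular expression is written against the two sample lines only. Other event names or line layouts in the broker log are not handled, and the function name is hypothetical.

```python
import re

# Matches only the line layout quoted above; any other format is ignored.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[Broker\] debug "
    r"(?P<event>preparing|committing): \{Xid: format=(?P<format>\d+); "
    r"global-id=(?P<gid>[^;]+); branch-id=(?P<bid>[^;]+); \}$"
)


def xid_events(lines):
    """Map (global-id, branch-id) to the list of events seen, in log order."""
    events = {}
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            key = (match["gid"], match["bid"])
            events.setdefault(key, []).append(match["event"])
    return events


sample = [
    "2013-01-16 15:18:58 [Broker] debug preparing: {Xid: format=131075; "
    "global-id=1--3f57fe9c:f13b:50f70b08:63; branch-id=-3f57fe9c:f13b:50f70b08:65; }",
    "2013-01-16 15:19:04 [Broker] debug committing: {Xid: format=131075; "
    "global-id=1--3f57fe9c:f13b:50f70b08:63; branch-id=-3f57fe9c:f13b:50f70b08:65; }",
]

for xid, seen in xid_events(sample).items():
    print(xid, seen)
```

With several XA resources in one transaction, the same global-id would be expected to appear with more than one branch-id, so grouping by the pair keeps the branches separate.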

Comment 12 Weston M. Price 2013-01-18 21:54:46 UTC
Thanks to Rajith, we have a patch. I applied and tested the fix on both trunk and our internal 0.18 branch. One minor modification was required to build against 0.18, so I am submitting a modified version of Rajith's patch in case we need it. I will simply attach it to the BZ.

All tests (unit, system and XA/HA failover with JCA) look good.

Comment 13 Weston M. Price 2013-01-18 21:56:26 UTC
Created attachment 682763
Patch for XA/HA failover

Patch for XA/HA failover issue for the 0.18-mrg internal branch.

Comment 16 ppecka 2013-02-28 13:23:54 UTC
VERIFIED 

qpid-java-client-0.18-6.el6.noarch 
qpid-java-common-0.18-6.el6.noarch
qpid-java-example-0.18-6.el6.noarch
qpid-jca-0.18-7.el6.noarch
qpid-jca-xarecovery-0.18-7.el6.noarch
qpid-jca-zip-0.18-7.el6.noarch

Comment 18 errata-xmlrpc 2013-03-06 18:53:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0561.html

