Bug 882243

Summary: Failover doesn't work properly with XA
Product: Red Hat Enterprise MRG Reporter: Gordon Sim <gsim>
Component: qpid-java Assignee: Weston M. Price <wprice>
Status: CLOSED ERRATA QA Contact: Valiantsina Hubeika <vhubeika>
Severity: urgent Docs Contact:
Priority: high    
Version: 2.0 CC: cdewolf, esammons, gsim, iboverma, jross, lzhaldyb, mcressma, ppecka, tross, vhubeika
Target Milestone: 2.3 Keywords: Patch
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qpid-java-0.18-7 Doc Type: Bug Fix
Doc Text:
Cause: Messages sent under an XA transaction are replayed on failover. Consequence: Transaction atomicity is lost. Fix: Such messages are no longer replayed. Result: Transaction atomicity guarantees are honoured.
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-03-06 18:53:03 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 917988    
Attachments:
Description Flags
Patch for XA/HA failover none

Description Gordon Sim 2012-11-30 13:41:29 UTC
Description of problem:

Atomicity of messages sent under XA may be lost on failover.

Version-Release number of selected component (if applicable):

qpid-java-client-0.18-2.el5
qpid-java-common-0.18-2.el5
qpid-java-example-0.18-2.el5
qpid-jca-0.18-2.el5
qpid-jca-xarecovery-0.18-2.el

How reproducible:

100%

Steps to Reproduce:
1. send a message under an XA transaction to a cluster
2. commit the transaction
3. kill the node connected to, triggering failover
  
Actual results:

The message that was sent under an XA transaction that was successfully completed is redelivered on reconnect.

Expected results:

No message redelivery for committed sends as this violates atomicity.
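
The steps, actual result and expected result above can be modelled with a small sketch. This is a toy simulation, not the qpid-java client: the Client class, its replay buffer and the drop_committed switch are hypothetical names introduced only to illustrate the symptom, under the assumption that the client keeps sent messages so it can resend them after a failover.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Toy client (hypothetical, not the real qpid-java classes)."""
    cluster_queue: list            # queue state shared by all cluster nodes
    drop_committed: bool           # True models the expected behaviour
    replay_buffer: list = field(default_factory=list)

    def send_under_xa(self, msg):
        self.cluster_queue.append(msg)
        self.replay_buffer.append(msg)   # kept in case the node dies

    def commit(self):
        # Once the commit has succeeded the send is already on the cluster,
        # so nothing from this transaction should be resent later.
        if self.drop_committed:
            self.replay_buffer.clear()

    def failover(self):
        # Reconnect to a surviving node and resend whatever is still buffered.
        self.cluster_queue.extend(self.replay_buffer)
        self.replay_buffer.clear()


def run(drop_committed):
    queue = []
    client = Client(queue, drop_committed)
    client.send_under_xa("order-1")   # step 1
    client.commit()                   # step 2
    client.failover()                 # step 3
    return queue
```

With run(False) the committed message appears twice on the queue, which is the actual result reported here; with run(True) it appears once, which is the expected result.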

Additional info:

See https://issues.apache.org/jira/browse/QPID-2994, which was resolved for non-XA transactions but, from what I can make out, does not address the case where XA transactions are used.

Comment 2 Justin Ross 2012-12-06 15:39:03 UTC
Weston, please assess.

Comment 3 Weston M. Price 2012-12-13 15:13:43 UTC
Currently reviewing. At the very least, we need more testing in this area to reproduce the problem consistently. However, I agree with Gordon's assessment: most likely something in the JMS client is not being handled correctly.

Comment 4 Weston M. Price 2012-12-14 18:17:05 UTC
Note: one blocker on this is that Gordon is on vacation, and he is the 'owner' of, or at least the expert on, the DTX code.

Comment 9 Weston M. Price 2013-01-16 16:07:13 UTC
My environment:

Broker OS:
Linux carthage 3.6.7-4.fc16.x86_64 #1 SMP Tue Nov 20 20:33:31 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Broker Build:
[wmprice@carthage ~]$ qpid-install/sbin/qpidd -v
qpidd (qpidc) version 0.18

built from 0.18-mrg branch in internal git repo on mrg1

Store Build:
[wmprice@carthage qpid-store]$ svn info
Path: .
URL: http://anonsvn.jboss.org/repos/rhmessaging/store/branches/qpid-0.18
Repository Root: http://anonsvn.jboss.org/repos/rhmessaging
Repository UUID: 06e15bec-b515-0410-bef0-cc27a458cf48
Revision: 4530
Node Kind: directory
Schedule: normal
Last Changed Author: mcressman
Last Changed Rev: 4527
Last Changed Date: 2013-01-02 15:15:03 -0500 (Wed, 02 Jan 2013)

Qpid JMS/JCA Build:
0.18-mrg branch from our internal git repository


JEE Server:
EAP 5.1

In my setup, I am running two brokers on the same OS instance with different ports. Each broker has its own data directory, and they do not share a store etc. The app server is running on a separate OS (OSX) independent of the broker hosts.

I am using the 0.18 version of the JCA adapter, deploying the examples and running within EAP. 


Currently, when running in a cluster with XA, I am unable to reproduce this issue. However, this isn't saying much, as there is no DTX* type information printed to the logs. That is pretty confusing, because within the debugger I can see the XA transaction complete successfully.

The client does failover properly, but the messages sent to the previous node are not replayed. Again, I don't really trust this as I can't see any XA/DTX information in the logs at all so I am a bit miffed at this point.

At any rate, I have a repeatable environment, automated to set up and run this scenario, ready for when Gordon returns.

Comment 10 Weston M. Price 2013-01-16 16:23:26 UTC
Adjusted the log settings; DTX info now shows up correctly and the issue becomes apparent right away.

Comment 11 Weston M. Price 2013-01-16 21:41:41 UTC
Actually, I am only seeing the following type of info in the logs:

2013-01-16 15:18:58 [Broker] debug preparing: {Xid: format=131075; global-id=1--3f57fe9c:f13b:50f70b08:63; branch-id=-3f57fe9c:f13b:50f70b08:65; }
2013-01-16 15:19:04 [Broker] debug committing: {Xid: format=131075; global-id=1--3f57fe9c:f13b:50f70b08:63; branch-id=-3f57fe9c:f13b:50f70b08:65; }

I am not seeing any type of DtxSelect/DtxBegin/DtxEnd etc. I am not sure if something has changed within the Broker logging or if my settings are wrong. I am using:

 --log-enable trace+:Dtx --log-enable trace+:Protocol

I have tried various options to no avail. 
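
The prepare/commit lines quoted above can be checked mechanically. The following is a minimal sketch, assuming the broker log contains only the two line shapes shown in this comment ("debug preparing:" and "debug committing:" followed by an Xid in braces); the function name unfinished_xids is made up for illustration, and any other DTX log lines the broker may emit are not handled.

```python
import re

# Matches the two line shapes quoted in this comment and captures the Xid body.
LINE = re.compile(r"debug (preparing|committing): \{Xid: (.*?)\}\s*$")


def unfinished_xids(log_lines):
    """Return the Xids that were prepared but never committed."""
    prepared, committed = set(), set()
    for line in log_lines:
        match = LINE.search(line)
        if not match:
            continue  # not one of the two known line shapes
        state, xid = match.group(1), match.group(2).strip()
        if state == "preparing":
            prepared.add(xid)
        else:
            committed.add(xid)
    return prepared - committed


SAMPLE = [
    "2013-01-16 15:18:58 [Broker] debug preparing: {Xid: format=131075; "
    "global-id=1--3f57fe9c:f13b:50f70b08:63; branch-id=-3f57fe9c:f13b:50f70b08:65; }",
    "2013-01-16 15:19:04 [Broker] debug committing: {Xid: format=131075; "
    "global-id=1--3f57fe9c:f13b:50f70b08:63; branch-id=-3f57fe9c:f13b:50f70b08:65; }",
]
```

For the two sample lines the result is an empty set, since the one prepared Xid is also committed.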

At any rate, I have also noticed that this issue seems to only occur when multiple XA resources are used within the same XA transaction. I am reviewing this further.
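
For background on why the number of XA resources can matter: a transaction manager is allowed to commit a single enlisted resource in one phase, whereas two or more resources go through prepare and then commit. The sketch below illustrates only that generic protocol; Resource and complete are hypothetical names, not the JCA adapter or broker code, and whether the one-phase path is what hid the problem in single-resource runs is not established in this report.

```python
class Resource:
    """Hypothetical XA participant that records the calls it receives."""

    def __init__(self, name, vote=True):
        self.name = name
        self.vote = vote
        self.calls = []

    def prepare(self):
        self.calls.append("prepare")
        return self.vote

    def commit(self, one_phase):
        self.calls.append("commit-1pc" if one_phase else "commit")

    def rollback(self):
        self.calls.append("rollback")


def complete(resources):
    """Drive the enlisted resources to an outcome; return True on commit."""
    if len(resources) == 1:
        # One-phase optimisation: no separate prepare step.
        resources[0].commit(one_phase=True)
        return True
    # Two-phase commit: collect every vote before deciding.
    votes = [r.prepare() for r in resources]
    if all(votes):
        for r in resources:
            r.commit(one_phase=False)
        return True
    for r in resources:
        r.rollback()
    return False
```

The "preparing" followed by "committing" lines in the broker log above are consistent with the two-phase path.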

Comment 12 Weston M. Price 2013-01-18 21:54:46 UTC
Thanks to Rajith, we have a patch. I applied and tested the fix both on trunk and on our internal 0.18 branch. One minor modification was required to build against 0.18, so I am submitting a modified version of Rajith's patch in case we need it. I will simply attach it to the BZ.

All tests (unit, system and XA/HA failover with JCA) look good.

Comment 13 Weston M. Price 2013-01-18 21:56:26 UTC
Created attachment 682763
Patch for XA/HA failover

Patch for XA/HA failover issue for the 0.18-mrg internal branch.

Comment 16 ppecka 2013-02-28 13:23:54 UTC
VERIFIED 

qpid-java-client-0.18-6.el6.noarch 
qpid-java-common-0.18-6.el6.noarch
qpid-java-example-0.18-6.el6.noarch
qpid-jca-0.18-7.el6.noarch
qpid-jca-xarecovery-0.18-7.el6.noarch
qpid-jca-zip-0.18-7.el6.noarch

Comment 18 errata-xmlrpc 2013-03-06 18:53:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0561.html