Bug 882243 - Failover doesn't work properly with XA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-java
Hardware: Unspecified  OS: Unspecified
Priority: high  Severity: urgent
Version: 2.3
Target Milestone: ---
Assigned To: Weston M. Price
Keywords: Patch
Depends On:
Blocks: 917988
Reported: 2012-11-30 08:41 EST by Gordon Sim
Modified: 2016-02-21 19:59 EST (History)
CC: 10 users

See Also:
Fixed In Version: qpid-java-0.18-7
Doc Type: Bug Fix
Doc Text:
Cause: Messages sent under an XA transaction are replayed on failover.
Consequence: Transaction atomicity is lost.
Fix: Such messages are no longer replayed.
Result: Transaction atomicity guarantees are honoured.
Story Points: ---
Clone Of:
Last Closed: 2013-03-06 13:53:03 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
Patch for XA/HA failover (8.20 KB, application/octet-stream)
2013-01-18 16:56 EST, Weston M. Price

External Trackers
Tracker ID Priority Status Summary Last Updated
Apache JIRA QPID-4541 None None None Never

Description Gordon Sim 2012-11-30 08:41:29 EST
Description of problem:

Atomicity of messages sent under XA may be lost on failover.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. send a message under an XA transaction to a cluster
2. commit the transaction
3. kill the node the client is connected to, triggering failover
Actual results:

The message that was sent under an XA transaction that was successfully completed is redelivered on reconnect.

Expected results:

No message redelivery for committed sends as this violates atomicity.

Additional info:

See https://issues.apache.org/jira/browse/QPID-2994, which was resolved for non-XA transactions but, as far as I can make out, does not address the case where XA transactions are used.
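For reference, the commit in step 2 of the reproduction is the second phase of the XA completion sequence a JTA transaction manager drives against the client. A minimal sketch of that lifecycle, using a toy in-memory XAResource (all class names here are illustrative, not Qpid client code):

```java
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class Main {

    // Minimal Xid, mirroring the format / global-id / branch-id triple
    // that appears in the broker's "preparing"/"committing" log lines.
    static final class SimpleXid implements Xid {
        public int getFormatId() { return 131075; }
        public byte[] getGlobalTransactionId() { return new byte[] {1}; }
        public byte[] getBranchQualifier() { return new byte[] {2}; }
    }

    // Toy resource that just records where it is in the lifecycle.
    static final class RecordingResource implements XAResource {
        String state = "idle";

        public void start(Xid xid, int flags) { state = "active"; }
        public void end(Xid xid, int flags) { state = "ended"; }
        public int prepare(Xid xid) { state = "prepared"; return XA_OK; }
        public void commit(Xid xid, boolean onePhase) { state = "committed"; }
        public void rollback(Xid xid) { state = "rolledback"; }
        public void forget(Xid xid) {}
        public int[] recover(int flag) { return new int[0]; }
        public boolean isSameRM(XAResource other) { return other == this; }
        public int getTransactionTimeout() { return 0; }
        public boolean setTransactionTimeout(int secs) { return false; }
    }

    public static void main(String[] args) throws Exception {
        RecordingResource res = new RecordingResource();
        Xid xid = new SimpleXid();

        // The sequence a transaction manager issues on commit:
        res.start(xid, XAResource.TMNOFLAGS);
        // ... the message send is associated with this branch here ...
        res.end(xid, XAResource.TMSUCCESS);
        res.prepare(xid);
        res.commit(xid, false);

        System.out.println("final state: " + res.state);
        // Once the branch is committed, the send is done; replaying it on
        // failover (the bug) duplicates a message the transaction promised
        // to deliver exactly once.
    }
}
```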
Comment 2 Justin Ross 2012-12-06 10:39:03 EST
Weston, please assess.
Comment 3 Weston M. Price 2012-12-13 10:13:43 EST
Currently reviewing. At the very least, this is an area where we need more testing to reproduce the issue consistently. However, I agree with Gordon's assessment: most likely something in the JMS client is not being handled correctly.
Comment 4 Weston M. Price 2012-12-14 13:17:05 EST
Note: one blocker on this is that Gordon is on vacation, and he is the 'owner', or at least the expert, on the DTX code.
Comment 9 Weston M. Price 2013-01-16 11:07:13 EST
My environment:

Broker OS:
Linux carthage 3.6.7-4.fc16.x86_64 #1 SMP Tue Nov 20 20:33:31 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Broker Build:
[wmprice@carthage ~]$ qpid-install/sbin/qpidd -v
qpidd (qpidc) version 0.18

built from 0.18-mrg branch in internal git repo on mrg1

Store Build:
[wmprice@carthage qpid-store]$ svn info
Path: .
URL: http://anonsvn.jboss.org/repos/rhmessaging/store/branches/qpid-0.18
Repository Root: http://anonsvn.jboss.org/repos/rhmessaging
Repository UUID: 06e15bec-b515-0410-bef0-cc27a458cf48
Revision: 4530
Node Kind: directory
Schedule: normal
Last Changed Author: mcressman
Last Changed Rev: 4527
Last Changed Date: 2013-01-02 15:15:03 -0500 (Wed, 02 Jan 2013)

Qpid JMS/JCA Build:
0.18-mrg branch from our internal git repository

JEE Server:
EAP 5.1

In my setup, I am running two brokers on the same OS instance on different ports. Each broker has its own data directory; they do not share a store. The app server is running on a separate host (OS X), independent of the broker hosts.

I am using the 0.18 version of the JCA adapter, deploying the examples and running within EAP. 

Currently, when running in a cluster with XA, I am unable to reproduce this issue. However, this isn't saying much, as there is no DTX* information printed to the logs, which is confusing because within the debugger I can see the XA transaction complete successfully.

The client does fail over properly, and the messages sent to the previous node are not replayed. Again, I don't really trust this, as I can't see any XA/DTX information in the logs at all, so I am a bit miffed at this point.

At any rate, I have a repeatable environment that is automated to set up and run this scenario when Gordon returns.
Comment 10 Weston M. Price 2013-01-16 11:23:26 EST
Adjusted the log settings; DTX info is now showing up correctly and the issue becomes apparent right away.
Comment 11 Weston M. Price 2013-01-16 16:41:41 EST
Actually, I am only seeing the following type of info in the logs:

2013-01-16 15:18:58 [Broker] debug preparing: {Xid: format=131075; global-id=1--3f57fe9c:f13b:50f70b08:63; branch-id=-3f57fe9c:f13b:50f70b08:65; }
2013-01-16 15:19:04 [Broker] debug committing: {Xid: format=131075; global-id=1--3f57fe9c:f13b:50f70b08:63; branch-id=-3f57fe9c:f13b:50f70b08:65; }

I am not seeing any DtxSelect/DtxBegin/DtxEnd entries. I am not sure whether something has changed in the broker logging or my settings are wrong. I am using:

 --log-enable trace+:Dtx --log-enable trace+:Protocol

I have tried various options to no avail. 

At any rate, I have also noticed that this issue seems to only occur when multiple XA resources are used within the same XA transaction. I am reviewing this further.
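The observation that the issue shows up only with multiple XA resources may relate to the one-phase-commit optimization: with a single enlisted XAResource a transaction manager typically calls commit(xid, true) and skips prepare, whereas two or more resources force full two-phase commit, matching the separate "preparing"/"committing" lines in the broker log above. A toy coordinator illustrating the difference (illustrative names only, not EAP or Qpid code):

```java
import java.util.ArrayList;
import java.util.List;

public class Main {

    // Toy coordinator that records which completion calls it would issue.
    static final class ToyCoordinator {
        final List<String> calls = new ArrayList<>();
        final int resources;

        ToyCoordinator(int resources) { this.resources = resources; }

        void commit() {
            if (resources == 1) {
                // One enlisted resource: prepare is skipped entirely.
                calls.add("commit(onePhase=true)");
            } else {
                // Phase 1: every resource must prepare first...
                for (int i = 0; i < resources; i++)
                    calls.add("prepare[" + i + "]");
                // Phase 2: ...then every resource is told to commit.
                for (int i = 0; i < resources; i++)
                    calls.add("commit[" + i + "]");
            }
        }
    }

    public static void main(String[] args) {
        ToyCoordinator one = new ToyCoordinator(1);
        one.commit();
        System.out.println("1 resource:  " + one.calls);

        ToyCoordinator two = new ToyCoordinator(2);
        two.commit();
        System.out.println("2 resources: " + two.calls);
    }
}
```

With a single resource there is no distinct prepare step, so a code path that misbehaves between prepare and commit would only ever be exercised when a second resource is enlisted in the same transaction.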
Comment 12 Weston M. Price 2013-01-18 16:54:46 EST
Thanks to Rajith, we have a patch. I applied and tested the fix both on trunk and on our internal 0.18 branch. One minor modification was required to build against 0.18, so I am submitting a modified version of Rajith's patch in case we need it; I will attach it to the BZ.

All tests (unit, system and XA/HA failover with JCA) look good.
Comment 13 Weston M. Price 2013-01-18 16:56:26 EST
Created attachment 682763 [details]
Patch for XA/HA failover

Patch for XA/HA failover issue for the 0.18-mrg internal branch.
Comment 16 ppecka 2013-02-28 08:23:54 EST

Comment 18 errata-xmlrpc 2013-03-06 13:53:03 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

