Bug 1871926

Summary: [GSS](6.4.z) JBTM-3079 - InboundBridge recovery aborts live transactions
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: Brad Maxwell <bmaxwell>
Component: JCAAssignee: jboss-set
Status: VERIFIED --- QA Contact: Peter Mackay <pmackay>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.4.23CC: dosoudil, rstancel, thofman
Target Milestone: CR1   
Target Release: EAP 6.4.24   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1842551, 1871932, 1871928, 1885629    

Comment 1 Brad Maxwell 2020-08-24 15:46:34 UTC
During a recovery pass, the InboundBridgeRecoveryManager scans for subordinate XA branches that may need cleanup. The filtering process applied by checkXid correctly excludes tx that are not owned by the bridge, but fails to ignore those that are owned but also still live. Therefore, a race condition exists such that the recovery process may incorrectly abort a branch if invoked between the prepare and commit steps, resulting in data corruption relative to other committed branches from the parent i.e. Heuristic outcomes.

TRACE [org.jboss.jbossts.txbridge] (TaskWorker-3) BridgeDurableParticipant.prepare(Xid=< 131080, 35, 64, ... >)

TRACE [org.jboss.jbossts.txbridge] (Periodic Recovery) rolling back orphaned subordinate tx < 131080, 35, 64, ... >

ERROR [org.jboss.jbossts.txbridge] (TaskWorker-8) ARJUNA033004: commit on Xid=< 131080, 35, 64, ... > failed: javax.transaction.xa.XAException

It is necessary to enhance checkXid to validate against known live tx. The easiest way would seem to be to have the InboundBridgeManager singleton hold a lookup table of live tx, effectively a secondary index into its existing collection of InboundBridges, against which the recovery system can validate the in-doubt Xid.

Comment 6 Peter Mackay 2022-06-27 17:25:27 UTC
Verified with EAP 6.4.24.CP.CR1