Created attachment 878140 [details]
I'm meeting error when jts transaction is used in scenario when connection to database is halted during commit phase. When connection is down then jdbc driver throws XAException. In case that it's XAException.XAER_RMERR then HeuristicMixedException is returned to client when previous resource is already commited. Then transaction log should contains record with status HEURISTIC on this resource.
a. enlistment test xa resource
b. enlistment jdbc xa resource
c. prepare test xa resource
d. prepare jdbc xa resource
e. killing connection to database
f. commit test xa resource
g. commit jdbc xa resource -> failing as connection is down
g-a. jdbc xa resource returns XAException.XAER_RMERR
g-b. BasicAction::doCommit() - TwoPhaseOutcome.HEURISTIC_ROLLBACK
h. start the connection to database
i. start recovery
- before recovery: arjuna tx log contains a record with status
"HEURISTIC", prepared record in db log
- after: arjuna tx log contains a record with status "PREPARED",
prepared record in db log
It seems that JTS implementation somelike does not refresh the log
store. When I put :probe() operation before the start of recovery then
the JTS behave in the same way as JTA and it leaves the state on
Tom's reply on my question about this issue:
This looks like a bug for JTS, not likely a bad one as I would guess the true state of the transaction is still recorded as a heuristic but not presented correctly.
Created attachment 878141 [details]
This could be reproduced by test case of crash recovery testsuite. The test has to be run with database different from Oracle as Oracle returns XAER_RMFAIL and the resource is commited at the end.
git clone -b lrco git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-transactions.git
mvn clean verify -Dtest=JPAProxyCrashRecoveryTestCase#commitHaltRev -Dno.cleanup.at.teardown -Djbossts.noJTA -Dds.properties=path/to/ds.properties
Created attachment 892926 [details]
server.log from new test run on EAP 6.3.0.ER3
I did a try how is it with this issue for EAP 6.3.0.ER3 and I found several things:
It seems that probeLog is not necessary anymore. When recovery is called twice then the HEURISTIC type of exception is correctly shown in the store.
But there is one a bit more serious consequence which can cause the data inconsistency.
When the recovery is called several times the resource (in this case postgresql database) seems to return (time 15:50:14,569, adding the server.log as attachment)
XAResourceRecord.recover got status: CosTransactions::StatusRolledBack
and data is rolled back. But the test Test XA resource is already committed. This seems being wrong.
Other thing is that the transaction record is put under AssumedCompleteTransaction. The transaction log directory structure looks like
│ └── XAResourceRecord
│ └── FactoryContact
│ └── 0_ffff7f000001_-4924aa24_5368e7f9_13
│ └── 0_ffff52e38d0c_c91_4140398c_0
In fact the original issue with necessity of calling probeLog seems to disappear.
But the issue which is happening in the same scenario seems to be more serious. So I'm putting the severity of issue to high.
Created attachment 914097 [details]
I did a bit more invstigation on this issue and I think that it's connected with bz#1080035. The both does have the same syndromes - TM tries to commit several time and then gives up and do rollback. Maybe it's duplicate of the 1080035. Just this happens in different test scenario.
After the restart of the server, TM tries to do commit but despite the fact that there is three calls of the phase2commit and stuff the real work is not proceeded.
Then after some time rollback is launched by ResourceCompletor.rollback()
I'm not sure whether ResourceCompletor is triggered by resource timeout (in this case Oracle) that Oracle clean the XA after some time and this time elapsed after third recovery and fourth already clean the log by rolling back the txn. Or wherther there is some issue in TM that it would rollback transaction before it could be committed as it's expected.
Nevertheless the result is that data is in inconsitent state. The first resource was committed but the second one was rollbacked and more user will not get any useful information that it happened.
I want to point out once again that the original issue with 'log is not refreshed' disappeared.
Not it's about consistency of data amongst resources belong to the same transaction.
Based on XA specification, if XAER_RMERR is returned during the commit, it means that the error occurred while committing the branch and its work was rolled back.
User is informed about this with HeuristicMixedException as well as a transaction log kept in the store.
It seems that the behaviour is correct and the assertions at the end of test should be changed.
p.s. workflow of this test will change a little bit once fix for 1080035 is merged. However, with it or without, outcome is the same.
Also, it might be worth contacting Postgres about this. In a scenario as tested here, XAER_RMFAIL is a more appropriate response.
I'm sorry for being a bit late with my response.
I see your point but, please, let me put here few arguments seen from my side.
1. first "strange" thing for me is that everything works fine with JTA. Just JTS does not work fine when XAER_RMERR is returned
2. The problematic XAER_RMERR exception code is returned for datases: PostgreSQL, MSSQL and Sybase (+MySQL behaves in similar way but "incorrectly" returns XAER_RMFAIL)
3. I can't find information that XAER_RMERR would order to force rollback in spec. But I can be searching wrong. In fact the database does not do any rollback and it's TM that order to process rollback after some time when everything is already up and working.
I understand that user is "warned" by HeuristicMixedException about inconsistency. But in this case RM does not do any heuristic decision and rollback was not done. The "heuristic decision" is done by recovery of JTS in third run of recovery.
This seems to me as changing in-doubt state to inconsistent state.
I would like add one more point. Changing return code to XAER_RMFAIL is (at least what I understand from txn workflow) not the only change that is needed to be done. In such case it would be just a "simple" change in jdbc driver.
But if driver returns XAER_RMFAIL then it's its responsibility to deliver commit after connection is renewed.
What I saw from our test TM does not care about resources that returns RMFAIL. TM expects that resources manage the troubles on their own in that case.
If RMERR is returned then TM preserves log record for its purposes and user then could react on such situation.
Verified on revision EAP 6.4.0.DR11
Now XAResourceRecord.recover gets status: "CosTransactions.HeuristicRollback".
And transaction is moved to "TwoPhaseCoordinator/ArjunaTransactionImple/AssumedCompleteHeuristicTransaction"
For verifying the JTS transaction participant's status new BZ is created: bz#1168973 as JTS participants are not showing up in the tooling.
Tom Jenkinson <email@example.com> updated the status of jira JBTM-2274 to Closed