Bug 1080140 - JTS transaction log record type PREPARED is not changed to HEURISTIC after connection to db is restored.
Summary: JTS transaction log record type PREPARED is not changed to HEURISTIC after co...
Status: VERIFIED
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Transaction Manager
Version: 6.3.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: DR11
: EAP 6.4.0
Assignee: Gytis Trikleris
QA Contact: Hayk Hovsepyan
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: 1165728
TreeView+ depends on / blocked
 
Reported: 2014-03-24 18:21 UTC by Ondrej Chaloupka
Modified: 2018-12-07 22:33 UTC (History)
5 users (show)

(edit)
Clone Of:
(edit)
Last Closed:


Attachments (Terms of Use)
server.log (1.56 MB, text/x-log)
2014-03-24 18:21 UTC, Ondrej Chaloupka
no flags Details
ds.properties file (375 bytes, text/plain)
2014-03-24 18:23 UTC, Ondrej Chaloupka
no flags Details
server.log from new test run on EAP 6.3.0.ER3 (1.49 MB, text/x-log)
2014-05-06 14:59 UTC, Ondrej Chaloupka
no flags Details
PAProxyCrashRecoveryTestCase_commitHaltRev_jts_server.log.html (2.74 MB, text/html)
2014-07-02 10:48 UTC, Ondrej Chaloupka
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
JBoss Issue Tracker JBTM-2274 Major Closed Create separate assumed complete type for heuristic transactions 2017-11-27 12:00 UTC
Red Hat Bugzilla 1080035 None None None Never

Internal Trackers: 1080035

Description Ondrej Chaloupka 2014-03-24 18:21:11 UTC
Created attachment 878140 [details]
server.log

I'm meeting error when jts transaction is used in scenario when connection to database is halted during commit phase. When connection is down then jdbc driver throws XAException. In case that it's XAException.XAER_RMERR then HeuristicMixedException is returned to client when previous resource is already commited. Then transaction log should contains record with status HEURISTIC on this resource.

Scenario: commitHaltRev
Steps:
a. enlistment test xa resource
b. enlistment jdbc xa resource
c. prepare test xa resource
d. prepare jdbc xa resource
e. killing connection to database
f. commit test xa resource
g. commit jdbc xa resource -> failing as connection is down
g-a. jdbc xa resource returns XAException.XAER_RMERR
g-b. BasicAction::doCommit() - TwoPhaseOutcome.HEURISTIC_ROLLBACK
h. start the connection to database
i. start recovery
Behavior:
(JTS)
- before recovery: arjuna tx log contains a record with status
"HEURISTIC", prepared record in db log
- after: arjuna tx log contains a record with status "PREPARED",
prepared record in db log

It seems that JTS implementation somelike does not refresh the log
store. When I put :probe() operation before the start of recovery then
the JTS behave in the same way as JTA and it leaves the state on
HEURISTIC. 

Tom's reply on my question about this issue:
This looks like a bug for JTS, not likely a bad one as I would guess the true state of the transaction is still recorded as a heuristic but not presented correctly.

Comment 1 Ondrej Chaloupka 2014-03-24 18:23:14 UTC
Created attachment 878141 [details]
ds.properties file

This could be reproduced by test case of crash recovery testsuite. The test has to be run with database different from Oracle as Oracle returns XAER_RMFAIL and the resource is commited at the end.

git clone -b lrco git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-transactions.git
cd eap-tests-transactions/jbossts
mvn clean verify -Dtest=JPAProxyCrashRecoveryTestCase#commitHaltRev -Dno.cleanup.at.teardown -Djbossts.noJTA -Dds.properties=path/to/ds.properties

Comment 2 Ondrej Chaloupka 2014-05-06 14:59:37 UTC
Created attachment 892926 [details]
server.log from new test run on EAP 6.3.0.ER3

I did a try how is it with this issue for EAP 6.3.0.ER3 and I found several things:

It seems that probeLog is not necessary anymore. When recovery is called twice then the HEURISTIC type of exception is correctly shown in the store.

But there is one a bit more serious consequence which can cause the data inconsistency.
When the recovery is called several times the resource (in this case postgresql database) seems to return (time 15:50:14,569, adding the server.log as attachment)
XAResourceRecord.recover got status: CosTransactions::StatusRolledBack
and data is rolled back. But the test Test XA resource is already committed. This seems being wrong.

Other thing is that the transaction record is put under AssumedCompleteTransaction. The transaction log directory structure looks like
─ tx-object-store
  └── ShadowNoFileLockStore
      └── defaultStore
          ├── CosTransactions
          │   └── XAResourceRecord
          ├── Recovery
          │   └── FactoryContact
          │       └── 0_ffff7f000001_-4924aa24_5368e7f9_13
          ├── RecoveryCoordinator
          │   └── 0_ffff52e38d0c_c91_4140398c_0
          └── StateManager
              └── BasicAction
                  └── TwoPhaseCoordinator
                      └── ArjunaTransactionImple
                          └── AssumedCompleteTransaction
                              └── 0_ffff7f000001_-4924aa24_5368e7f9_3a

Comment 3 Ondrej Chaloupka 2014-05-06 15:02:40 UTC
In fact the original issue with necessity of calling probeLog seems to disappear.

But the issue which is happening in the same scenario seems to be more serious. So I'm putting the severity of issue to high.

Comment 4 Ondrej Chaloupka 2014-07-02 10:48:49 UTC
Created attachment 914097 [details]
PAProxyCrashRecoveryTestCase_commitHaltRev_jts_server.log.html

I did a bit more invstigation on this issue and I think that it's connected with bz#1080035. The both does have the same syndromes - TM tries to commit several time and then gives up and do rollback. Maybe it's duplicate of the 1080035. Just this happens in different test scenario.

After the restart of the server, TM tries to do commit but despite the fact that there is three calls of the phase2commit and stuff the real work is not proceeded.
Then after some time rollback is launched by ResourceCompletor.rollback()
I'm not sure whether ResourceCompletor is triggered by resource timeout (in this case Oracle) that Oracle clean the XA after some time and this time elapsed after third recovery and fourth already clean the log by rolling back the txn. Or wherther there is some issue in TM that it would rollback transaction before it could be committed as it's expected.

Nevertheless the result is that data is in inconsitent state. The first resource was committed but the second one was rollbacked and more user will not get any useful information that it happened.

Comment 5 Ondrej Chaloupka 2014-07-02 10:50:45 UTC
I want to point out once again that the original issue with 'log is not refreshed' disappeared.
Not it's about consistency of data amongst resources belong to the same transaction.

Comment 6 Gytis Trikleris 2014-09-25 12:47:37 UTC
Based on XA specification, if XAER_RMERR is returned during the commit, it means that the error occurred while committing the branch and its work was rolled back. 
User is informed about this with HeuristicMixedException as well as a transaction log kept in the store.
It seems that the behaviour is correct and the assertions at the end of test should be changed.

p.s. workflow of this test will change a little bit once fix for 1080035 is merged. However, with it or without, outcome is the same.

Comment 7 Gytis Trikleris 2014-09-25 12:50:39 UTC
Also, it might be worth contacting Postgres about this. In a scenario as tested here, XAER_RMFAIL is a more appropriate response.

Comment 8 Ondrej Chaloupka 2014-10-07 22:04:34 UTC
Hi Gytis,

I'm sorry for being a bit late with my response.

I see your point but, please, let me put here few arguments seen from my side.
1. first "strange" thing for me is that everything works fine with JTA. Just JTS does not work fine when XAER_RMERR is returned
2. The problematic XAER_RMERR exception code is returned for datases: PostgreSQL, MSSQL and Sybase (+MySQL behaves in similar way but "incorrectly" returns XAER_RMFAIL)
3. I can't find information that XAER_RMERR would order to force rollback in spec. But I can be searching wrong. In fact the database does not do any rollback and it's TM that order to process rollback after some time when everything is already up and working.

I understand that user is "warned" by HeuristicMixedException about inconsistency. But in this case RM does not do any heuristic decision and rollback was not done. The "heuristic decision" is done by recovery of JTS in third run of recovery. 
This seems to me as changing in-doubt state to inconsistent state.

Comment 9 Ondrej Chaloupka 2014-10-08 07:18:07 UTC
I would like add one more point. Changing return code to XAER_RMFAIL is (at least what I understand from txn workflow) not the only change that is needed to be done. In such case it would be just a "simple" change in jdbc driver.
But if driver returns XAER_RMFAIL then it's its responsibility to deliver commit after connection is renewed.

What I saw from our test TM does not care about resources that returns RMFAIL. TM expects that resources manage the troubles on their own in that case.
If RMERR is returned then TM preserves log record for its purposes and user then could react on such situation.

Comment 12 Hayk Hovsepyan 2014-11-28 15:53:43 UTC
Verified on revision EAP 6.4.0.DR11

Now XAResourceRecord.recover gets status: "CosTransactions.HeuristicRollback".
And transaction is moved to "TwoPhaseCoordinator/ArjunaTransactionImple/AssumedCompleteHeuristicTransaction"

For verifying the JTS transaction participant's status new BZ is created: bz#1168973 as JTS participants are not showing up in the tooling.

Comment 13 JBoss JIRA Server 2014-12-11 15:54:02 UTC
Tom Jenkinson <tom.jenkinson@redhat.com> updated the status of jira JBTM-2274 to Closed


Note You need to log in before you can comment on or make changes to this bug.