Bug 1080179

Summary: [RFE] EAP6-17 Inconsistency for recovery when db connection fails running with CMR resource
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: Ondrej Chaloupka <ochaloup>
Component: Transaction Manager Assignee: Michael <mmusgrov>
Status: CLOSED CURRENTRELEASE QA Contact: Ondrej Chaloupka <ochaloup>
Severity: high Docs Contact: Russell Dickenson <rdickens>
Priority: unspecified    
Version: 6.3.0 CC: hhovsepy, kkhan, mmusgrov, smumford
Target Milestone: ER2   
Target Release: EAP 6.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-28 15:40:06 UTC Type: Bug
Bug Depends On: 1085877    
Bug Blocks: 1051640    
Attachments:
Description                      Flags
server.log                       none
server.log #prepareHaltBefore    none

Description Ondrej Chaloupka 2014-03-24 20:47:22 UTC
Created attachment 878167 [details]
server.log

As far as I can see, this is similar to bz1080035.
The CMR resource fails to commit and returns an outcome that lets the transaction continue. The test XA resource is then committed. After the db connection is restored, recovery does not get the CMR resource committed.

Scenario: prepareHalt
Steps:
a. enlistment jdbc cmr resource
b. enlistment test xa resource
c. prepare jdbc cmr resource
d. killing connection to database
e. prepare test xa resource (no db connection)
f. commit jdbc cmr resource -> failing as connection is down
f-a. jdbc cmr resource throws org.jboss.jca.core.spi.transaction.local.LocalXAException and the transaction continues with TwoPhaseOutcome.FINISH_ERROR
g. committing test xa resource
h. start the connection to database
i. start recovery
On recovery, doCommit is called on the cmr resource, but it does not seem to change anything in the database. The database contains the data as if rollback had been called (the same data as at the start of the transaction).

The server log contains an error like:
ERROR [com.arjuna.ats.arjuna] (Periodic Recovery) Update was not successful, expected: 4 actual:1'

Comment 2 Ondrej Chaloupka 2014-03-25 08:21:46 UTC
Created attachment 878338 [details]
server.log #prepareHaltBefore

I'm hitting the same trouble when running a similar scenario with a different order of resource enlistment:
1) stop db connection
2) prepare test XA resource
3) prepare db cmr resource
Then the prepare fails with an exception like:
ERROR [com.arjuna.ats.arjuna] (EJB default - 6) Could not commit the preparedConnection: org.jboss.jca.core.spi.transaction.local.LocalXAException: IJ001156: Could not commit local transaction
No error 2PC outcome is returned and the transaction continues. So the test XA resource is committed, but after recovery the CMR resource is left in the state it would have after a rollback.

Comment 3 Michael 2014-04-01 14:33:15 UTC
The current implementation does not correctly handle the XAException returned from the commit. You can track my fix in the external tracker https://issues.jboss.org/browse/JBTM-2132, which I believe will resolve this test failure.

Do note, however, that if the resource returns [XAER_RMFAIL] "An error occurred that makes the resource manager unavailable." we carry on committing the remaining resources (since we have no way of knowing what the resource actually did). In the particular scenario you describe I believe that the resource will return XA_RBROLLBACK which, once I have fixed JBTM-2132, will roll back the remaining resources.
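
For illustration, the distinction between the two error codes can be sketched in Java. This is a minimal sketch and not the Narayana code: the class `CommitFailureSketch`, the enum `NextAction` and the method `afterCommitFailure` are hypothetical names. Only `XAException` and its `XAER_RMFAIL` and `XA_RBROLLBACK` constants come from the standard `javax.transaction.xa` API. The rollback branch reflects the behaviour expected once JBTM-2132 is fixed.

```java
import javax.transaction.xa.XAException;

// Hypothetical sketch of the decision described above; not the Narayana implementation.
final class CommitFailureSketch {

    // What the coordinator does with the resources it has not yet completed.
    enum NextAction { COMMIT_REMAINING, ROLLBACK_REMAINING }

    // Maps the error code of a failed commit to the follow-up action.
    static NextAction afterCommitFailure(XAException e) {
        switch (e.errorCode) {
            case XAException.XAER_RMFAIL:
                // The resource manager is unavailable, so what it actually did is unknown:
                // carry on committing the remaining resources.
                return NextAction.COMMIT_REMAINING;
            case XAException.XA_RBROLLBACK:
                // The resource reports a rollback: roll back the remaining resources
                // (the behaviour expected after the JBTM-2132 fix).
                return NextAction.ROLLBACK_REMAINING;
            default:
                // Other error codes are outside the scope of this sketch.
                throw new IllegalArgumentException("error code not covered: " + e.errorCode);
        }
    }

    public static void main(String[] args) {
        System.out.println(afterCommitFailure(new XAException(XAException.XAER_RMFAIL)));
        System.out.println(afterCommitFailure(new XAException(XAException.XA_RBROLLBACK)));
    }
}
```

The sketch covers only the two codes mentioned in this comment; it says nothing about how other XA error codes are treated.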

Comment 4 Ondrej Chaloupka 2014-04-02 04:47:13 UTC
Hi Mike,

I see.


If I understand correctly, when CMR is used the org.jboss.jca.core.tx.jbossts.LocalXAResourceImpl always comes into play, and so XA_RBROLLBACK is returned.

Regarding the first scenario, described in comment #c0:
I'm just curious whether CMR works the same way as XA resources. That is, if XAER_RMFAIL is returned and the transaction continues committing, will periodic recovery ensure that the CMR resource is committed, or will the transaction log record some heuristic status?

Thanks

Comment 5 Michael 2014-04-07 18:59:02 UTC
After updating the code to handle the LocalXAException I am now seeing the correct behaviour. Steps f) and g) show the new, correct behaviour:

Steps:
a. enlistment jdbc cmr resource
b. enlistment test xa resource
c. prepare jdbc cmr resource
d. killing connection to database
e. prepare test xa resource (no db connection)
f. commit jdbc cmr resource -> failing as connection is down

f-a. jdbc cmr resource returns HEURISTIC_ROLLBACK and not FINISH_ERROR (as it did before my fix).

g. now the TM correctly rolls back the test xa resource (instead of erroneously committing it as it did before) and the log is removed (ie recovery does not need to do anything with the completed transaction)

This is the correct behaviour.
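
The difference between the old and the new behaviour in steps f) and g) can be sketched as a small model. This is a sketch under stated assumptions, not the transaction manager's code: `CmrCommitOrderSketch`, `Outcome` and `phaseTwo` are hypothetical names, and the enum only mirrors the names of the `TwoPhaseOutcome` values mentioned in this report. The model assumes the CMR resource is committed first and the test XA resource second.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of phase two in this scenario; none of these types are
// Narayana or IronJacamar classes.
final class CmrCommitOrderSketch {

    // Mirrors the names of the outcomes discussed in this report.
    enum Outcome { FINISH_OK, FINISH_ERROR, HEURISTIC_ROLLBACK }

    // Returns, in order, the actions taken for a given outcome of the CMR commit.
    static List<String> phaseTwo(Outcome cmrCommitOutcome) {
        List<String> actions = new ArrayList<>();
        actions.add("commit cmr -> " + cmrCommitOutcome);
        switch (cmrCommitOutcome) {
            case FINISH_OK:
                // Normal case: both resources commit and the log is no longer needed.
                actions.add("commit xa");
                actions.add("remove log");
                break;
            case FINISH_ERROR:
                // Behaviour before the fix: the XA resource was still committed and
                // the transaction was left for periodic recovery.
                actions.add("commit xa");
                actions.add("keep log for recovery");
                break;
            case HEURISTIC_ROLLBACK:
                // Behaviour after the fix: the XA resource is rolled back and the
                // log is removed, so recovery has nothing to do.
                actions.add("rollback xa");
                actions.add("remove log");
                break;
        }
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(phaseTwo(Outcome.FINISH_ERROR));
        System.out.println(phaseTwo(Outcome.HEURISTIC_ROLLBACK));
    }
}
```

With `FINISH_ERROR` the two resources end up in different states (XA committed, CMR effectively rolled back), which is the inconsistency reported here; with `HEURISTIC_ROLLBACK` both end up rolled back.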

Note that the test JPAProxyCMRCrashRecoveryTestCase#prepareHalt for reproducing the bug needs updating, since it contains a final step where it checks the log and finds the text:

"failed with exception XAException.XA_RBROLLBACK: org.jboss.jca.core.spi.transaction.local.LocalXAException: IJ001156: Could not commit local transaction"

The test says this is unexpected, however, this line is printed as a result of step f) where commit is called on the CMR resource which fails as expected because the connection is down: ie this log line is expected.

Comment 6 JBoss JIRA Server 2014-04-09 15:03:35 UTC
Tom Jenkinson <tom.jenkinson@redhat.com> updated the status of jira JBTM-2132 to Closed

Comment 7 Ondrej Chaloupka 2014-04-23 13:45:07 UTC
Verified for EAP 6.3.0.ER2.

Comment 8 Scott Mumford 2014-05-13 23:49:18 UTC
Michael, could you please provide a draft release note in the Doc Text field above, as I'm unable to clearly discern what the problem was, what caused it and how it was fixed from this or linked tickets.

Unfortunately time is of the essence here if this issue is to make it into the 6.3.0 Beta Release Notes document.

Comment 9 Michael 2015-02-02 12:58:58 UTC
(In reply to Scott Mumford from comment #8)
> Michael, could you please provide a draft release note in the Doc Text field
> above, as I'm unable to clearly discern what the problem was, what caused it
> and how it was fixed from this or linked tickets.
> 
> Unfortunately time is of the essence here if this issue is to make it into
> the 6.3.0 Beta Release Notes document.

Hi Scott I am assuming this is historical and you no longer need a response.